Data DeCal Spring 2020

Staff application

Student application – NOTE: Enrollment codes are in the form.

Class is held Wednesdays, 7 – 9pm, in VLSB 2030. Class begins 2/12. If you’re interested, please come, or email us if you cannot make it!

Please email decal (at) with any questions!

STAT 198: Data DeCal

Term: Spring 2020

Units: 2, P/NP

Course Prerequisites: Data 100, Stat 133, or equivalent

Course Description: The Data DeCal provides the skills and mentorship required to bridge the gap between class projects and personal projects. Over the course of the semester, students will learn research methods and complete a series of labs, culminating in a final project exploring an area of their interest.

Methods of Instruction: Weekly — 2 consecutive hours (1 hour of lecture, 1 hour of lab)

Course Control Number: TBD

Location and Meeting Times: TBD

Instructor and Teaching Assistant Office Hours: TBD

Textbooks and Required Materials: N/A

Instructor of Record: Will Fithian

Facilitators: Madeleine Liu, Alan Pham

Enrollment Guidelines: A preliminary application will be available at the start of Spring 2020. Students who attend the first two weeks of class will be given the Course Control Number to officially enroll in the course.

Weekly Schedule

  • Week 3 – Feb 12
    • Lecture: Data Set 1, 2
    • Lab: Data Set 3
  • Week 4 – Feb 19
    • Lecture: Data Set 4
    • Lab: Data Set 5
  • Week 5 – Feb 26
    • Submit Final Project Proposal
    • Class: Personalized Mentoring Sessions
  • Week 6 – Mar 4
    • Lecture: Multiple Linear Regression
    • Lab: Activity
  • Week 7 – Mar 11
    • Midterm Examination
    • Lab: Peer Mentoring
  • Week 8 – Mar 18
    • Lecture: Regression Discontinuity (Will Fithian)
    • Lab: Personalized Mentoring Sessions
  • Week 9 – Mar 25
    • Final Project Checkpoint 1
    • Lecture: Guest Speaker
    • Lab: Personalized Mentoring Sessions
  • Week 10 – Apr 1
    • Lecture: Applications of Data Science for Business
    • Lab: Personalized Mentoring Sessions
  • Week 11 – Apr 8
    • Final Project Checkpoint 2
    • Lecture: Final Project Mentorship
    • Lab: Personalized Mentoring Sessions
  • Week 12 – Apr 15
    • Lecture: Cross-Validation
    • Lab: Personalized Mentoring Sessions
  • Week 13 – Apr 22
    • Lecture: Applications of Data Science Beyond STEM
    • Lab: Final Project Symposium

Evaluation of Student Performance

2 Units; Pass/No Pass; 70% Needed to Pass

15% Attendance

15% Guest Lecture Attendance

10% Midterm Examination

10% Final Project Proposal

10% Final Project Checkpoint 1

10% Final Project Checkpoint 2

30% Final Project
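
As a rough illustration of how these weights combine, here is a minimal Python sketch. The weights and the 70% pass mark come from the breakdown above; the function names, the small rounding tolerance, and the example student are hypothetical and not part of any official grading tool.

```python
# Weights taken from the breakdown above; they sum to 100%.
WEIGHTS = {
    "attendance": 0.15,
    "guest_lecture_attendance": 0.15,
    "midterm": 0.10,
    "proposal": 0.10,
    "checkpoint_1": 0.10,
    "checkpoint_2": 0.10,
    "final_project": 0.30,
}
PASS_THRESHOLD = 0.70


def overall_score(scores):
    """Weighted average of component scores, each a fraction in [0, 1]."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def passes(scores):
    """True when the weighted score reaches the 70% pass mark."""
    # Assumption: a tiny tolerance guards against floating-point rounding
    # exactly at the boundary.
    return overall_score(scores) >= PASS_THRESHOLD - 1e-9


# Hypothetical student: full marks everywhere except a 50% midterm
# and a 60% final project.
example = {name: 1.0 for name in WEIGHTS}
example["midterm"] = 0.5
example["final_project"] = 0.6
print(round(overall_score(example), 2), passes(example))
```

Because the final project alone carries 30%, a student with full marks on everything else would sit exactly at the 70% mark without it.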

Key Learning Outcomes

  • Learn about data science research methods.
  • Learn how to build a data science project from scratch.
  • Produce a project that applies data science to a particular domain, connecting data to real world problems.

How does the course fit within the program of study?

  • Bridges the gap between data science in the classroom and data science in the real world.
  • Encourages students to apply data science skills to real world inquiries.
  • Showcases the utility of data science applications in wide-ranging fields.

Discussion Activities

  • Consulting (Secret Plan)
    • In groups of 5, students work together to present their ideas and solutions on a data science case.
  • Community-Enriched Learning Activities (Smoke Bomb)
    • Using a dataset, students will make predictions using multilabel classification.
  • Survey and Statistical Inference Activities (Misguided Missile)
    • With data obtained from survey answers, students will use classification to predict similarities in features.
  • Competitive (MindSpike)
    • Students begin by examining a dataset used in a published analysis or journal article. They split into teams and compete to come up with the best interpretation of the dataset given a guiding question. At the end, students review the original analysis or article to see how its authors approached the question.
    • Some competition rules:
      • Each team is given a set of cards that they must award to other teams when an insightful comment or analysis is heard.

Interactive Project Activities

  • 30 minutes of student-directed discussion within each group, followed by 30 minutes of discussion between the group and its mentor.
  • Break out into groups of 3 students. 1 person acts as the client while the other 2 act as consultants. The client presents their work and:
    • Proposes a guiding question to discuss throughout the session and, together with the consultants, comes up with feasible solutions by the end.
    • Shares a Jupyter notebook with code for review.
    • Explains their thought process and the rationale for design choices (pipeline, direction of exploration, model, etc.) through a story.
  • Provide students with a question and feedback sheet.

Case Studies

Candy Power Ranking

Flying Etiquette

Correlations and Confidence

Reading List

The Data Science Landscape


Data Scientist

Team Data Science Process

Research Projects

Next America

Research Themes


House Hunting – the Data Scientist Way

Building a National Neighborhood Dataset From Geotagged Twitter Data for Indicators of Happiness, Diet, and Physical Activity

Proposal Due

Students will submit a plan which:

  • If the dataset was not presented in class, includes:
    • A description of the dataset.
    • Limitations of the dataset.
      • For example, coverage/representativeness (selected or..) and other factors that will be covered in lecture.
  • Answers the following questions:
    • Questions
      • Dataset
        • Is the dataset clean? What data cleaning will be required?
        • Do you need auxiliary data?
        • State the data that you would need to collect to adequately address the problem.
      • Problem statement
        • State the problem you would like to answer with data science. Why is this problem important?
        • State the segments of the population that this problem affects.
        • What skills or topics from the course will you use to complete this project? What language will you use to complete this project?
        • State the assumptions you have going into this project. (What is your null hypothesis, what do you believe about the topic, what do you predict about the outcome?)
        • What are the limitations of your project?
  • Come up with concrete questions to ask:
    • Reach questions (plausibly could answer; ex. show causality)
    • Fallback questions (confident you can do; ex. visualizations)
    • Why does the data bear on your questions? What part of the data is useful?
    • Set at least 3 intermediate goals to meet over the course of the project.
  • Show some preliminary progress
    • Dataset downloaded
    • Some plots made
    • Some cleaning done
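
For the "Is the dataset clean?" question and the preliminary cleaning step, a first pass could look something like the sketch below. It is only an illustration using the Python standard library: the sample data, the column names, and the set of strings treated as missing are all hypothetical and would need to be adapted to the actual dataset.

```python
import csv
import io

# Hypothetical sample standing in for a downloaded dataset.
SAMPLE_CSV = """respondent,age,favorite_candy
1,21,Twix
2,,Snickers
3,NA,
4,19,Twix
4,19,Twix
"""

# Assumption: these strings mark a missing value in this dataset.
MISSING_MARKERS = {"", "NA", "N/A", "null"}


def cleanliness_report(csv_text):
    """Count missing values per column and exact duplicate rows."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    columns = list(rows[0].keys()) if rows else []
    missing = {
        col: sum(1 for row in rows if (row[col] or "").strip() in MISSING_MARKERS)
        for col in columns
    }
    # Rows are compared as tuples of (column, value) pairs to find exact repeats.
    unique_rows = {tuple(row.items()) for row in rows}
    return {
        "rows": len(rows),
        "missing_per_column": missing,
        "duplicate_rows": len(rows) - len(unique_rows),
    }


report = cleanliness_report(SAMPLE_CSV)
print(report)
```

A report like this gives the proposal something concrete to cite: which columns have gaps, how many rows repeat, and therefore what cleaning the project will need.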

Final Project Symposium

  • Half of the class presents their projects on their laptops.
  • Half of the class walks around with a form through which they can vote for projects (most interesting, most creative, most socially beneficial).