Data DeCal Fall 2019

You’re welcome to contact our instructors:



Madeleine Liu


Alan Pham


Vinay Maruri


Sean Furuta

Curriculum Developers (to be updated)

Alyssa L.
Samantha T.
Anthony L.
Milan B.


STAT 198: Data DeCal

Term: Fall 2019

Units: 2, P/NP

Course Prerequisites: Data 100, Stat 133, or equivalent

Course Description: The Data DeCal provides the skills and mentorship required to bridge the gap between class projects and personal projects. Over the course of the semester, students will learn research methods and complete a series of labs, culminating in a final project exploring an area of their interest.

Methods of Instruction: Weekly — 2 consecutive hours (1 hour of lecture, 1 hour of lab)

Course Control Number: TBD

Location and Meeting Times: TBD

Instructor and Teaching Assistant Office Hours: TBD

Textbooks and Required Materials: N/A

Instructor of Record: Will Fithian

Facilitators: Madeleine Liu, Alan Pham

Enrollment Guidelines: A preliminary application will be available at the start of Fall 2019. Students who attend the first two weeks of class will be given the Course Control Number to officially enroll in the course.

Weekly Schedule

Week 2 Lecture: Introduction to Independent Data Science Projects, Data Collection
Lab: Activity
Week 3 Lecture: Current Research and Methods
Lab: Review: Introduction (JupyterHub, Importing Libraries, Reading CSV Files, Data Collection and Cleaning)
Week 4 Lecture: Activity
Lab: Pandas Review
Week 5 Lecture: Web Scraping
Lab: Personalized Mentoring Sessions
Week 6 (Submit Final Project Proposal) Lecture: Data Visualization, Multiple Linear Regression
Lab: Activity
Week 7 Lecture: Midterm Examination
Week 8 Lecture: Activity
Lab: Logistic Regression Review
Week 9 (Final Project Checkpoint 1) Lecture: Case Study: Regression Discontinuity (Will Fithian)
Lab: Activity
Week 10 Lecture: Applications of Data Science for Business
Lab: Personalized Mentoring Sessions
Week 11 (Final Project Checkpoint 2) Lecture: Final Project Mentorship
Lab: Personalized Mentoring Sessions
Week 12 Lecture: Cross-Validation
Lab: Personalized Mentoring Sessions
Week 13 Lecture: Applications of Data Science Beyond STEM
Lab: Final Project Symposium 


Evaluation of Student Performance

2 Units; Pass/No Pass; 70% Needed to Pass

20% Attendance

15% Lab

10% Midterm Examination

5% Final Project Proposal

10% Final Project Checkpoint 1

10% Final Project Checkpoint 2

30% Final Project
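
As a rough illustration of how the weights above combine, here is a minimal Python sketch. It assumes each component is scored as a fraction between 0 and 1 and that exactly 70% counts as passing; neither detail is stated in this syllabus, and the function names and example scores are hypothetical.

```python
# Weights copied from the grading breakdown (percent of the overall grade).
WEIGHTS = {
    "Attendance": 20,
    "Lab": 15,
    "Midterm Examination": 10,
    "Final Project Proposal": 5,
    "Final Project Checkpoint 1": 10,
    "Final Project Checkpoint 2": 10,
    "Final Project": 30,
}

# Assumption: the 70% threshold is inclusive.
PASS_THRESHOLD = 70.0


def overall_percentage(scores):
    """Return the weighted overall percentage.

    `scores` maps a component name to the fraction earned (0.0 to 1.0).
    Components missing from `scores` count as zero.
    """
    for name, fraction in scores.items():
        if name not in WEIGHTS:
            raise KeyError(f"Unknown grading component: {name}")
        if not 0.0 <= fraction <= 1.0:
            raise ValueError(f"Score for {name} must be between 0 and 1")
    return sum(weight * scores.get(name, 0.0) for name, weight in WEIGHTS.items())


def passes(scores):
    """Return True when the weighted percentage meets the pass threshold."""
    return overall_percentage(scores) >= PASS_THRESHOLD


# Hypothetical student, for illustration only.
example_scores = {
    "Attendance": 1.0,
    "Lab": 0.8,
    "Midterm Examination": 0.6,
    "Final Project Proposal": 1.0,
    "Final Project Checkpoint 1": 0.9,
    "Final Project Checkpoint 2": 0.9,
    "Final Project": 0.7,
}

print(f"Overall: {overall_percentage(example_scores):.1f}%")
print("Pass" if passes(example_scores) else "No Pass")
```

The weights sum to 100, so a student who skips the final project entirely can earn at most 70% and would need full marks on everything else to pass under these assumptions.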

Key Learning Outcomes

  • Learn about data science research methods.
  • Learn how to build a data science project from scratch.
  • Produce a project that applies data science to a particular domain, connecting data to real world problems.

How does the course fit within the program of study?

  • Bridges the gap between data science in the classroom and data science in the real world.
  • Encourages students to apply data science skills to real world inquiries.
  • Showcases the utility of data science applications in wide-ranging fields.


Discussion Activities

  • Consulting (Secret Plan)
    • In groups of 5, students work together to present their ideas and solutions on a data science case.
  • Community-Enriched Learning Activities (Smoke Bomb)
    • Using a dataset, students will make predictions using multilabel classification.
  • Survey and Statistical Inference Activities (Misguided Missile)
    • With data obtained from survey answers, students will use classification to predict similarities in features.
  • Competitive (MindSpike)
    • Students begin by examining a dataset used in a published analysis or journal article. They split into teams and compete to produce the best interpretation of the dataset given a guiding question. At the end, students review the original analysis or article to see how its authors approached the question.
    • Some competition rules:
      • Each team is given a set of cards that they must award to other teams when an insightful comment or analysis is heard.

Interactive Project Activities

  • 30 minutes of student-directed discussion within each group; 30 minutes of discussion between each student group and a mentor.
  • Break out into groups of 3 students. 1 person acts as the client while the other 2 act as consultants. The client presents their work and:
    • Proposes a guiding question to discuss throughout the session and, together with the consultants, arrives at feasible solutions by the end.
    • Shares a Jupyter notebook with code for review.
    • Explains their thought process and the rationale for design choices (pipeline, direction of exploration, model, etc.) through a story.
  • Provide students with a question and feedback sheet.

Case Studies

Candy Power Ranking

Flying Etiquette

Correlations and Confidence

Reading List

The Data Science Landscape


Data Scientist

Team Data Science Process

Research Projects

Next America

Research Themes


House Hunting – the Data Scientist Way

Building a National Neighborhood Dataset From Geotagged Twitter Data for Indicators of Happiness, Diet, and Physical Activity

Final Project Proposal Guidelines

  1. State the problem you would like to answer with data science. Why is this problem important?
  2. State the segments of the population that this problem affects.
  3. State the data that you would need to collect to adequately address the problem.
  4. What skills or topics from the course will you use to complete this project?
  5. State the assumptions you have going into this project. (What is your null hypothesis? What do you believe about the topic? What do you predict about the outcome?)
  6. Set at least 3 intermediate goals to meet over the course of the project.
  7. What are the limitations of your project?

Final Project Symposium

  • Half of the class presents their projects on their laptops.
  • Half of the class walks around with a form through which they can vote for projects (most interesting, most creative, most socially beneficial).