Student application – NOTE: Enrollment codes are in the form.

Class is held Wednesdays 7 – 9pm in VLSB 2030. Class begins 2/12. If you’re interested, please come, or email us if you cannot make it!

Please email decal (at) db.berkeley.edu with any questions!

**STAT 198: Data DeCal**

**Term**: Spring 2020

**Units**: 2, P/NP

**Course Prerequisites**: Data 100, Stat 133, or equivalent

**Course Description**: The Data DeCal provides the skills and mentorship required to bridge the gap between class projects and personal projects. Over the course of the semester, students will learn research methods and complete a series of labs, culminating in a final project exploring an area of their interest.

**Methods of Instruction**: Weekly — 2 consecutive hours (1 hour of lecture, 1 hour of lab)

**Course Control Number**: TBD

**Location and Meeting Times**: TBD

**Instructor and Teaching Assistant Office Hours**: TBD

**Textbooks and Required Materials**: N/A

**Instructor of Record**: Will Fithian

**Facilitators**: Madeleine Liu, Alan Pham

**Enrollment Guidelines**: A preliminary application will be available at the start of Spring 2020. Students who attend the first two weeks of class will be given the Course Control Number to officially enroll in the course.

**Weekly Schedule**

| Week | Agenda |
| --- | --- |
| Week 3 – Feb 12 | Lecture: Data Set 1, 2; Lab: Data Set 3 |
| Week 4 – Feb 19 | Lecture: Data Set 4; Lab: Data Set 5 |
| Week 5 – Feb 26 | Submit Final Project Proposal; Class: Personalized Mentoring Sessions |
| Week 6 – Mar 4 | Lecture: Multiple Linear Regression; Lab: Activity |
| Week 7 – Mar 11 | Midterm Examination; Lab: Peer Mentoring |
| Week 8 – Mar 18 | Lecture: Regression Discontinuity (Will Fithian); Lab: Personalized Mentoring Sessions |
| Week 9 – Mar 25 | Final Project Checkpoint 1; Lecture: Guest Speaker; Lab: Personalized Mentoring Sessions |
| Week 10 – Apr 1 | Lecture: Applications of Data Science for Business; Lab: Personalized Mentoring Sessions |
| Week 11 – Apr 8 | Final Project Checkpoint 2; Lecture: Final Project Mentorship; Lab: Personalized Mentoring Sessions |
| Week 12 – Apr 15 | Lecture: Cross-Validation; Lab: Personalized Mentoring Sessions |
| Week 13 – Apr 22 | Lecture: Applications of Data Science Beyond STEM; Lab: Final Project Symposium |

**Evaluation of Student Performance**

2 Units; Pass/No Pass; 70% Needed to Pass

15% Attendance

15% Guest Lecture Attendance

10% Midterm Examination

10% Final Project Proposal

10% Final Project Checkpoint 1

10% Final Project Checkpoint 2

30% Final Project
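As a rough illustration of how the weights above combine, the sketch below computes an overall percentage and checks it against the 70% threshold. The function names, the example scores, and the choice to treat exactly 70% as passing are assumptions made for illustration, not official course policy.

```python
# Weights copied from the grading breakdown above (percent of final grade).
WEIGHTS = {
    "Attendance": 15,
    "Guest Lecture Attendance": 15,
    "Midterm Examination": 10,
    "Final Project Proposal": 10,
    "Final Project Checkpoint 1": 10,
    "Final Project Checkpoint 2": 10,
    "Final Project": 30,
}
PASS_THRESHOLD = 70  # "70% Needed to Pass"; treating >= 70 as a pass is an assumption


def overall_percent(scores):
    """Return the weighted total in percent.

    `scores` maps a component name to the fraction earned (0.0 to 1.0).
    Components missing from `scores` count as zero.
    """
    return sum(weight * scores.get(name, 0.0) for name, weight in WEIGHTS.items())


def passes(scores):
    """Return True if the weighted total meets the pass threshold."""
    return overall_percent(scores) >= PASS_THRESHOLD


# Hypothetical student, for illustration only.
example = {
    "Attendance": 1.0,
    "Guest Lecture Attendance": 1.0,
    "Midterm Examination": 0.6,
    "Final Project Proposal": 1.0,
    "Final Project Checkpoint 1": 1.0,
    "Final Project Checkpoint 2": 0.5,
    "Final Project": 0.8,
}

print(f"Overall: {overall_percent(example):.1f}%  Pass: {passes(example)}")
```

Because the final project alone is worth 30%, a student cannot reach 70% on the final project alone; the attendance and checkpoint components matter.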

**Key Learning Outcomes**

- Learn about data science research methods.
- Learn how to build a data science project from scratch.
- Produce a project that applies data science to a particular domain, connecting data to real world problems.

**How does the course fit within the program of study?**

- Bridges the gap between data science in the classroom and data science in the real world.
- Encourages students to apply data science skills to real world inquiries.
- Showcases the utility of data science applications in wide-ranging fields.

**Discussion Activities**

- Consulting (Secret Plan)
  - In groups of 5, students work together to present their ideas and solutions on a data science case.
- Community-Enriched Learning Activities (Smoke Bomb)
  - Using a dataset, students will make predictions using multilabel classification.
- Survey and Statistical Inference Activities (Misguided Missile)
  - With data obtained from survey answers, students will use classification to predict similarities in features.
- Competitive (MindSpike)
  - Begin by examining a dataset that is used in an analysis or journal. Students split into teams and compete to come up with the best interpretation of the dataset given a guiding question. At the end, students review the journal or article to see how it approached answering the question.
  - Some competition rules:
    - Each team is given a set of cards that they must award to other teams when an insightful comment or analysis is heard.

**Interactive Project Activities**

- 30 mins of student-directed discussion within groups; 30 mins of discussion with student group + mentor.
- Break out into groups of 3 students. 1 person acts as the client while the other 2 act as consultants. The client presents their work and:
  - Proposes a guiding question to discuss throughout the session and, with the consultants, comes up with feasible solutions by the end.
  - Shares a Jupyter notebook with code for review.
  - Explains their thought process and rationale for design choices (for pipeline, direction of exploration, model, etc.) through a story.
- Provide students with a questions and feedback sheet.

**Case Studies**

**Reading List**

*The Data Science Landscape*

*Research Projects*

House Hunting – the Data Scientist Way

**Proposal Due**

Students will submit a plan which:

- If the dataset was not presented in class, includes:
  - A description of the dataset.
  - Limitations of the dataset.
    - For example, coverage/representativity (selected or..) and other factors which will be covered in lecture.
- Answers the following questions:
  - Dataset
    - Is the dataset clean? What data cleaning will be required?
    - Do you need auxiliary data?
    - State the data that you would need to collect to adequately address the problem.
  - Problem statement
    - State the problem you would like to answer with data science. Why is this problem important?
    - State the segments of the population that this problem affects.
    - What skills or topics from the course will you use to complete this project? What language will you use to complete this project?
    - State the assumptions you have going into this project. (What is your null hypothesis, what do you believe about the topic, what do you predict about the outcome?)
    - What are the limitations of your project?
  - Questions
    - Come up with concrete questions to ask:
      - Reach questions (plausibly could answer; ex. show causality)
      - Fallback questions (confident you can do; ex. visualizations)
    - Why does the data bear on your questions? What part of the data is useful?
    - Set at least 3 intermediate goals to meet over the course of the project.
- Shows some preliminary progress:
  - Dataset downloaded
  - Some plots made
  - Some cleaning done

**Final Project Symposium**

- Half of the class presents their projects on their laptops.
- Half of the class walks around with a form through which they can vote for projects (most interesting, most creative, most socially beneficial).