## Data DeCal

You’re welcome to contact our instructors at: `decal@db.berkeley.edu`

## Team

#### Facilitators

Madeleine Liu

Alan Pham

#### Instructors

Vinay Maruri

Sean Furuta

#### Curriculum Developers (to be updated)

Alyssa L.

Samantha T.

Anthony L.

Milan B.

## Syllabus

**Term**: Fall 2019

**Units**: 2, P/NP

**Course Prerequisites**: Data 100, Stat 133, or equivalent

**Course Description**: The Data DeCal provides the skills and mentorship required to bridge the gap between class projects and personal projects. Over the course of the semester, students will learn research methods and complete a series of labs, culminating in a final project exploring an area of their interest.

**Methods of Instruction**: Weekly — 2 consecutive hours (1 hour of lecture, 1 hour of lab)

**Course Control Number**: TBD

**Location and Meeting Times**: TBD

**Instructor and Teaching Assistant Office Hours**: TBD

**Textbooks and Required Materials**: N/A

**Instructor of Record**: Will Fithian

**Facilitators**: Madeleine Liu, Alan Pham

**Enrollment Guidelines**: A preliminary application will be available at the start of Fall 2019. Students who attend the first two weeks of class will be given the Course Control Number to officially enroll in the course.

**Weekly Schedule**

| Week | Milestone | Lecture | Lab |
|---|---|---|---|
| 2 | | Introduction to Independent Data Science Projects, Data Collection | Activity |
| 3 | | Current Research and Methods | Review: Introduction (JupyterHub, Importing Libraries, Reading CSV Files, Data Collection and Cleaning) |
| 4 | | Activity | Pandas Review |
| 5 | | Web Scraping | Personalized Mentoring Sessions |
| 6 | Submit Final Project Proposal | Data Visualization, Multiple Linear Regression | Activity |
| 7 | | Midterm Examination | |
| 8 | | Activity | Logistic Regression Review |
| 9 | Final Project Checkpoint 1 | Case Study: Regression Discontinuity (Will Fithian) | Activity |
| 10 | | Applications of Data Science for Business | Personalized Mentoring Sessions |
| 11 | Final Project Checkpoint 2 | Final Project Mentorship | Personalized Mentoring Sessions |
| 12 | | Cross-Validation | Personalized Mentoring Sessions |
| 13 | | Applications of Data Science Beyond STEM | Final Project Symposium |

**Evaluation of Student Performance**

2 Units; Pass/No Pass; 70% Needed to Pass

- 20% Attendance
- 15% Lab
- 10% Midterm Examination
- 5% Final Project Proposal
- 10% Final Project Checkpoint 1
- 10% Final Project Checkpoint 2
- 30% Final Project
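The grading breakdown above is a weighted sum whose weights total 100%. Below is a minimal Python sketch of how a course total and the pass/no-pass decision could be computed. It assumes that each component is scored on a 0 to 100 scale and that a total of exactly 70 counts as passing; the syllabus does not specify either point. The names `WEIGHTS`, `weighted_score`, and `passes` are hypothetical and not part of any official course tooling.

```python
from typing import Mapping

# Weights (in percent) copied from the syllabus breakdown; they sum to 100.
WEIGHTS = {
    "attendance": 20,
    "lab": 15,
    "midterm": 10,
    "proposal": 5,
    "checkpoint_1": 10,
    "checkpoint_2": 10,
    "final_project": 30,
}

# Assumption: "70% Needed to Pass" is read as a total of at least 70 out of 100.
PASS_THRESHOLD = 70.0


def weighted_score(scores: Mapping[str, float]) -> float:
    """Combine component scores (assumed 0-100 each) into a course total out of 100."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    # Weighted sum of percentages, divided by 100 to return to a 0-100 scale.
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def passes(scores: Mapping[str, float]) -> bool:
    """Return True if the weighted total meets the (assumed inclusive) pass threshold."""
    return weighted_score(scores) >= PASS_THRESHOLD


if __name__ == "__main__":
    example = {
        "attendance": 100,
        "lab": 80,
        "midterm": 60,
        "proposal": 100,
        "checkpoint_1": 70,
        "checkpoint_2": 70,
        "final_project": 65,
    }
    print(weighted_score(example), passes(example))
```

For example, scores of 100 (attendance), 80 (lab), 60 (midterm), 100 (proposal), 70 and 70 (checkpoints), and 65 (final project) combine to 76.5, which would pass under these assumptions. Keeping the weights as integer percentages avoids floating-point drift when checking that they sum to 100.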

**Key Learning Outcomes**

- Learn about data science research methods.
- Learn how to build a data science project from scratch.
- Produce a project that applies data science to a particular domain, connecting data to real-world problems.

**How does the course fit within the program of study?**

- Bridges the gap between data science in the classroom and data science in the real world.
- Encourages students to apply data science skills to real-world inquiries.
- Showcases the utility of data science applications in wide-ranging fields.

**Discussion Activities**

- Consulting (Secret Plan)
    - In groups of 5, students work together to present their ideas and solutions for a data science case.

- Community-Enriched Learning Activities (Smoke Bomb)
    - Students use a dataset to make predictions with multilabel classification.

- Survey and Statistical Inference Activities (Misguided Missile)
    - Using data obtained from survey answers, students apply classification to predict similarities in features.

- Competitive (MindSpike)
    - Students begin by examining a dataset that was used in a published analysis or journal article. They split into teams and compete to come up with the best interpretation of the dataset given a guiding question. At the end, students review the journal or article to see how its authors approached the question.
    - Some competition rules:
        - Each team is given a set of cards that they must award to other teams when they hear an insightful comment or analysis.

**Interactive Project Activities**

- 30 minutes of student-directed discussion within groups; 30 minutes of discussion between each group and its mentor.
- Students break out into groups of 3: 1 person acts as the client while the other 2 act as consultants. The client presents their work and:
    - Proposes a guiding question to discuss throughout the session and, together with the consultants, works toward feasible solutions by the end.
    - Shares a Jupyter notebook with code for review.
    - Explains their thought process and the rationale for design choices (pipeline, direction of exploration, model, etc.) in the form of a story.

- Provide students with a questions-and-feedback sheet.

**Case Studies**

**Reading List**

*The Data Science Landscape*

*Research Projects*

House Hunting – the Data Scientist Way

**Final Project Proposal Guidelines**

- State the problem you would like to address with data science. Why is this problem important?
- State the segments of the population that this problem affects.
- State the data that you would need to collect to adequately address the problem.
- What skills or topics from the course will you use to complete this project?
- State the assumptions you bring to this project: What is your null hypothesis? What do you believe about the topic? What do you predict about the outcome?
- Set at least 3 intermediate goals to meet over the course of the project.
- What are the limitations of your project?

**Final Project Symposium**

- Half of the class presents their projects on their laptops.
- The other half walks around with a form to vote for projects in three categories: most interesting, most creative, and most socially beneficial.