Course Information

See the course Canvas page for details about meeting times, location, instructional team, and office hours.


Course Description

This course focuses on modern statistical learning methods for prediction, with an emphasis on understanding how predictive models behave, how their performance is evaluated, and how predictions are used to support decisions under uncertainty. Students will study a range of predictive model families, including linear, tree-based, kernel, ensemble, probabilistic, time series, survival, and recommender system models, and will learn how model structure, data characteristics, and sample size influence generalization. The course emphasizes evaluation on unseen data, uncertainty in predictions and performance estimates, and the diagnosis of model and validation weaknesses. Through a combination of hands-on computation and conceptual analysis, students gain experience applying predictive modeling methods across different prediction targets and data types, while developing the judgment needed to assess when predictive results can be trusted and how they should be used in practice.


Student Learning Objectives

  • Define prediction problems and assess predictive performance on unseen data.
  • Fit, tune, and compare predictive models while accounting for sample size and variability.
  • Understand how the structure and assumptions of different model families affect generalization.
  • Quantify and interpret uncertainty in predictions and performance estimates.
  • Diagnose weaknesses in predictive models, data, and validation designs.
  • Understand how predictions are used to inform decisions under uncertainty, accounting for costs and constraints.
  • Apply predictive modeling methods across different model families, prediction targets, and data types.

Course Prerequisites

This course is only open to current residential MSDS students who have completed the fall semester of coursework.


Other Course Materials

Statistical Computing and Technical Writing

This course uses modern statistical computing and technical writing tools.

  • R (http://cran.us.r-project.org) is a free programming language for statistical computing, graphics, and machine learning. Most in-class examples will be presented using the tidyverse dialect of R. It is recommended that you update to the latest version (I’m using 4.5.2).

  • Python (https://www.python.org/) is a general-purpose programming language widely used for data science and machine learning. Students who prefer Python may use libraries such as pandas, numpy, scikit-learn, and statsmodels to complete assignments, provided results and methodology are clearly documented.

  • Quarto (https://quarto.org/docs/get-started/) is a free technical publishing system. We will use Quarto documents for homework and written assignments. Quarto supports both R and Python, allowing students to work in either or both languages while producing reproducible reports. It is recommended that you update to the latest version (I’m using 1.8.26).

While R will be used for most examples, you can use R, Python, or both for coursework.

There is no required IDE for this course. Students are encouraged to use the development environment that best supports their workflow and preferred programming language. Be sure to update to the latest version.

  • Positron is a newer IDE designed specifically for data science workflows. It integrates R, Python, and other languages, works seamlessly with Quarto, and has an interface similar to VS Code. Positron is a good choice for students who plan to work across multiple languages or who want strong Quarto support.

  • RStudio is a familiar and well-supported IDE for R-based workflows. It remains an excellent choice for students working primarily in R (and it still supports Python).

  • VS Code is another good option, particularly for students who prefer Python or already use VS Code for other programming tasks. With appropriate extensions, VS Code works well with R, Python, and Quarto.

Additional References

  • The free textbook Modern Data Science with R by Baumer, Kaplan, and Horton is written for an undergraduate-level “Intro to Data Science” course. It covers the tidyverse and statistical inference, and gives a basic introduction to many of the methods we will study this semester. It can serve as good overall preparation or as a handy reference.

  • The free textbook Feature Engineering and Selection: A Practical Approach for Predictive Models by Kuhn and Johnson provides more in-depth coverage of feature engineering than we will be able to give in this course.

  • The free textbook Hands-on Machine Learning with R by Boehmke and Greenwell gives R code with some helpful details for most of the methods we will cover. This can be a handy reference.

  • The free textbook Interpretable Machine Learning by Christoph Molnar, subtitled “A Guide for Making Black Box Models Explainable,” covers topics such as feature importance and how to measure the influence of a feature on the predictions (e.g., Shapley values, partial dependence).

  • The free textbook Introduction to Modern Statistics by Mine Çetinkaya-Rundel and Johanna Hardin is an accessible introduction to modern (i.e., resampling-based) statistical inference. If you feel you are still missing the big picture of statistical inference, this is a good place to start.

  • The free textbook Mathematics for Machine Learning by Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong is a good reference for the mathematical concepts helpful for machine learning. Chapters 1-7 provide a good foundation for this course.

  • The textbook Introduction to Data Mining (Second Edition) by Tan, Steinbach, Karpatne, and Kumar has three free chapters on Classification, Association Analysis, and Cluster Analysis.


Course Assessment

  • The course grade will be based on homework assignments (35%), in-class quizzes (15%), mid-term exam (20%), final exam (25%), and course participation (5%).

  • A: \(\geq\) 92%, A-: 90-91%, B+: 88-89%, B: 82-87%, B-: 80-81%, etc.

    • A+: awarded rarely for exceptional work
  • There is no grade “curving” in this course.

    • There will be no make-up homework, exams, projects, or quizzes.
    • Note: There will be no “extra credit” assignments; spend your time on the assigned work.
  • All homework assignment dates are posted on the course website and in Canvas. Note these now to avoid conflicts.

  • All assignments must be submitted through Canvas. You are given a 5-minute grace period for late submissions; the timestamps produced by Canvas are the authoritative reference for all such decisions. If you have special circumstances (e.g., a documented physical condition) that prevent you from meeting the posted deadlines, please inform me at least 1 week before the deadline so that I can make arrangements to accommodate you.

  • An Incomplete grade (IN) is only used when a student has instructor approval to complete missed assignments/exams after the end of the semester. A student may not request an IN grade to raise their grade (e.g., by doing extra credit work). An IN is not permitted if the student:

    • does not have a solid attendance record.
    • has not completed at least 75% of the work for the class.
    • is failing the class.
    • is in their final semester of the program.
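The weighted-grade arithmetic above can be sketched as follows. This is a minimal illustration, not an official grade calculator. It uses the weights (35/15/20/25/5), the letter cutoffs listed above, the 5-point homework scale, the 2-point quizzes, and the rule that the lowest two homework and quiz scores are dropped. Several details are assumptions not stated in this syllabus: each component is treated as a fraction in [0, 1], the homework and quiz components are the mean of the remaining scores, the total is not rounded before letters are assigned, and grades below B- (elided above as “etc.”) are left unmapped. The function names are hypothetical.

```python
# Minimal sketch of the course-grade arithmetic (hypothetical helper names).
# ASSUMPTIONS: components are fractions in [0, 1]; homework/quiz components are the
# mean of the remaining scores after dropping the two lowest; no rounding is applied.

WEIGHTS = {
    "homework": 0.35,
    "quizzes": 0.15,
    "midterm": 0.20,
    "final": 0.25,
    "participation": 0.05,
}


def mean_after_dropping_lowest(scores, max_points, n_drop=2):
    """Mean of scores as a fraction of max_points, after dropping the n_drop lowest."""
    kept = sorted(scores)[n_drop:]
    if not kept:
        raise ValueError("need more scores than the number dropped")
    return sum(kept) / (len(kept) * max_points)


def course_percent(hw_scores, quiz_scores, midterm, final, participation):
    """Weighted course percentage (0-100); midterm, final, participation are fractions."""
    components = {
        "homework": mean_after_dropping_lowest(hw_scores, max_points=5),
        "quizzes": mean_after_dropping_lowest(quiz_scores, max_points=2),
        "midterm": midterm,
        "final": final,
        "participation": participation,
    }
    return 100 * sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


# Cutoffs from the syllabus; grades below B- are not specified there.
CUTOFFS = [(92, "A"), (90, "A-"), (88, "B+"), (82, "B"), (80, "B-")]


def letter(percent):
    """Return the letter for percent, or None if it falls below B-."""
    for cutoff, grade in CUTOFFS:
        if percent >= cutoff:
            return grade
    return None
```

For example, homework scores of [4, 4, 4, 4, 0, 0] (two zeros dropped, so 0.8), quiz scores of [1, 1, 1, 0, 0] (0.5), a midterm of 0.8, a final of 0.9, and full participation combine to a weighted total of 79%. Dropping the two lowest scores is what keeps the two missed homework assignments from counting at all.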

The purpose of the homework is to help you develop fluency with the core ideas of the course by actively applying methods, interpreting results, and connecting theory to real data. Homework is intended as a learning and exploration activity rather than a high-stakes assessment.

Homework is graded on a coarse 5-point scale that emphasizes engagement with the learning process rather than correctness or polish. High scores are expected for serious attempts.

| Score | Meaning | What this reflects |
|---|---|---|
| 5 | Complete and engaged | All required parts are present. Results and explanations are coherent and clearly connected to the questions. The assignment shows a serious attempt to complete the learning task. |
| 4 | Engaged with minor issues | The assignment is largely complete and coherent, but one part is underdeveloped, unclear, or only partially addressed. |
| 3 | Partial engagement | The assignment was attempted, but one or more essential components are missing or not meaningfully addressed. |
| 1–2 | Minimal engagement | The submission does not provide sufficient evidence of a serious attempt to complete the assignment. |
| 0 | Not submitted | No submission or no meaningful work submitted. |

  • The lowest two homework grades will be dropped.

  • You can discuss and work with classmates on homework assignments, but what you submit must be in your own words (and code). See Honor Code for more details.

  • Homework will be submitted as both the Quarto source (which contains the code) and the compiled HTML.

    • The Quarto source contains your code; rendering it produces the HTML.
    • I will provide a Quarto template with the questions, and you will fill in the solutions.
    • All code must be easy to follow (e.g., through clear comments).
    • Mathematical notation should use LaTeX syntax. See the Quarto-specific and general LaTeX guides for guidance.

In-class quizzes are brief, low-stakes assessments intended to reinforce recent material, encourage steady engagement with the course, and provide quick feedback on your understanding. Their primary purpose is to support learning by highlighting areas that may need further review and helping you stay aligned with the pace of the course.

  • The quizzes will be primarily based on the most recently submitted homework. Students should review the posted homework solutions prior to the quiz. The quizzes are not designed to test memorization of numbers or code, but rather to ensure that you personally have a solid understanding of the main concepts, independent of assistance from large language models or other automated tools.

  • Quizzes will typically be administered weekly. Unless otherwise announced, quizzes will be closed book, closed notes, and closed computers or phones.

  • Each quiz is worth 2 points. The lowest two quiz scores will be dropped.

Course participation is intended to encourage thoughtful engagement with the course material and support learning through reflection, questioning, and interaction. In my experience, both students and the instructor gain valuable insights through discussion. Participation recognizes consistent effort to engage with ideas presented in lecture, homework, and course activities.

Participation may take many forms, including asking questions, contributing to discussions in class or online, completing brief in-class or online activities, or engaging during office hours. Speaking in front of the class is not required.

The course includes two in-class exams designed to assess understanding of the core concepts and methods covered in the course. The exams emphasize conceptual reasoning, interpretation of results, and the ability to clearly explain ideas. Exams are intended to evaluate individual mastery of the material and to complement the homework, quizzes, and participation components of the course.

  • Exam grades will reflect the level of conceptual understanding, synthesis, and reasoning demonstrated, rather than the number of isolated errors.

Course Topics

  • The predictive modeling pipeline
  • Bias-variance trade-off, model complexity, and sample size effects
  • Estimation and evaluation of predictive performance
  • Probability modeling and classification methods
  • Calibration, predictive diagnostics, and inference
  • Sampling strategies and case weighting
  • Prediction trees and random forests
  • Generative classifiers
  • Support vector machines
  • Ensemble methods
  • Time series and forecasting methods
  • Survival and time-to-event modeling
  • Recommender systems and ranking problems

Course Management

  • Most course material will be available on the class webpage.
  • All assignments (e.g., homework, quizzes, exams) will be submitted in Canvas.
  • Announcements may be made in email or Canvas.
  • Course Discussion on Canvas
    • Rather than emailing questions to the teaching staff, I encourage you to post your questions here.
    • The teaching staff will always check discussions during our office hours and possibly at other times.
    • Please feel free to answer questions from other students, but use your discretion in not directly providing specific solutions to a homework problem (e.g., don’t give the code that directly answers a question).
    • Also, please post any discussion questions or material on which you want input from the class and instructors.

Recording of classroom lectures

In the event that I or a large number of students are unable to attend class in person, lectures may be recorded via Zoom. Because lectures may include fellow students, you and others may be personally identifiable in these recordings. Recordings may only be used for individual or group study by students enrolled in this class during the current semester. They may not be distributed, in whole or in part, through any other platform or to any individuals outside of this class. Students may not make their own recordings of this class unless written permission has been obtained from the instructor and all participants have been informed that recording will occur. For additional details, please see Provost Policy 005.


Academic Calendar

Important dates for the semester can be found on the academic calendar: http://www.virginia.edu/registrar/calendar.html


Policy on Academic Misconduct (Honor Code)

I trust every student in this course to fully comply with all provisions of the University’s Honor Code and work together to maintain UVA’s Community of Trust. By enrolling in this course, you have agreed to abide by and uphold the Honor System of the University of Virginia, as well as the following policies specific to this course.

  • All submitted work must be pledged.
  • All work must be completed individually unless specific permissions are given on the assignment.
    • Homework and in-class exercises can be discussed with classmates, but the final write-up, code, and solutions must be your own. List the names of those you worked with (like a citation).
    • The individual homework sets must be done completely on your own. You are not to discuss exams with anyone except the teaching staff.
    • You are not permitted to copy code. You may use the internet and LLMs to help you understand a concept or process, but write the code in your own words.
  • It is a scholarly responsibility to attribute all your work. This includes figures, code, ideas, etc. Think of it this way: Will someone who reads your submission think that it is your original idea, figure, code, etc? Add a link and/or reference to all sources you used to solve a problem.
  • It is not always easy to tell what qualifies as a violation, so do not be afraid to talk to me about it. Such discussions do not imply guilt of any kind.
  • All suspected violations will be forwarded to the Honor Committee, and you may, at my discretion, receive an immediate zero on that assignment regardless of any action taken by the Honor Committee.

Please let me know if you have any questions regarding the course Honor policy. If you believe you may have committed an Honor Offense, you may wish to file a Conscientious Retraction by calling the Honor Offices at (434) 924-7602. For your retraction to be considered valid, it must, among other things, be filed with the Honor Committee before you are aware that the act in question has come under suspicion by anyone. More information can be found at http://honor.virginia.edu. Your Honor representatives can be found at: http://honor.virginia.edu/representatives.


SDS Guidelines on AI Tools and Assistance

The use of generative AI tools and foundation models (e.g., ChatGPT, DALL-E, Stable Diffusion, Midjourney, GitHub Copilot, and similar tools) is permitted for the following activities, in accordance with the stated guidelines, at no penalty:

  • Brainstorming and refining your ideas;
  • Fine-tuning your research questions;
  • Finding information on your topic;
  • Drafting an outline to organize your thoughts; and
  • Checking grammar and style.

Students are responsible for

  • Acknowledging that large language models tend to produce incorrect facts and fake citations.
  • Acknowledging that code generation models may produce inaccurate outputs.
  • Acknowledging that image generation models can occasionally come up with highly offensive products.
  • Taking responsibility for any inaccurate, biased, offensive, or otherwise unethical content submitted, regardless of the origin (i.e. student-generated or from a foundation model).
  • Properly citing the contribution of the foundation model or other AI tools in submitted material.
  • The entirety of any information they submit, based on an AI query or AI assistance.

The use of generative AI tools is NOT permitted for the following activities:

  • Impersonating students in classroom contexts, such as by using the tool to compose discussion board prompts or content entered into a Zoom chat.
  • Completing group work assigned to a student, unless it is mutually agreed upon that they may utilize the tool.
  • Writing a draft of a writing assignment.
  • Writing entire sentences, paragraphs or papers to complete class assignments.

Students may be penalized for

  • Using a foundation model without including an acknowledgement.
  • Improperly citing the use of work by other human beings or the submission of work by other human beings as that of the student.
  • Violating intellectual property laws.
  • Submitting materials containing misinformation or unethical content.

The usage of AI tools must be properly cited to stay within university policies on academic honesty. Failure to adhere to these guidelines will result in a failing grade on the assignment or exam (a zero) and may be an honor code violation depending on the context (to be determined at the instructor’s discretion). Having said all these disclaimers, the use of foundation models is encouraged, as it may make it possible for you to submit assignments with higher quality, in less time.


School of Data Science Support and Policies

Office of Student Support

The Office of Student Support is here to support your academic journey toward success. Their office provides resources to help you be successful and engaged. If you need support resources related to student success or personal well-being, please reach out to the Office of Student Support directly at: .

Kylen Baskerville, your Graduate Program Manager, is your go-to resource for academic advising and providing personal support when you need it. Connect with Kylen for questions around course enrollment and choosing your electives, getting approval for an internship, or if you’re feeling behind in your courses or having trouble balancing your academic and personal life.

The Data Science Student Portal has resources and information about career, academic support, engagement opportunities, funding, and more.

Career Services

The School of Data Science Career Services Team provides a wealth of opportunities for you to learn, connect, and grow. These offerings are adapted to the needs of the cohort or the year and may take different forms as needs change.

Complete your profile and career interests, explore the Data Analytics Resource Card, or make an appointment with Career Services at the School of Data Science by logging in to Handshake with your UVA email.

Graduate Record

Visit the School of Data Science Graduate Record for policies and information about academic regulations, academic standing, financial assistance, and grades.


University Policies

Course Evaluations

Student feedback is critical to the school, the instructor, and future students. Students are expected to complete anonymous and confidential course evaluations in a timely manner for each course at the end of each term.

Discrimination/Harassment/Retaliation

UVA prohibits discrimination and harassment based on age, color, disability, family medical or genetic information, gender identity or expression, marital status, military status (which includes active duty service members, reserve service members, and dependents), national or ethnic origin, political affiliation, pregnancy (including childbirth and related conditions), race, religion, sex, sexual orientation, or veteran status. See https://uvapolicy.virginia.edu/policy/HRM-009 for more details. UVA policy also prohibits retaliation. All faculty and TAs are also responsible employees for disclosures or reports of potential discrimination, harassment, and retaliation.

Disability and Pregnancy Accommodations

If you anticipate or experience any barriers to learning in this course, please discuss your concerns with me. If you have a disability, or think you may have a disability, contact the Student Disability Access Center (SDAC) to request reasonable accommodation(s) for this course. If you have accommodations through SDAC, send me your Faculty Notification Letter as soon as possible and meet with me so we can develop an implementation plan together.

Students may be entitled to reasonable accommodations for pregnancy, childbirth, or related medical issues. Please contact SDAC for additional information. Pregnant and parenting students are encouraged to contact SDAC or EOCR to discuss plans and ensure ongoing access to their academic courses and program. Information for pregnant and parenting students is also available on EOCR’s Pregnancy and Parenting Resources webpage.

Religious Academic Accommodations

UVA also provides reasonable accommodations when a student’s sincerely held religious beliefs or observances conflict with academic requirements. Students who wish to request an academic accommodation for a religious observance should submit their request to me by email as far in advance as possible. If you have questions or concerns about your request, you can contact the University’s Office for Equal Opportunity and Civil Rights (EOCR) https://eocr.virginia.edu/accommodations-religious-observance. Accommodations do not relieve you of the responsibility for completion of any part of the coursework you miss as the result of a religious observance.


Student Mental Health and Wellbeing

The University of Virginia is committed to advancing the mental health and wellbeing of its students, while acknowledging that a variety of issues directly impact your academic performance. Residential MSDS students may access the School of Data Science Embedded Psychotherapist Beth Holt Wright, LCSW by:

Scheduling online through the Healthy Hoos Portal, emailing mdj7wf@virginia.edu, or calling CAPS at 434-243-5150. Notify the receptionist that you are enrolled in the School of Data Science.

If you need expedited / urgent support, consider a walk-in appointment at the main Counseling and Psychological Services (CAPS) location on the 4th floor of the Student Health and Wellness Center.

If you or someone you know is feeling overwhelmed, depressed, and/or in need of support, contact the CAPS Care Managers at CAPSCareMgrs@virginia.edu.

For help finding a community therapist, visit the CAPS Community Referrals page.