Course Info | |
---|---|
Class Time: | Lectures posted Mon, Wed (Asynchronous) |
Class Location: | Online |
Course Canvas site: | https://canvas.its.virginia.edu/courses/92132 |
Course Teams site: | SYS 6018 Teams (find join code on Canvas) |

Instructor | Dr. Michael D. Porter |
---|---|
Email: | mdp2u {at} virginia.edu |
Office: | TBD and Zoom |
Office Hours: | Mondays 1:45 - 2:45pm (and by appt.) |

TA | Hossein Kaviani |
---|---|
Email: | hk3sku {at} virginia.edu |
Office Hours: | TBD |
The course lectures will be delivered asynchronously. Recorded lectures will be posted in Canvas on Mondays and Wednesdays.
Students taking this course should have prior knowledge of linear regression analysis (e.g., SYS/STAT 4021/6021, STAT 5120), statistical inference (e.g., APMA 3120), and linear algebra (e.g., APMA 3080). Students should also have a basic working knowledge of a scientific programming language (e.g., R, Python, Matlab). The R-based SYS-2202 provides sufficient background. All course examples will be in R (tidyverse dialect).
Fundamentals of data mining and machine learning within a common statistical framework. Topics include regression, classification, clustering, resampling, regularization, tree-based methods, ensembles, boosting, and Support Vector Machines. Coursework is conducted in the R programming language.
Students will learn how and when to use common data mining and statistical learning methods, understand their comparative strengths and weaknesses, and learn how to critically evaluate their performance. Students completing this course should be able to: (i) construct and apply novel statistical learning methods for predictive modeling, (ii) use unsupervised learning methods to find structure in data, and (iii) properly select, tune, and evaluate models.
This course requires the use of the following statistical and typesetting software:
Quarto (https://quarto.org/docs/get-started/): a free technical publishing system that replaces RMarkdown. We will use Quarto documents for homework. Version 1.3.450 or higher is required.
Other course material and reading assignments will come from instructor notes and recent journal articles.
The free textbook Modern Data Science with R by Baumer, Kaplan, and Horton is an undergraduate-level “Intro to Data Science” text. It covers the tidyverse, statistical inference, and a basic introduction to many of the methods we will study this semester. It provides good overall preparation and a handy reference.
The free textbook Feature Engineering and Selection: A Practical Approach for Predictive Models by Kuhn and Johnson provides more in-depth coverage of feature engineering than we will be able to do in this course.
The free textbook Hands-on Machine Learning with R by Boehmke and Greenwell gives R code with some helpful details for most of the methods we will cover. This can be a handy reference.
The free textbook Interpretable Machine Learning by Christoph Molnar is subtitled “A Guide for Making Black Box Models Explainable” and covers topics such as feature importance and how to measure the influence of a feature on the predictions (e.g., Shapley values, partial dependence).
The free textbook Introduction to Modern Statistics by Mine Çetinkaya-Rundel and Johanna Hardin is an accessible introduction to modern (i.e., resampling based) statistical inference. If you feel you are still missing the big picture of statistical inference, this is a good place to start.
The free textbook Mathematics for Machine Learning by Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong is a good reference for the mathematical concepts helpful for machine learning. Chapters 1-7 provide a good foundation for this course.
The free textbook Forecasting: Principles and Practice 3e by Rob J Hyndman and George Athanasopoulos provides a great introduction to time series data and forecasting.
The course grade will be based on ten homework assignments (70%), reading quizzes (10%), and a final project (20%).
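The weighting above amounts to a weighted sum of the three component percentages. The sketch below illustrates that arithmetic only; the function name `course_percentage` and the example component values are hypothetical, and each component is assumed to be expressed as a fraction between 0 and 1.

```python
def course_percentage(homework, quizzes, project):
    """Weighted course percentage: 70% homework, 10% quizzes, 20% final project.

    Each argument is assumed to be a fraction between 0 and 1.
    """
    return 0.70 * homework + 0.10 * quizzes + 0.20 * project


# Hypothetical component percentages, for illustration only.
overall = course_percentage(homework=0.98, quizzes=0.90, project=0.85)
print(f"{100 * overall:.1f}%")
```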
A: >91%, A-: 90-91%, B+: 88-89%, B: 82-87%, B-: 80-81%, etc.
There is no grade “curving” in this course.
All homework assignment dates are posted in the Class Schedule (on the course website). Note these now so there are no conflicts.
All assignment submissions will be made through Canvas. You are given a grace period of 5 minutes for late submissions; the time stamps produced by Canvas will be the authoritative reference for all such decisions. If you have special circumstances (e.g., a documented physical condition) that prevent you from adhering to the posted deadlines, please inform me at least 1 week in advance of the deadline so that I can make arrangements to accommodate you.
Your homework percentage will be min(HW total, 475)/475, allowing you to effectively drop low-scoring submissions. Another way to view this policy is that receiving a 95% is full credit.
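As a rough illustration of how the cap works, the sketch below computes the homework percentage from a list of scores. The function name `homework_percentage` and the example scores are hypothetical. The figure of 50 points per assignment is an assumption inferred from the statement that 95% is full credit (475 / 0.95 = 500 points across ten assignments); it is not stated in the policy itself.

```python
def homework_percentage(scores, cap=475):
    """Homework percentage under the capped-total policy: min(HW total, cap) / cap."""
    total = sum(scores)
    return min(total, cap) / cap


# Hypothetical scores, assuming ten assignments worth 50 points each.
scores = [50, 48, 45, 50, 47, 50, 25, 50, 50, 50]  # total = 465
print(round(homework_percentage(scores), 4))

# A perfect record (500 points under the same assumption) is capped at full credit.
print(homework_percentage([50] * 10))
```

Because the total is capped rather than any single assignment, points lost on one submission can be absorbed by the 25-point margin between the cap and the assumed maximum.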
There will be around 24 pre-class reading quizzes (due before the start of class), each worth 1 point. Your quiz percentage will be min(Quiz Total, 20)/20.
The pre-class quizzes are to encourage you to prepare for the lectures.
Quizzes will be completed in Canvas/Quizzes.
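The quiz formula can be sketched the same way. The function name `quiz_percentage` and the example record are hypothetical, and the count of 24 quizzes is only the approximate figure given above.

```python
def quiz_percentage(quiz_points, cap=20):
    """Quiz percentage under the capped-total policy: min(Quiz Total, cap) / cap."""
    return min(sum(quiz_points), cap) / cap


# Hypothetical record: 24 one-point quizzes, 18 answered correctly.
quiz_points = [1] * 18 + [0] * 6
print(quiz_percentage(quiz_points))

# Earning 20 or more points reaches full credit.
print(quiz_percentage([1] * 24))
```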
The objective of the final project is to gain experience implementing a data mining / statistical learning pipeline. You will apply the concepts and methodologies we covered in class to extract actionable insights and knowledge from data.
You need to 1) find a problem or task you’d like to solve or understand; 2) find data; 3) use methods covered in this class, or related methods, to help you solve the problem.
Deliverables:
Students will work in teams of 2.
The deliverables will be in two parts: a recorded presentation and a written component.
Written component considerations
Presentation component considerations
I will be recording every lecture to accommodate students who will be learning remotely. Because lectures may include fellow students, you and they may be personally identifiable on the recordings. These recordings may only be used for the purpose of individual or group study with other students enrolled in this class during this semester. You may not distribute them in whole or in part through any other platform or to any persons outside of this class, nor may you make your own recordings of this class unless written permission has been obtained from the Instructor and all participants in the class have been informed that recording will occur. If you want additional details on this, please see Provost Policy 005.
Important dates for the semester can be found on the academic calendar: http://www.virginia.edu/registrar/calendar.html
I trust every student in this course to fully comply with all provisions of the University’s Honor Code and work together to maintain UVA’s Community of Trust. By enrolling in this course, you have agreed to abide by and uphold the Honor System of the University of Virginia, as well as the following policies specific to this course.
Please let me know if you have any questions regarding the course Honor policy. If you believe you may have committed an Honor Offense, you may wish to file a Conscientious Retraction by calling the Honor Offices at (434) 924-7602. For your retraction to be considered valid, it must, among other things, be filed with the Honor Committee before you are aware that the act in question has come under suspicion by anyone. More information can be found at http://honor.virginia.edu. Your Honor representatives can be found at: http://honor.virginia.edu/representatives.
Generative AI (GenAI), like ChatGPT, is a new disruptive technology that has the potential to fundamentally change how we learn, code, and do data science. However, there is little guidance on when and how to use GenAI for learning. As such, I don’t feel very confident in recommending or restricting its use. Therefore, there are no Generative AI restrictions in this course. However, be sure to follow the honor policy as stated above. You cannot copy code, and you must attribute and detail whether and how you used GenAI in the assignments.
GenAI tools can be an especially great resource for troubleshooting and improving code. However, they can also limit your ability to learn good coding if you become too dependent on them. I do not think GenAI is currently reliable enough to trust for conceptual understanding. I still recommend the assigned reading and the references found in the course notes as additional learning resources. If GenAI hallucinates in producing code, you will be able to see right away that it does not produce the desired result. However, if it hallucinates about how a model works or perpetuates common misconceptions about methodology, you may not know about it for a long time.
The University of Virginia strives to provide accessibility to all students. If you require an accommodation to fully access this course, please contact the Student Disability Access Center (SDAC) at (434) 243-5180 or sdac@virginia.edu. If you are unsure if you require an accommodation, or to learn more about their services, you may contact the SDAC at the number above or by visiting their website at http://studenthealth.virginia.edu/student-disability-access-center/faculty-staff.
The University of Virginia and SEAS serve as a safe space for students and aim to promote your well-being. If you are feeling overwhelmed, stressed, or isolated, there are many individuals here who are ready and wanting to help. If you wish, you can make an appointment with me to discuss in private. Alternatively, the Student Health Center offers Counseling and Psychological Services (CAPS) https://www.studenthealth.virginia.edu/caps. If you prefer to speak anonymously and confidentially over the phone, call Madison House’s HELP Line 24/7 at 434-295-8255 https://www.madisonhouse.org/overview-helpline/. Engineering undergraduates are supported through an array of student support services including peer-to-peer tutoring, professional academic coaching, access to mental health support, and dedicated advising. Graduate Engineering students can find similar student support resources. If you are in another school, you can contact the above Engineering resources and they will help direct you to the appropriate resources.
If you or someone you know is struggling with gender, sexual, or domestic violence, there are many community and University of Virginia resources available. The Office of the Dean of Students, Sexual Assault Resource Agency (SARA), and UVA Women’s Center are ready and eager to help. Contact the Director of Sexual and Domestic Violence Services at 434-982-2774.
The University of Virginia is dedicated to providing a safe and equitable learning environment for all students. To that end, it is vital that you know two values that I and the University hold as critically important:
If you or someone you know has been affected by power-based personal violence, more information can be found on the UVA Sexual Violence website that describes reporting options and resources available <www.virginia.edu/sexualviolence>. As your professor and as a person, know that I care about you and your well-being and stand ready to provide support and resources as I can. As a faculty member, I am a responsible employee, which means that I am required by University policy and federal law to report what you tell me to the University’s Title IX Coordinator. The Title IX Coordinator’s job is to ensure that the reporting student receives the resources and support that they need, while also reviewing the information presented to determine whether further action is necessary to ensure survivor safety and the safety of the University community. If you wish to report something that you have seen, you can do so at the Just Report It portal. The worst possible situation would be for you or your friend to remain silent when there are so many here willing and able to help.
Students who wish to request academic accommodation for a religious observance should submit their request to me by email as far in advance as possible. If you have questions or concerns about your request, you can contact the University’s Office for Equal Opportunity and Civil Rights (EOCR) https://eocr.virginia.edu/accommodations-religious-observance. Accommodations do not relieve you of the responsibility for completion of any part of the coursework you miss as the result of a religious observance.