Statistics has been evolving rapidly to keep up with the modern world, especially through computational methods for handling the explosion of data. Because exploratory data analysis (EDA) is a significant part of data science, we start the class with it. We then show how to build, interpret, and adapt simple models, and go on to contemporary methods and techniques for handling large and complex data, with applications in finance, marketing, medicine, social science, entertainment, and more. While this course makes extensive use of the statistical programming language R, no programming experience is required. By the end of the semester we hope that students will have not only learned modern statistical methods but also become skilled in dealing with data of essentially any size. This class is cross-listed as STAT 471 for undergraduates, STAT 571 as a graduate-level course for students outside the statistics department, and STAT 701 for MBAs.
Course materials are available on Canvas.
Exploratory Data Analysis (EDA)
Multiple Regression/Stepwise Regression
Logistic Regression/Multinomial Regression
K-nearest neighbors (KNN)
Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA)
Penalized regression: LASSO, Ridge Regression, Elastic Net
Principal Components Analysis (PCA)
Tree-based methods such as Boosting, Random Forest
Support Vector Machines
Text mining and sentiment analysis
Bootstrap and k-fold cross-validation
Training and Testing errors
ROC/AUC and FDR
Neural networks/Deep learning
Network models
Unsupervised learning