Statistics 471/571/701 Modern Data Mining, Spring 2019

Course description

Statistics has been evolving rapidly to keep up with the modern world, especially with computational methods for the explosion of data. Because exploratory data analysis (EDA) is a significant part of data science, we start the class with it. We then show how to build, interpret, and adapt simple models, and go beyond them to contemporary methods and techniques for handling large and complex data, with applications in finance, marketing, medicine, social science, entertainment, and more. While this course makes extensive use of the statistical programming language R, no programming experience is required. By the end of the semester we hope that students will have not only learned modern statistical methods but also become skilled in dealing with data of essentially any size. This class is cross-listed as STAT 471 for undergraduates, STAT 571 as a graduate-level course for students outside the statistics department, and STAT 701 for MBAs.

Course materials are available on Canvas.


Topics

Exploratory Data Analysis (EDA)
Multiple Regression/Stepwise Regression
Logistic Regression/Multinomial Regression
K-nearest neighbors (KNN)
Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA)
Penalized regression: LASSO, Ridge Regression, Elastic Net
Principal Components Analysis (PCA)
Tree-based methods such as Boosting and Random Forest
Support Vector Machines (SVM)
Text mining and sentiment analysis
Bootstrap and k-fold cross validation
Training and Testing errors
ROC/AUC and FDR
Neural network/Deep learning
Network model
Unsupervised learning
