Statistics 471/571/701 Modern Data Mining, Spring 2019

Course description

Statistics has been evolving rapidly to keep up with the modern world, especially through computational methods for handling the explosion of data. Because exploratory data analysis (EDA) is a significant part of data science, we start the class with it. We then show how to build, interpret, and adapt simple models, and go beyond them with contemporary methods and techniques for handling large and complex data, with applications in finance, marketing, medicine, social science, entertainment, and more. While this course makes extensive use of the statistical programming language R, no programming experience is required. By the end of the semester we hope that students will have not only learned modern statistical methods but also become skilled in dealing with data of essentially any size. This class is cross-listed as STAT 471 for undergraduates, STAT 571 as a graduate-level course for students outside the statistics department, and STAT 701 for MBA students.

Course materials are available on Canvas.

Topics

Exploratory Data Analysis (EDA)

Multiple Regression/Stepwise Regression

Logistic Regression/Multinomial Regression

K-nearest neighbors (KNN)

Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA)

Penalized regression: LASSO, Ridge Regression, Elastic Net

Principal Components Analysis (PCA)

Tree-based methods such as Boosting and Random Forest

Support Vector Machines

Text mining and sentiment analysis

Bootstrap and k-fold cross-validation

Training and Testing errors

ROC/AUC and FDR

Neural networks/Deep learning

Network models

Unsupervised learning