- Streaming Feature Selection
- Streaming feature selection evaluates potential explanatory
features for a regression-type model one at a time rather than
all at once. This approach enables faster selection methods (such
as VIF regression) and avoids the need to precompute every
possible predictor at the start of modeling. Building variables
on the fly is essential in database modeling and some types of
image processing.
Auctions allow a blending of substantive insight with automatic searches
for predictive features. I'll put a paper here one of these days that describes
the auction process more completely, but in the meantime, see these
slides from a recent talk.
The papers that are here are ingredients needed for the auction.
- Foster, D. P. and Stine, R. A. (2013).
Risk inflation of sequential tests controlled by alpha investing
- This paper (submitted for publication) develops a
computational method for finding the exact risk inflation of
the estimator implied by a testing process that is controlled
by alpha investing. The resulting feasible sets display all
possible risks of one estimation procedure relative to
another.
- Foster, D. P. and Stine, R. A. (2008, JRSS-B).
Alpha-investing: sequential control of expected false discoveries
- This paper describes a
procedure for testing multiple hypotheses while controlling the number of
false discoveries. The key distinguishing attributes
are that (a) it handles a stream of hypotheses, so you don't need all the
p-values at once, and (b) it allows an investigator to formally incorporate
domain knowledge into the testing procedure. We have used a variation of this
procedure to pick predictors in the auction; a small sketch of the
alpha-wealth bookkeeping appears below.
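For intuition, here is a minimal sketch of that bookkeeping in R. The bidding rule and payout below are illustrative choices, not the paper's exact specification or its guarantees.

```r
# Minimal sketch of alpha-investing (illustrative bidding rule and payout;
# see the paper for the exact specification and its properties).
alpha_invest <- function(p_values, w0 = 0.05, payout = 0.05) {
  wealth  <- w0                                # initial alpha-wealth
  rejects <- logical(length(p_values))
  for (j in seq_along(p_values)) {
    if (wealth <= 0) break                     # out of wealth: stop testing
    alpha_j <- wealth / (1 + j)                # bid a fraction of current wealth
    if (p_values[j] <= alpha_j) {              # a discovery ...
      rejects[j] <- TRUE
      wealth <- wealth + payout                # ... earns back some wealth
    } else {
      wealth <- wealth - alpha_j / (1 - alpha_j)  # a failed test costs wealth
    }
  }
  rejects
}

# Example: a stream of p-values with a little signal up front
set.seed(1)
p <- c(runif(5, 0, 0.001), runif(50))
which(alpha_invest(p))
```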
- Text Mining, Computational Linguistics
- Data sets that are well matched to regression often come
with supplemental textual information, such as written comments,
open-ended responses, and annotations. Some data sets come with
nothing but text. Generating regressors from these can lead to
more predictive models. There are also
slides from a recent talk.
- Foster, D. P., Liberman, M. and Stine, R. A. (2013).
Featurizing text: Converting text into predictors for
regression analysis
- This draft manuscript (really more of a working paper)
describes fast methods for the construction of numerical
regressors from text using spectral methods related to the
singular value decomposition (SVD). An example uses these
methods to build regression models for the price of Chicago
real estate using nothing but the text of a property listing.
Topic models (LDA) provide some explanation for why these
methods work as well as they do. For example, our model for
real estate explains some 70% of the variation in prices using
just the text of the listing, with no attempt to use location
or related demographics. A rough sketch of the SVD construction appears below.
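As a rough illustration of the idea (not the paper's construction, which is more elaborate), the R sketch below builds a tiny document-term matrix and uses the leading singular vectors as numeric regressors; the listings and the choice of two components are made up.

```r
# Toy spectral featurization: document-term counts -> SVD -> a few regressors.
texts <- c("charming brick colonial near park",
           "sunny two bedroom condo with new kitchen",
           "spacious ranch with large yard near park")
words <- strsplit(tolower(texts), "\\s+")
vocab <- sort(unique(unlist(words)))
dtm   <- t(sapply(words, function(w) table(factor(w, levels = vocab))))
sv    <- svd(scale(dtm, scale = FALSE))          # center, then decompose
X     <- sv$u[, 1:2] %*% diag(sv$d[1:2])         # leading spectral features
X   # these columns could now enter a regression such as lm(price ~ X)
```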
- Statistics in Finance
- It can be very hard to separate luck from skill when it comes time
to evaluate the success of investors. We use a dice simulation described
in the following paper to illustrate this point to students, as well as
to show them the role of portfolios in reducing the variance of an investment.
I'll soon put another paper here that offers one approach to making the
distinction, but it's not ready yet. Here are the
slides from a recent talk.
- Foster, D. P. and Stine, R. A. (2005).
Being Warren Buffett: A classroom simulation of risk and wealth when
investing in the stock market
- This paper describes the dice simulation, including the notion
of volatility drag as a penalty for variation. This
form can be used in
class to organize the simulation.
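Here is a toy R version of the exercise; the multipliers below are illustrative and are not the values printed on the classroom dice.

```r
# Toy illustration of volatility drag with two made-up "dice" of gross returns.
set.seed(1)
calm <- c(0.95, 1.00, 1.05, 1.05, 1.10, 1.15)  # low-variance investment
wild <- c(0.10, 0.50, 1.00, 1.20, 2.00, 3.00)  # high-variance investment
c(mean(calm), mean(wild))        # the wild die has the larger average multiplier

n_years <- 100
roll <- function(die) prod(sample(die, n_years, replace = TRUE))  # wealth from $1
c(calm = median(replicate(1000, roll(calm))),
  wild = median(replicate(1000, roll(wild))))
# Despite its larger mean, the wild die's typical terminal wealth collapses:
# the growth rate is roughly the mean return minus half the variance.
```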
- Foster, D. P. and Stine, R. A. (2005).
Finding Warren Buffett: Separating knowledge from luck in investments.
- This paper will detail our approach using Bennett's inequality and
Bonferroni to identify investments that do better than chance. The
slides from a talk
summarize the approach.
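For reference, the standard statement of Bennett's inequality is below; the paper's application to sequences of fund returns, combined with a Bonferroni adjustment across the funds tested, is of course more involved.

```latex
% Bennett's inequality: X_1,\dots,X_n independent, E X_i = 0, |X_i| \le c,
% and v = \sum_i \mathrm{Var}(X_i). Then for t \ge 0,
P\!\Big(\sum_{i=1}^n X_i \ge t\Big)
  \;\le\; \exp\!\Big(-\frac{v}{c^2}\, h\!\Big(\frac{ct}{v}\Big)\Big),
\qquad h(u) = (1+u)\log(1+u) - u .
```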
- Foster, D.P., Stine, R.A., and Young, P (2011).
A Markov test for alpha
- This revised version of a previous manuscript avoids being fooled by
a clever manager, a trick that slips by when one uses a maximal approach. The paper
introduces a simple-to-perform test called the compound alpha test (CAT).
The test has good power (it is tight for sequences of returns generated by
"trickery") and is robust to numerous assumptions. The paper includes several illustrative
examples using recent stock returns.
In a different vein, the following papers consider models for the
forward curve or yield curve. The models decompose the forward
curve into several components that isolate different aspects of
the evolution of the curve over time.
- Chua, C. T., Foster, D. P., Ramaswamy, K., and Stine, R. A. (2007).
A dynamic model for the forward curve.
Review of Financial Studies, 21, 265-310.
- This paper proposes an arbitrage-free model for the
temporal evolution of the forward curve. The paper includes a
discussion of model estimation (using a Kalman filter),
examples fit to Treasury data, and comparison to alternatives.
- Chua, C. T., Ramaswamy, K., and Stine, R. A. (2008).
Predicting short-term Eurodollar futures.
Journal of Fixed Income, to appear.
- This manuscript adapts the methods used in the prior work for
Treasuries to practical aspects of modeling Eurodollar futures.
- Pooling Subjective Intervals
- A work in progress that concerns the use of subjective confidence intervals for making business decisions. A current interactive tool based on 50% intervals is available by following this link.
- Information Theory and Model Selection
- Information theory (coding, in particular) provides motivation
for the various types of model selection criteria in wide use today
(e.g., AIC, BIC). It also leads to generalizations of these methods
which allow comparison, for example, of non-nested models. These methods
can also be used for 'feature selection' in data mining.
The use of information theory in model selection is not new. The
AIC (Akaike information criterion) originated as an unbiased
estimate of the relative entropy, a key notion in comparing the
lengths of codes. More closely tied to coding are MDL and
stochastic complexity, which were proposed by Rissanen.
MDL (minimum description length) is typically used in an asymptotic
form which assumes many observations (large n) and fixed parameters.
In this setting, MDL agrees with BIC, the large sample approximation to
Bayes factors. Both penalize the likelihood by (1/2) log n for each
added parameter.
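In symbols (these are the standard forms of the criteria), with L_k the maximized likelihood of a model having k estimated parameters and n observations:

```latex
\mathrm{AIC}(k) = -2\log L_k + 2k,
\qquad
\mathrm{BIC}(k) = -2\log L_k + k\,\log n .
```

On the log-likelihood scale, each added parameter thus costs (1/2) log n under BIC (and under MDL in its usual asymptotic form), versus a constant 1 under AIC.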
- Foster, D. P. and Stine, R. A. (2005).
Polyshrink: An adaptive variable selection procedure
that is competitive with Bayes experts.
Submitted for publication.
- This revised manuscript considers the following competitive analysis
of the variable selection problem in regression.
Suppose you are trying to fit a regression model and do not
know which predictors to include. You decide to
use the data to pick the predictors using a selection
method. How well can you do this? For example, can you
fit a model as well as someone else who knows the
distribution of the regression coefficients? This paper gives
a lower bound for how well your rival can do, and provides a
method that we call "Polyshrink" that approaches its performance.
This R package
implements the Polyshrink estimator described in the paper.
- Stine, R. A. (2004).
Model selection using information theory and the MDL principle.
Sociological Methods & Research, 33, 230-260.
- This overview designed for the social sciences shows how information
theory expands the scope of model selection to encompass the
role of substantive theory. It also shows how
one can compare non-nested models as well as models of rather
different form. Examples illustrate the calculations, considering
several models for Harrison and Rubinfeld's Boston housing data.
The paper also introduces a variation on Foster and George's RIC
that allows for searches that follow the principle of marginality.
- Foster, D. P. and Stine, R. A. (2004).
Variable selection in data
mining: building a predictive model for bankruptcy.
J. Amer. Statistical Assoc., 99, 303-313.
- This revision of a prior manuscript describes an
application of variable selection in the context of
predicting bankruptcy. The central theme is the attempt
to find a selection criterion that picks the right number
of variables, where "right" in this context means that it
identifies the model that minimizes the out-of-sample
error --- without having to set aside a hold-back sample
(a toy version of such a stepwise search appears at the end of this entry).
The problem is hard in this example because we consider
models with more than 67,000 predictors.
The prior
manuscript is here as well, but is missing the
figures and one or two references. The new version
differs from the prior manuscript in many ways. For
example, we no longer use subsampling, we use binomial
variances, and we have included a 5-fold cross-validation
that compares the predictions of stepwise to those of the
tree-based classifiers C4.5 and C5.0.
For the truly adventurous, a compressed
tar file has all of the
source code used for fitting the big models in this paper
(written in C and a bit of C++). To build the program,
you need a unix system with gcc, but the build is pretty
automatic (that is, if you have done this sort of thing
-- see the README file). The software is distributed
under the GNU General Public License (GPL). You can get a "sanitized"
portion of the data in this
compressed tar file (Be patient... the file is a bit more than 24 MB.)
The data layout follows the format needed by C4.5. Each file represents
a fold of 100,000 cases. Further instructions are at the top of the names file.
To see a collection of papers that consider credit modeling more
generally, go to the
Wharton Financial Institutions Center
for proceedings from the Credit Risk Modeling and Decisioning conference
which was held here at Wharton, May 29-30, 2002.
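The toy R example promised above shows the flavor of such a search on simulated data. It uses forward stepwise with a BIC-style penalty (k = log n); the paper's adaptive criterion is different, so treat this only as an illustration of selecting without a hold-back sample.

```r
# Forward stepwise on simulated data with many candidate predictors,
# using a BIC-style penalty rather than a hold-back sample.
set.seed(1)
n <- 200; p <- 50
X <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("x", 1:p)))
y <- 2 * X[, 1] - X[, 2] + rnorm(n)          # only x1 and x2 matter
dat   <- data.frame(y, X)
upper <- as.formula(paste("~", paste(colnames(X), collapse = " + ")))
fit <- step(lm(y ~ 1, data = dat), scope = upper,
            direction = "forward", k = log(n), trace = 0)
names(coef(fit))                             # which predictors were picked
```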
- Foster, D. P., Stine, R. A., and Wyner, A. D. (2002).
Universal codes for finite sequences of integers drawn
from a monotone distribution.
IEEE Trans. on Information Theory, 48, 1713-1720.
- We show that you can compress a data series almost as well
as if you knew the source distribution. The bounds on performance
that we obtain are not asymptotic, but apply for all sequence
lengths.
- Stine, R. A. and Foster, D. P. (2001).
The competitive complexity ratio
Proceedings of 2001 Conf on Info Sci and Sys, WP8 1-6.
- Stochastic complexity is a powerful concept, but its use
requires that the associated model have bounded
integrated Fisher information. Some models, like that
for a Bernoulli probability, satisfy this condition, but
others do not. In particular, the normal location model
or regression model do not have bounded information.
This leads one to bound the parameter space for the
model in some fashion, and then compute how this bound
affects a code length and the comparison of models.
- Foster, D. P. and Stine, R. A. (1999).
Local asymptotic coding
IEEE Trans. on Information Theory, 45, 1289-1293.
- Dean Foster and I show that the usual asymptotic
characterization of MDL (i.e., (1/2) log n) is not
uniform. It fails to hold near the crucial value of
zero. Near zero, the MDL criterion leads to an
"AIC-like" criterion.
- Foster, D. P. and Stine, R. A. (2006).
Honest confidence intervals for the error variance in
stepwise regression
To appear, at long last.
- This paper describes the impact of variable selection on
the confidence interval for the prediction error variance
of a stepwise regression. When you pick a model by
selecting from many factors, you need to widen the
interval for s^2 to account for selection bias.
The problem is particularly acute when one has more
predictors (features) than observations, as often occurs
in data mining; a small simulation of the bias appears
at the end of this entry. This
text file
has the monthly stock returns used in the example.
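The small simulation promised above illustrates the bias in R: keep only the best of many pure-noise predictors and the usual residual variance estimate comes out too small, so a naive interval for s^2 is centered in the wrong place. The sample sizes here are arbitrary.

```r
# Selection bias in the residual variance: pick the best of many null
# predictors and the usual estimate of sigma^2 is biased downward.
set.seed(1)
n <- 50; p <- 200; reps <- 200
s2_selected <- replicate(reps, {
  y <- rnorm(n)                          # pure noise, true variance 1
  X <- matrix(rnorm(n * p), n, p)
  best <- which.max(abs(cor(X, y)))      # strongest-looking candidate
  sum(resid(lm(y ~ X[, best]))^2) / (n - 2)
})
mean(s2_selected)                        # noticeably below the true value of 1
```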
- Introductory Lectures
- An introductory sequence of lecture notes on methods of
model selection (from a tutorial I've given) is also
available. The emphasis is to build ideas needed to
support the information theory point of view, so the
coverage of some areas (like AIC) is less comprehensive.
- Overview
- Predictive risk
- Bayesian criteria
- Introduction to information theory and coding
- Information theory and model selection
- Hidden Markov Models (HMM)
- One paper deals with the problem of estimating the arrival rate
and holding distribution of a queue. What makes it hard is that
you do not get to see the arrivals, just the number in the queue.
Fortunately, the covariances characterize both the arrival rate
and holding distribution. With some approximations that give the
queue a Markovian form, one can use dynamic programming to compute
a likelihood (via a hidden Markov model); a sketch of that recursion
appears at the end of this section.
A paper co-authored with J. Pickands appears in
Biometrika, 84, 295-308.
A second paper considers two issues: the covariance structure
of HMMs (including multivariate HMMs) and the use of model selection
methods based on these covariances to find the order of the model
(that is, the dimension of the underlying Markov chain). The idea
is to exploit the connection between the order of the HMM and the
implied ARMA structure of the covariances.
Autocovariance structure of Markov regime switching models and model
selection, written with Jing Zhang (who did the hard parts),
is to appear in Journal of Time Series Analysis.
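The sketch referred to above shows the dynamic-programming (forward) recursion that produces an HMM log-likelihood in R. The two-state transition matrix, emission probabilities, and observation sequence are toy values, not the queueing model of the paper.

```r
# Scaled forward recursion for an HMM log-likelihood.
forward_loglik <- function(obs, init, trans, emit) {
  # obs: observed symbols (indices into the columns of emit)
  # init: initial state distribution; trans: state transition matrix
  # emit: states-by-symbols matrix of emission probabilities
  alpha  <- init * emit[, obs[1]]
  loglik <- log(sum(alpha)); alpha <- alpha / sum(alpha)
  for (t in seq_along(obs)[-1]) {
    alpha  <- as.vector(alpha %*% trans) * emit[, obs[t]]
    loglik <- loglik + log(sum(alpha))
    alpha  <- alpha / sum(alpha)               # rescale to avoid underflow
  }
  loglik
}

# Toy example: two hidden states, three observable symbols
trans <- matrix(c(0.9, 0.1,
                  0.2, 0.8), nrow = 2, byrow = TRUE)
emit  <- matrix(c(0.6, 0.3, 0.1,
                  0.1, 0.3, 0.6), nrow = 2, byrow = TRUE)
forward_loglik(obs = c(1, 1, 2, 3, 3, 2), init = c(0.5, 0.5),
               trans = trans, emit = emit)
```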
- Statistical Computing Environments for Social Research
- This collection of papers (published by Sage in 1996 and co-edited
by myself and John Fox) describes and contrasts
seven programming environments for doing statistical computing:
- Data analysis using APL2 and APL2STAT
John Fox and Michael Friendly
- Data analysis using Gauss and Markov
J. Scott Long and Brian Noss
- Data analysis using Lisp-Stat
Luke Tierney
- Data analysis using Mathematica
Bob Stine (me)
- Data analysis using SAS
Charles Hallahan
- Data analysis using Stata
Lawrence Hamilton and Joe Hilbe
- Data analysis using S-plus
Dan Schulman, Alec Campbell, and Eric Kostello
- AXIS: an extensible graphical user interface for statistics
Bob Stine (me, again)
- The R-code: a graphical paradigm for regression analysis
Sandy Weisberg
- ViSta: a visual statistics system
Forrest Young and Carla Bann
Each of these environments is programmable, with a flexible data model and
extensible command set.
Examples for each show how to do kernel density estimation, robust regression,
and bootstrap resampling. Additional articles focus on three extensions of
Lisp-Stat.
- Graphical interpretation of a variance inflation factor,
- The American Statistician, 49, Feb 1995.
This paper illustrates the use of an
interactive plotting tool
implemented in Lisp-Stat to reveal the simple
relationship among partial regression plots, component plots,
and the variance inflation factor. The data sets from the paper are
in the files
fighters.dat and
wildcats.dat.
The basic idea is that the
ratio of t-statistics associated with these two plots is the
square root of the associated VIF; a small numerical check of
this identity appears below. The interactive plot shows
how collinearity affects the relationship presented in regression
diagnostic plots. One uses a slider to control how much of the
present collinearity appears in the plot.
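The numerical check mentioned above can be done in a few lines of R on made-up collinear data, taking the component plot to be the partial residual (component-plus-residual) plot; the interactive Lisp-Stat tool itself is not reproduced here.

```r
# Fit the two diagnostic plots' regressions for x1 and compare the ratio of
# their t-statistics with the square root of the VIF.
set.seed(1)
n  <- 100
x2 <- rnorm(n)
x1 <- 0.8 * x2 + rnorm(n, sd = 0.6)            # x1 is collinear with x2
y  <- 1 + 2 * x1 - x2 + rnorm(n)
full <- lm(y ~ x1 + x2)

# Added-variable (partial regression) plot: residuals of y on x2 vs residuals of x1 on x2
av <- lm(resid(lm(y ~ x2)) ~ resid(lm(x1 ~ x2)))
# Component (partial residual) plot: full-model residual + b1 * x1 versus x1
cr <- lm(I(resid(full) + coef(full)["x1"] * x1) ~ x1)

t_av  <- summary(av)$coefficients[2, "t value"]
t_cr  <- summary(cr)$coefficients[2, "t value"]
vif_1 <- 1 / (1 - summary(lm(x1 ~ x2))$r.squared)
c(t_ratio = t_cr / t_av, sqrt_vif = sqrt(vif_1))   # these two should agree
```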
- Explaining normal quantile plots through animation,
- To appear, The American Statistician, 2016.
This manuscript
characterizes quantile-quantile plots as a comparison
between water levels in two vases that are gradually
filled with water. Imagine water filling two vases,
each able to hold a liter of water. Assume the water
fills the vases at the same rate. If the vases have
the same shape, then the water levels will match. A
graph of the water level in one versus the water level
in the other would then trace out a diagonal line.
That's also the idea of these animated QQ plots. A
parametric plot of the water levels in gradually
filling vases shaped as probability distributions
motivates quantile plots as used in statistics.
The R package qqvases
implements this construction. The software shows
an animation of the process, allowing you to choose
different distributions. The plots are nicer with
smooth populations, but you can show similar
figures with samples; these don't look so good unless
the sample sizes are fairly large. (A bare-bones version of
the parametric plot appears below.) The implementation
requires installing R and shiny on your system (not to
mention, knowing R). You can also try the procedure by
following
this
link (thanks to Dean Foster for figuring out how
to get Shiny running) to see an on-line version of the
software in your browser (avoiding the need to install
R and shiny on your own system).
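A bare-bones, static version of the parametric construction takes only a few lines of base R; the qqvases package adds the animation and the vase shapes.

```r
# Parametric plot of two quantile functions at common "fill levels".
p <- seq(0.01, 0.99, by = 0.01)
plot(qnorm(p), qt(p, df = 3), type = "l",
     xlab = "Normal quantiles", ylab = "t(3) quantiles",
     main = "Two distributions compared at the same fill levels")
abline(0, 1, lty = 2)   # identical shapes would follow this diagonal
```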