The Liem Sioe Liong/First Pacific Company Professor of Statistics
Room: 471 JMHH
Job Opening: The Department has a faculty position to fill at any level, starting July 2010.
Office: (215) 898-8222 (Leave a note with the administrator.)
Email: Click here for an image of the address. (The address 'buja@wharton...' is obsolete.)
Curriculum vitae:
[.pdf]
(a more fun alternative from
the 2004/5 MBA guide)
Slides as of June 10, 2009.
Rapplet to create an animated display of proper scoring rules.
Either download the file and source it in R, or source it directly in one step:
source("http://stat.wharton.upenn.edu/~buja/proper-scoring-rapplet.R")
Then drag the mouse in the R plot window, or press 'h' for help.
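As a small illustration of what "proper" means (a minimal sketch, not the Rapplet's code): for a binary outcome with P(Y = 1) = p, a scoring rule is proper when its expected score is optimized by the honest forecast q = p. The sketch below checks this numerically for two standard examples, the Brier (squared-error) score and the log score, both written as losses to be minimized. The function names and the value p = 0.3 are hypothetical choices for illustration.

```r
# Minimal sketch (not taken from the Rapplet): numerically check that the
# Brier score and the log score are proper, i.e. that their expected loss
# under a Bernoulli(p) outcome is minimized by the honest forecast q = p.

# Expected Brier (squared-error) loss of forecast q when P(Y = 1) = p
expected_brier <- function(q, p) p * (1 - q)^2 + (1 - p) * q^2

# Expected log loss (negative log-likelihood) of forecast q
expected_log <- function(q, p) -(p * log(q) + (1 - p) * log(1 - q))

p <- 0.3                         # hypothetical true probability
q <- seq(0.01, 0.99, by = 0.01)  # grid of candidate forecasts

# Grid forecasts with the smallest expected loss under each rule
q_brier <- q[which.min(expected_brier(q, p))]
q_log   <- q[which.min(expected_log(q, p))]

cat("Brier minimizer:", q_brier, " log-loss minimizer:", q_log, "\n")
```

Both minimizers land on the grid point nearest p. For the Brier score this can be seen directly: the expected loss simplifies to (q - p)^2 + p(1 - p), which is smallest at q = p.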
R script and text for regression trees
R script and text for principal component analysis
R script and text for k-means clustering
R script and text for interactive R programming
Here is the preliminary syllabus,
and here is the webCafe site for this class
(go to "STAT 101" and your section).
The required textbook and software will be the same as in Spring 2008.
Syllabus.
Here are the modules in case you lost access to webCafe.
These notes are only of historic interest.
We now use Foster and Stine's textbook instead.
See the class web page.
A related topic is model checking with the parametric bootstrap:
I raised this back in 2004 in a discussion of a paper by Andrew Gelman,
who does the same with a posterior predictive approach.
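The basic recipe of a parametric bootstrap model check can be sketched in a few lines of R. This is a minimal illustration, not the procedure from the discussion: fit the model, simulate many datasets from the fitted model, refit to each, and ask whether a discrepancy statistic computed on the real data looks typical of the simulated ones. The Gaussian linear model, the skewness-of-residuals statistic, the simulated data, and B = 500 are all hypothetical choices made for illustration.

```r
# Minimal sketch of model checking with a parametric bootstrap.
# Hypothetical setup for illustration: a Gaussian simple linear regression
# is fitted to data whose errors are actually skewed, and the discrepancy
# statistic is the sample skewness of the residuals.

set.seed(1)

# Simulated data with centered exponential (skewed) errors
n <- 100
x <- runif(n)
y <- 1 + 2 * x + (rexp(n) - 1)

# Sample skewness of a numeric vector
skewness <- function(r) {
  d <- r - mean(r)
  mean(d^3) / mean(d^2)^1.5
}

# Step 1: fit the model and compute the statistic on the observed data
fit <- lm(y ~ x)
t_obs <- skewness(residuals(fit))

# Step 2: simulate B datasets from the fitted Gaussian model, refit each,
# and recompute the statistic
B <- 500
sigma_hat <- summary(fit)$sigma
t_sim <- replicate(B, {
  y_star <- fitted(fit) + rnorm(n, sd = sigma_hat)
  skewness(residuals(lm(y_star ~ x)))
})

# Step 3: Monte Carlo p-value for an |skewness| at least as extreme as observed
p_value <- (1 + sum(abs(t_sim) >= abs(t_obs))) / (B + 1)
cat("observed skewness:", round(t_obs, 2), " p-value:", round(p_value, 4), "\n")
```

With skewed errors the observed statistic should typically fall far in the tail of the simulated distribution, so the check flags the Gaussian assumption. A posterior predictive check differs mainly in drawing the parameters from the posterior for each simulated dataset instead of fixing them at the fitted values.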
A preliminary version and a companion paper, which I keep posted because others have
started referring to them:
Here is a precursor talk, joint with Di Cook, given at the 1999 Joint Statistical Meetings
[.pdf].
Here are
Gelman's JCGS article,
followed by my discussion,
and his rejoinder.
An older version that had both papers in one should be considered out of date.
Appeared in "Handbook of Statistics" (eds. E. Wegman, C. R. Rao; 2005).
Quasi-Darwinian Selection in Marketing Relationships
[.pdf]
Journal of Marketing, Oct 2007;
the subject of a featured JM blog article
and a finalist for JM's 2007 Harold H. Maynard Award.
Accompanying the paper are a few scenario calculations not included in the article:
[.pdf]
Loss Functions for Binary Class Probability Estimation: Structure and Applications. (Former title: Degrees of Boosting)
[.pdf] (under revision)
Yi Shen's 2005 Ph.D. thesis on cost-weighted class probability estimation
[.pdf]
Cost-Weighted Boosting with Jittering and Over/Under-Sampling: JOUS-Boost
[.pdf]
(Journal of Machine Learning Research 8 (Mar), 409-439, 2007)
Observations on Bagging (Statistica Sinica, Special Issue on Machine Learning, 16 (2), 323-352, 2006)
[.pdf]
The Effect of Bagging on Variance, Bias, and Mean Squared Error
[.pdf]
PPT slides,
Smoothing Effects of Bagging
[.pdf]
Calibration for Simultaneity: (Re)Sampling Methods for Simultaneous Inference with
Applications to Function Estimation and Functional Data
[.pdf, 1.7MB] (under revision)
Alan Gous and Andreas Buja;
Journal of Computational and Graphical Statistics, 13 (1), 1-19 (2004).
(We are permitted to post the color version of this paper. The printed version is
in black and white, with grayscale figures.)
Data Mining Criteria for Tree-Based Regression and Classification
[.ps.gz]
A. Buja and Y.-S. Lee; Proceedings of KDD 2001, 27-36.