The Liem Sioe Liong/First Pacific Company Professor of Statistics
Room: 471 JMHH
Job Opening: The Department has a faculty position to fill at any level, starting July 2009.
Office: (215) 898-8236, Fax: (215) 898-1280, Dept: (215) 898-8222
Email: click here for an image of the address
Curriculum vitae:
[.pdf]
(a more fun alternative from
the 2004/5 MBA guide)
STATISTICS 541 has switched from spring to fall.
Slides as of June 10, 2009.
Rapplet to create an animated display of proper scoring rules.
Either download the file and source it in R, or do this in one swoop:
source("http://stat.wharton.upenn.edu/~buja/proper-scoring-rapplet.R")
Then drag the mouse on the R plot window or hit 'h' for Help.
R script and text for regression trees
R script and text for principal component analysis
R script and text for k-means clustering
R script and text for interactive R programming
Here is the preliminary Syllabus,
and here is the webCafe site for this class
(go to "STAT 101" and your section).
The required textbook and software will be the same as in Spring 2008.
This amounts to a switch with STATISTICS 540 (Statistical Computing)
which is now taught in spring as STATISTICS 542.
Syllabus.
Here are the modules in case you lost access to webCafe.
These notes are of historical interest only.
We now use Foster and Stine's textbook instead.
See the class web page.
A preliminary version and a companion paper which I keep posted because others have
started referring to them:
Here is a precursor talk given at the Joint Statistical Meetings 1999, with Di Cook
[.pdf].
The reason for posting this now is that a few of us in the department are currently discussing
model diagnostics.
Quasi-Darwinian Selection in Marketing Relationships
[.pdf]
Journal of Marketing, Oct 2007,
featured JM blog article
and a finalist for JM's 2007 Harold H. Maynard Award.
Along with the paper go a few scenario calculations that are not included in the article:
[.pdf]
Loss Functions for Binary Class Probability Estimation: Structure and Applications. (Former title: Degrees of Boosting)
[.pdf] (under revision)
Yi Shen's 2005 Ph.D. thesis on cost-weighted class probability estimation
[pdf]
Cost-Weighted Boosting with Jittering and Over/Under-Sampling: JOUS-Boost
[pdf]
(Journal of Machine Learning Research 8 (Mar), 409-439, 2007)
Observations on Bagging (Statistica Sinica, Special Issue on Machine Learning, 16 (2), 323-352, 2006)
[pdf]
The Effect of Bagging on Variance, Bias, and Mean Squared Error
[.pdf]
PPT slides
Smoothing Effects of Bagging
[.pdf]
Calibration for Simultaneity: (Re)Sampling Methods for Simultaneous Inference with
Applications to Function Estimation and Functional Data
[.pdf, 1.7MB] (under revision)
An older version that had both papers in one should be considered out of date.
Appeared in "Handbook of Statistics" (eds. E. Wegman, C. R. Rao; 2005).
Alan Gous and Andreas Buja;
Journal of Computational and Graphical Statistics, 13 (1), 1-19 (2004).
(We are permitted to post the color version of this paper. The printed version is
black and white with gray-scale figures.)
Data Mining Criteria for Tree-Based Regression and Classification
[.ps.gz]
A. Buja and Y.-S. Lee; Proceedings of KDD 2001, 27-36.