Andreas Buja's Home Page
The Liem Sioe Liong/First Pacific Company Professor of Statistics
Department of Statistics
The Wharton School
University of Pennsylvania
Philadelphia, PA 19104-6340
Room: 471 JMHH
Office: (215) 898-8236, Fax: (215) 898-1280, Dept: (215) 898-8222
Email: click here for an image of the address
Curriculum vitae:
[.pdf]
(a more fun alternative from
the 2004/5 MBA guide)
Job Opening: The Department has a faculty position to fill at any level, starting July 2009.
Teaching:
- STATISTICS 101, Fall 2008: Introductory Business Statistics (I)
Here is the preliminary Syllabus,
and here is the webCafe site for this class
(go to "STAT 101" and your section).
The required textbook and software will be the same as in Spring 2008.
- STATISTICS 541, Statistical Methods, Fall 2008
(class web page)
STATISTICS 541 has switched from spring to fall.
This amounts to a switch with STATISTICS 540 (Statistical Computing),
which is now taught in spring as STATISTICS 542.
- STATISTICS 101, Spring 2008: Introductory Business Statistics (I)
Syllabus.
- STATISTICS 101, Spring 2007: Introductory Business Statistics (I)
Here are the modules in case you lost access to webCafe.
These notes are of historical interest only.
We now use Foster and Stine's textbook instead.
- STATISTICS 102, Spring 2006: Introductory Business Statistics (II)
(syllabus)
- STATISTICS 621, Fall 2003: Business Analysis with Regression
- STATISTICS 102, Spring 2003: Introductory Business Statistics (II)
- STATISTICS 540, Fall 2002: Statistical Computing
See the class web page.
Current Interests:
- Multivariate Analysis: A talk I gave at an econometrics workshop at Stanford.
It tries to answer the question of how to choose the "reference
metric" or the constraint in multivariate methods based on
eigendecompositions. [.pdf]
- Model checking with parametric bootstrap: I brought this up back in 2004 in a discussion
of a paper by Andrew Gelman, who does the same with a posterior predictive approach. Here are
Gelman's JCGS article,
followed by my discussion,
and his rejoinder.
The reason for posting this now is that a few of us in the department are currently discussing
model diagnostics.
- Lisha Chen's thesis paper:
Local Multidimensional Scaling for Nonlinear Dimension Reduction, Graph Layout and Proximity Analysis
[.pdf]
- On a topic in marketing, joint with
N. Eyuboglu:
Quasi-Darwinian Selection in Marketing Relationships
[.pdf]
Journal of Marketing, Oct 2007,
featured JM blog article
and a finalist for JM's 2007 Harold H. Maynard Award.
Along with the paper go a few scenario calculations that are not included in the article:
[.pdf]
- Here is a paper that started out as work on boosting but turned into something different,
joint with Werner Stuetzle and Yi Shen:
Loss Functions for Binary Class Probability Estimation: Structure and Applications. (Former title: Degrees of Boosting)
[.pdf] (under revision)
Yi Shen's 2005 Ph.D. thesis on cost-weighted class probability estimation
[pdf]
- On a simple modification of boosting, joint with David Mease and Adi Wyner:
Cost-Weighted Boosting with Jittering and Over/Under-Sampling: JOUS-Boost
[pdf]
(Journal of Machine Learning Research 8 (Mar), 409-439, 2007)
- A paper on bagging, joint with
Werner Stuetzle:
Observations on Bagging (Statistica Sinica, Special Issue on Machine Learning, 16 (2), 323--352, 2006)
[pdf]
A preliminary version and a companion paper which I keep posted because others have
started referring to them:
The Effect of Bagging on Variance, Bias, and Mean Squared Error
[.pdf]
PPT slides,
Smoothing Effects of Bagging
[.pdf]
- A paper on the use of simulation
for simultaneous inference, with Wolfgang Rolke:
Calibration for Simultaneity: (Re)Sampling Methods for Simultaneous Inference with
Applications to Function Estimation and Functional Data
[.pdf, 1.7MB] (under revision)
- Talk given at the Joint Statistical Meetings 1999, on the possibility
of valid inference in exploratory data analysis, with Di Cook:
Inference for Data Visualization
[.pdf]
- Two papers on MDS:
- Visualization Methodology for Multidimensional Scaling
[pdf]
A. Buja and D.F. Swayne; J. of Classification, 19, 7-43, 2002.
- Interactive Data Visualization with Multidimensional Scaling
[pdf]
(to appear in JCGS, 2008)
joint with Deborah Swayne, Michael Littman, Nate Dean, Heike Hofmann, and Lisha Chen.
- Two papers on special topics in high-dimensional data visualization,
joint with Di Cook, Dan Asimov, and Catherine Hurley:
- Computational Methods for High-Dimensional Rotations in Data Visualization
[pdf]
Appeared in "Handbook of Statistics", eds. E. Wegman, C. R. Rao.
- Theory of Dynamic Projections in High-Dimensional Data Visualization
[pdf] (submitted)
An older version that had both papers in one should be considered out of date.
- Visual Comparison of Datasets using Mixture Decompositions
[.pdf]
Alan Gous and Andreas Buja;
Journal of Computational and Graphical Statistics, 13 (1), 1-19 (2004).
(We are permitted to post the color version of this paper. The printed version is
b/w with gray-scale figures.)
- A paper on classification and regression trees with a data mining orientation:
Data Mining Criteria for Tree-Based Regression and Classification
[.ps.gz]
A. Buja and Y.-S. Lee; Proceedings of KDD 2001, 27--36.
On writing:
Here is an article everybody should read:
- The Science of Scientific Writing
by Gopen and Swan
[HTML]
[.pdf]
originally published in the
American Scientist,
retyped and posted with permission.
It's the single best piece on writing in the sciences--no exaggeration!