Room: 471 JMHH
Doctoral Program in Statistics:
Office: (215) 898-8222 (Leave a note with the administrator.)
Email: Click here for an image of the address. (The address 'buja@wharton...' is obsolete.)
Curriculum vitae: [.pdf] (a more fun alternative from the 2004/5 MBA guide)
If you are interested in our Ph.D. program, please visit our program website.
Once you decide to apply, start your application at this website.
source("http://stat.wharton.upenn.edu/~buja/STAT-101/src-probability.R")
Here are things that can be done:
X <- make.RV(1:6, rep("1/6",6))                  # Create a fair die (class: "RV")
Y <- make.RV(1:6, c(0.1,0.1,0.1,0.1,0.2,0.4))    # Create a loaded die ( '' )
P(X>3); P(Y>3)                                   # Probabilities of events
E(X); E(Y)                                       # Expected values
V(X); V(Y)                                       # Variances
SD(X); SD(Y)                                     # Standard deviations
par(mfrow=c(2,1)); plot(X); plot(Y)              # Plot as pin graphs
S <- SofI(X,Y); par(mfrow=c(1,1)); plot(S)       # Sum of two independent RVs
S10 <- SofIID(X,10); plot(S10)                   # Sum of 10 iid copies of X (works for many more => CLT)
qqnorm(S10)                                      # Normal quantile plot for RVs to check the CLT effect
X.sim <- rsim(1000, X)                           # Simulate from X (class: "RVsim")
plot(X.sim)                                      # Plot simulated data as pin graph
probs(X); props(X.sim)                           # Compare probabilities and simulated proportions
E(X); mean(X.sim)                                # Compare expected value and mean
SD(X); sd(X.sim)                                 # Compare theoretical and observed std.dev.
X2 <- X^2; X2; plot(X2)                          # univariate analytical transformation
Yexp <- exp(Y); Yexp; plot(Yexp)                 # ''
Yfair <- Y - E(Y); Yfair                         # Centering a RV: creates a fair game from a loaded die
Z <- (X - E(X))/SD(X); Z; plot(Z)                # z-scoring/standardizing a random variable
Ybern <- con(ifelse(Y>3,1,0))                    # Create a Bernoulli variable; 'con()' contracts values/probs
Check the header of the source file for more explanations and examples.
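For readers who want to see what the quantities in the listing above mean without sourcing the file, here is a minimal base-R sketch of the same calculations for the fair and loaded dice. It is not the implementation in src-probability.R; the helpers rv_mean and rv_var and all variable names are hypothetical and introduced only for illustration.

```r
# Values and probabilities of the two dice used in the listing above
vals     <- 1:6
p_fair   <- rep(1/6, 6)
p_loaded <- c(0.1, 0.1, 0.1, 0.1, 0.2, 0.4)

# Hypothetical helpers (not part of src-probability.R):
# expected value and variance of a discrete random variable
rv_mean <- function(v, p) sum(v * p)
rv_var  <- function(v, p) sum((v - rv_mean(v, p))^2 * p)

# Probability of the event {Y > 3} for the loaded die
p_gt3 <- sum(p_loaded[vals > 3])

# Distribution of the sum of two independent dice:
# enumerate all pairs, multiply their probabilities,
# and add up the probabilities of pairs with the same sum
grid      <- expand.grid(x = seq_along(vals), y = seq_along(vals))
sum_vals  <- vals[grid$x] + vals[grid$y]
sum_probs <- tapply(p_fair[grid$x] * p_loaded[grid$y], sum_vals, sum)

rv_mean(vals, p_fair)    # expected value of the fair die
rv_mean(vals, p_loaded)  # expected value of the loaded die
sqrt(rv_var(vals, p_fair))  # standard deviation of the fair die
sum_probs                # probabilities of the sums 2, ..., 12
```

The pairwise enumeration is the simplest way to express independence (joint probability equals the product of the marginals); it is adequate for two dice but would be replaced by repeated convolution for sums of many copies.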
When predictors are random, statisticians seem comfortable conditioning on them and treating them as fixed. The underlying argument is that the predictors form an ancillary statistic. This argument is flawed, however, because it assumes the correctness of the model before even examining it. We reconstruct in our own way a piece of econometric theory to sort out the effects of model violations in the presence of random predictors.
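The point about random predictors can be illustrated with a small simulation. The sketch below is not taken from the paper or its source files; it assumes a deliberately misspecified setup chosen for illustration (a quadratic truth fitted with a straight line, standard normal predictors and noise) and compares the standard error reported by lm(), which treats the predictors as fixed and the model as correct, with the actual spread of the slope estimate when both predictors and responses are redrawn.

```r
set.seed(1)
n    <- 100   # sample size per data set (illustrative choice)
reps <- 500   # number of simulated data sets (illustrative choice)

slopes   <- numeric(reps)  # fitted slopes across data sets
model_se <- numeric(reps)  # standard errors reported by lm()

for (i in seq_len(reps)) {
  x <- rnorm(n)             # random predictor, redrawn each time
  y <- x^2 + rnorm(n)       # nonlinear truth: the linear model is wrong
  fit <- lm(y ~ x)
  slopes[i]   <- coef(fit)[["x"]]
  model_se[i] <- summary(fit)$coefficients["x", "Std. Error"]
}

sd_empirical <- sd(slopes)     # actual sampling variability of the slope
se_reported  <- mean(model_se) # what the fixed-predictor theory reports

c(empirical = sd_empirical, reported = se_reported)
```

In this setup the empirical standard deviation comes out well above the reported standard error, because the nonlinearity interacts with the randomness of the predictors in a way the conventional formula does not account for. With a correctly specified linear truth the two numbers would be close.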
The source code is in two files:
simple version showing nonlinearity only, and
augmented version showing nonlinearity and linearity.
by Richard Berk, Lawrence Brown, Andreas Buja, Kai Zhang and Linda Zhao.
When predictors for statistical models are selected by looking at the data, statistical inference based on these models is in danger of being invalid. We show that confidence intervals may need to be widened considerably to protect against invalidation. This is a fundamental difficulty with statistical inference that has implications all the way down to how we teach statistics in introductory courses.
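A small simulation makes the danger concrete. The sketch below is an illustration under assumed conditions, not the procedure from the paper: it uses ten pure-noise predictors, selects the single predictor with the largest absolute t-statistic, and checks how often the naive 95% interval for the selected slope covers the true value of zero. It then estimates, by simulation, how large the cutoff would have to be to restore 95% coverage in this particular setup. All names and parameter values are hypothetical.

```r
set.seed(2)
n    <- 50     # observations per data set (illustrative choice)
p    <- 10     # candidate predictors, all unrelated to y
reps <- 2000   # number of simulated data sets

max_abs_t <- numeric(reps)

for (i in seq_len(reps)) {
  X <- matrix(rnorm(n * p), n, p)
  y <- rnorm(n)                      # y is independent of every predictor
  r <- as.vector(cor(X, y))          # correlation of y with each predictor
  # t-statistic of the slope in each simple regression y ~ x_j
  t_stats <- r * sqrt((n - 2) / (1 - r^2))
  max_abs_t[i] <- max(abs(t_stats))  # the predictor that "looks best"
}

# The naive interval covers zero exactly when |t| is below the usual cutoff
naive_cutoff   <- qt(0.975, df = n - 2)
naive_coverage <- mean(max_abs_t <= naive_cutoff)

# Cutoff that would give 95% coverage after this selection rule
widened_cutoff <- unname(quantile(max_abs_t, 0.95))

c(naive_coverage = naive_coverage,
  naive_cutoff   = naive_cutoff,
  widened_cutoff = widened_cutoff)
```

The naive coverage falls far short of the nominal 95%, and the cutoff needed to repair it is noticeably larger than the usual value near 2, which is the sense in which intervals must be widened. The widened cutoff here is specific to this selection rule and this design; it is not a general-purpose constant.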
This is a report (joint with Abba Krieger and Ed George) written for the Simons Foundation - Autism Research Initiative (SFARI). Our work under a SFARI grant led us to create an interactive tool for visualizing correlation tables for many hundreds of variables. The report draws its examples from the 'Simons Simplex Collection' (SSC), a large database of autism phenotype data.
Then follow the simple instructions on page 35 of the above report to apply the software to your own numeric data matrix.
(Journal of Marketing, Oct 2007, featured JM blog article and a finalist for JM's 2007 Harold H. Maynard Award)
Along with the paper go a few scenario calculations that are not included in the article: [.pdf]
Appeared in "Handbook of Statistics" (eds. E. Wegman, C. R. Rao; 2005). (An older version that had both papers in one should be considered out of date.)
Yi Shen's 2005 Ph.D. thesis on cost-weighted class probability estimation [pdf]
(Journal of Machine Learning Research 8 (Mar), 409-439, 2007). On a simple modification of boosting, joint with David Mease and Adi Wyner.
(Statistica Sinica, Special Issue on Machine Learning, 16 (2), 323--352, 2006)
A preliminary version and a companion paper which I keep posted because others have started referring to them: The Effect of Bagging on Variance, Bias, and Mean Squared Error [.pdf] PPT slides,
Smoothing Effects of Bagging [.pdf]
Alan Gous and Andreas Buja; Journal of Computational and Graphical Statistics, 13 (1), 1-19 (2004).
(We are permitted to post the color version of this paper. The printed version is b/w with gray-scale figures.)
A. Buja and Y.-S. Lee; Proceedings of KDD 2001, 27--36.