Qingyuan Zhao 赵卿元

About me
I am currently a postdoctoral fellow in the Statistics Department of the Wharton School, University of Pennsylvania (mentored by Dylan Small and Sean Hennessy).

I received my Ph.D. in Statistics from Stanford University (advised by Trevor Hastie) in 2016 and my B.S. in Mathematics from the Special Class for the Gifted Young (SCGY), University of Science and Technology of China (USTC) in 2011. I worked at eBay (2013) and Google (2014, 2015) as a summer intern.

Click the link for my curriculum vitae and Google Scholar profile.

400 Huntsman Hall, 3730 Walnut St, Philadelphia, PA 19104.


<2018-11-30 Fri> Slides for Larry Brown Memorial Workshop   Academic

Click here for the slides.

I had the good fortune to have a few conversations with Larry during the first year of my postdoc. Every time he struck me as a brilliant statistician but a humble person. On one occasion I presented our work on selective inference for effect modification in Larry's Friday workshop and received many incisive comments from him. I wish I could have talked to him more often.

<2018-10-19 Fri> Homepage upgrade   Life

I switched the HTML style from Twitter Bootstrap to the awesome Bigblow theme. This entire homepage is written in org-mode and generated by a single command in Emacs!

<2018-10-16 Tue> –<2018-10-20 Sat> ASHG 2018   Academic Life

This is my first time at the annual meeting of the American Society of Human Genetics. I presented a poster (#3196T, Reviewer's Choice abstract) about our recent work on Mendelian randomization.

Check out this video on Instagram taken when I was enjoying riding the electric scooter at the San Diego waterfront.

<2018-09-24 Mon> New commentary on the causal inference data competition   Academic

<2018-09-12 Fri> New report on selective inference for effect modification   Academic

<2018-08-07 Tue> New article on performance evaluation of mutual funds   Academic

<2018-08-01 Wed> Trip to Vancouver and the Canadian Rockies   Life

Check out this post on Instagram for some highlights of the trip.

<2018-06-19 Tue> –<2018-06-21 Thu> EcoSta 2018   Academic

I presented our new results on estimating the skill of mutual fund managers in an invited session. [slides]

<2018-05-21 Mon> –<2018-05-23 Wed> ACIC 2018   Academic

I organized and spoke in a session on "New advancements in sensitivity analysis of observational studies" about our new percentile bootstrap approach to sensitivity analysis. [slides]

I went to a special workshop on treatment effect heterogeneity and reported the analysis results using selective inference on a dataset provided by the workshop organizers. [slides]

<2018-04-24 Tue> Talks on Mendelian randomization   Academic

I visited the University of Minnesota (Stats Department), Johns Hopkins (Biostats Department), UC Berkeley (Biostatistics division) and Stanford (Stats Department) to give seminars about our new work on Mendelian randomization. [slides]


Research interest: I am broadly interested in causal inference, high-dimensional statistics and applied statistics.


Will competition-winning methods for causal inference also succeed in practice? To appear in Statistical Science (invited commentary) [paper]   Causal_Inference

  • Authors: Qingyuan Zhao, Luke Keele, Dylan Small.
  • Summary: This is an invited commentary for Statistical Science on the causal inference data competition at ACIC 2016.

Selective inference for effect modification: An empirical investigation. [paper]   Causal_Inference Effect_Modification

  • Authors: Qingyuan Zhao, Snigdha Panigrahi.
  • Summary: In a special workshop at ACIC 2018, we were invited to analyze a simulated dataset to detect treatment effect heterogeneity. This article reports the results we presented at the workshop. We also tried out more recent selective inference methods based on the selective sampler.

Performance evaluation with latent factors. [paper] [SSRN]   

  • Authors: Yang Song, Qingyuan Zhao.
  • Summary: We use Confounder Adjusted Testing and Estimating (CATE) proposed in our previous paper to estimate the abnormal return (aka "alpha") of U.S. equity mutual funds. When funds are ranked by the difference between CATE alpha and CAPM alpha, the top decile outperforms the bottom decile by 500 bps per year. We also find evidence that mutual fund flows become less responsive to FFC factors.
  • Slides at EcoSta 2018.

Falsification tests for instrumental variable designs with an application to tendency to operate. To appear in Medical Care   Causal_Inference Instrumental_Variable

  • Authors: Luke Keele, Qingyuan Zhao, Rachel Kelz, Dylan Small.
  • Summary: We propose a falsification test for the IV assumptions using sub-populations of the data with an overwhelming proportion of treated or untreated units. If the IV assumptions hold, we should find that the intention-to-treat effect is zero within these sub-populations. We demonstrate this test using an IV known as tendency to operate (TTO) from health services research.

Powerful genome-wide design and robust statistical inference in two-sample summary-data Mendelian randomization. [paper] [arXiv]   Causal_Inference Epidemiology Instrumental_Variable Mendelian_Randomization

  • Authors: Qingyuan Zhao, Yang Chen, Jingshu Wang, Dylan Small.
  • Summary: We extend the MR-RAPS method in our previous paper using the empirical partially Bayes framework described by Lindsay, allowing a true genome-wide design for Mendelian randomization.
  • Slides (more accessible version); Handout.

Improving the accuracy of two-sample summary data Mendelian randomization: moving beyond the NOME assumption. To appear in International Journal of Epidemiology. [bioRxiv]   Causal_Inference Epidemiology Instrumental_Variable Mendelian_Randomization

  • Authors: Jack Bowden, Fabiola Del Greco M, Cosetta Minelli, Debbie Lawlor, Qingyuan Zhao, Nuala Sheehan, John Thompson, George Davey Smith.
  • Summary: This paper proposes a modified Cochran's \(Q\) statistic to detect horizontal pleiotropy in Mendelian randomization. This extension is quite important when there are many weak genetic instruments.

Statistical inference in two-sample summary-data Mendelian randomization using robust adjusted profile score. [paper] [arXiv]   Causal_Inference Epidemiology Instrumental_Variable Mendelian_Randomization

  • Authors: Qingyuan Zhao, Jingshu Wang, Gibran Hemani, Jack Bowden, Dylan Small.
  • Summary: We give a comprehensive theoretical basis for two-sample summary-data Mendelian randomization. We find that horizontal pleiotropy is pervasive in MR studies. We propose a new method, the robust adjusted profile score, that can consistently estimate the causal effect under pervasive balanced pleiotropy and is robust to occasional outliers.
  • Software: R package mr.raps is available on CRAN. It can be called directly from the TwoSampleMR platform; see this documentation.
  • Slides at UMN; Slides at JHU.


Sensitivity analysis for inverse probability weighting estimators via the percentile bootstrap. [paper] [arXiv]   Causal_Inference Sensitivity_Analysis

  • Authors: Qingyuan Zhao, Dylan Small, Bhaswar Bhattacharya.
  • Summary: Rosenbaum’s sensitivity analysis framework has several limitations: (1) it is mostly applicable to matched observational studies; (2) it only tests the sharp null hypothesis; (3) it assumes treatment effect homogeneity to obtain a confidence interval of the causal effect. Seeking to overcome these limitations, we propose a new approach to sensitivity analysis based on the inverse probability weighting estimator. The key ideas are to use numerical optimization to estimate the causal effect bound and to use the percentile bootstrap to quantify the sampling uncertainty.
  • Slides at ACIC '18.
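The percentile bootstrap step can be illustrated with a minimal Python sketch. It is not the procedure from the paper: the optimization over the sensitivity model is omitted, and the function names (`stabilized_ipw_mean`, `percentile_bootstrap_ci`), the `(a, y, e)` data layout, and the assumption that the propensity scores `e` are known are hypothetical choices made for illustration.

```python
import random


def stabilized_ipw_mean(sample):
    """Stabilized (Hajek) IPW estimate of the mean outcome under treatment.

    Each element of `sample` is a triple (a, y, e): treatment indicator,
    observed outcome, and propensity score (assumed known in this sketch).
    """
    numerator = sum(a * y / e for a, y, e in sample)
    denominator = sum(a / e for a, y, e in sample)
    return numerator / denominator


def percentile_bootstrap_ci(sample, estimator, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval: resample units with replacement,
    re-estimate, and take empirical quantiles of the estimates."""
    rng = random.Random(seed)
    n = len(sample)
    estimates = []
    for _ in range(n_boot):
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        try:
            estimates.append(estimator(resample))
        except ZeroDivisionError:
            continue  # resample contained no treated units; skip it
    estimates.sort()
    lower_idx = int(alpha / 2 * len(estimates))
    upper_idx = max(lower_idx, int((1 - alpha / 2) * len(estimates)) - 1)
    return estimates[lower_idx], estimates[upper_idx]


# Toy data (simulated, not from any study): treatment assigned with
# probability 0.5; treated outcomes are 1 plus standard Gaussian noise.
rng = random.Random(1)
toy = []
for _ in range(200):
    a = 1 if rng.random() < 0.5 else 0
    y = 1.0 + rng.gauss(0.0, 1.0) if a == 1 else 0.0
    toy.append((a, y, 0.5))

lower, upper = percentile_bootstrap_ci(toy, stabilized_ipw_mean)
```

In a sensitivity analysis, the same resampling loop would be applied to the estimated lower and upper bounds of the causal effect rather than to a single point estimate.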

Graphical diagnosis of confounding bias in instrumental variables analysis. In Epidemiology, 2018. [link]   Causal_Inference Instrumental_Variable

  • Authors: Qingyuan Zhao, Dylan Small.
  • Summary: This research letter proposes a new diagnostic plot for IV analysis, so large bias ratios (compared to OLS estimator) are not over-interpreted when the covariate is unrelated to the outcome.
  • Software: R functions iv.diagnosis and iv.diagnosis.plot in the package ivmodel on CRAN.

Two-sample instrumental variable analyses using heterogeneous samples. [paper] [arXiv]   Causal_Inference Instrumental_Variable Mendelian_Randomization

  • Authors: Qingyuan Zhao, Jingshu Wang, Wes Spiller, Jack Bowden, Dylan Small.
  • Summary: Many modern IV studies (especially MR) are carried out with the two-sample design, where the samples may come from different populations. We derive a new class of linear IV estimates that are robust to sample heterogeneity. We then attempt to relax the linearity assumption and find that the two-sample design generally requires more untestable assumptions.
  • Slides at MRC Integrative Epidemiology Unit, University of Bristol.

Selective inference for effect modification via the lasso. [paper] [arXiv]   Causal_Inference Effect_Modification Selective_Inference

  • Authors: Qingyuan Zhao, Dylan Small, Ashkan Ertefaie.
  • Summary: We approach the heterogeneous treatment effect problem in a different way. Instead of trying to obtain the optimal treatment regime, we seek an interpretable model for effect modification using the recently developed selective inference framework.
  • Slides at ACIC '17. Slides at ICSA '18.

Multiple testing when many \(p\)-values are uniformly conservative, with application to testing qualitative interaction in educational interventions. To appear in Journal of the American Statistical Association. [link] [paper] [arXiv]   Effect_Modification Selective_Inference Multiple_Testing

  • Authors: Qingyuan Zhao, Dylan Small, Weijie Su.
  • Summary: Qualitative interaction is an extreme form of treatment effect heterogeneity where the treatment can be beneficial for some but harmful for others. We formulated this question as a global testing problem with many conservative null \(p\)-values and proposed a simple technique—conditioning—to greatly improve the statistical power.
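The conditioning idea can be sketched in a few lines of Python. This is an illustration rather than the exact procedure in the paper: it assumes independent \(p\)-values, a threshold `tau` fixed in advance, and a Bonferroni global test, and the function names and example \(p\)-values are made up. The idea is that if a null \(p\)-value is uniformly conservative, then conditional on it being at most `tau`, the rescaled value `p / tau` is still a valid \(p\)-value, so the very conservative ones can be dropped before testing.

```python
def conditional_pvalues(pvalues, tau=0.5):
    """Keep the p-values at or below tau and rescale them by tau."""
    return [p / tau for p in pvalues if p <= tau]


def bonferroni_rejects(pvalues, alpha=0.05):
    """Reject the global null if the smallest p-value is at most alpha / m."""
    m = len(pvalues)
    return m > 0 and min(pvalues) <= alpha / m


# Hypothetical example: one promising p-value among 19 very conservative ones.
pvals = [0.004] + [0.9] * 19

# Without conditioning: 0.004 is compared with 0.05 / 20 = 0.0025.
before = bonferroni_rejects(pvals)

# With conditioning: only 0.004 survives, is rescaled to 0.008,
# and is compared with 0.05 / 1.
after = bonferroni_rejects(conditional_pvalues(pvals, tau=0.5))
```

Here the global null is not rejected before conditioning but is rejected afterwards, because the multiplicity correction shrinks from 20 hypotheses to 1.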

Cross-screening in observational studies that test many hypotheses. In Journal of the American Statistical Association, 2018. [link] [paper] [arXiv]   Causal_Inference Sensitivity_Analysis Multiple_Testing

  • Authors: Qingyuan Zhao, Dylan Small, Paul Rosenbaum.
  • Summary: This paper proposes a new method called "cross-screening" to increase the power of sensitivity analysis when multiple causal hypotheses need to be tested simultaneously.
  • Software: R package CrossScreening and package vignette.

On sensitivity value of pair-matched observational studies. To appear in Journal of the American Statistical Association. [link] [paper] [arXiv]   Causal_Inference Sensitivity_Analysis

  • Authors: Qingyuan Zhao.
  • Summary: A crucial quantity in Rosenbaum’s sensitivity analysis is the "sensitivity value", the amount of unmeasured confounding needed to alter the qualitative conclusions of an observational study. This paper looks into the properties of the sensitivity value and characterizes its asymptotic behavior.
  • Slides at JSM '17.

Estimation and prediction in sparse and unbalanced tables. [paper] [arXiv]   Computation

  • Authors: Qingyuan Zhao, Trevor Hastie, Daryl Pregibon.
  • Summary: When each dimension of a multi-way table has a large number of levels, it is computationally intensive to fit even the standard mixed effects models. We propose a novel hierarchical ANOVA representation for such data. Fitting the model by back-fitting requires repeated calculations of sub-table means, which can be efficiently computed when observations are sparse.


Causal interpretations of black-box models. [paper]   Causal_Inference Machine_Learning

  • Authors: Qingyuan Zhao, Trevor Hastie.
  • Summary: This is an invited discussion paper for Journal of Business & Economic Statistics. We link Friedman's partial dependence plot with Pearl's backdoor adjustment formula. We discuss situations when possible causal interpretations can be made for black-box machine learning models.
  • Slides at JSM '16 (JBES invited session).
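The link between the two can be written out explicitly. The notation here is an assumption made for illustration and may differ from the paper's: \(f\) is the fitted black-box model, \(X_S\) the predictors of interest, and \(X_C\) the remaining predictors. Friedman's partial dependence function averages the model over the marginal distribution of \(X_C\):

\[
\bar{f}_S(x_S) = \mathbb{E}_{X_C}\left[ f(x_S, X_C) \right] = \int f(x_S, x_C) \, dP(x_C).
\]

If \(X_C\) satisfies Pearl's back-door criterion relative to \((X_S, Y)\), the back-door adjustment formula gives

\[
\mathbb{E}\left[ Y \mid \mathrm{do}(X_S = x_S) \right] = \int \mathbb{E}\left[ Y \mid X_S = x_S, X_C = x_C \right] dP(x_C).
\]

The two expressions coincide when \(f(x_S, x_C)\) equals \(\mathbb{E}[Y \mid X_S = x_S, X_C = x_C]\), which is the sense in which a partial dependence plot can admit a causal reading.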

Comment on "Causal inference using invariant prediction". In Journal of the Royal Statistical Society (Series B), 2016. [link] [paper]   Causal_Inference

  • Authors: Qingyuan Zhao*, Charles Zheng*, Trevor Hastie and Robert Tibshirani.
  • Summary: This is a contributed discussion on the article "Causal inference using invariant prediction" by Peters et al.

Permutation \(p\)-value approximation via generalized Stolarsky invariance. To appear in Annals of Statistics [paper] [arXiv]   Genomics

  • Authors: Hera He, Kinjal Basu, Qingyuan Zhao, Art Owen.
  • Summary: This paper uses a generalized Stolarsky's invariance principle to approximate the permutation \(p\)-value for two-sample linear test statistics. Along the way we discovered a simple probabilistic proof of Stolarsky's invariance principle.

Covariate balancing propensity score by tailored loss functions. To appear in Annals of Statistics. [paper] [arXiv]   Causal_Inference

  • Authors: Qingyuan Zhao.
  • Summary: This paper extends the dual interpretation of entropy balancing to general situations and proposes a tailored loss function. Minimizing this loss function by machine learning algorithms generates approximate covariate balance in large function classes.


Confounder adjustment in multiple hypothesis testing. In Annals of Statistics, 2017. [link] [paper] [arXiv]   Causal_Inference Multiple_Testing Genomics

  • Authors: Jingshu Wang*, Qingyuan Zhao*, Trevor Hastie, Art Owen.
  • Summary: Confounding introduces hidden bias into statistical inference. We show that, in modern simultaneous testing, it is possible to correct for unmeasured confounders. Previous methods including SVA, LEAPP, and RUV are unified in the same framework in this paper. Interestingly, confounder adjustment is as efficient as the oracle linear regression when latent variables are strong.
  • Software: R package cate and vignette.
  • Slides.

Entropy balancing is doubly robust. In Journal of Causal Inference, 2017. [link] [paper] [arXiv]   Causal_Inference

  • Authors: Qingyuan Zhao, Daniel Percival.
  • Summary: We show that a recently proposed method called Entropy Balancing is doubly robust; that is, the causal effect estimator is consistent if the propensity score is logistic and/or the outcome regression model is linear in the covariates.
  • Slides at JSM 2015.
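For intuition, entropy balancing with a single covariate can be sketched in Python. The sketch assumes the usual formulation in which control units receive weights proportional to \(\exp(\lambda x_i)\), with \(\lambda\) chosen so that the weighted control mean equals the treated mean. The function name, the bisection solver, and the toy numbers are illustrative choices, not the implementation studied in the paper.

```python
import math


def entropy_balancing_weights(x_control, target_mean, n_iter=200):
    """Weights w_i proportional to exp(lam * x_i) on the control units such
    that the weighted mean of x_control equals target_mean (one covariate)."""
    if not min(x_control) < target_mean < max(x_control):
        raise ValueError("target_mean must lie strictly inside the control range")

    def weights(lam):
        shift = max(lam * x for x in x_control)  # avoid overflow in exp
        raw = [math.exp(lam * x - shift) for x in x_control]
        total = sum(raw)
        return [r / total for r in raw]

    def weighted_mean(lam):
        return sum(w * x for w, x in zip(weights(lam), x_control))

    # The weighted mean is increasing in lam, so bracket the root and bisect.
    lo, hi = -1.0, 1.0
    while weighted_mean(lo) > target_mean:
        lo *= 2.0
    while weighted_mean(hi) < target_mean:
        hi *= 2.0
    for _ in range(n_iter):
        mid = (lo + hi) / 2.0
        if weighted_mean(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    return weights((lo + hi) / 2.0)


# Toy example: reweight five control units so their mean matches a
# (hypothetical) treated-group mean of 3.0.
x_control = [0.0, 1.0, 2.0, 3.0, 4.0]
w = entropy_balancing_weights(x_control, target_mean=3.0)
balanced_mean = sum(wi * xi for wi, xi in zip(w, x_control))
```

The exponential form of the weights is what ties the method to a logistic propensity score model, while exact balance on the covariate is what protects a linear outcome model.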

SEISMIC: A Self-Exciting Point Process Model for Predicting Tweet Popularity. In Proceedings of ACM SIGKDD, 2015. [link] [paper]    Computation

  • Authors: Qingyuan Zhao, Murat Erdogdu, Hera He, Anand Rajaraman, Jure Leskovec.
  • Summary: We study a simple and hence extremely noisy form of information cascade: the tweet. We use a doubly stochastic self-exciting point process to model the retweet process. The SEISMIC model we develop requires only the timestamps and the graph degrees to make more accurate predictions than the state of the art.
  • Software: R package seismic. More information can be found at this webpage at SNAP.
  • Slides at KDD 2015 (Video on YouTube).


As a guest lecturer

  • Randomization test in Wharton STATS 341. [Lecture notes]
  • A tutorial on instrumental variables and Mendelian randomization at Johns Hopkins. [Slides]

As a teaching assistant

At Stanford University

| Quarter     | Course     | Title                                            | Instructor                                |
|-------------+------------+--------------------------------------------------+-------------------------------------------|
| Spring 2016 | STATS 371  | Bayesian Statistics II                           | Persi Diaconis, Wing Wong, Chiara Sabatti |
| Winter 2016 | STATS 300B | Theory of Statistics II                          | David Siegmund                            |
| Fall 2015   | STATS 300A | Theory of Statistics I                           | Lester Mackey                             |
| Winter 2015 | STATS 315A | Modern Applied Statistics: Learning              | Trevor Hastie                             |
| Fall 2014   | STATS 141  | Biostatistics                                    | Rajarshi Mukherjee                        |
| Winter 2014 | STATS 290  | Paradigms for Computing with Data                | Balasubramanian Narasimhan                |
| Fall 2013   | STATS 305  | Introduction to Statistical Modeling             | Art Owen                                  |
| Winter 2013 | STATS 300B | Theory of Statistics II                          | David Siegmund                            |
| Fall 2012   | STATS 300A | Theory of Statistics I                           | Joseph Romano                             |
| Summer 2012 | STATS 206  | Applied Multivariate Statistics                  | Sadri Khalessi                            |
| Spring 2012 | STATS 60   | Introduction to Statistical Methods: Precalculus | Mike Baiocchi, Galan Reeves               |
| Winter 2012 | STATS 200  | Introduction to Statistical Inference            | Guenther Walther                          |


| Software       | Link                                 | Description                                                                      |
|----------------+--------------------------------------+----------------------------------------------------------------------------------|
| bootsens       | On GitHub                            | Bootstrapping sensitivity analysis                                               |
| mr.raps        | On CRAN; Developer version on GitHub | Mendelian randomization via robust adjusted profile score                        |
| CrossScreening | On CRAN (package vignette)           | Multiple testing in pair-matched observational studies                           |
| cate           | On CRAN (package vignette)           | High-dimensional factor analysis and confounder adjusted testing and estimation  |
| seismic        | On CRAN; More information on SNAP    | Self-exciting process model for information cascade prediction                   |

Author: Qingyuan Zhao

Email: qyzhao@wharton.upenn.edu

Created: 2018-12-05 Wed 10:09
