
Bio
    Dr. George Tseng is Professor and Vice Chair for Research in the Department of Biostatistics, School of Public Health, University of Pittsburgh. He also has secondary appointments in Human Genetics, and Computational and Systems Biology. He received a BS (1997) and an MS (1999) in Mathematics from the National Taiwan University under Dr. Hung Chen, and an ScD (2003) in Biostatistics from the Harvard School of Public Health in Dr. Wing Hung Wong's lab. He joined Pitt in 2003 and leads a research group in Bioinformatics and Statistical Learning. His research interests focus on statistical modeling and applications for -omics and bioinformatic problems to improve precision medicine and human health. His research group has published 90+ methodological/major papers and 115+ collaborative papers (as of Mar 2023), and he is a co-inventor on 5 patents. He has received multiple awards, including ASA Fellow, Statistician of the Year (ASA Pittsburgh Chapter), and Provost's Award for Excellence in PhD Mentoring (University of Pittsburgh). Collaboration with biological and clinical labs plays an important role in his work and is the source of most of his projects and methodological ideas.

    Dr. Tseng has actively served the statistical community, including as President of the ASA Pittsburgh Chapter in 2014-2017 (President-Elect, President and Past-President), Chair of the ASA Section on Statistics in Genomics and Genetics (SSGG) in 2023-2025 (Chair-Elect, Chair and Past-Chair), and member of the Board of Directors of the International Chinese Statistical Association (ICSA) in 2024-2026.

    Dr. Tseng's lab has been funded by NIH PI/MPI grants (R01LM014142 2023-2026, R01MH111601 2018-2024, R01MH118311 2019-2024, R21HD102565 2020-2023, R21LM012752 2018-2020, R01CA190766 2015-2020, R21MH094862 2012-2015), and he has served as a co-investigator on many collaborative grants.

Introduction to the research group:
    We are a statistical group with a major focus on genomics and bioinformatics. Our vision is to develop rigorous, timely and impactful methodologies that help understand disease mechanisms and improve disease diagnosis and treatment. Our projects are generally driven by close collaboration with biologists and clinicians. These collaborations play a key role in inspiring our ideas and methodological development. Our long-term interests cover analyses of various high-throughput omics experimental data (e.g. microarray-based, sequencing-based or mass spectrometry-based) for candidate marker detection, disease subtype discovery, machine learning, pathway knowledge learning and network analysis.

    Summary of research interests:
  • Vertical integration of multi-level omics data (2015-present): 2015 BMC Genomics, 2016 ARSIA, 2017 Biostatistics, 2017 AOAS, (Fang)2018 Bioinformatics, (Li)2019 AOAS, 2020 Statistics in Biosciences.

    Since 2009, we have worked extensively in the field of omics data integration. This work follows two major directions: (1) horizontal meta-analysis, which combines multiple omics data sets of the same type (e.g. microarray, GWAS or eQTL) to increase statistical power and accuracy and to reach validated conclusions; and (2) vertical integrative analysis, which combines multi-level omics data (e.g. gene expression, CNV, genotyping, methylation, somatic mutation, miRNA and clinical variables) from the same patient cohort to investigate disease subtypes, disease driver genes or related regulatory networks. The ultimate goal is translational "precision medicine" to better diagnose and treat patients.
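
    As a minimal illustration of the horizontal direction, the sketch below combines per-study p-values for a single gene using Fisher's classical method. It is a textbook example that assumes independent studies, not one of the group's published methods; the function name fisher_combine and the example p-values are hypothetical.

```python
import math


def fisher_combine(p_values):
    """Combine independent p-values with Fisher's method.

    Under the global null, -2 * sum(log p_i) follows a chi-square
    distribution with 2k degrees of freedom, where k = len(p_values).
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(p <= 0.0 or p > 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # Fisher statistic / 2
    # Closed-form chi-square survival function for even degrees of freedom 2k:
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


# Hypothetical p-values for one gene observed in three independent studies.
combined = fisher_combine([0.04, 0.10, 0.07])
print(f"combined p-value: {combined:.4f}")
```

    Modest evidence that is consistent across studies yields a smaller combined p-value than any single study provides, which is the sense in which horizontal meta-analysis increases statistical power.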

  • Horizontal meta-analysis combining multiple omics studies (2010-present): (Lu)2010 Bioinformatics, (Shen)2010 Bioinformatics, 2012 Bioinformatics, (Kang)2012 NAR, (Tseng)2012 NAR, (Begum)2012 NAR, 2013 BMC Bioinformatics, 2014 PLoS ONE, 2015 Molecular Neuropsychiatry, 2016 NAR, 2016 JASA, 2016 Bioinformatics, (Li)2017 Bioinformatics, (Lin)2017 Bioinformatics, 2017 JRSS-C, 2017 JCB, (Kim)2018 Bioinformatics, 2019 Statistica Sinica, 2019 Stat. Appl. Genet. Mol. Biol., 2019 Bioinformatics, (Huo)2019 AOAS, 2020 Bioinformatics, 2020 Statistica Sinica.

  • p-value combination problems (2011-present): 2012 AOAS, (Song)2014 AOAS, (Tang)2014 AOAS, 2022 Statistica Sinica, 2023 Statistica Sinica.

  • Cluster analysis in high-dimensional data (2005-present): 2005 Biometrics, 2006 Bioinformatics, 2007 Bioinformatics, 2015 BMC Genomics, 2016 JASA, 2017 Biostatistics, 2017 AOAS, 2022 Biostatistics, 2022 Biometrics.

    Our group has a long-term interest in cluster analysis of omics data. Many complex diseases were once thought to be a single disease but were later found to comprise multiple subtypes with different survival or drug response. Cluster analysis of omics data provides a first-step characterization of disease subtypes that are potential targets for precision medicine.

  • Supervised machine learning (2003-present): 2003 JASA, 2009 Bioinformatics, 2010 Bioinformatics, 2014 Bioinformatics, 2016 Bioinformatics, (Li)2019 AOAS, 2022 AOAS.

    Machine learning plays a critical role in constructing classification models that can make predictions for future patients. We have a long-term interest in this translationally critical task. Our focus is to avoid complicated black-box models that are difficult to interpret. Instead, we seek interpretable and replicable methods that incorporate prior biological knowledge and perform variable selection in high-dimensional omics data.

  • Power calculation and design of high-throughput omics experiments (2019-present): 2019 JRSS-C, 2021 Biostatistics.

    Power calculation and design of high-throughput experiments must account for genome-wide type I error control (e.g. the false discovery rate in multiple testing) and genome-wide statistical power. For next-generation sequencing (NGS) technology, the trade-off between sample size and sequencing depth makes the design problem more complicated. Our lab received a methodological R01 (R01CA190766) to develop methods in this direction.
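
    For readers unfamiliar with the false discovery rate criterion mentioned above, the following is a minimal sketch of the standard Benjamini-Hochberg step-up procedure. It illustrates the multiple-testing criterion only and is not the lab's power calculation methodology; the function name benjamini_hochberg and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return booleans marking the hypotheses rejected at FDR level alpha."""
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank r (1-based) with p_(r) <= r * alpha / m.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical per-gene p-values from a small differential expression test.
genes = ["g1", "g2", "g3", "g4"]
decisions = benjamini_hochberg([0.039, 0.001, 0.205, 0.008], alpha=0.05)
for gene, is_rejected in zip(genes, decisions):
    print(gene, "rejected" if is_rejected else "not rejected")
```

    Because the rejection threshold depends on the number of tests m, genome-wide power at a fixed sample size differs from the power of a single test, which is why design calculations need to consider both quantities together.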

  • Bayesian methodology for omics data (2017-present): 2017 JRSS-C, 2017 JCB, (Fang)2018 Bioinformatics, (Huo)2019 AOAS, (Li)2019 AOAS, 2021 AOAS.

    Generative (hierarchical) models in a Bayesian framework are natural and powerful tools for variable selection and for integrating prior biological knowledge in omics research. Our lab has an increasing interest in adopting Bayesian modeling in its research.

  • Pathway, gene module and network analysis (2010-present): 2010 Bioinformatics, 2014 G2B, 2014 PLoS ONE, (Li)2017 Bioinformatics.

    Biomarkers detected from differential expression analysis often have high variability and are difficult to interpret. In contrast, pathway enrichment, gene modules and gene networks have been found to be more stable and interpretable.

  • Low-level omics data preprocessing (2001-present): 2001 NAR, 2008 BMC Bioinformatics, 2010 CSDA, 2011 Bioinformatics, 2014 BMC Bioinformatics.

    Every new high-throughput experimental technique requires careful data preprocessing, batch correction, normalization and missing value imputation before meaningful downstream analyses can be performed. Our group has a long-term interest in these issues.

  • In line with the use of next-generation technology, we work with collaborators on the analysis of various next-generation sequencing data (DNA-seq, RNA-seq, bisulfite sequencing and ChIP-seq) and apply pipelines for omics analyses such as mutation/indel detection, fusion gene detection, isoform quantification and differential methylation analysis. We also work on CyTOF, mass spectrometry, microbiome and metabolome data.


  • Recent evolving directions:
  • Congruence of animal models or cancer models to human using omics data (2023 PNAS): Two PNAS papers ("Genomic responses in mouse models greatly mimic human inflammatory diseases" and "Genomic responses in mouse models poorly mimic human inflammatory diseases") reported opposite results when analyzing the congruence of transcriptomic response in human inflammatory diseases using the same datasets. The controversy mainly came from arbitrary analysis strategies and thresholds, which can be biased by researchers' choices. We have developed a bioinformatic tool with rigorous statistical modeling to investigate the degree to which, and the pathways in which, a mouse model mimics human in a given experimental setting (2023 PNAS). We are extending the solution from the bulk RNA-seq setting to scRNA-seq and multi-omics settings to provide a holistic solution. We are also investigating congruence analysis of cancer models (cell lines and tumoroids) with temporal samples.
  • Outcome-guided clustering: We are working on outcome-guided clustering methods with a semi-supervised flavor to characterize disease subtypes that are derived from omics data and predictive of outcome. We expect the identified disease subtypes to better serve the purpose of precision medicine (an R01 has been funded and multiple publications are in preparation).
  • High-dimensional causal mediation in imaging or omics applications: Many methods have been developed for causal mediation in the high-dimensional mediator setting. These methods either have weak statistical power or violate causal assumptions. Our lab has an increasing interest in causal inference for complex and high-dimensional data in omics or population settings.


  • Last updated 04/05/2023