Dr. George Tseng is a Professor in the Departments of Biostatistics, Human Genetics, and Computational and Systems Biology at the University of Pittsburgh Graduate School of Public Health. He received his BS (1997) and MS (1999) in Mathematics from National Taiwan University under Dr. Hung Chen, and his ScD (2003) in Biostatistics from the Harvard School of Public Health in Dr. Wing Hung Wong's lab. He joined Pitt in 2003 and leads a research group in Bioinformatics and Statistical Learning. His research focuses on statistical modeling and applications for -omics and bioinformatic problems to improve precision medicine and human health. Dr. Tseng has published 75+ methodological/major papers and 95+ collaborative papers (as of Jan 2021) and 5 patents, and has received multiple awards, including ASA Fellow, Statistician of the Year (ASA Pittsburgh Chapter) and the Provost's Award for Excellence in Mentoring (University of Pittsburgh). Collaboration with biological and clinical labs plays an important role in his work and is the source of most of his projects and methodological ideas.
Introduction to the research group:
We are a statistical group with a major focus on genomics and bioinformatics. Our vision is to develop rigorous, timely and impactful methodologies that help understand disease mechanisms and improve disease diagnosis and treatment. Our projects are generally driven by close collaboration with biologists and clinicians. These collaborations play a key role in inspiring our ideas and methodological development. Our long-term interests cover analyses of various high-throughput omics experimental data (e.g. microarray-based, sequencing-based or mass spectrometry-based) for candidate marker detection, disease subtype discovery, machine learning, pathway knowledge learning and network analysis.
Summary of research interests:
- Power calculation and design of high-throughput omics experiments (2019-present): 2019 Biostatistics, 2019 JRSS-C.
Power calculation and design of high-throughput experiments must account for genome-wide type I error control (e.g. the false discovery rate in multiple testing) and genome-wide statistical power. For next-generation sequencing (NGS) technology, the trade-off between sample size and sequencing depth further complicates the design. Our lab has received a methodological R01 to develop methods in this direction.
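As a rough illustration of how genome-wide power and the false discovery rate interact with sample size, the sketch below simulates a two-group study and applies the classical Benjamini-Hochberg step-up procedure. It is a minimal sketch under stated assumptions, not the group's published method: the function names, the z-test with unit-variance noise, and the default settings (2000 genes, 10% truly differential, effect size 1.0) are all hypothetical choices made for illustration.

```python
import math
import random


def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans, True where the hypothesis is rejected at FDR level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Step-up rule: find the largest rank k (1-based) with p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


def simulate_power(n_per_group, n_genes=2000, frac_de=0.1, effect=1.0,
                   alpha=0.05, seed=0):
    """Simulate one two-group study (hypothetical setup, unit-variance noise).

    Returns (genome-wide power, realized false discovery proportion).
    """
    rng = random.Random(seed)
    n_de = int(n_genes * frac_de)          # the first n_de genes are truly differential
    se = math.sqrt(2.0 / n_per_group)      # standard error of the difference in means
    pvalues = []
    for g in range(n_genes):
        delta = effect if g < n_de else 0.0
        z = rng.gauss(delta / se, 1.0)     # z-statistic for the mean difference
        pvalues.append(math.erfc(abs(z) / math.sqrt(2.0)))  # two-sided p-value
    rejected = benjamini_hochberg(pvalues, alpha)
    true_pos = sum(rejected[:n_de])
    false_pos = sum(rejected[n_de:])
    power = true_pos / n_de
    fdp = false_pos / max(1, true_pos + false_pos)
    return power, fdp


if __name__ == "__main__":
    for n in (5, 10, 20, 40):
        power, fdp = simulate_power(n)
        print(f"n per group = {n:3d}  power = {power:.2f}  FDP = {fdp:.2f}")
```

Running the loop at several sample sizes gives a crude power curve under these assumptions. A sequencing design would additionally need to model read depth, which this sketch deliberately omits.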
- Bayesian methodology for omics data (2017-present): 2017 JRSS-C, 2017 JCB, (Fang)2018 Bioinformatics, (Huo)2019 AOAS, (Li)2019 AOAS.
Generative (hierarchical) models in a Bayesian framework are a natural and powerful tool for variable selection and for integrating prior biological knowledge in omics research. Our lab has a growing interest in adopting Bayesian modeling in its research.
- Vertical integration of multi-level omics data (2015-present): 2015 BMC Genomics, 2016 ARSIA, 2017 Biostatistics, 2017 AOAS, (Fang)2018 Bioinformatics, (Li)2019 AOAS.
Since 2009, we have worked extensively in the field of omics data integration. This work follows two major directions: (1) horizontal meta-analysis, which combines multiple omics data sets of the same type (e.g. microarray, GWAS or eQTL) to increase statistical power and accuracy and to reach validated conclusions; and (2) vertical integrative analysis, which combines multi-level omics data (e.g. gene expression, CNV, genotyping, methylation, somatic mutation, miRNA and clinical variables) from the same patient cohort to investigate disease subtypes, disease driver genes or related regulatory networks. The ultimate goal is translational "precision medicine" to better diagnose and treat patients.
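To make the idea of horizontal meta-analysis concrete, the sketch below combines per-study p-values for a single gene using Fisher's method, a classical textbook approach. It is shown only as an example of combining evidence across studies of the same type and is not claimed to be the method developed in the papers listed here. The function name is hypothetical, and the studies are assumed to be independent.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(log p_i) follows a chi-square distribution
    with 2k degrees of freedom under the global null. For even degrees of
    freedom the survival function has the closed form
    exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!.
    """
    k = len(pvalues)
    if k == 0:
        raise ValueError("at least one p-value is required")
    if any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    half_stat = -sum(math.log(p) for p in pvalues)  # equals X / 2
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i   # (X/2)^i / i!
        total += term
    return math.exp(-half_stat) * total


if __name__ == "__main__":
    # Three hypothetical studies, each with only moderate evidence for the same gene.
    print(fisher_combined_pvalue([0.04, 0.06, 0.03]))
```

Several moderately small p-values can yield a much smaller combined p-value, which is the sense in which pooling studies increases power. Fisher's method tests whether the gene is associated in at least one study, so other combination rules may be preferable when consistency across all studies is required.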
- Horizontal meta-analysis of combining multiple omics studies (2010-present): (Lu)2010 Bioinformatics, (Shen)2010 Bioinformatics, 2011 AOAS, 2012 Bioinformatics, (Kang)2012 NAR, (Tseng)2012 NAR, (Begum)2012 NAR, 2013 BMC Bioinformatics, 2014 PLoS ONE, (Song)2014 AOAS, (Tang)2014 AOAS, 2015 Molecular Neuropsychiatry, 2016 NAR, 2016 JASA, 2016 Bioinformatics, (Li)2017 Bioinformatics, (Lin)2017 Bioinformatics, 2017 JRSS-C, 2017 JCB, (Kim)2018 Bioinformatics, 2019 Statistica Sinica, 2019 Stat. Appl. Genet. Mol. Biol., 2019 Bioinformatics, (Huo)2019 AOAS.
- Pathway, gene module and network analysis (2010-present): 2010 Bioinformatics, 2014 G2B, 2014 PLoS ONE, (Li)2017 Bioinformatics.
Biomarkers detected from differential expression analysis are often highly variable and difficult to interpret. In contrast, pathway enrichment results, gene modules and gene networks have been found to be more stable and interpretable.
- Cluster analysis in high-dimensional data (2005-present): 2005 Biometrics, 2006 Bioinformatics, 2007 Bioinformatics, 2015 BMC Genomics, 2016 JASA, 2017 Biostatistics, 2017 AOAS.
Our group has a long-standing interest in cluster analysis of omics data. Many complex diseases were once thought to be a single disease but were later found to comprise multiple subtypes with different survival or drug response. Cluster analysis of omics data provides a first-step characterization of disease subtypes that are potential targets for precision medicine.
- Supervised machine learning (2003-present): 2003 JASA, 2009 Bioinformatics, 2010 Bioinformatics, 2014 Bioinformatics, 2016 Bioinformatics, (Li)2019 AOAS.
Machine learning plays a critical role in constructing classification models that can predict outcomes for future patients. We have a long-standing interest in this translationally critical task. Our focus is to avoid complicated black-box models that are difficult to interpret. Instead, we seek interpretable and replicable methods that incorporate prior biological knowledge and perform variable selection in high-dimensional omics data.
- Low-level omics data preprocessing (2001-present): 2001 NAR, 2008 BMC Bioinformatics, 2010 CSDA, 2011 Bioinformatics, 2014 BMC Bioinformatics.
Every new high-throughput experimental technique requires careful data preprocessing, batch correction, normalization and missing value imputation before meaningful downstream analyses can be performed. Our group has a long-standing interest in these issues.
- In line with the adoption of next-generation technologies, we work with collaborators on the analysis of various next-generation sequencing data (DNA, RNA-seq, bisulfite sequencing and ChIP-seq) and apply pipelines for omics analyses such as mutation/indel detection, fusion gene detection, isoform quantification and differential methylation analysis. We also work on CyTOF, mass spectrometry, microbiome and metabolome data.
Recent special directions:
- Congruence of mouse model to human in transcriptomic response: Two PNAS papers ("Genomic responses in mouse models greatly mimic human inflammatory diseases" and "Genomic responses in mouse models poorly mimic human inflammatory diseases") reported opposite results when analyzing the congruence of transcriptomic response in human inflammatory diseases using the same datasets. The controversy mainly arose from arbitrary choices of analysis strategy and thresholds, which are potentially subject to researcher bias. We are developing a bioinformatic tool with rigorous statistical modeling to investigate to what degree and in which pathways a mouse model mimics human in a given experimental setting. The work will extend from the bulk RNA-seq setting to scRNA-seq and multi-omics settings to provide a holistic solution.
- Non-parametric robust machine learning: Prediction models constructed from a training omics data set are usually difficult to translate and apply to an independent validation cohort. Such prediction failures stem from differences in the experimental platform and protocol used in the replication study. We are working to develop non-parametric robust machine learning approaches that yield replicable and interpretable classification models suitable for clinical use.
- Outcome-guided clustering: We are working on outcome-guided clustering methods with a semi-supervised flavor to characterize disease subtypes from omics data that are also predictive of outcome. We expect the identified disease subtypes to better serve the purpose of precision medicine.
Last updated 02/11/2019