We are a statistical group with major applications in genomics and bioinformatics. Our vision is to develop rigorous, timely and useful statistical and computational methodologies to help understand disease mechanisms and improve disease diagnosis and treatment. Our projects are usually data-driven. We collaborate closely with local and remote biologists and clinicians, and these collaborations play a key role in inspiring our ideas and methodological development. Our long-term interests cover analyses of various high-throughput omics experimental data (e.g. microarray-based, sequencing-based or mass spectrometry-based) for candidate marker detection, disease subtype discovery, machine learning, pathway knowledge learning and network analysis.
Currently we have a particular interest in statistical and computational methods that combine information from multiple high-throughput omics data sets. This work follows two major directions: (1) horizontal meta-analysis, which combines multiple genomic data sets of the same type (usually microarray, GWAS or eQTL) to increase statistical power and accuracy and to reach validated conclusions; and (2) vertical integrative analysis, which combines multi-omics data (e.g. gene expression, CNV, genotyping, methylation, somatic mutation, miRNA and clinical variables) from the same patient cohort to investigate disease subtypes, disease-associated or driver genes, and related regulatory networks. The ultimate goal is translational "precision medicine" to better diagnose and treat patients.
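As a rough illustration of how horizontal meta-analysis can increase statistical power, the sketch below combines per-study p-values for a single gene using Fisher's classical combined probability test. This is a textbook method chosen only for illustration, not one of the group's own methods listed below; the function name `fisher_combined_pvalue` and the example p-values are hypothetical, and the studies are assumed to be independent.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine independent p-values with Fisher's method (illustrative sketch).

    The statistic X = -2 * sum(log p_k) follows a chi-square distribution
    with 2K degrees of freedom under the null, where K is the number of studies.
    """
    k = len(pvalues)
    if k == 0:
        raise ValueError("at least one p-value is required")
    if any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")

    # half_stat is X / 2 = -sum(log p_k)
    half_stat = -sum(math.log(p) for p in pvalues)

    # For even degrees of freedom 2K, the chi-square survival function has
    # the closed form exp(-X/2) * sum_{i=0}^{K-1} (X/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


# Hypothetical p-values for one gene in three independent studies
combined = fisher_combined_pvalue([0.05, 0.05, 0.05])
print(combined)
```

In this made-up example, three studies that are each only borderline significant on their own yield a considerably smaller combined p-value, which is the sense in which pooling same-type studies adds power.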
We also work with collaborators on the analysis of various next-generation sequencing data (DNA-seq, RNA-seq, bisulfite sequencing and ChIP-seq), applying pipelines for omics analyses such as mutation/indel detection, fusion gene detection, isoform quantification and differential methylation analysis.
Our current and past research interests:
- Genomic meta-analysis: Statistical meta-analysis for combining multiple transcriptomic studies
Our group has worked in the field of genomic meta-analysis since 2006. The methods below have been developed (or are under development) for combining multiple microarray studies. The methods and concepts can be extended to combine different types of genomic data.
- MetaQC: quantitative quality assessment for inclusion/exclusion criteria in microarray meta-analysis.
- MetaDE: meta-analysis for detecting differentially expressed genes.
- AW: Adaptively weighted meta-analysis method (Li and Tseng, Annals of Applied Statistics 2010)
- MCC: Multi-class correlation (Lu et al., Bioinformatics 2010)
- rOP: rth order statistics for meta-analysis (Song and Tseng)
- RIM_minP: random intercept model with minP variable selection for incorporating potential confounding covariates and paired design (Wang et al.)
- MetaPath: meta-analysis for pathway (gene set) analysis (Shen and Tseng, Bioinformatics 2010)
- MetaDimR: meta-analysis for dimension reduction (under development)
- MetaPCA: meta-analysis for dimension reduction by principal component analysis (Kang and Tseng)
- MetaMDS: meta-analysis for dimension reduction by multi-dimensional scaling
- MetaFDA: meta-analysis for Fisher discriminant analysis
- MetaClust: meta-analysis for gene clustering (under development)
- MetaPredict: meta-analysis for inter-study prediction analysis
- rGN: ratio-adjusted gene-wise normalization (Cheng et al., Bioinformatics 2009)
- MBP: module-based prediction analysis (Mi et al., Bioinformatics 2010)
- MetaNetwork: meta-analysis for gene regulatory network (under development)
- Clustering and classification in genomic data:
- Unsupervised machine learning (clustering):
- Tight clustering: systematically extracts stable and tight patterns in large complex data through a resampling approach. (Tseng and Wong, Biometrics 2005; Bioinformatics 2006)
- Penalized and weighted K-means: a class of loss functions extended from K-means that allows a noise set to remain unclustered and allows prior knowledge to be incorporated. (Tseng, Bioinformatics 2007)
- Supervised machine learning (classification):
- Psi learning: uses a modified penalty term in SVM to achieve a theoretically optimal error rate. (joint work with Xiaotong Shen and Wing Wong) (JASA 2003)
- Statistical issues in microarray and other omics data:
- Quality filtering, normalization and Bayesian hierarchical model. (Tseng et al., Nucleic Acids Research 2001)
- Missing value imputation in transcriptome and phenome (Brock et al., BMC Bioinformatics 2009; Oh et al., Bioinformatics 2011; Liao et al.)
- Data mining and graphical visualization for genomic and proteomic data
- Quantile map: a visualization tool for simultaneous presentation of many probability distributions (Tseng, Computational Statistics and Data Analysis 2010).
- Etienne Sibille (Department of Psychiatry, Pitt): aging and depression
- David Lewis (Department of Psychiatry, Pitt): schizophrenia
- Naftali Kaminski (Yale University): genomic approaches for lung diseases, IPF, COPD
- Frank Sciurba (Medicine, Pitt): phenomic analysis for lung diseases
Computational biology and genomics:
- Takis Benos (Computational and Systems Biology, Pitt)
- Xinghua Lu (Biomedical Informatics, Pitt)
Ovarian and breast cancer:
- Adrian Lee (Magee, Women’s Cancer Research Center, UPMC)
- Steffi Oesterreich (Magee, Women’s Cancer Research Center, UPMC)
Prostate and liver cancers:
- Jianhua Luo, Yanping Yu (Department of Pathology, Pitt): prostate cancer
- George Michalopoulos (Department of Pathology, Pitt): liver cancer