This page has not been updated since 2018. For all of our methodological papers, R packages as well as data and code are available (mostly on GitHub) through the Publication page, so that other researchers can reproduce our results with minimal effort.
- MetaOmics: a suite of packages for microarray and genomic meta-analysis. R packages for the following subpackages are available here.
- MetaQC (2012): quality control to determine inclusion/exclusion of studies in meta-analysis
- MetaDE (2010-current): detect DE (candidate marker) genes in meta-analysis
- MetaPath (2010): detect associated pathways in meta-analysis
- MetaSparseKmeans: meta-analytic framework to identify novel disease subtypes
- MetaPCA: dimension reduction by PCA, sparse PCA and robust PCA in meta-analysis
- MetaKTSP: a meta-analytic framework for the top scoring pair (TSP) classification algorithm.
- MetaDCN: meta-analysis to detect differential co-expression network modules.
- FBM (2018): an R package for full Bayesian omics integrative model (iBAG) with missingness.
- MOG (2018): an R package for Bayesian indicator model with multi-level overlapping group structure.
- IS-Kmeans (2017): an R package to combine multi-level omics data for sparse K-means clustering with overlapping group structured regularization.
- BayesMetaSeq (2017): an R package to combine multiple RNA-seq studies by Bayesian hierarchical model for detecting differentially expressed genes.
- CBM (cross-platform Bayesian meta-analysis) (2017): an R package to combine multiple RNA-seq and microarray studies by Bayesian hierarchical model for detecting differentially expressed genes.
- BayesMP (2018): an R package to combine multiple transcriptomic studies by Bayesian modeling on p-values.
- GSTiCluster (2015): an R package to integrate vertical multi-omics data for disease subtype discovery using overlapping group lasso and tight clustering in the iCluster latent variable model.
- iPF (2015): an Integrative Phenotyping Framework that integrates clinical, mRNA and miRNA data and identified an intermediate disease subtype between IPF and COPD.
- FusionMetaCaller (2015): an R package to combine results of multiple top-performing fusion transcript detection algorithms from paired-end RNA-seq data.
- MLbias (2014): an R package to correct machine learning bias when many classifiers are compared during model selection.
- phenomeImpute (2014): an R package to impute high-dimensional phenome data.
- Inter-study prediction in microarray studies
- Gene clustering methods in microarray or high-dimensional data analysis
- QuantileMap: a visualization tool to compactly display many (hundreds of) distributions in genomic applications.
- a set of R functions for cDNA array analysis
- MOG: This is an R package for Bayesian indicator model with multi-level overlapping group structure.
Li Zhu, Zhiguang Huo, Tianzhou Ma and George Tseng. (2018) Bayesian indicator variable selection to incorporate multi-layer overlapping group structure in multi-omics applications.
- FBM: This is an R package for full Bayesian omics integrative model (iBAG) with missingness.
Bayesian integrative model for multi-omics data with missingness.
- IS-Kmeans: This is an R package to combine multi-level omics data for sparse K-means clustering with overlapping group structured regularization.
Zhiguang Huo and George C. Tseng*. (2017) Integrative sparse K-means for disease subtype discovery using multi-level omics data. Annals of Applied Statistics. Accepted.
- BayesMetaSeq: This is an R package to combine multiple RNA-seq studies by Bayesian hierarchical model for detecting differentially expressed genes. (manual)
Tianzhou Ma, Faming Liang and George C. Tseng. (2017) Biomarker detection and categorization in RNA-seq meta-analysis using Bayesian hierarchical model. JRSS-C. Accepted.
- CBM (cross-platform Bayesian meta-analysis): This is an R package to combine multiple RNA-seq and microarray studies by Bayesian hierarchical model for detecting differentially expressed genes. The model extends BayesMetaSeq to accommodate continuous measurements from microarrays and count data from RNA-seq, and it incorporates normalization of fold changes between the platforms. (manual)
Tianzhou Ma, Faming Liang, Steffi Oesterreich and George C. Tseng. (2017) A joint Bayesian modeling for integrating microarray and RNA-seq transcriptomic data. Journal of Computational Biology.
- BayesMP: This is an R package to combine multiple transcriptomic studies by Bayesian modeling on p-values. Unlike the unified models in BayesMetaSeq and CBM, this method is a two-stage approach that models p-values directly. As a result, it loses some statistical power but is more generalizable and robust across experimental platforms and designs.
Zhiguang Huo, Chi Song* and George C. Tseng*. (2016) Bayesian latent hierarchical model for transcriptomic meta-analysis to detect biomarkers with clustered meta-patterns of differential expression signals. AOAS.
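BayesMP itself fits a Bayesian latent hierarchical model to p-values, which is not reproduced here. To illustrate only the general two-stage idea (obtain a p-value per study, then combine them for each gene), the sketch below uses Fisher's classical method, a different and non-Bayesian technique. The function name `fisher_combine` and the example p-values are hypothetical, and the p-values are assumed to be independent across studies.

```python
import math


def fisher_combine(pvalues):
    """Combine independent per-study p-values for one gene with Fisher's method.

    Returns (statistic, combined_p). Under the global null, and assuming the
    K p-values are independent, X = -2 * sum(log p_k) follows a chi-square
    distribution with 2K degrees of freedom.
    """
    if not pvalues:
        raise ValueError("need at least one p-value")
    if any(not 0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(pvalues)
    x = -2.0 * sum(math.log(p) for p in pvalues)
    # Closed-form chi-square upper tail for even degrees of freedom 2k:
    # P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return x, math.exp(-half) * total


# Hypothetical p-values for one gene from three studies.
stat, p_combined = fisher_combine([0.01, 0.04, 0.20])
print(f"X = {stat:.2f}, combined p = {p_combined:.4g}")
```

In a real meta-analysis this would be applied gene by gene; the closed-form tail avoids a dependency on a statistics library because the degrees of freedom are always even.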
- FusionMetaCaller: This is an R package to combine multiple top-performing methods for detecting fusion transcripts in paired-end RNA-seq data. (manual)
Silvia Liu, Wei-Hsiang Tsai, Ying Ding, Rui Chen, Zhou Fang, SungHwan Kim, Tianzhou Ma, Ting-Yu Chang, Nolan Michael Priedigkeit, Adrian Lee, Jianhua Luo, Hsei-Wei Wang, I-Fang Chung and George C. Tseng. (2016) Comprehensive evaluation of fusion transcript detection algorithms and a meta-caller to combine top performing methods in paired-end RNA-seq data. NAR
- GSTiCluster: This is an R package to integrate vertical multi-omics data for disease subtype discovery using overlapping group lasso and tight clustering in the iCluster latent variable model. (manual)
Sunghwan Kim, Steffi Oesterreich, Yongseok Park* and George C. Tseng*. (2015) Integrative multi-omics clustering for disease subtype discovery by sparse overlapping group lasso and tight clustering. Biostatistics. Under review.
- iPF: This is an R package to integrate phenome, mRNA and miRNA data. We have shown that it identifies a novel intermediate disease subtype between IPF and COPD. (manual)
SungHwan Kim, Jose D. Herazo-Maya, Dongwan D. Kang, Brenda M. Juan-Guardela, John Tedrow, Fernando J. Martinez, Frank C. Sciurba, George C. Tseng* and Naftali Kaminski*. (2015) Integrative Phenotyping Framework (iPF): Integrative Clustering of Multiple Omics Data Identifies Novel Lung Disease Subphenotypes. BMC Genomics.
- MLbias: This is an R package to correct for machine learning bias when many classifiers are compared and the best one is selected. It includes two existing methods (nested cross-validation and Tibshirani's procedure) and a new inverse power law (IPL) method.
Ying Ding, Shaowu Tang, Serena G. Liao, Jia Jia, Yan Lin, George C. Tseng*. (2014) Bias correction for selecting the minimal-error classifier from many machine learning models. Bioinformatics. 30(22):3152-8.
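To illustrate the bias that MLbias addresses, the simulation below (a sketch, not the package's code or its IPL correction) gives many classifiers the same true error rate and reports how optimistic the minimum estimated error is on average. For simplicity it assumes each classifier is evaluated on its own independent test set; in practice the classifiers share data, so their estimates are correlated. The name `expected_min_error` is hypothetical.

```python
import random
import statistics


def expected_min_error(n_classifiers, n_test, true_error, n_reps, seed=0):
    """Average, over n_reps simulations, of the smallest estimated error among
    n_classifiers classifiers that all share the same true error rate.

    Simplifying assumption: each classifier's error estimate comes from an
    independent test set of n_test cases.
    """
    rng = random.Random(seed)
    minima = []
    for _ in range(n_reps):
        # Estimated error = fraction of test cases misclassified.
        estimates = [
            sum(rng.random() < true_error for _ in range(n_test)) / n_test
            for _ in range(n_classifiers)
        ]
        minima.append(min(estimates))
    return statistics.fmean(minima)


# With one classifier the estimate is roughly unbiased; with many it is not.
print(expected_min_error(1, 100, 0.3, 200))
print(expected_min_error(50, 100, 0.3, 200))
```

The second call typically returns a value noticeably below the true 0.3; this optimism is what bias-correction procedures such as nested cross-validation aim to remove.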
- phenomeImpute: This is an R package to impute missing values in large-scale high-dimensional phenome data. It includes several variations of KNN, random forest and MICE methods.
Serena G. Liao&, Yan Lin&, Dongwan Kang, Naftali Kaminski, Frank C. Sciurba, George C. Tseng. (2014) Missing value imputation in high-dimensional phenomic data: Imputable or not? And how? BMC Bioinformatics. 15:346.
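phenomeImpute includes several KNN variants. As a hedged illustration of the basic idea only, the sketch below implements one plain KNN imputation written for this page, not the package's code. The distance rescaling by the fraction of shared variables is an assumption of this sketch, and `knn_impute` is a hypothetical name; missing values are represented as `None`.

```python
import math


def knn_impute(data, k=3):
    """Impute None entries in a samples-by-variables matrix by KNN.

    Each missing value is replaced by the mean of that variable over the k
    nearest other samples that observe it. Distance uses only variables
    observed in both samples and is rescaled by sqrt(p / shared) so that
    pairs with few shared variables are not artificially close.
    """
    p = len(data[0])

    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) * p / len(shared))

    imputed = [list(row) for row in data]
    for i, row in enumerate(data):
        missing = [j for j, v in enumerate(row) if v is None]
        if not missing:
            continue
        # Other samples ordered from nearest to farthest.
        neighbours = sorted(
            (distance(row, other), idx)
            for idx, other in enumerate(data)
            if idx != i
        )
        for j in missing:
            donors = [
                data[idx][j]
                for d, idx in neighbours
                if d < math.inf and data[idx][j] is not None
            ][:k]
            if donors:  # otherwise leave the value missing
                imputed[i][j] = sum(donors) / len(donors)
    return imputed


# Hypothetical phenome matrix: 4 subjects x 3 variables, one value missing.
phenome = [[1.0, 2.0, 3.0], [1.1, 2.1, None], [5.0, 6.0, 7.0], [1.2, 1.9, 3.2]]
print(knn_impute(phenome, k=2))
```

Donors are always taken from the original observed values, so imputed entries never feed into later imputations.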
- Inter-study prediction in microarray studies:
- Ratio-adjusted gene-wise normalization (rGN): Download
Chunrong Cheng, Kui Shen, Chi Song, Jianhua Luo and George C Tseng. (2009) Ratio Adjustment and Calibration Scheme for Gene-wise Normalization to Enhance Microarray Inter-study Prediction. Bioinformatics. 25:1655-1661.
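As the paper title indicates, rGN builds on gene-wise normalization. The sketch below shows only plain gene-wise normalization (standardizing each gene across the samples of one study), applied separately to two hypothetical studies so that gene scales become comparable. It deliberately omits rGN's ratio adjustment and calibration, and `genewise_standardize` is a hypothetical name.

```python
import statistics


def genewise_standardize(expr):
    """Standardize each gene (row) of a genes-by-samples matrix to mean 0, SD 1.

    Genes with zero variance are mapped to all zeros.
    """
    out = []
    for gene in expr:
        mu = statistics.fmean(gene)
        sd = statistics.stdev(gene)
        out.append([(x - mu) / sd if sd > 0 else 0.0 for x in gene])
    return out


# Two hypothetical studies measuring the same two genes on different scales.
study_a = [[1.0, 2.0, 3.0], [10.0, 10.0, 10.0]]
study_b = [[100.0, 200.0, 300.0], [4.0, 6.0, 8.0]]
print(genewise_standardize(study_a))
print(genewise_standardize(study_b))
```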
- Module-based prediction approach (MBP):
Zhibao Mi, Kui Shen, Nan Song, Chunrong Cheng, Chi Song and George C Tseng. (2010) Unsupervised module-based prediction approach for robust inter-study prediction in microarray data. Bioinformatics. 26: 2586-2593.
- Gene clustering in microarray data: Our group has developed two complementary gene clustering methods for microarray data (or, more generally, for clustering high-dimensional complex data). Both methods directly identify small, tight clusters and allow a set of scattered genes to remain unclustered. Tight clustering uses resampling to obtain tight clusters that are consistently recovered across repeated subsamples. Penalized and weighted K-means extends the objective function of K-means; it is faster than tight clustering and can incorporate prior biological information.
- Tight clustering: Download (ANSI C source code and R package), CRAN download
George C. Tseng and Wing H. Wong. (2005) Tight Clustering: A Resampling-based Approach for Identifying Stable and Tight Patterns in Data. Biometrics. 61:10-16.
- PWKmeans: Download (C source code and PPAM package)
George C. Tseng. (2007). Penalized and weighted K-means for clustering with scattered objects and prior information in high-throughput biological data. Bioinformatics. 23:2247-2255.
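The penalized part of penalized and weighted K-means can be sketched as follows. This is an illustration, not the PWKmeans code: it assumes unit weights (so no prior information), squared Euclidean distance, and the objective "sum of squared distances of clustered points to their centers plus `lam` times the number of scattered points". Under that objective, a point is better left scattered whenever its squared distance to the nearest center exceeds `lam`. The names `penalized_kmeans` and `sq_dist` are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def penalized_kmeans(points, k, lam, init=None, max_iter=100, seed=0):
    """K-means with a scattered set, minimizing
        sum of squared distances of clustered points to their centers
        + lam * (number of scattered points).

    Returns (labels, centers); label -1 marks a scattered point.
    All point weights are taken as 1 (no prior information).
    """
    if init is not None:
        centers = [list(c) for c in init]
    else:
        centers = [list(p) for p in random.Random(seed).sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: joining the nearest cluster costs d, staying
        # scattered costs lam, so take whichever is cheaper.
        new_labels = []
        for p in points:
            d, j = min((sq_dist(p, c), j) for j, c in enumerate(centers))
            new_labels.append(j if d <= lam else -1)
        if new_labels == labels:
            break  # assignments are stable, so the objective cannot decrease further
        labels = new_labels
        # Update step: each center moves to the mean of its members
        # (an empty cluster keeps its previous center).
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Two tight hypothetical groups plus one far-away point.
pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (20.0, -20.0)]
labels, centers = penalized_kmeans(pts, k=2, lam=1.0, init=[(0.0, 0.0), (5.0, 5.0)])
print(labels)  # → [0, 0, 0, 1, 1, 1, -1]
```

Larger values of `lam` make it costlier to leave points unclustered; as `lam` grows without bound, the procedure reduces to ordinary K-means.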
- Quantile maps: This is a visualization tool to display many (hundreds of) distributions compactly and without bias.
George C. Tseng. (2009) Quantile map: Simultaneous visualization of patterns in many distributions with application to tandem mass spectrometry. Computational Statistics and Data Analysis. in press.
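The plotting itself is not reproduced here. As a rough sketch of the kind of summary such a display can be drawn from, the code below reduces each distribution to a row of quantile cut points and orders the rows by median. The resulting matrix could then be rendered as a heat map. The ordering rule, the number of quantiles and the name `quantile_rows` are all assumptions of this sketch, not details of the published method.

```python
import statistics


def quantile_rows(groups, n_bins=10):
    """Summarize each distribution by its quantile cut points.

    groups maps a name to a list of at least two numbers. Returns
    (names, rows): rows[i] holds the n_bins - 1 cut points of names[i],
    and rows are ordered by each distribution's median.
    """
    summary = sorted(
        (statistics.median(values), name, statistics.quantiles(values, n=n_bins))
        for name, values in groups.items()
    )
    names = [name for _, name, _ in summary]
    rows = [cuts for _, _, cuts in summary]
    return names, rows


# Three hypothetical distributions.
data = {
    "b": [11, 12, 13, 14, 15, 16, 17, 18, 19],
    "a": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "c": [5, 5, 6, 6, 7, 7, 8, 8, 9],
}
names, rows = quantile_rows(data, n_bins=4)
for name, row in zip(names, rows):
    print(name, row)
```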
- R functions for cDNA array analysis: a set of R functions for filtering, normalization, Bayesian hierarchical modelling and MCMC procedures in cDNA microarray analysis.
The method assesses gene expression levels with replicates in cDNA microarray data. A Bayesian hierarchical model describes gene-specific replicate variation, using prior information from calibration experiments. An empirical Bayes procedure is used, and MCMC simulation then generates the posterior distribution.
George C. Tseng, Min-Kyu Oh, Lars Rohlin, James C. Liao, and Wing Hung Wong. (2001) Issues in cDNA microarray analysis: quality filtering, channel normalization, models of variations and assessment of gene effects. Nucleic Acids Research. 29: 2549-2557.
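The published model is more detailed than what can be shown here. As a hedged illustration of the Bayesian hierarchical model plus MCMC workflow, the sketch below runs a Gibbs sampler on a deliberately simple, hypothetical model for one gene's replicate log-ratios: a normal likelihood, a normal prior on the gene effect and an inverse-gamma prior on the replicate variance. The hyperparameters `a` and `b` stand in for values that an empirical Bayes step might estimate from calibration experiments; all names and numbers are hypothetical.

```python
import random
import statistics


def gibbs_gene_mean(y, tau2, a, b, n_iter=5000, burn_in=1000, seed=0):
    """Gibbs sampler for one gene's replicate log-ratios y under the
    hypothetical model
        y_r | mu, s2 ~ Normal(mu, s2),   mu ~ Normal(0, tau2),
        s2 ~ InverseGamma(a, b),
    where a and b stand in for hyperparameters an empirical Bayes step
    could estimate from calibration experiments.
    Returns the post-burn-in draws of mu.
    """
    rng = random.Random(seed)
    n, total = len(y), sum(y)
    s2 = statistics.pvariance(y) or 1.0  # starting value for s2
    draws = []
    for it in range(n_iter):
        # mu | s2, y is normal (conjugate update).
        v = 1.0 / (n / s2 + 1.0 / tau2)
        mu = rng.gauss(v * total / s2, v ** 0.5)
        # s2 | mu, y is inverse-gamma: draw a Gamma(shape, rate) precision
        # and invert it. random.gammavariate takes (shape, scale = 1/rate).
        shape = a + n / 2.0
        rate = b + sum((x - mu) ** 2 for x in y) / 2.0
        s2 = 1.0 / rng.gammavariate(shape, 1.0 / rate)
        if it >= burn_in:
            draws.append(mu)
    return draws


# Hypothetical replicate log-ratios for a single gene.
reps = [1.0, 1.2, 0.9, 1.1, 1.0, 0.95, 1.05, 1.1]
draws = sorted(gibbs_gene_mean(reps, tau2=100.0, a=2.0, b=0.01))
print("posterior mean:", statistics.fmean(draws))
print("95% interval:", draws[int(0.025 * len(draws))], draws[int(0.975 * len(draws)) - 1])
```

Both full conditionals are conjugate, which is why plain Gibbs steps suffice here; models without conjugacy would need Metropolis-Hastings or similar updates.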