Software
  • MetaOmics: a suite of packages for microarray and genomic meta-analysis
    • MetaQC (2012): quality control to determine inclusion/exclusion of studies in meta-analysis
    • MetaDE (2010-current): detect DE (candidate marker) genes in meta-analysis
    • MetaPath (2010): detect associated pathways in meta-analysis
    • MetaSparseKmeans: meta-analytic framework to identify novel disease subtypes (R package, manual)
    • MetaPCA: dimension reduction by PCA, sparse PCA and robust PCA in meta-analysis (R package)
    • MetaKTSP: a meta-analytic framework for the top scoring pair (TSP) classification algorithm (R package, manual)
    • MetaDCN: meta-analysis to detect differential co-expression network modules (R package, MetaDCNExplorer (Cytoscape plug-in visualization tool), MetaDCNExplorer manual)
  • FBM (2018): an R package for full Bayesian omics integrative model (iBAG) with missingness.
  • MOG (2018): an R package for Bayesian indicator model with multi-level overlapping group structure.
  • IS-Kmeans (2017): an R package to combine multi-level omics data for sparse K-means clustering with overlapping group structured regularization.
  • BayesMetaSeq (2017): an R package to combine multiple RNA-seq studies by Bayesian hierarchical model for detecting differentially expressed genes.
  • CBM (cross-platform Bayesian meta-analysis) (2017): an R package to combine multiple RNA-seq and microarray studies by Bayesian hierarchical model for detecting differentially expressed genes.
  • BayesMP (2018): an R package to combine multiple transcriptomic studies by Bayesian modeling on p-values.
  • GSTiCluster (2015): an R package to integrate vertical multi-omics data for disease subtype discovery using overlapping group lasso and tight clustering in the iCluster latent variable model.
  • iPF (2015): an Integrative Phenotyping Framework (iPF) that integrated clinical data, mRNA and miRNA and identified an intermediate disease subtype between IPF and COPD.
  • FusionMetaCaller (2015): an R package to combine results of multiple top-performing fusion transcript detection algorithms from paired-end RNA-seq data.
  • MLbias (2014): an R package to correct machine learning bias when many classifiers are compared in the model selection.
  • phenomeImpute (2014): an R package to impute high-dimensional phenome data.
  • Inter-study prediction in microarray studies
  • Gene clustering methods in microarray or high-dimensional data analysis
  • QuantileMap: a visualization tool to compactly display multiple (hundreds of) distributions in genomic applications.
  • a set of R functions for cDNA array analysis:



  • MOG: This is an R package for Bayesian indicator model with multi-level overlapping group structure.
    Li Zhu, Zhiguang Huo, Tianzhou Ma, and George Tseng. (2018) Bayesian indicator variable selection to incorporate multi-layer overlapping group structure in multi-omics applications.

  • FBM: This is an R package for full Bayesian omics integrative model (iBAG) with missingness.
    Bayesian integrative model for multi-omics data with missingness.

  • IS-Kmeans: This is an R package to combine multi-level omics data for sparse K-means clustering with overlapping group structured regularization.
    Zhiguang Huo and George C. Tseng*. (2017) Integrative sparse K-means for disease subtype discovery using multi-level omics data. Annals of Applied Statistics. Accepted.

  • BayesMetaSeq: This is an R package to combine multiple RNA-seq studies by Bayesian hierarchical model for detecting differentially expressed genes. (manual)
    Tianzhou Ma, Faming Liang and George C. Tseng. (2017) Biomarker detection and categorization in RNA-seq meta-analysis using Bayesian hierarchical model. JRSS-C. Accepted.

  • CBM (cross-platform Bayesian meta-analysis): This is an R package to combine multiple RNA-seq and microarray studies by Bayesian hierarchical model for detecting differentially expressed genes. The model extends BayesMetaSeq to accommodate both continuous microarray measurements and RNA-seq count data, and it normalizes fold changes between the platforms. (manual)
    Tianzhou Ma, Faming Liang, Steffi Oesterreich and George C. Tseng. (2017) A joint Bayesian modeling for integrating microarray and RNA-seq transcriptomic data. Journal of Computational Biology


  • BayesMP: This is an R package to combine multiple transcriptomic studies by Bayesian modeling on p-values. In contrast to the unified models in BayesMetaSeq and CBM, this method is a two-stage approach that models the p-values directly. As a result, it loses some statistical power but is more generalizable and robust across experimental platforms and experimental designs.
    Zhiguang Huo, Chi Song* and George C. Tseng*. (2016) Bayesian latent hierarchical model for transcriptomic meta-analysis to detect biomarkers with clustered meta-patterns of differential expression signals. AOAS.


  • FusionMetaCaller: This is an R package to combine multiple top-performing methods for detecting fusion transcripts in paired-end RNA-seq data. (manual)
    Silvia Liu, Wei-Hsiang Tsai, Ying Ding, Rui Chen, Zhou Fang, SungHwan Kim, Tianzhou Ma, Ting-Yu Chang, Nolan Michael Priedigkeit, Adrian Lee, Jianhua Luo, Hsei-Wei Wang, I-Fang Chung and George C. Tseng. (2016) Comprehensive evaluation of fusion transcript detection algorithms and a meta-caller to combine top performing methods in paired-end RNA-seq data. NAR


  • GSTiCluster: This is an R package to integrate vertical multi-omics data for disease subtype discovery using overlapping group lasso and tight clustering in the iCluster latent variable model. (manual)
    SungHwan Kim, Steffi Oesterreich, Yongseok Park* and George C. Tseng*. (2015) Integrative multi-omics clustering for disease subtype discovery by sparse overlapping group lasso and tight clustering. Biostatistics. Under review.


  • iPF: This is an R package to integrate phenome, mRNA and miRNA data. We have shown that it identifies a novel intermediate disease subtype between IPF and COPD. (manual)
    SungHwan Kim, Jose D. Herazo-Maya, Dongwan D. Kang, Brenda M. Juan-Guardela, John Tedrow, Fernando J. Martinez, Frank C. Sciurba, George C. Tseng* and Naftali Kaminski*. (2015) Integrative Phenotyping Framework (iPF): Integrative Clustering of Multiple Omics Data Identifies Novel Lung Disease Subphenotypes. BMC Genomics.


  • MLbias: This is an R package to correct for machine learning bias when many classifiers are compared and the best is selected. It includes two existing methods (nested cross validation and Tibshirani's procedure) and a new inverse power law (IPL) method.
    Ying Ding, Shaowu Tang, Serena G. Liao, Jia Jia, Yan Lin, George C. Tseng*. (2014) Bias correction for selecting the minimal-error classifier from many machine learning models. Bioinformatics. 30(22):3152-8.
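
The bias that MLbias addresses can be illustrated with a short simulation. The Python sketch below is not the MLbias API and does not implement nested cross validation, Tibshirani's procedure, or the IPL method; it only shows why the smallest observed error among many classifiers tends to be optimistic. The function name, the assumption that classifiers err independently, and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np

def simulate_selection_bias(n_classifiers=50, n_test=40, true_error=0.5,
                            n_repeats=2000, seed=0):
    """Compare a pre-specified classifier with the best of many (simulation).

    Assumption for illustration: every classifier has the same true error
    and misclassifies test samples independently of the others.
    """
    rng = np.random.default_rng(seed)
    # Observed error of each classifier = (number of mistakes) / n_test.
    errors = rng.binomial(n_test, true_error,
                          size=(n_repeats, n_classifiers)) / n_test
    mean_single = errors[:, 0].mean()      # one classifier fixed in advance
    mean_best = errors.min(axis=1).mean()  # minimal-error classifier, picked afterwards
    return mean_single, mean_best

single, best = simulate_selection_bias()
print(f"pre-specified classifier, mean observed error: {single:.3f}")
print(f"minimal-error classifier, mean observed error: {best:.3f}")
```

Under these assumptions the pre-specified classifier's observed error stays close to the true error, while the minimum over many classifiers falls well below it. Real classifiers trained on the same data are correlated, so the size of the gap in practice will differ.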


  • phenomeImpute: This is an R package to impute missing values in large-scale high-dimensional phenome data. It includes several variations of KNN, random forest and MICE methods.

    Serena G. Liao&, Yan Lin&, Dongwan Kang, Naftali Kaminski, Frank C. Sciurba, George C. Tseng. (2014) Missing value imputation in high-dimensional phenomic data: Imputable or not? And how?. BMC Bioinformatics. 15:346.
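
As a rough illustration of the KNN family of imputation methods mentioned above, the following Python sketch fills each missing entry with the mean of that feature over the k most similar rows. It is a plain, simplified variant written for this page, not the phenomeImpute implementation or any of its specific KNN variations; the function name `knn_impute` and the distance definition (mean squared difference over features observed in both rows) are assumptions.

```python
import numpy as np

def knn_impute(X, k=3):
    """Fill NaNs with the mean of the k nearest rows (simplified sketch)."""
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    missing = np.isnan(X)
    for i in np.where(missing.any(axis=1))[0]:
        # Features observed both in row i and in each candidate row.
        shared = ~missing[i] & ~missing            # shape (n_rows, n_features)
        diff = np.where(shared, X - X[i], 0.0)
        n_shared = shared.sum(axis=1)
        # Mean squared difference over shared features; inf if none are shared.
        dist = np.full(X.shape[0], np.inf)
        ok = n_shared > 0
        dist[ok] = (diff[ok] ** 2).sum(axis=1) / n_shared[ok]
        dist[i] = np.inf                           # a row is not its own neighbor
        for j in np.where(missing[i])[0]:
            # Donors must have feature j observed and a finite distance.
            donors = np.where(~missing[:, j] & np.isfinite(dist))[0]
            if donors.size == 0:
                continue                           # nothing to borrow from; stays NaN
            nearest = donors[np.argsort(dist[donors])[:k]]
            filled[i, j] = X[nearest, j].mean()
    return filled
```

Calling `knn_impute(data, k=5)` on a samples-by-features array returns a copy with missing entries filled where donors exist; observed values are left unchanged.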


  • Inter-study prediction in microarray studies:

    • Ratio-adjusted gene-wise normalization (rGN): Download

      Chunrong Cheng, Kui Shen, Chi Song, Jianhua Luo and George C Tseng. (2009) Ratio Adjustment and Calibration Scheme for Gene-wise Normalization to Enhance Microarray Inter-study Prediction. Bioinformatics. 25:1655-1661.

    • Module-based prediction approach (MBP):

      Zhibao Mi, Kui Shen, Nan Song, Chunrong Cheng, Chi Song and George C Tseng. (2010) Unsupervised module-based prediction approach for robust inter-study prediction in microarray data. Bioinformatics. 26: 2586-2593.


  • Gene clustering in microarray data: Our group has developed two complementary gene clustering methods for microarray data (or for clustering high-dimensional complex data in general). Both methods directly identify small, tight clusters in the data and allow a set of scattered genes to remain unclustered. Tight clustering uses resampling to find tight clusters that recur consistently across repeated subsamples. Penalized and weighted K-means extends the K-means objective function; it is faster to compute than tight clustering and allows prior biological information to be incorporated.

    • Tight clustering: Download (ANSI C source code and R package), CRAN download

      George C. Tseng and Wing H. Wong. (2005) Tight Clustering: A Resampling-based Approach for Identifying Stable and Tight Patterns in Data. Biometrics. 61:10-16.

    • PWKmeans: Download (C source code and PPAM package)

      George C. Tseng. (2007). Penalized and weighted K-means for clustering with scattered objects and prior information in high-throughput biological data. Bioinformatics. 23:2247-2255.
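
The idea of leaving scattered genes unclustered can be sketched in a few lines of Python. The code below is a simplified illustration, not the PWKmeans or tight clustering software: it assumes an objective in which each clustered point costs its squared distance to its cluster center and each unclustered point costs a flat penalty, and it omits the weights and prior information of the published method. The function name, the `penalty` and `init` parameters, and the label `-1` for scattered points are hypothetical.

```python
import numpy as np

def penalized_kmeans(X, k, penalty, init=None, n_iter=50, seed=0):
    """K-means variant that leaves far-away points unclustered (label -1)."""
    X = np.asarray(X, dtype=float)
    if init is None:
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), size=k, replace=False)]
    else:
        centers = np.array(init, dtype=float)
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # Squared distance from every point to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        nearest = d2.argmin(axis=1)
        # Joining a cluster costs the squared distance; staying scattered
        # costs the flat penalty, so a point joins only when that is cheaper.
        new_labels = np.where(d2.min(axis=1) <= penalty, nearest, -1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        for c in range(k):
            members = X[labels == c]
            if len(members) > 0:
                centers[c] = members.mean(axis=0)  # update from clustered points only
    return labels, centers
```

A larger `penalty` pulls more points into clusters, while a smaller one leaves more points scattered. Like ordinary K-means, the result depends on the initial centers.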


  • Quantile maps: This is a visualization tool to display many (hundreds of) distributions compactly and without bias.

    George C. Tseng. (2009) Quantile map: Simultaneous visualization of patterns in many distributions with application to tandem mass spectrometry. Computational Statistics and Data Analysis. in press.
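
One plausible way to prepare data for this kind of display is to summarize every distribution by the same set of quantiles and arrange the results as a matrix that can then be drawn as a heat map. The Python sketch below shows only that summary step; it is an illustration under that assumption, not the QuantileMap tool, and the function name and the choice of evenly spaced quantile levels are hypothetical.

```python
import numpy as np

def quantile_matrix(samples, n_levels=9):
    """Summarize each sample by evenly spaced quantiles.

    Returns the quantile levels and a matrix with one row per
    distribution and one column per level.
    """
    # Evenly spaced levels strictly between 0 and 1.
    levels = np.linspace(0.0, 1.0, n_levels + 2)[1:-1]
    rows = [np.quantile(np.asarray(s, dtype=float), levels) for s in samples]
    return levels, np.vstack(rows)
```

Because every row uses the same quantile levels, samples of different sizes become directly comparable, and hundreds of distributions fit in a single image when the matrix is rendered with a color scale.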


  • R functions for cDNA array analysis: a set of R functions for filtering, normalization, Bayesian hierarchical modelling and MCMC procedures in cDNA microarray analysis.

    The method was developed to assess gene expression levels with replicates in cDNA microarray data. A Bayesian hierarchical model captures gene-specific replicate variation, using prior information from calibration experiments. A version of the empirical Bayes procedure is used, and MCMC simulation then generates the posterior distribution.

    This program provides a browser interface to the methods described in the paper. The interface is written in JavaScript, with R running in the background. The plug-in between JavaScript and R may no longer be maintained. In that case, users can still use the functions directly in R.


    George C. Tseng, Min-Kyu Oh, Lars Rohlin, James C. Liao, and Wing Hung Wong. (2001) Issues in cDNA microarray analysis: quality filtering, channel normalization, models of variations and assessment of gene effects. Nucleic Acids Research. 29: 2549-2557.