Alternative splicing and other mRNA processing events contribute critically to proteome diversity by allowing one gene to encode multiple functionally distinct proteins. Switching of expression from one protein isoform to another is widely implicated in disease and in responses to environmental stress, including heart disease, cancer, alcohol exposure, and oxidative damage. Although many alternative isoforms have now been discovered, their existence and abundance at the protein level remain mostly unknown, making it very challenging to ascertain their true physiological functions and disease relevance. In theory, researchers can use RNA-seq in conjunction with shotgun proteomics data to guide protein isoform characterization. However, proteomics and transcriptomics resources are currently fragmented: the data exist in isolated repositories, and computational tools to integrate them are lacking.

Rapid advances in big data science now afford new opportunities for integrating proteotranscriptomic data. In this new project, we are building ProteoSeq: a user-oriented multi-omics integration platform designed to overcome the fragmentation of data and computational resources across the currently separate fields of proteomics and transcriptomics. ProteoSeq will enable the effective use of complementary RNA and protein data for in-depth inquiry through a series of proven, preassembled steps behind a seamless user interface. Our goal is to implement ProteoSeq so that a broad range of users can combine transcriptomics and proteomics data for the routine, large-scale characterization of alternative protein isoforms in various disease models.