Many Northwestern University Feinberg School of Medicine researchers want to access and analyze large datasets, such as those generated by The Cancer Genome Atlas (TCGA) and the Roadmap Epigenomics Project, alongside their own research data. Such data, now popularly referred to as “Big Data”, include imaging, phenotypic, molecular (including –omics), physiological, anatomical, clinical, behavioral, environmental, and many other types of biomedical data.
Analyzing Big Data from genomics, proteomics and clinical trials requires skilled data analysts with strong computer programming and data-mining expertise. Without the right skills, researchers risk finding patterns that mean nothing and missing those that are true breakthroughs.
Facilities and Location
The core is located on the 11th floor of the Arthur Rubloff Building, where it is assigned several offices and cubicles comprising approximately 500 square feet.
Software and Programs
- MPromDb – Mammalian Promoter Database
- IsoformEx – Isoform-level gene expression estimation from RNA-seq data
- TPD – Modeling transcription factor binding site profiles from ChIP-seq data
- NPEBseq – Differential expression analysis based on RNA-seq data
- PIGExClass – Platform-independent isoform-level expression-based classification system
- lumi – a NU initiative that develops, maintains, updates and distributes a Bioconductor R package for Illumina expression and methylation microarray data analysis and genotyping estimation
- GeneAnswers – a NU initiative that facilitates gene-concept network analysis and the generation of protein-protein interaction networks, focusing on clinical and biological interpretation of a given gene list
- The Database for Annotation, Visualization and Integrated Discovery (DAVID)
- Weighted Gene Co-expression Network Analysis (WGCNA)
- Gene Set Enrichment Analysis (GSEA)
Services
The ABBC provides the following services to Feinberg investigators (this is not a complete list):
- Data Integration and Analysis: The core faculty and staff will consult and collaborate with Feinberg investigators to find efficient and effective ways of connecting data across types. Examples of data types that could be addressed include, but are not limited to:
- Omics data (e.g., genomics, proteomics, metabolomics)
- Multiscale data (genomic, epigenomic, subcellular, cellular, network, organ, systems, organism, population levels)
- Multiplatform data (e.g., microarray, next-generation sequencing, RT-PCR)
- Data from multiple research areas and diseases (e.g., common inflammation pathways in cancer, obesity, immune diseases, and neurodegenerative diseases)
- Predictive Modeling: The core staff will collaborate with Feinberg investigators on predictive modeling that yields useful biomedical information, including:
- Biomarker identification using mathematical methods that integrate multiple levels of data, such as clinical data and molecular profiles from the same patients
- Molecular subtyping (e.g., cancer patient stratification into different molecular subtypes) based on transcriptome (microarray and/or RNA-seq) and genomic data
- Interpretation of gene lists in the context of pathways and diseases – pathway analysis using MetaCore, GSEA and DAVID, as well as Bioconductor packages such as GeneAnswers
- Data mining and integration with public data resources, including Gene Ontology, KEGG Pathway, Reactome Pathway, NCI-Nature curated pathways, Disease Ontology (a NU initiative) and Bioconductor
- Data Management: The core staff will help Feinberg investigators in managing and storing biomedical Big Data.
- Database development and management – The core will provide consultation and collaborative services to organize, store, and query genomic and proteomic data.
Examples of the expertise of the ABBC Core Faculty and Staff:
- RNA-seq, small-RNA-seq and ChIP-seq data analysis
- Stong N, Deng Z, Gupta R, Hu S, Paul S, Weiner AK, Eichler EE, Graves T, Fronick CC, Courtney L, Wilson RK, Lieberman P, Davuluri RV, Riethman H (2014) Subtelomeric CTCF and cohesin binding site organization using improved subtelomere assemblies and a novel annotation pipeline. Genome Research. Epub 2014/03/29. doi: 10.1101/gr.166983.113. PubMed PMID: 24676094.
- Ota H*, Sakurai M*, Gupta R*, Valente L, Wulff B-E, Ariyoshi K, Iizasa H, Davuluri RV, Nishikura K (2013) ADAR1 complexes with Dicer and plays a role in microRNA processing and RNA-induced gene silencing mechanisms. Cell, 153(3): 575-589. (*Equal contribution)
- Bi Y, Davuluri RV (2013) NPEBseq: Nonparametric Empirical Bayesian based Procedure for Differential Expression Analysis from RNA-seq Data. BMC Bioinformatics, 14: 262. Highly accessed.
- Pal S, Gupta R, Kim H, Wickramasinghe P, Baubet V, Showe LC, Dahmane N, Davuluri RV (2011) Alternative transcription exceeds alternative splicing in generating the transcriptome diversity of cerebellar development. Genome Research, 21(8): 1260-1272.
- Glass, Wuertzer C, Cui X, Bi Y, Davuluri R, Xiao YY, Wilson M, Owens K, Zhang Y, Perkins A (2013) Global identification of EVI1 target genes in acute myeloid leukemia. PLoS ONE, 8: e67134. PMCID: PMC3694976.
- Bi Y, Kim H, Gupta R, Davuluri RV (2011) Tree-based Position Weight Matrix Approach to Model Transcription Factor Binding Site Profiles. PLoS ONE, 6(9): e24210. doi: 10.1371/journal.pone.0024210.
- Kim H, Bi Y, Pal S, Gupta R, Davuluri RV (2011) IsoformEx: Isoform level expression estimation using weighted non-negative least squares from mRNA-seq data. BMC Bioinformatics, 12(1): 305.
- Exon-array data analysis and isoform-level cancer signatures
- Zhang Z, Pal S, Tchou J, Davuluri RV (2013) Isoform-level expression profiles provide better cancer signatures than gene-level expression profiles. Genome Medicine, 5(4): 33. Highly accessed.
- Data mining for platform-independent molecular subtyping (bioassay development)
- Pal S, Bi Y, Macyszyn L, Showe LC, O’Rourke DM, Davuluri RV (2014) Isoform-level gene signature improves prognostic stratification and accurately classifies glioblastoma subtypes. Nucleic Acids Research. Epub 2014/02/08. doi: 10.1093/nar/gku121.
- Functional annotation of SNPs
- Jendrzejewski J, He H, Radomska HS, Li W, Tomsic J, Liyanarachchi S, Davuluri RV, Nagy R, de la Chapelle A (2012) The polymorphism rs944289 predisposes to papillary thyroid carcinoma through a large intergenic noncoding RNA gene of tumor suppressor type. Proc Natl Acad Sci USA, 109: 8646-8651.