Systems Biology, Systems Pharmacology, Biomedical Big Data, Bioinformatics, Computational Biology, Data Mining, Software Engineering, Network Analysis
Research Team:
Program Director: Sherry Jenkins, MS
Research Assistant Professor: Alexander Lachmann, PhD
Data Scientist: Daniel Clarke, MS
Bioinformaticians: John Erol Evangelista, MS; Sherry Xie, BS
Bioinformatics Software Engineers: Nasheath Ahmed, BS, AB; Eden Deng, BS; Ido Diamant, BS; Giacomo Marino, ScB, AB; Stephanie Olaiya, ScB
Systems Analyst: Heesu Kim, MBA, MS
MD Student: Vivian Utti, BS
Master's Student: Lara Dogan, BA
2023 Undergrad Research Trainees: Alexandra Agris, Gaurvi Awasthi, Rhea Desai, Nooha Kawsar, Adam Lalani, Hannah Lee, Lauren Malek, Jacob Mayourian, Osaiyekemwen Ruth Ogbemudia, Shriya Rangaswamy, Abigail Zaroff, Mason Zhang
Summary of Research Studies:
Advances in high-throughput experimental molecular biology are allowing us to elucidate the molecular mechanisms of mammalian cell regulation in ever-increasing detail. However, the potential gains from these advances are often not fully realized, because high-throughput techniques produce more data than we can currently organize, model and visualize. A particular challenge arises when integrating several high-dimensional datasets generated by multiple high- and low-throughput experimental techniques applied to mammalian cells.
To organize, visualize, analyze and model data from such sources, we develop computational approaches that help experimental systems biologists form rational hypotheses for further experimentation. We analyze high-dimensional data collected for projects that integrate results from multiple layers of regulation (genomics, transcriptomics and proteomics). In addition to our research efforts, we develop software so that our methodologies can reach and impact the biomedical Big Data research community. Below are some of the software tools we have developed:
1) Enrichr is a gene set enrichment analysis tool that includes one of the largest collections of annotated gene sets: 298,481 gene sets organized into 172 gene set libraries. Enrichr visualizes enrichment results as bar graphs, tables, canvases and networks. Enrichment is computed using three different methods, and users can save and share their gene lists and results with a single click. Articles describing the initial and updated versions of the software were published in BMC Bioinformatics and Nucleic Acids Research. PMID: 23586463 and PMID: 27141961
2) GEO2Enrichr is a browser extension and web application for extracting differentially expressed gene sets from GEO and analyzing those sets with Enrichr and other tools. GEO2Enrichr adds JavaScript code to GEO web pages; this code scrapes user-selected accession numbers and metadata. With one click, users can then submit this information to a web-server application that downloads the SOFT files; parses, cleans and normalizes the data; identifies the differentially expressed genes; and pipes the resulting gene lists to several downstream analysis tools. An article describing the initial version of the software was published in Bioinformatics. PMID: 25971742
3) L1000CDS2 and Drug Pair Seeker (DPS) are two tools that use the Connectivity Map gene expression datasets, including the new version based on the L1000 technology, to predict single drugs and drug pairs that can either mimic or reverse an input gene expression signature of differentially expressed genes. Both tools use novel algorithms developed by the Ma’ayan Laboratory to prioritize drugs and small molecules. A detailed description of Drug Pair Seeker and its application to kidney disease can be found in a publication in the journal JASN. PMID: 23559582. An article describing L1000CDS2 was published in NPJ Systems Biology and Applications. PMID: 28413689
4) The ChIP-X Enrichment Analysis (ChEA) database contains manually extracted transcription-factor/target-gene interactions from over 100 experiments, such as ChIP-chip, ChIP-seq and ChIP-PET, applied to mammalian cells. We use the database as a prior-knowledge gene-set library for gene-list enrichment analysis of mRNA expression data. The system is delivered as web-based interactive software: users input lists of mammalian genes, and the program computes the over-representation of transcription factor targets from the ChEA database. An article describing the system has been published in the journal Bioinformatics. PMID: 20709693
5) Kinase Enrichment Analysis (KEA) is a web-based tool with an underlying database that lets users link lists of mammalian proteins/genes to the kinases that phosphorylate them. The system draws on several available kinase–substrate databases and computes kinase enrichment probabilities by comparing the kinase–substrate proportions in the background kinase–substrate database with those of the kinases associated with an input list of genes/proteins. An article describing the system has been published in the journal Bioinformatics. PMID: 19176546
6) Expression2Kinases (X2K) is a software tool that integrates and upgrades the functionality of ChEA, Genes2Networks, KEA and Lists2Networks into one platform and computational pipeline. Given a list of differentially expressed genes, the software identifies upstream transcription factors using the ChEA software and database; X2K then connects the top-ranked transcription factors with Genes2Networks, using databases of known protein-protein interactions; the resulting subnetwork is then passed to KEA for kinase enrichment analysis. X2K also includes all the enrichment analysis functions available within Lists2Networks. Articles describing the system have been published, including one in the journal Bioinformatics. PMID: 22080467 and PMID: 29800326
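The gene set enrichment described for Enrichr in item 1 can be illustrated with a one-sided Fisher exact test, one standard way to score the overlap between an input gene list and each set in a library. The sketch below is a minimal illustration, not Enrichr's implementation: the background size of 20,000 genes and the toy library names ("Set A", "Set B") are assumptions chosen for the example, and the other two scoring methods Enrichr offers are not reproduced here.

```python
from math import comb


def fisher_right_tail(overlap, list_size, set_size, background):
    """One-sided Fisher exact test: P(X >= overlap) under a hypergeometric null.

    X counts how many of `list_size` genes drawn without replacement from
    `background` genes fall into a gene set of size `set_size`.
    """
    upper = min(list_size, set_size)
    total = comb(background, list_size)
    hits = sum(
        comb(set_size, k) * comb(background - set_size, list_size - k)
        for k in range(overlap, upper + 1)
    )
    return hits / total


def enrich(input_genes, library, background=20000):
    """Rank every gene set in `library` by its Fisher exact test p-value.

    `background=20000` is an assumed genome-wide gene count, not Enrichr's setting.
    """
    query = set(input_genes)
    results = []
    for term, gene_set in library.items():
        overlap = len(query & gene_set)
        if overlap == 0:
            continue  # sets sharing no genes with the query are not reported
        p = fisher_right_tail(overlap, len(query), len(gene_set), background)
        results.append((term, overlap, p))
    return sorted(results, key=lambda r: r[2])  # smallest p-value first


# Hypothetical toy library; real libraries hold thousands of annotated sets.
library = {
    "Set A": {"TP53", "MDM2", "CDKN1A"},
    "Set B": {"MYC", "MAX"},
}
for term, overlap, p in enrich(["TP53", "MDM2", "MYC"], library):
    print(f"{term}\toverlap={overlap}\tp={p:.3g}")
```

Python's exact integer `comb` keeps the hypergeometric terms free of overflow; for very large libraries a log-space or library-based implementation would be faster.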
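The GEO2Enrichr step in item 2 that "identifies the differentially expressed genes" can be sketched as ranking genes by a two-group statistic and splitting them into up- and down-regulated lists. This is a minimal stand-in using Welch's t statistic, which may differ from the method GEO2Enrichr actually applies; SOFT-file downloading and parsing are not shown, and the input layout (a dict of gene to normalized per-sample values, with sample indices for each group) and the `top_n` cutoff are assumptions.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(control, treated):
    """Welch's t statistic for treated vs. control (positive = higher in treated)."""
    se = sqrt(variance(control) / len(control) + variance(treated) / len(treated))
    return (mean(treated) - mean(control)) / se if se > 0 else 0.0


def up_down_sets(expr, control_cols, treated_cols, top_n=100):
    """Split genes into up- and down-regulated lists ranked by Welch's t.

    `expr` maps gene -> list of normalized (e.g. log2) values, one per sample;
    `control_cols` and `treated_cols` are sample indices (at least two each).
    """
    scores = {}
    for gene, values in expr.items():
        control = [values[i] for i in control_cols]
        treated = [values[i] for i in treated_cols]
        scores[gene] = welch_t(control, treated)
    ranked = sorted(scores, key=scores.get, reverse=True)
    up = [g for g in ranked[:top_n] if scores[g] > 0]
    # Most strongly down-regulated genes first.
    down = [g for g in reversed(ranked[-top_n:]) if scores[g] < 0]
    return up, down


# Hypothetical normalized matrix: samples 0-1 are controls, 2-3 are treated.
expr = {
    "G1": [1.0, 1.1, 3.0, 3.2],
    "G2": [2.0, 2.1, 2.0, 2.1],
    "G3": [3.0, 3.2, 1.0, 1.1],
    "G4": [1.0, 1.2, 1.5, 1.7],
}
up, down = up_down_sets(expr, [0, 1], [2, 3], top_n=2)
print("up:", up, "down:", down)
```

The resulting up and down lists are the kind of gene sets that could then be submitted to an enrichment tool such as Enrichr.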
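The idea in item 3 of prioritizing drugs that mimic or reverse a query signature can be illustrated by comparing signed signatures with cosine similarity: strongly positive scores suggest mimickers, strongly negative scores suggest reversers. This sketch is not the L1000CDS2 or Drug Pair Seeker algorithm. Encoding each signature as +1/-1/0 over genes, and scoring a drug pair by summing its two members' vectors, are illustrative assumptions, and the drug names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def to_vector(up, down, genes):
    """Encode a signature as +1 (up), -1 (down) or 0 over an ordered gene list."""
    return [1.0 if g in up else -1.0 if g in down else 0.0 for g in genes]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def signature_similarity(query_up, query_down, signatures):
    """Cosine similarity between the query and the element-wise sum of `signatures`.

    Each signature is an (up_genes, down_genes) pair; summing lets a drug pair
    be scored as one combined profile (an illustrative assumption).
    """
    genes = set(query_up) | set(query_down)
    for up, down in signatures:
        genes |= set(up) | set(down)
    genes = sorted(genes)
    combined = [0.0] * len(genes)
    for up, down in signatures:
        combined = [c + x for c, x in zip(combined, to_vector(up, down, genes))]
    return cosine(to_vector(query_up, query_down, genes), combined)


def rank_drugs(query_up, query_down, drug_sigs, mode="reverse", pairs=False):
    """Rank single drugs (or drug pairs) against a query signature.

    mode="reverse": most negative similarity first (candidate reversers);
    mode="mimic": most positive similarity first (candidate mimickers).
    """
    names = sorted(drug_sigs)
    candidates = combinations(names, 2) if pairs else ((n,) for n in names)
    scored = [
        (c, signature_similarity(query_up, query_down, [drug_sigs[n] for n in c]))
        for c in candidates
    ]
    return sorted(scored, key=lambda r: r[1], reverse=(mode == "mimic"))


# Hypothetical drug signatures as (up_genes, down_genes).
drug_sigs = {
    "d1": ({"C"}, {"A", "B"}),
    "d2": ({"A", "B"}, {"C"}),
    "d3": ({"D"}, set()),
}
print(rank_drugs({"A", "B"}, {"C"}, drug_sigs))
print(rank_drugs({"A", "B"}, {"C"}, drug_sigs, pairs=True))
```

Scoring pairs by summing vectors is only one simple way to model a combination; it ignores drug-drug interactions, which a dedicated pair-prediction method would need to address.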
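Item 4 describes using ChEA as a prior-knowledge gene-set library. The sketch below shows the data-structure side of that idea: grouping transcription-factor/target records into one target set per factor and experiment, then reporting observed versus expected overlap for an input gene list. The record layout, the `TF_experiment` key format and the background size are hypothetical, and the simple observed/expected ratio stands in for whatever statistic ChEA reports.

```python
from collections import defaultdict


def build_library(records):
    """Group (tf, target, experiment) records into one target set per TF/experiment."""
    library = defaultdict(set)
    for tf, target, experiment in records:
        library[f"{tf}_{experiment}"].add(target.upper())  # normalize gene symbols
    return dict(library)


def tf_overrepresentation(genes, library, background):
    """Observed vs. expected overlap of the input genes with each TF's targets."""
    query = {g.upper() for g in genes}
    rows = []
    for term, targets in library.items():
        observed = len(query & targets)
        expected = len(query) * len(targets) / background
        ratio = observed / expected if expected else 0.0
        rows.append((term, observed, expected, ratio))
    return sorted(rows, key=lambda r: r[3], reverse=True)  # highest ratio first


# Hypothetical extracted interactions: (transcription factor, target, experiment).
records = [
    ("SOX2", "NANOG", "exp1"),
    ("SOX2", "POU5F1", "exp1"),
    ("MYC", "CCND1", "exp2"),
    ("MYC", "NPM1", "exp2"),
    ("MYC", "NCL", "exp2"),
]
library = build_library(records)
for row in tf_overrepresentation(["nanog", "pou5f1", "ncl"], library, background=100):
    print(row)
```

Keeping one set per TF and experiment, rather than merging all experiments for a factor, preserves which study each interaction came from.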
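Item 5 describes KEA merging several kinase–substrate databases and comparing kinase–substrate proportions in the input against the background. A minimal sketch of that comparison follows. Scoring each kinase with a one-sided binomial tail, using the kinase's share of all known substrates as the background probability, is an assumed approximation rather than KEA's exact statistic, and the source databases are hypothetical toy lists.

```python
from collections import defaultdict
from math import comb


def merge_databases(*sources):
    """Union several (kinase, substrate) edge lists into kinase -> substrate set."""
    merged = defaultdict(set)
    for source in sources:
        for kinase, substrate in source:
            merged[kinase].add(substrate)  # duplicates across sources collapse
    return dict(merged)


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def kinase_enrichment(input_genes, kinase_db):
    """Compare each kinase's share of input substrates with its background share."""
    all_substrates = set().union(*kinase_db.values())
    query = set(input_genes) & all_substrates  # only genes the database knows about
    n = len(query)
    rows = []
    for kinase, substrates in kinase_db.items():
        k = len(query & substrates)
        if k == 0:
            continue
        p_background = len(substrates) / len(all_substrates)
        rows.append((kinase, k, n, p_background, binomial_upper_tail(k, n, p_background)))
    return sorted(rows, key=lambda r: r[4])  # most surprising overlap first


# Hypothetical source databases.
source1 = [("CDK1", "RB1"), ("CDK1", "LMNA")]
source2 = [("CDK1", "RB1"), ("AKT1", "GSK3B"), ("AKT1", "FOXO3"), ("AKT1", "MDM2")]
kinase_db = merge_databases(source1, source2)
print(kinase_enrichment(["RB1", "LMNA", "TP53"], kinase_db))
```

Restricting the query to genes present in the merged database avoids penalizing kinases for input genes that no source annotates.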
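The X2K pipeline in item 6 chains three steps: transcription factors inferred from the input genes, a protein-protein interaction subnetwork connecting them, and kinase enrichment on that subnetwork. The sketch below shows only how the stages compose. Each stage is a deliberately simple stand-in: TFs are ranked by raw target overlap, the network step keeps seeds plus intermediates that touch at least two seeds (an assumed rule, not necessarily Genes2Networks' algorithm), and kinases are ranked by substrate overlap. All gene and kinase data are hypothetical.

```python
from collections import defaultdict


def top_tfs(de_genes, tf_library, n=2):
    """Stage 1 (ChEA-like): rank TFs by overlap between their targets and the input."""
    query = set(de_genes)
    ranked = sorted(tf_library, key=lambda tf: len(query & tf_library[tf]), reverse=True)
    return ranked[:n]


def expand_network(seeds, ppi_edges):
    """Stage 2 (Genes2Networks-like): keep seeds plus intermediates linking >= 2 seeds."""
    neighbors = defaultdict(set)
    for a, b in ppi_edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seeds = set(seeds)
    nodes = set(seeds)
    for node, adjacent in neighbors.items():
        if node not in seeds and len(adjacent & seeds) >= 2:
            nodes.add(node)
    edges = {(a, b) for a, b in ppi_edges if a in nodes and b in nodes}
    return nodes, edges


def rank_kinases(proteins, kinase_db):
    """Stage 3 (KEA-like): rank kinases by how many subnetwork proteins they target."""
    scores = ((kinase, len(proteins & subs)) for kinase, subs in kinase_db.items())
    return sorted(scores, key=lambda r: r[1], reverse=True)


def x2k(de_genes, tf_library, ppi_edges, kinase_db, n_tfs=2):
    """Chain the three stages: DE genes -> TFs -> subnetwork -> kinases."""
    tfs = top_tfs(de_genes, tf_library, n_tfs)
    nodes, edges = expand_network(tfs, ppi_edges)
    return tfs, nodes, edges, rank_kinases(nodes, kinase_db)


tf_library = {"SOX2": {"NANOG", "POU5F1", "LIN28A"}, "MYC": {"NCL", "NPM1"}, "TP53": {"CDKN1A"}}
ppi_edges = [("SOX2", "EP300"), ("MYC", "EP300"), ("MYC", "MAX"), ("SOX2", "MYC")]
kinase_db = {"CSNK2A1": {"MYC", "MAX"}, "CDK1": {"EP300", "SOX2"}, "AKT1": {"FOXO3"}}
tfs, nodes, edges, kinases = x2k(["NANOG", "POU5F1", "NCL", "NPM1"], tf_library, ppi_edges, kinase_db)
print(tfs, sorted(nodes), kinases)
```

Keeping each stage as a separate function mirrors the way X2K reuses ChEA, Genes2Networks and KEA as components: any stage could be swapped for a statistically rigorous version without changing the pipeline.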
We apply these and other computational methods to analyze data from a variety of projects with our collaborators. Our analyses produce concrete suggestions and predictions for further functional experiments, which our collaborators then test, and our analysis methods are delivered as software tools and databases for the systems biology research community.
For more information, please visit the Ma'ayan Laboratory website.