We welcome you to collaborate with us in testing and developing a new generation of data management, mining, and analysis tools that address the “Big Data” challenge facing genomic scientists today.

Laboratories across the world continually generate enormous volumes of molecular and clinical data. There is a tremendous need for new databases that can store large volumes of disparate data types, and for tools that help scientists make sense of these data through novel, integrative analysis in a single, unified portal. Biomedical informatics is helping to address these challenges, with the goal of alleviating human suffering.

The applications listed below are currently available for working with these data. Please contact us for more information.

  • G-DOC Plus: The next-generation bioinformatics platform for systems medicine
  • SNP2Structure: A public resource for mapping and modeling nsSNPs on human protein structures
  • medTurk: A tool that helps researchers automate the process of extracting information from clinical notes and data from electronic health records

Georgetown Database of Cancer (G-DOC)

The Georgetown Database of Cancer (G-DOC) is a cutting-edge data integration platform that allows researchers to access and analyze clinical and research data across multiple trials and studies. The framework can be used to import data from multiple studies, access biomedical research data, perform analyses, and generate ad hoc queries and customized reports. The purpose of G-DOC is to facilitate systems medicine by making it easy to identify trends and patterns in these integrated datasets, which can ultimately lead to better-targeted and personalized therapies for cancer and other diseases. A paper describing G-DOC was published in Neoplasia in 2011.

In 2014 we expanded G-DOC into other clinical areas of interest. G-DOC Plus now includes multi-omics and clinical data from both cancer and non-cancer studies through an enhanced web platform that uses cloud computing and advanced computational tools for the analysis and visualization of clinical, microarray, and next generation sequencing (NGS) data and medical images. The new data collection in G-DOC Plus includes whole genome sequencing (WGS) data from the 1000 Genomes Project and Complete Genomics; multi-omics data from the NCI-60 data collection and numerous breast, GI, and pediatric cancer studies; and non-cancer studies, including Duchenne muscular dystrophy (DMD) and Alzheimer’s disease. In March 2015, data from the REMBRANDT portal and two studies from caArray were migrated to G-DOC Plus, and new studies are being added on a regular basis.

G-DOC Plus is publicly available to researchers, who can explore existing data from either individual patient samples or a population as a whole, giving them a comprehensive view of the data. Researchers can also upload data from their own private studies into G-DOC Plus for comparison with existing datasets. This is done in collaboration with ICBI staff, who process the data into the formats needed for upload into the database. These data are kept private in the G-DOC Plus system for use by the data owners and their collaborators.



SNP2Structure

One of the long-standing challenges in biology is to understand how non-synonymous single nucleotide polymorphisms (nsSNPs) change protein structure and, in turn, affect protein function. While it is impractical to solve all mutated protein structures experimentally, it is quite feasible to model them in silico. Toward this goal, we are building SNP2Structure, a publicly available structure database that supports our research on single amino acid changes. Compared with existing web portals that have a similar aim, ours has three major advantages. First, we corrected the sequence mapping discrepancies present in other portals. Although the percentage of erroneously mapped structures is small, it is critical to correct such errors. Second, our portal allows two structures to be compared side by side. Third, the mutated structures can be downloaded for further structural analysis offline. We believe SNP2Structure will be a valuable public resource for the research community in understanding the functional impact of nsSNPs.


medTurk

In clinical research, clinical notes contain insightful data. The problem is that these data are not readily available for analytics. While natural language processing algorithms exist to extract such data, they do not necessarily perform well in areas where difficult decision-making is required. Because of this, many researchers resort to manual review. medTurk is a tool for cases where manual review is required. It helps standardize how reviewing is performed and coordinates the activity of multiple curators working in parallel.