The Georgetown Database of Cancer (G-DOC)

The Georgetown Database of Cancer (G-DOC) is a cutting-edge data integration platform that allows researchers to access and analyze clinical and research data across multiple trials and studies. It contains molecular and clinical data from thousands of patients and cell lines, along with tools for analysis and data visualization.

The G-DOC framework can be used to import data from multiple studies, access biomedical research data, perform analyses, and generate ad hoc queries and customized reports.

The purpose of G-DOC is to facilitate translational research and precision medicine by making it easy to identify trends and patterns in these integrated datasets, ultimately leading to an improved understanding of disease mechanisms and to better-targeted, personalized therapies for cancer. A paper describing G-DOC was published in Neoplasia in 2011.

A quick summary of G-DOC, its strengths, achievements, and applications is shown in the infographic below.

G-DOC summary

In 2014 we expanded G-DOC into other clinical areas of interest. The new version, G-DOC Plus, includes multi-omics and clinical data from both cancer and non-cancer studies through an enhanced web platform that uses cloud computing and advanced computational tools for the analysis and visualization of clinical, microarray, and next-generation sequencing (NGS) data and medical images. The new data collection in G-DOC Plus includes whole genome sequencing (WGS) data from the 1000 Genomes Project and Complete Genomics; multi-omics data from the NCI-60 data collection and numerous breast, GI, and pediatric cancer studies; and non-cancer studies including Duchenne Muscular Dystrophy (DMD), Alzheimer’s disease, and infectious diseases.

G-DOC Plus has three overlapping entry points for users, based on their interests: 1) Personalized Medicine, which allows users to explore germline and somatic variants in WGS data, as well as medical MRI images; 2) Translational Research, which offers various tools to compare groups of patients; and 3) Population Genetics, which allows users to explore the 1000 Genomes dataset. A paper describing this new version, along with several use cases, was published in BMC Bioinformatics in 2016.

In March 2015, data from NCI REMBRANDT (the largest public brain cancer portal) and two studies from CaArray were migrated to G-DOC Plus, with new studies being added on a regular basis. G-DOC Plus is publicly available to researchers to explore existing data, from both individual patient samples and a population as a whole, providing the user with a comprehensive view of the data.

Researchers can upload data from their own private studies into G-DOC Plus for comparison with existing datasets, in collaboration with ICBI staff, who will process the data into the formats needed for upload to the database. These data will be kept private in the G-DOC Plus system for use by the data owners and their collaborators.

Most recently, G-DOC has been used as a tool for teaching students the concepts of biomedical big data in medicine. "Demystifying Biomedical Big Data: A User's Guide", a free massive open online course (MOOC) directed by Georgetown faculty Drs. Bassem Haddad, Yuriy Gusev, and Peter McGarvey, will launch on the online educational platform edX on February 14, 2017. Watch the trailer for this course here. To register, please visit:

Tutorials and webinar recordings on how to use G-DOC and its various tools are available here:

If you are interested in collaborating with us, please contact us at:, or

G-DOC can be accessed at: