Technology and Health IT

“Creating Software and Tools to Enable Knowledge Discovery and Clinical Decision Making.”

The research information technology group at ICBI develops innovative scientific software to enable translational research. Our projects include multi-omics data analysis, vaccine safety research, clinical data analysis, high-definition data visualization, natural language processing, and mobile application development.

The Health Information Technology/Software and Engineering team functions at the intersection of information science, computer science, and health care. The team manages and supports computing and communications technology applied to healthcare, health education, and research, as well as the use and sharing of information within the institutions in support of our investigators. The team develops custom software applications, tools, and databases, and manages the resources, devices, and methods required for optimizing the acquisition, storage, retrieval, analysis, and use of information.

The team also manages dedicated servers and virtual machines, both on Georgetown premises and in the cloud. These systems are configured with strict security measures and restricted backup devices, and are capable of collecting, storing, analyzing, and distributing big data while facilitating collaboration with laboratories across the world. Specialized software, development packages, and expertise are available for decision support, molecular modeling, processing, advanced statistical data analysis, knowledge management, mathematics, data mining, and big data processing. The computing capabilities that ICBI manages include the latest technologies and secure environments for data collection in support of research, education, and clinical care. The center is also leading the advanced data visualization infrastructure within Georgetown University Medical Center (GUMC), a technology that allows researchers to visualize large data sets in high fidelity, displaying over 16 million pixels of data in a single view.

Current Projects

G-DOC

The Georgetown Database of Cancer Plus other diseases (G-DOC Plus) is a precision medicine platform containing molecular and clinical data from thousands of patients and cell lines, along with tools for analysis and data visualization. The platform enables the integrative analysis of multiple data types to understand disease mechanisms. G-DOC Plus has three overlapping entry points for the user based on their interests: 1) Personalized Medicine, 2) Translational Research, and 3) Population Genetics.

View G-DOC Tutorials and webinar recordings.

G-VISR

An application to analyze and curate information about the molecular basis of vaccine safety.

MedTurk

MedTurk helps researchers automate the process of extracting information from clinical notes and data from electronic health records.

Biospecimen Portal

This portal integrates data about biospecimens from the Lombardi Shared Resources, allowing investigators to identify patients and samples for research projects.

Data Visualization Lab

The ICBI visualization system was created through an NVIDIA Corporation hardware donation grant and hardware contributed by the University Information Systems department. It provides high-definition visualization hardware for big data visual analytics. The visualization lab consists of two ultra-high-definition 55-inch display panels driven by an NVIDIA Quadro K5000 high-end video card. The system is capable of displaying over 16 million pixels (about 16 megapixels).
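As a rough check on the pixel count, the short calculation below assumes that each ultra-high-definition panel uses the common 3840 x 2160 UHD resolution. The panel resolution is not stated on this page, so that figure is an assumption rather than a specification of the lab's hardware.

```python
# Assumption: each "ultra high definition" panel is 3840 x 2160 pixels (UHD).
# The page does not state the resolution; these constants are illustrative.
PANEL_WIDTH_PX = 3840
PANEL_HEIGHT_PX = 2160
PANEL_COUNT = 2


def total_pixels(width: int, height: int, panels: int) -> int:
    """Return the combined pixel count across identical panels."""
    return width * height * panels


pixels = total_pixels(PANEL_WIDTH_PX, PANEL_HEIGHT_PX, PANEL_COUNT)
print(f"{pixels:,} pixels (about {pixels / 1e6:.1f} megapixels)")
```

Under that assumption the two panels together display a little over 16 million pixels, consistent with the figure quoted for the lab.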

GWCAM Mobile Application

A mobile application to collect quality-of-life information from Gulf War veterans.

Collaborations

Amazon Web Services

ICBI has received a grant from Amazon.com (three cycles) to conduct research in high-performance computing using AWS.

ICBI developed comprehensive processes, along with tracking and monitoring systems, to manage requests and to measure key performance indicators.

GHUCCTS

The Georgetown-Howard Universities Center for Clinical and Translational Science (GHUCCTS) is a collaborative research center that includes two major universities and three affiliated hospital and research systems. GHUCCTS institutions include the Georgetown University Medical Center, Howard University, MedStar Health Research Institute, the Washington DC Veterans Affairs Medical Center (VAMC), and Oak Ridge National Laboratory.

Patient Data Access

To facilitate clinical trial recruitment and identify patients for research studies across GHUCCTS institutions, we established a secure exchange architecture that enables clinical and translational research information to be shared both within and between institutions. We also developed cohort discovery methodologies for querying electronic health records across our participating institutions.

For this purpose, we implemented a combination of tools that were configured according to GHUCCTS data governance policies. These tools include Explorys, a grid-based population discovery application, and the Informatics for Integrating Biology and the Bedside (i2b2) software, an open-source analytical tool. The combination of these systems will ensure that researchers at each of the GHUCCTS institutions can easily identify potential clinical trial participants across institutions, while protecting the confidentiality and security of patients' health information.

The established tools and capabilities will also support the analysis and visualization of research and clinical data and give investigators easy access to available populations across multiple institutions. The implementation entailed designing and developing software applications to consistently map data from multiple sources into a common repository, and creating a process and tools for secure information sharing. In our setting, it is challenging to implement an integrated research data repository that includes all participating institutions’ data. Our final approach was to establish a searchable GHUCCTS research data repository, complemented by a process for submitting queries to institutions and data sources that cannot be readily integrated into the GHUCCTS i2b2 repository.
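The idea of mapping data from multiple sources into a common repository and then running a cohort discovery query can be illustrated with a minimal sketch. Everything in it is hypothetical: the field names, the two source layouts, the `CommonPatient` record, and the sample rows are invented for illustration and do not reflect the i2b2 or Explorys data models or the GHUCCTS implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommonPatient:
    """Hypothetical common record that every source is mapped into."""
    site: str
    patient_key: str            # de-identified key (illustrative)
    birth_year: int
    sex: str                    # "F", "M", or "U" (unknown)
    diagnosis_codes: frozenset


def from_site_a(row: dict) -> CommonPatient:
    # Invented layout A: ISO date of birth, free-text gender, pipe-separated codes.
    return CommonPatient(
        site="site_a",
        patient_key=row["id"],
        birth_year=int(row["dob"][:4]),
        sex=row["gender"].upper()[:1],
        diagnosis_codes=frozenset(row["dx"].split("|")),
    )


def from_site_b(row: dict) -> CommonPatient:
    # Invented layout B: numeric sex code, birth year, list of codes.
    sex_map = {1: "M", 2: "F"}
    return CommonPatient(
        site="site_b",
        patient_key=row["pid"],
        birth_year=row["birth_year"],
        sex=sex_map.get(row["sex_code"], "U"),
        diagnosis_codes=frozenset(row["icd_codes"]),
    )


def cohort_counts(patients, code: str, min_age: int, as_of_year: int) -> dict:
    """Return per-site counts only, so no patient-level data leaves the query."""
    counts = {}
    for p in patients:
        if code in p.diagnosis_codes and as_of_year - p.birth_year >= min_age:
            counts[p.site] = counts.get(p.site, 0) + 1
    return counts


# Made-up sample rows, not real patient data.
site_a_rows = [
    {"id": "A-001", "dob": "1950-03-14", "gender": "female", "dx": "C50.9|I10"},
    {"id": "A-002", "dob": "1985-07-02", "gender": "male", "dx": "E11.9"},
]
site_b_rows = [
    {"pid": "B-17", "birth_year": 1962, "sex_code": 2, "icd_codes": ["C50.9"]},
    {"pid": "B-18", "birth_year": 2001, "sex_code": 1, "icd_codes": ["C50.9", "J45.909"]},
]

repository = [from_site_a(r) for r in site_a_rows] + [from_site_b(r) for r in site_b_rows]
counts = cohort_counts(repository, code="C50.9", min_age=40, as_of_year=2024)
print(counts)
```

Returning only aggregate counts per site is one simple way to support feasibility queries while keeping patient-level records inside the repository. A source that cannot be mapped into the common record would be handled through the separate query submission process described above.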

WebCAMP
SharePoint
i2b2, Explorys
EHR systems
Lab systems
REDCap

Technology

The development of research infrastructure is the foundation of the Biomedical Informatics program at Georgetown and underlies the mission of integrating and making sense of enormous volumes of data being generated in both the lab and clinic. ICBI scientists and software engineers are working together to develop technologies that enable the integration of biomedical data with state-of-the-art tools through a multi-disciplinary and collaborative approach that drives translational research and clinical care.

Our primary areas of technology development involve:

Big Data

Practically every major sector of the economy and scientific enterprise is kindling (or rekindling) the idea of Big Data as key to solving important problems in and across disciplines. The Wall Street Journal has highlighted Big Data as one of three game-changer or ‘black swan’ technologies that will transform the future. It is fair to say that there is a big data rush underway; nowhere is the promise and potential more real than in the rapid rise of inexpensive whole-genome sequencing technologies using next-generation sequencing (NGS) instruments.

The world’s current sequencing capacity is estimated to be 13 quadrillion DNA bases a year. The cost of producing an accurate human whole-genome sequence is dropping rapidly and is expected to fall below $100 per genome in the next decade; the capacity to sequence the genomes of a billion people is expected to be realized within the next twenty years. These will be truly Big Data, requiring 3 or more exabytes of storage. A number of public and private projects are already contributing to this biological data deluge. The NIH-funded 1000 Genomes Project deposited 200 terabytes of raw sequencing data into the GenBank archive during the project’s first 6 months of operation, twice as much as had been deposited into all of GenBank in the preceding 30 years.

Our team is working on a new generation of data management and mining systems to support diverse needs, from personalized clinical medicine to translational and population research involving molecular and genetic outcomes. This project spans the disciplines of computer systems, analytics, and genomics. Our systems research, in collaboration with the departments of computer science at Virginia Tech and Georgetown University, will help harness petabytes of genomic information in a manner cognizant of the multimodal, multilevel nature of the datasets, and our project will extend the query capabilities of existing NoSQL models.

Our analytics research extends the MapReduce workflow paradigm to support more complex workflows on genomic and other biological data, while being attentive to the need to save some intermediary results and discard others. More broadly, this project aims to be the first implementation of petabyte-scale compositional data mining to extract actionable, ranked knowledge from large-scale genome studies.
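To make the idea of a multi-stage MapReduce workflow that saves some intermediate results and discards others more concrete, here is a minimal in-memory sketch. It is not the project's system: the `map_reduce` and `run_workflow` helpers, the stage names, and the toy variant calls are all hypothetical, and a real deployment would run on a distributed framework rather than in a single process.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """One MapReduce pass: map each record to (key, value) pairs,
    group the values by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


def run_workflow(records, stages):
    """Chain stages given as (name, mapper, reducer, keep) tuples.
    Intermediate outputs are retained only when keep is True."""
    kept = {}
    current = list(records)
    result = {}
    for name, mapper, reducer, keep in stages:
        result = map_reduce(current, mapper, reducer)
        if keep:
            kept[name] = result
        current = list(result.items())  # output of one stage feeds the next
    return result, kept


# Toy variant calls as (sample, gene, variant); all values are made up.
calls = [
    ("s1", "GENE_A", "v1"),
    ("s1", "GENE_A", "v1"),  # duplicate call, removed by the first stage
    ("s2", "GENE_A", "v1"),
    ("s2", "GENE_B", "v2"),
    ("s3", "GENE_B", "v2"),
    ("s3", "GENE_B", "v3"),
]

stages = [
    # Stage 1: de-duplicate calls; large and cheap to recompute, so discarded.
    ("dedup", lambda call: [(call, 1)], lambda key, values: 1, False),
    # Stage 2: count samples carrying each (gene, variant); saved for reuse.
    ("variant_counts",
     lambda rec: [((rec[0][1], rec[0][2]), 1)],
     lambda key, values: sum(values),
     True),
    # Stage 3: total carrier count per gene; this is the final output.
    ("gene_totals",
     lambda rec: [(rec[0][0], rec[1])],
     lambda key, values: sum(values),
     False),
]

gene_totals, kept = run_workflow(calls, stages)
print(gene_totals)
print(kept)
```

The `keep` flag is the part that goes beyond a plain MapReduce chain: each stage declares whether its output is worth persisting, which matters when intermediate datasets are themselves very large.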

Clinical Omics Data Integration

The development of research infrastructure is the foundation of our program and underlies the mission of integrating and making sense of enormous volumes of data. Our long-term goal is to develop methodologies that help provide clinical decision support through the integration of available “omics” and patient data. Toward this goal we have developed G-CODE, the Georgetown Clinical & Omics Development Engine, to help empower the next generation of translational research. The power of the G-CODE concept lies in integrating multi-omics data with clinical outcome data within a powerful but easy-to-use environment that is accessible to clinicians trying to decide on the best treatment options, as well as to researchers looking for trends among large datasets.

We are interested not only in the development of research platforms, but also in the analysis of large datasets for novel information. We use a variety of open-source tools and infrastructure in addition to our own tools and algorithms. Our research team is asking key biological and medical research questions that can be addressed through data mining, analysis, and the integration of a wide array of disparate datasets, primarily obtained through public studies, although we collaborate on private studies as well.

The Georgetown Clinical and Omics Development Engine (G-CODE)

G-CODE has been developed to help empower the next generation of translational research for a wide array of disease areas by making powerful bioinformatics tools and integrated experimental and clinical data easily accessible to both physician-scientists and laboratory researchers within a unified and quickly deployable environment. This tool is freely available for use with public or private studies and can be tailored to specific use cases. To discuss how G-CODE can enable and accelerate your translational, basic, or clinical research, please contact us at icbi@georgetown.edu.

Clinical Research Management Systems

ICBI provides support for clinical research at the Georgetown University Medical Center through the Clinical Research Management Office, which supports clinical trials within the Lombardi Comprehensive Cancer Center.

Cloud Computing

Whole genome sequencing has brought about a whole new set of research challenges in the biomedical field. The vast amounts of data produced from sequencing the human genome necessitate a new computational strategy as well. It is no longer feasible for researchers to build out datacenters equipped with adequate computational power and storage capacity for whole genome sequencing. Cloud computing lowers the barrier to entry for many institutions and provides access to the required resources at a fraction of what it would cost to build out a datacenter with the required capacity.

Currently, the leading player in the cloud computing arena is Amazon Web Services, which offers a multitude of resources that help to store and analyze the data produced by whole genome sequencing. Amazon S3 is used to store the data in a secure, encrypted, redundant environment. EC2 provides a computational environment that is flexible, scalable, and stable. Users are able to create virtual machines of various sizes, with up to 60 GB of RAM and 88 EC2 Compute Units, and are also able to spin up multiple instances so that workflows can be parallelized. Elastic MapReduce provides a framework for parallelizing jobs, so that tasks that may previously have taken days can now be performed in a matter of hours. Together, these services provide research institutions with the capacity needed to store and analyze the onslaught of next-generation sequencing data.

ICBI uses Amazon Web Services to store and analyze hundreds of whole genome sequences in a secure and scalable environment. Data analysis pipelines leverage the elastic nature of the cloud and allow the center to scale to thousands of whole genome sequences.
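The parallelization pattern described here, in which a large job is split into independent pieces, each piece is processed separately, and the partial results are combined, can be sketched on a single machine. The example below is only an illustration of that scatter and gather shape: it uses a local thread pool as a stand-in for multiple cloud instances, the function names are hypothetical, and GC content is chosen merely as a simple per-chunk task. It does not call any AWS service.

```python
from concurrent.futures import ThreadPoolExecutor


def gc_count(chunk: str):
    """Per-chunk task: return (number of G or C bases, total bases)."""
    seq = chunk.upper()
    return seq.count("G") + seq.count("C"), len(seq)


def split(sequence: str, size: int):
    """Scatter step: cut the sequence into fixed-size, independent chunks."""
    return [sequence[i:i + size] for i in range(0, len(sequence), size)]


def parallel_gc_fraction(sequence: str, chunk_size: int = 4, workers: int = 4) -> float:
    """Process chunks concurrently, then gather the partial counts."""
    chunks = split(sequence, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(gc_count, chunks))
    gc_bases = sum(gc for gc, _ in partials)
    total_bases = sum(total for _, total in partials)
    return gc_bases / total_bases if total_bases else 0.0


toy_sequence = "ACGTGGCCATAT"  # made-up sequence for illustration
print(parallel_gc_fraction(toy_sequence))
```

Because each chunk is processed independently and only small partial counts are combined at the end, the same structure carries over when the workers are separate virtual machines rather than threads, which is what allows a pipeline to scale by adding instances.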

ICancerLab

Our data portals and tools can be found here: https://apps.icbi.georgetown.edu/