Inaugural Biomedical Informatics Symposium at Georgetown

2012-10-12

With relief felt by the organizers and much gratitude toward the participants, the first Georgetown Biomedical Informatics symposium, held on October 12, 2012, was filled with inspiring talks and provided opportunities for rich networking. The goal of the symposium was both to showcase cutting-edge research and applications in the field of biomedical informatics and to inform the Georgetown University Medical Center (GUMC) community of the related educational opportunities and in-house informatics resources available to enhance both basic research and clinical trials. This free, one-day symposium featured a variety of talks by leaders in the fields of clinical and translational sciences, including keynote talks by John Quackenbush of Harvard and John Niederhuber of the Inova Translational Medicine Institute (and former director of the National Cancer Institute). There were 130 attendees, exceeding our goal of 100 registrants.

The new GUMC Innovation Center for Biomedical Informatics (ICBI) hosted the event along with the Georgetown-Howard Universities Center for Clinical and Translational Science, Georgetown Center for Cancer Systems Biology, and Georgetown Lombardi Comprehensive Cancer Center. We felt honored by the support provided to us at GUMC, with introductory talks by Dr. Lou Weiner and Dr. Robert Clarke, which set the stage for the day. Dr. Weiner, Lombardi Cancer Center Director, kicked off the symposium with imagery from the 1951 movie “When Worlds Collide” to illustrate the challenge in biomedical informatics of integrating clinical and quantitative molecular data. Among those challenges, he said, is that ‘we obtain clinical data idiosyncratically, and share it idiosyncratically so information in place A can’t get to place B’. With the plethora of data analysis platforms and storage databases, this will be a huge challenge for the community going forward. And although it may be hard to succinctly identify what the future of biomedical informatics will look like, it may have been aptly characterized by Dr. Clarke, who said, “It’s a bit like pornography, you’ll know it when you see it!”

ICBI Director, Dr. Subha Madhavan, presented a short overview of the new bioinformatics center with highlights of four main projects. ICBI primarily performs peer-reviewed research in the translational sciences, although support services to the Georgetown community comprise an important part of the new center’s mission. The Georgetown Database of Cancer (G-DOC), one of the research projects she highlighted, is a biomedical informatics platform that integrates patient molecular data with clinical data and provides several analytical tools for processing the results to obtain new insights. Ultimately, G-DOC will serve not only researchers but also clinicians involved in molecular diagnostics, who can use it to make personalized therapeutic decisions based on a patient’s genomic profile.

ICBI is also engaged in a pharmacogenomics project supported by the FDA, said Madhavan, to determine how different genotypes affect drug efficacy and metabolism, in hopes of improving labeling and therapeutic decisions. Why put a patient through the potentially toxic side effects of a drug if, based on their genotype, the drug may not work at all or may not work effectively at the given dosage?

Taming and making use of “big data” is another major theme for ICBI and many biomedical research centers, considering that the “NIH-funded 1000 Genomes Project deposited 200 terabytes of data into the GenBank archive during the project’s first 6 months of operation, twice as much as had been deposited into all of GenBank for the entire 30 years preceding,” said Madhavan. The “Million Genomes Project” is not too far away, she continued, and as hard as it is to comprehend that amount of data, we are in desperate need of figuring out how to store the data and deconvolute it into useful information that clinicians can use to make decisions.

Keynote Addresses

John Quackenbush, the first keynote speaker, said that “biomedical research is changing from a laboratory science to an information science.” Considering that the cost of sequencing has dropped by “33% every four months since 2007,” he said that he anticipates that when the cost becomes low enough, “I will give my wife her genome for her birthday.” A gift sure to melt any woman’s heart. Related to romance, he said that since he met his wife on Match.com, his 6-year-old son will likely meet his wife on “Genematch.com,” given how cheap sequencing will soon become. “I guarantee his genome will be sequenced by the time he is 15, but I don’t want it on Facebook,” he said, “I want it to be available to him for his private use.”

Privacy is another critical issue facing the effective use of biomedical informatics, as genomic data will likely be considered HIPAA protected. Considering that with “only 250 base pairs of DNA a person is identifiable,” Quackenbush thinks the government and patients will insist on ensuring privacy despite a patient’s desire to have their genome be widely analyzed for information that may improve their health outcome. Many companies selling software for genomic storage and clinical use are proactively becoming HIPAA compliant to be ready for the anticipated widespread availability and growing demand for genomic data in the clinic and for use in Electronic Medical Records.

One of the flagship biomedical informatics projects Quackenbush and his team at Harvard are involved with is the development of the data-coordinating center of the NIH-funded Lung Genomics Research Consortium (LGRC). His team collects and integrates data from the LGRC partners and develops new methods to analyze the data and present it to the scientific community to help understand and treat lung disease. “We developed a proprietary data mining tool,” he joked, “called high-school students.” His team has developed this portal not only for bioinformatics researchers but also for other users who want to be “editors of genomic content” rather than informatics analysts who develop algorithms and tools for looking at the data.

John Niederhuber, the afternoon keynote speaker, discussed the future of healthcare as the pursuit of personalized health. “We manage patient care on the average; how the average patient responds to medication.” When we talk about personalized healthcare, “we take it out of the average,” said Niederhuber, “to see how YOU will respond.” “Is this the wrong drug? What dosing is best?” This approach, he said, will allow the clinician to know immediately whether the drug typically used to treat the average, normally responsive population will be effective in a particular patient who may respond differently based on their genotype.

Niederhuber, who directs the Inova Translational Medicine Institute, highlighted two projects Inova is undertaking to contribute toward the future of personalized medicine. One study, launched in April 2012, has already recruited 2500 family cohorts to build a “generational” genomics database, starting with women in their first trimester of pregnancy and following the baby after birth through adulthood. The project team will sequence the baby, both parents, and the grandparents to garner a huge dataset of genetic, behavioral, and environmental factors that can contribute to disease outcome. He hopes to get 2-3 other hospitals or centers on board with the project to add families to this soon-to-be treasure trove of genetic data. Another study, which started last year, aims to understand the genetic underpinnings of preterm birth by sequencing trios: mom, dad, and baby. One commenter from the audience brought up the environment (e.g., allergies) as a possible contributor to preterm birth; Niederhuber said the longitudinal study will address such confounding factors that cannot be explained by genetics alone. He also stressed the need for “ethnic specific reference genomes” rather than the admixture of genomes that was used to create the first human genome. Large-scale genomic projects will be met with greater success if families or individuals with defined ethnicities can have their genomes compared to a reference genome specific to their genetic background. Certain mutations in genes from one ethnic group have been shown to have a different effect in a different ethnic group. Niederhuber said we will soon be able to generate raw (sequencing) data very quickly, in less than 24 hours and at low cost, but we are still a long way from processing the data efficiently with analytical tools to make the data medically relevant and affordable.

Session 1 – Informatics Enables Clinical Research

Three excellent talks were given by government representatives who are involved in the challenges of big data and how best to analyze the exponential growth of genomic and other translational research information. Kenna Shaw, director of The Cancer Genome Atlas (TCGA) project, discussed the current TCGA collection and stated that the goal is to identify molecular alterations in human cancers, not to find clinical markers of disease. She referred to the TCGA as a “biology project,” not a clinical project. Shaw said they will be accepting patient samples until December 15, 2013, as the funding ends in 2014. And if there is a cohort of at least 100 cases of a rare tumor, Shaw said they are very interested in learning about it for possible incorporation into the TCGA pipeline.

ShaAvhree Buckman, director of the Office of Translational Sciences, Center for Drug Evaluation and Research (CDER) at FDA, discussed FDA’s major emphasis on improved regulatory decision making. The FDA is “rapidly moving toward a modernized, integrated informatics-based review environment,” Buckman stated, and a major part of this effort is to incorporate genomic information and new analytical capabilities where available to support “quantitative decision making.” She stressed that “high quality, standardized data is key” to this effort. Toward the goal of making information standardized and accessible, CDER has created a new Computational Science Center to enhance the review and regulatory process, as well as to “enhance responsiveness to emerging safety problems through better information,” said Buckman. This has been a challenge for the FDA because until recently drug submissions arrived in paper form, and several still do, she noted, so they are not amenable to quick analysis. Buckman said that FDA has the “largest repository in the world of subject-level clinical trial and nonclinical study experience,” but this is largely “unstructured data and poorly accessible.” Through the new center, the FDA is working to reduce the burden for reviewers who have to wade through volumes of paper. A new electronic format, she said, will be used to create a “Clinical Trials Repository” that supports the structured “acquisition, validation, integration, and extraction of data from the increasingly large and complex datasets received by the Agency.” An increasing number of these clinical datasets include genomic data, and this new system, she said, will “make use of enhanced analytical tools and techniques that enable reviewers to search, model, and analyze data to conduct better safety and efficacy analyses.”

Session 2 – Technology Driven Systems Medicine

Two technology development talks were given on some of the exciting developments in bioinformatics. Joel Saltz, Director of the Center for Comprehensive Informatics and Chair of Biomedical Informatics at Emory University, discussed progress in the integrative analysis of heterogeneous, multiscale data types, which is where the field is going now. We are at an “intermediate point” with data integration, and “closer to the beginning than the end,” he said regarding the integration of data types (e.g., radiology imaging, pathologic features, omics data, and patient outcome) to improve clinical decision making. The integration of data types will also help reduce variability among pathology interpretations and help to sub-classify patients for therapeutic intervention.

Eliot Siegel, Professor and Vice Chair of the University of Maryland Department of Diagnostic Radiology, and Chief of Radiology and Nuclear Medicine for the Veterans Affairs Maryland Healthcare System, spoke about the analysis of big data using artificial intelligence (AI). For example, IRB approval was recently received for a major VA initiative to begin the process of collecting records from one million veterans, which will be extremely valuable for clinical researchers. Saltz said that the challenge will be to determine how to search through vast amounts of this largely “unstructured” data for clinical decision making and not just for research.

“2011 will be remembered as the re-emergence of artificial intelligence,” he said when describing the Jeopardy competition between IBM’s “Watson” AI system and the top human Jeopardy winners of all time. Watson won, which was a huge boon to AI and to IBM’s “DeepQA,” or deep question answering, software that uses multiple evidence types and algorithms to find answers and then weighs the probability that an answer is correct. Watson draws from numerous information databases to link common words and phrases, find and generate hypotheses, score the evidence, and generate confidence levels. The speed of the processing is amazing, as Watson can process 500 gigabytes per second (the equivalent of about 1 million books).

Siegel said AI can be applied to the health domain to enable more accurate clinical decision making based on the ability of the system to process information with high confidence based on available data. This would be especially critical in hospital emergency rooms, where a significant number of errors stem from limitations in knowledge and other cognitive factors such as “premature closure,” in which a physician comes up with an answer and stops there rather than going further to assess all the relevant options. Computers don’t stop at one option. He said many previous AI systems lacked an easy and rapid interface, so they were not well adopted. In an ER situation, Siegel said, “You want an answer in 3 seconds.” Similar to Google, he added, people want to be able to input symptoms into a database and receive output diagnoses at a certain confidence level. For Electronic Medical Records (EMRs), Siegel said, you cannot even search for the term “rash” or any other term within a patient record or across records. “Institutes across the country are generating large, rich clinical data but none of this data is searchable,” said Siegel. He added that “we need standard (imaging, clinical genome etc) data” to provide inputs into AI databases like Watson. He was optimistic that we are going to see “an explosion of smart applications” related to AI for EMRs and other parts of the medical community.

Georgetown Resources

The afternoon talks of the symposium focused on Georgetown informatics educational programs and resources. Dr. Sona Vasudevan, Associate Professor and Director of the MD/MS program in Systems Medicine, discussed the educational programs. Of particular interest is the MD/MS dual degree program in Systems Medicine, which enables a medical student to spend one year doing bioinformatics research. This program provides critical training to medical students, since biomedical informatics will soon become an integral part of clinical practice.

Dr. Cathy Wu, Professor and Director of the Protein Information Resource (PIR), spoke about the program, which houses the world’s largest publicly available protein sequence database, the Universal Protein Resource (UniProt).

Dr. Yuriy Gusev, Sr. Bioinformatics Scientist within the ICBI, discussed the Georgetown Database of Cancer (G-DOC), a platform that integrates multiple types of omics and clinical data for numerous cancer studies for use by the clinical and translational research communities.

Rachel Kidwiler, Program Director and AVP, Research Information Systems, discussed REDCap, a tool to enable standardized clinical study data collection.

Dr. Nawar Shara, Assistant Professor and Director, Biostatistics and Epidemiology, MedStar Research Institute, discussed Explorys, an enterprise-level platform for clinical data that enables the aggregation, analysis, management, and research of big data.

Networking

A combined poster session and reception was held after the talks with posters judged by crowdsourcing. Three “best poster” prizes were given and the winners are:

1st place - “Mining Social Media for Healthcare”

Andrew Yates and Nazli Goharian; Department of Computer Science, Georgetown University

2nd place - “Screening of Novel Anti-Neuroinflammatory Agents to Treat Parkinson’s Disease”

Henry North¹, Shalonda Williams¹, Jau-Shyong Hong², Xiang Simon Wang¹*; ¹Molecular Modeling and Drug Discovery Core for District of Columbia Developmental Center for AIDS Research (DC D-CFAR); Laboratory of Cheminfomatics and Drug Design, Department of Pharmaceutical Sciences, College of Pharmacy, Howard University, Washington, District of Columbia 20059; ²Laboratory of Pharmacology, National Institute of Environmental Health Sciences, Research Triangle Park, NC 27709

3rd place (tie) - “Cheminfomatic Studies of Human Ephs Receptorome”

Terry-Elinor Reid, Dejene Woldemariam, Tewodros Gashaw, Xiang Simon Wang; Molecular Modeling and Drug Discovery Core for District of Columbia Developmental Center for AIDS Research (DC D-CFAR); Laboratory of Cheminfomatics and Drug Design, Department of Pharmaceutical Sciences, College of Pharmacy, Howard University, Washington, District of Columbia 20059

3rd place (tie) – “Practical Aspects of the Development and Implementation of Post Submission Quality Checks for the Data from the Breast and Colon Cancer Family Registries”

Andrea Gabriela Barbo, Mauricio Oberti, Sweta Ladwa, Peter B McGarvey, Subha Madhavan, Anca D Dragomir; Innovation Center for Biomedical Informatics, Georgetown University Medical Center