Precision Medicine Informatics
1. ClinGen: HI faculty are leading and organizing community data standards for curation of genetic variations within the NIH-funded ClinGen program, to augment precision medicine research. Informatics research has included:
- a novel standard, Minimal Variant Level Data (MVLD), to guide reporting of clinical lab test results for molecular diagnostics;
- natural language processing tools to automate the extraction of variation, treatment, and outcome relationships for diseases from the biomedical literature;
- a computational approach for selecting therapies that target drug-resistant variation, which won the Marco Ramoni award at AMIA;
- a workflow for molecular simulation and functional analysis of protein variants; and
- a network approach to recommending targeted cancer therapies.
2. CPTAC: The Clinical Proteomic Tumor Analysis Consortium (CPTAC) of NCI’s Office of Cancer Clinical Proteomics Research is a comprehensive and coordinated effort to accelerate the understanding of the molecular basis of cancer by applying proteomic technologies and workflows to tumor samples with characterized genomic and transcript profiles. Combining proteomics, transcriptomics, and genomics data from the same tumor samples provides an unprecedented opportunity for tumor proteogenomics, as illustrated by several consortium-wide, high-profile Nature and Cell publications.
Advanced Bioinformatics Tools and Research Platforms:
HI develops innovative scientific software to enable translational research. Our projects include multi-omics data analysis, vaccine safety research, clinical data analysis, high-definition data visualization, natural language processing, and mobile application development. Some of our open science projects include:
G-DOC: Georgetown Database of Cancer (G-DOC) is a precision medicine platform that enables the integrative analysis of multiple data types to support the study of disease mechanisms, biomarker discovery, data management, and education.
Virtual Research Environment (VRE): VRE is a secure cloud platform for research and education. BI built the VRE in the cloud on the Google Cloud Platform (GCP), which it uses to provision computing resources and to store and share data securely. VRE was designed and developed to overcome barriers met by the research community while complying with institutions’ policies and current state and federal policies and regulations. VRE is being developed to become GHUCCTS’ preferred and recommended secure cloud service for research, education, and data sharing needs. When used appropriately, VRE can also store, manage, and analyze electronic protected health information (ePHI).
VRE also enables our programs and investigators to participate in and share de-identified / standardized data with the community and research networks. VRE is a multi-mission platform that can facilitate the advancement of science, education, and services.
Clinical Study Data Collection and Surveys: REDCap (Research Electronic Data Capture) is a research tool developed at Vanderbilt University as a secure web application to allow users to build and manage online surveys and databases, and to support data capture for research studies.
Health Data Science
HI faculty and staff conduct research using public and proprietary datasets to advance Health Data Science and Precision Medicine, with the goal of deriving actionable knowledge from genomics, electronic health records, registries, patient-reported data, public health data, and other datasets. These include:
- Immuno-Oncology Registry: a centralized research data warehouse for immuno-oncology that is enabling novel hypothesis generation and retrospective outcomes research at the 10 MedStar Health network hospitals in the DC-Baltimore area.
- Pediatric cancer outcomes registry: a database of pediatric cancer patients who were diagnosed with various cancers at Lombardi Cancer Center’s Pediatric Oncology Program and were enrolled or treated as per Children’s Oncology Group (COG) protocols between 1990 and 2014.
- Rembrandt brain tumor registry: REMBRANDT includes genomic data from 261 samples of glioblastoma, 170 of astrocytoma, 86 tissues of oligodendroglioma, and a number that are mixed or of an unknown subclass. Outcomes data include more than 13,000 data points.
- VA suicide ideation project: uses novel approaches to extract acoustic and semantic features from audio interviews to predict suicidal tendencies in military veterans. A classifier was built that differentiates suicidal from non-suicidal veterans based on acoustic features of speech and sentiment analysis of transcribed narratives.
Data Science Challenges
Some of our active and recent challenges:
- PrecisionFDA Challenge: PrecisionFDA and HI launched the Brain Cancer Predictive Modeling and Biomarker Discovery Challenge. This challenge asks participants to develop machine learning and/or artificial intelligence models to identify biomarkers and predict patient outcomes using gene expression, DNA copy number, and clinical data.
- COVID-19 Data Visualization Challenge: This initiative strives to generate insights that could lead to a better understanding of the impact of physical distancing and the outbreak. We aim to bring talented participants to work on the data and generate insights that can benefit the global community’s work to understand and control the spread of this pandemic.
Educational Activities
In 2019, we launched a Master of Science program in Health Informatics & Data Science (HIDS). HIDS is an accelerated, career-ready, industry-driven program focused on current and emerging technologies that will inform healthcare. Students will gain competency in health data science, big data analytics, artificial intelligence, and machine learning applications. The curriculum aligns with the core competencies in medical informatics defined by the American Medical Informatics Association. The program is well poised to create a pipeline of talented students and trainees for GHUCCTS research and to educate the next generation of leaders in informatics to transform healthcare.
SDOH Data Analytics:
In coordination with the AIM-AHEAD consortium, HI team members have designed and developed a comprehensive set of core competencies and resources related to Social Determinants of Health (SDOH). These resources are accessible to the GHUCCTS community for training, research, and development. The collaborative effort, led by a multidisciplinary team, includes features that align with current research trends and leverages data science, health informatics, and healthcare expertise. The workshops aim to educate a diverse audience, including scholars, researchers, and analysts, through hands-on sessions and access to data, tools, and methods. Coming soon.
AI for Health Care Applications:
The HI team published a series of self-contained Python notebooks, each with an accompanying recorded tutorial and example datasets. The notebooks provide a written narrative of the Python libraries that are used to clean and build training sets, define AI model architecture, and evaluate model performance. Help sessions with some live presentations were also provided.
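The three stages named above (clean and build a training set, define a model, evaluate performance) can be illustrated with a minimal sketch. This is not taken from the HI notebooks: the function names, the synthetic two-feature data, and the nearest-centroid classifier standing in for a real model architecture are all assumptions chosen so the example runs with only the Python standard library.

```python
import random
from statistics import mean

def clean(rows):
    """Drop records that have any missing (None) feature or label."""
    return [r for r in rows if None not in r]

def split(rows, test_fraction=0.25, seed=0):
    """Shuffle reproducibly, then return (train, test)."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    n_test = max(1, int(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]

def fit_centroids(train):
    """Stand-in 'model': one mean feature vector per label."""
    by_label = {}
    for *features, label in train:
        by_label.setdefault(label, []).append(features)
    return {label: [mean(col) for col in zip(*vecs)]
            for label, vecs in by_label.items()}

def predict(centroids, features):
    """Return the label whose centroid is closest to the features."""
    def dist2(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: dist2(centroids[label]))

def accuracy(centroids, test):
    """Fraction of held-out rows whose label is predicted correctly."""
    hits = sum(predict(centroids, row[:-1]) == row[-1] for row in test)
    return hits / len(test)

# Synthetic, well-separated data (hypothetical; not a real dataset).
rng = random.Random(42)
raw = [(rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5), 0) for _ in range(20)]
raw += [(rng.uniform(4.5, 5.5), rng.uniform(4.5, 5.5), 1) for _ in range(20)]
raw += [(None, 1.0, 0), (2.0, None, 1)]  # incomplete records to be cleaned out

cleaned = clean(raw)
train, test = split(cleaned)
model = fit_centroids(train)
acc = accuracy(model, test)
print(f"{len(train)} training rows, {len(test)} test rows, accuracy {acc:.2f}")
```

Holding out a test set before fitting keeps the reported accuracy from being measured on rows the model has already seen; a real notebook would typically swap in a library model and a richer evaluation.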
Access to Research Datasets and Data Governance
HI has done important work on the management and distribution of biomedical data in support of research projects. Many of these informatics projects that started at GHUCCTS BI have developed independent funding from the NIH and other HHS agencies.
Several datasets are currently hosted, curated and managed by BI and available to the GHUCCTS community and investigators for research use. | View Datasets
HI data managers also act as honest brokers, providing electronic medical records, registry data, and other patient data collected during clinical care and operations for research purposes. This includes access to:
- PHI for research purposes
- Limited Data Sets
- Cohort Discovery (aggregate number) or De-Identified data.
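As a rough illustration of the cohort discovery tier, the sketch below returns only an aggregate number and never the underlying rows. The `PatientRecord` fields, the `cohort_count` function, and the sample records are hypothetical and made up for this example; they do not describe the honest broker's actual systems or data.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PatientRecord:
    patient_id: str  # hypothetical identifier, never returned to the requester
    diagnosis: str
    age: int

def cohort_count(records, diagnosis, min_age=0):
    """Return only the number of matching patients, not the records."""
    return sum(
        1 for r in records
        if r.diagnosis == diagnosis and r.age >= min_age
    )

# Made-up records for illustration only.
records = [
    PatientRecord("p1", "glioblastoma", 54),
    PatientRecord("p2", "astrocytoma", 37),
    PatientRecord("p3", "glioblastoma", 61),
]

n = cohort_count(records, "glioblastoma", min_age=60)
print(n)
```

An investigator could use such a count to judge study feasibility before requesting a limited data set or PHI through the appropriate approvals.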
Data Access Policies