The BDHSC continues to encourage research collaboration that leverages existing data to address critical issues related to health behavior, clinical care, healthcare delivery, and population health. In its fourth year, the BDHSC awarded nearly $160,000 to 8 investigators from across the USC system. The details of the research projects are presented below.
The purpose of the Pilot Project Program is to stimulate and promote interdisciplinary research in Big Data health sciences by supporting meritorious applications that utilize existing data sources in order to address critical issues related to health behavior, patient care, healthcare delivery, and population health. The program will support research that uses a variety of data sources, including electronic health records data, social media data, geospatial data, genomic data, bio-nanomaterial data, and other publicly available or acquirable data. The issues to be addressed by the pilot projects can also include a variety of health outcomes at individual, community, health system, or population levels.
Ray Bai, Leveraging Side Information for Improved Prediction and Inference in Computational Drug Repositioning
PI: Ray Bai
Computational drug repositioning is a useful methodology for identifying novel therapeutic potentials and indications of existing, already approved drugs. Given that new drug development is often costly and lengthy, many researchers aim to reuse already approved drugs to treat diseases beyond their original scope. However, in order to optimize drug discovery, it is imperative to take into account side information, such as chemical structure and disease susceptibility genes. Only recently have methods for drug repositioning based on matrix completion (i.e. filling in the missing entries of a partially observed matrix) begun to utilize side information. Moreover, the overwhelming focus of computational drug repositioning has been on prediction, while important questions such as inference and feature selection have largely been ignored. This project will develop three novel methodologies for leveraging side information in drug repositioning: 1) a kernel-based approach for integrating side information from multiDrugbank and Pharmgkb.
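As a minimal illustration of the matrix completion idea mentioned above (filling in the missing entries of a partially observed matrix), the sketch below iterates a rank-truncated SVD on a small synthetic matrix. It is not the project's methodology: it uses no side information, and the function name `complete_matrix`, the drug-by-disease framing of rows and columns, and the toy data are all assumptions made for this example.

```python
import numpy as np


def complete_matrix(observed, mask, rank=1, n_iters=500):
    """Estimate missing entries by repeated rank-truncated SVD.

    observed : 2-D array; values where mask is False are ignored.
    mask     : boolean array, True where an entry was actually observed.
    """
    # Initialise the missing entries with the mean of the observed ones.
    filled = np.where(mask, observed, observed[mask].mean())
    for _ in range(n_iters):
        u, s, vt = np.linalg.svd(filled, full_matrices=False)
        # Keep only the leading `rank` singular components.
        low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank]
        # Observed entries stay fixed; only missing ones are updated.
        filled = np.where(mask, observed, low_rank)
    return filled


# Hypothetical toy data: a rank-1 "drug x disease" score matrix.
rng = np.random.default_rng(0)
truth = rng.uniform(1, 2, size=(8, 1)) @ rng.uniform(1, 2, size=(1, 6))

# Hide three entries to play the role of unknown drug-disease pairs.
mask = np.ones_like(truth, dtype=bool)
for row, col in [(0, 0), (3, 2), (7, 5)]:
    mask[row, col] = False

observed = np.where(mask, truth, np.nan)
estimate = complete_matrix(observed, mask, rank=1)
print(np.abs(estimate[~mask] - truth[~mask]))
```

In practice the rank is unknown and would have to be chosen, for example by holding out some observed entries, and approaches that use side information add drug and disease features to this basic setup.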
Yuche Chen, Developing A Simulation-Based Big Data Approach to Measure Healthcare Accessibility Disparity
PI: Yuche Chen
Healthcare accessibility is defined as the relative ease of reaching healthcare services at certain locations. It is a critical factor that influences population health and wellbeing. To identify, explain, and address disparities and inequities in healthcare accessibility, it is necessary to develop a system that can accurately assess the disparities and explore the factors that contribute to them. In the literature, traditional healthcare accessibility measures tend to overestimate accessibility for individuals residing in underrepresented areas, such as low-income urban areas. To address this issue, there is growing interest in developing measurement approaches that integrate multiple socioecological factors and reflect the complex interactions between them. One such approach uses agent-based traffic simulations to simulate the healthcare-related travel of residents, with consideration of socioecological heterogeneity and its impacts. We propose to integrate a microscopic traffic simulation platform with big data resources to assess healthcare accessibility in South Carolina (SC), a state that has limited healthcare infrastructure, a shortage of healthcare professionals, and significant disparities in healthcare accessibility. Specifically, we will first develop a platform that can accurately replicate residents’ journeys to healthcare facilities, incorporating precise spatial and temporal information. We will use cellphone visitation data to verify the accuracy of the simulation model and its ability to replicate actual travel patterns. We will then create statistical models that utilize machine learning techniques to investigate the link between healthcare accessibility and healthcare outcomes while controlling for sociodemographic factors at the census tract level.
Finally, we will create a framework for scenario analysis that can assess the potential benefits of transportation strategies and policies in improving health outcomes at the census tract level. Following the pilot project, our intention is to assess the effectiveness of our approach in identifying underrepresented communities and evaluate how modifications to transportation infrastructure can enhance healthcare accessibility and ultimately improve healthcare outcomes for these communities. The research experiences and publications obtained from this pilot study could become a foundation for the PI to apply for other NIH grants in Big Data techniques in public health.
Tessa Hastings, Jennifer Grier & Debbie Barrington, Associations between Multi-level Community Factors as Social Determinants of Pediatric Chronic and Infectious Disease
Pediatric chronic disease and infection are significant public health challenges in the United States. The prevalence of chronic disease in children and adolescents has increased significantly in recent decades, and children are also highly susceptible to viral infection. These conditions can have a profound impact on children’s health and well-being, and can lead to significant costs for families and the healthcare system. This grant proposal seeks to identify the factors that contribute to the growing burden of pediatric chronic and infectious disease in order to improve disease modeling methods and drive development of targeted intervention strategies. We will focus our investigation on the possible impact of neighborhood socioeconomic factors at the county level, the zip code level, and the census tract level within the Prisma Health patient population in the Upstate and Midlands of South Carolina. This study aims to 1) examine potential relationships of multilevel community factors with pediatric hypertension and pediatric infectious diseases, i.e., SARS-CoV-2 and RSV infection, 2) evaluate additional correlations between neighborhood community factors and other pediatric chronic diseases, including asthma, type 2 diabetes, and obesity, and 3) compare the associations between neighborhood socioeconomic disadvantage and pediatric chronic diseases, including hypertension, type 2 diabetes, obesity, and asthma, with the associations between neighborhood deprivation and pediatric infectious diseases, including RSV infection. This research has the potential to make a significant impact on the health and wellbeing of children in the United States.
Jungmi Jun, Global Conversations on "Tobacco Harm Reduction" on Twitter: The Big Tobacco's Involvement and Interference with Tobacco Control Across Countries
PI: Jungmi Jun
We propose to conduct an analysis of global conversations on Twitter related to tobacco harm reduction (THR). Our study aims to investigate three main areas: (1) the tobacco industry’s involvement and connections with government, policy makers, scientists, and other stakeholders; (2) the industry’s marketing strategies and lay users’ perceptions of products with THR claims; and (3) the geographical and temporal distribution of sentiment towards THR within the US and globally.
Our focus is on the current THR controversy, which has run from 2014 to the present, coinciding with the introduction of new-generation products with THR claims. During this time, big tobacco companies have actively used THR claims to promote their novel products and improve their corporate image on social media. Our research addresses public health concerns regarding the industry’s THR claims and their potential to interfere with tobacco control efforts.
Mufaro Kanyangarara, Association Between Host Immune Response, Vaginal Bacterial Community Composition and Trichomonas Vaginalis Infection
Growing evidence shows that alterations in vaginal microbiota composition may be involved in the pathogenesis of Trichomonas vaginalis (TV), a neglected sexually transmitted infection (STI) that is poorly understood. The current proposal aims to elucidate the complex relationships between host immune response, vaginal microbiota, and TV risk among adolescent girls and young women in SC. To accomplish this, we will leverage a prospective longitudinal cohort of 300 at-risk adolescent girls and young women seeking care at pediatric clinics. Next-generation sequencing of 16S ribosomal RNA genes will be performed on cervicovaginal swabs collected at baseline and during prospective follow-up visits at days 30, 60, 90, 180, and 365. If successful, the results have promise to yield informed clinical differentials, biomarker indicators of TV risk, and novel targets for TV diagnosis, treatment, and potentially vaccines. Furthermore, findings will serve as preliminary data for future extramural funding proposals.
Stella Self, Quantifying Risk of Exposure to Lone Star Ticks with Machine Learning Methods
PI: Stella Self
This project proposes to develop a ‘nowcast’ model for predicting the risk of exposure to lone star ticks (A. americanum) using machine learning methods. Lone star ticks vector the pathogens responsible for a number of tick-borne diseases, including ehrlichiosis, tularemia, and southern tick-associated rash illness (STARI). Diagnosis of these diseases generally requires evaluating a patient’s potential for exposure to lone star ticks. Determining whether a patient has been exposed is often difficult because tick bites are generally painless, and many patients do not remember being bitten. We propose to develop a machine learning model that quantifies lone star tick activity in real time and can be used to assess a patient’s potential for exposure to lone star ticks. This model will be validated using data from active tick surveillance conducted in South Carolina from 2020 to 2023. Predictive factors will include weather data, land cover data, and Google Trends data on searches for terms related to lone star ticks. Deliverables include a database of all the necessary predictive factors, a manuscript describing the validated model, and a publicly available R Shiny application that allows users to obtain model predictions for times and places of their choosing. The model developed in this proposal will serve as proof of concept for a larger R01 proposal under development by the PI.
Qian Wang, Prediction of Pharmacokinetics of Antibiotics by Systematic Analysis of Patient's EHR, Proteome, and Metabolome Data
PI: Qian Wang
Due to the complexity of disease/drug effects on the pharmacokinetics (PK) of a drug, we propose this pilot study to systematically evaluate the overall effect of disease/drug on the pharmacokinetics of antibiotics by analyzing electronic health records (EHRs), proteomics, and metabolomics data collectively. We will use amoxicillin, the most prescribed antibiotic agent in the USA, as a model drug. Our hypothesis is that patient health status, which can be assessed by the patient’s EHR, proteome, and metabolome, can be used to predict antibiotic PK. This, in turn, can assist in timely antibiotic dose adjustment to achieve precision medicine and decrease the occurrence of antibiotic resistance. We hope to establish the correlation between patient conditions (diseases/drug uses) and the PK of amoxicillin through patient EHR, proteome, and metabolome data. This proposal includes two specific aims: (1) use a Deep Neural Network (DNN) to extract patient information and establish the correlation between patient medical history and amoxicillin PK, and (2) use patient proteome and metabolome data to further improve the accuracy and sensitivity of the prediction of amoxicillin PK. The success of this proposal can assist clinicians in adjusting amoxicillin doses for each patient without additional tests, which should increase effectiveness and potentially decrease the incidence of antibiotic resistance. Our ultimate goal is to create biomarker panels that can assist physicians in promptly optimizing antibiotic doses for each patient.
Jingkai Wei, Early Initiation of Statins from Midlife on Incident Dementia: Emulation of Target Trials using Data of South Carolina Alzheimer's Disease Registry
PI: Jingkai Wei
Dementia has become a major public health issue in the U.S. Since there is no cure for dementia, successful prevention strategies are urgently needed. Hypercholesterolemia has been found to be associated with dementia. Therefore, statins, the most effective lipid-lowering treatment in most cases, may have the potential to reduce the risk of dementia. Observational cohort studies have shown inverse associations between statin use and risk of Alzheimer’s disease and mild cognitive impairment. However, evidence from observational studies is not sufficient to establish causal effects. Results from several large randomized controlled trials (RCTs) showed no difference in cognitive decline. While the reasons for null findings in the RCTs are unknown, the timing of statin initiation is a potential explanation. While midlife hypercholesterolemia is predictive of incident dementia, late-life hypercholesterolemia is not. As the development of cognitive decline and dementia starts early in midlife, irreversible cognitive decline and impairment may already have been caused by existing hypercholesterolemia, and statins initiated in late life may have missed the window for preventing dementia. Existing RCTs, however, have mostly focused on older adults. While an ideal RCT would initiate statins among midlife adults and last for decades to observe incident cases of dementia in late life, such a trial is not feasible in practice. In addition, White populations have had higher rates of statin use and better control than Black populations. Given that the prevalence of dementia is higher among Black populations, initiation of statins among patients with hypercholesterolemia from midlife may reduce racial disparities in dementia. To fill the evidence gap on early initiation of statins for prevention of incident dementia among individuals with hypercholesterolemia, we will emulate target trials using observational data.
This proposal brings together experts in dementia, causal inference, cardiovascular disease, epidemiology, and biostatistics, and proposes to use the linked datasets of the South Carolina Alzheimer’s Disease Registry, the State Health Plan of South Carolina, and Medicaid, which include individuals with continuous records of medication use spanning about 30 years. We aim to 1) emulate a target trial of initiating statins from midlife (45 to 64 years) among patients with hypercholesterolemia using observational data, and estimate the effect on preventing incident dementia in late life, 2) estimate the effectiveness of initiating statins from midlife at different lengths of time after diagnosis of hypercholesterolemia, and 3) evaluate the effectiveness of statins initiated from midlife on dementia among individuals with hypercholesterolemia by racial group. We hypothesize that statins initiated from midlife will achieve a significant reduction in incident dementia with timely initiation, and that the effectiveness will be achieved in all racial groups. This proposal is innovative in that it uses causal inference methods to emulate target trials of early statin use for preventing dementia, which may provide the best possible evidence for clinicians and public health professionals to make decisions, given that an RCT is not available.