The purpose of the Pilot Project Program is to stimulate and promote interdisciplinary research in Big Data health sciences by supporting meritorious applications that utilize existing data sources in order to address critical issues related to health behavior, patient care, healthcare delivery, and population health. The program will support research that uses a variety of data sources, including electronic health records data, social media data, geospatial data, genomic data, bio-nanomaterial data, and other publicly available or acquirable data. The issues to be addressed by the pilot projects can also include a variety of health outcomes at individual, community, health system or population levels.

In its inaugural year, the BDHSC Pilot Project Program aims to encourage research collaboration on campus that will leverage existing data to address critical issues related to health behavior, clinical care, healthcare delivery, and population health. A total of $498,875 was awarded to 12 UofSC faculty across campus. 

1. PI – Swann Arp Adams, Ph.D., MS

Title: An Investigation of Racial and Geospatial Disparities in the Utilization, Adherence, and Economic Cost to Targeted Therapies for Breast Cancer

Abstract: Targeted therapies (TTs) for breast cancer are considered the ‘cutting edge’ of treatment to extend survival, particularly for human epidermal growth factor receptor 2-positive (HER2+) tumors. Given these medications’ potential impact on survival, it is critical that we understand utilization and adherence patterns, particularly among vulnerable populations. An examination of the utilization of and adherence to targeted therapies will inform future interventions at multiple levels (patient, provider, and policy), as well as serve as a model for other states, with the aim of ensuring that ALL women receive the most novel and innovative treatments available, which have been proven to significantly improve survival. We propose to leverage previous federal funding (1R15CA179355-01A1) to expand our cohort of breast cancer survivors in South Carolina diagnosed between 2002 and 2010 linked with complete treatment data from Medicaid and Employee State Health Plan administrative data. With this proposal, we aim to expand upon the existing cohort to include all breast cancer cases diagnosed through 2017 with Medicaid or State Health Plan treatment data through 2020. With our multidisciplinary team composed of a cancer epidemiologist, health economist, nurse scientist, clinician, health geographer, social worker, and pharmacy health services researcher, we aim to: describe and compare the utilization of TTs for the treatment of breast cancer (BrCA) among women in South Carolina (overall, by race, by geography, and by SES); describe and compare the adherence to TTs for the treatment of BrCA among women in South Carolina (overall, by race, by geography, and by SES); and describe and compare the economic cost of TTs for the treatment of BrCA among women in South Carolina (overall, by race, by geography, and by SES).
The results from this work will extend the field of racial and geographic breast cancer disparities into the novel and innovative field of targeted therapies representing personalized medicine applications. Additionally, this work will provide important preliminary work and team collaborations for planned grant applications by not only the PI (June 2021), but also the junior faculty members of the team.

2. PI – John M. Brooks, Ph.D.

Title: Personalizing Evidence for Shoulder Fracture Patients Using the Instrumental Variable Causal Forest Nonparametric Machine Learning Algorithm

Abstract: We propose adapting a novel nonparametric machine learning algorithm, the Instrumental Variable Causal Forest Algorithm (IV-CFA) [24,54-62], to personalize evidence on the effects of early surgery on both benefits and costs for Medicare patients with proximal humerus fractures. The “Base” Causal Forest Algorithm (CFA) to personalize evidence was introduced in 2017, and the developers acknowledged that estimates from base CFA were susceptible to confounding bias with observational data [24,54-61]. In response to this problem, CFA developers built the IV-CFA version in 2019, which supports instrumental variable-based treatment assignment and displayed positive properties in simulation modeling [66]. To date, there have been no published studies using IV-CFA with observational data and instrumental variables in healthcare. Our “big” database of 72,823 Medicare beneficiaries with proximal humerus fractures, together with an instrumental variable with a demonstrated strong impact on early surgery choice, provides the perfect setting to apply, assess, and advance the use of the IV-CFA approach with observational healthcare data. We will assess the ability of the IV-CFA to create personalized cost-effectiveness ratios associated with early surgery for Medicare patients with proximal humerus fractures. This study offers a unique opportunity in Big Data Analytics to leverage existing data to improve the quality and efficiency of healthcare. Successful completion of this study will place University of South Carolina researchers at the cutting edge of the development of this exciting and valuable approach.

3. PI – Guoshuai Cai, Ph.D. 

Title: Big Genomic Data Analysis to characterize TME-methylation-expression Regulatory Axis in Colorectal Tumors

Abstract: In this exploratory project, we propose to study the important role of the tumor microenvironment (TME) in colorectal tumor development via its regulation of tumor-intrinsic, methylation-mediated expression, by systematically analyzing publicly available big genomics data on colorectal cancer (CRC). Our three aims are to 1) identify the variation of TME cell composition in CRC and its mutation drivers and clinical effects, 2) study the genome-wide aberration of transcriptional regulatory methylation in CRC and its potential mutation drivers, underlying pathways, and clinical effects, and 3) investigate the interactions among TME cell composition, methylation, and expression in CRC. If this proposed study succeeds, it will yield new clues on how colorectal tumor-intrinsic mechanisms cooperate with extrinsic factors in the TME, which will significantly benefit the search for new effective targets from the tumor microenvironment and epigenetics for treating CRC. The data collected in this pilot study could open critical avenues to translational collaborations and future NIH funding opportunities in colon cancer research.

4. PI – Ronda Hughes, Ph.D., MHS, RN, FAAN 

Title: Using EHR and Community Data to Predict Medication-Related Post-Discharge Acute Care Utilization

Abstract: For reasons that remain incompletely understood, African-Americans experience a disproportionately large share of the cancer burden. The proposed study will test the hypothesis that one contributing, but so far overlooked, factor is a greater risk of multiple primary cancers (MPCs). Advances in the early detection and treatment of cancer have led to improvements in overall cancer survival. The increased likelihood of surviving a first cancer diagnosis has increased the population of individuals at risk for developing a second primary cancer. The upward trend in MPC occurrence, which currently constitutes more than 12% of all cancer diagnoses in South Carolina, accentuates the need for research on this topic. An important but understudied question is whether MPCs are an important contributing factor to overall cancer-related racial disparities. This study aims to determine: (1) racial/ethnic differences in the incidence rate of MPCs, and the impact of MPCs on racial differences in cancer mortality rates, (2) racial/ethnic differences in the pattern of MPCs by type of malignancy and the temporal order of these malignancies, and (3) the contribution of tobacco use, alcohol use, and obesity to the incidence rate of MPCs, and the contribution of these factors to racial/ethnic differences in the risk for MPCs. Guided by the National Institute on Minority Health and Health Disparities Research Framework, the research team will use data from the South Carolina Central Cancer Registry, linked to data from Medicaid, the State Health Plan, and inpatient hospitalization/emergency department claims data from the SC Revenue and Fiscal Affairs Office, to conduct a series of incidence analyses and multivariable logistic regressions to investigate these three research aims.
The long-term results of this line of inquiry can potentially lead to recommendations for informing patient education and care, improving clinical decision making, reducing healthcare costs, and eliminating cancer disparities. One short-term outcome is the career development of Dr. Owens (principal investigator), who will have an opportunity to enhance his skills in the management and analysis of a large longitudinal, quantitative cancer database through a partnership with seasoned collaborators (Wooten & Alberg). A second short-term outcome is that the proposed study will provide the foundational skills and primary data for a larger multiple primary cancer research project, and position the investigators to apply for federal funding through the National Institutes of Health (R01 or R21).

5. PI – Anwar T. Merchant, Sc.D., MPH, DMD 

Title: Electronic health records to estimate effects of dental treatment on systemic health

Abstract: Poor oral health has been associated with increased risk of heart disease, stroke, diabetes, hypertension, Alzheimer’s disease, rheumatoid arthritis, adverse birth outcomes, and other adverse health outcomes in numerous observational studies. However, evidence evaluating the causal effect of dental treatment on systemic outcomes is scarce. Generating this information would provide the basis for actionable, evidence-based steps to increase coordination between medical and dental care providers and improve overall health. In this pilot project, we propose to create a virtual cohort by linking electronic medical records from over 300,000 Kaiser Permanente, Georgia (KPGA) members with electronic dental records from KPGA’s dental insurance partner, Delta Dental, and to use instrumental variable methods to evaluate the causal effect of dental treatment on systemic outcomes. To demonstrate the feasibility of this approach, in this pilot study we will assess the causal effect of periodontal treatment on type 2 diabetes incidence and on glycemic control among individuals with type 2 diabetes.

6. PI – Bankole Olatosi, Ph.D. 

Title: Leveraging the power of Big Data for predicting future STDs among PLWH: A Pilot Study

Abstract: This study proposes to use big data science techniques to develop an algorithm for predicting future transmission of sexually transmitted diseases (chlamydia and gonorrhea) among all People Living with HIV (PLWH) in South Carolina. After a decline in 2009, annual increases in rates of sexually transmitted diseases (STDs) have been reported across the United States (US), with chlamydia (CT) and gonorrhea (GC) the most commonly reported. In 2018, the CT total of 1.8 million cases was the highest ever reported to the CDC, with GC (0.6 million cases) the second most commonly reported STD. CT is reported to be highest among women of color with low socioeconomic status, while GC was highest among men who have sex with men. Recent studies show a growing incidence of STDs among PLWH. The literature estimates that approximately 5-10% of PLWH receiving HIV medical care are actively infected with CT or GC at any given time. Since the primary mode of CT/GC transmission is sexual, this has implications for HIV transmission from PLWH to uninfected persons. PLWH infected with an STD after HIV diagnosis are also at risk of becoming infected with drug-resistant viral strains and of being exposed to other STDs. Based on these issues, this proposal plans to use Big Data science techniques to predict future STD infection among PLWH in South Carolina so as to guide future targeted interventions.

7. PI – Caroline Rudisill, Ph.D., MSc 

Title: Evaluating a technology EMR-based strategy to intervene on social determinants of health-related needs in a large health system

Abstract: Health policy reform and political interest and commitment over the last decade have encouraged increased attention to factors outside the health care system that affect individuals’ health outcomes. Health systems are now investing in these issues, such as housing for homeless patients (Montefiore in New York) and free healthy food for food-insecure patients with diabetes (Geisinger in Pennsylvania). Since June 1, 2019, Prisma Health patients with needs related to social determinants of health (SDoH) who are in ambulatory care management, inpatient case management, and community health programs have been referred electronically to local resources via a digital platform called NowPow. This pilot grant will provide funding to link three data sets (NowPow referrals, patient EMR records, and South Carolina Department of Health and Environmental Control (SCDHEC)/US Census Bureau data) and examine the first year of electronic referrals via NowPow for SDoH-related needs at Prisma Health. The linkage between NowPow, the EMR, and SCDHEC and Census data allows for rich health services research and geospatial analysis as well as funding potential. This pilot grant aims to do the following: (1) build a linked dataset for the study period June 1, 2019 – June 30, 2020 covering about 4,000 electronic referrals via NowPow for research purposes, (2) investigate the characteristics of individuals making and receiving referrals via NowPow, and (3) understand which parts of the community are most touched by SDoH-related referrals via geospatial analysis. This pilot grant will make it possible for currently collected data to be available for research purposes. This is in anticipation of introducing systematic Prisma Health-wide annual screening and electronic referrals in Summer 2020.
This pilot work will position the team via pilot data analysis, publications and other dissemination efforts for an R01 that would build and analyze linked datasets related to SDoH screening and related referrals for Prisma Health’s 1.2 million patients across South Carolina.

8. PI – Benjamin Schooley, Ph.D., M.B.A. 

Title: Using Natural Language Processing to Generate Treatment Decision Themes from Clinical Encounter Notes for Patients with Shoulder Conditions

Abstract: In this project we take advantage of a unique and novel data source developed by the project team over the last three years, the orthopaedic data repository (OPDR). We apply Natural Language Processing (NLP) and Machine Learning (ML) tools and techniques to analyze ~2,000 unstructured clinical encounter notes for atraumatic rotator cuff tear (ARCT) patients to understand the clinical evidence and decision criteria for making personalized treatment decisions. The resulting NLP-ML framework will enable critical clinical evidence for ARCT patients, currently hidden in unstructured notes, to be systematically extracted and analyzed from large observational datasets (e.g., EHRs), and effectively structured to augment future decision making.

9. PI – Homayoun Valafar, Ph.D. (Awarded $45,445)

Title: Utilization of Artificial Intelligence assisted data analytics to better predict progression of chronic kidney disease to end stage than currently available predictive markers

Abstract: Chronic kidney disease (CKD) is a significant public health concern. Patients with CKD have higher risks of adverse events (AEs) such as cardiovascular disease and death. Equally important, patients with CKD are at risk for progression to end stage kidney disease (ESKD).

Development of ESKD is associated with a several-fold increase in AEs and a marked increase in resource utilization, and thus with a dramatic increase in health care expenditure. In addition, the ideal treatment for ESKD remains kidney transplantation, which is itself a highly expensive form of therapy. Despite these findings, the rate of progression of CKD to ESKD is highly variable among individual patients. Despite the availability of robust predictive markers for CKD progression, notably the current estimated glomerular filtration rate (eGFR) and albuminuria, there remains wide inter-individual variability at similar levels of these markers. Artificial intelligence (AI) and machine learning (ML) algorithms have been utilized to improve accuracy, but limitations of the available literature include small sample sizes and patient populations limited to single institutions. The current proposal seeks to leverage the national dataset available through the Veterans Health Administration (VHA) electronic health record (EHR), using a variety of readily available structured and unstructured clinical parameters, to develop algorithms that can predict the risk of progression of CKD to ESKD with greater precision and accuracy in the greatest number of individuals.

10. PI – Yuan Wang, Ph.D., MPhil 

Title: Topological Network Analysis and Graph-Based Deep Learning of Multimodal MRI: An ENIGMA-Epilepsy Study

Abstract: Epilepsy is marked by sudden recurrent episodes of sensory disturbance, loss of consciousness, or convulsions, and affects over 50 million people worldwide. Approximately one third of epilepsy patients are resistant to anti-epileptic drug treatment and require additional diagnostic procedures such as electroencephalographic (EEG) evaluation to localize the epileptogenic zone, the neuronal network capable of generating seizures, for surgical resection. This approach, however, relies heavily on the expertise of the specialist clinicians reading the EEG. Neuroimaging techniques such as magnetic resonance imaging (MRI) thus play a critical role in the diagnosis of patients with focal epilepsy by identifying visible lesions. Yet currently around 20–45% of focal epilepsy patients do not show lesions on MRI, let alone generalized epilepsy cases, which are by default non-lesional. Data-driven approaches have shown promise in improving imaging diagnosis and prognosis of epilepsy, but existing studies tend to be limited in sample size and power. In this project, we overcome this limitation by leveraging the first and currently the largest international neuroimaging database on epilepsy, provided by the Enhancing Neuroimaging Genetics through Meta-analysis (ENIGMA) Consortium and its Global Alliance for Worldwide Imaging in Epilepsy (ENIGMA-Epilepsy). The global initiative integrates neuroimaging data from over 2,100 epilepsy patients from 24 sites in 14 countries, thus providing unprecedented power of analysis and a unique opportunity to answer complex clinical questions in epilepsy.
We will achieve two specific aims in this project using structural and diffusion MRI (sMRI and dMRI) in the ENIGMA-Epilepsy database: 1) detect subtle structural brain abnormalities associated with both focal and generalized epilepsy syndromes with topological network analysis on sMRI and dMRI; 2) predict epilepsy treatment outcomes by building graph-based deep learning algorithms on sMRI and dMRI.

11. PI – Whitney Zahnd, Ph.D. 

Title: A Spatial Approach to Evaluating Potential and Realized Access to Broadband Services – A ‘Super Determinant of Health’

Abstract: The Federal Communications Commission (FCC) has identified broadband access as a “super-determinant” of health. Increasing access to broadband services has been included as part of federal health improvement objectives (e.g., Healthy People 2020) over the past decade. Increased access to broadband services improves access to health care services such as telehealth, to internet-based health information, and to both work and educational opportunities. Increasingly, the relationship between access to broadband services and health outcomes, such as cancer, has become important to federal agencies, as evidenced by burgeoning partnerships between the FCC and the National Cancer Institute (NCI). The FCC regularly collects information on access to broadband services, defined as the presence of any broadband provider within a census block (i.e., potential access). However, these data are provided by broadband service providers, and there is concern that broadband access is overestimated. Recently, data from the U.S. Census Bureau’s American Community Survey on self-reported broadband access in the home (i.e., realized access) have become available at small geographic units, such as the census tract, but few studies have examined the geographic distribution of broadband access as characterized by these data. The goal of this pilot study is to evaluate potential and realized accessibility to broadband services throughout census tracts in the contiguous United States using geospatial and statistical methods. In aims 1 and 2, we will use exploratory spatial data analysis (ESDA) approaches to explore the spatial distribution of potential and realized access to broadband services and to determine the agreement between potential and realized access to broadband services. In aim 3, we will use spatial regression modeling to identify predictors of potential and realized access to broadband services.
This pilot study will involve an interdisciplinary collaboration between Dr. Whitney Zahnd (PI; Health Services Epidemiologist; Arnold School of Public Health) and Dr. Nathaniel Bell (co-I; Medical Geographer; College of Nursing), who have complementary expertise in health services research, medical geography, and utilization of secondary databases to study the social determinants of health. This pilot study has three specific deliverables: 1) expand the research foci and networks of the study team; 2) contribute to the burgeoning research literature on broadband access and health through presentations at academic meetings and peer-reviewed publications; and 3) generate preliminary findings for inclusion in future grant proposals (e.g., NCI’s Notice of Special Interest on Geospatial Approaches in Cancer Control and Population Sciences).

12. PI – Jiajia Zhang, Ph.D.

Title: Improving Mental Health Utilization through Advanced Statistical Modeling using Multiple Hospital Electronic Health Record

Abstract: Although psychologists, psychiatrists and other healthcare professionals have made strides over decades in suicide prediction and prevention, critical knowledge gaps still exist. The increased availability of health information technology and advanced analytics can improve knowledge of the dynamic process of suicide behaviors (suicide ideation/attempts). Effective health care has been identified as a major protective factor against suicide; yet currently, health services utilization is a challenge for most of those at risk of a suicide attempt. The available but underutilized Health Sciences South Carolina (HSSC) data sources have provided us with an opportunity to improve suicide prediction via advanced statistical modelling. In this proposed study, advanced statistical modelling will be applied to address the knowledge gaps in suicide prediction and health care services utilization. First, we will use data from HSSC to define a unique high-risk patient cohort, including 1) patients with previous suicide ideation and 2) patients with a mental health diagnosis, covering all high-risk patients over age 10 in SC. Second, all hospital visits recorded in the HSSC system will be extracted. We will use the integrated data to develop advanced statistical models to predict suicide risk (i.e., suicide ideation/attempt) for the patient cohort (those having suicide ideation or a mental health diagnosis) and to identify the longitudinal pattern of health services utilization. The innovations of the proposed research include the use of advanced statistical methods to reveal the dynamic patterns among suicide risk, health care services utilization, and suicide ideation/attempt, which can capture changes in suicide risk over time and across contexts and provide risk prediction within different time windows. The resultant prediction models can provide important evidence for suicide prediction and can help inform both the targets (“who”) and timing (“when”) of suicide prevention and clinical care among patients at high risk of suicide in SC and beyond.