The BDHSC continues to encourage research collaboration that leverages existing data to address critical issues related to health behavior, clinical care, healthcare delivery, and population health. In its third year, the BDHSC awarded nearly $250,000 to 10 investigators from across the UofSC system. The details of the research projects are presented below. Among the awarded proposals, those from Matt Halderman and Ana Pocivavsek are co-funded by the School of Medicine (SOM) and the South Carolina IDeA Networks of Biomedical Research Excellence (SC INBRE), respectively.
The purpose of the Pilot Project Program is to stimulate and promote interdisciplinary research in Big Data health sciences by supporting meritorious applications that utilize existing data sources in order to address critical issues related to health behavior, patient care, healthcare delivery, and population health. The program will support research that uses a variety of data sources, including electronic health records data, social media data, geospatial data, genomic data, bio-nanomaterial data, and other publicly available or acquirable data. The issues to be addressed by the pilot projects can also include a variety of health outcomes at individual, community, health system, or population levels.
A Two-Pronged Big Data Approach to Critically Analyze Strongyloides Stercoralis Infections Among Rural, Impoverished South Carolina Residents
Strongyloides stercoralis is a parasitic roundworm of the human gastrointestinal tract that causes chronic infection with potentially fatal results. If untreated, infected children can develop irreversible long-term morbidity, including cognitive and growth delays. Sadly, vulnerable populations living in rural poverty are the most susceptible to infection, and this chronic disease further exacerbates the cycle of poverty. Strongyloides is known to persist throughout the rural, impoverished southeastern United States, but high-quality prevalence data are lacking because there is no ongoing surveillance. South Carolina has a long history of Strongyloides infection; however, no contemporary investigations have been performed since appropriated funding ended in 1984. This project aims to elucidate the prevalence of human Strongyloides infections in South Carolina using two complementary approaches. First, to estimate prevalence, the PI will perform active surveillance using Strongyloides serology testing via strategic sampling of a subset of banked serum samples from the ALL-IN COVID-19 study. Second, passive surveillance will be conducted via an electronic health records query for Strongyloides cases at the Prisma Health system. Demographic, socioeconomic, risk factor, and health outcomes data will be collected for all positive cases and two matched negative controls per case. These factors will be evaluated for their association with Strongyloides transmission risk and for clinical profiles. Lastly, geospatial statistics will be employed to create an infectious disease forecast model for public health intervention. In closing, the PI is an early-career faculty member at the UofSC School of Medicine with a strong interest in health disparities, and this award would provide critical pilot data for future R01 grant proposals on infectious diseases of poverty.
Abnormalities in Diurnal Behavior and Gene Expression Profile in the Brain: Relevance to Neurodevelopmental and Psychiatric Disorders
The proposed Big Data Health Science study is designed around the central hypothesis that diurnal behaviors are related to rhythmic differential gene expression. Our goal is to leverage information learned from our hypothesis-driven experimental animal system, evaluate an artificial intelligence and machine learning (AI-ML) approach to phenotyping behavioral dynamics (sleep and wake state behaviors), and investigate rhythmic molecular mechanisms in the brain through analysis of genomic data. We will re-analyze existing data sets collected from a well-validated experimental animal system relevant to the study of neurodevelopmental insults in the development of psychiatric illness. To this end, we hypothesize that elevated kynurenic acid during the prenatal period (‘EKyn’) disrupts diurnal behaviors and gene expression rhythms, and that Big Data analytics approaches can be leveraged to decipher these dynamics. Applications that we develop can be applied to validating preclinical experimental systems for behavioral and genetic endophenotypes associated with mental health disorders. In Aim #1, we will apply AI-ML approaches to differentiate homeostatic sleep and arousal parameters using rodent data that combines electroencephalography (EEG) and electromyography (EMG) to evaluate diurnal dynamics of rodent behavior in our neurodevelopmental insult paradigm (‘EKyn’). In Aim #2, we will evaluate the rhythmicity of gene expression in the brain and classify biologically relevant pathways in the brain that are disrupted in our neurodevelopmental insult paradigm (‘EKyn’). Taken together, preliminary data generated from the proposed Big Data Health Science project will be submitted for extramural funding to uncover behavioral (sleep and wake rhythms) and genomic (gene expression rhythms) endophenotypes linking neurodevelopment, molecular alterations, and neuropsychiatric illness.
Determining A Functional Hemoglobin Threshold to Define Anemia in Children And Women
Anemia affects approximately 40% of children, 30% of non-pregnant women, and 37% of pregnant women across the world, and is a key contributor to maternal mortality, birth outcomes, and child development. The World Health Organization (WHO) is currently reviewing the evidence around hemoglobin cutoffs used to define anemia, which were developed more than 50 years ago and based on predominantly white adult populations from North America and Europe. To inform the diagnosis of anemia, we have developed a strategy to examine hemoglobin concentrations that correspond with meaningful differences in functional health outcomes. Using data from children and from pregnant and non-pregnant women at all Prisma Health sites in the Upstate of South Carolina, our objectives are to: 1) identify hemoglobin thresholds associated with optimal health outcomes, such as birth weight, preterm birth, depression, and child developmental milestones, and 2) determine whether the etiology of anemia modifies the hemoglobin thresholds. A combination of area under the receiver operating characteristic curves, generalized additive models, and machine learning techniques will be used to generate hemoglobin thresholds. Results will inform WHO guidelines on the hemoglobin thresholds to define anemia and provide some of the first preliminary evidence for the potential role of anemia etiology in the association between hemoglobin concentrations and functional outcomes. This study will also form the basis for extramural grant applications to examine similar research questions in other US states, other countries, and additional populations, such as adolescents, adult men, and elders.
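As a rough illustration of how a receiver operating characteristic analysis can yield a cutoff, the sketch below scans candidate hemoglobin values and keeps the one that maximizes Youden's J (sensitivity + specificity - 1). This is only one common way to pick a threshold from an ROC curve; the abstract does not say which criterion the investigators will use, and the function name, the "flag when hemoglobin is below the cutoff" rule, and the binary adverse-outcome coding are assumptions made for the example.

```python
from typing import List, Tuple


def youden_threshold(hemoglobin: List[float], adverse: List[int]) -> Tuple[float, float]:
    """Return (cutoff, J) maximizing Youden's J = sensitivity + specificity - 1.

    Hypothetical helper: a person is flagged when hemoglobin < cutoff,
    and `adverse` is 1 if the functional outcome was adverse, else 0.
    """
    if len(hemoglobin) != len(adverse):
        raise ValueError("inputs must have the same length")
    n_pos = sum(adverse)
    n_neg = len(adverse) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one adverse and one non-adverse outcome")

    best_cutoff, best_j = hemoglobin[0], -1.0
    # Each observed hemoglobin value is tried as a candidate cutoff.
    for cutoff in sorted(set(hemoglobin)):
        true_pos = sum(1 for h, a in zip(hemoglobin, adverse) if a == 1 and h < cutoff)
        true_neg = sum(1 for h, a in zip(hemoglobin, adverse) if a == 0 and h >= cutoff)
        j = true_pos / n_pos + true_neg / n_neg - 1
        if j > best_j:  # ties keep the lowest cutoff
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j
```

In practice the outcome would be one of the functional measures named above (for example preterm birth), and the scan would be repeated within strata such as anemia etiology to see whether the selected cutoff shifts.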
Developing A Novel Network-Based Big Data Approach to Measure Healthcare Utilization Disparity: A Feasibility Study
Healthcare utilization is a critical factor that influences population health and wellbeing. To identify, explain, and address disparities and inequities in healthcare utilization, it is necessary to develop a valid measurement approach that can accurately capture the disparities and explore the factors that contribute to the disparities in a timely manner. Increasing attention is being paid to developing constructs or measurement approaches that can reflect complex interplays of factors at multiple socioecological levels. The availability of healthcare Big Data (e.g., large place visitation data sampled from mobile devices and electronic health records [EHR]) and advanced Big Data analytics makes it possible to use Big Data approaches to address existing knowledge gaps in measurement methodology including a lack of real-world evidence, limited availability of real-time and large-coverage datasets, and a dearth of studies applying multilevel perspectives. In this pilot project, we propose to develop a network-based big data approach to measure and visualize disparities in healthcare utilization in South Carolina (SC). Specifically, we will first develop a machine learning-based network prediction model to construct a statewide healthcare visitation network using cellphone-based place visitation data and ground-truth EHR data (for model training and refining); based on the validated statewide healthcare visitation network, we will detect actual catchment areas of healthcare facilities and develop healthcare utilization measures (indices) using geographically constrained network partition and aggregation. We will then test the performance and utility of the network-based big data approach in revealing healthcare utilization patterns using multivariate geo-visualization. 
Leveraging our fruitful collaboration with the state’s health department and health agencies and successful experiences with implementing NIH-funded Big Data studies since 2017, we will be able to develop this network-based big data approach for analyzing healthcare utilization disparity, which, with proven efficacy, will contribute to the paradigm shift from sampling-based study to population-based real-world study and the examination of interplays between factors at various socioecological levels. The research experiences and publications obtained from this pilot study could become a foundation for the PI to apply for other NIH grants in Big Data techniques and implications in public health.
Electronic Health Record Data to Evaluate the Effects of Cardiorespiratory Fitness on Chronic Kidney Disease
Chronic kidney disease (CKD) is a major global public health issue, affecting over 10% of the population worldwide. Over the past three decades the global mortality rate declined significantly for cardiovascular disease (CVD) and cancer, but a similar decline was not seen for CKD. CKD is also a risk multiplier in patients with hypertension and diabetes, and therefore poses a significant burden for individuals, healthcare systems, and societies through increased hospitalization, productivity loss, morbidity, and early mortality. Low physical activity (PA) has been identified as a strong risk factor for CVD, some cancers, and premature death. However, the beneficial effects of PA on CKD are rarely investigated, with the majority of evidence coming from cross-sectional studies. Intervention studies have primarily focused on patients in the later stages of disease, with small sample sizes and short follow-up. Lastly, African Americans (AA) are nearly two times more likely to develop CKD than whites and show a lower magnitude of improvement in cardiorespiratory fitness (CRF) in response to formal exercise interventions. There is a gap in our understanding of the type and level of CRF (threshold) necessary to attenuate the risk of developing CKD. This is especially true across age groups, sexes, and races, as the exercise volume necessary for health benefits is age-specific and may even be sex- and race-specific. A better understanding of these differences in the CRF-CKD association will lead to a more comprehensive and effective approach to the prevention and management of CKD in high-risk populations. In this pilot project we propose to identify a cohort of 761,520 US Veterans (713,425 men and 48,095 women, 17.6% AA, 5.3% Hispanic), ages 30-96 years, with sequential CRF data measured objectively by an exercise treadmill test (ETT) from the electronic health record data of the VA system.
This information provides us with a unique opportunity to assess the role of CRF in the prevention of CKD specific to age, sex, and race. Specifically, we propose to define the exercise threshold and identify values of CRF that are linked to the lowest risk of CKD. It is hypothesized that higher CRF is independently linked to a lower risk of incident CKD and differs by age, sex, and race. We will use the non-Veteran population from the Cooper Center Longitudinal Study (CCLS) to validate our study findings.
Examining Socioeconomic and Racial/Ethnic Disparities in Pedestrian and Bicycle Crashes Across South Carolina
This innovative study will compile South Carolina crash and population data and analyze differences in crash scores by census block group socioeconomic disadvantage and race/ethnicity. Empirical evidence illustrating the disproportionate effects pedestrian and bicyclist crashes have on disadvantaged groups can advance advocacy and policymaking efforts to address logistical and infrastructure concerns in under-resourced areas of South Carolina and beyond.
Identifying Risk Factors Associated with Health Disparities and Recovery Strategies in Perinatal Polysubstance Use Disorder and Adverse Birth Outcomes: A Multi-Data Source Analysis Approach
Substance use among young pregnant women is a significant public health concern that harms maternal health and birth outcomes, including low birthweight, poor brain development, impaired nervous system development, and behavioral and memory problems. However, most current studies focus on single-substance use in pregnancy, which does not reflect the reality that polysubstance use, in which pregnant women use multiple substances such as tobacco and alcohol, is more common. Research has begun to show that polysubstance use in pregnancy has more severe consequences for maternal and fetal health. It remains a critical but largely unexplored area, mainly because existing data are self-reported and because collecting such data is difficult and carries legal complications and public health implications. We therefore propose this project to explore multilevel determinants of exposure to and treatment of perinatal polysubstance use disorder using Twitter and statewide health utilization data, and further propose a clinical study that integrates findings from these sources into a holistic view of polysubstance use in pregnancy. This proposed project aims to 1) use natural language processing, text mining, and machine learning techniques to extract just-in-time polysubstance use disorder (PSUD) data among pregnant women on Twitter, and then explore their exposure and treatment, communication patterns, risky health perceptions, sentiment, and maternal and fetal health outcomes associated with polysubstance use in pregnancy; 2) analyze electronic health records to identify clinical, sociodemographic, and geographic factors associated with access to PSUD treatment; and 3) explore individual- and community-level stressors for adverse birth outcomes associated with prenatal PSUD through expert consultation and content analysis.
These results will serve to inform clinicians, public health officials, and policy makers on polysubstance use prevention and intervention programs and policy changes among this vulnerable group.
Incidence and Outcomes of a Cholinesterase Inhibitor-Anticholinergic Prescribing Cascade in Older Adults
Interventions such as deprescribing during a comprehensive geriatric assessment can reduce the adverse health and economic impact of identified prescribing cascades in older adults.5,16-18 Our long-term goal is to develop and test a deprescribing intervention in a cluster-randomized trial. The overall objective of this application, which is the next step toward this long-term goal, is to estimate the incidence rate and identify correlates of the cholinesterase inhibitor-anticholinergic prescribing cascade and examine its association with the risk of delirium, falls, cognitive decline, and death. We will conduct this study using real-world longitudinal data from the United Kingdom (UK) Clinical Practice Research Datalink (CPRD). We will use available data from 2000 to 2020. The CPRD is one of the largest databases of longitudinal medical records in the world and has been validated for epidemiological research for a range of conditions. Our overall hypothesis is that the incidence of the cholinesterase inhibitor-anticholinergic prescribing cascade and its associated adverse health outcomes have been increasing over the past two decades due to the global rise in the elderly population.
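To make the term "incidence rate" concrete, the sketch below computes a crude rate as events per 1,000 person-years of follow-up. It is only an illustration: the function name and the input layout (one tuple per patient with follow-up time in years and a flag for whether the cascade occurred) are hypothetical, and the example assumes each patient's follow-up has already been truncated at the first cascade event or at censoring, which the abstract does not specify.

```python
from typing import Iterable, Tuple


def incidence_rate_per_1000_py(follow_up: Iterable[Tuple[float, bool]]) -> float:
    """Crude incidence rate: events per 1,000 person-years.

    `follow_up` holds (years_at_risk, had_event) per patient; both the
    layout and the name are assumptions made for this illustration.
    """
    events = 0
    person_years = 0.0
    for years_at_risk, had_event in follow_up:
        if years_at_risk < 0:
            raise ValueError("follow-up time cannot be negative")
        person_years += years_at_risk
        if had_event:
            events += 1
    if person_years == 0:
        raise ValueError("no person-time observed")
    return 1000 * events / person_years
```

Computing the same quantity separately for each calendar year between 2000 and 2020 would give the trend that the overall hypothesis refers to.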
Prediction of Metabolic Syndrome Using Dietary Data and Machine Learning Methods
Multiple dietary patterns or indices have been developed to define a healthy diet and have been associated with metabolic syndrome, though not all studies are supportive. The complexity of diet can be difficult to capture with dietary patterns composed of typically only 8 to 15 food groups combined into unidimensional scores. Machine learning methods can account for high dimensional data such that many more dietary variables could be used to describe healthy dietary patterns, though these methods have been applied to dietary data infrequently to date. We propose to apply machine learning approaches to dietary assessment data, including macronutrients, micronutrients, foods, and food groups, to identify dietary patterns predictive of incident metabolic syndrome using data from a well-characterized prospective cohort study. We will compare the predictive accuracy of multiple machine learning approaches, including artificial neural networks, support vector machines, and classification and regression tree analyses, to traditional logistic regression approaches, and thus are addressing BDHSC Strategic Objective #5: Methodologic Advances. The successful completion of the project will formalize a nascent collaboration and generate high quality preliminary data for external grant proposals.
Uncover the Potential Age-varying Dynamics of Physical Activity and Cognition Using Behavioral Risk Factor Surveillance System
The older population is growing rapidly worldwide, and the prevalence of neuropsychological diseases such as Alzheimer’s disease and related dementias (ADRD) has become an urgent public health issue. Subjective experiences of cognitive decline or memory complaints are early signals in the lengthy pre-clinical phase of ADRD. One modifiable factor in sustaining cognitive health and reducing the risk of ADRD is regular physical activity (PA) participation. Current literature assumes that the beneficial effect of physical activity on cognition is static across time and race/ethnicity groups from middle age to late adulthood. This traditional perspective may not be accurate, as aging is a dynamic and ongoing process in which the biological, physical, and cognitive systems change continuously throughout adulthood. Individuals in the same race/ethnicity group also share similar demographic, social-behavioral, and contextual characteristics as they age. Thus, there may be specific age windows in which specific race/ethnicity groups do or do not respond favorably to PA engagement in terms of cognitive benefits.
This pilot study is designed to fill the literature gap by investigating the potential dynamics and disparities in the association between PA and cognition from midlife to late life. Specifically, this study will leverage the novel Time-varying Effect Modeling to analyze nationally representative big data collected from the Behavioral Risk Factor Surveillance System (BRFSS, merged with the optional cognitive decline module) to advance knowledge in physical activity research and ADRD prevention. Answering the proposed research question can help researchers and practitioners to identify specific age time windows in which subgroups of individuals may require extra resources and tailored interventions to help them sustain cognition and reduce ADRD risk. Findings from this pilot study have great potential to facilitate new developments in the methodology, theory, and practice of aging and health disparity research, which, in turn, will lead to further grant funding for studying the underlying dynamics of health behavior and outcomes at the population level.