Research Data Repositories
The collection listed below contains datasets managed by the Big Data Health Science Center (BDHSC) that are of interest to the BDHSC research community. If you have questions, or datasets that you would like to share with the data science community, please email Dilek Akgun at AKGUN@mailbox.sc.edu.

How to Get Access to Research Data Repositories
Submit a Research Data Request
Request a specific dataset for research purposes.
BDHSC Research Datasets
Data Coordinators
Data Description
The cellphone-based population flows were extracted from SafeGraph data by the Geoinformation and Big Data Research Lab at the Center for GIScience and Geospatial Big Data (CeGIS), in collaboration with BDHSC, for academic research purposes. The data contain monthly and weekly visitation flows originating from over 230,000 Census Block Groups (CBGs) to over 5 million Points of Interest (POIs) in the US from 01/01/2018 to 08/30/2022. These visitation flows are called “Origin-Destination-Time (ODT)” flows because each flow is a visitation record (number of visitors) from a CBG (Origin) to a POI (Destination) during a specific period (Time). In total, the dataset has 9.5 billion ODT flows and can be requested in two formats: 1) ODT flows filtered by time (year, month, week) and geographic location, and 2) ODT flows aggregated spatially and/or temporally.
Level of Access
USC Researchers
Mode of Access
Remote online access
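The two request formats for the ODT flows described above can be illustrated with a minimal Python sketch. The record layout, field order, POI identifiers, and sample values here are assumptions made for illustration only; they are not the dataset's actual schema. The sketch assumes origins are identified by standard 12-digit CBG GEOIDs, whose first five digits are the state and county FIPS codes.

```python
from collections import defaultdict

# Hypothetical ODT records: (origin CBG GEOID, POI id, year, month, visitors).
# Field order and values are illustrative, not the dataset's real schema.
SAMPLE_FLOWS = [
    ("450790001001", "poi-a", 2020, 1, 12),
    ("450790001002", "poi-a", 2020, 1, 8),
    ("450790001001", "poi-a", 2020, 2, 5),
    ("450630002001", "poi-b", 2020, 1, 7),
]


def filter_flows(flows, year=None, month=None, cbg_prefix=""):
    """Format 1: keep flows matching a time window and an origin GEOID prefix."""
    return [
        f for f in flows
        if (year is None or f[2] == year)
        and (month is None or f[3] == month)
        and f[0].startswith(cbg_prefix)
    ]


def aggregate_to_county_year(flows):
    """Format 2: sum visitors per (origin county, POI, year).

    Truncating a 12-digit CBG GEOID to its first 5 digits (state + county
    FIPS) rolls block groups up to counties; dropping the month rolls
    monthly flows up to years.
    """
    totals = defaultdict(int)
    for cbg, poi, year, _month, visitors in flows:
        totals[(cbg[:5], poi, year)] += visitors
    return dict(totals)
```

Calling filter_flows(SAMPLE_FLOWS, year=2020, month=1, cbg_prefix="45079") keeps only the January 2020 flows that originate in block groups whose GEOID starts with 45079, while aggregate_to_county_year(SAMPLE_FLOWS) collapses the same records to one total per county, POI, and year.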
Data Coordinators
Data Description
The Twitter data were collected by the Geoinformation and Big Data Research Lab at the Center for GIScience and Geospatial Big Data (CeGIS) for academic research purposes. This is a live dataset of worldwide tweets covering more than 10 years, from 2012 to the present (real-time tweets are collected around the clock). As of December 2022, the total number of tweets is around 18.6 billion. The database holds two types of Twitter data: geotagged tweets and randomly sampled tweets. The geotagged tweets are continuously collected using the official Twitter Streaming API with geo filters. The randomly sampled tweets were downloaded from the Internet Archive. All tweets have been cleaned and converted to CSV files, with one row per tweet. The tweets can be requested in two formats: 1) individual tweet IDs filtered by designated keywords (e.g., COVID, HIV, Hurricane, Climate change), time period (year, month, day, hour), and geographic location (e.g., Columbia, SC; New York City; Japan); and 2) a spatially and/or temporally aggregated format (e.g., number of tweets in each county during a period; daily number of tweets mentioning COVID-19 in the US).
Level of Access
USC Researchers
Mode of Access
Remote online access
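As a rough illustration of the two tweet request formats described above, the following Python sketch filters a small CSV with one row per tweet and then aggregates it by day. The column names (tweet_id, created_at, place, text) and the sample rows are hypothetical; the actual CSV layout of the BDHSC Twitter data may differ.

```python
import csv
import io
from collections import Counter

# Hypothetical CSV layout (one row per tweet); real column names may differ.
SAMPLE_CSV = """tweet_id,created_at,place,text
1,2020-03-01T10:00:00,"Columbia, SC",COVID cases rising
2,2020-03-01T12:30:00,"New York City",Nice weather today
3,2020-03-02T08:15:00,"Columbia, SC",Got my covid test
4,2020-03-02T09:00:00,"Columbia, SC",Hurricane season prep
"""


def load_tweets(csv_text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def matching_ids(tweets, keyword, place=None):
    """Format 1: IDs of tweets containing the keyword (case-insensitive),
    optionally restricted to an exact place name."""
    kw = keyword.lower()
    return [
        t["tweet_id"] for t in tweets
        if kw in t["text"].lower() and (place is None or t["place"] == place)
    ]


def daily_counts(tweets, keyword):
    """Format 2: number of keyword-matching tweets per calendar day,
    taking the first 10 characters of the ISO timestamp as the date."""
    kw = keyword.lower()
    return Counter(
        t["created_at"][:10] for t in tweets if kw in t["text"].lower()
    )
```

With these sample rows, matching_ids(load_tweets(SAMPLE_CSV), "covid") selects the tweet IDs for the first format, and daily_counts(load_tweets(SAMPLE_CSV), "covid") produces the per-day totals for the second. Simple substring matching is used here only to keep the sketch short; it is not a statement about how the data coordinators filter tweets.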
Request Information
If you have any further questions about the BDHSC Datasets, you can reach out to us by filling out the following form, and we will respond promptly. Please provide a detailed description to ensure that we have the information needed to assist you.