-
The Behavioral Risk Factor Surveillance System (BRFSS) conducted telephone surveys across the U.S. to collect data on health risk behaviors, chronic health conditions, and use of preventive services.
-
A subset of the data for Alabama, containing both categorical and numerical variables, was retrieved from the BRFSS database.
-
The client's internal team performs an initial analysis to determine whether there is a lack of medical care in Alabama and, if so, why.
-
The data set was stored in CSV format, along with the codebook, on a secure hard drive kept in a locked room. Secure storage alone does not guarantee ethical use, however: the data must also be managed carefully to avoid bias or misclassification in population-level analysis.
-
The data set and codebook are copied to a flash drive, delivered to the new location, and immediately imported into the new SQL database.
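Because the files pass through a flash drive before import, one simple safeguard is to confirm that the copy is byte-for-byte identical to the original. The sketch below compares SHA-256 digests using Python's standard `hashlib`; the function names `sha256_of` and `verify_copy` are hypothetical, and the scenario does not say any such check was performed.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read fixed-size chunks until read() returns b"" at end of file.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copy(original: Path, copy: Path) -> bool:
    """Hypothetical helper: True if the copied file matches the original byte for byte."""
    return sha256_of(original) == sha256_of(copy)
```

In practice the digest of the source CSV would be recorded before copying and checked again at the new location, so a corrupted transfer is caught before the import step rather than after.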
-
The analyst finds that the SQL database contains fewer records than the original CSV file. The missing records may compromise the validity of the analysis; for instance, missing values for heart attack (CVDINFR4) or coronary heart disease (CVDCRHD4) could skew conclusions about heart disease.
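A discrepancy like this can be quantified by reconciling record counts between the source CSV and the imported table, and by counting empty values in the variables of interest. The sketch below assumes an SQLite database (via Python's `sqlite3`) as a stand-in for the unnamed SQL database; the table name and helper functions are hypothetical, and the column names are passed in by the caller.

```python
import csv
import sqlite3
from pathlib import Path


def csv_row_count(path: Path, encoding: str = "utf-8") -> int:
    """Count data records in a CSV file, excluding the header row.

    csv.reader counts logical records, so quoted fields containing
    newlines are not double-counted. Undecodable bytes are replaced
    rather than raising, so every record in the file is counted.
    """
    with path.open(newline="", encoding=encoding, errors="replace") as fh:
        reader = csv.reader(fh)
        next(reader, None)  # skip the header row
        return sum(1 for _ in reader)


def table_row_count(conn: sqlite3.Connection, table: str) -> int:
    """Count rows in a table (table name assumed trusted, not user input)."""
    return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]


def missing_counts(conn: sqlite3.Connection, table: str, columns: list) -> dict:
    """Return the number of NULL or empty-string values for each listed column."""
    result = {}
    for col in columns:
        sql = f'SELECT COUNT(*) FROM "{table}" WHERE "{col}" IS NULL OR "{col}" = \'\''
        result[col] = conn.execute(sql).fetchone()[0]
    return result
```

Comparing `csv_row_count` with `table_row_count` shows how many records were lost, while `missing_counts` (for example, with `["CVDINFR4"]`) indicates how much the surviving heart-disease variables are affected.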
-
A new analyst, hired in June 2018, is tasked with reviewing the master dataset again to determine key health priorities for the new facility.
-
Characters in the original CSV data set that the SQL database cannot read cause some rows to fail during import, resulting in partial data loss. No data validation process or error logs are mentioned.
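An import that logs and quarantines failing rows, instead of dropping them silently, would have made this loss visible. The sketch below is one way to do that with Python's standard `csv`, `logging`, and `sqlite3` modules (SQLite is an assumed stand-in for the actual database). It assumes UTF-8 input and no newlines inside quoted fields, so each physical line is one record; `import_csv` and the logger name are hypothetical.

```python
import csv
import logging
import sqlite3
from pathlib import Path

log = logging.getLogger("brfss_import")  # hypothetical logger name


def import_csv(path: Path, conn: sqlite3.Connection, table: str) -> dict:
    """Load a CSV into `table`, logging and quarantining rows that cannot be read."""
    with path.open("rb") as fh:
        raw_lines = fh.read().splitlines()
    # The header is decoded with utf-8-sig so a leading byte-order mark is ignored.
    header = next(csv.reader([raw_lines[0].decode("utf-8-sig")]))
    cols = ", ".join(f'"{c}"' for c in header)
    marks = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')
    insert = f'INSERT INTO "{table}" ({cols}) VALUES ({marks})'

    loaded, rejected = 0, []
    for lineno, raw in enumerate(raw_lines[1:], start=2):
        try:
            text = raw.decode("utf-8")
        except UnicodeDecodeError as exc:
            # Record the line number and raw bytes instead of silently skipping.
            log.warning("line %d rejected: %s", lineno, exc)
            rejected.append((lineno, raw))
            continue
        row = next(csv.reader([text]), [])
        if len(row) != len(header):
            log.warning("line %d rejected: expected %d fields, got %d",
                        lineno, len(header), len(row))
            rejected.append((lineno, raw))
            continue
        conn.execute(insert, row)
        loaded += 1
    conn.commit()

    # Validation step: every source record is either loaded or explicitly rejected.
    total = len(raw_lines) - 1
    if loaded + len(rejected) != total:
        raise RuntimeError("row accounting mismatch")
    return {"total": total, "loaded": loaded, "rejected": rejected}
```

The returned report and the warning log give the analyst both the size of the loss and the exact lines to inspect; rejected rows could then be re-decoded with the correct encoding and re-imported rather than lost.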