BRFSS Alabama Subset Data Lifecycle

  • Data Collection via BRFSS

    Through the Behavioral Risk Factor Surveillance System (BRFSS), the CDC conducted telephone surveys to gather data on health-related behaviors and conditions from adults living in Alabama. Topics included heart disease, cholesterol, blood pressure, and access to medical care. Because the responses are self-reported, they may be less accurate than clinical measurements, for example through recall errors or socially desirable answers. The dataset includes both quantitative variables (such as weight and age) and categorical variables (such as smoking status, insurance coverage, and yes/no diagnoses).
  • Initial Data Storage

    The Alabama responses were saved as a CSV file on a secure hard drive kept in a locked room, and a codebook documented each variable and its format. At this stage the file had not been altered or used for analysis since collection, so it preserved its original integrity and was the most reliable version of the data. Legal and ethical concerns were minimal because BRFSS data is de-identified, but the file still had to be handled responsibly as public health data.
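    The codebook is what makes it possible to check, at any later stage, whether the file still matches its documented format. The sketch below is a minimal illustration in Python, assuming a hypothetical codebook that maps each column to an expected type and, for categorical variables, a set of allowed codes; the column names, codes, and sample rows are invented for the example and are not actual BRFSS variables.

```python
import csv
import io

# Hypothetical codebook: column name -> (kind, allowed codes or None).
# Names and codes are illustrative only, not real BRFSS variables.
CODEBOOK = {
    "record_id": ("int", None),
    "age": ("int", None),
    "weight_lb": ("int", None),
    "smoker": ("code", {"1", "2", "9"}),
    "insured": ("code", {"1", "2", "9"}),
}


def validate_rows(csv_text):
    """Return (row_number, message) pairs for values that break the codebook."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = set(CODEBOOK) - set(reader.fieldnames or [])
    if missing:
        return [(1, f"missing columns: {sorted(missing)}")]
    problems = []
    # Row 1 is the header; assumes one record per physical line.
    for row_number, row in enumerate(reader, start=2):
        for column, (kind, allowed) in CODEBOOK.items():
            value = row.get(column) or ""  # short rows yield None
            if kind == "int" and not value.isdigit():
                problems.append((row_number, f"{column}={value!r} is not an integer"))
            elif kind == "code" and value not in allowed:
                problems.append((row_number, f"{column}={value!r} is not a valid code"))
    return problems


# Toy data: the second record has a non-numeric age and an undocumented code.
sample = "record_id,age,weight_lb,smoker,insured\n1,54,180,1,2\n2,forty,165,7,1\n"
for problem in validate_rows(sample):
    print(problem)
```

    Running the same check on the untouched CSV gives a baseline, so any new problems reported after a later transfer or import point to changes introduced along the way.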
  • Data Migration to New Facility

    When the organization expanded, the CSV file and codebook were carried on a flash drive to a new facility and imported into a newly established SQL database for analysis. This transfer introduced new risk, because copying files manually and importing them into a different system can change their formatting and character encoding. The original data had now been handled by IT personnel, and for the first time since collection it was being actively managed and altered rather than simply stored.
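    One common way to confirm that a manual transfer did not change a file is to compute a cryptographic digest before the file leaves the old location and again after it arrives; identical digests mean the bytes were copied unchanged. A minimal sketch using Python's standard hashlib module, with temporary files standing in for the original and the flash-drive copy (the file names are hypothetical). Note that a matching digest only verifies the copy, not the later SQL import.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_is_intact(source_path, copy_path):
    """True if the copy is byte-for-byte identical to the source."""
    return sha256_of(source_path) == sha256_of(copy_path)


with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "brfss_alabama.csv")       # hypothetical name
    copy = os.path.join(tmp, "brfss_alabama_transfer.csv")  # hypothetical name
    data = "record_id,state\n1,Alabama\n".encode("utf-8")   # toy content
    for path in (original, copy):
        with open(path, "wb") as f:
            f.write(data)
    first_check = copy_is_intact(original, copy)

    # Simulate a copy whose line endings were converted during transfer.
    with open(copy, "wb") as f:
        f.write(data.replace(b"\n", b"\r\n"))
    second_check = copy_is_intact(original, copy)

print(first_check, second_check)  # True False
```

    Recording the digest alongside the codebook at initial storage would let anyone verify later copies against the original.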
  • Data Import Issues Discovered

    A newly hired data analyst found that errors during the SQL import had corrupted the data. Special characters in the CSV file could not be read, and some rows failed to import entirely, so the SQL table contained fewer records than the CSV and data was lost. Any analysis run on the incomplete SQL version would therefore have reduced quality and validity: its results could be distorted and could lead to misinformed healthcare priorities.
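    Both symptoms, unreadable special characters and a lower record count, can be checked directly. A minimal sketch, assuming the SQL database is SQLite (accessed through Python's built-in sqlite3 module), a hypothetical table named responses, and that the CSV was meant to be UTF-8; if it was actually exported in another encoding such as Windows-1252, that codec should be passed instead. The toy rows are invented for the example.

```python
import csv
import io
import sqlite3


def find_undecodable_lines(raw_bytes, encoding="utf-8"):
    """Return 1-based line numbers whose bytes are not valid in `encoding`."""
    bad = []
    for line_no, line in enumerate(raw_bytes.splitlines(), start=1):
        try:
            line.decode(encoding)
        except UnicodeDecodeError:
            bad.append(line_no)
    return bad


def count_csv_records(raw_bytes, encoding="utf-8"):
    """Count non-empty data rows (excluding the header); bad bytes are replaced."""
    text = raw_bytes.decode(encoding, errors="replace")
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    return max(len(rows) - 1, 0)


def count_sql_records(conn, table="responses"):
    # Table name is a fixed, trusted constant here, not user input.
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


# Toy CSV: the last line contains a byte (0xE9) that is invalid as UTF-8.
raw = "record_id,city\n1,Montgomery\n2,Mobile\n3,Dothan\n".encode("utf-8") + b"4,Caf\xe9\n"

# Simulate an import that silently dropped the undecodable row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE responses (record_id INTEGER, city TEXT)")
conn.executemany("INSERT INTO responses VALUES (?, ?)",
                 [(1, "Montgomery"), (2, "Mobile"), (3, "Dothan")])

csv_n, sql_n = count_csv_records(raw), count_sql_records(conn)
print(find_undecodable_lines(raw))  # [5]
if csv_n != sql_n:
    print(f"Record count mismatch: CSV has {csv_n}, SQL has {sql_n}")
```

    A count comparison only shows that records are missing; identifying which ones requires comparing record identifiers between the two versions.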
  • Data Recovery and Analysis

    A team of analysts is reviewing the discrepancies between the CSV and SQL versions to recover the missing records and verify the accuracy of the imported data before it is used in future research. The original CSV remains the source of truth. Going forward, the data will be stored and managed in SQL using structured tables with appropriate constraints and audit trails, so that invalid values are rejected and every change is recorded. The organization has an ethical responsibility to ensure the data is complete and accurate before it is used to inform healthcare decisions.
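    The recovery step and the target table design can be sketched together. This is a minimal illustration, again assuming SQLite through Python's sqlite3 module: a hypothetical responses table uses a primary key and CHECK constraints, a trigger writes every change to smoking status into an audit table, and records are reconciled against the CSV (treated as the source of truth) by comparing record IDs. All table, column, and trigger names, the age range, and the allowed codes are illustrative assumptions rather than the organization's actual schema.

```python
import csv
import io
import sqlite3

SCHEMA = """
CREATE TABLE responses (
    record_id INTEGER PRIMARY KEY,  -- one row per respondent
    age       INTEGER NOT NULL CHECK (age BETWEEN 18 AND 120),
    smoker    TEXT    NOT NULL CHECK (smoker IN ('1', '2', '9'))
);
CREATE TABLE responses_audit (
    audit_id    INTEGER PRIMARY KEY AUTOINCREMENT,
    record_id   INTEGER NOT NULL,
    column_name TEXT NOT NULL,
    old_value   TEXT,
    new_value   TEXT,
    changed_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
-- Log changes to smoking status; similar triggers could cover other columns.
CREATE TRIGGER audit_smoker AFTER UPDATE OF smoker ON responses
WHEN OLD.smoker IS NOT NEW.smoker
BEGIN
    INSERT INTO responses_audit (record_id, column_name, old_value, new_value)
    VALUES (OLD.record_id, 'smoker', OLD.smoker, NEW.smoker);
END;
"""


def missing_from_sql(csv_text, conn):
    """Return CSV rows whose record_id is absent from the SQL table."""
    csv_rows = {int(r["record_id"]): r for r in csv.DictReader(io.StringIO(csv_text))}
    sql_ids = {row[0] for row in conn.execute("SELECT record_id FROM responses")}
    return [csv_rows[i] for i in sorted(csv_rows.keys() - sql_ids)]


# Toy source-of-truth CSV and an incomplete SQL copy missing record 2.
csv_text = "record_id,age,smoker\n1,45,1\n2,62,2\n3,38,9\n"
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO responses VALUES (?, ?, ?)", [(1, 45, "1"), (3, 38, "9")])

# Re-import dropped rows; the CHECK constraints reject out-of-range values.
for row in missing_from_sql(csv_text, conn):
    conn.execute("INSERT INTO responses VALUES (?, ?, ?)",
                 (int(row["record_id"]), int(row["age"]), row["smoker"]))

# Any later correction is recorded by the trigger.
conn.execute("UPDATE responses SET smoker = '2' WHERE record_id = 1")
print(conn.execute("SELECT COUNT(*) FROM responses").fetchone()[0])  # 3
print(conn.execute("SELECT record_id, old_value, new_value FROM responses_audit").fetchall())
```

    Keying the comparison on a record identifier, rather than on row position, keeps the reconciliation correct even when the import reordered or skipped rows.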