Methodology

Risk Environment Framework

Organization and data selection for the OEPS is guided by a risk environment framework (Rhodes 2002). The “risk environment” as a framework for understanding and reducing drug-related harms emphasizes the broader contextual domains in which opioid use occurs.

Our approach is further rooted in the socioecological model of substance use, which includes multiple levels of the physical and social environments that interact and overlap to impact health. We apply this socioecological model of health to build on previous research on risk environments (Cooper et al. 2016, Ciccarone 2017).

Data and research models must reflect this transdisciplinary and multi-level approach. The risk environment framework shifts the focus of drug-related harm research away from individuals and toward the environmental factors that drive or enable trends at the community level. We see this approach as encouraging a greater understanding of the spatial and community contexts in which opioid-related harm occurs.

Multiple Spatial Scales

Most datasets are available at multiple spatial scales, including Census tract, ZIP Code or ZIP Code Tabulation Area (ZCTA), county, and state. You can filter and explore available datasets by spatial scale here.

Data Themes

Data included in the OEPS is grouped thematically for ease of exploration, index development, and model integration. Data is stratified across the following themes:

  • Social: Household Characteristics, Demographic measures, Race and Ethnicity, Disability measures, Incarceration rates, Veteran population, Educational Characteristics, Residential Segregation Measures, and Community Overlay Variables
  • Economic: Employment Trends, Poverty Measures, and Income Inequality
  • Environment: Access to Healthcare Providers, Housing Characteristics, Internet Access, Greenspace Measures, Urbanicity/Rurality, and Alcohol Outlet Density
  • Policy: State, county, and local policies that may influence access to treatment and/or criminal justice
  • Outcomes: Opioid Indicators and Hepatitis C Rate measures

Note: Historical COVID-19 data was moved to the US COVID Atlas, a free and open source pandemic data archive and visualization tool, also led by the Healthy Regions and Policies Lab.

The OEPS also includes geography boundary shapefiles from the US Census Bureau’s TIGER/Line (multiple years) for Census tracts, ZCTAs, counties, and states.

Census Vintages

The OEPS data warehouse holds values for variables that span several decades. Depending on the data year, a different Census vintage must be used for spatial analysis. In general, for any data from 2010 or earlier, use the 2010 Census vintage. For 2011-2022 data, use the 2018 vintage, and for datasets from 2023 and later, use the 2020 vintage. Our prepared data packages include a single set of geography files (easy!). If you are downloading individual CSVs directly, you can use the CSV download tables to determine which geography vintage the data should be joined to.
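
As a rough illustration of the rule of thumb above, the year-to-vintage lookup could be sketched in Python as follows. The function name `census_vintage` is hypothetical and not part of any OEPS tooling, and the CSV download tables remain the authoritative source for any individual file.

```python
def census_vintage(data_year: int) -> int:
    """Return the geography vintage suggested for a given data year.

    Hypothetical helper that only encodes the general guidance:
    2010 or earlier -> 2010, 2011-2022 -> 2018, 2023 and later -> 2020.
    """
    if data_year <= 2010:
        return 2010
    if data_year <= 2022:
        return 2018
    return 2020


# Example: choose geography files for a few data years.
for year in (2000, 2015, 2023):
    print(year, "->", census_vintage(year))
```

A lookup like this is only a starting point; exceptions for specific datasets should be checked against the download tables.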

Data Standards

"No data" values in our CSVs are simply blank entries. A blank entry means that the source dataset has no value for that variable and geographic unit. Keep in mind that "0" is a valid value for many different measures and should not be treated as "no data".

Most variable names are no more than 10 characters (with some exceptions) for ease of data wrangling with shapefiles and GIS software. Some variable names are therefore shortened or abbreviated from the source data.

Numbers are either integers or rounded to two decimal places. In our data registry you can find a full list of every variable and its data type.
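
To make the conventions for blank entries and number formats concrete, here is a minimal Python sketch using only the standard library. The sample CSV content, the column name `ExampleVar`, and the helper `parse_value` are made up for illustration and do not come from an actual OEPS file.

```python
import csv
import io
from typing import Optional, Union


def parse_value(raw: str) -> Optional[Union[int, float]]:
    """Parse one CSV cell: blank means "no data", anything else is a number."""
    text = raw.strip()
    if text == "":
        return None  # blank entry: no value in the source dataset
    try:
        return int(text)  # "0" is a real value, not "no data"
    except ValueError:
        return float(text)  # non-integers are rounded to two decimals upstream


# Hypothetical sample data; the variable name is an assumption.
SAMPLE_CSV = "HEROP_ID,ExampleVar\n050US17019,0\n050US17031,\n050US17043,12.5\n"

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
values = {row["HEROP_ID"]: parse_value(row["ExampleVar"]) for row in rows}
print(values)
```

The important design choice is to test for the empty string rather than for truthiness, so that a legitimate 0 is never mistaken for a missing value.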

We use a HEROP_ID field in all geography files and CSVs to serve as a unified geographic identifier for joins. Other common identifiers, like GEOID, ZIP5, and FIPS may also be included, depending on the file. The HEROP_ID is similar to what the American FactFinder used (now data.census.gov), and it consists of three parts:

  1. The 3-digit Summary Level Code for this geography. Common summary level codes are:
    • 040 -- State
    • 050 -- County
    • 140 -- Census Tract
    • 860 -- ZIP Code Tabulation Area (ZCTA)
  2. The 2-letter string US
  3. The standard GEOID for the given unit (length depends on what type of unit)
    • GEOIDs are, in turn, hierarchical aggregations of FIPS codes

Expanding out the FIPS codes for the four summary levels shown above, the full IDs would look like:

Summary Level   Format                           HEROP_ID Length   Example
State           040US + STATE                    7                 040US17 (Illinois)
County          050US + STATE + COUNTY           10                050US17019 (Champaign County)
Tract           140US + STATE + COUNTY + TRACT   16                140US17019005900
ZCTA            860US + ZIP CODE                 10                860US61801
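
The composition of the three parts can be sketched in Python as below. This is an illustrative helper, not an official OEPS function: the name `make_herop_id` and both lookup dictionaries are assumptions, with the GEOID lengths inferred from the HEROP_ID lengths in the table (total length minus the 5-character prefix).

```python
# Summary level codes from the list above.
SUMMARY_LEVELS = {"state": "040", "county": "050", "tract": "140", "zcta": "860"}

# Expected GEOID lengths, inferred from the table (HEROP_ID length minus 5).
GEOID_LENGTHS = {"state": 2, "county": 5, "tract": 11, "zcta": 5}


def make_herop_id(level: str, geoid: str) -> str:
    """Build a HEROP_ID from a geography level and a standard GEOID.

    The GEOID is left-padded with zeros in case leading zeros were lost,
    for example when a spreadsheet stored the column as a number.
    """
    code = SUMMARY_LEVELS[level]
    padded = str(geoid).zfill(GEOID_LENGTHS[level])
    return f"{code}US{padded}"


print(make_herop_id("county", "17019"))
print(make_herop_id("tract", "17019005900"))
```

Keeping the GEOID as a string throughout avoids silently dropping leading zeros for states whose FIPS codes begin with 0.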

The advantages of this composite ID are:

  1. Unique across all geographic areas in the US
  2. Will always be forced to string formatting
  3. Easy to programmatically change back into the more standard GEOIDs

Converting HEROP_IDs to GEOIDs (integers)

HEROP_IDs can be converted back to standard GEOIDs by removing the first 5 characters, or by taking everything after the substring "US". Here are some examples of what this looks like in different environments:

Environment   Example
Excel         REPLACE(A1, 1, 5, "")
R             geoid <- str_split_i(HEROP_ID, "US", -1)
Python        geoid = HEROP_ID.split("US")[1]
JavaScript    const geoid = HEROP_ID.split("US")[1]
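
Building on the Python one-liner, a slightly more defensive version might also return the summary level code and reject malformed input. The function name `split_herop_id` is hypothetical; this is a sketch under the ID layout described above, not part of any OEPS library.

```python
from typing import Tuple


def split_herop_id(herop_id: str) -> Tuple[str, str]:
    """Split a HEROP_ID into (summary level code, GEOID).

    Both parts are returned as strings so that leading zeros survive.
    """
    summary_level, sep, geoid = herop_id.partition("US")
    if len(summary_level) != 3 or sep != "US" or not geoid:
        raise ValueError(f"Not a valid HEROP_ID: {herop_id!r}")
    return summary_level, geoid


level, geoid = split_herop_id("140US17019005900")
print(level, geoid)
```

Note that casting the result to an integer drops any leading zero (in Python, `int("06")` is `6`), so keep the GEOID as a string if the file you are joining to stores GEOIDs as text.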

Documentation

Please refer to the Metadata Docs for more information about individual datasets and variables.