Reusability

Introduction

A reusable approach incorporates setup and post-processing, which take place before and after calling for data from the API. Let’s say that you are interested in the fraction of women who have given birth in the U.S., broken down by age.

The non-reusable part

Use search_in_columns() to find tables that are relevant to our question.

WB_TABLES <- hercacstables::search_in_columns(
    hercacstables::METADATA_FOR_ACS_GROUPS,
    Group = "\\d$", # the group must end in a digit, so only "all races" tables
    Universe = "wom[ae]n", # matches woman and women
    Description = c("birth", "year") # the census doesn't use "age" much
)

Group	Universe	Description	ACS1	ACS5
B13002	Women 15 to 50 years	Women 15 to 50 Years Who Had a Birth in the Past 12 Months by Marital Status and Age	TRUE	TRUE
B13004	Women 15 to 50 years in households	Women 15 to 50 Years Who Had a Birth in the Past 12 Months by Presence of Spouse or Unmarried Partner	TRUE	TRUE
B13008	Women 15 to 50 years	Women 15 to 50 Years Who Had a Birth in the Past 12 Months by Marital Status and Nativity	TRUE	TRUE
B13010	Women 15 to 50 years for whom poverty status is determined	Women 15 to 50 Years Who Had a Birth in the Past 12 Months by Marital Status and Poverty Status in the Past 12 Months	TRUE	TRUE
B13012	Women 16 to 50 years	Women 16 to 50 Years Who Had a Birth in the Past 12 Months by Marital Status and Labor Force Status	TRUE	TRUE
B13014	Women 15 to 50 years	Women 15 to 50 Years Who Had a Birth in the Past 12 Months by Marital Status and Educational Attainment	TRUE	TRUE
B13015	Women 15 to 50 years	Women 15 to 50 Years Who Had a Birth in the Past 12 Months by Marital Status and Receipt of Public Assistance Income in the Past 12 Months	TRUE	TRUE
B13016	Women 15 to 50 years	Women 15 to 50 Years Who Had a Birth in the Past 12 Months by Age	TRUE	TRUE

It looks like table B13016 is exactly what we are looking for.

Setup

Define a glossary of variable meanings

First, create a glossary table that maps from the Census variables you need to the real-world meanings that you actually care about.

The following example glosses 12 variables from “B13016.” Each specific variable encodes four columns’ worth of data.

GLOSSARY_OF_WOMEN_AND_BIRTHS <- "B13016" |>
    hercacstables::unpack_group_details() |>
    dplyr::filter(
        .data$Dataset == "ACS1", # there are details for 1- and 5-year datasets
        dplyr::if_all(c("A", "B"), \(.)nchar(.) > 0) #
    ) |>
    dplyr::mutate(
        `Gave Birth` = stringr::str_detect(.data$A, "had"),
        `Lower Age` = stringr::str_extract(.data$B, "^\\d{2}"),
        `Upper Age` = stringr::str_extract(.data$B, "(?<!^)\\d{2}"),
        dplyr::across(tidyselect::ends_with("Age"),
                      as.integer)
    ) |>
    dplyr::select(
        !tidyselect::any_of(c("A", "B"))
    )

Dataset	Group	Index	Variable	Gave Birth	Lower Age	Upper Age
ACS1	B13016	3	B13016_003E	TRUE	15	19
ACS1	B13016	4	B13016_004E	TRUE	20	24
ACS1	B13016	5	B13016_005E	TRUE	25	29
ACS1	B13016	6	B13016_006E	TRUE	30	34
ACS1	B13016	7	B13016_007E	TRUE	35	39
ACS1	B13016	8	B13016_008E	TRUE	40	44
ACS1	B13016	9	B13016_009E	TRUE	45	50
ACS1	B13016	11	B13016_011E	FALSE	15	19
ACS1	B13016	12	B13016_012E	FALSE	20	24
ACS1	B13016	13	B13016_013E	FALSE	25	29
ACS1	B13016	14	B13016_014E	FALSE	30	34
ACS1	B13016	15	B13016_015E	FALSE	35	39
ACS1	B13016	16	B13016_016E	FALSE	40	44
ACS1	B13016	17	B13016_017E	FALSE	45	50

Calling the API

Define a fetching function

Having defined a reusable glossary, we can now define a reusable fetching function.

fetch_women_and_births <- function(...) {
    hercacstables::fetch_data(
        variables = GLOSSARY_OF_WOMEN_AND_BIRTHS$Variable,
        survey_type = "acs",
        table_or_survey_code = "acs1",
        ...
    )
}

Send the request to the API

Run the fetch_data() command by itself so that you can cache it. In this example, we will ask for the most recent data for the whole country.

RAW_WOMEN_AND_BIRTHS <- fetch_women_and_births(
    year = hercacstables::most_recent_vintage("acs", "acs1"),
    for_geo = "us",
    for_items = "*"
)

us	Group	Index	Value	Year
1	B13016	3	91,622	2023
1	B13016	4	548,375	2023
1	B13016	5	989,964	2023
1	B13016	6	1,201,077	2023
1	B13016	7	792,711	2023
1	B13016	8	269,902	2023
1	B13016	9	125,282	2023
1	B13016	11	10,712,296	2023
1	B13016	12	9,996,049	2023
1	B13016	13	9,825,750	2023
1	B13016	14	10,392,627	2023
1	B13016	15	10,419,909	2023
1	B13016	16	10,755,886	2023
1	B13016	17	11,862,580	2023

Postprocessing

Wrangle the result into a helpful format

The raw data do not answer our question all by themselves. We are interested in rates, not counts. Using the dplyr::summarize() function will let us get rid of the superfluous information about marital status.

wrangle_women_and_births <- function(.raw_api_output, ...) {
    .raw_api_output |>
        dplyr::inner_join(
            GLOSSARY_OF_WOMEN_AND_BIRTHS,
            by = c("Group", "Index")
        ) |>
        dplyr::summarize(
            `Recent Mothers` = sum(.data$Value * .data$`Gave Birth`),
            `All Women` = sum(.data$Value),
            Rate = .data$`Recent Mothers` / .data$`All Women`,
            .by = tidyselect::all_of(c(..., "Lower Age", "Upper Age"))
        )
}

WOMEN_AND_BIRTHS <- wrangle_women_and_births(RAW_WOMEN_AND_BIRTHS)

Lower Age	Upper Age	Recent Mothers	All Women	Rate
15	19	91,622	10,803,918	1%
20	24	548,375	10,544,424	5%
25	29	989,964	10,815,714	9%
30	34	1,201,077	11,593,704	10%
35	39	792,711	11,212,620	7%
40	44	269,902	11,025,788	2%
45	50	125,282	11,987,862	1%

The benefit of this approach is that we can reuse the fetching and wrangling functions.

Across Places

In this example, we pull data for three different counties in Wisconsin.

SE_WI_WOMEN_AND_BIRTHS <- fetch_women_and_births(
    state = 55,
    for_geo = "county",
    for_items = c("059", "079", "101"), # Kenosha, Racine, and Milwaukee
    year = 2023
) |>
    wrangle_women_and_births(
        "county"
    ) |>
    dplyr::mutate(
        county = dplyr::case_match(.data$county,
                                  "059" ~ "Kenosha",
                                  "079" ~ "Milwaukee",
                                  "101" ~ "Racine")
    )

For a simpler table, we’ll exclude the counts of mothers and women, then pivot the rate data by county.

Lower Age	Upper Age	Kenosha	Milwaukee	Racine
15	19	0%	1%	0%
20	24	1%	4%	2%
25	29	16%	6%	18%
30	34	5%	9%	17%
35	39	2%	8%	10%
40	44	3%	1%	1%
45	50	0%	1%	0%

Through time

In this example, we pull data about Texas from three years. We have to make three separate calls, one for each year. It’s an idiosyncrasy of this particular API.

TEXAS_WOMEN_AND_BIRTHS <- 2021:2023 |>
    purrr::map(
        \(.y) fetch_women_and_births(
            for_geo = "state",
            for_items = 48L,
            year = .y
        )
    ) |>
    purrr::list_rbind() |>
    wrangle_women_and_births(
        "Year"
    )

For a simpler table, we’ll exclude the counts of mothers and women, then pivot the rate data by year.

Lower Age	Upper Age	2021	2022	2023
15	19	2%	1%	1%
20	24	8%	7%	7%
25	29	11%	11%	10%
30	34	11%	11%	10%
35	39	7%	6%	7%
40	44	2%	3%	2%
45	50	1%	1%	1%

Lower Age	Upper Age	Kenosha	Milwaukee	Racine
15	19	0%	1%	0%
20	24	1%	4%	2%
25	29	16%	6%	18%
30	34	5%	9%	17%
35	39	2%	8%	10%
40	44	3%	1%	1%
45	50	0%	1%	0%

Lower Age	Upper Age	2021	2022	2023
15	19	2%	1%	1%
20	24	8%	7%	7%
25	29	11%	11%	10%
30	34	11%	11%	10%
35	39	7%	6%	7%
40	44	2%	3%	2%
45	50	1%	1%	1%

Lower Age	Upper Age	Kenosha	Milwaukee	Racine
15	19	0%	1%	0%
20	24	1%	4%	2%
25	29	16%	6%	18%
30	34	5%	9%	17%
35	39	2%	8%	10%
40	44	3%	1%	1%
45	50	0%	1%	0%

Lower Age	Upper Age	2021	2022	2023
15	19	2%	1%	1%
20	24	8%	7%	7%
25	29	11%	11%	10%
30	34	11%	11%	10%
35	39	7%	6%	7%
40	44	2%	3%	2%
45	50	1%	1%	1%

Lower Age	Upper Age	Kenosha	Milwaukee	Racine
15	19	0%	1%	0%
20	24	1%	4%	2%
25	29	16%	6%	18%
30	34	5%	9%	17%
35	39	2%	8%	10%
40	44	3%	1%	1%
45	50	0%	1%	0%

Lower Age	Upper Age	2021	2022	2023
15	19	2%	1%	1%
20	24	8%	7%	7%
25	29	11%	11%	10%
30	34	11%	11%	10%
35	39	7%	6%	7%
40	44	2%	3%	2%
45	50	1%	1%	1%