The goal of lobbyR is to provide a suite of tools for querying, cleaning, and analyzing U.S. Senate Lobbying Disclosure Act (LDA) data via the official REST API. It is designed for journalists, researchers, and transparency advocates who want to explore federal lobbying disclosures in a reproducible and programmatic way. The package includes helpers for searching by issue, client, registrant, and date, as well as for flagging duplicates, identifying client-registrant conflicts, and securely storing your API key.
You can install the development version of lobbyR from GitHub with:
# install.packages("devtools")
devtools::install_github("Lobbying-DisclosuRe/lobbyr")
This is a basic example which shows you how to solve a common problem:
library(lobbyR)
# Set your API key (you'll be prompted to enter it securely)
if (FALSE) { # \dontrun{
# just doing this so it doesn't run
set_senate_api_key()
} # }
# Query filings that mention fees, foods, or immigration issues for a specific client within a date range
seven_eleven_filings <- get_filings(
issues = c("fees", "foods", "immigration"),
issue_joiner = "or",
client_name = "7 Eleven, Inc.",
ending_date = "2025-01-25", # format yyyy-mm-dd
starting_date = "2020-04-01", # format yyyy-mm-dd
tidy_result = TRUE,
ignore_disclaimer = FALSE
)
#> DISCLAIMER: This data is known to contain errors and requires additional filtering and cleaning to ensure correct results.
#>
#> See documentation for more guidance and filtering examples.
#>
#> FACT CHECKING:
#>
#> +If you're looking to fact-check, use the filing document url to look at the source of the information as it was filed.
#>
#> +Ensure that there is only one filing for a given registrant in each filing_period for each year to avoid double counting the amount spent or earned on lobbying.
#>
#> DOUBLE COUNTING:
#>
#> If, for example, in the same quarter of a year an entity has a filing called '1st Quarter - Report', '1st Quarter - Termination' and '1st Quarter - Amendment', you must make sure to only count one of those (the latest is usually the most accurate) otherwise you risk double counting.
#>
#> +The helper column called, 'double count risk' should have insights into some of these instances, but it's not perfect. So, double check.
#>
#> +Registrations and terminations are separate from quarterly lobby spending and must be filtered out to determine an entity's yearly spending on lobbying.
#>
#> MORE HELPFUL HINTS:
#>
#> +If an entity name appears as a registrant, but also appears as a client. Do not sum the values. Instead, use the value in the registrant's expenses field to gauge the amount spent on lobbying by the registrant.
#>
#> SOURCE: Federal lobbying disclosures maintained in the U.S. Senate Lobbying Disclosure Act Database and queried through the official Lobbying Disclosure REST API v1 - Read more here - https://lda.senate.gov/api/redoc/v1/
# Flag and clean duplicate filings
dupes_flag_test <- flag_dupes(seven_eleven_filings, find_duplicates = TRUE, attempt_cleaning = TRUE)
#> This function either removed or identified lobbying filings that, if left in, could lead to doublecounting of spending on lobbying. It is not perfect. Please see documentation on tips for fact-checking these by hand.
# Flag and remove potential double-counting between registrant and client
flagged_conflict <- flag_client_registrant_conflict(seven_eleven_filings, flag_conflict = TRUE, clean_doublecounts = TRUE)
#> This function either removed or identified lobbying filings that, if left in, could lead to doublecounting of spending on lobbying. It is not perfect. Please see documentation on tips for fact-checking these by hand.
get_filings()
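The date arguments to get_filings() are documented above as yyyy-mm-dd strings. As a minimal sketch, a check like the one below can catch malformed dates before a query is sent. The helper name is_iso_date is hypothetical and not part of lobbyR, and whether the API itself tolerates unpadded dates such as "2025-1-25" is not stated here.

```r
# Hypothetical helper (not part of lobbyR): TRUE only for strings that are
# zero-padded yyyy-mm-dd and also name a real calendar date.
is_iso_date <- function(x) {
  # as.Date() parses "2025-1-25" leniently, so check the exact shape first.
  shape_ok <- grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", x)
  # With an explicit format, impossible dates come back as NA instead of erroring.
  parsed <- as.Date(x, format = "%Y-%m-%d")
  shape_ok & !is.na(parsed)
}

# Example: check the dates you plan to pass as starting_date / ending_date.
is_iso_date(c("2020-04-01", "2025-01-25"))
```

The shape check and the calendar check are kept separate on purpose: the first rejects unpadded strings, the second rejects dates such as February 30.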
chamber_df <- get_filings(
issues = c("tax", "trade", "bill"),
issue_joiner = "or",
filing_period = "first_quarter",
client_name = "Chamber of Commerce of the U.S.A.",
registrant_name = "Chamber of Commerce of the U.S.A.",
ending_date = "2025-01-25",
starting_date = "2015-04-01",
tidy_result = TRUE,
ignore_disclaimer = TRUE
)
#> Disclaimer is muted. But you should read it, and can do that by removing ignore_disclaimer = TRUE from DisclosuR call
flag_dupes()
dupes_flag_test <- flag_dupes(chamber_df, find_duplicates = TRUE, attempt_cleaning = TRUE)
#> This function either removed or identified lobbying filings that, if left in, could lead to doublecounting of spending on lobbying. It is not perfect. Please see documentation on tips for fact-checking these by hand.
flag_client_registrant_conflict()
flagged_and_clean_conflict <- flag_client_registrant_conflict(seven_eleven_filings, flag_conflict = TRUE, clean_doublecounts = TRUE)
#> This function either removed or identified lobbying filings that, if left in, could lead to doublecounting of spending on lobbying. It is not perfect. Please see documentation on tips for fact-checking these by hand.
set_senate_api_key()
Stores your API key securely using the keyring package. Set the key before calling get_filings().
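For readers unfamiliar with keyring, the general pattern of storing and reading a secret can be sketched as follows. This is not lobbyR's implementation: the service name, the function names, and the environment-variable fallback are all assumptions made for illustration. Only keyring::key_set() and keyring::key_get() are real keyring functions.

```r
# Hypothetical service name; lobbyR may use a different one internally.
lda_service <- "lda_api_key_example"

# Prompt for the key interactively and save it in the OS credential store.
save_lda_key <- function(service = lda_service) {
  keyring::key_set(service = service)
}

# Read the key back. The environment-variable fallback is an assumption for
# this sketch (handy on machines without a credential store).
read_lda_key <- function(service = lda_service, envvar = "LDA_API_KEY_EXAMPLE") {
  key <- tryCatch(
    keyring::key_get(service = service),
    error = function(e) ""  # keyring missing or no key stored yet
  )
  if (!nzchar(key)) key <- Sys.getenv(envvar)
  if (!nzchar(key)) stop("No API key found; store one first.")
  key
}
```

Keeping the key in the credential store rather than in a script means it does not end up in version control or shared notebooks.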
Set your API key with set_senate_api_key(). The flagging helpers add checkme and flag columns to guide manual review, but there are always other cases. Feel free to flag and report them to me.
This data is known to contain errors and requires additional filtering and cleaning to ensure correct results. See documentation for more guidance and filtering examples. If you’re looking to fact-check, use the filing document URL to review the source. Registrations and terminations are separate from quarterly lobby spending and must be filtered out to determine an entity’s yearly spending on lobbying. If an entity appears as both registrant and client, do not sum the values; instead, use the registrant’s expenses field to gauge total lobbying spend.
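The filtering advice above can be sketched on toy data. Every column name and filing-type label below is hypothetical, since the columns returned by get_filings(tidy_result = TRUE) are not listed here. The sketch drops registration rows, keeps only the most recently posted filing per registrant, year, and period, and then sums by year.

```r
library(dplyr)

# Toy filings with hypothetical column names and values.
filings <- data.frame(
  registrant    = rep("Acme LLC", 4),
  filing_year   = rep(2024, 4),
  filing_period = c("first_quarter", "first_quarter", "second_quarter", "first_quarter"),
  filing_type   = c("1st Quarter - Report", "1st Quarter - Amendment",
                    "2nd Quarter - Report", "Registration"),
  posted        = as.Date(c("2024-04-20", "2024-05-02", "2024-07-19", "2024-01-15")),
  amount        = c(10000, 12000, 8000, NA)
)

# Keep one filing per registrant, year, and period: the latest one posted.
latest <- filings %>%
  filter(!filing_type %in% c("Registration", "Termination")) %>%
  group_by(registrant, filing_year, filing_period) %>%
  slice_max(posted, n = 1, with_ties = FALSE) %>%
  ungroup()

# Yearly total built from the de-duplicated rows only.
yearly <- latest %>%
  group_by(registrant, filing_year) %>%
  summarise(total = sum(amount), .groups = "drop")
```

In this toy case, summing every quarterly row would give 30,000, while the de-duplicated total is 20,000, because the first-quarter amendment replaces the original report. Treating the latest filing as the most accurate follows the package's own guidance, but it is a heuristic, so check the filing documents by hand.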
Requires an API key, set with set_senate_api_key().
Dependencies: httr2, dplyr, tidyr, stringr, keyring, readr, purrr.
License: GNU LGPLv3.
Pull requests and issues are welcome. Please see the repository for guidelines.
This section is still a work in progress. If you use this package in your research or reporting, please cite the data from the U.S. Senate Lobbying Disclosure Act Database and this package.
This package was developed by Chris Cioffi. For questions, issues, or suggestions, please visit: https://github.com/Lobbying-DisclosuRe/lobbyr
Thanks to the U.S. Congress for providing open data and to American University’s Aarushi Sahejpal for guidance. Thanks also to AI tools and the R community for help and how-tos in creating some of the regexes and code patterns.