Pull any NYC Open Data dataset from a Socrata JSON endpoint
Source: R/nyc_any_dataset.R, nyc_any_dataset.Rd
Downloads data from any NYC Open Data Socrata JSON endpoint and returns the result as a tibble. This function is useful for datasets that are not included in the curated catalog returned by [nyc_list_datasets()].
Usage
nyc_any_dataset(
json_link,
limit = 10000,
timeout_sec = 30,
clean_names = TRUE,
coerce_types = TRUE
)
Arguments
- json_link
A single Socrata dataset JSON endpoint URL, such as `"https://data.cityofnewyork.us/resource/erm2-nwe9.json"`.
- limit
Maximum number of rows to retrieve. Defaults to 10,000.
- timeout_sec
Request timeout in seconds. Defaults to 30.
- clean_names
Logical. If `TRUE`, column names are converted to snake_case using [janitor::clean_names()]. Defaults to `TRUE`.
- coerce_types
Logical. If `TRUE`, the package attempts lightweight, heuristic type coercion after downloading the data: a column is converted only when at least 95 percent of its non-missing values can be parsed as the target type. This helps avoid unsafe conversions when source data are inconsistent. Defaults to `TRUE`.
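The 95 percent rule for `coerce_types` can be pictured with a small helper. This is a minimal sketch under stated assumptions, not the package's internal code: `coerce_if_mostly_numeric()` is a hypothetical name, it handles numeric conversion only, and it assumes values arrive as character strings, as Socrata JSON responses commonly return them.

```r
# Hypothetical helper illustrating a 95 percent parse-rate threshold.
# Not the package's implementation; it only attempts numeric conversion.
coerce_if_mostly_numeric <- function(x, threshold = 0.95) {
  if (!is.character(x)) {
    return(x)  # only character columns are candidates
  }
  # Treat NA and blank strings as missing; they do not count against the rate
  non_missing <- !is.na(x) & nzchar(trimws(x))
  if (!any(non_missing)) {
    return(x)
  }
  # suppressWarnings() silences "NAs introduced by coercion"
  parsed <- suppressWarnings(as.numeric(x))
  share_parsed <- mean(!is.na(parsed[non_missing]))
  # Convert only when enough values parse; otherwise keep the original column
  if (share_parsed >= threshold) parsed else x
}

coerce_if_mostly_numeric(c(as.character(1:39), "n/a", NA))  # 39 of 40 parse: numeric
coerce_if_mostly_numeric(c("1", "2", "x"))                  # 2 of 3 parse: stays character
```

Values that fail to parse in a converted column (such as `"n/a"` above) become `NA`, which is the trade-off a threshold below 100 percent accepts.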
Details
NYC Open Data datasets have Socrata JSON endpoints that usually follow this pattern:
`https://data.cityofnewyork.us/resource/<dataset_uid>.json`
For example, the 311 Service Requests dataset has the Socrata UID `"erm2-nwe9"`, so its JSON endpoint is:
`https://data.cityofnewyork.us/resource/erm2-nwe9.json`
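Building an endpoint from a UID amounts to string concatenation, optionally with a basic shape check first. In the sketch below, `nyc_endpoint_from_uid()` is a hypothetical helper, not part of the package, and the check assumes UIDs follow Socrata's usual "four-by-four" form (four lowercase letters or digits, a hyphen, four more), such as `"erm2-nwe9"`.

```r
# Hypothetical helper: build a Socrata JSON endpoint from a dataset UID.
# Assumes the "four-by-four" UID shape, e.g. "erm2-nwe9".
nyc_endpoint_from_uid <- function(uid) {
  stopifnot(is.character(uid), length(uid) == 1L)
  if (!grepl("^[a-z0-9]{4}-[a-z0-9]{4}$", uid)) {
    stop("`uid` does not look like a Socrata UID: ", uid, call. = FALSE)
  }
  paste0("https://data.cityofnewyork.us/resource/", uid, ".json")
}

nyc_endpoint_from_uid("erm2-nwe9")
#> [1] "https://data.cityofnewyork.us/resource/erm2-nwe9.json"
```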
Users can find a dataset's UID from the NYC Open Data Portal URL, the API documentation page for the dataset, or the output of [nyc_list_datasets()].
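When starting from a portal page URL rather than a UID, the UID can often be read from the URL path. The helper below is a hypothetical sketch, not part of the package. It assumes the UID appears as its own path segment (as in a dataset page URL of the illustrative form shown in the comment, or in a `/resource/<dataset_uid>.json` endpoint) and that it has the four-by-four shape. Matching whole segments avoids false hits inside slugs such as `from-2010`.

```r
# Hypothetical helper: extract a Socrata UID from an NYC Open Data URL.
# Assumed URL shapes (illustrative):
#   https://data.cityofnewyork.us/<category>/<dataset-slug>/<uid>/about_data
#   https://data.cityofnewyork.us/resource/<uid>.json
uid_from_portal_url <- function(url) {
  stopifnot(is.character(url), length(url) == 1L)
  path <- sub("[?#].*$", "", url)                 # drop query string or fragment
  segments <- strsplit(path, "/", fixed = TRUE)[[1]]
  segments <- sub("\\.json$", "", segments)        # accept resource endpoints too
  hits <- segments[grepl("^[a-z0-9]{4}-[a-z0-9]{4}$", segments)]
  if (length(hits) == 0L) NA_character_ else hits[length(hits)]
}

uid_from_portal_url(
  "https://data.cityofnewyork.us/Social-Services/311-Service-Requests-from-2010-to-Present/erm2-nwe9/about_data"
)
#> [1] "erm2-nwe9"
```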
`nyc_any_dataset()` bypasses the package catalog and sends a request directly to the supplied JSON endpoint. Unlike [nyc_pull_dataset()], it does not look up catalog information such as dataset keys, date fields, or default ordering.
This function is intended for direct endpoint access. For catalog-based workflows using readable keys or Socrata UIDs, use [nyc_pull_dataset()].
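To give a rough picture of what direct endpoint access involves, the sketch below sends a request with httr2. It is an illustration only: the package may use a different HTTP client and different parsing, and `build_socrata_request()` and `fetch_socrata_json()` are hypothetical names. It assumes `limit` maps to Socrata's `$limit` query parameter and `timeout_sec` to a request timeout.

```r
library(httr2)
library(tibble)

# Hypothetical helpers sketching a direct Socrata request; not the package's code.
build_socrata_request <- function(json_link, limit = 10000, timeout_sec = 30) {
  request(json_link) |>
    # `$limit` is Socrata's row-limit parameter; format() avoids "1e+05"-style values
    req_url_query(`$limit` = format(limit, scientific = FALSE)) |>
    req_timeout(timeout_sec)
}

fetch_socrata_json <- function(json_link, limit = 10000, timeout_sec = 30) {
  resp <- req_perform(build_socrata_request(json_link, limit, timeout_sec))
  # simplifyVector = TRUE lets jsonlite turn the array of records into a data frame
  as_tibble(resp_body_json(resp, simplifyVector = TRUE))
}

# Only hit the live API in an interactive session with a connection
if (interactive() && curl::has_internet()) {
  fetch_socrata_json("https://data.cityofnewyork.us/resource/erm2-nwe9.json", limit = 3)
}
```

Separating request construction from execution makes it possible to inspect the generated URL without touching the network.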
Examples
# Examples that hit the live NYC Open Data API are guarded so CRAN checks
# do not fail when the network is unavailable or slow.
if (interactive() && curl::has_internet()) {
# Build a JSON endpoint from a Socrata UID
uid <- "erm2-nwe9"
endpoint <- paste0("https://data.cityofnewyork.us/resource/", uid, ".json")
out <- try(nyc_any_dataset(endpoint, limit = 3), silent = TRUE)
if (!inherits(out, "try-error")) {
head(out)
}
}