Downloads data from any NYC Open Data Socrata JSON endpoint and returns the result as a tibble. This function is useful for datasets that are not included in the curated catalog returned by [nyc_list_datasets()].

Usage

nyc_any_dataset(
  json_link,
  limit = 10000,
  timeout_sec = 30,
  clean_names = TRUE,
  coerce_types = TRUE
)

Arguments

json_link

A single Socrata dataset JSON endpoint URL, such as `"https://data.cityofnewyork.us/resource/erm2-nwe9.json"`.

limit

Maximum number of rows to retrieve. Defaults to 10,000.

timeout_sec

Request timeout in seconds. Defaults to 30.

clean_names

Logical. If `TRUE`, column names are converted to snake_case using [janitor::clean_names()]. Defaults to `TRUE`.

coerce_types

Logical. If `TRUE`, the function attempts lightweight, heuristic-based type coercion after downloading the data. A column is converted only when at least 95 percent of its non-missing values can be parsed as the target type, which helps avoid unsafe conversions when the source data are inconsistent. Defaults to `TRUE`.
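The 95 percent rule can be illustrated with a minimal base R sketch. This is not the package's implementation: `coerce_if_mostly_numeric()` and its `threshold` argument are hypothetical names used only for illustration, and the sketch handles only the character-to-numeric case.

```r
# Hypothetical helper (not part of the package): convert a character vector
# to numeric only when enough of its non-missing values parse cleanly.
coerce_if_mostly_numeric <- function(x, threshold = 0.95) {
  non_missing <- !is.na(x)
  # Leave non-character or all-missing input untouched.
  if (!is.character(x) || !any(non_missing)) {
    return(x)
  }
  # as.numeric() yields NA (with a warning) for values it cannot parse.
  parsed <- suppressWarnings(as.numeric(x))
  share_parsed <- mean(!is.na(parsed[non_missing]))
  if (share_parsed >= threshold) parsed else x
}

# 39 of 40 values parse (97.5 percent), so the column is converted and the
# unparseable entry becomes NA.
mostly_numeric <- coerce_if_mostly_numeric(c(as.character(1:39), "n/a"))

# Only half of the values parse, so the column is left as character.
mixed <- coerce_if_mostly_numeric(c("1", "2", "x", "y"))
```

A threshold below 100 percent tolerates a few stray placeholders such as `"n/a"`, at the cost of turning those entries into `NA` after conversion.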

Value

A tibble containing rows from the requested NYC Open Data endpoint.

Details

NYC Open Data datasets have Socrata JSON endpoints that usually follow this pattern:

`https://data.cityofnewyork.us/resource/<dataset_uid>.json`

For example, the 311 Service Requests dataset has the Socrata UID `"erm2-nwe9"`, so its JSON endpoint is:

`https://data.cityofnewyork.us/resource/erm2-nwe9.json`

Users can find a dataset's UID from the NYC Open Data Portal URL, the API documentation page for the dataset, or the output of [nyc_list_datasets()].
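As a rough sketch of the first option, the UID can be pulled out of a URL with a regular expression and then turned into a JSON endpoint. The helpers `extract_uid()` and `build_endpoint()` are hypothetical and not part of the package. The sketch assumes that Socrata UIDs use the "four-by-four" form (two groups of four lowercase letters or digits joined by a hyphen) and that the UID is a path segment of the URL. The portal URL below is shown only to illustrate that shape.

```r
# Hypothetical helper: find a four-by-four UID that forms a path segment.
extract_uid <- function(url) {
  stopifnot(is.character(url), length(url) == 1L)
  pattern <- "(?<=/)[a-z0-9]{4}-[a-z0-9]{4}(?=$|[/?#.])"
  hit <- regmatches(url, regexpr(pattern, url, perl = TRUE))
  if (length(hit) == 0L) NA_character_ else hit
}

# Hypothetical helper: build the JSON endpoint described above from a UID.
build_endpoint <- function(uid) {
  stopifnot(grepl("^[a-z0-9]{4}-[a-z0-9]{4}$", uid))
  paste0("https://data.cityofnewyork.us/resource/", uid, ".json")
}

# Illustrative portal-style URL (assumed format).
portal_url <- paste0(
  "https://data.cityofnewyork.us/Social-Services/",
  "311-Service-Requests-from-2010-to-Present/erm2-nwe9"
)

uid <- extract_uid(portal_url)
endpoint <- build_endpoint(uid)
```

The lookbehind on `/` keeps the pattern from matching hyphenated fragments inside the dataset title, such as `from-2010`.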

`nyc_any_dataset()` bypasses the package catalog and sends a request directly to the supplied JSON endpoint. Unlike [nyc_pull_dataset()], it does not look up catalog information such as catalog keys, date fields, or default ordering.

This function is intended for direct endpoint access. For catalog-based workflows using readable keys or Socrata UIDs, use [nyc_pull_dataset()].
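To give a sense of what direct endpoint access might involve, the sketch below builds a request URL with a row limit. Socrata JSON endpoints accept a `$limit` query parameter, but whether `nyc_any_dataset()` constructs its request this way is an assumption. `build_request_url()` is a hypothetical helper and not part of the package.

```r
# Hypothetical helper: append a Socrata `$limit` parameter to an endpoint.
build_request_url <- function(json_link, limit = 10000) {
  stopifnot(is.character(json_link), length(json_link) == 1L)
  stopifnot(is.numeric(limit), length(limit) == 1L, limit >= 1)
  # format() avoids scientific notation for large limits such as 1e5.
  paste0(json_link, "?$limit=", format(limit, scientific = FALSE))
}

request_url <- build_request_url(
  "https://data.cityofnewyork.us/resource/erm2-nwe9.json",
  limit = 3
)
```

Nothing here consults a catalog: the caller supplies the full endpoint, and no default ordering or date filter is added.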

Examples

# Examples that hit the live NYC Open Data API are guarded so CRAN checks
# do not fail when the network is unavailable or slow.
if (interactive() && curl::has_internet()) {
  # Build a JSON endpoint from a Socrata UID
  uid <- "erm2-nwe9"
  endpoint <- paste0("https://data.cityofnewyork.us/resource/", uid, ".json")

  out <- try(nyc_any_dataset(endpoint, limit = 3), silent = TRUE)
  if (!inherits(out, "try-error")) {
    head(out)
  }
}