

Version 0.2.0 introduces a streamlined, catalog-driven interface for NYC Open Data.

Instead of maintaining dozens of individual dataset wrappers, the package now provides three core functions:

  • nyc_list_datasets() — Browse available datasets in the built-in catalog
  • nyc_pull_dataset() — Pull any cataloged dataset by key, with filtering, ordering, and optional date controls
  • nyc_any_dataset() — Pull any NYC Open Data dataset directly via its Socrata JSON endpoint
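The catalog lets a readable key stand in for a Socrata dataset ID. Below is a minimal base-R sketch of that idea. The `catalog` table and the `catalog_endpoint()` helper are hypothetical and not part of the package's exported API. The real catalog's columns and lookup logic may differ.

```r
# Hypothetical miniature catalog; the package's real catalog and its
# column names may differ.
catalog <- data.frame(
  key           = "nyc_311",
  dataset_id    = "erm2-nwe9",          # Socrata 4x4 ID for 311 Service Requests
  default_order = "created_date DESC",  # default sort, applied unless overridden
  stringsAsFactors = FALSE
)

# Hypothetical helper: resolve a human-readable key to a Socrata JSON endpoint.
catalog_endpoint <- function(key, catalog) {
  row <- catalog[catalog$key == key, , drop = FALSE]
  if (nrow(row) != 1) {
    stop("Unknown or ambiguous catalog key: ", key)
  }
  sprintf("https://data.cityofnewyork.us/resource/%s.json", row$dataset_id)
}

catalog_endpoint("nyc_311", catalog)
# "https://data.cityofnewyork.us/resource/erm2-nwe9.json"
```

In this design, supporting a new dataset means adding a catalog row rather than writing and maintaining another wrapper function.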

The catalog currently includes 30+ curated NYC Open Data datasets, covering topics such as:

  • 311 Service Requests
  • For-Hire Vehicles (FHV)
  • Juvenile Justice (rearrest rates + caseloads)
  • School Discharge Reporting
  • Violent & Disruptive School Incidents
  • Detention Admissions
  • Borough/Community District Reports
  • Street Tree Census
  • Urban Park Ranger Animal Condition Responses
  • Permitted Events (Historical)
  • and more

nyc_pull_dataset() automatically applies each dataset’s catalog defaults (such as default ordering and date fields), while still giving you control over:

  • limit
  • filters
  • date / from / to
  • where
  • order
  • clean_names
  • coerce_types
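Socrata’s SoQL query language expresses a date window as comparisons in a `$where` clause. The sketch below shows how `from` and `to` values could be translated into one. It assumes a hypothetical helper, `date_where()`, and treats `to` as inclusive. How the package itself interprets `date`, `from`, and `to` may differ.

```r
# Hypothetical helper: build a SoQL $where clause for a date window.
# Assumes `from` and `to` are "YYYY-MM-DD" strings and `to` is inclusive.
date_where <- function(field, from = NULL, to = NULL) {
  clauses <- character(0)
  if (!is.null(from)) {
    clauses <- c(clauses, sprintf("%s >= '%sT00:00:00'", field, format(as.Date(from))))
  }
  if (!is.null(to)) {
    # Make `to` inclusive by comparing against the start of the following day.
    clauses <- c(clauses, sprintf("%s < '%sT00:00:00'", field, format(as.Date(to) + 1)))
  }
  paste(clauses, collapse = " AND ")  # "" when neither bound is given
}

date_where("created_date", from = "2024-01-01", to = "2024-01-31")
# "created_date >= '2024-01-01T00:00:00' AND created_date < '2024-02-01T00:00:00'"
```

If `where` accepts raw SoQL (an assumption worth checking against the function documentation), you could also pass a string like this to it directly.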

Because new datasets are added as catalog entries rather than as new wrapper functions, this redesign reduces maintenance burden, makes the package easier to extend, and scales better as the catalog grows.

All functions return clean tibble outputs and support filtering via
filters = list(field = "value").
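Socrata’s SODA API accepts simple equality filters as `field=value` query parameters, alongside SoQL parameters such as `$limit` and `$order`. The sketch below shows how a `filters` list might map onto such a query string. It uses a hypothetical helper, `build_soda_query()`, and the package’s actual request construction may differ.

```r
# Hypothetical helper: turn a filters list plus limit/order into a SODA query string.
# Each filter is assumed to hold a single value.
build_soda_query <- function(filters = list(), limit = NULL, order = NULL) {
  params <- filters
  if (!is.null(limit)) params[["$limit"]] <- format(limit, scientific = FALSE)
  if (!is.null(order)) params[["$order"]] <- order
  if (length(params) == 0) return("")
  # Percent-encode each value separately (e.g. the space in "created_date DESC").
  values <- vapply(params, function(v) utils::URLencode(as.character(v), reserved = TRUE),
                   character(1))
  paste0("?", paste(names(params), values, sep = "=", collapse = "&"))
}

build_soda_query(
  filters = list(agency = "NYPD", city = "BROOKLYN"),
  limit = 2000,
  order = "created_date DESC"
)
# "?agency=NYPD&city=BROOKLYN&$limit=2000&$order=created_date%20DESC"
```

Appended to the 311 endpoint, this query roughly corresponds to the `filtered` request in the Example section.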


Installation

From CRAN

install.packages("nycOpenData")

Development version (GitHub)

devtools::install_github("martinezc1/nycOpenData")

Example

library(nycOpenData)

# Get the 5,000 most recent 311 requests
data <- nyc_pull_dataset(key = "nyc_311", limit = 5000)

# Filter by agency and city
filtered <- nyc_pull_dataset(
  key = "nyc_311",
  limit = 2000,
  filters = list(agency = "NYPD", city = "BROOKLYN")
)

head(filtered)
## # A tibble: 6 × 37
##   unique_key created_date           agency agency_name complaint_type descriptor
##   <chr>      <chr>                  <chr>  <chr>       <chr>          <chr>     
## 1 67613985   2026-01-26T02:06:05.0… NYPD   New York C… Noise - Resid… Banging/P…
## 2 67609553   2026-01-26T02:02:09.0… NYPD   New York C… Noise - Resid… Banging/P…
## 3 67610990   2026-01-26T01:58:58.0… NYPD   New York C… Illegal Parki… Blocked H…
## 4 67615428   2026-01-26T01:56:49.0… NYPD   New York C… Noise - Resid… Banging/P…
## 5 67609568   2026-01-26T01:48:16.0… NYPD   New York C… Noise - Resid… Loud Musi…
## 6 67612476   2026-01-26T01:47:10.0… NYPD   New York C… Noise - Resid… Loud Musi…
## # ℹ 31 more variables: location_type <chr>, incident_zip <chr>,
## #   incident_address <chr>, street_name <chr>, cross_street_1 <chr>,
## #   cross_street_2 <chr>, intersection_street_1 <chr>,
## #   intersection_street_2 <chr>, address_type <chr>, city <chr>,
## #   landmark <chr>, status <chr>, community_board <chr>,
## #   council_district <chr>, police_precinct <chr>, bbl <chr>, borough <chr>,
## #   x_coordinate_state_plane <chr>, y_coordinate_state_plane <chr>, …
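In the output above, `created_date` is returned as character. If you need date-times and `coerce_types` does not already convert them in your version (an assumption worth checking), base R can parse Socrata’s ISO 8601 floating timestamps. The sample strings below are illustrative. Treating them as New York local time is also an assumption.

```r
# Illustrative strings in Socrata's floating-timestamp format (not real records).
created <- c("2024-05-01T13:45:00.000", "2024-05-01T09:05:30.000")

# %OS reads seconds including the fractional part; the tz choice is an assumption.
parsed <- as.POSIXct(created, format = "%Y-%m-%dT%H:%M:%OS", tz = "America/New_York")

# Elapsed time between the two requests, in minutes.
difftime(parsed[1], parsed[2], units = "mins")
```

The same call works on a column such as `filtered$created_date`.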

Learn by example

  • vignette("nyc-311", package = "nycOpenData") – Working with NYC 311 data end-to-end

About

nycOpenData makes New York City’s civic datasets accessible to students,
educators, analysts, and researchers through a unified, user-friendly R interface.
It was developed to support reproducible research, open-data literacy, and real-world analysis.


Comparison to Other Software

While the RSocrata package provides a general interface for any Socrata-backed portal, nycOpenData is specifically tailored for the New York City ecosystem.

  • Ease of Use: No need to hunt for 4x4 dataset IDs (e.g., erm2-nwe9); use nyc_pull_dataset() with a human-readable catalog key.
  • Pre-configured Logic: Catalog entries supply default sorting (e.g., created_date DESC) and optimized limit handling suited to NYC’s very large datasets.
  • Open-Data Literacy: Designed specifically for students and researchers to lower the barrier to entry for civic data analysis.

Contributing

We welcome contributions! If you find a bug or would like a specific NYC dataset added to the catalog, please open an issue or submit a pull request on GitHub.


Authors & Contributors

Maintainer

Christian A. Martinez 📧 c.martinez0@outlook.com
GitHub: @martinezc1

✨ Contributors

Special thanks to the students of PSYC 7750G – Reproducible Psychological Research at Brooklyn College (CUNY) who have contributed functions and documentation:


Academic Context

This package is developed as a primary pedagogical tool for teaching data acquisition and open science practices at Brooklyn College, City University of New York (CUNY).