Crime Trend Analysis Methodology

Public Analyst.ai is built around three commitments: plain English for readers, strict statistical thresholds for the anomaly engine, and open backtests so you know how often we've been wrong. The platform-wide rules come first; per-city data sources, NIBRS migration year, reporting lag, and the live forecast backtest table for each city follow at the bottom.


The 10 categories we track

Every neighborhood and citywide chart on this site uses the same 10 categories, chosen as the multi-city common denominator: each is an FBI UCR Part 1 / NIBRS Group A category that every modern US police department reports under federal standard. Adding new cities means writing per-city ingest mappers, not redesigning the analytics layer.

  • Homicide. Homicide + manslaughter (FBI UCR convention).
  • Robbery. All robbery subtypes (commercial, street, carjacking, residential).
  • Aggravated Assault. Aggravated Assault subcategory only — the NIBRS Part 1 measure of serious violence. Simple assault is excluded because it varies enormously by city/policing practice and would dominate the assault chart.
  • Sexual Assault. Rape + Sex Offense (excl. prostitution) + Human Trafficking. Some cities pre-redact location data for these categories per state law — see the per-city sections below.
  • Burglary. Residential, commercial, hot-prowl, other.
  • Theft from Vehicle. Larceny – From Vehicle, Theft From Vehicle, and Larceny – Auto Parts (catalytic-converter wave). The Bay Area's defining crime category — kept separate from larceny.
  • Other Larceny. Larceny: shoplifting, pickpocket, from building, bicycle, purse-snatch, other.
  • Motor Vehicle Theft. Stolen vehicles (completed + attempted). Distinct from theft-from-vehicle. Recovered vehicles are not counted (they're status updates on a previously reported MVT).
  • Vandalism. Malicious Mischief + Vandalism.
  • Arson. Arson. Low volume but FBI Part 1 — surfaced via rare-event / streak-break signals.
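
The per-city ingest mappers mentioned above can be pictured as a lookup from a source's raw labels to the 10 shared categories. The sketch below is illustrative only: the function name `map_category`, the dictionary, and the "recovered vehicle" label are hypothetical, the other raw labels are the ones quoted in the list above, and a real mapper would cover many more labels per city.

```python
from typing import Optional

# Hypothetical lookup: normalized raw source label -> shared category.
# Only labels quoted in the category list are included here.
RAW_TO_CATEGORY = {
    "larceny - from vehicle": "Theft from Vehicle",
    "theft from vehicle": "Theft from Vehicle",
    "larceny - auto parts": "Theft from Vehicle",
    "malicious mischief": "Vandalism",
    "vandalism": "Vandalism",
    "arson": "Arson",
}

# Labels deliberately kept out of the trend charts.
# "recovered vehicle" is an assumed label for the status-update rows.
EXCLUDED = {"simple assault", "recovered vehicle"}


def map_category(raw_label: str) -> Optional[str]:
    """Return the shared category, or None if the label is excluded or unmapped."""
    # Normalize: en dash -> hyphen, lowercase, collapse whitespace.
    key = " ".join(raw_label.replace("–", "-").lower().split())
    if key in EXCLUDED:
        return None
    return RAW_TO_CATEGORY.get(key)
```

Returning `None` for both excluded and unknown labels keeps the sketch short; a production mapper would likely log unknown labels separately so that new source vocabulary is noticed.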

What we deliberately exclude

These categories appear in raw incident data but don't appear in our trend charts — the reason is documented per-category so you can decide whether you agree.

  • Simple Assault. Varies in policing/reporting practice across cities and time periods; would dominate the assault chart and obscure trends in serious violence.
  • Domestic Violence + family-against-children offenses. Each city's bundle is opaque (DV, child abuse, family disputes mixed). Combining with stranger-violence trends would mislead. Deserves its own module with proper DV reporting nuance.
  • Drug offenses + quality-of-life arrests. Reflect policing policy, not victim behavior. Enforcement priorities have shifted multiple times in the analysis window.
  • Weapons-possession charges. Possession ≠ act of violence; conflating inflates the 'violent' trend.
  • White-collar (Fraud, Forgery, Embezzlement). Systematically under-reported in police data; misleading to compare.
  • Admin records, traffic, suicide. Not crimes against persons or property. Counted incidents would inflate volume without reflecting actual harm.

On demographics

The City page includes a city-level demographic profile (median income, poverty rate, age distribution, etc.) for background context. Demographics are never juxtaposed with neighborhood-level crime data — no choropleth maps colored by income, no per-nbhd crime-vs-poverty tables, no demographic covariates in anomaly detection or forecasts. The bias risk is the juxtaposition, not the existence of the numbers: when a reader sees a high-crime neighborhood next to its income or race stats, the brain reads correlation as causation regardless of authorial intent. So we expose city-level facts as background context and stop there.

Anomaly thresholds

Strict thresholds (≈ p < 0.01) keep the false-positive rate manageable — every (city × neighborhood × category × signal-type) cell gets evaluated each month, which is hundreds to thousands of tests. Each rule pairs a statistical test with an absolute-count floor — statistically “significant” movement on tiny counts isn't meaningful.

  • Spike. Current 12-month total > baseline mean + 2.5σ AND ≥ 20 incidents AND current 6-month total > 6-month baseline + 1σ.
  • Drop. Current 12-month total < baseline mean − 2.5σ AND baseline mean ≥ 20 AND current 6-month total < 6-month baseline − 1σ. Drops are surfaced with the same prominence as spikes; the platform reads as fear-mongering otherwise.
  • Rare event. Any incident in the last 90 days AND no prior comparable incident in the previous 5 years.
  • Streak break. Incident in the current month AND a gap of ≥ 24 months since the previous one AND baseline rate < 6/year (so the “streak” was real).
  • Sustained shift. Recent 12-month total vs. prior 12-month total: |Z| > 2.576 (≈ p < 0.01) AND ratio differs from 1.0 by ≥ 25% AND both windows ≥ 20 incidents.
  • Zero event. Zero incidents in the full analysis window for the city. Informational backdrop only — never a chip.
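
As a rough illustration of how the spike, drop, and sustained-shift rules combine a statistical test with a count floor, here is a minimal sketch. It is not the code in pipeline/src/flags/detectors.py. Two things are assumptions: the baseline is taken to be a list of historical 12-month (or 6-month) totals, and the Z statistic is the normal approximation for comparing two Poisson counts, since the text above does not say which test is used.

```python
import math
from statistics import mean, stdev
from typing import Sequence

SIGMA_12 = 2.5        # 12-month threshold, in baseline standard deviations
SIGMA_6 = 1.0         # 6-month confirmation threshold
MIN_COUNT = 20        # absolute-count floor
Z_CRIT = 2.576        # two-sided p < 0.01
MIN_RATIO_GAP = 0.25  # ratio must differ from 1.0 by at least 25%


def is_spike(cur12: float, base12: Sequence[float],
             cur6: float, base6: Sequence[float]) -> bool:
    """Spike rule. base12 / base6 are assumed to be historical 12-month
    and 6-month totals (at least two values each)."""
    return (
        cur12 > mean(base12) + SIGMA_12 * stdev(base12)
        and cur12 >= MIN_COUNT
        and cur6 > mean(base6) + SIGMA_6 * stdev(base6)
    )


def is_drop(cur12: float, base12: Sequence[float],
            cur6: float, base6: Sequence[float]) -> bool:
    """Drop rule: mirror image of the spike, with the floor on the baseline mean."""
    return (
        cur12 < mean(base12) - SIGMA_12 * stdev(base12)
        and mean(base12) >= MIN_COUNT
        and cur6 < mean(base6) - SIGMA_6 * stdev(base6)
    )


def is_sustained_shift(recent12: int, prior12: int) -> bool:
    """Sustained-shift rule on two adjacent 12-month totals."""
    if recent12 < MIN_COUNT or prior12 < MIN_COUNT:
        return False
    # Assumed test statistic: normal approximation for two Poisson counts.
    z = (recent12 - prior12) / math.sqrt(recent12 + prior12)
    ratio = recent12 / prior12
    return abs(z) > Z_CRIT and abs(ratio - 1.0) >= MIN_RATIO_GAP
```

With these definitions, a jump from 1,000 to 1,150 incidents clears the Z threshold but not the 25% ratio gap, so it is not flagged. A jump from 2 or 3 incidents to 8 clears the σ threshold but fails the 20-incident floor.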

Forecasts

Forecasts use Prophet on monthly counts. Two skip rules apply:

  • Low count. If a (neighborhood, category) cell averages < 2 incidents/month over the trailing 24 months, no forecast is produced: forecasts on near-zero series yield intervals too wide to be useful.
  • Violent at neighborhood level. Homicide / Robbery / Aggravated Assault / Sexual Assault are skipped per neighborhood and surfaced via rare-event and streak-break signals instead. Citywide forecasts of these still run.
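
The two skip rules can be expressed as a single eligibility check. This is a sketch under stated assumptions: the function name `should_forecast` and the level labels "neighborhood" and "city" are hypothetical, and the low-count rule is applied only to neighborhood cells because that is how the rule is worded above.

```python
from typing import Sequence

VIOLENT = {"Homicide", "Robbery", "Aggravated Assault", "Sexual Assault"}
MIN_MONTHLY_MEAN = 2.0   # incidents per month
TRAILING_MONTHS = 24


def should_forecast(category: str, level: str,
                    monthly_counts: Sequence[int]) -> bool:
    """Return True if a forecast should run for this (level, category) series.

    level is "neighborhood" or "city" (hypothetical labels).
    monthly_counts is ordered oldest to newest.
    """
    if level == "neighborhood":
        # Violent categories are handled by rare-event / streak-break signals.
        if category in VIOLENT:
            return False
        # Low-count rule over the trailing 24 months.
        trailing = monthly_counts[-TRAILING_MONTHS:]
        if not trailing or sum(trailing) / len(trailing) < MIN_MONTHLY_MEAN:
            return False
    return True
```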

Every forecast shows a 95% prediction interval. Point estimates without ranges are irresponsible — we always pair the two. Horizon is 12 months max, after which intervals become useless and credibility evaporates on year-2 checks. Each city's actual backtest table is below.

Time-of-day, day-of-week, and seasonality

The neighborhood and city pages include hour-of-day, day-of-week, and month-of-year distributions per category. One platform-wide caveat: hour 0 is mildly inflated in some cities. Some incident reports default the time field to midnight when the actual time is unknown, so the 12am bar overstates true activity. The real diurnal pattern still shows through (lowest 4–5am, peaks at noon and evening): the inflation is consistent across cities and doesn't change relative shape.

Platform-wide caveats

  • Methodology may evolve. Calibration after each ingest cycle may surface threshold adjustments. The rules in effect at any given monthly run are documented in the source-of-truth file pipeline/src/flags/detectors.py.
  • Per-capita rates exclude non-residential geographies. Park-only or industrial-only neighborhoods (e.g. Golden Gate Park, the Presidio) have near-zero residents but real visitor populations. We show absolute counts for these and suppress the per-capita ratio.
  • Sensitive crime locations are pre-redacted upstream. Some police departments aggregate sexual-assault and domestic-violence locations to district centroids before publication, per state law. Counts and trends are accurate; we're just displaying them at the location precision the source provides. See each city's section below for its specific policy.
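
The per-capita suppression rule above can be sketched as a rate function that returns nothing when the resident count is too small to be meaningful. The cutoff of 500 residents and the per-1,000 scaling are assumptions made for illustration; the text above only says that near-zero-resident geographies are suppressed.

```python
from typing import Optional

MIN_RESIDENTS = 500  # hypothetical cutoff; the source does not state one


def rate_per_1000(incidents: int, residents: int) -> Optional[float]:
    """Incidents per 1,000 residents, or None when the rate is suppressed."""
    if residents < MIN_RESIDENTS:
        # Park-only or industrial-only area: show absolute counts instead.
        return None
    return round(1000 * incidents / residents, 1)
```

Returning `None` rather than a very large number makes the suppression explicit to whatever renders the page.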

San Francisco

41 neighborhoods · NIBRS-era data from 2018+ · reporting lag 7 days.

Data sources

  • Crime incidents. SFPD Incident Reports (DataSF) — Socrata resource wg3w-h783. NIBRS-era only (2018+).
  • Neighborhood polygons. 41 polygons. DataSF Analysis Neighborhoods (resource j2bu-swwd) — the official boundary set used by city government for analysis.
  • Population. US Census ACS 5-year (variable B01003_001E), summed across the city's tract crosswalk.
  • Census county. 06-075 (San Francisco County, coterminous with the city).

Analysis window

January 2018 to March 2026. Pre-2018 records are excluded. The 7-day lag means the briefing currently shown reports on March 2026, earlier than calendar-current because we wait for the buffer to clear before treating a month as settled.
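
One way to read "settled" is that a month counts once its last day plus the city's reporting lag has passed. The sketch below encodes that reading; the function name and the example dates are hypothetical, and the actual pipeline may define the buffer differently.

```python
import datetime as dt
from typing import Tuple


def last_settled_month(today: dt.date, lag_days: int) -> Tuple[int, int]:
    """Return (year, month) of the most recent month whose lag buffer has cleared."""
    year, month = today.year, today.month
    while True:
        # Step back one calendar month.
        month -= 1
        if month == 0:
            year, month = year - 1, 12
        # First day of the following month, i.e. the day after this month ends.
        next_first = dt.date(year + (month == 12), month % 12 + 1, 1)
        if today >= next_first + dt.timedelta(days=lag_days):
            return year, month
```

On a hypothetical run date of 20 April 2026, a 7-day lag gives March 2026, while a 30-day lag would still be waiting on March and would give February 2026.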

Forecast backtest — trained through 2024-12

We trained the forecast model on data ending 2024-12 and predicted the following 12 months. Coverage is the share of months whose actual count fell inside the 95% prediction interval; MAPE is mean absolute percentage error; bias is the mean of (point − actual), so a positive value means we systematically over-predicted.

Category            Months  Coverage    MAPE  Bias / mo
Aggravated Assault      12     50.0%   26.4%      +49.9
Arson                   12     91.7%   29.9%       +5.8
Burglary                12     91.7%   29.2%     +107.8
Homicide                12     91.7%   71.0%       +0.1
Motor Vehicle Theft     12      0.0%   70.4%     +238.0
Other Larceny           12    100.0%    5.7%      +55.0
Robbery                 12     91.7%   28.3%      +41.1
Sexual Assault          12     91.7%  116.8%       +6.0
Theft from Vehicle      12     91.7%  137.3%     +770.7
Vandalism               12     58.3%   17.4%      +90.9
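
The three backtest columns follow directly from the definitions given above the table. A minimal sketch, using made-up toy numbers rather than any city's data, and assuming every actual monthly count is greater than zero (MAPE is undefined otherwise):

```python
from typing import Sequence, Tuple


def backtest_metrics(actual: Sequence[float], point: Sequence[float],
                     lower: Sequence[float], upper: Sequence[float]
                     ) -> Tuple[float, float, float]:
    """Return (coverage %, MAPE %, bias per month) for one category."""
    n = len(actual)
    # Share of months whose actual count fell inside the 95% interval.
    inside = sum(lo <= a <= hi for a, lo, hi in zip(actual, lower, upper))
    # Mean absolute percentage error; assumes every actual count is > 0.
    mape = sum(abs(p - a) / a for a, p in zip(actual, point)) / n
    # Mean of (point - actual): positive means over-prediction.
    bias = sum(p - a for a, p in zip(actual, point)) / n
    return 100 * inside / n, 100 * mape, bias


# Toy four-month example (not real data).
coverage, mape, bias = backtest_metrics(
    actual=[100, 80, 120, 100],
    point=[110, 100, 120, 90],
    lower=[90, 85, 100, 70],
    upper=[130, 115, 140, 95],
)
```

In the toy example two of four actuals fall inside their intervals, so coverage is 50%, and the forecasts run 5 incidents per month high on average.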

San Francisco-specific caveats

  • Sexual-assault and DV locations are pre-redacted. SFPD aggregates incident locations for sexual-assault and domestic-violence reports to police-district centroids before publication, per California Penal Code §293. Counts and trends are accurate; we simply display them at the precision the source provides.
  • 2024 larceny reclassification. SFPD revised some Larceny coding categories during 2024. The change shifts where certain reports land between 'theft from vehicle' and 'other larceny', so cross-2024 comparisons within those two buckets carry a footnote on the per-month archive pages.
  • Park and non-residential neighborhoods. Golden Gate Park, Lincoln Park, McLaren Park, and the Presidio have near-zero residents but real visitor populations. Per-capita rates are suppressed for these geographies; only absolute counts are shown.

Chicago

77 neighborhoods · NIBRS-era data from 2018+ · reporting lag 7 days.

Data sources

  • Crime incidents. CPD Crimes - 2001 to Present (Chicago Data Portal) — Socrata resource ijzp-q8t2. NIBRS-era only (2018+).
  • Neighborhood polygons. 77 polygons. Chicago's 77 community areas — defined by University of Chicago researchers in the 1920s and still the city's standard analytical unit. Sourced from the Chicago Data Portal (resource igwz-8jzy).
  • Population. US Census ACS 5-year (variable B01003_001E), summed across the city's tract crosswalk.
  • Census county. 17-031 (Cook County — broader than the city; only city tracts are summed).

Analysis window

January 2018 to March 2026. Pre-2018 records are excluded. The 7-day lag means the briefing currently shown reports on March 2026, earlier than calendar-current because we wait for the buffer to clear before treating a month as settled.

Forecast backtest — trained through 2024-12

We trained the forecast model on data ending 2024-12 and predicted the following 12 months. Coverage is the share of months whose actual count fell inside the 95% prediction interval; MAPE is mean absolute percentage error; bias is the mean of (point − actual), so a positive value means we systematically over-predicted.

Category            Months  Coverage    MAPE  Bias / mo
Aggravated Assault      12      8.3%   20.0%     +204.3
Arson                   12     66.7%   41.9%      +10.4
Burglary                12     66.7%   18.0%     -145.0
Homicide                12     66.7%   41.8%      +13.1
Motor Vehicle Theft     12     33.3%   68.9%     +972.9
Other Larceny           12     41.7%   28.7%    +1173.9
Robbery                 12      0.0%   84.5%     +394.9
Sexual Assault          12    100.0%    6.3%       +4.8
Vandalism               12     50.0%   12.1%     +255.0

Chicago-specific caveats

  • IUCR codes, not NIBRS. Chicago publishes incidents under the IUCR coding scheme (the predecessor to NIBRS). CPD started submitting NIBRS to the FBI in 2021, but the public dataset has stayed on IUCR for stability — so our category mapper translates IUCR primary types and descriptions into the same UCR Part 1 buckets used elsewhere on the site. The buckets are equivalent; the source vocabulary is different.
  • Theft-from-vehicle is undercounted before 2024. CPD didn't break out theft from a vehicle as its own IUCR code until 2024 (when code 0710 “THEFT FROM MOTOR VEHICLE” was introduced under THEFT, joined in 2025 by code 0760 “BURGLARY FROM MOTOR VEHICLE” under BURGLARY). Pre-2024 incidents that today would be coded as theft from a vehicle were filed under the generic “FORCIBLE ENTRY” / “UNLAWFUL ENTRY” burglary codes with no vehicle subdivision, so the theft-from-vehicle bucket is artificially low and the burglary bucket is artificially high before 2024. Year-over-year comparisons across that boundary should be read with that shift in mind.
  • Aggravated battery folded into aggravated assault. IUCR splits BATTERY and ASSAULT as separate primary types. UCR Part 1 has no “battery” category — aggravated battery and aggravated assault are both reported under “aggravated assault.” We follow the UCR convention so Chicago's aggravated-assault counts are comparable to SF's and Oakland's. Simple battery and simple assault are excluded for the same reason as in other cities.
  • Domestic-battery rows are excluded. CPD codes domestic battery as a separate IUCR description (“AGG. DOMESTIC BATTERY …”). Per the platform-wide stance on DV reporting, those rows are excluded from the aggravated-assault bucket — the bundle of domestic / family / acquaintance violence belongs in a dedicated DV module rather than mixed into stranger-violence trends.
  • 77 community areas, not block groups. Chicago's neighborhood unit is the community area — 77 long-stable polygons defined by the University of Chicago in the 1920s, still the city government's standard analytical unit. We spatial-join each incident's lat/lng to a community-area polygon at ingest, so neighborhood assignment doesn't depend on CPD's pre-joined community_area number (which is null on a small fraction of recent rows).
  • City demographics use Cook County medians. Cook County (FIPS 17-031) is broader than Chicago city. Per-tract counts (population, households, housing units) are summed only across the ~790 city tracts; medians (rent, home value, household income, age) come from the county because medians don't aggregate across tracts, so they differ slightly from what a Chicago-city-only median would be.
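
The spatial join described in the community-areas caveat amounts to a point-in-polygon test per incident. The sketch below uses the standard ray-casting algorithm in plain Python. It assumes each area is a single ring of (lng, lat) vertices with no holes, and the area names and coordinates in the usage note are hypothetical. Real boundary files often contain multipolygons, which a geometry library would handle.

```python
from typing import Dict, Optional, Sequence, Tuple

Point = Tuple[float, float]  # (lng, lat)


def point_in_polygon(pt: Point, ring: Sequence[Point]) -> bool:
    """Ray casting: count how many edges a ray going right from pt crosses."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through pt can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_area(pt: Point, areas: Dict[str, Sequence[Point]]) -> Optional[str]:
    """Return the name of the first area containing pt, or None."""
    for name, ring in areas.items():
        if point_in_polygon(pt, ring):
            return name
    return None
```

For example, with two adjacent unit squares named "A" and "B", a point at (0.5, 0.5) is assigned to "A" and a point outside both returns `None`, which is the case an ingest step would need to log.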

Los Angeles

114 neighborhoods · NIBRS-era data from 2020+ · reporting lag 14 days.

Data sources

  • Crime incidents. los-angeles — Socrata resource k7nn-b2ep. NIBRS-era only (2020+).
  • Neighborhood polygons. 114 polygons. Hand-curated tract groupings, dissolved from US Census TIGER tract polygons. The crosswalk is reviewed manually as misassignments surface.
  • Population. US Census ACS 5-year (variable B01003_001E), summed across the city's tract crosswalk.
  • Census county. 06-037.

Analysis window

January 2020 to March 2026. Pre-2020 records are excluded. The 14-day lag means the briefing currently shown reports on March 2026, earlier than calendar-current because we wait for the buffer to clear before treating a month as settled.

Forecast backtest — trained through 2024-12

We trained the forecast model on data ending 2024-12 and predicted the following 12 months. Coverage is the share of months whose actual count fell inside the 95% prediction interval; MAPE is mean absolute percentage error; bias is the mean of (point − actual), so a positive value means we systematically over-predicted.

Category            Months  Coverage    MAPE  Bias / mo
Aggravated Assault      12     41.7%   13.5%     +151.3
Arson                   12     91.7%   22.2%       +0.5
Burglary                12     66.7%   31.0%     +207.3
Homicide                12    100.0%   33.4%       +2.5
Motor Vehicle Theft     12      0.0%   62.2%     +785.4
Other Larceny           12      0.0%   69.5%    +1352.1
Robbery                 12     16.7%   25.9%     +134.6
Sexual Assault          12     50.0%   35.1%      +50.5
Theft from Vehicle      12      0.0%   33.2%     +474.7
Vandalism               12      0.0%   43.2%     +683.9

New York

59 neighborhoods · NIBRS-era data from 2018+ · reporting lag 14 days.

Data sources

  • Crime incidents. new-york — Socrata resource qgea-i56i. NIBRS-era only (2018+).
  • Neighborhood polygons. 59 polygons. Hand-curated tract groupings, dissolved from US Census TIGER tract polygons. The crosswalk is reviewed manually as misassignments surface.
  • Population. US Census ACS 5-year (variable B01003_001E), summed across the city's tract crosswalk.
  • Census counties. 36-061, 36-047, 36-005, 36-081, 36-085 (New York, Kings, Bronx, Queens, and Richmond counties, one per borough).

Analysis window

January 2018 to March 2026. Pre-2018 records are excluded. The 14-day lag means the briefing currently shown reports on March 2026, earlier than calendar-current because we wait for the buffer to clear before treating a month as settled.

Forecast backtest — trained through 2024-12

We trained the forecast model on data ending 2024-12 and predicted the following 12 months. Coverage is the share of months whose actual count fell inside the 95% prediction interval; MAPE is mean absolute percentage error; bias is the mean of (point − actual), so a positive value means we systematically over-predicted.

Category            Months  Coverage    MAPE  Bias / mo
Aggravated Assault      12     50.0%    8.2%     +181.0
Arson                   12     66.7%   55.1%      +15.0
Burglary                12    100.0%    7.4%      -16.5
Homicide                12     66.7%   44.6%       +8.2
Motor Vehicle Theft     12     50.0%   17.0%     +164.3
Other Larceny           12     83.3%   16.2%    +1802.2
Robbery                 12     50.0%   21.4%     +250.7
Sexual Assault          12     50.0%   13.1%      +95.8
Theft from Vehicle      12    100.0%   10.5%      +63.4
Vandalism               12     91.7%    7.5%     +176.9

Oakland

35 neighborhoods · NIBRS-era data from 2021+ · reporting lag 30 days.

Data sources

  • Crime incidents. OPD CrimeWatch (Oakland Open Data) — Socrata resource ppgh-7dqv. NIBRS-era only (2021+).
  • Neighborhood polygons. 35 polygons. Hand-curated tract groupings, dissolved from US Census TIGER tract polygons. The crosswalk is reviewed manually as misassignments surface.
  • Population. US Census ACS 5-year (variable B01003_001E), summed across the city's tract crosswalk.
  • Census county. 06-001 (Alameda County — broader than the city; only city tracts are summed).

Analysis window

January 2021 to March 2026. Pre-2021 records are excluded. The 30-day lag means the briefing currently shown reports on March 2026, earlier than calendar-current because we wait for the buffer to clear before treating a month as settled.

Forecast backtest — trained through 2024-12

We trained the forecast model on data ending 2024-12 and predicted the following 12 months. Coverage is the share of months whose actual count fell inside the 95% prediction interval; MAPE is mean absolute percentage error; bias is the mean of (point − actual), so a positive value means we systematically over-predicted.

Category            Months  Coverage    MAPE  Bias / mo
Aggravated Assault      12     83.3%   17.9%       +0.2
Arson                   12     83.3%   23.9%       -2.1
Burglary                12     83.3%   10.9%      +19.4
Homicide                12     58.3%   18.2%       +8.1
Motor Vehicle Theft     12     25.0%   40.8%     +197.3
Other Larceny           12     66.7%   10.0%      -46.8
Robbery                 12      8.3%  108.4%     +119.5
Sexual Assault          12     75.0%   22.2%       +2.0
Theft from Vehicle      12    100.0%   70.5%     +358.6
Vandalism               12    100.0%   30.9%      +59.3

Oakland-specific caveats

  • OPD migrated to NIBRS in 2021. Pre-2021 OPD records use the older Summary UCR taxonomy and aren't directly comparable to NIBRS-era counts. The analysis window starts 2021-01 — pre-2021 data is excluded from baselines, anomaly detection, and forecasts.
  • City demographics use Alameda County medians. The city-profile page sums tract-level ACS counts for Oakland city specifically (population, households, housing units). Median values (median rent, home value, household income, age) come from Alameda County because medians don't aggregate across tracts. The county is broader than Oakland city, so those medians lean slightly higher than a true Oakland-city-only median would.
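
A small numeric example shows why counts can be summed across tracts but medians cannot. The two "tracts" below are made-up toy data (household incomes in thousands of dollars), chosen only to make the gap visible.

```python
from statistics import median

# Toy household incomes, in $1,000s (not real data).
tract_a = [30, 35, 40]
tract_b = [50, 60, 200, 220, 250]

# Counts aggregate cleanly: total households is just the sum.
total_households = len(tract_a) + len(tract_b)

# Medians do not: combining the two tract medians gives a different
# answer from the median of all households pooled together.
median_of_medians = median([median(tract_a), median(tract_b)])
pooled_median = median(tract_a + tract_b)
```

Here the median of the two tract medians is 117.5, while the pooled median is 55. Recovering the true city-level median needs household-level or finely binned data, which is the reason given above for falling back to a published county median.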

Seattle

20 neighborhoods · NIBRS-era data from 2018+ · reporting lag 7 days.

Data sources

  • Crime incidents. seattle — Socrata resource tazs-3rd5. NIBRS-era only (2018+).
  • Neighborhood polygons. 20 polygons. Hand-curated tract groupings, dissolved from US Census TIGER tract polygons. The crosswalk is reviewed manually as misassignments surface.
  • Population. US Census ACS 5-year (variable B01003_001E), summed across the city's tract crosswalk.
  • Census county. 53-033.

Analysis window

January 2018 to March 2026. Pre-2018 records are excluded. The 7-day lag means the briefing currently shown reports on March 2026, earlier than calendar-current because we wait for the buffer to clear before treating a month as settled.

Forecast backtest — trained through 2024-12

We trained the forecast model on data ending 2024-12 and predicted the following 12 months. Coverage is the share of months whose actual count fell inside the 95% prediction interval; MAPE is mean absolute percentage error; bias is the mean of (point − actual), so a positive value means we systematically over-predicted.

Category            Months  Coverage    MAPE  Bias / mo
Aggravated Assault      12     83.3%   10.6%      +23.1
Arson                   12     91.7%   47.8%       +2.3
Burglary                12     91.7%   13.1%      +62.4
Homicide                12     58.3%  118.3%       +2.9
Motor Vehicle Theft     12      0.0%   61.6%     +287.7
Other Larceny           12    100.0%    8.1%       -8.4
Robbery                 12     58.3%   24.8%      +26.9
Sexual Assault          12    100.0%   13.0%       -5.1
Theft from Vehicle      12     91.7%   10.1%      +87.2
Vandalism               12     58.3%   17.6%      +91.6
