Matching datasets
Data prep playbook: shape the sheet, name geography columns clearly, keep codes separate, and lock detected columns before matching.
Catch the latest guidance on matching datasets, admin packs, and sessions. Browse the help center, skim the workflow FAQ, and review recent releases—all from the resource hub.
Start with the guide that matches your task: preparing a matching dataset, curating an admin pack, or navigating the pipeline.
Data prep playbook: shape the sheet, name geography columns clearly, keep codes separate, and lock detected columns before matching.
Normalized GeoParquet requirements, alias hygiene, and how versioning/metadata keep packs trustworthy in crosswalk exports.
From profiling through QA/export: what each stage saves, when to rerun, and how sessions stay reproducible.
Clean, clearly labeled columns make profiling instant and keep every session reusing the same mapping. Prep the sheet first, then upload from the Matching Dataset hero CTA.
Prep steps
Make detection trivial
Recommended sheet spec
Upload profile
Common fixes
What stays in the bundle
The crosswalk bundle keeps your cleaned headers, locked geography columns, alias rules, and QA notes. Prepping the file up front means exports stay consistent across runs.
Actionable answers for the three flows you run most: preparing a matching dataset, curating an admin pack, and keeping sessions reproducible.
Three rules to avoid re-uploads: one location concept per column, no merged headers, and separate latitude/longitude.
Headers that never confuse profiling
Keep a single header row, avoid merged cells, and trim whitespace. Split free-form 'province / district / village' cells into dedicated columns so GeoMatcher can detect the hierarchy cleanly.
File format and size sweet spot
CSV/TSV up to ~200 MB load fastest. XLSX works best under ~100k rows; if the sheet is wide, export to CSV first and remove empty columns. Keep lat/lon as two numeric columns in decimal degrees.
If column detection picks the wrong fields
On the Matching Dataset page, choose the geography columns manually and lock them before starting a session. Rename vague headers like 'location' to their actual level (province, district, etc.) for the next upload.
Treat your admin pack like a product: predictable schema, clear versioning, and a handful of real-world aliases.
Must-have fields
Flat/normalized rows only: one admin unit per row with geometry, `admin_level` (starting at 1 for the top level), `code`, and `name`. Include `parent_code` when possible. Wide tables with per-level columns are rejected.
How many aliases should I keep?
Include 3-5 common spellings or abbreviations per feature (e.g., 'DKI Jakarta', 'Jakarta', 'Jakarta Raya'). Keep them in an 'aliases' column as a JSON array or comma-separated text; avoid duplicates or unrelated nicknames.
Versioning and replacing packs
Upload a new pack with an updated version string instead of overwriting. Sessions record the pack hash, so you always know which version drove a crosswalk. Note changes (e.g., alias refresh, geometry corrections) in the description.
Think of a session as a frozen experiment: it stores the admin pack, thresholds, alias rules, and QA notes you used.
When to start a new session
Spin up a new one when you change the admin pack, tweak thresholds, or want to test fresh alias rules without losing the previous run. Keep the original session for auditability.
Rescuing unmatched rows quickly
Filter for unmatched rows, add alias rules for repeating patterns, then rerun matching so the rules apply everywhere. If codes exist in your dataset, map those first to boost scores.
Exporting and sharing
Finish QA, then download the crosswalk bundle from the Session page. It includes the crosswalk CSV, cleaned dataset, QA summary, and the admin pack hash used for that run.
Latest releases and improvements to GeoMatcher.
December 2, 2025
November 17, 2025