Bridge Files
With each data release, Overture generates bridge files that connect GERS IDs to the IDs from the source data. These files are a key component of GERS and offer two critical capabilities: reverse lookup of source features and insight into Overture's conflation process.
Here's how to get the bridge files:
Provider | Location |
---|---|
Amazon S3 | s3://overturemaps-us-west-2/bridgefiles/<RELEASE> |
Microsoft Azure Blob Storage | https://overturemapswestus2.blob.core.windows.net/bridgefiles/<RELEASE> |
The latest Overture data <RELEASE> is: 2025-06-25.0/
Currently, Overture only generates bridge files for these source datasets: Esri Community Maps, geoBoundaries, Instituto Geográfico Nacional (España), Meta Places, Microsoft Places, OpenStreetMap, PinMeTo.
Partitioning and schema
Bridge files are released as Parquet files, partitioned by dataset, theme, and type, and structured in this way:
\bridgefiles
  \<RELEASE>
    \dataset=OpenStreetMap
      \theme=divisions
        \type=division
        \type=division_area
      \theme=buildings
        \type=building
      \theme=transportation
        \type=segment
    \dataset=Esri Community Maps
      \theme=buildings
        \type=building
    \dataset=PinMeTo
      \theme=places
        \type=places
    \dataset=meta
      \theme=places
        \type=places
    \dataset=Microsoft
      \theme=places
        \type=places
    \dataset=Instituto Geográfico Nacional (España)
      \theme=buildings
        \type=building
    \dataset=geoBoundaries
      \theme=divisions
        \type=division
        \type=division_area
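The partition layout maps directly onto a path pattern. As a rough sketch, a glob for one partition could be assembled as shown here. The helper name `bridge_glob` is hypothetical, and whether dataset names containing spaces need escaping in your client is not covered.

```python
# Hypothetical helper: builds a glob for one bridge file partition.
# The bucket and prefix are taken from the Amazon S3 location listed above.
S3_ROOT = "s3://overturemaps-us-west-2/bridgefiles"


def bridge_glob(release: str, theme: str, type_: str, dataset: str = "*") -> str:
    """Return a glob matching the Parquet files of one partition.

    dataset defaults to "*" so that every source dataset is matched.
    """
    release = release.strip("/")  # tolerate a trailing slash such as "2025-06-25.0/"
    return f"{S3_ROOT}/{release}/dataset={dataset}/theme={theme}/type={type_}/*"


print(bridge_glob("2025-05-21.0", "buildings", "building"))
```

With the default `dataset="*"`, the result has the same shape as the path passed to `read_parquet` in the example further down.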
The bridge file schema includes the following columns:
Column | Data type | Description |
---|---|---|
id | string | represents the GERS ID and is populated from the id column in the Overture data schema |
record_id | string | represents the id of the feature as it is in the source data provider (e.g. n2757802019@9) and is populated by parsing the sources column in the Overture data schema |
update_time | string | represents the time the feature or dataset was updated, depending on the data provider; also populated by parsing the sources column in the Overture schema |
dataset | string | represents the name of the dataset the feature has been provided in; also populated by parsing the sources column in the Overture data schema |
theme | string | represents the theme the feature is a part of, provided by the creator of the bridge file itself |
type | string | represents the type of the feature, either derived from the data or provided by the creator of the bridge file |
between | array | represents the portion of the normalized length of the GERS feature the dataset way takes, represented by a range between 0 and 1 |
dataset_between | array | represents the portion of the normalized length of the dataset way the GERS feature takes, represented by a range between 0 and 1 |
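To make the `between` and `dataset_between` columns concrete, here is a small sketch that converts a normalized range into absolute distances. It assumes the array holds exactly two numbers, `[start, end]`; the function name and the 200 m length are made up for illustration.

```python
def range_to_distance(between: list[float], length_m: float) -> tuple[float, float]:
    """Convert a normalized [start, end] range into distances along a feature.

    Assumption: the array has two elements ordered as start, end.
    """
    start, end = between
    if not 0.0 <= start <= end <= 1.0:
        raise ValueError(f"expected 0 <= start <= end <= 1, got {between}")
    return (start * length_m, end * length_m)


# A source way covering the middle half of a hypothetical 200 m GERS feature
print(range_to_distance([0.25, 0.75], 200.0))  # (50.0, 150.0)
```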
Example: examining the source data for the building dataset
In this example, we'll trace the buildings data in a release (2025-05-21.0 in the queries below) back to the underlying source datasets. We'll examine an area near the US-Mexico border outside San Diego. First, let's get the buildings in our area of interest:
CREATE TABLE IF NOT EXISTS border_buildings AS
(SELECT
*
FROM read_parquet('s3://overturemaps-us-west-2/release/2025-05-21.0/theme=buildings/type=building/*')
WHERE
bbox.xmin > -117.048198 AND bbox.xmax < -117.044608
AND bbox.ymin > 32.535068 AND bbox.ymax < 32.600154);
You'll notice the table has 4,367 building features. Now let's look at the building count by data source:
SELECT
sources[1].dataset AS source,
count(*)
FROM border_buildings
GROUP BY source;
┌────────────────────────┬──────────────┐
│ source │ count_star() │
│ varchar │ int64 │
├────────────────────────┼──────────────┤
│ Esri Community Maps │ 412 │
│ OpenStreetMap │ 1539 │
│ Google Open Buildings │ 1751 │
│ Microsoft ML Buildings │ 665 │
└────────────────────────┴──────────────┘
Now we'll use the bridge files from the same release to find additional information about data in the Overture corpus that didn't make it into the release. We'll join the table we created from the release data with the bridge file data to create a new table that has a detailed view of the source mappings. Remember: of the four sources in this area, only Esri Community Maps and OpenStreetMap have bridge files.
CREATE TABLE IF NOT EXISTS border_buildings_corpus AS
(SELECT
border_buildings.id AS gers_id,
bridge.dataset,
bridge.record_id AS dataset_record_id,
-- keep update_time so individual source records can be inspected later
bridge.update_time
FROM
border_buildings
JOIN
read_parquet('s3://overturemaps-us-west-2/bridgefiles/2025-05-21.0/dataset=*/theme=buildings/type=building/*') bridge
ON border_buildings.id = bridge.id
ORDER BY border_buildings.id, bridge.dataset);
You might notice this new table created from our join has only 2,021 records compared to 4,367 building records in our original query of the release data. This is because we have incomplete bridge file coverage for buildings; we don't generate bridge files for Microsoft ML Buildings and Google Open Buildings because those sources don't have meaningful IDs for reverse lookup. However, the bridge files that do exist for buildings reveal important patterns:
- Multiple sources per building: a single Overture building may be conflated from multiple source datasets
- One-to-many mapping: each source contribution gets its own bridge file record
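The one-to-many relationship is easy to see with a couple of bridge rows held in memory. The sketch below uses two rows that appear later in this example and indexes them both ways: GERS ID to source records, and source record back to GERS ID (the reverse lookup). The variable names are illustrative and not part of any Overture tooling.

```python
from collections import defaultdict

# (id, dataset, record_id) tuples shaped like bridge file rows
rows = [
    ("08b29a4c428ebfff02002b827866f466", "Esri Community Maps", "esri_ChulaVistaCA13510"),
    ("08b29a4c428ebfff02002b827866f466", "OpenStreetMap", "w1182486582@1"),
]

by_gers = defaultdict(list)    # GERS ID -> every contributing source record
by_source = defaultdict(list)  # (dataset, record_id) -> GERS IDs (reverse lookup)

for gers_id, dataset, record_id in rows:
    by_gers[gers_id].append((dataset, record_id))
    by_source[(dataset, record_id)].append(gers_id)

print(len(by_gers["08b29a4c428ebfff02002b827866f466"]))  # 2
print(by_source[("OpenStreetMap", "w1182486582@1")])
```

Both indexes hold lists so that the sketch does not assume a source record maps to only one GERS feature.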
Let's dig into this a bit more. We can identify the building features in the release that have multiple source mappings.
-- Identify buildings conflated from multiple sources
SELECT gers_id,
COUNT(DISTINCT dataset) as source_count,
STRING_AGG(DISTINCT dataset, ', ') as datasets
FROM border_buildings_corpus
GROUP BY gers_id
HAVING COUNT(DISTINCT dataset) > 1;
There are 70 buildings that are mapped to two data sources. Here's a snippet of the query result:
┌──────────────────────────────────┬──────────────┬────────────────────────────────────┐
│ gers_id │ source_count │ datasets │
│ varchar │ int64 │ varchar │
├──────────────────────────────────┼──────────────┼────────────────────────────────────┤
│ 08b29a4c428ebfff02002b827866f466 │ 2 │ OpenStreetMap, Esri Community Maps │
│ 08b29a4c428c3fff0200bb4d5defac52 │ 2 │ Esri Community Maps, OpenStreetMap │
│ 08b29a4c428e5fff0200cddeda1c3c68 │ 2 │ OpenStreetMap, Esri Community Maps │
│ 08b29a4c4280bfff0200729b26aa9ec7 │ 2 │ Esri Community Maps, OpenStreetMap │
│ 08b29a4c428c9fff0200cc521ce08155 │ 2 │ OpenStreetMap, Esri Community Maps │
│ 08b29a4c428cbfff0200df6833fcb919 │ 2 │ Esri Community Maps, OpenStreetMap │
│ 08b29a4c428cbfff0200d5d23faeeec7 │ 2 │ Esri Community Maps, OpenStreetMap │
│ 08b29a4c42809fff02000729055d0147 │ 2 │ Esri Community Maps, OpenStreetMap │
│ 08b29a4c428ccfff02002cbabfaf31e0 │ 2 │ Esri Community Maps, OpenStreetMap │
│ 08b29a4c4280bfff02003a1e8ccaeb68 │ 2 │ Esri Community Maps, OpenStreetMap │
...
Let's pull out one example:
gers_id | dataset | dataset_record_id | update_time |
---|---|---|---|
08b29a4c428ebfff02002b827866f466 | Esri Community Maps | esri_ChulaVistaCA13510 | 2024-10-15T00:00:00.000Z |
08b29a4c428ebfff02002b827866f466 | OpenStreetMap | w1182486582@1 | 2023-06-16T14:22:10.000Z |
This shows that building 08b29a4c428ebfff02002b827866f466 was created by a conflation process that included data from OpenStreetMap (way 1182486582, version 1) and Esri Community Maps (esri_ChulaVistaCA13510). The conflation process may have included other data sources that have not been mapped to GERS and released as bridge files.
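OpenStreetMap record IDs such as `w1182486582@1` pack an element type, an ID, and a version into one string. A minimal parsing sketch follows. The text above only confirms that `w` means way, so treating `n` as node and `r` as relation is an assumption based on OSM's element types, and the function name is hypothetical.

```python
import re

# Assumption: "n" and "r" mean node and relation; only "w" = way is confirmed above.
OSM_TYPES = {"n": "node", "w": "way", "r": "relation"}
RECORD_ID = re.compile(r"^([nwr])(\d+)@(\d+)$")


def parse_osm_record_id(record_id: str) -> tuple[str, int, int]:
    """Split an OSM-style record_id into (element type, id, version)."""
    match = RECORD_ID.match(record_id)
    if match is None:
        raise ValueError(f"not an OSM-style record_id: {record_id!r}")
    prefix, osm_id, version = match.groups()
    return OSM_TYPES[prefix], int(osm_id), int(version)


element, osm_id, version = parse_osm_record_id("w1182486582@1")
print(element, osm_id, version)  # way 1182486582 1

# Element pages on openstreetmap.org follow the pattern /<type>/<id>
print(f"https://www.openstreetmap.org/{element}/{osm_id}")
```

Record IDs from other providers, such as `esri_ChulaVistaCA13510`, do not follow this pattern and raise a `ValueError` here.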
Next steps
- Examine the source data for building 08b29a4c428ebfff02002b827866f466 by looking up the OSM ID in OpenStreetMap
- Explore the other components of GERS: registry, changelog, and reference map
- Follow our GERS tutorial