DuckDB
DuckDB is powerful analytics tool that allows you to query remote files and download only the data you want. You'll need to install at least DuckDB 1.1.0, which supports reading and writing geoparquet.
Installation
Install DuckDB. You'll need extensions to work with spatial data in the cloud. Run the following commands in the DuckDB command line:
INSTALL SPATIAL;
to install the duckdb_spatial extension.INSTALL httpfs;
orINSTALL azure;
to read from either Amazon S3 (httpfs) or Microsoft Azure Blob Storage (azure).
Downloading Overture Data
These SQL queries will access Overture data in the cloud and download only the attributes that you request.
- Places
- Buildings
- Roads
- Mountains
- Counties
- As GeoParquet
Download all pizza restaurants in New York City as a GeoJSON
file.
LOAD spatial; -- noqa
SET s3_region='us-west-2';
COPY( -- COPY <query> TO <output> saves the results to disk.
SELECT
id,
names.primary as name,
confidence AS confidence,
CAST(socials AS JSON) as socials, -- Ensure each attribute can be serialized to JSON
geometry -- DuckDB understands this to be a geometry type
FROM read_parquet('s3://overturemaps-us-west-2/release/2025-05-21.0/theme=places/type=place/*', filename=true, hive_partitioning=1)
WHERE categories.primary = 'pizza_restaurant'
AND bbox.xmin BETWEEN -75 AND -73 -- Only use the bbox min values
AND bbox.ymin BETWEEN 40 AND 41 -- because they are point geometries.
) TO 'nyc_pizza.geojson' WITH (FORMAT GDAL, DRIVER 'GeoJSON');
See the places schema to learn more about each attribute or the common schema concepts to better understand names.primary
from names.common
.
Access Overture buildings data hosted on Azure to download named buildings around Detroit, Michigan. Saves as newline-delimited GeoJSON.
LOAD spatial; --noqa
COPY(
SELECT
id,
names.primary as primary_name,
height,
geometry
FROM read_parquet('az://overturemapswestus2.blob.core.windows.net/release/2025-05-21.0/theme=buildings/type=building/*', filename=true, hive_partitioning=1)
WHERE names.primary IS NOT NULL
AND bbox.xmin BETWEEN -84.36 AND -82.42
AND bbox.ymin BETWEEN 41.71 AND 43.33
LIMIT 100
) TO 'detroit_buildings.geojsonseq' WITH (FORMAT GDAL, DRIVER 'GeoJSONSeq');
Download roads in Paris around the Arc de Triomphe and save as GeoJSON.
LOAD spatial; -- noqa
SET s3_region='us-west-2';
COPY(
SELECT
id,
names.primary as name,
class,
geometry -- DuckDB v.1.1.0 will autoload this as a `geometry` type
FROM read_parquet('s3://overturemaps-us-west-2/release/2025-05-21.0/theme=transportation/type=segment/*', filename=true, hive_partitioning=1)
WHERE bbox.xmin > 2.276
AND bbox.ymin > 48.865
AND bbox.xmax < 2.314
AND bbox.ymax < 48.882
) TO 'paris_roads.geojson' WITH (FORMAT GDAL, DRIVER 'GeoJSON');
Download mountain peaks in Oregon, USA from the base theme, convert the elevation attribute to feet, and save the result as a GeoJSON file.
LOAD spatial; -- noqa
SET s3_region='us-west-2';
COPY(
SELECT
id,
names.primary as name,
CAST(elevation * 3.28084 AS INT) AS elevation_ft,
geometry -- DuckDB v.1.1.0 will autoload this as a `geometry` type
FROM read_parquet('s3://overturemaps-us-west-2/release/2025-05-21.0/theme=base/type=land/*', filename=true, hive_partitioning=1)
WHERE subtype = 'physical' AND class IN ('peak','volcano') AND elevation IS NOT NULL
AND bbox.xmin BETWEEN -124.71 AND -116.47
AND bbox.ymin BETWEEN 41.99 AND 46.30
) TO 'oregon_peaks.geojson'
WITH (FORMAT GDAL, DRIVER 'GeoJSON');
County-level geometries for Pennsylvania from the divisions theme and outputs a GeoPackage.
LOAD spatial; -- noqa
SET s3_region='us-west-2';
COPY(
SELECT
id,
division_id,
names.primary,
geometry -- DuckDB v.1.1.0 will autoload this as a `geometry` type
FROM
read_parquet('s3://overturemaps-us-west-2/release/2025-05-21.0/theme=divisions/type=division_area/*', hive_partitioning=1)
WHERE
subtype = 'county'
AND country = 'US'
AND region = 'US-PA'
) TO 'pennsylvania_counties.gpkg' WITH (FORMAT GDAL, DRIVER 'GPKG');
DuckDB v1.1.0 supports reading and writing GeoParquet directly. It recognizes the geometry
type and will write the appropriate metadata.
LOAD spatial; -- noqa
COPY(
SELECT
*
FROM read_parquet('s3://overturemaps-us-west-2/release/2025-05-21.0/theme=places/type=place/*')
LIMIT 100000
) TO 'places.parquet';
DuckDB recognizes the input as geoparquet and will automatically cast the geometry column to a GEOMETRY
type.
The COPY
command writes a parquet file of 100,000 places named places.parquet
with the appropriate geoparquet metadata.
To download the data as shapefile
, replace the last line of any of the above queries:
) TO 'overture_data.shp' WITH (FORMAT GDAL, DRIVER 'ESRI Shapefile');
or as a geopackage
:
) TO 'overture_data.gpkg' WITH (FORMAT GDAL, DRIVER 'GPKG');
The DuckDB Spatial extension supports more than 50 geospatial file formats.
You can also use DuckDB to loop over the Parquet files in a directory and convert the data to a spatial format of your choice. The spatial extension uses GDAL so you could export to flatgeobuf if that's the data format that works best for you. Here's a bash script that shows you how:
for f in *.parquet;
do duckdb -c \
"install spatial;
load spatial;
COPY (SELECT id, st_geomfromwkb(geometry) from read_parquet('$f'))
to '$f.fgb' WITH (FORMAT GDAL, DRIVER 'flatgeobuf');";
done
Regional Extracts
We can use data from Overture's division theme to define our bounds for a regional extract.
This example stores the GERS ID for the division in the division_id
variable. If you know the GERS ID, you can simply set it with: SET variable division_id = '<GERS ID>'
.
If you don't know the ID, you can set the variable to the result of a query for that specific division, as shown here:
INSTALL spatial; --noqa
LOAD spatial; --noqa
-- SET variable division_id = <GERS ID HERE>
-- Or, search for it:
SET variable division_id = (
SELECT
id
FROM
read_parquet('s3://overturemaps-us-west-2/release/2025-05-21.0/theme=divisions/type=division/*.parquet')
WHERE
names.primary = 'Marion County' AND subtype = 'county'
LIMIT 1
);
-- Fetch the bounds and geometry of the Region
CREATE OR REPLACE TABLE bounds AS (
SELECT
id AS division_id, names.primary, geometry, bbox
FROM
read_parquet('s3://overturemaps-us-west-2/release/2025-05-21.0/theme=divisions/type=division_area/*.parquet')
WHERE
division_id = getvariable('division_id')
);
-- Extract the bounds and geometry of the division into variables for faster table scan
SET variable xmin = (select bbox.xmin FROM bounds);
SET variable ymin = (select bbox.ymin FROM bounds);
SET variable xmax = (select bbox.xmax FROM bounds);
SET variable ymax = (select bbox.ymax FROM bounds);
SET variable boundary = (select geometry FROM bounds);
-- Create a local GeoParquet file.
COPY(
SELECT
*
FROM
read_parquet('s3://overturemaps-us-west-2/release/2025-05-21.0/theme=buildings/type=building/*.parquet')
WHERE
bbox.xmin > getvariable('xmin')
AND bbox.xmax < getvariable('xmax')
AND bbox.ymin > getvariable('ymin')
AND bbox.ymax < getvariable('ymax')
AND ST_INTERSECTS(
getvariable('boundary'),
geometry
)
) TO 'extract.parquet';
-- Convert GeoParquet to line-delimited GeoJSON (or any other GDAL format)
COPY(
SELECT
id,
subtype,
class,
height,
names.primary as name,
geometry
FROM 'extract.parquet'
) TO 'extract.geojsonseq' WITH (FORMAT GDAL, DRIVER 'GeoJSONSeq');
This query is broken into discrete steps for efficiency:
- Using DuckDBs variables in the
WHERE
clause proved more performant than joining to the bounds table. - Storing the result first as GeoParquet and then converting to another format is significantly faster than doing it all in one step.
Next steps
- Visualize the data in QGIS or kepler.gl.
- Make an interactive map.
- Consult one of our Data Guides to learn more about the data and for more advanced queries.