DuckDB
DuckDB is powerful analytics tool that allows you to query remote files and download only the data you want.
Installation
Install DuckDB. You'll need extensions to work with spatial data in the cloud. Run the following commands in the DuckDB command line:
INSTALL SPATIAL;
to install the duckdb_spatial extension.INSTALL httpfs;
orINSTALL azure;
to read from either Amazon S3 (httpfs) or Microsoft Azure Blob Storage (azure).
Downloading Overture Data
These SQL queries will access Overture data in the cloud and download only the attributes that you request.
- Places
- Buildings
- Roads
- Mountains
- Counties
- As GeoParquet (Experimental)
Download all pizza restaurants in New York City as a GeoJSON
file.
LOAD spatial;
LOAD httpfs;
SET s3_region='us-west-2';
COPY( -- COPY <query> TO <output> saves the results to disk.
SELECT
id,
names.primary as name,
confidence AS confidence,
CAST(socials AS JSON) as socials, -- Ensure each attribute can be serialized to JSON
ST_GeomFromWKB(geometry) as geometry -- GDAL understands this to be the geometry
FROM read_parquet('s3://overturemaps-us-west-2/release/2024-08-20.0/theme=places/type=place/*', filename=true, hive_partitioning=1)
WHERE categories.primary = 'pizza_restaurant'
AND bbox.xmin BETWEEN -75 AND -73 -- Only use the bbox min values
AND bbox.ymin BETWEEN 40 AND 41 -- because they are point geometries.
) TO 'nyc_pizza.geojson' WITH (FORMAT GDAL, DRIVER 'GeoJSON');
See the places schema to learn more about each attribute or the common schema concepts to better understand names.primary
from names.common
.
Access Overture buildings data hosted on Azure to download named buildings around Detroit, Michigan. Saves as newline-delimited GeoJSON.
LOAD spatial;
LOAD azure;
SET azure_storage_connection_string = 'DefaultEndpointsProtocol=https;AccountName=overturemapswestus2;AccountKey=;EndpointSuffix=core.windows.net';
COPY(
SELECT
id,
names.primary as primary_name,
height,
ST_GeomFromWKB(geometry) as geometry
FROM read_parquet('azure://release/2024-08-20.0/theme=buildings/type=building/*', filename=true, hive_partitioning=1)
WHERE names.primary IS NOT NULL
AND bbox.xmin BETWEEN -84.36 AND -82.42
AND bbox.ymin BETWEEN 41.71 AND 43.33
) TO 'detroit_buildings.geojsonseq' WITH (FORMAT GDAL, DRIVER 'GeoJSONSeq');
Download roads in Paris around the Arc de Triomphe and save as GeoJSON.
LOAD spatial;
LOAD httpfs;
SET s3_region='us-west-2';
COPY(
SELECT
id,
names.primary as name,
class,
ST_GeomFromWKB(geometry) as geometry -- GDAL understands this to be the geometry
FROM read_parquet('s3://overturemaps-us-west-2/release/2024-08-20.0/theme=transportation/type=segment/*', filename=true, hive_partitioning=1)
WHERE bbox.xmin > 2.276
AND bbox.ymin > 48.865
AND bbox.xmax < 2.314
AND bbox.ymax < 48.882
) TO 'paris_roads.geojson' WITH (FORMAT GDAL, DRIVER 'GeoJSON');
Download mountain peaks in Oregon, USA from the base theme, convert the elevation attribute to feet, and save the result as a GeoJSON file.
LOAD spatial;
LOAD httpfs;
SET s3_region='us-west-2';
COPY(
SELECT
id,
names.primary as name,
CAST(elevation * 3.28084 AS INT) AS elevation_ft,
ST_GeomFromWKB(geometry) as geometry,
FROM read_parquet('s3://overturemaps-us-west-2/release/2024-08-20.0/theme=base/type=land/*', filename=true, hive_partitioning=1)
WHERE subtype = 'physical' AND class IN ('peak','volcano') AND elevation IS NOT NULL
AND bbox.xmin BETWEEN -124.71 AND -116.47
AND bbox.ymin BETWEEN 41.99 AND 46.30
) TO 'oregon_peaks.geojson'
WITH (FORMAT GDAL, DRIVER 'GeoJSON');
County-level geometries for Pennsylvania from the divisions theme and outputs a GeoPackage.
LOAD spatial;
LOAD httpfs;
SET s3_region='us-west-2';
COPY(
SELECT
id,
division_id,
names.primary,
ST_GeomFromWKB(geometry) as geometry
FROM
read_parquet('s3://overturemaps-us-west-2/release/2024-08-20.0/theme=divisions/type=division_area/*', hive_partitioning=1)
WHERE
subtype = 'county'
AND country = 'US'
AND region = 'US-PA'
) TO 'pennsylvania_counties.gpkg' WITH (FORMAT GDAL, DRIVER 'GPKG');
The Nightly build (bleeding edge) of DuckDB includes support for reading and writing GeoParquet. For example, using at least DuckDB v1.0.1-dev4917, you can run the following query:
COPY(
SELECT
*
FROM read_parquet('s3://overturemaps-us-west-2/release/2024-08-20.0/theme=places/type=place/*', filename=true, hive_partitioning=1)
LIMIT 100000
) TO 'places.parquet';
DuckDB recognizes the input as geoparquet and will automatically cast the geometry column to a GEOMETRY
type.
The COPY
command writes a parquet file of 100,000 places named places.parquet
with the appropriate geoparquet metadata.
To download the data as shapefile
, replace the last line of any of the above queries:
) TO 'overture_data.shp' WITH (FORMAT GDAL, DRIVER 'ESRI Shapefile');
or as a geopackage
:
) TO 'overture_data.gpkg' WITH (FORMAT GDAL, DRIVER 'GeoPackage');
The DuckDB Spatial extension supports more than 50 geospatial file formats.
Next steps
- Visualize the data in QGIS or kepler.gl.
- Make an interactive map.
- Consult one of our Data Guides to learn more about the data and for more advanced queries.