Skip to main content

DuckDB

DuckDB is an analytics tool that allows you to query remote files and download only the data you want.

Installation

Install DuckDB locally. You'll need extensions to work with spatial data in the cloud. Using the DuckDB CLI, do the following:

  1. INSTALL SPATIAL; to install the duckdb_spatial extension.
  2. INSTALL httpfs; or INSTALL azure; to read from either Amazon S3 (httpfs) or Microsoft Azure Blob Storage (azure).

Example queries

Mountains

This query selects POIs in the mountain category from the Overture places dataset and outputs them to a GeoJSON file.

LOAD spatial;
LOAD httpfs;
SET s3_region='us-west-2';

COPY(
SELECT
id,
names.primary as primary_name,
bbox.xmin as x,
bbox.ymin as y,
ST_GeomFromWKB(geometry) as geometry,
categories.main as main_category,
sources[1].dataset AS primary_source,
confidence
FROM read_parquet('s3://overturemaps-us-west-2/release/2024-07-22.0/theme=places/type=*/*', filename=true, hive_partitioning=1)
WHERE main_category = 'mountain' AND confidence > .90
ORDER BY confidence DESC
) TO 'overture_places_mountains_gt90.geojson'
WITH (FORMAT GDAL, DRIVER 'GeoJSON');
tip

To write the data to a shapefile, replace the last two lines with:

    ) TO 'overture_places_mountains_gt90.shp'
WITH (FORMAT GDAL, DRIVER 'ESRI Shapefile');

Buildings in Detroit

This query pulls building geometries and selected attributes from the Overture buildings dataset for an area in Detroit.

LOAD spatial;
LOAD azure;
SET azure_storage_connection_string = 'DefaultEndpointsProtocol=https;AccountName=overturemapswestus2;AccountKey=;EndpointSuffix=core.windows.net';

SELECT
id,
names.primary as primary_name,
height,
ST_GeomFromWKB(geometry) as geometry
FROM read_parquet('azure://release/2024-07-22.0/theme=buildings/type=*/*', filename=true, hive_partitioning=1)
WHERE primary_name IS NOT NULL
AND bbox.xmin > -84.36
AND bbox.xmax < -82.42
AND bbox.ymin > 41.71
AND bbox.ymax < 43.33;

County polygons

This query grabs county-level geometries for Pennsylvania from the divisions dataset and outputs them to a GeoJSON file.

LOAD httpfs;
LOAD spatial;
SET s3_region='us-west-2';

SELECT
id,
division_id,
names.primary,
ST_GeomFromWKB(geometry) as geometry
FROM
read_parquet('s3://overturemaps-us-west-2/release/2024-07-22.0/theme=divisions/type=division_area/*', hive_partitioning=1)
WHERE
subtype = 'county'
AND country = 'US'
AND region = 'US-PA'