Skip to main content

DuckDB

DuckDB is an analytics tool that allows you to query remote files and download only the data you want.

Installation

Install DuckDB locally. You'll need extensions to work with spatial data in the cloud. Using the DuckDB CLI, do the following:

  1. INSTALL SPATIAL; to install the duckdb_spatial extension.
  2. INSTALL httpfs; or INSTALL azure; to read from either Amazon S3 (httpfs) or Microsoft Azure Blob Storage (azure).

Example queries

Mountains

This query selects POIs in the mountain category from the Overture places dataset and outputs them to a GeoJSON file.

LOAD spatial;
LOAD httpfs;
SET s3_region='us-west-2';

COPY(
SELECT
id,
names.primary as primary_name,
bbox.xmin as x,
bbox.ymin as y,
ST_GeomFromWKB(geometry) as geometry,
categories.main as main_category,
sources[1].dataset AS primary_source,
confidence
FROM read_parquet('s3://overturemaps-us-west-2/release/2024-06-13-beta.0/theme=places/type=*/*', filename=true, hive_partitioning=1)
WHERE main_category = 'mountain' AND confidence > .90
ORDER BY confidence DESC
) TO 'overture_places_mountains_gt90.geojson'
WITH (FORMAT GDAL, DRIVER 'GeoJSON');
tip

To write the data to a shapefile, replace the last two lines with:

    ) TO 'overture_places_mountains_gt90.shp'
WITH (FORMAT GDAL, DRIVER 'ESRI Shapefile');

Buildings in Detroit

This query pulls building geometries and selected attributes from the Overture buildings dataset for an area in Detroit.

LOAD spatial;
LOAD azure;
SET azure_storage_connection_string = 'DefaultEndpointsProtocol=https;AccountName=overturemapswestus2;AccountKey=;EndpointSuffix=core.windows.net';

SELECT
id,
names.primary as primary_name,
height,
ST_GeomFromWKB(geometry) as geometry
FROM read_parquet('azure://release/2024-06-13-beta.0/theme=buildings/type=*/*', filename=true, hive_partitioning=1)
WHERE primary_name IS NOT NULL
AND bbox.xmin > -84.36
AND bbox.xmax < -82.42
AND bbox.ymin > 41.71
AND bbox.ymax < 43.33;

Country polygons

This query grabs country-level geometries and their attributes from the admins dataset and outputs them to a GeoJSON file.

LOAD httpfs;
LOAD spatial;
SET s3_region='us-west-2';

CREATE OR REPLACE VIEW admins_view AS (
SELECT
*
FROM
read_parquet('s3://overturemaps-us-west-2/release/2024-04-16-beta.0/theme=admins/type=*/*', filename=true, hive_partitioning=1)
);

COPY (
SELECT
admins.id,
admins.subtype,
admins.iso_country_code_alpha_2,
names.primary AS primary_name,
sources[1].dataset AS primary_source,
areas.area_id,
ST_GeomFromWKB(areas.area_geometry) as geometry
FROM admins_view AS admins
INNER JOIN (
SELECT
id as area_id,
locality_id,
geometry AS area_geometry
FROM admins_view
) AS areas ON areas.locality_id == admins.id
WHERE admins.admin_level = 1
) TO 'countries.geojson'
WITH (FORMAT GDAL, DRIVER 'GeoJSON');