Skip to main content

DuckDB

DuckDB is powerful analytics tool that allows you to query remote files and download only the data you want.

Installation

Install DuckDB. You'll need extensions to work with spatial data in the cloud. Run the following commands in the DuckDB command line:

  1. INSTALL SPATIAL; to install the duckdb_spatial extension.
  2. INSTALL httpfs; or INSTALL azure; to read from either Amazon S3 (httpfs) or Microsoft Azure Blob Storage (azure).

Downloading Overture Data

These SQL queries will access Overture data in the cloud and download only the attributes that you request.

Download all pizza restaurants in New York City as a GeoJSON file.

LOAD spatial;
LOAD httpfs;
SET s3_region='us-west-2';

COPY( -- COPY <query> TO <output> saves the results to disk.
SELECT
id,
names.primary as name,
confidence AS confidence,
CAST(socials AS JSON) as socials, -- Ensure each attribute can be serialized to JSON
ST_GeomFromWKB(geometry) as geometry -- GDAL understands this to be the geometry
FROM read_parquet('s3://overturemaps-us-west-2/release/2024-08-20.0/theme=places/type=place/*', filename=true, hive_partitioning=1)
WHERE categories.primary = 'pizza_restaurant'
AND bbox.xmin BETWEEN -75 AND -73 -- Only use the bbox min values
AND bbox.ymin BETWEEN 40 AND 41 -- because they are point geometries.

) TO 'nyc_pizza.geojson' WITH (FORMAT GDAL, DRIVER 'GeoJSON');

See the places schema to learn more about each attribute or the common schema concepts to better understand names.primary from names.common.


tip

To download the data as shapefile, replace the last line of any of the above queries:

) TO 'overture_data.shp' WITH (FORMAT GDAL, DRIVER 'ESRI Shapefile');

or as a geopackage:

) TO 'overture_data.gpkg' WITH (FORMAT GDAL, DRIVER 'GeoPackage');

The DuckDB Spatial extension supports more than 50 geospatial file formats.

Next steps