DuckDB
DuckDB is powerful analytics tool that allows you to query remote files and download only the data you want. You'll need to install at least DuckDB 1.1.0, which supports reading and writing geoparquet.
Installation
Install DuckDB. You'll need extensions to work with spatial data in the cloud. Run the following commands in the DuckDB command line:
INSTALL SPATIAL;
to install the duckdb_spatial extension.INSTALL httpfs;
orINSTALL azure;
to read from either Amazon S3 (httpfs) or Microsoft Azure Blob Storage (azure).
Downloading Overture Data
These SQL queries will access Overture data in the cloud and download only the attributes that you request.
- Places
- Buildings
- Roads
- Mountains
- Counties
- As GeoParquet
Download all pizza restaurants in New York City as a GeoJSON
file.
LOAD spatial; -- noqa
SET s3_region='us-west-2';
COPY( -- COPY <query> TO <output> saves the results to disk.
SELECT
id,
names.primary as name,
confidence AS confidence,
CAST(socials AS JSON) as socials, -- Ensure each attribute can be serialized to JSON
geometry -- DuckDB understands this to be a geometry type
FROM read_parquet('s3://overturemaps-us-west-2/release/2024-11-13.0/theme=places/type=place/*', filename=true, hive_partitioning=1)
WHERE categories.primary = 'pizza_restaurant'
AND bbox.xmin BETWEEN -75 AND -73 -- Only use the bbox min values
AND bbox.ymin BETWEEN 40 AND 41 -- because they are point geometries.
) TO 'nyc_pizza.geojson' WITH (FORMAT GDAL, DRIVER 'GeoJSON');
See the places schema to learn more about each attribute or the common schema concepts to better understand names.primary
from names.common
.
Access Overture buildings data hosted on Azure to download named buildings around Detroit, Michigan. Saves as newline-delimited GeoJSON.
LOAD spatial; --noqa
SET azure_storage_connection_string = 'DefaultEndpointsProtocol=https;AccountName=overturemapswestus2;AccountKey=;EndpointSuffix=core.windows.net';
COPY(
SELECT
id,
names.primary as primary_name,
height,
geometry -- DuckDB v.1.1.0 will autoload this as a `geometry` type
FROM read_parquet('azure://release/2024-11-13.0/theme=buildings/type=building/*', filename=true, hive_partitioning=1)
WHERE names.primary IS NOT NULL
AND bbox.xmin BETWEEN -84.36 AND -82.42
AND bbox.ymin BETWEEN 41.71 AND 43.33
LIMIT 100
) TO 'detroit_buildings.geojsonseq' WITH (FORMAT GDAL, DRIVER 'GeoJSONSeq');
Download roads in Paris around the Arc de Triomphe and save as GeoJSON.
LOAD spatial; -- noqa
SET s3_region='us-west-2';
COPY(
SELECT
id,
names.primary as name,
class,
geometry -- DuckDB v.1.1.0 will autoload this as a `geometry` type
FROM read_parquet('s3://overturemaps-us-west-2/release/2024-11-13.0/theme=transportation/type=segment/*', filename=true, hive_partitioning=1)
WHERE bbox.xmin > 2.276
AND bbox.ymin > 48.865
AND bbox.xmax < 2.314
AND bbox.ymax < 48.882
) TO 'paris_roads.geojson' WITH (FORMAT GDAL, DRIVER 'GeoJSON');
Download mountain peaks in Oregon, USA from the base theme, convert the elevation attribute to feet, and save the result as a GeoJSON file.
LOAD spatial; -- noqa
SET s3_region='us-west-2';
COPY(
SELECT
id,
names.primary as name,
CAST(elevation * 3.28084 AS INT) AS elevation_ft,
geometry -- DuckDB v.1.1.0 will autoload this as a `geometry` type
FROM read_parquet('s3://overturemaps-us-west-2/release/2024-11-13.0/theme=base/type=land/*', filename=true, hive_partitioning=1)
WHERE subtype = 'physical' AND class IN ('peak','volcano') AND elevation IS NOT NULL
AND bbox.xmin BETWEEN -124.71 AND -116.47
AND bbox.ymin BETWEEN 41.99 AND 46.30
) TO 'oregon_peaks.geojson'
WITH (FORMAT GDAL, DRIVER 'GeoJSON');
County-level geometries for Pennsylvania from the divisions theme and outputs a GeoPackage.
LOAD spatial; -- noqa
SET s3_region='us-west-2';
COPY(
SELECT
id,
division_id,
names.primary,
geometry -- DuckDB v.1.1.0 will autoload this as a `geometry` type
FROM
read_parquet('s3://overturemaps-us-west-2/release/2024-11-13.0/theme=divisions/type=division_area/*', hive_partitioning=1)
WHERE
subtype = 'county'
AND country = 'US'
AND region = 'US-PA'
) TO 'pennsylvania_counties.gpkg' WITH (FORMAT GDAL, DRIVER 'GPKG');
DuckDB v1.1.0 supports reading and writing GeoParquet directly. It recognizes the geometry
type and will write the appropriate metadata.
COPY(
SELECT
*
FROM read_parquet('s3://overturemaps-us-west-2/release/2024-08-20.0/theme=places/type=place/*', filename=true, hive_partitioning=1)
LIMIT 100000
) TO 'places.parquet';
DuckDB recognizes the input as geoparquet and will automatically cast the geometry column to a GEOMETRY
type.
The COPY
command writes a parquet file of 100,000 places named places.parquet
with the appropriate geoparquet metadata.
To download the data as shapefile
, replace the last line of any of the above queries:
) TO 'overture_data.shp' WITH (FORMAT GDAL, DRIVER 'ESRI Shapefile');
or as a geopackage
:
) TO 'overture_data.gpkg' WITH (FORMAT GDAL, DRIVER 'GPKG');
The DuckDB Spatial extension supports more than 50 geospatial file formats.
Next steps
- Visualize the data in QGIS or kepler.gl.
- Make an interactive map.
- Consult one of our Data Guides to learn more about the data and for more advanced queries.