Overview

A unified schema

Overture is developing one schema to structure all of our datasets. We follow the JSON schema standard in our schema design and we use OGC geometries to map the features in our datasets. The schema itself is written in YAML for readability and ease of use.

GeoParquet

Although JSON serves as our mental models for defining the Overture schema, we distribute our datasets in GeoParquet, a column-oriented format optimized for handling large-scale geospatial datasets.

There are key differences in how geometries and other feature properties are represented in our schema design and how they are represented in the GeoParquet files we deliver to our users. In Overture's schema design, we encode geometry objects as human-readable Point, LineString, Polygon, and MultiPolygon types. A feature consists of a single geometry object accompanied by a set of properties represented as key-value pairs.

This same feature can be represented as a single row in a GeoParquet file, with the geometry in one column — encoded as Well-Known Binary (WKB) or native arrow-encoded coordinate columns format — and other feature properties filling out additional columns in the file.

Top-level properties

In the Overture schema, all features have a unique id called a GERS ID, a geometry object that follows the OGC geometry specification, and the following top-level properties:

Loading ....

The data types for each property in the Overture schema design do not translate exactly to the permitted data types in Parquet and GeoParquet. We release our datasets with the top-level properties encoded in this way:

GeoParquet columns for top-level Overture properties

column_name	column_type	description
id	string	an Overture feature's unique id, part of the Global Entity Reference System (GERS)
geometry	binary	well-known binary (WKB) representation of the feature geometry
bbox	struct<xmin: float, xmax: float, ymin: float, ymax: float>	area defined by two longitudes and two latitudes: latitude is a decimal number between -90.0 and 90.0; longitude is a decimal number between -180.0 and 180.0.
theme	string	one of six Overture data themes
type	string	one of 14 Overture feature types
version	int32	version number of the feature, incremented in each Overture release where the geometry or attributes of this feature changed
sources	list<element: struct<property: string, dataset: string, record_id: string, update_time: string, confidence: double, between: list<double>>>	array of source information for the properties of a given feature

Other key schema properties

Most but not all of the feature types in the Overture schema require data for the names, subtype, and class properties. The names property is complex enough to have its own schema, which we describe in detail here.

Properties may be specific to a feature type

Some properties in the Overture schema are only populated with data for specific feature types. For example, the place feature type must include data for the categories property, as required by the schema. The division_area and address feature types require the country property to be populated with ISO 3166-1 alpha-2 country codes. The segment feature type in the transportation theme is the only feature type that includes data for a complex set of properties that describe roads. The schema concepts section of this documentation describes these complexities in detail.

Schema conventions

In addition to following the JSON and GeoJSON specifications, the Overture schema has its own style and conventions. The notations, nomenclatures, specifications, and standards we have adopted are described below.

Notations

Snake case

We use snake case instead of camel case for all property names, string enumeration members, and string-valued enumeration equivalents. We do this because of case sensitivity and transformation issues in different databases and query engines. For example, Athena/Trino downcases everything, so text string splits in camel case property names get lost; in contrast, snake case passes through without issues.

Booleans

Boolean properties have a prefix verb "is" or "has" in a way that grammatically makes sense, e.g. has_street_lights=true and is_accessible=false.

Measurements

Measurements of real-world objects and features follow The International System of Units (SI): heights, widths, lengths, etc. In the Overture schema, these values are provided as scalar numeric value without units such as feet or meters. Overture does this to maximize consistency and predictability.

Quantities specified in regulatory rules, norms and customs follow local specifications wherever possible. In the schema, these values are provided as two-element arrays where the first element is the scalar numeric value and the second value is the units. Overture uses local units of measurement: feet in the United States and meters in the EU, for example. The exact unit is confirmed in the specification of the property but is not repeated in the data.

Regulations and restrictions

All quantities that relate to posted or ordinance regulations and restrictions are expressed in the same units as used in the regulation. The unit is explicitly included with the property in the data.

Opening hours and validity periods

Opening hours and the time frame during which time dependent properties are applicable are indicated following the OSM Opening Hours specification.

A unified schema​

GeoParquet​

Top-level properties​

Other key schema properties​

Properties may be specific to a feature type​

Schema conventions​

Notations​

Snake case​

Booleans​

Measurements​

Regulations and restrictions​

Opening hours and validity periods​