Overview
A unified schema
Overture is developing one schema to structure all of our datasets. We follow the JSON Schema standard in our schema design, and we use OGC geometries to map the features in our datasets. The schema itself is written in YAML for readability and ease of use.
GeoParquet
Although JSON serves as our mental model for defining the Overture schema, we distribute our datasets in GeoParquet, a column-oriented format optimized for handling large-scale geospatial datasets.
There are key differences in how geometries and other feature properties are represented in our schema design and how they are represented in the GeoParquet files we deliver to our users. In Overture's schema design, we encode geometry objects as human-readable Point, LineString, Polygon, and MultiPolygon types. A feature consists of a single geometry object accompanied by a set of properties represented as key-value pairs.
This same feature can be represented as a single row in a GeoParquet file. The geometry occupies one column, encoded either as Well-Known Binary (WKB) or as native Arrow-encoded coordinate columns, and the other feature properties fill out additional columns in the file.
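As a rough illustration of the mapping described above, the sketch below flattens a GeoJSON-like Point feature into a single row whose geometry is WKB. The feature values and the helper names `point_to_wkb` and `feature_to_row` are hypothetical, and only 2D points are handled; this is not how Overture produces its files.

```python
import struct

# A feature in GeoJSON-like form (all values here are illustrative).
feature = {
    "id": "example-id",
    "geometry": {"type": "Point", "coordinates": [-122.33, 47.61]},
    "properties": {"theme": "places", "type": "place", "version": 1},
}

def point_to_wkb(x: float, y: float) -> bytes:
    """Encode a 2D point as little-endian WKB."""
    # Layout: 1 byte for byte order (1 = little endian),
    # a uint32 geometry type (1 = Point), then two float64 coordinates.
    return struct.pack("<BIdd", 1, 1, x, y)

def feature_to_row(feat: dict) -> dict:
    """Flatten a Point feature: id, WKB geometry, then one column per property."""
    x, y = feat["geometry"]["coordinates"]
    row = {"id": feat["id"], "geometry": point_to_wkb(x, y)}
    row.update(feat["properties"])
    return row

row = feature_to_row(feature)
```

The resulting `row["geometry"]` is a 21-byte value, while `theme`, `type`, and `version` each become an ordinary column.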
Top-level properties
In the Overture schema, all features have a unique id called a GERS ID, a geometry object that follows the OGC geometry specification, and the following top-level properties:
The data types for each property in the Overture schema design do not translate exactly to the permitted data types in Parquet and GeoParquet. We release our datasets with the top-level properties encoded in this way:
GeoParquet columns for top-level Overture properties
column_name | column_type | description |
---|---|---|
id | string | an Overture feature's unique id, part of the Global Entity Reference System (GERS) |
geometry | binary | well-known binary (WKB) representation of the feature geometry |
bbox | struct<xmin: float, xmax: float, ymin: float, ymax: float> | area defined by two longitudes and two latitudes: latitude is a decimal number between -90.0 and 90.0; longitude is a decimal number between -180.0 and 180.0. |
theme | string | one of six Overture data themes |
type | string | one of 14 Overture feature types |
version | int32 | version number of the feature, incremented in each Overture release where the geometry or attributes of this feature changed |
sources | list<element: struct<property: string, dataset: string, record_id: string, update_time: string, confidence: double>> | array of source information for the properties of a given feature |
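To make the bbox column concrete, here is a minimal sketch that derives the four bbox fields from a feature's coordinates. The helper name `bbox_of` and the sample coordinates are hypothetical, and the sketch ignores geometries that cross the antimeridian.

```python
def bbox_of(coords):
    """Return a bbox struct for a sequence of (longitude, latitude) pairs."""
    xs = [x for x, _ in coords]  # longitudes
    ys = [y for _, y in coords]  # latitudes
    return {"xmin": min(xs), "xmax": max(xs), "ymin": min(ys), "ymax": max(ys)}

# Vertices of an illustrative LineString.
line = [(-122.35, 47.60), (-122.33, 47.62), (-122.31, 47.61)]
bbox = bbox_of(line)
```

If `float` in the table denotes single precision, the stored values may differ slightly from the double-precision numbers computed here.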
Other key schema properties
Most but not all of the feature types in the Overture schema require data for the names, subtype, and class properties. The names property is complex enough to have its own schema, which we describe in detail here.
Properties may be specific to a feature type
Some properties in the Overture schema are only populated with data for specific feature types. For example, the place feature type must include data for the categories property, as required by the schema. The division_area and address feature types require the country property to be populated with ISO 3166-1 alpha-2 country codes. The segment feature type in the transportation theme is the only feature type that includes data for a complex set of properties that describe roads. The schema concepts section of this documentation describes these complexities in detail.
Schema conventions
In addition to following the JSON and GeoJSON specifications, the Overture schema has its own style and conventions. The notations, nomenclatures, specifications, and standards we have adopted are described below.
Notations
Snake case
We use snake case instead of camel case for all property names, string enumeration members, and string-valued enumeration equivalents. We do this because of case sensitivity and transformation issues in different databases and query engines. For example, Athena/Trino lowercases everything, so the word boundaries in camel case property names are lost; in contrast, snake case passes through without issues.
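The effect of lowercasing can be seen in a small sketch. The helper `to_snake_case` and the sample property name are illustrative only and are not part of any Overture tooling.

```python
import re

def to_snake_case(name: str) -> str:
    """Convert a camelCase identifier to snake_case."""
    # Insert an underscore before every uppercase letter that is not
    # at the start of the string, then lowercase the result.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()

camel = "hasStreetLights"
snake = to_snake_case(camel)

# Simulate an engine that lowercases identifiers.
lowered_camel = camel.lower()  # word boundaries are gone
lowered_snake = snake.lower()  # identical to the original snake case name
```

After lowercasing, the camel case name collapses into a single run of letters, while the snake case name is unchanged.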
Booleans
Boolean properties have a verb prefix, "is" or "has", chosen so that the name reads grammatically, e.g. has_street_lights=true and is_accessible=false.
Measurements
Measurements of real-world objects and features, such as heights, widths, and lengths, follow the International System of Units (SI). In the Overture schema, these values are provided as scalar numeric values without units such as feet or meters. Overture does this to maximize consistency and predictability.
Quantities specified in regulatory rules, norms, and customs follow local specifications wherever possible. In the schema, these values are provided as two-element arrays where the first element is the scalar numeric value and the second element is the unit. Overture uses local units of measurement: feet in the United States and meters in the EU, for example. The exact unit is confirmed in the specification of the property but is not repeated in the data.
Regulations and restrictions
All quantities that relate to posted or ordinance regulations and restrictions are expressed in the same units as used in the regulation. The unit is explicitly included with the property in the data.
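A consumer comparing regulatory values across regions has to normalize them first. The sketch below assumes the two-element [value, unit] form described under Measurements; the unit strings "mph" and "km/h", the `TO_KMH` table, and the helper `speed_in_kmh` are assumptions made for illustration, not names confirmed by the schema.

```python
# Hypothetical conversion factors to km/h; the unit strings are assumptions.
TO_KMH = {"km/h": 1.0, "mph": 1.609344}

def speed_in_kmh(limit):
    """Convert a [value, unit] pair to km/h so limits can be compared."""
    value, unit = limit
    return value * TO_KMH[unit]

# Each limit keeps the unit used by the local regulation.
us_limit = [55, "mph"]
eu_limit = [90, "km/h"]

us_kmh = speed_in_kmh(us_limit)
eu_kmh = speed_in_kmh(eu_limit)
```

Keeping the posted value and its unit together means the data matches what appears on the sign or in the ordinance, and conversion is left to the consumer.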
Opening hours and validity periods
Opening hours, and the time frames during which time-dependent properties apply, are expressed following the OSM Opening Hours specification.