Overview
A unified schema
Overture is developing one schema to structure all of our datasets. Our schema design follows the JSON Schema standard, and we use GeoJSON as a model for encoding feature geometries in our datasets. The schema itself is written in YAML for readability and ease of use.
GeoJSON and GeoParquet
Although JSON and GeoJSON serve as our mental models for defining the Overture schema, we distribute our datasets in GeoParquet, a column-oriented format optimized for handling large-scale geospatial datasets.
There are key differences in how geometries and other feature properties are represented in GeoJSON and GeoParquet. In Overture's schema design, we follow the GeoJSON specification and encode geometry objects as human-readable Point, LineString, Polygon, and MultiPolygon types. A feature in GeoJSON consists of a single geometry object accompanied by a set of properties represented as key-value pairs.
This same feature can be represented as a single row in a GeoParquet file. The geometry occupies one column, encoded either as Well-Known Binary (WKB) or in a native Arrow-encoded coordinate format, and the other feature properties fill additional columns in the file.
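To make the difference concrete, here is a minimal sketch that flattens a GeoJSON Point feature into a row-like dictionary with a WKB geometry, using only the Python standard library. The feature values and the helper names point_to_wkb and feature_to_row are hypothetical, and the sketch handles only 2D points; it is not how Overture produces its GeoParquet files.

```python
import json
import struct

# Hypothetical feature: the id and property values are made up for illustration.
feature = {
    "type": "Feature",
    "id": "example-id",
    "geometry": {"type": "Point", "coordinates": [-122.3321, 47.6062]},
    "properties": {"theme": "places", "type": "place", "version": 1},
}


def point_to_wkb(coordinates):
    """Encode a 2D point as little-endian WKB: byte order, geometry type, x, y."""
    x, y = coordinates
    # "<" = little-endian, B = byte-order flag (1), I = geometry type (1 = Point),
    # dd = the x and y coordinates as 64-bit floats.
    return struct.pack("<BIdd", 1, 1, x, y)


def feature_to_row(feature):
    """Flatten a GeoJSON Point feature into a dict with one key per column."""
    row = {
        "id": feature["id"],
        "geometry": point_to_wkb(feature["geometry"]["coordinates"]),
    }
    row.update(feature["properties"])
    return row


row = feature_to_row(feature)
print(json.dumps(feature["geometry"]))  # human-readable GeoJSON geometry
print(row["geometry"].hex())            # the same point as WKB bytes
```

The GeoJSON form stays readable as text, while the row form keeps the geometry as an opaque binary value next to ordinary columns.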
Top-level properties
In the Overture schema, all features have a unique id called a GERS ID, a geometry object that follows the GeoJSON schema specification, and the following top-level properties:
The data types for each property in the Overture schema design do not map exactly to the permitted data types in Parquet and GeoParquet. We release our datasets with the top-level properties encoded in this way:
GeoParquet columns for top-level Overture properties
column_name | column_type | description |
---|---|---|
id | string | an Overture feature's unique id, part of the Global Entity Reference System (GERS) |
geometry | binary | well-known binary (WKB) representation of the feature geometry |
bbox | struct<xmin: float, xmax: float, ymin: float, ymax: float> | area defined by two longitudes and two latitudes: latitude is a decimal number between -90.0 and 90.0; longitude is a decimal number between -180.0 and 180.0. |
theme | string | one of six Overture data themes |
type | string | one of 14 Overture feature types |
version | int32 | version number of the feature, incremented in each Overture release where the geometry or attributes of this feature changed |
sources | list<element: struct<property: string, dataset: string, record_id: string, update_time: string, confidence: double>> | array of source information for the properties of a given feature |
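As an illustration of the table above, the following sketch checks a row, represented as a plain Python dictionary, against the listed column types. The function name check_top_level and the sample values are hypothetical. The sketch also assumes that the x fields of bbox hold longitudes and the y fields hold latitudes, and it does not check that the theme or type values are valid Overture names.

```python
BBOX_KEYS = {"xmin", "xmax", "ymin", "ymax"}
SOURCE_KEYS = {"property", "dataset", "record_id", "update_time", "confidence"}


def check_top_level(row):
    """Return a list of problems found in a row's top-level columns."""
    problems = []
    for name in ("id", "theme", "type"):
        if not isinstance(row.get(name), str):
            problems.append(f"{name} must be a string")
    if not isinstance(row.get("geometry"), bytes):
        problems.append("geometry must be WKB bytes")

    bbox = row.get("bbox")
    if not isinstance(bbox, dict) or set(bbox) != BBOX_KEYS:
        problems.append("bbox must have xmin, xmax, ymin, ymax")
    else:
        # Assumption: x values are longitudes and y values are latitudes.
        if not all(-180.0 <= bbox[k] <= 180.0 for k in ("xmin", "xmax")):
            problems.append("bbox longitude outside [-180, 180]")
        if not all(-90.0 <= bbox[k] <= 90.0 for k in ("ymin", "ymax")):
            problems.append("bbox latitude outside [-90, 90]")

    version = row.get("version")
    if isinstance(version, bool) or not isinstance(version, int):
        problems.append("version must be an integer")
    elif not -2**31 <= version < 2**31:
        problems.append("version does not fit in int32")

    for source in row.get("sources") or []:
        unknown = set(source) - SOURCE_KEYS
        if unknown:
            problems.append(f"unknown source fields: {sorted(unknown)}")
    return problems


# Hypothetical row; none of these values come from a real Overture release.
row = {
    "id": "example-id",
    "geometry": b"\x01\x01\x00\x00\x00" + bytes(16),
    "bbox": {"xmin": -122.34, "xmax": -122.33, "ymin": 47.60, "ymax": 47.61},
    "theme": "places",
    "type": "place",
    "version": 1,
    "sources": [{"property": "", "dataset": "example-dataset", "record_id": "r1"}],
}
print(check_top_level(row))
```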
Other key schema properties
Most but not all of the feature types in the Overture schema require data for the names, subtype, and class properties. The names property is complex enough to have its own schema, which we describe in detail here.
Properties that may be specific to a feature type
Some properties in the Overture schema are only populated with data for specific feature types. For example, the place feature type must include data for the categories property, as required by the schema. The division_area and address feature types require the country property to be populated with ISO 3166-1 alpha-2 country codes. The segment feature type in the transportation theme is the only feature type that includes data for a complex set of properties that describe roads. The schema concepts section of this documentation describes these schema complexities in detail.
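The type-specific requirements named in the paragraph above can be sketched as a small lookup. The names REQUIRED_BY_TYPE, missing_properties, and looks_like_alpha2 are hypothetical, and the mapping covers only the three feature types mentioned here, not the full schema. The country check tests only the shape of the code (two uppercase letters), not whether the code is actually assigned in ISO 3166-1.

```python
import re

# Requirements taken from the paragraph above; this is not the full schema.
REQUIRED_BY_TYPE = {
    "place": ["categories"],
    "division_area": ["country"],
    "address": ["country"],
}

ALPHA2 = re.compile(r"[A-Z]{2}")


def missing_properties(feature_type, properties):
    """List the required properties that are absent or empty for this type."""
    required = REQUIRED_BY_TYPE.get(feature_type, [])
    return [name for name in required if properties.get(name) in (None, "", [], {})]


def looks_like_alpha2(code):
    """Check the shape of a country code only: exactly two uppercase letters."""
    return isinstance(code, str) and ALPHA2.fullmatch(code) is not None


print(missing_properties("place", {"names": {"primary": "Example Cafe"}}))
print(missing_properties("address", {"country": "US"}))
print(looks_like_alpha2("US"), looks_like_alpha2("usa"))
```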
Schema conventions
In addition to following the JSON and GeoJSON specifications, the Overture schema has its own style and conventions. The notations, nomenclatures, specifications, and standards we have adopted are described below.
Notations
- snake case is used for all property names, string enumeration members, and string-valued enumeration equivalents
- boolean properties take the verb prefix "is" or "has", whichever reads grammatically
e.g.
has_street_lights=true
is_accessible=false
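These two notation rules are simple enough to check mechanically. The sketch below is one possible check, not part of any Overture tooling; the function names and the exact regular expression are assumptions about what counts as snake case (lowercase letters and digits, separated by single underscores).

```python
import re

# Assumed definition of snake case: lowercase words joined by single underscores.
SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*")


def is_snake_case(name):
    """True if the whole name is lowercase words joined by single underscores."""
    return SNAKE_CASE.fullmatch(name) is not None


def has_boolean_prefix(name):
    """True if the name starts with the 'is_' or 'has_' verb prefix."""
    return name.startswith(("is_", "has_"))


for name in ("has_street_lights", "is_accessible", "hasStreetLights", "island"):
    print(name, is_snake_case(name), has_boolean_prefix(name))
```

Note that the prefix check includes the underscore, so a name such as island is not mistaken for a boolean property.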
Measurements
Measurements of real-world objects and features, such as heights, widths, and lengths, follow the International System of Units (SI). In the Overture schema, these values are provided as scalar numeric values without a unit label such as feet or meters. Overture does this to maximize consistency and predictability.
Quantities specified in regulatory rules, norms, and customs follow local specifications wherever possible. In the schema, these values are provided as two-element arrays where the first element is the scalar numeric value and the second element is the unit. Overture uses local units of measurement: feet in the United States and meters in the EU, for example. The exact unit is confirmed in the specification of the property but is not repeated in the data.
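A consumer of the data may need to handle both encodings: bare scalars for SI measurements and two-element arrays for regulatory quantities. The following sketch shows one way to normalize length values. The unit strings ("m", "ft", "in"), the conversion table, and the function names are assumptions made for illustration; the units a given property actually allows are defined in that property's specification.

```python
# Conversion factors to meters. The unit strings are assumptions for this sketch.
METERS_PER_UNIT = {"m": 1.0, "ft": 0.3048, "in": 0.0254}


def describe_quantity(value):
    """Return (number, unit) for either encoding; unit is None for bare scalars."""
    if isinstance(value, (list, tuple)):
        number, unit = value
        return float(number), unit
    return float(value), None


def to_meters(value):
    """Normalize a length to meters, whichever encoding it arrived in."""
    number, unit = describe_quantity(value)
    if unit is None:
        return number  # bare scalar: assumed here to be an SI length in meters
    return number * METERS_PER_UNIT[unit]


print(to_meters(12.5))        # measurement given as a bare scalar
print(to_meters([10, "ft"]))  # regulatory quantity given as [value, unit]
```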
Regulations and restrictions
All quantities that relate to posted or ordinance regulations and restrictions are expressed in the same units as used in the regulation. The unit is explicitly included with the property in the data.
Opening hours and validity periods
Opening hours, and the time frames during which time-dependent properties apply, are expressed using the OSM Opening Hours specification.