This post presents some of the most relevant statistical information about the content of the OpenStreetMap database related to Germany.
These data can be further explored on the OpenAlfa Deutschland Straße website.
How to get the OSM database for Germany
OpenStreetMap is a single database that covers the whole world. It can be downloaded as a single “planet.osm” file from the main OSM site, or from one of its mirror sites.
But even the compressed planet.osm file is more than 30 GBytes in size, and processing it to extract the data for a single country may take several days on a typical server.
Instead, it is easier to download a pre-processed extract for the country of interest, available on some OSM mirrors. The list of sites with downloadable country and area extracts can be found here.
To get the OSM data for Germany, we will download the daily Germany extract from geofabrik.de. It can be downloaded with the command:
    $ wget http://download.geofabrik.de/europe/germany-latest.osm.bz2
The downloaded file is an XML document compressed in bzip2 format, 3.2 GBytes in size. It contains three types of XML elements: nodes, ways and relations, each described below.
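Because the file is far too large to load into memory at once, it is usually read as a stream. The following is a minimal sketch, not the method used to produce the figures in this post, that counts the three element types with Python's standard library. The function name `count_elements` and the tiny sample document are illustrative assumptions.

```python
import bz2
import io
import xml.etree.ElementTree as ET
from collections import Counter

def count_elements(stream):
    """Count <node>, <way> and <relation> elements in an OSM XML stream."""
    counts = Counter()
    context = ET.iterparse(stream, events=("start", "end"))
    _, root = next(context)  # first event is the start of the root <osm> element
    for event, elem in context:
        if event == "end" and elem.tag in ("node", "way", "relation"):
            counts[elem.tag] += 1
            root.clear()  # drop finished elements so memory use stays bounded
    return counts

# Hand-made sample standing in for the real extract (not data from the file).
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<osm version="0.6">
  <node id="1" lat="51.0" lon="9.0"/>
  <node id="2" lat="51.1" lon="9.1"><tag k="name" v="Example"/></node>
  <way id="10"><nd ref="1"/><nd ref="2"/></way>
  <relation id="100"><member type="way" ref="10" role=""/></relation>
</osm>"""

# bz2.open also accepts a file object, so the sample is compressed in memory.
with bz2.open(io.BytesIO(bz2.compress(SAMPLE)), "rb") as f:
    sample_counts = count_elements(f)
print(sample_counts)

# For the real extract, assuming it sits in the current directory:
# with bz2.open("germany-latest.osm.bz2", "rb") as f:
#     print(count_elements(f))
```

Reading the bzip2 file directly avoids storing the uncompressed XML on disk, at the cost of decompressing it again on every pass.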
Note: geofabrik.de also offers incremental files for download. Another post in this blog will explain how to maintain an up-to-date copy of the database by downloading and importing these incremental files.
The incremental files available from Geofabrik also make it possible to estimate the maintenance activity on the content of the database.
A graphical representation of the size of these files shows that the maintenance effort for Germany is significant and fairly steady over time:
[Chart: size of the Geofabrik incremental files for Germany over time]
Nodes
A node is a point on the map. The mandatory data for a node is a pair of coordinates (latitude, longitude). Every node is also assigned a unique numerical identifier.
There is also some administrative information associated with the node, namely a timestamp, version number, etc.
Optionally, a node can have a set of tags with additional information.
Example:
    <node id="12" lat="51.3400541" lon="9.4818121" version="5" timestamp="2014-02-20T15:19:56Z" changeset="20677113" uid="250214" user="emilde"/>
    <node id="13" lat="51.3731042" lon="9.5130058" version="2" timestamp="2011-05-08T22:06:06Z" changeset="8087754" uid="134914" user="max60watt">
      <tag k="highway" v="bus_stop"/>
      <tag k="name" v="Bleichplatz"/>
      <tag k="shelter" v="yes"/>
      <tag k="uic_ref" v="713807"/>
    </node>
The example above shows the first two nodes found in the downloaded file. The first node only has its ID and coordinates, together with the administrative information, all stored as attributes of the XML element.
The second node also has a set of associated tags that identify it as a bus stop named “Bleichplatz”.
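As an illustration of this structure, the second node can be turned into a plain Python dictionary. This is only a sketch: the `parse_node` helper and the dictionary layout are assumptions made for the example, not part of any OSM tooling.

```python
import xml.etree.ElementTree as ET

# The tagged node from the example above.
NODE_XML = """<node id="13" lat="51.3731042" lon="9.5130058" version="2" timestamp="2011-05-08T22:06:06Z" changeset="8087754" uid="134914" user="max60watt">
  <tag k="highway" v="bus_stop"/>
  <tag k="name" v="Bleichplatz"/>
  <tag k="shelter" v="yes"/>
  <tag k="uic_ref" v="713807"/>
</node>"""

def parse_node(elem):
    """Return the ID, coordinates and tags of a <node> element (hypothetical helper)."""
    return {
        "id": int(elem.get("id")),
        "lat": float(elem.get("lat")),
        "lon": float(elem.get("lon")),
        # Each <tag k="..." v="..."/> child becomes one dictionary entry.
        "tags": {tag.get("k"): tag.get("v") for tag in elem.findall("tag")},
    }

node = parse_node(ET.fromstring(NODE_XML))
print(node["id"], node["lat"], node["lon"], node["tags"])
```

A node without tags, like the first one in the example, simply yields an empty `tags` dictionary.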
Node tags
In the file downloaded, 6,625,830 nodes have one or more attached tags.
A tag is just a pair (k,v), where k is the name (“key”) of the tag, and v is the value.
There is a set of normalized tag names (such as “name” or “highway”, as can be seen in the example above). But OSM does not enforce the use of those names. As a result, in the database there are a number of non-normalized tag names, tag names with typos, etc.
It is interesting to analyze statistically the total number of appearances of each tag name. The most frequent are:
    k                       Total
    ---------------------   --------
    addr:housenumber        1641200
    addr:street             1598324
    addr:city               1433385
    addr:postcode           1426662
    addr:country            1271469
    name                    1115495
    created_by               899386
    amenity                  830308
    highway                  612798
    source                   577144
Other tag names of interest, although the number of appearances is lower, are:
    k                    Total
    ------------------   --------
    natural               421993
    shop                  184098
    tourism               168380
    bicycle               124404
    railway               117487
    public_transport      115078
    place                 112842
    building               86328
    information            85531
    bus                    75750
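Counts like the ones above can be obtained by streaming through the file and tallying the `k` attribute of every tag. The sketch below shows one possible way to do it in Python. The function `count_tag_keys` and the sample data are assumptions for illustration, and this is not necessarily how the tables in this post were produced.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

def count_tag_keys(stream, element="node"):
    """Count how often each tag key appears on elements of the given type."""
    counts = Counter()
    context = ET.iterparse(stream, events=("start", "end"))
    _, root = next(context)  # root <osm> element
    for event, elem in context:
        if event == "end" and elem.tag in ("node", "way", "relation"):
            if elem.tag == element:
                counts.update(tag.get("k") for tag in elem.findall("tag"))
            root.clear()  # free elements that have been processed
    return counts

# Hand-made sample, not taken from the Germany extract.
SAMPLE = b"""<osm version="0.6">
  <node id="1" lat="51.0" lon="9.0"/>
  <node id="2" lat="51.1" lon="9.1">
    <tag k="highway" v="bus_stop"/>
    <tag k="name" v="Bleichplatz"/>
  </node>
  <node id="3" lat="51.2" lon="9.2">
    <tag k="name" v="Example"/>
  </node>
  <way id="10"><nd ref="1"/><nd ref="2"/><tag k="name" v="Not a node"/></way>
</osm>"""

key_counts = count_tag_keys(io.BytesIO(SAMPLE))
for key, total in key_counts.most_common(10):
    print(f"{key:<20} {total:>8}")
```

Passing `element="way"` or `element="relation"` gives the same tally for the other two element types.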
Nodes of type ‘natural’
The most frequent values assigned to the ‘natural’ tag of OSM nodes in Germany, with their number of appearances, are:
    v                 Total
    --------------   ------
    tree             394867
    peak              15756
    spring             6553
    cave_entrance      1171
    stone               488
    wood                431
    water               393
    bush                381
    cliff               375
    beach               242
    rock                220
    ...
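A breakdown by value for a single key follows the same streaming pattern, but only tags whose `k` matches the key of interest are tallied. This is a minimal sketch under the same assumptions as before. The name `count_tag_values` and the sample data are hypothetical.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

def count_tag_values(stream, key, element="node"):
    """Count the values assigned to one tag key on elements of the given type."""
    counts = Counter()
    context = ET.iterparse(stream, events=("start", "end"))
    _, root = next(context)  # root <osm> element
    for event, elem in context:
        if event == "end" and elem.tag in ("node", "way", "relation"):
            if elem.tag == element:
                for tag in elem.findall("tag"):
                    if tag.get("k") == key:
                        counts[tag.get("v")] += 1
            root.clear()  # free elements that have been processed
    return counts

# Hand-made sample, not taken from the Germany extract.
SAMPLE = b"""<osm version="0.6">
  <node id="1" lat="51.0" lon="9.0"><tag k="natural" v="tree"/></node>
  <node id="2" lat="51.1" lon="9.1"><tag k="natural" v="tree"/></node>
  <node id="3" lat="51.2" lon="9.2"><tag k="natural" v="peak"/><tag k="name" v="Example"/></node>
  <way id="10"><nd ref="1"/><nd ref="2"/><tag k="natural" v="water"/></way>
</osm>"""

natural_on_nodes = count_tag_values(io.BytesIO(SAMPLE), "natural")
natural_on_ways = count_tag_values(io.BytesIO(SAMPLE), "natural", element="way")
print(natural_on_nodes.most_common())
print(natural_on_ways.most_common())
```

The same function covers the way-level breakdowns later in this post, for example with `key="highway"` or `key="waterway"` and `element="way"`.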
Ways
There are 11,620,714 ways in the analyzed database.
A way is an ordered sequence of nodes, identified by a unique numerical ID.
Ways are used to represent many different types of paths: roads, rivers, administrative boundaries, etc.
Optionally, there may be one or more tags assigned to a given way, to provide additional information.
Example:
    <way id="459168" version="33" timestamp="2014-06-04T09:30:35Z" uid="34927" user="flohoff" changeset="22730875">
      <nd ref="1890502615"/>
      <nd ref="2898837885"/>
      <nd ref="1954798704"/>
      <nd ref="1954798688"/>
      <nd ref="1954798671"/>
      <nd ref="1954798669"/>
      <nd ref="290946929"/>
      <nd ref="1722885361"/>
      <nd ref="2716796"/>
      <nd ref="2896829934"/>
      <nd ref="2896829831"/>
      <tag k="hazmat" v="designated"/>
      <tag k="highway" v="tertiary"/>
      <tag k="maxspeed" v="50"/>
      <tag k="name" v="Österwieher Straße"/>
      <tag k="ref" v="K 42"/>
      <tag k="surface" v="asphalt"/>
    </way>
In this sample, we can see the definition of a way of eleven nodes. The administrative information (version, changeset, etc.) is added to the way as attributes of the XML <way> element.
Several tags associated with the way in the sample above identify it as a street (k="highway", v="tertiary") named “Österwieher Straße”, with a 50 km/h speed limit.
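The way from the example can be read into an ordered list of node references plus a tag dictionary. The sketch below is illustrative only, and `parse_way` is a hypothetical helper. The order of the `<nd>` children is preserved because it defines the geometry of the way.

```python
import xml.etree.ElementTree as ET

# The way from the example above.
WAY_XML = """<way id="459168" version="33" timestamp="2014-06-04T09:30:35Z" uid="34927" user="flohoff" changeset="22730875">
  <nd ref="1890502615"/>
  <nd ref="2898837885"/>
  <nd ref="1954798704"/>
  <nd ref="1954798688"/>
  <nd ref="1954798671"/>
  <nd ref="1954798669"/>
  <nd ref="290946929"/>
  <nd ref="1722885361"/>
  <nd ref="2716796"/>
  <nd ref="2896829934"/>
  <nd ref="2896829831"/>
  <tag k="hazmat" v="designated"/>
  <tag k="highway" v="tertiary"/>
  <tag k="maxspeed" v="50"/>
  <tag k="name" v="Österwieher Straße"/>
  <tag k="ref" v="K 42"/>
  <tag k="surface" v="asphalt"/>
</way>"""

def parse_way(elem):
    """Return the ID, ordered node references and tags of a <way> element."""
    return {
        "id": int(elem.get("id")),
        # findall returns children in document order, so the sequence is kept.
        "nodes": [int(nd.get("ref")) for nd in elem.findall("nd")],
        "tags": {tag.get("k"): tag.get("v") for tag in elem.findall("tag")},
    }

way = parse_way(ET.fromstring(WAY_XML))
print(len(way["nodes"]), way["tags"].get("name"), way["tags"].get("maxspeed"))
```

Note that a way only stores node IDs. To obtain coordinates, each reference has to be looked up among the nodes of the same file.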
Way tags
The most frequent names of way tags found in the Germany database are:
    k                  Total
    --------------   -------
    source           1285196
    highway          1087120
    nhd:com_id        862429
    gnis:fcode        862316
    nhd:fdate         861714
    nhd:reach_code    855447
    gnis:ftype        839075
    name              828578
    ...
We can see that “source” is the most frequent tag name for ways. This tag is used to identify the original source of the data for the way.
The next most frequent way tag name is “highway”. This tag is used to identify not only highways, but also streets, tracks, and in general all kinds of paths that are usually traversed by people.
A fair number of ways also have an associated “name” tag.
Other tags of interest that appear assigned to ways are:
    k                Total
    -------------   ------
    natural         497296
    waterway        395406
    lanes           295916
    amenity         266246
    postal_code     248576
    bridge          241651
    railway         229142
    leisure         205158
    cycleway         89602
    boundary         66530
    admin_level      58304
    addr:postcode    31734
    tourism           8991
Values of the ‘highway’ tag assigned to ways
    v                 Total
    --------------   --------
    track            1999395
    residential      1531966
    service          1224840
    footway           892080
    path              623737
    unclassified      282343
    secondary         255685
    tertiary          234708
    primary           126334
    steps             122780
Values of the ‘natural’ tag assigned to ways
    v                Total
    -------------   -------
    water           186673
    scrub           185633
    wood             56066
    wetland          16564
    grassland        13556
    tree_row          9892
    heath             7505
    cliff             7187
    sand              6242
    beach             3096
Values of the ‘waterway’ tag assigned to ways
    v                 Total
    --------------   ------
    stream           205215
    ditch            102561
    drain             50116
    river             15808
    canal              9013
    riverbank          7498
    weir               2048
    dam                1304
Relations
There are 385,081 relations in the OSM database for Germany being analyzed.
A relation in the OSM database is a set of member elements that make up a single entity. Relations are identified by a unique numeric ID, in the same way as nodes and ways. A relation can also have a set of tags that provide additional information about it.
Members of a relation can be nodes, ways, and even other relations.
Example:
    <relation id="10995" version="212" timestamp="2014-06-04T10:24:40Z" uid="1642071" user="Dr_K_Nick" changeset="22731706">
      <member type="way" ref="5203789" role="forward"/>
      <member type="way" ref="23896582" role="backward"/>
      <member type="way" ref="9377373" role="forward"/>
      <member type="way" ref="8591200" role="forward"/>
      <member type="way" ref="23897154" role="forward"/>
      ...
      <member type="node" ref="60975897" role=""/>
      <member type="node" ref="31115526" role=""/>
      <member type="node" ref="25711369" role=""/>
      <tag k="colour" v="#0000FF"/>
      <tag k="name" v="200 (gegen den Uhrzeigersinn)"/>
      <tag k="network" v="GVH"/>
      <tag k="operator" v="üstra"/>
      <tag k="ref" v="200"/>
      <tag k="route" v="bus"/>
      <tag k="type" v="route"/>
    </relation>
In the example above, the relation is a bus route named “200 (gegen den Uhrzeigersinn)”, operated by “üstra”, that is part of the GVH network. The ordered sequence of ways defines the path followed by the route, and the nodes are the bus stops and other relevant points along the route.
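To make the member structure concrete, the sketch below reads a relation into a list of (type, ref, role) tuples and then splits the members by type. It uses only the members visible in the example, since the elided ones are unknown. `parse_relation` is a hypothetical helper written for this illustration.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the relation above: only the members shown in the
# example are included, the elided ones are left out.
RELATION_XML = """<relation id="10995" version="212" timestamp="2014-06-04T10:24:40Z" uid="1642071" user="Dr_K_Nick" changeset="22731706">
  <member type="way" ref="5203789" role="forward"/>
  <member type="way" ref="23896582" role="backward"/>
  <member type="way" ref="9377373" role="forward"/>
  <member type="way" ref="8591200" role="forward"/>
  <member type="way" ref="23897154" role="forward"/>
  <member type="node" ref="60975897" role=""/>
  <member type="node" ref="31115526" role=""/>
  <member type="node" ref="25711369" role=""/>
  <tag k="colour" v="#0000FF"/>
  <tag k="name" v="200 (gegen den Uhrzeigersinn)"/>
  <tag k="network" v="GVH"/>
  <tag k="operator" v="üstra"/>
  <tag k="ref" v="200"/>
  <tag k="route" v="bus"/>
  <tag k="type" v="route"/>
</relation>"""

def parse_relation(elem):
    """Return the ID, ordered members and tags of a <relation> element."""
    return {
        "id": int(elem.get("id")),
        # Each member keeps its element type, the referenced ID and its role.
        "members": [
            (m.get("type"), int(m.get("ref")), m.get("role"))
            for m in elem.findall("member")
        ],
        "tags": {tag.get("k"): tag.get("v") for tag in elem.findall("tag")},
    }

relation = parse_relation(ET.fromstring(RELATION_XML))
way_refs = [ref for kind, ref, _role in relation["members"] if kind == "way"]
node_refs = [ref for kind, ref, _role in relation["members"] if kind == "node"]
print(relation["tags"]["name"], len(way_refs), len(node_refs))
```

The role is kept alongside each reference because, for a route, it indicates the direction in which each way is traversed.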
Relation tags
Relations are usually given a tag named “type”. In the database being analyzed, the most frequent values assigned to the “type” tag are:
    type                 Total
    -----------------   --------
    multipolygon         125536
    restriction           66289
    route                 59531
    associatedStreet      43710
    boundary              27261
    public_transport      24422
    TMC                    8297
    destination_sign       4683
    turnlanes:turns        4436
    route_master           3462
    ...
Among them, relations of type “boundary” are used to group the set of ways that delimit a region (administrative or of some other kind) as one or more closed polygons that may contain inner “holes”.
Relations that carry a “type=boundary” tag are also given a tag named “boundary”, whose value details the type of boundary. The most frequent values assigned to “boundary” tags are:
    boundary                     Total
    -------------------------   --------
    administrative                19672
    postal_code                    8205
    political                       537
    protected_area                  214
    national_park                    49
    public_transport                 46
    civil                            33
    local_authority                  23
    religious_administration         21
    ...
We can see that most of the boundaries found in the database are administrative boundaries, which delimit the regions of Germany at several levels: Bundesländer, Regierungsbezirke, Landkreise, etc.
The second most frequent boundary type is the area covered by a postal code.
Other boundary types found on the OSM database for Germany are different kinds of geographical areas of interest: protected areas, national parks, etc.
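A breakdown of boundary types depends on two tags at once: only relations tagged “type=boundary” are considered, and for those the value of the “boundary” tag is tallied. The following sketch shows one way this could be done. The function `count_boundary_types` and the sample relations are assumptions for illustration and are not taken from the Germany extract.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

def count_boundary_types(stream):
    """Count 'boundary' tag values on relations tagged type=boundary."""
    counts = Counter()
    context = ET.iterparse(stream, events=("start", "end"))
    _, root = next(context)  # root <osm> element
    for event, elem in context:
        if event == "end" and elem.tag in ("node", "way", "relation"):
            if elem.tag == "relation":
                tags = {t.get("k"): t.get("v") for t in elem.findall("tag")}
                if tags.get("type") == "boundary" and "boundary" in tags:
                    counts[tags["boundary"]] += 1
            root.clear()  # free elements that have been processed
    return counts

# Hand-made sample relations (IDs and members are invented).
SAMPLE = b"""<osm version="0.6">
  <relation id="1"><member type="way" ref="10" role="outer"/>
    <tag k="type" v="boundary"/><tag k="boundary" v="administrative"/></relation>
  <relation id="2"><member type="way" ref="11" role="outer"/>
    <tag k="type" v="boundary"/><tag k="boundary" v="administrative"/></relation>
  <relation id="3"><member type="way" ref="12" role="outer"/>
    <tag k="type" v="boundary"/><tag k="boundary" v="postal_code"/></relation>
  <relation id="4"><member type="way" ref="13" role="outer"/>
    <tag k="type" v="multipolygon"/><tag k="boundary" v="national_park"/></relation>
</osm>"""

boundary_counts = count_boundary_types(io.BytesIO(SAMPLE))
for value, total in boundary_counts.most_common():
    print(f"{value:<26} {total:>8}")
```

In this sketch the fourth relation is skipped because its type is “multipolygon”, even though it carries a “boundary” tag.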