Let’s stop collecting data in the dark! Data coverage statistics are the equivalent of daylight, and without them contributors are left stumbling around. Coverage analysis can help focus data improvement efforts, show progress, and highlight areas where data is excellent. It’s particularly illuminating for points of interest (POI) data, which is one of the most challenging subsets to populate and maintain.
A purely quantitative approach, such as counting the total number of records in a country, provides little insight here. Before we can measure the coverage of collected data, we need to predict the granular distribution of POIs. We’ve started to perform such predictive analysis based on OSM road network data, which shows some interesting correlations with POI presence.
Using these predictions, we’re building an open-source analysis engine capable of evaluating a variety of POI datasets and producing coverage metrics. This data allows us to shine a spotlight on any part of the world and clearly see the state of POI data.
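To make the idea of a coverage metric concrete, here is a minimal sketch in Python. It is not the engine described above: the function name `coverage_by_cell`, the cell identifiers, and the choice of a capped observed-to-predicted ratio are all assumptions made for illustration, and the numbers are invented.

```python
def coverage_by_cell(predicted, observed):
    """Return the observed/predicted POI ratio per grid cell, capped at 1.0.

    predicted: dict mapping a cell id to the expected POI count
               (hypothetical output of a road-network-based model).
    observed:  dict mapping a cell id to the POI count found in a dataset.
    """
    coverage = {}
    for cell, expected in predicted.items():
        if expected <= 0:
            # No POIs are expected here, so coverage is left undefined.
            continue
        coverage[cell] = min(observed.get(cell, 0) / expected, 1.0)
    return coverage


# Illustrative, made-up numbers for three hypothetical cells.
predicted = {"cell_a": 40.0, "cell_b": 10.0, "cell_c": 0.0}
observed = {"cell_a": 10, "cell_b": 12}
result = coverage_by_cell(predicted, observed)
```

In this sketch a low ratio flags a cell where more POIs are expected than have been collected, while a ratio of 1.0 marks a cell that meets or exceeds the prediction.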
In this talk we will dig into our approach for generating predictive data analytics and quality/coverage analysis, and examine how they’ve helped us see OpenStreetMap POI data in a whole new light.