The Extent of Geographic Resources Available on the Web
In this paper, we describe a methodology for estimating the extent of geographic resources available on the web without the need for secondary knowledge or complex geo-tagging. We randomly select toponyms from the Ordnance Survey 50K gazetteer, use them as search queries, and gather document counts for Great Britain from various web sources. The same gazetteer is then used to geo-code the results and enable mapping. To validate our approach, and to demonstrate the effects of geo/non-geo and geo/geo ambiguity, we mapped the selected toponyms to Geograph, a community project that contains user-generated, geo-tagged photographs of the UK. Although success varies with resolution, the proposed approach is likely to be reliable enough for applications that explore the geographic coverage of the web in cases where references to settlements are expected to be common. In our case, we applied the method to produce maps of web coverage for a range of sources at a resolution of 30 km.
This paper is also being presented at the 16th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM GIS 2008).
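The pipeline outlined in the abstract (sample toponyms, query for document counts, geo-code with the same gazetteer, aggregate for mapping) can be illustrated with a minimal sketch. It is not the authors' implementation. The gazetteer layout (name, easting, northing in metres on the British National Grid), the function names `cell_of` and `estimate_coverage`, and the `count_documents` callable standing in for a web source are all assumptions made for illustration. The sketch also makes no attempt to resolve geo/non-geo or geo/geo ambiguity.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Assumed gazetteer layout (hypothetical): (toponym, easting_m, northing_m),
# with coordinates in metres on the British National Grid.
Gazetteer = List[Tuple[str, float, float]]
Cell = Tuple[int, int]

CELL_SIZE_M = 30_000  # 30 km cells, matching the map resolution in the abstract


def cell_of(easting: float, northing: float, size: int = CELL_SIZE_M) -> Cell:
    """Return the (column, row) index of the grid cell containing a point."""
    return int(easting // size), int(northing // size)


def estimate_coverage(
    gazetteer: Gazetteer,
    count_documents: Callable[[str], int],
    sample_size: int,
    seed: int = 0,
) -> Dict[Cell, int]:
    """Sample toponyms, look up a document count for each, and sum per cell.

    `count_documents` is a placeholder for whatever returns the number of
    documents a web source reports for a toponym query.
    """
    rng = random.Random(seed)  # seeded so that a run can be repeated
    sample = rng.sample(gazetteer, min(sample_size, len(gazetteer)))

    totals: Dict[Cell, int] = defaultdict(int)
    for name, easting, northing in sample:
        # Geo-code via the gazetteer coordinates, then aggregate by cell.
        totals[cell_of(easting, northing)] += count_documents(name)
    return dict(totals)
```

Passing the count lookup as a callable keeps the sketch independent of any particular search interface. The resulting per-cell totals could then be drawn as a coverage map.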