Applying Affinity Propagation to OpenStreetMap Urban Data: Amenity-Based Clusters
Posted on January 20, 2025 • 9 min read • 1,883 words
Mapping neighborhoods using OSM & affinity propagation. Amenity-based urban analysis & clustering tech.
Effective neighborhood categorization is essential for urban planning and development. It provides a systematic framework for identifying the characteristics and needs of different areas. This approach can improve safety, vibrancy, and economic stability by focusing development where it’s needed most. Today I’m going to explore how we can use OpenStreetMap (OSM) data together with a sophisticated clustering technique to create neighborhood categories. I hope it gives you another interesting way of using location intelligence. Instead of viewing cities through abstract theoretical models, we can develop specific, practical categorizations that target initiatives effectively, using large open-source datasets that anyone can access, combined with modern clustering algorithms. I also look at where this method fits in developing cities and towns like those found in Thailand.
The amenities available to inhabitants, such as parks, schools, healthcare facilities, and restaurants, play a pivotal role in a community’s attractiveness. They are crucial factors in residents’ quality of life, and they draw new residents and businesses as well. In addition, amenity concentration plays a key role in repopulating urban and downtown areas and attracting investment to them. We do a much better job of studying a neighborhood when we base the analysis on human experience rather than only on metrics such as social status and income. Despite a wide range of neighborhood categorization studies, most use amenities only as auxiliary metrics rather than focusing on them. So what if we studied neighborhoods based on human factors, such as the daily needs met near where people live and the places that give a neighborhood its vitality? By shifting the emphasis to amenities, we unlock opportunities to implement focused development strategies in specific urban locations.
A notable hurdle in neighborhood categorization is access to data. Data of sufficient quality and coverage can be particularly difficult to obtain in less economically advanced regions such as Thailand, where urban development challenges such as unequal distribution are prevalent. OpenStreetMap provides a cost-effective way around this by standardizing the acquisition of volunteered geographic data. As an open-source resource with a great variety of geo-coded, amenity-focused points of interest (POIs), OSM allows us to capture rich real-world neighborhood characteristics. Instead of relying on proprietary, hard-to-acquire data that often come with limitations, we get high-quality, accessible information on urban structure for location analysis, which I believe is what makes it such an innovative resource for planners. So OpenStreetMap and similar resources can play a large part in urban planning as location technologies continue to innovate.
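To make "amenity-focused POIs" a little more concrete, here is a minimal sketch of how tagged POIs could be tallied into one amenity profile per sub-district. The record layout, the sub-district IDs, and the function name are hypothetical, chosen purely for illustration; only the tag values (restaurant, school, fuel, place_of_worship) follow OSM's amenity tagging.

```python
from collections import Counter, defaultdict

# Hypothetical POI records: (sub_district_id, OSM amenity tag value).
# A real extract would carry coordinates and more tags; this is a toy sample.
pois = [
    ("SD-001", "restaurant"),
    ("SD-001", "school"),
    ("SD-001", "restaurant"),
    ("SD-002", "fuel"),
    ("SD-002", "place_of_worship"),
    ("SD-003", "restaurant"),
]

def amenity_profiles(records):
    """Count POIs per (sub-district, amenity) and return a dense count matrix."""
    counts = defaultdict(Counter)
    for sub_district, amenity in records:
        counts[sub_district][amenity] += 1
    districts = sorted(counts)
    amenities = sorted({amenity for _, amenity in records})
    # One row per sub-district, one column per amenity type.
    matrix = [[counts[d][a] for a in amenities] for d in districts]
    return districts, amenities, matrix

districts, amenities, matrix = amenity_profiles(pois)
for district, row in zip(districts, matrix):
    print(district, dict(zip(amenities, row)))
```

Each row of the matrix is then a candidate feature vector for clustering. Whether raw counts, densities, or shares work best is a modeling choice this sketch does not settle.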
So how do we sift through millions of data points for our study? Affinity propagation (AP) clustering offers a potent, modern alternative to other clustering methods. Popular approaches such as K-Means represent each cluster by an averaged point (a centroid) and require the number of clusters to be defined in advance. AP, by contrast, lets the number of clusters emerge from the data and picks an exemplar, an actual data point, to represent each cluster. AP works from the pairwise similarities between data points. Its two key quantities are passed as messages between points: the responsibility, which gauges how well suited one point is to serve as the exemplar for another, and the availability, which gauges how appropriate it would be for a point to choose that candidate as its exemplar, given the support the candidate receives from other points. These messages are updated iteratively, allowing a dataset of “m” points to converge naturally into well-formed groups. Because of these properties, AP can produce results that are more nuanced and more specific to each dataset than methods that need the cluster count fixed up front. This makes AP a good fit for data-driven urban studies, where earlier clustering methods might not be suitable for the datasets.
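To show what the responsibility and availability messages actually do, here is a minimal NumPy sketch of the update loop. It is an illustration under stated assumptions, not the code behind the study: the similarity measure (negative squared Euclidean distance), the median preference, the damping factor, the iteration count, and the toy amenity figures are all my own choices.

```python
import numpy as np

def affinity_propagation(X, damping=0.7, n_iter=300, seed=0):
    """Toy affinity propagation; returns (exemplar_indices, labels)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Assumed similarity: negative squared Euclidean distance.
    diff = X[:, None, :] - X[None, :, :]
    S = -(diff ** 2).sum(axis=2)
    # Assumed preference (self-similarity): median of the other similarities.
    off_diagonal = S[~np.eye(n, dtype=bool)]
    np.fill_diagonal(S, np.median(off_diagonal))
    # Tiny jitter so exact ties cannot cause oscillation.
    S = S + 1e-9 * np.random.default_rng(seed).standard_normal((n, n))

    R = np.zeros((n, n))  # responsibilities r(i, k)
    A = np.zeros((n, n))  # availabilities a(i, k)
    rows = np.arange(n)
    for _ in range(n_iter):
        # r(i,k) = s(i,k) - max over k' != k of [a(i,k') + s(i,k')]
        AS = A + S
        best = AS.argmax(axis=1)
        first = AS[rows, best]
        AS[rows, best] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, best] = S[rows, best] - second
        R = damping * R + (1 - damping) * R_new

        # a(i,k) = min(0, r(k,k) + sum of positive r(i',k) for i' not in {i,k})
        # a(k,k) = sum of positive r(i',k) for i' != k
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, R.diagonal())
        A_new = Rp.sum(axis=0)[None, :] - Rp
        self_availability = A_new.diagonal().copy()
        A_new = np.minimum(A_new, 0)
        np.fill_diagonal(A_new, self_availability)
        A = damping * A + (1 - damping) * A_new

    # A point is an exemplar when r(k,k) + a(k,k) > 0.
    exemplars = np.flatnonzero((R + A).diagonal() > 0)
    if exemplars.size == 0:
        return exemplars, np.full(n, -1)
    labels = S[:, exemplars].argmax(axis=1)
    labels[exemplars] = np.arange(exemplars.size)
    return exemplars, labels

# Hypothetical sub-districts described by two made-up amenity densities.
X = [[1, 1], [0, 0], [2, 0], [0, 3], [3, 2],
     [20, 20], [19, 19], [21, 19], [19, 22], [22, 21]]
exemplars, labels = affinity_propagation(X)
print("exemplar rows:", exemplars)
print("labels:", labels)
```

Note that the number of clusters is never passed in; it falls out of the similarities and the preference value, so a lower preference generally yields fewer clusters. For real work, a maintained implementation such as scikit-learn's AffinityPropagation would be the more robust choice.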
In a study based on more than 4 million points of interest across 7,213 sub-districts in Thailand, this location-based methodology, which combines open-source data with a modern clustering algorithm, reveals ten distinctive clusters based on amenities alone. Using OpenStreetMap data to define all those geo-coded points of interest, together with affinity propagation to organize that data, allows us to see and generate new real-world urban classifications that may not exist under the framework of typical data models. The data suggest specific characteristics, such as how and where cities have amenities in relation to the suburbs and hinterlands that make up their overall community ecosystem. The ten clusters are:
Minimal Hinterlands: Predominantly rural, sparsely populated, and low in most kinds of amenity concentration, with a typical layout centered around farmers and small-time business owners.
Basic (Amenity) Hinterlands: Like “Minimal Hinterlands”, these locations still have agriculture at their heart. They tend to be slightly higher in amenities like gas stations and local factories, which is a sign of increased integration into industrial chains.
Basic (Natural) Hinterlands: With less concentrated amenities than the above, these hinterland communities place more emphasis on surrounding natural assets such as natural features and waterfalls, which contribute more to the local lifestyle than traditional urbanization does.
Developing Hinterlands: Slightly above rural areas in urban growth, these areas begin their ascent with basic amenities like marketplaces, temples, and visitor attractions. This signifies early-stage suburban transformation.
Necessity Zones: These locations are typical suburbs whose populations rely on important basic community services. Necessities including shopping, hospitals, and schools are all available in a small-town environment to meet everyday living requirements.
Green Enclaves: More ideal for human flourishing, these areas tend to offer good access to nature and leisure activities and higher-quality living conditions than other types. In Thailand, as elsewhere, their presence tends to signal urban communities where amenity diversity is higher.
Historical Sites: These uniquely identifiable locations carry significance due to important landmarks and cultural relevance. They are distinguished by historical architecture, monuments, and/or cultural attractions. This is one of the most recognizable clusters, as human engagement tends to gravitate towards historical landmarks that attract visitors from all around the world.
Urban Villages: Areas that appear less condensed and generally feature a lower density of amenities. They tend to sit at urban peripheries, indicating that while they are more accessible than less urban clusters, suburban density still needs more progress.
City Centers: These locations have a higher number of POIs, representing high concentrations of urban infrastructure including shops, eateries, and visitor spots. As typical urban clusters, they act as the heartbeat of economic activity, where many amenities aggregate together.
Commercial Hubs: These dense commercial districts feature a wide range of shops and amenities such as office spaces and hotels, along with other business needs. The highest density of such urban markers clearly highlights key areas of development in cities.
The results indicate clear differences between districts, not only confirming our typical human perception of how areas differ but also demonstrating the application of a data-based methodology that can be both insightful and novel. Using these clusters, planners and policy-makers have access to granular analysis of neighborhoods across the urbanized area, from rural land and suburbia to the city center and the business district. We see a practical value proposition in which resources can be directed to where they are most needed.
To understand more about our results, Principal Component Analysis (PCA) is also useful: by looking at the directions along which the variables vary most, we can see how neighborhoods evolve across the data. The horizontal axis (PC1) represents amenity concentration, with districts shifting from sparse hinterland areas to dense metropolitan hubs. This indicates how urbanization tends to produce ever-increasing points of activity and commercial development, as we see in Bangkok, the capital of Thailand. The vertical axis (PC2) runs from nature-oriented places rich in natural amenities at the bottom to more commercially oriented zones at the top, showing that urban and natural development do not always move in parallel; instead, they often sit at opposite poles.
Through PCA, our visualization shows that many areas, especially the densest hubs, tend to lack ecological richness. As human development of all kinds increases from rural regions to towns to city centers, areas gradually transition into locations centered on economics, with very few balancing this with strong integration into nature. Our study, using real-world datasets, demonstrates this long-understood tension between economics and environment in urbanization patterns, which policy can better address.
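For readers who want to try this kind of projection themselves, here is a minimal sketch of PCA on an amenity-count matrix using NumPy's SVD. The six sub-districts, the four amenity columns, and the choice to standardize each column first are assumptions for illustration and do not come from the study.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Project standardized rows of X onto the leading principal components."""
    X = np.asarray(X, dtype=float)
    # Standardize each amenity column (assumes no column is constant).
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of Vt are the principal directions, ordered by variance explained.
    _, singular_values, Vt = np.linalg.svd(Z, full_matrices=False)
    components = Vt[:n_components]
    scores = Z @ components.T
    explained = singular_values ** 2 / (singular_values ** 2).sum()
    return scores, components, explained[:n_components]

# Hypothetical counts per sub-district: [shops, restaurants, parks, waterfalls]
X = [
    [2, 1, 0, 3],
    [3, 2, 1, 4],
    [10, 8, 2, 1],
    [25, 20, 3, 0],
    [40, 35, 2, 0],
    [5, 3, 6, 5],
]
scores, components, explained = pca_scores(X)
print("explained variance ratio:", explained)
print("PC1/PC2 coordinates per sub-district:")
print(scores)
```

The sign of each component is arbitrary, so which end of an axis counts as "dense" or "natural" has to be read from the loadings in `components` rather than assumed.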
This work shows real promise: other urban locations could benefit from the same methodology. The data-driven approach yields specific suggestions for enhancing existing planning policies:
Targeted Interventions: Since we have identified very diverse areas, resources should be placed strategically based on local needs. This allows planners to address residents’ critical needs, making distribution not only fairer but also more efficient.
Innovation Districts: Encourage districts that bring together creative people from technology, education, and the community, and that are not centrally located, to gain momentum. This may bring opportunity where there was none before, if local economies are centered around a focal point such as the fields mentioned above.
Creative Placemaking: Because communities tend to aggregate around amenities, planning around attractions like landmarks and other community hubs becomes far more effective if resources are concentrated at these centers. Our classification of areas, both natural and urban, can reveal where such planning might yield the highest dividends.
Ecological Urbanism: We noticed in the data that few cities in Thailand, and likely few around the world, integrate nature as an amenity within business and industrial clusters. Ecological development is crucial for long-term sustainable cities, where nature and urban life can work together rather than conflict.
These points do not exist in abstraction; this is the beauty of applying geo-data analytics using resources like OSM. By mapping real-world cities and looking into metrics related to how inhabitants live day to day, such findings become essential for urban planners. In doing so, data such as this could prove pivotal.
This methodology, based on open-source geolocation data coupled with AI techniques, offers not only insights into a developing region but also a case study in what all developing locations could potentially accomplish as open-source data becomes more prevalent. These are not ideas for the future: these concepts, applications, methodologies, and algorithms have existed for the better part of a decade. By using these tools, we can now see clear-cut, actionable insights that were not available until recently. So, instead of looking into hypothetical situations, data allows us to observe real human life and formulate real, actionable results for our neighborhoods.
Further work could include a side-by-side comparison between OSM data and existing real-world census data, as well as the integration of other economic and social variables. This data-driven application shows what can be accomplished in modern urban data analytics that puts real-world factors like people’s experience at its heart, and it should be an essential area of focus for everyone interested in urban geography and mapping in general.