Good news for customers! Foursquare has applied our Geosummarizer model to our US POI.
Back in January 2023, Foursquare released a new Geosummarizer model for a small subset of countries (CA, CH, ES, GB, JP, NL). The improved model increased the accuracy of our POI geocodes, reducing the median distance to ground truth in our dataset by 20%. Today we’re excited to announce that we’ve successfully applied this model to our US POI.
Clients receiving US POI can expect a ~24% improvement in median distance from the ground truth, with a net 5M+ POIs moving onto their correct buildings.
For those unfamiliar with the Geosummarizer, it is the model that selects a final Lat/Long for a POI by analyzing the geocodes contributed by the various inputs in that POI’s cluster. Examples of these inputs are our first-party user-generated content, third-party crawls, and vetted third-party sources. Accurate geocodes are critical to any Places dataset, as they represent our final, integrated view of the best physical representation of a real-world place.
For use cases such as navigation, vendor mapping, or delivery, an accurate geocode is critical because it is a component in the algorithm that generates a “dropoff” point: the point at the center of the street orthogonally in front of the POI’s rooftop. Companies like DoorDash and Uber rely on the accuracy of these geocodes to pick up from and deliver to the right places, and an accurate geocode can make all the difference in building a successful location-based product.
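Geometrically, a dropoff point is essentially an orthogonal projection onto the street centerline. Here is a minimal sketch, assuming planar local coordinates and Shapely; the street geometry and coordinates below are purely illustrative, not our production pipeline:

```python
# Illustrative only: project a POI's rooftop geocode orthogonally onto a
# street centerline to get a dropoff-style point. Coordinates are in a
# hypothetical local planar frame (meters), not raw Lat/Long.
from shapely.geometry import LineString, Point

street = LineString([(0.0, 0.0), (100.0, 0.0)])  # street centerline
rooftop = Point(40.0, 12.0)                      # POI rooftop geocode

# The nearest point on the centerline is the orthogonal projection.
dropoff = street.interpolate(street.project(rooftop))
print(dropoff)  # POINT (40 0)
```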
While the end result of the Geosummarizer model appears to be simple, there is a significant amount of complex work that goes on behind the scenes to generate a geocode. Let’s jump into this process to understand more.
Why a Geosummarizer?
A Lat/Long is a critical piece of showing where a place is in the real world. At Foursquare, we rely on billions of inputs from our consumer apps such as Swarm, plus third-party data and web crawls.
We ingest countless data points from multiple sources, but each source can have its own representation of where it believes a place lies on the map. The objective of the Geosummarizer is to summarize all of these sources and generate a single Lat/Long for the place. How we do this is a multi-step process that we’ll outline below.
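To see why this needs more than a naive summary, consider the simplest possible baseline: a confidence-weighted average of the source geocodes (the sources and weights below are hypothetical). The learned, grid-based approach described next replaces exactly this kind of heuristic:

```python
# A naive baseline for summarizing source geocodes: a confidence-weighted
# average. The Geosummarizer replaces this heuristic with a learned model.
def weighted_mean_geocode(points):
    """points: list of (lat, lng, weight) tuples from different sources."""
    total = sum(w for _, _, w in points)
    mean_lat = sum(lat * w for lat, _, w in points) / total
    mean_lng = sum(lng * w for _, lng, w in points) / total
    return mean_lat, mean_lng

sources = [
    (40.7411, -73.9897, 3.0),  # first-party user-generated content
    (40.7409, -73.9901, 1.0),  # third-party crawl
    (40.7413, -73.9895, 2.0),  # vetted third-party source
]
print(weighted_mean_geocode(sources))
```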
Tiling
The first step in computing geocodes is discretizing a map into a grid so that we can work with smaller units of spatial context. This spatial context is our view of the physical world: physical entities like roads and buildings and where they are located. Once we have that information, we can transform each of these small building blocks into vectors of numerical attributes. These attributes are heavily driven by domain knowledge and a lot of experimentation. At the end of this step, all our inputs have been transformed into a matrix of numbers that can be fed into any machine learning algorithm.
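As a toy illustration of the idea (the cell size and the two per-cell features below are hypothetical stand-ins for our real spatial context and feature engineering):

```python
import numpy as np

# Toy tiling: bucket input geocodes into a fixed lat/lng grid and attach
# a numeric feature vector to each cell. Real features come from rich
# spatial context (roads, buildings); these two are hypothetical.
CELL = 0.0001  # grid cell size in degrees (~11 m of latitude)

def cell_index(lat, lng):
    return int(lat // CELL), int(lng // CELL)

def cell_features(cell, inputs):
    """Hypothetical features: input count and mean source weight."""
    weights = [w for lat, lng, w in inputs if cell_index(lat, lng) == cell]
    return [len(weights), float(np.mean(weights)) if weights else 0.0]

inputs = [(40.74110, -73.98970, 3.0), (40.74112, -73.98968, 1.0)]
cells = sorted({cell_index(lat, lng) for lat, lng, _ in inputs})
X = np.array([cell_features(c, inputs) for c in cells])
print(X)  # one row of numeric attributes per grid cell
```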
Scoring
In addition to these inputs, represented as the matrix X, we also need an output to predict: the Y variable. The Geosummarizer is modeled as a regression task, so the Y variable is a real number computed from a score that measures how close a point is to the ground truth.
The exact functional form of the score has been developed over the years. Under this modeling, a larger score means the point is closer to the ground truth, and a lower score indicates a greater distance. Once we have both our X and Y variables defined, we can learn a function that maps the input to the output. In other words, we can build machine learning pipelines consisting of feature selection, hyperparameter tuning, and model selection.
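We won’t reproduce the exact functional form here, but a minimal stand-in with the same properties (1.0 at the ground truth, decaying toward zero with distance) makes the regression target concrete; the exponential shape and scale are illustrative assumptions:

```python
import math

# Illustrative stand-in for the score: 1.0 at the ground truth, decaying
# toward 0 with distance. The real functional form differs; SCALE_M is
# a hypothetical parameter.
SCALE_M = 20.0  # distance at which the score drops to ~0.37

def score(distance_m: float) -> float:
    return math.exp(-distance_m / SCALE_M)

for d in (0, 10, 50):
    print(d, round(score(d), 3))  # 0 -> 1.0, 10 -> 0.607, 50 -> 0.082
```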
Training
Once we have trained the algorithm, we can make predictions at the cell level, analyzing the smaller grid units described in the tiling phase. From there, we take the cell with the highest predicted score across the grid as the final Lat/Long for that place. What goes hand-in-hand with training these machine learning models is evaluation, and we do a lot of it.
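Before turning to evaluation, here is a simplified sketch of that selection step. The gradient-boosted regressor and the random stand-in data are illustrative assumptions, not our production setup:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Train a regressor on per-cell feature vectors (X) and scores (Y), then
# pick the highest-scoring cell as the final Lat/Long. All data here is
# random stand-in data for illustration.
rng = np.random.default_rng(0)
X_train = rng.random((500, 8))   # per-cell feature vectors
y_train = rng.random(500)        # per-cell scores vs. ground truth

model = GradientBoostingRegressor().fit(X_train, y_train)

# Inference: score every candidate cell for a POI and take the argmax.
cell_feats = rng.random((64, 8))    # one row per candidate grid cell
cell_centers = rng.random((64, 2))  # hypothetical (lat, lng) per cell
best = int(np.argmax(model.predict(cell_feats)))
final_lat, final_lng = cell_centers[best]
print(final_lat, final_lng)
```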
These evaluations range from manually inspecting individual predictions, which helps us better understand what’s working, to more sophisticated and automated error analyses like learning curves and feature importance scores. Once we are satisfied with the numbers, we release a new model to production and create a new dataset. This dataset is then shared with all of our clients so everyone can benefit from the improved accuracy.
An important thing to note is the scale of all this work. Foursquare houses massive datasets and multiple machine learning models, with one trained uniquely for each country. While the Geosummarizer is a framework, running it successfully means setting up and maintaining all of the data infrastructure and distributed pipelines behind it.
Analyzing the improvements
| Country | Mean Score (Production Baseline) | Median Distance (Production Baseline) | Mean Score (Stage 2 Release Candidate) | Median Distance (Stage 2 Release Candidate) |
| --- | --- | --- | --- | --- |
| US | 0.785 | 13.5 | 0.818 (+4.2%) | 10.2 (-24.4%) |

Parenthesized values are the percentage change relative to the production baseline.
Foursquare maintains a large suite of metrics used to track precise changes in the models between iterations and to then fine-tune what can be fixed. Generally, when we want to communicate the progress of a model iteration, we use the mean score and the median distance as our two primary metrics. The median distance is the distance from the prediction to the annotation, and the mean score quantifies the results on a scale from zero to one, with one being the most accurate relative to the ground truth and zero the least.
For each dataset, we have a continuous process of gathering manual annotations that represent the ground-truth geocode of a POI in the real world. We then use these annotations as a single source of truth to derive our model metrics and compute comparative statistics.
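As a minimal sketch of how the two headline metrics fall out of those annotations (the coordinates are hypothetical, and the score reuses the illustrative exponential stand-in from the scoring section):

```python
import numpy as np
from math import radians, sin, cos, asin, sqrt

# Compute median distance (prediction -> annotation) and mean score over
# a hypothetical annotated sample.
def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters."""
    dlat, dlng = radians(lat2 - lat1), radians(lng2 - lng1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlng / 2) ** 2
    return 2 * 6_371_000 * asin(sqrt(a))

preds = [(40.74110, -73.98970), (40.74200, -73.98800)]
annos = [(40.74112, -73.98968), (40.74190, -73.98810)]

dists = np.array([haversine_m(*p, *a) for p, a in zip(preds, annos)])
scores = np.exp(-dists / 20.0)  # illustrative score stand-in
print(np.median(dists), scores.mean())
```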
When comparing the predictions of the Geosummarizer to these annotations, we can see that the overall value of the median distance has decreased and that the score has increased, meaning the geocode is getting closer to the ground truth.
| Country | Moved Wrong Building to Correct Building | Moved Wrong Non-Building to Correct Building | Net Gain |
| --- | --- | --- | --- |
| US | 6.7% | 4.3% | 10% |
Client-facing metrics are much simpler to define. Generally, they break down whether a geocode falls on the correct building, on an incorrect building, or on a location that is not a building at all. Typically, these incorrect non-buildings are placements on something like a parking lot or sidewalk in the vicinity of the POI. Looking at the table above, we can see the percentage of annotated entities that moved from the wrong building and from the wrong non-building onto the correct building.
Our third metric is the net gain, which measures the percentage improvement. This score is calculated with the following formula: (correct after – correct before) / correct before.
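A quick worked example with hypothetical counts from an annotated sample:

```python
# Hypothetical counts of annotations on the correct building before and
# after a model release.
correct_before = 6000
correct_after = 6600

net_gain = (correct_after - correct_before) / correct_before
print(f"{net_gain:.0%}")  # 10%
```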
It is important to note that these metrics and percentages are derived from the annotated sample for each dataset. The annotated datasets are analyzed at a scale that we feel enables us to stand by these percentages, but in certain instances, there may be minor deviations.
Geosummarizer improvements will continue to expand to other countries as we work through tailoring the model to understand the nuances of each country. For readers interested in learning more about the latest data improvements and Geosummarizer, we invite you to watch a recent webinar where Foursquare’s leading data team covered this process in-depth.
For anyone interested in working with Foursquare, we invite you to download our sample data here or reach out to us at Hello@foursquare.com and start your geospatial journey.