Foursquare’s Places API continues to provide businesses with powerful POI search results and help deliver personalized data-driven user experiences. Just two years ago we launched our updated Places API, offering an enhanced database and even more seamless delivery for enterprises, and are now diving into the origin and technology behind this award-winning API.
Motivation & Decision Making
It all started with v2
Foursquare’s v2 API was initially created with the purpose of powering our own consumer applications. However, it soon became obvious that there was a lot of value in allowing external developers to build their own applications and experiences on our first-party platform. That’s what led us to release our API to the public back in 2011. Over the years, we’ve made significant changes to our v2 API and have added new functionality to support third-party use cases. Since its initial launch, over 250,000 developers have built upon our v2 API to power their own apps and services.
However, that’s not to say everything went according to plan. As the years went by, limitations in our v2 API became apparent. Since it shares a backend with our own consumer applications, internal and external functionality becomes heavily coupled. Some endpoints contain dozens of optional parameters that can alter behavior in significant ways – most of which exist to support the 10+ years of features developed for our own apps that didn’t make sense to release in the public API. Each of those added implementation and maintenance complexity, making it difficult to isolate external traffic from our own.
This challenge became increasingly clear as Foursquare sought to build out a globally distributed API presence to better support developers and users outside the U.S. Decoupling external and first-party use cases would make deploying our backend infrastructure around the globe drastically simpler, and allow us new methods of granting access to the API that wasn’t tied to the concept of a Foursquare app user.
We also wanted to revisit our API framework itself: a set of internal libraries built on top of Twitter’s Finagle framework. It has served us well over the years and even now we have no plans to abandon it as long as our v2 API exists. However, a decade is an eternity in software engineering and there was a strong desire to explore the advances made in off-the-shelf frameworks in the meantime. Standards like OpenAPI, which would allow us to better ensure our public documentation was accurate and up to date, were also still in their infancy when our v2 API was created. With any luck, we’d find a modern platform to build upon that would serve us well for another 10 years.
Specs, Specs, Specs
Unlike our v2 API, which was originally just a public release of an internal product, the v3 API gave us a chance to essentially start from scratch, keeping in mind the lessons of our past and setting new goals for what we wanted to see in an external product. We had a decade of existing developer use cases and feedback to pull from and performed some market research with prospective new customers on what they’d like to see in a new API. The result of this was then our starting point for implementation: a set of initial product specs for endpoints and platform behavior developed in collaboration with our product and sales organizations.
On the engineering side, we further broke down general platform behavior into separate work streams such as API schema, versioning, documentation, billing, authentication, routing, and monitoring. Each of these became its own spec, owned by one or two engineers who were responsible for gathering background information and creating an implementation proposal that could be shared and discussed with others working on the v3 API. While technology leads and product marketers generally had the final say when conflicts arose, this approach allowed everyone on the involved teams to own a piece of the speccing process and breakdown of the work involved, providing them their own opportunity for personal growth alongside our API product
Choosing a Framework
While the design aspects of many of these workstreams (e.g. schema) were fairly independent of each other and of implementation details, we were still forced to make some important decisions early on in the process, most notably choosing a framework on the backend. However, this was a pretty open-ended proposition; we had done some exploratory work using Python as an API backend several years earlier, which at least left us with a desire to constrain ourselves to the JVM, a platform Foursquare is intimately familiar with as a (primarily) Scala-based company. Aside from being able to draw on a decade of experience running Scala code in production, sticking to the JVM gave us the option of reusing existing code and systems that have been well-battle-tested over the years.
There are a multitude of server frameworks on the JVM to choose from and choosing one would mean sifting through many. Therefore, our strategy was to simply parallelize information gathering and encourage everyone working on the project to take some time researching frameworks and report back on what they found. From there, we were able to select several for further prototyping based on factors such as features provided, community engagement, and compatibility with our existing stack.
The full decision-making process here could easily be a blog post of its own, but after giving everyone the chance to play around with three prototype APIs, we ultimately settled on Twitter’s Finatra framework, a modest but modern evolution of our existing Finagle-based stack.
To GraphQL or Not to GraphQL
Another core decision we were forced to tackle early on centered around the structure of the API itself. In the years since Foursquare’s v2 API was released, a new query language called GraphQL came into existence and steadily gained popularity as an alternative to REST APIs. Two core features of GraphQL, a self-describing interface to data (which would put our Places dataset front and center) and a mechanism for clients to selectively request individual response fields, were also key goals we had for the v3 API. While equivalent tools for interface documentation (e.g. the OpenAPI spec and Swagger) exist for REST APIs, foregoing GraphQL would still likely mean implementing some kind of field selection framework ourselves.
That’s not to say GraphQL is perfect. It still lacks robust tooling on some platforms (Scala does have Sangria, which seems to have since picked up steam again), and would have required some creative thinking with regards to billing for data access. Ultimately the question came down to what would provide the best experience for developers accessing our API, and mention of GraphQL support as even a nice-to-have was virtually non-existent in our feedback from existing and prospective API consumers. Therefore, we decided to forgo GraphQL (at least for the time being).
Building an API
With specs, prototyping, and (most) decision-making behind us, it was now time to start building. Mercifully, our breakdown of the project into separate work streams and endpoints meant this was an easily parallelizable process. Even where dependencies did exist, having well-defined interfaces between these pieces meant we were largely able to unblock development by rebasing on top of in-flight work.
As an example, classpath conflicts with a handful of Finatra dependencies meant it was some time before we were actually able to merge support for it. While working to untangle those, we were still able to branch off the incompatible changeset to begin developing the endpoints themselves in parallel and avoiding a bottleneck on a thorny technical issue. By the time our classpath conflicts were resolved, we already had functional endpoint code in place.
Beyond basic endpoint implementation, there were three key areas we were trying to tackle:
1. Sharing code with our v2 API
One cynical and oversimplified view of software engineering is that it is mostly a challenge of managing dependencies and reusing old code to solve a new problem. Regardless of the truth of that, code structure and reuse was a common theme in the development of our v3 API, where many of our early endpoints were close cousins to their v2 counterparts. As a general rule, sharing implementation logic makes long term code maintenance less of a burden, though for us it comes with a potentially large upfront refactoring cost: our v2 API was originally written with Foursquare’s consumer apps in mind and that history is deeply ingrained in its code. On the other hand, it’s not always desirable to couple implementation logic together from a product standpoint – in our case, we will very likely want to evolve our consumer apps and our c2 API separately from the v3 API in the future, and coding around a shared implementation layer could actually lead to more complexity in the long run. Never mind the risk involved in making heavy modifications to a less thoroughly tested but currently stable consumer API codebase.
Nowhere was this issue more pronounced than in our new v3/places/search endpoint. This endpoint was intended to be the replacement to our most modern v2 search endpoint, v2/Search/Recommendations, and is easily one of the largest and most complex systems at Foursquare in terms of the sheer scope of data and computation involved, both online and offline. It is a behemoth to even understand, let alone consider modifying. In the end, as we only cared about a subset of its functionality, we opted to not refactor the existing endpoint into a shared backend and instead architected a new code framework for the v3 endpoint to untangle the search flow into explicit intents (search by place name, search by category, etc). This still allowed us to largely reuse our existing data fetchers, models, scoring logic, and the offline jobs that supported them and struck a nice compromise: reusing modular components in a new endpoint architecture.
2. Migrating API logic to a new backing dataset
A primary goal for the v3 API was building on top of Foursquare’s integrated places dataset, as opposed to solely the first-party venue data currently powering our v2 API, City Guide, and Swarm apps. In the v2 API, this first-party data forms the entrypoint to numerous offline pipelines computing derived datasets that in turn power online consumer experiences and API functionality: ratings, geographic density, search models, and much more. Some of this could be reused as is, but others required reworking the pipelines to compute from the new dataset and deploy the results back into online production. Consequently, we also needed to fork some API code to read from these new datasets where semantic differences arose.
3. Multi-region support
As mentioned previously, one of the primary goals of our v3 API was being able to offer global deployment in multiple regions for both better performance and resiliency – a monumental challenge in and of itself. Foursquare has an amazing team of infrastructure engineers who collaborated with us here to spec out and prototype a completely new service platform built on top of Amazon’s Elastic Kubernetes Service, which now forms the model for all online service deployments at Foursquare.
Aside from the significant work performed by our infrastructure team to provision, configure, instrument, and network clusters around the globe, two significant challenges in any migration to a multi-region backend are determining exactly what data and services need to be replicated where and ensuring traffic in each region is self-contained and isolated outside of a failover scenario. Fortunately for us, we have a robust distributed tracing system instrumented throughout our backend, which made solving these problems much easier than digging through hundreds of thousands of lines of code – at a glance providing detailed information about the entire lifetime of an API request distributed across (potentially) dozens of backend services and data collections. This let us know exactly what needed to be replicated, and served as a critical debugging tool as we worked to isolate service discovery for each of them.
v3 is Born
The most exciting part of any large project is watching it all come together into a functional product and (hopefully) feeling validation for the many months of hard work in the process. It’s also a terrifying prospect: all the careful planning and pre-release testing in the world won’t guarantee a smooth product launch. Much can be done to ease into a release though, and in the case of API v3 we took an incremental approach to make our general availability (GA) launch more of a seamless transition:
1. Internal QA sessions (“bug bashing”)
Foursquare has a long history of “testing parties” for new releases: getting everyone involved into a room and running down a list of features to validate (or break, as the case may be). This results in a spreadsheet of bugs, quirks, spec non-compliance, and other issues causing a less-than-ideal user experience, which can then be prioritized for follow-up. Think collaborative brainstorming, but for breaking in a new product. Some strategies we’ve found useful:
- Begin with clearly defined behavior to test.
- Coordinate throughout to avoid duplicating effort.
- Prioritize documenting reproducible buggy behavior. There is plenty of time to follow up and triage root causes afterwards.
2. “Dark test” traffic from our v2 API
“Dark testing,” or duplicating production traffic off an existing API endpoint in order to load test a new one, is another long-time release practice at Foursquare. The infrastructure powering this is old and relatively simple: a Scala app tailing our proxy logs, with hooks to register listeners for individual request paths. These listeners can transform the request as needed before sending it off to the new API at an adjustable rate. It’s a useful tool for load testing, particularly for its ability to generate fully organic test traffic.
For example, rather than test our new search API with a few manually pre-seeded queries, we adapted live queries on our v2 search endpoints to the new v3 version, slowly ramping up this “dark traffic” until we could be confident that our new endpoint backend would stand up in real-time production. We continued to load test with dark traffic as our first few partners began to onboard and up until we had enough self-sustaining traffic to keep our backend services warm and ensure consistent performance on its own.
3. Limited release to beta partners
Getting external feedback is critical for any new product; it’s great to eat your own dogfood so to speak, but it doesn’t matter how good it tastes if the dogs won’t eat it too. Soft launching to a handful of beta partners gave us the final validation that our expectations for the product aligned with our partners’ expectations and real-world use cases, as well as the final chance to gather feedback and address any serious shortcomings before going out to sell the new product.
4. GA release to enterprise partners
Our first GA launch was still somewhat limited; we started onboarding new paying customers, but only enterprise partners with whom we had a contractual relationship. Having known points of contact meant we could coordinate with these early users to provide them a smooth onboarding experience and also have a bit more control over how our nascent API would be used as it began its production life.
This is also the point at which we considered the main product “finalized” and were committed to backward compatibility from here on out.
5. Public GA release
Eventually, we launched to the general public, meaning anyone could sign up for a Foursquare Developer account and generate a token for v3 API access (it only takes a couple minutes, give it a try!). By this point, we were fully confident our new backend could take whatever the internet might throw at it, and it has been chugging away without issue ever since.
It has been quite a journey, but Foursquare’s API was built first and foremost to provide external developers the best possible experience building upon our Places data. We have a robust foundation to build upon as we continue working on even more ambitious new features for 2022 and beyond, which will serve as a cornerstone of our platform for years to come.
Last year’s launch of the v3 API also presents a prime opportunity to look back at what went well during its development and how we can better ourselves for the future.
Some of our key takeaways:
1. Prototype (and bug bash) early and often
This is something we largely failed to do outside of choosing a framework, and our development time suffered for it. Getting functional code in place, however rough around the edges, helps surface issues with specs/roadmaps and other unknowns earlier in the development process and makes it significantly easier to course correct as needed. Critically, prototypes also serve to cut down on uncertainty and ground technical discussion and planning in concrete terms. It’s always easier to accurately prioritize and estimate time to fix specific bugs in a functional product than it is to accurately guess how long fixing yet-unknown bugs in a non-functional product will take.
2. Ensure your engineering and product teams are aligned
There are few decisions to be made in engineering that are purely technical; even decisions such as what backend infrastructure or framework to use and how to share existing code are dictated in large part by the desired product experience. Product decisions are also largely dictated by technical capabilities. Some of the largest product questions we faced were around what experience we wanted our search API to provide and how to define and monitor SLAs. The answers to both of those were heavily dependent on what our technology could reasonably support.
One thing we did reasonably well here was adjust our team processes to facilitate broader communication between teams and individuals working on the new API. During our speccing process and early development, this took the form of a weekly open-agenda discussion time, and as time went on we added and removed regular sync meetings between involved parties as needed. Through it all, a shared project Slack channel formed the crux of our day-to-day conversations and collaboration, providing a central forum for online discussion and a searchable history for stakeholders to follow.
3. Plan for unknowns, and adjust roadmaps as needed
There will always be surprises, or the “unknown unknowns,” and accurately accounting for them in advance is one of the hardest parts of project planning. That being the case, it’s helpful to revisit timelines as new issues or information comes to light.
Six months into development of the v3 API, Foursquare merged with Factual, and the resulting re-organizations caused a major shake-up of the engineering and product teams working on the project. It’s clear in hindsight that we didn’t adjust our roadmap enough to compensate for this and ended up launching significantly behind our original plan as a result. A much less extreme case saw us spending several unplanned weeks prior to launch working to improve the quality of results from our new search endpoint.
While we didn’t expect either of those delays ahead of time, the flexibility to push back our launch timeline was invaluable in ensuring we released a product we could be proud of. Accountability to product deadlines is important, but it’s equally important to realize that adhering to a previous deadline will compromise quality and make the appropriate adjustments. We’re fortunate to work with a product org that understands this.
Foursquare’s v3 API wouldn’t have been possible without the contributions and efforts of dozens of individuals in our engineering and product organizations. Since its launch last year, we’ve received a tremendous amount of positive feedback from customers and partners alike, expressing their excitement and love for the technology. Since its launch, we’ve received a tremendous amount of feedback, including winning Best in Enterprise APIs at API World’s for two years in a row in 2021 and 2022. . Our award-winning API strengthens Foursquare’s roster of enterprise-grade location solutions and continues our mission to make location data an ingrained part of business strategies everywhere.