Maximize Your POI Data Insights with Foursquare Places Bulk Harmonization


Have you ever sat through the painstaking process of trying to join two geospatial datasets? Perhaps you ran regex matching queries to determine whether a dataset contains information related to what you already have. Maybe you scoured the internet for obscure datasets to join with, or needed more information about locations you work with daily but had a difficult time finding reliable sources. If this sounds familiar, you have already run into “data harmonization.” There is a way to ease these headaches and remove blockers: supercharging your own datasets and databases with trusted points of interest (POI).

POI data can be especially tricky to harmonize, with differing coordinates, names, regional descriptions, and other small (but significant) differences. To help simplify this process, Foursquare’s Places API team developed a Python script to harmonize POI datasets with our vast POI dataset. Since Foursquare boasts more than 550 integrated partners and 100 million POIs that are continually updated by real users, each POI comes with rich data points such as photos, descriptions, websites, and more.

The Bulk Harmonization Python script uses the Place Match Endpoint to harmonize every row of a delimited file, matching your POIs to our extensive dataset. The script preserves the row order of the input file and writes the results to a workable CSV file.

The Place Match API will return the following for each match found:

  • fsq_id – our identifier for the POI
  • match_score – a confidence rating that our POI is a match (1.00 means we are very, very confident)
  • Standard POI data – such as latitude/longitude, address, and postal code

The match_score has a minimum confidence value of 0.45. If a matched POI scores below 0.45, we throw it out rather than risk an incorrect match.

Want even more information about a matched POI? To unlock details like a photo, website, phone number, or popular hours, take the fsq_id returned in the matched file and pass it to the Places Details API: 

https://api.foursquare.com/v3/places/{fsq_id}

Now there’s an easy way to correlate existing POIs with a Foursquare POI at any time. The Places Details API can be called to retrieve POI metadata, which may be cached client side for up to 24 hours, while the fsq_id can be stored indefinitely. See our caching policies for more detail.
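To make the caching rule concrete, here is a minimal sketch of a client-side cache that keeps Place Details for at most 24 hours, keyed by fsq_id. The class name, the `fetch` callback, and the injectable clock are all hypothetical names chosen for illustration; this is not part of the Bulk Harmonizer script.

```python
import time
from typing import Callable, Dict, Tuple

CACHE_TTL_SECONDS = 24 * 60 * 60  # the 24-hour limit mentioned above


class PlaceDetailsCache:
    """Hypothetical in-memory cache for Place Details, keyed by fsq_id."""

    def __init__(
        self,
        fetch: Callable[[str], dict],
        ttl: float = CACHE_TTL_SECONDS,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch  # function that actually calls the API
        self._ttl = ttl
        self._clock = clock  # injectable so expiry is easy to test
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def get(self, fsq_id: str) -> dict:
        now = self._clock()
        cached = self._entries.get(fsq_id)
        # Serve from cache only while the entry is younger than the TTL.
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]
        details = self._fetch(fsq_id)
        self._entries[fsq_id] = (now, details)
        return details
```

The fsq_id itself never expires in this sketch; only the metadata fetched for it does, so an expired entry is simply fetched again on the next lookup.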

Running the Script

The repository’s README has the requirements and detailed instructions for the Bulk Harmonizer script.

To get started:

  1. Clone the repository and switch to the root directory. 
  2. Run the command below to install all the required Python libraries.
pip3 install -r requirements.txt
  3. Log in to the developer console, generate an API key, and save it in a secure location.
  4. Set the environment variables for the API key and the number of threads you want the script to use. Substitute your own values in these commands:
export FSQ_API_AUTH_KEY=[Your Foursquare API Key]
export N_THREADS=25
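For readers curious how a script could consume these two variables while still keeping the output in input order, here is a rough sketch. It is not the harmonizer's actual code: `load_settings`, `match_rows_in_order`, and the default of one thread are assumptions made for illustration. It relies on the fact that `ThreadPoolExecutor.map` yields results in the order of its inputs, even when the work runs concurrently.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Mapping


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Read the two environment variables described above."""
    api_key = env.get("FSQ_API_AUTH_KEY")
    if not api_key:
        raise RuntimeError("FSQ_API_AUTH_KEY is not set")
    # Falling back to a single thread is an assumption of this sketch.
    n_threads = int(env.get("N_THREADS", "1"))
    if n_threads < 1:
        raise ValueError("N_THREADS must be at least 1")
    return {"api_key": api_key, "n_threads": n_threads}


def match_rows_in_order(
    rows: Iterable[dict],
    match_one: Callable[[dict], dict],
    n_threads: int,
) -> List[dict]:
    """Run match_one on every row concurrently, returning results in input order."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # Executor.map preserves input order regardless of completion order.
        return list(pool.map(match_one, rows))
```

Here `match_one` stands in for whatever function sends a single row to the Place Match Endpoint.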

Now you are ready to run the script against your delimited POI data file. The repo provides a sample input file and an example command that runs a test of 100 different Starbucks POIs sourced from Kaggle.

Please be aware that this test will use actual API credits against your Foursquare account. You can modify the input file to reduce the number of API calls made if desired.

python3 harmonize.py --input 100-starbucks.csv --output 100-starbucks-outputFile.csv --separator C  --column_mapping '{"id":"Store Number","name":"Store Name","address":"Street Address","city":"City","postalCode":"Postcode","cc":"Country","state":"State/Province","lat":"Latitude","lon":"Longitude"}' 2>&1 >out.log

The input file contains 13 columns, but we only need to match eight with Foursquare data types to attempt the matches. The script requires that you match the input file’s data columns with inputs for the POI match. For example, we match the ‘state’ key with the ‘State/Province’ column from the input file in the ‘column_mapping’ argument.
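To illustrate what the column mapping does, the sketch below applies the same JSON mapping from the example command to one CSV row and keeps only the mapped columns. The helper name `to_match_fields` and the sample row values are made up for illustration, and the real script may process the mapping differently.

```python
import csv
import io
import json

# The same mapping passed to --column_mapping in the example command.
COLUMN_MAPPING = json.loads(
    '{"id":"Store Number","name":"Store Name","address":"Street Address",'
    '"city":"City","postalCode":"Postcode","cc":"Country",'
    '"state":"State/Province","lat":"Latitude","lon":"Longitude"}'
)

# A made-up row; "Brand" is an extra column that the mapping ignores.
SAMPLE_CSV = (
    "Brand,Store Number,Store Name,Street Address,City,State/Province,"
    "Country,Postcode,Latitude,Longitude\n"
    "Example Brand,0000-000000,Example Store,1 Example St,Example City,"
    "B,AR,1000,-34.5,-58.5\n"
)


def to_match_fields(row: dict, mapping: dict) -> dict:
    """Translate input column names into match keys, skipping empty values."""
    return {key: row[column] for key, column in mapping.items() if row.get(column)}


row = next(csv.DictReader(io.StringIO(SAMPLE_CSV)))
fields = to_match_fields(row, COLUMN_MAPPING)
print(fields)
```

Columns that do not appear in the mapping, like "Brand" here, never reach the match request.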

Let’s view the 100-starbucks-outputFile.csv file to get an understanding of the generated output:

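| id | mtype | status_code | result | match_score | categories | chains | fsq_id | geocodes_main_latitude | geocodes_main_longitude |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 15008-157047 | ALL | 200 | SUCCESS | 0.811865079365079 | 0 | [] | 611646c302a74d9e0678a8c1 | -34.471764 | -58.513128 |
| 1271-136936 | ALL | 200 | SUCCESS | 0.668902116402116 | 0 | [] | 4b05871cf964a520a18022e3 | -34.508605 | -58.523335 |
| 15478-161685 | ALL | 200 | SUCCESS | 0.649623015873016 | | [] | 89d1c2e897e046a66a08623b | -34.513344 | -58.487929 |
| 15158-240092 | ALL | 404 | FAILURE | | | | | | |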

Not every row will be a match, so the output can be filtered by the result column to show only successes if necessary. Each matched row carries a match_score indicating how confident we are that this location is a match. Again, anything under 0.45 has been removed to keep your joined dataset as accurate as possible.
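Filtering on the result column can be sketched with Python's standard csv module. The sample below is an abridged version of the rows shown above (only a few columns kept), and it assumes the output file is comma-separated with the header names from the sample output. `successful_matches` is a hypothetical helper, not something the script provides.

```python
import csv
import io
from typing import Iterable, List

# Abridged rows from the sample output above; most columns are omitted.
SAMPLE_OUTPUT = """id,status_code,result,match_score,fsq_id
15008-157047,200,SUCCESS,0.811865079365079,611646c302a74d9e0678a8c1
1271-136936,200,SUCCESS,0.668902116402116,4b05871cf964a520a18022e3
15478-161685,200,SUCCESS,0.649623015873016,89d1c2e897e046a66a08623b
15158-240092,404,FAILURE,,
"""


def successful_matches(lines: Iterable[str], min_score: float = 0.45) -> List[dict]:
    """Keep rows whose result is SUCCESS and whose score meets min_score."""
    kept = []
    for row in csv.DictReader(lines):
        if row["result"] != "SUCCESS":
            continue  # failed rows have no score to compare
        if float(row["match_score"]) >= min_score:
            kept.append(row)
    return kept


matches = successful_matches(io.StringIO(SAMPLE_OUTPUT))
strict = successful_matches(io.StringIO(SAMPLE_OUTPUT), min_score=0.7)
```

Raising `min_score` above 0.45, as in the `strict` example, is one way to trade coverage for extra confidence in your own pipeline.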

The fsq_id identifier can be used to call other API endpoints for additional data, such as Place Details, Tips, or Photos. Store the fsq_id in your database so you can call any of these APIs again in the future and provide additional context to your customers.

https://api.foursquare.com/v3/places/{fsq_id}
https://api.foursquare.com/v3/places/{fsq_id}/photos
https://api.foursquare.com/v3/places/{fsq_id}/tips
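As a rough illustration, the three URLs can be built from a stored fsq_id with the standard library alone. The sketch only constructs the request objects; sending the API key in an `Authorization` header is an assumption here, so check the Places API documentation before relying on it. `build_request` and `ENDPOINT_SUFFIXES` are hypothetical names.

```python
import urllib.request

BASE_URL = "https://api.foursquare.com/v3/places"

# Path suffixes for the three endpoints listed above.
ENDPOINT_SUFFIXES = {"details": "", "photos": "/photos", "tips": "/tips"}


def build_request(fsq_id: str, endpoint: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for one of the endpoints."""
    url = f"{BASE_URL}/{fsq_id}{ENDPOINT_SUFFIXES[endpoint]}"
    request = urllib.request.Request(url)
    # Assumption: the API key is passed in the Authorization header.
    request.add_header("Authorization", api_key)
    request.add_header("Accept", "application/json")
    return request


# fsq_id taken from the sample output above; the key is a placeholder.
tips_request = build_request("611646c302a74d9e0678a8c1", "tips", "YOUR_API_KEY")
print(tips_request.full_url)
```

Passing the request to `urllib.request.urlopen` would perform the call, which consumes API credits just like the harmonization run does.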

Use Cases

Correlating POI data can be useful for many types of businesses and research. Banks partner with other businesses to place their ATMs in ideal locations. A bank may have all of these locations stored, but its records likely don’t include photos or user-generated tips about them. The Bulk Harmonization script can help add a fsq_id to each matched location, which can then be used to show customers the amenities available near an ATM.

Vendors provide goods and services to restaurants and retailers. A typical vendor will have all the locations where their products are sold stored in a database. More than likely, the vendor won’t have much information besides sales numbers for each location. The Bulk Harmonization script can correlate each of those locations with a fsq_id. The vendor could then use the Place Tips API to see what customers are saying about the retailer and/or their products. This information could help a sales professional make informed decisions about how to approach the business relationship.

Logistics companies work with many types of locations to deliver goods in a safe and timely manner. Additional data about a location could be extremely useful to their drivers. With the fsq_id stored in their database, drivers could see details about a location, such as whether it has restrooms or a smoking area, or which credit cards are accepted there.

Wrap Up

Enriching your data with the FSQ Places API has gotten easier than ever! When you are ready to run your own input, modify the example command according to the README and watch the harmonization happen. 

Have feedback or running into issues? Please join the Foursquare Developer Discord Server!
