Data Migration Guide
Foursquare Places now encompasses the best aspects of both legacy POI data sets, unifying Factual’s best-in-class methods for collecting POI data and core attributes with Foursquare’s strength in collecting fresh first-party and user-generated content, and leads the industry in data accuracy and freshness.
The purpose of this migration guide is to provide additional details to our partners and clients who are currently using Legacy Foursquare Venues data in their products and services. It is intended for product managers and developers that currently own dataset ingestion and usage within their products and code.
Delivery Format and Cadence
New Foursquare Places deliveries are customized for each client and offered on a client-defined schedule. The standard options are monthly, weekly, or daily. Monthly deliveries are posted on the 1st day of every month. Weekly deliveries can be configured for the same day every week; the default day is Sunday, and clients can choose which day of the week they prefer. Foursquare Places is now available on Amazon S3 or Amazon Data Exchange. Clients are required to supply their own ARN for access to Foursquare S3 buckets.
Quality Filters
We apply a quality filter for all of our client deliverables in order to ensure only the highest quality of POIs are delivered. The current quality filtering logic:
- Excludes VRS Low (likely “fake” or not real)
- Excludes Existence High Recall (likely closed)
- Excludes “People” POIs (e.g., doctors, insurance agents)
Some legacy customers, and potentially some new ones, may prefer to receive any or all of the POI types above, or to do the filtering on their end. For these use cases, we offer datasets that are not filtered with the quality logic listed above. These datasets do, however, have commercial viability logic applied to remove POIs that are less relevant for enterprise customers (POIs that exist for our consumer app check-in use case). The commercial viability filtering logic:
- Excludes Deleted POIs
- Excludes Private POIs
- Excludes Geography Categories (States/Municipalities, City, County)
Attribute Data Type & Format Changes
This section outlines details of DataType Changes and Format Changes of specific attribute migrations as listed in Legacy Foursquare Places Schema Mapping. The majority of new Foursquare Places Attributes maintain the same data type and format as Legacy Foursquare Attributes. A handful of attribute migrations involve simple format changes and/or data type changes that are outlined here.
Special Considerations: For collection data types such as Arrays and Lists, flat file deliveries are generated with JSON formatting of these data types. The purpose of this formatting is to provide a standard way to ingest text based values across a variety of languages and frameworks. The information below describes a recommended Data Type post ingestion, and shows an example of the values being delivered in a text based flat file.
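As a rough illustration of the ingestion step described above, the sketch below reads a text-based flat file and decodes the JSON-formatted collection columns. The tab delimiter, the choice of columns, the `parse_row` helper, and the sample values are all assumptions made for the example; check them against your own delivery.

```python
import csv
import io
import json

# Columns assumed to carry JSON-formatted collections (names taken from this guide).
JSON_COLUMNS = ("name_translated", "fsq_category_ids", "fsq_chain_id")


def parse_row(row: dict) -> dict:
    """Decode JSON-formatted collection columns; leave other columns as text."""
    parsed = dict(row)
    for column in JSON_COLUMNS:
        raw = row.get(column)
        # Missing or empty cells are treated as "no value" instead of failing.
        parsed[column] = json.loads(raw) if raw else None
    return parsed


# In-memory stand-in for a delivered file. Tab-delimited is an assumption,
# and the category ID is an illustrative value, not a real mapping.
sample = io.StringIO(
    "name\tfsq_category_ids\tfsq_chain_id\n"
    'IKEA\t[17000]\t["3fae3191-08ff-4b62-9077-d8a1182d6ef2"]\n'
)

reader = csv.DictReader(sample, delimiter="\t", quoting=csv.QUOTE_NONE)
rows = [parse_row(row) for row in reader]
print(rows[0]["fsq_chain_id"])
```

Disabling CSV quote handling keeps the double quotes inside the JSON text intact; if your delivery uses quoted CSV fields instead, the default `csv` quoting rules are likely the better fit.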
Core Data
translatedvenuenames ⇨ name_translated
Legacy Foursquare | New Foursquare Places |
---|---|
Array(String) | String(JSON) |
[[ニューヨーク マリオット マーキス, ja], [New York Marriott Marquis, en]] | [{"lang":"ja", "name":"ニューヨーク マリオット マーキス"},{"lang":"en", "name":"New York Marriott Marquis"}] |
The attribute translatedvenuenames is being replaced with name_translated. The legacy attribute was an Array of Arrays(String), each holding a name value and its associated language code. The new attribute is a JSON-formatted array of two-keyed objects containing the language code and name values.
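A minimal sketch of reading the new value, assuming it arrives as the JSON string shown in the table above. The helper name `names_by_language` is hypothetical.

```python
import json


def names_by_language(name_translated: str) -> dict:
    """Return a {language code: name} mapping from a name_translated JSON string."""
    entries = json.loads(name_translated) if name_translated else []
    return {entry["lang"]: entry["name"] for entry in entries}


raw = (
    '[{"lang":"ja", "name":"ニューヨーク マリオット マーキス"},'
    '{"lang":"en", "name":"New York Marriott Marquis"}]'
)
names = names_by_language(raw)
print(names["en"])
```

Keying by language code makes lookups simple, at the cost of keeping only the last entry if a language code were ever repeated.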
state ⇨ region
Legacy Foursquare | New Foursquare Places |
---|---|
Florida | FL |
The attribute state is being replaced with region. The new attribute represents any sub-national level municipal unit like State, Province, etc. US States are now abbreviated compared to the legacy attribute which stored the full name.
countrycode ⇨ country
Legacy Foursquare | New Foursquare Places |
---|---|
US | us |
The attribute countrycode is being replaced with country. The new attribute uses the standard 2-letter ISO country code in lowercase format.
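When comparing legacy rows with new deliveries, it can help to normalize the legacy state and countrycode values into the new region and country form. The sketch below is one way to do that. The state lookup is a deliberately tiny, illustrative subset, the helper name is hypothetical, and passing non-US regions through unchanged is an assumption, since this guide only describes the change for US states.

```python
# Illustrative subset only; a real mapping needs every US state and territory.
US_STATE_ABBREVIATIONS = {
    "California": "CA",
    "Florida": "FL",
    "New York": "NY",
}


def legacy_to_new_location(state: str, countrycode: str) -> dict:
    """Convert legacy state/countrycode values to the new region/country form."""
    country = countrycode.lower()  # new format: lowercase 2-letter ISO code
    if country == "us":
        # Unknown names fall back to the original value rather than failing.
        region = US_STATE_ABBREVIATIONS.get(state, state)
    else:
        region = state  # assumption: non-US regions are left untouched
    return {"region": region, "country": country}


print(legacy_to_new_location("Florida", "US"))
```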
score_openclose ⇨ closed_bucket
The attribute score_openclose is being replaced with closed_bucket. The legacy attribute was the result of a scoring model that produced the following values:
- VeryHigh: indicates the POI is marked as closed in our database.
- High: indicates the POI is likely closed.
- Low: indicates the POI is not likely closed.
- Null: indicates the data is not confident enough to make a judgment.
Coverage was low across the dataset.
The new attribute is the result of a new model trained on thousands of human annotations of Foursquare’s POIs. It uses features such as how recently internet sources for the POI have been updated and the last time the POI had a check-in, tip, or photo. This results in a new set of values that are easier to understand, with near 100% coverage across the data. The new values are:
- VeryLikelyClosed: indicates places with probabilities greater than 90% being closed
- LikelyClosed: indicates places with probabilities 70–90% being closed
- Unsure: indicates places with probabilities less than 70% of being either closed or open
- LikelyOpen: indicates places with probabilities 70–90% being open
- VeryLikelyOpen: indicates places with probabilities greater than 90% being open
The new model is currently implemented for US only, and the Rest of the World will follow in Q3.
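To make the bucket boundaries above concrete, here is a small sketch that maps a probability of being closed to a bucket name. It is only a reading of the list above, not Foursquare's model: deliveries contain the bucket value rather than a probability, and the function name and the treatment of the exact 70% and 90% boundaries are assumptions.

```python
def closed_bucket(p_closed: float) -> str:
    """Map a probability of being closed to a closed_bucket label.

    Boundary handling (e.g. exactly 0.9 counts as LikelyClosed) is an assumption.
    """
    if not 0.0 <= p_closed <= 1.0:
        raise ValueError(f"probability out of range: {p_closed}")
    p_open = 1.0 - p_closed
    if p_closed > 0.9:
        return "VeryLikelyClosed"
    if p_closed >= 0.7:
        return "LikelyClosed"
    if p_open > 0.9:
        return "VeryLikelyOpen"
    if p_open >= 0.7:
        return "LikelyOpen"
    return "Unsure"  # neither outcome reaches 70%


for p in (0.95, 0.8, 0.5, 0.2, 0.05):
    print(p, closed_bucket(p))
```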
category_primary_id & category_secondary_id ⇨ fsq_category_ids
The attributes category_primary_id and category_secondary_id are being replaced with fsq_category_ids. The legacy attributes were strings that stored two of the legacy category IDs, each a 24-character hexadecimal value. The new attribute is an array of integers that can store more than two category IDs. The category ID format has changed with the new taxonomy: IDs are now values between 10000 and 19056. Please refer to the Legacy Foursquare Category Mapping for more details on the Category and Taxonomy changes.
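A hedged sketch of validating the new attribute on ingestion, assuming it is delivered as a JSON array in text form and that the stated range is inclusive. The helper name and the sample IDs are illustrative, not taken from the category mapping.

```python
import json

# Range stated in this guide; treating both ends as inclusive is an assumption.
MIN_CATEGORY_ID = 10000
MAX_CATEGORY_ID = 19056


def parse_category_ids(raw: str) -> list:
    """Decode fsq_category_ids and reject values outside the new ID range."""
    ids = json.loads(raw) if raw else []
    for category_id in ids:
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_int = isinstance(category_id, int) and not isinstance(category_id, bool)
        if not is_int or not MIN_CATEGORY_ID <= category_id <= MAX_CATEGORY_ID:
            raise ValueError(f"unexpected category id: {category_id!r}")
    return ids


print(parse_category_ids("[13065, 13003]"))  # illustrative IDs
```

A check like this also catches legacy 24-character hexadecimal IDs that slip into a pipeline expecting the new integer format.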
category_primary & category_secondary ⇨ fsq_category_labels
The attributes category_primary and category_secondary are being replaced with fsq_category_labels. The legacy attributes were strings that stored two of the category names associated with category_primary_id and category_secondary_id respectively. The new attribute is an array of strings that can store more than two category labels. Categories and the taxonomy have been reorganized. The parent-level categories are now:
- Arts and Entertainment
- Business and Professional Services
- Community and Government
- Dining and Drinking
- Event
- Health and Medicine
- Landmarks and Outdoors
- Retail
- Sports and Recreation
- Travel and Transportation
Please refer to the Legacy Foursquare Category Mapping for more details on the Category and Taxonomy changes.
chainid ⇨ fsq_chain_id
Legacy Foursquare | New Foursquare Places |
---|---|
String | Array(String) |
556ca462a7c87f63786aa4d8 | ["3fae3191-08ff-4b62-9077-d8a1182d6ef2"] |
The attribute chainid is being replaced with fsq_chain_id. The new attribute is an Array of strings that can represent multiple chain associations. The ID value format has changed from a 24-character hexadecimal value to a 128-bit UUID.
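The format change can be checked on ingestion with Python's standard `uuid` module. This is a sketch under the assumption that the value is delivered as a JSON array of strings; `parse_chain_ids` is a hypothetical helper name.

```python
import json
import uuid


def parse_chain_ids(raw: str) -> list:
    """Decode fsq_chain_id and confirm every entry is a well-formed UUID."""
    values = json.loads(raw) if raw else []
    # uuid.UUID raises ValueError for anything that is not a 128-bit UUID,
    # including legacy 24-character hexadecimal chain IDs.
    return [str(uuid.UUID(value)) for value in values]


print(parse_chain_ids('["3fae3191-08ff-4b62-9077-d8a1182d6ef2"]'))
```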
chainname ⇨ fsq_chain_name
Legacy Foursquare | New Foursquare Places |
---|---|
String | Array(String) |
IKEA | ["IKEA"] |
The attribute chainname is being replaced by fsq_chain_name. The new attribute is an Array of strings that allows for multiple chain associations. An example of multiple chain associations would be a car dealership that sells vehicles under multiple brands.
Rich Data
hours ⇨ hours
Legacy Foursquare | New Foursquare Attribute |
---|---|
String | Array(String) |
[[1, 990, 1320,], [2, 990, 1320,], [3, 990, 1320,], [4, 990, 1380,], [5, 690, 1440,], [6, 1050, 1440,], [7, 1050, 1320,]] | {"saturday":[["9:00","18:00"]],"tuesday":[["9:00","18:00"]],"friday":[["9:00","18:00"]],"thursday":[["9:00","18:00"]],"wednesday":[["9:00","18:00"]],"monday":[["9:00","18:00"]]} |
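As an example of consuming the new hours value, the sketch below checks whether a place is open at a given day and time. It assumes the value is a JSON object keyed by lowercase day name with lists of [open, close] pairs in 24-hour "H:MM" format, as in the example above, and that intervals do not cross midnight. The function names are hypothetical.

```python
import json


def to_minutes(hhmm: str) -> int:
    """Convert an 'H:MM' string to minutes after midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def is_open(hours_json: str, day: str, time_hhmm: str) -> bool:
    """Return True if time_hhmm falls inside any interval listed for the day."""
    hours = json.loads(hours_json) if hours_json else {}
    now = to_minutes(time_hhmm)
    # Days without an entry are treated as closed (an assumption).
    return any(
        to_minutes(start) <= now < to_minutes(end)
        for start, end in hours.get(day.lower(), [])
    )


sample_hours = '{"saturday":[["9:00","18:00"]],"monday":[["9:00","18:00"]]}'
print(is_open(sample_hours, "Saturday", "10:30"))
print(is_open(sample_hours, "Sunday", "10:30"))
```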
photo1, photo2, photo3 ... ⇨ photos
Legacy Foursquare | New Foursquare Attribute |
---|---|
String | Ordered Array(String) |
http://ir.4sqi.net/img/general/original/_photo_1_.jpg | ["http://ir.4sqi.net/img/general/original/_photo_1_.jpg", "http://ir.4sqi.net/img/general/original/_photo_2_.jpg"] |
http://ir.4sqi.net/img/general/original/_photo_2_.jpg | |
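To compare a legacy row with the new ordered array, one option is to fold the numbered legacy columns (photo1, photo2, ...) back into a single list. The sketch assumes the legacy columns are named exactly as a prefix followed by a number and that empty cells mean no value. `collect_numbered` is a hypothetical helper, and the same approach would apply to the taste and tip columns.

```python
import re


def collect_numbered(row: dict, prefix: str) -> list:
    """Gather values from columns like photo1, photo2, ... in numeric order."""
    pattern = re.compile(rf"^{re.escape(prefix)}(\d+)$")
    numbered = []
    for column, value in row.items():
        match = pattern.match(column)
        if match and value:  # skip unrelated columns and empty cells
            numbered.append((int(match.group(1)), value))
    # Sort on the numeric suffix so photo10 comes after photo2.
    return [value for _, value in sorted(numbered, key=lambda item: item[0])]


legacy_row = {
    "name": "Example Venue",
    "photo2": "http://ir.4sqi.net/img/general/original/_photo_2_.jpg",
    "photo1": "http://ir.4sqi.net/img/general/original/_photo_1_.jpg",
    "photo3": "",
}
print(collect_numbered(legacy_row, "photo"))
```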
taste1, taste2, taste3 ... ⇨ tastes
Legacy Foursquare | New Foursquare Attribute |
---|---|
String | Ordered Array(String) |
clippers | ["clippers","Lakers games","live music","concerts","NBA games"] |
Lakers games | |
live music | |
concerts | |
NBA games | |
tip1, tip2, tip3 ... ⇨ tips
Legacy Foursquare | New Foursquare Attribute |
---|---|
String | Ordered Array(String) |
[5116c6e6e4b02c4e257efcfd, Watch Grammy Night LIVE on VH1.com starting Sunday 6/5C!] | [["5116c6e6e4b02c4e257efcfd", "Watch Grammy Night LIVE on VH1.com starting Sunday 6/5C!"],["50c11139e4b0fe2a3acf99c0", "This is where we won our first-ever Stanley Cup on June 11, 2012"]] |
[50c11139e4b0fe2a3acf99c0, This is where we won our first-ever Stanley Cup on June 11, 2012] | |
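The new tips value in the example above is an ordered array of [id, text] pairs. A minimal sketch of decoding it into named records, assuming it is delivered as JSON text; the `Tip` type and its field names are hypothetical.

```python
import json
from typing import NamedTuple


class Tip(NamedTuple):
    tip_id: str
    text: str


def parse_tips(raw: str) -> list:
    """Decode the tips attribute into Tip records, preserving delivery order."""
    pairs = json.loads(raw) if raw else []
    return [Tip(tip_id, text) for tip_id, text in pairs]


raw_tips = (
    '[["5116c6e6e4b02c4e257efcfd", "Watch Grammy Night LIVE on VH1.com starting Sunday 6/5C!"],'
    '["50c11139e4b0fe2a3acf99c0", "This is where we won our first-ever Stanley Cup on June 11, 2012"]]'
)
tips = parse_tips(raw_tips)
print(tips[0].tip_id)
```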
Boolean Tags
Legacy Foursquare | New Foursquare Places |
---|---|
String (t/f) | Boolean |
t | TRUE |
All legacy boolean tag attributes that stored true or false as a String with values of 't' or 'f' are being replaced with true Boolean values of TRUE or FALSE.
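For pipelines that must read both formats during the transition, a small converter like the one below may help. It is a sketch: the function name is hypothetical, and treating empty cells as missing values (None) is an assumption.

```python
from typing import Optional

# Legacy values are 't'/'f'; new text deliveries show TRUE/FALSE.
_FLAG_VALUES = {"t": True, "f": False, "TRUE": True, "FALSE": False}


def parse_flag(value: Optional[str]) -> Optional[bool]:
    """Convert a legacy or new boolean tag value to a Python bool."""
    if value is None or value == "":
        return None  # assumption: empty cells mean the tag is unknown
    try:
        return _FLAG_VALUES[value]
    except KeyError:
        raise ValueError(f"unexpected boolean tag value: {value!r}") from None


print(parse_flag("t"), parse_flag("FALSE"))
```

Raising on unexpected values, rather than defaulting to False, avoids silently misreading a tag if the delivery format differs from what is shown here.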