Data Migration Guide
Foursquare Places now combines the best aspects of both legacy POI data sets, unifying Factual’s best-in-class methods for collecting POI data and core attributes with Foursquare’s strength in collecting fresh first-party and user-generated content, and leads the industry in data accuracy and freshness.
The purpose of this migration guide is to provide additional details to partners and clients who currently use Legacy Factual Places data in their products and services. It is intended for product managers and developers who currently own Factual Places dataset ingestion and usage within their products and code.
Delivery Format and Cadence
Legacy Factual Places datasets were delivered monthly via Amazon S3, FTP, or Dropbox. Data was updated on a country-by-country basis on a rotating schedule. New Foursquare Places deliveries are customized for each client and offered on a client-defined schedule, with the standard options being monthly, weekly, or daily. Monthly deliveries are posted on the 1st day of every month. Weekly deliveries are posted on the same day every week; the default day is Sunday, but clients can choose which day of the week they prefer. Foursquare Places is now available on Amazon S3 or Amazon Data Exchange. Clients are required to supply their own ARN for access to Foursquare S3 buckets. Dropbox and Factual SFTP deliveries are being deprecated.
Quality Filters
We apply a quality filter to all of our client deliverables to ensure that only the highest-quality POIs are delivered. The current quality filtering logic:
- Excludes VRS Low (likely “fake” or not real)
- Excludes Existence High Recall (likely closed)
- Excludes “People” POIs (e.g., doctors, insurance agents)
Some legacy and potentially new customers may prefer to receive any or all of the types of POIs above or, alternatively, to do the filtering on their end. For these use cases, we offer datasets that are not passed through the quality filter. These datasets still have basic filtering logic applied to remove POIs that are less relevant for enterprise customers (for example, POIs that exist for our consumer app check-in use case). The commercial viability filtering logic:
- Excludes Deleted POIs
- Excludes Private POIs
- Excludes Geography Categories (States/Municipalities, City, County)
Attribute Data Type & Format Changes
This section outlines the data type and format changes for specific attribute migrations listed in the Legacy Factual Places Schema Mapping. The majority of new Foursquare Places attributes maintain the same data type and format as the Legacy Factual attributes. A handful of attribute migrations involve simple format and/or data type changes, which are outlined here. One special migration is the replacement of the existence attribute with several new attributes that give end users more insight into the metrics that make up this score.
Special Considerations: For collection data types such as Arrays and Lists, flat file deliveries are generated with JSON formatting of these data types. The purpose of this formatting is to provide a standard way to ingest text-based values across a variety of languages and frameworks. The information below describes a recommended data type after ingestion, and shows an example of the values being delivered in a text-based flat file.
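As an illustration of ingesting a JSON-formatted collection from a flat file, the following minimal Python sketch parses one cell into a list. The helper name `parse_collection` is hypothetical, and treating an empty cell or the literal text `NULL` as an empty list is an assumption, not documented delivery behavior.

```python
import json


def parse_collection(raw):
    """Parse a JSON-formatted array cell from a flat file into a Python list.

    Assumption: empty cells and the literal text "NULL" mean "no values".
    """
    if raw is None or raw.strip() in ("", "NULL"):
        return []
    value = json.loads(raw)
    # Wrap a bare scalar so callers can always iterate over the result.
    return value if isinstance(value, list) else [value]


chains = parse_collection('["Shake Shack"]')
print(chains)
```

Returning a list in every case keeps downstream code free of special handling for missing values.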
Core Attributes
category_ids ⇨ fsq_category_ids
The attribute category_ids is being replaced with fsq_category_ids. Although the data type is not changing, and the general format remains consistent, the Categories and Taxonomy have been reorganized. The range of IDs in the Legacy Attribute was between 2 and 440. The range of IDs for the Foursquare Attribute is between 10000 and 19056. Please refer to the Legacy Factual Category Mapping for more details on the Category and Taxonomy changes.
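Because the two ID ranges do not overlap, a pipeline that is mid-migration can tell which taxonomy an ID belongs to. The sketch below assumes the ranges quoted above are inclusive and current; the function name `category_id_origin` is hypothetical, and the actual ID translation still requires the Legacy Factual Category Mapping.

```python
# Ranges as stated in this guide, assumed inclusive at both ends.
LEGACY_ID_RANGE = range(2, 441)       # 2..440
FSQ_ID_RANGE = range(10000, 19057)    # 10000..19056


def category_id_origin(category_id):
    """Classify a category ID as 'legacy', 'foursquare', or 'unknown'."""
    if category_id in LEGACY_ID_RANGE:
        return "legacy"
    if category_id in FSQ_ID_RANGE:
        return "foursquare"
    return "unknown"


print(category_id_origin(312), category_id_origin(13065))
```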
category_labels ⇨ fsq_category_labels
The attribute category_labels is being replaced with fsq_category_labels. Although the data type is not changing, and the general format remains consistent, the Categories and Taxonomy have been reorganized. Each parent-level category is now one of the following:
- Arts and Entertainment
- Business and Professional Services
- Community and Government
- Dining and Drinking
- Event
- Health and Medicine
- Landmarks and Outdoors
- Retail
- Sports and Recreation
- Travel and Transportation
Please refer to the Legacy Factual Category Mapping for more details on the Category and Taxonomy changes.
chain_name ⇨ fsq_chain_name
Legacy Factual | Foursquare Places |
---|---|
String | Array(String) |
Shake Shack | ["Shake Shack"] |
The attribute chain_name is being replaced by fsq_chain_name. The new attribute is an Array of Strings that allows for multiple chain associations. An example of multiple chain associations would be a car dealership that sells vehicles under multiple brands.
chain_id ⇨ fsq_chain_id
Legacy Factual | Foursquare Places |
---|---|
String | Array(String) |
77b71837-2819-4b77-ab68-432d34f58531 | ["77b71837-2819-4b77-ab68-432d34f58531"] |
The attribute chain_id is being replaced with fsq_chain_id. The new attribute is an Array of strings that can represent multiple chain associations.
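During a transition period, code may need to accept both the legacy single-string form and the new array form of the chain attributes. The following sketch normalizes either form to a list of strings; the helper name `normalize_chain_field` is hypothetical, and detecting the array form by a leading `[` is an assumption about how the values appear in a flat file.

```python
import json


def normalize_chain_field(value):
    """Return chain names or IDs as a list of strings.

    Accepts a legacy plain string, a JSON-formatted array string,
    or an already-parsed list.
    """
    if value is None:
        return []
    if isinstance(value, list):
        return [str(item) for item in value]
    text = value.strip()
    if not text:
        return []
    if text.startswith("["):
        # New format: JSON array delivered as text.
        return [str(item) for item in json.loads(text)]
    # Legacy format: a single chain as a bare string.
    return [text]


print(normalize_chain_field("Shake Shack"))
print(normalize_chain_field('["77b71837-2819-4b77-ab68-432d34f58531"]'))
```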
existence ⇨ multiple fields
Legacy Factual | DataType | Example |
---|---|---|
existence | Decimal | 0.89 |
Foursquare Places | DataType | Example |
---|---|---|
venue_reality_bucket | String | VeryHigh |
date_closed | String | NULL |
closed_bucket | String | NULL |
provenance_rating | String | 1 |
The attribute existence is being replaced by several new attributes. The legacy attribute was a Decimal storing a score between 0.0 and 1.0 (in 0.1 increments) representing confidence that the record is real, open, and not a duplicate.
- The new attribute venue_reality_bucket is a String holding a categorical bucket that represents the probability that the POI is real. The potential values are: Low, Medium, High, Very High.
- The new attribute date_closed is a String with a value representing the date a POI was reported closed in our database.
- The new attribute closed_bucket is a String holding a categorical bucket that represents the probability that the venue is open or closed and that the value in date_closed is accurate. The potential values are:
  - VeryLikelyClosed: indicates places with a probability greater than 90% of being closed
  - LikelyClosed: indicates places with a probability of 70–90% of being closed
  - Unsure: indicates places with a probability of less than 70% of being closed or open
  - LikelyOpen: indicates places with a probability of 70–90% of being open
  - VeryLikelyOpen: indicates places with a probability greater than 90% of being open
- The new attribute provenance_rating is a String representing a score that indicates the authoritativeness of sources. Values range from 1, indicating that venue data is acquired directly from the business owner, its official website, or its location data representative, to 4, indicating that venue data is sourced exclusively from open web crawls.
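Code that previously compared existence against a single threshold now has to combine several fields. The sketch below shows one possible replacement check; the function name, the default `min_reality` cutoff, and the choice to treat a missing closed_bucket as "not closed" are all assumptions rather than a documented rule. Whitespace is stripped from bucket values because the table example above shows `VeryHigh` while the value list writes `Very High`.

```python
REALITY_ORDER = ["Low", "Medium", "High", "VeryHigh"]
CLOSED_BUCKETS = {"VeryLikelyClosed", "LikelyClosed"}


def _clean(value):
    """Strip whitespace from a bucket value; return None for empty or NULL."""
    if value is None:
        return None
    text = "".join(str(value).split())
    return None if text in ("", "NULL") else text


def looks_real_and_open(record, min_reality="Medium"):
    """Hypothetical stand-in for a legacy `existence >= threshold` check."""
    reality = _clean(record.get("venue_reality_bucket"))
    closed = _clean(record.get("closed_bucket"))
    if reality not in REALITY_ORDER:
        return False  # unknown or missing reality bucket: be conservative
    is_real = REALITY_ORDER.index(reality) >= REALITY_ORDER.index(min_reality)
    # Assumption: a missing closed_bucket means no evidence of closure.
    is_open = closed not in CLOSED_BUCKETS
    return is_real and is_open


poi = {"venue_reality_bucket": "VeryHigh", "closed_bucket": "NULL"}
print(looks_real_and_open(poi))
```

Raising `min_reality` to "High" gives a stricter filter, which may be closer to what a high legacy existence threshold used to achieve.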
Restaurants
price ⇨ price
Legacy Factual | Foursquare Places |
---|---|
Integer | String |
5 | Expensive |
The attribute price keeps the same name, but its data type and format change. The Legacy Attribute was an Integer on a scale of 1 to 5, representing the average price per person. The new Foursquare Attribute is a String whose value is one of four categories: Cheap, Moderate, Expensive, or Very Expensive.
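Code that sorted or compared the legacy integer price needs an explicit ordering for the new categories. The sketch below assigns ranks in the order the categories are listed above; the name `price_rank` is hypothetical, and the resulting 1 to 4 rank is not equivalent to the legacy 1 to 5 scale.

```python
# Categories in ascending order, as listed in this guide.
PRICE_LEVELS = ["Cheap", "Moderate", "Expensive", "Very Expensive"]


def price_rank(price):
    """Return 1..4 for a known price category, or None if unrecognized."""
    try:
        return PRICE_LEVELS.index(price.strip()) + 1
    except (AttributeError, ValueError):
        # AttributeError: value was not a string; ValueError: unknown label.
        return None


venues = [("A", "Very Expensive"), ("B", "Cheap"), ("C", "Expensive")]
print(sorted(venues, key=lambda venue: price_rank(venue[1])))
```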
rating ⇨ rating
Legacy Factual | Foursquare Places |
---|---|
Decimal | Float |
3.5 | 7.02 |
The attribute rating keeps the same name, but its data type and format change. The Legacy Attribute was a Decimal on a scale of 1 to 5, rounded to the nearest half, with 5 being the highest rating possible. It was an aggregation of ratings across multiple sources. The new Foursquare Attribute is a Float, represented as a String, on a scale of 1 to 10. Its calculation incorporates tips, likes, dislikes, and visit traffic.
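Since the new rating arrives as a String, it needs to be parsed before numeric use. The sketch below also shows a linear rescale onto the old 1 to 5 half-step scale for display continuity. That rescale is purely an assumption: the two ratings are calculated from different inputs, so the result is not a true legacy rating. Both function names are hypothetical.

```python
def parse_rating(raw):
    """Convert the String-encoded rating to a float, or None when missing."""
    if raw is None:
        return None
    text = str(raw).strip()
    if text in ("", "NULL"):  # assumption: NULL marks a missing value
        return None
    return float(text)


def to_legacy_like_scale(rating_10):
    """Linearly map 1..10 onto 1..5 and round to the nearest half.

    Illustrative only; this is not an official conversion.
    """
    scaled = 1 + (rating_10 - 1) * 4 / 9
    return round(scaled * 2) / 2


rating = parse_rating("7.02")
print(rating, to_legacy_like_scale(rating))
```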
options_healthy ⇨ healthydiet
Legacy Factual | Foursquare Places |
---|---|
Boolean | String |
TRUE | Average |
The attribute options_healthy is being replaced by healthydiet. The Legacy Attribute is a Boolean. The new attribute is a String with one of three values: Poor, Average, or Great.
options_vegan ⇨ vegandiet
Legacy Factual | Foursquare Places |
---|---|
Boolean | String |
TRUE | Poor |
The attribute options_vegan is being replaced by vegandiet. The Legacy Attribute is a Boolean. The new attribute is a String with one of three values: Poor, Average, or Great.
options_vegetarian ⇨ vegetariandiet
Legacy Factual | Foursquare Places |
---|---|
Boolean | String |
TRUE | Great |
The attribute options_vegetarian is being replaced by vegetariandiet. The Legacy Attribute is a Boolean. The new attribute is a String with one of three values: Poor, Average, or Great.
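Applications that still expect a Boolean for the healthy, vegan, and vegetarian options can derive one from the new three-level values. The sketch below is one way to do so; the cutoff (Average or better counts as true) is an assumption that each consumer should choose for themselves, and `diet_flag` is a hypothetical helper name.

```python
# Levels in ascending order, as listed in this guide.
DIET_LEVELS = ("Poor", "Average", "Great")


def diet_flag(value, minimum="Average"):
    """Collapse a diet level to True/False, or None when the value is unknown.

    Assumption: levels at or above `minimum` count as offering the option.
    """
    if value not in DIET_LEVELS:
        return None
    return DIET_LEVELS.index(value) >= DIET_LEVELS.index(minimum)


row = {"healthydiet": "Average", "vegandiet": "Poor", "vegetariandiet": "Great"}
print({name: diet_flag(level) for name, level in row.items()})
```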
payment_cashonly ⇨ takescreditcards
Legacy Factual | Foursquare Places |
---|---|
Boolean | Boolean |
TRUE | FALSE |
The attribute payment_cashonly is being replaced by takescreditcards. Both are of the type Boolean; however, the values are inverses. Where the legacy attribute value was TRUE, the new attribute value will be FALSE, and vice versa.
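The inversion can be expressed as a small helper for code that backfills the new field from legacy values. This is a minimal sketch; the function names are hypothetical, and parsing the text forms `TRUE` and `FALSE` assumes the flat file encodes Booleans the way the table above shows them.

```python
def parse_bool(raw):
    """Parse TRUE/FALSE text (or a real bool) into a bool; None if unknown."""
    if isinstance(raw, bool):
        return raw
    if raw is None:
        return None
    text = str(raw).strip().upper()
    if text == "TRUE":
        return True
    if text == "FALSE":
        return False
    return None


def takes_credit_cards_from_cash_only(payment_cashonly):
    """Invert legacy payment_cashonly to get a takescreditcards value."""
    flag = parse_bool(payment_cashonly)
    return None if flag is None else not flag


print(takes_credit_cards_from_cash_only("TRUE"))
```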
wifi ⇨ wifi
Legacy Factual | Foursquare Places |
---|---|
Boolean | String |
TRUE | p |
The attribute wifi keeps the same name, but its data type and format change. The Legacy Attribute was a Boolean. The new Foursquare Attribute is a String with a coded value domain: 't' (true), 'n' (no), 'f' (free), 'p' (paid), and 'fp' (free and paid).
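The coded wifi values can be decoded with a simple lookup. The sketch below maps each code to its meaning as listed above and derives a legacy-style Boolean; treating every code except 'n' as "has wifi" is an assumption, and both helper names are hypothetical.

```python
WIFI_CODES = {
    "t": "true",
    "n": "no",
    "f": "free",
    "p": "paid",
    "fp": "free and paid",
}


def describe_wifi(code):
    """Return the meaning of a wifi code, or None if it is not recognized."""
    if not isinstance(code, str):
        return None
    return WIFI_CODES.get(code.strip().lower())


def has_wifi(code):
    """Legacy-style Boolean: any recognized code other than 'n' means wifi."""
    label = describe_wifi(code)
    if label is None:
        return None
    return label != "no"


print(describe_wifi("p"), has_wifi("p"))
```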
Hotels
lowest_price ⇨ price
Legacy Factual | Foursquare Places |
---|---|
Integer | String |
150 | Moderate |
The attribute lowest_price is being replaced with price. The Legacy Attribute was an Integer that represented the lower-bound price in dollars. The new Foursquare Attribute is a String describing the price of offerings as one of the following categories: Cheap, Moderate, Expensive, or Very Expensive.
highest_price ⇨ price
Legacy Factual | Foursquare Places |
---|---|
Integer | String |
400 | Very Expensive |
The attribute highest_price is being replaced with price. The Legacy Attribute was an Integer that represented the upper-bound price in dollars. The new Foursquare Attribute is a String describing the price of offerings as one of the following categories: Cheap, Moderate, Expensive, or Very Expensive.
pets ⇨ goodfordogs
Legacy Factual | Foursquare Places |
---|---|
Array(String) | String |
[True, Fee] | Great |
The attribute pets is being replaced with goodfordogs. The Legacy Attribute was an Array that included a true or false value, sometimes followed by a second value denoting a fee. The new Foursquare Attribute is a String with one of three values: Poor, Average, or Great. It specifically represents dog-friendly policies, not policies for all pets.
internet ⇨ wifi
Legacy Factual | Foursquare Places |
---|---|
Array(String) | String |
[True, Paid] | p |
The attribute internet is being replaced with wifi. The Legacy Attribute was an Array that included a true or false value, sometimes followed by a descriptive value denoting free, paid, wired, or wireless. The new Foursquare Attribute is a String with a coded value domain: 't' (true), 'n' (no), 'f' (free), 'p' (paid), and 'fp' (free and paid). The inclusion of 'wired' has been deprecated.
rating ⇨ rating
Legacy Factual | Foursquare Places |
---|---|
Decimal | Float |
3.5 | 7.02 |
The attribute rating keeps the same name, but its data type and format change. This attribute maps to the same Foursquare Attribute as the Restaurant rating attribute. The Legacy Attribute was a Decimal on a scale of 1 to 5, rounded to the nearest half, with 5 being the highest rating possible. It was an aggregation of ratings across multiple sources. The new Foursquare Attribute is a Float, represented as a String, on a scale of 1 to 10. Its calculation incorporates tips, likes, dislikes, and visit traffic.
Placerank
placerank ⇨ popularity
Legacy Factual | Foursquare Places |
---|---|
Integer | String |
88 | 0.88 |
The placerank attribute is being replaced with popularity. The Legacy Attribute was an Integer on a scale of 0 to 100, with 100 being the most significant or most popular. The new Foursquare Attribute is a String with values between 0 and 1, with 1 being the most popular. Scores are based on foot traffic and normalized by country.
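For dashboards or thresholds built around the 0 to 100 placerank scale, the String-encoded popularity value can be parsed and rescaled. The sketch below does only that arithmetic; the function names are hypothetical, and because popularity is based on foot traffic and normalized by country, the rescaled number is not guaranteed to match a venue's old placerank.

```python
def parse_popularity(raw):
    """Convert the String-encoded popularity to a float in [0, 1], or None."""
    if raw is None:
        return None
    text = str(raw).strip()
    if text in ("", "NULL"):  # assumption: NULL marks a missing value
        return None
    value = float(text)
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"popularity out of range: {value}")
    return value


def popularity_as_percent(raw):
    """Rescale popularity to a 0..100 integer for placerank-style displays."""
    value = parse_popularity(raw)
    return None if value is None else round(value * 100)


print(popularity_as_percent("0.88"))
```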
Crosswalk
crosswalk_id_facebook ⇨ facebook_id
Legacy Factual | Foursquare Places |
---|---|
Integer | String |
265393942504 | 265393942504 |
Although Crosswalk IDs are being deprecated, Foursquare Places does offer an identifier for Facebook. The crosswalk_id_facebook attribute is now available in Foursquare Places under the attribute name facebook_id, as a String.