Foursquare Data Normalization
Research Visits Feeds Only
The information provided in the following document pertains to Foursquare's Research Visits Feeds only.
How Normalization Works
Foursquare projects ‘normalized visits’ based on observed visits from the millions of consumers in our always-on foot traffic panel.
We apply a weighting to each observed visit. Because that weighting is based on state, age, and gender, it is referred to as a ‘SAG’ score. By accounting for age, gender, and regional skews within the panel, we are able to estimate real-world trends.
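To make the per-visit weighting concrete, here is a minimal Python sketch. It assumes each observed visit carries the panelist's state, age, and gender, and that ages are bucketed into 5-year bands like the ones used in the worked example in this document. The `Visit` record and the `age_band`, `cohort_key`, and `weight_visits` helpers are hypothetical names for illustration, not part of any Foursquare API or feed schema.

```python
from collections import namedtuple

# Hypothetical record for one observed panel visit (not a Foursquare schema).
Visit = namedtuple("Visit", ["state", "age", "gender"])


def age_band(age: int) -> str:
    """Bucket an age into a 5-year band such as '20-24' (band width is an assumption)."""
    low = (age // 5) * 5
    return f"{low}-{low + 4}"


def cohort_key(visit: Visit) -> tuple:
    """State, age band, and gender together identify a SAG cohort."""
    return (visit.state, age_band(visit.age), visit.gender)


def weight_visits(visits, sag_scores):
    """Return the SAG weight of each observed visit, looked up by its cohort."""
    return [sag_scores[cohort_key(v)] for v in visits]


# SAG scores per cohort; values taken from the worked example in this document.
scores = {
    ("NY", "20-24", "F"): 100,
    ("NY", "35-39", "M"): 500,
}
visits = [Visit("NY", 22, "F"), Visit("NY", 37, "M")]
print(weight_visits(visits, scores))  # [100, 500]
```

Keying the lookup on a tuple keeps every visit from the same cohort on the same weight, which is what lets the weighting correct for cohorts that are over- or under-represented in the panel.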
We recommend using normalized visits rather than ‘raw’ observed visits. Weightings are adjusted to account for fluctuations in the size of our panel, as well as for other technical factors that occur periodically (such as changes to how our SDK calculates a visit following an OS update). Normalization therefore de-biases our data and controls for fluctuations in panel size, whereas the raw panel visitation in our feeds shows demographic skew and is susceptible to step changes in the scale of our panel.
Why It's Important
Normalized visits should always be used when compiling visit trend analyses, to mitigate the effects of changes (both positive and negative) to our panel user base. Raw visit volume can change significantly over time due to a variety of factors, including:
- Changes to our check-in methodology and the attributes used in our Pilgrim SDK
- Monthly active user (MAU) changes in our partners’ apps due to new version releases, new features, user acquisition efforts such as press and marketing, etc.
- OS updates such as iOS 13
- Changes in our panel (e.g. new apps contributing to our panel)
- Changes in consumer behaviors related to world events like COVID-19
For example, suppose that on a given day we see three panel visits to a Starbucks in New York from users who are female, age 20-24, and living in New York State (Cohort A), and four visits from users who are male, age 35-39, and also living in New York State (Cohort B).
We have 500 active panelists in Cohort A, and the United States Census tells us there are 50,000 people in that demographic, so every panelist in Cohort A carries a SAG score of 100 (50,000 / 500). Cohort B has 200 active panelists, and the Census counts 100,000 people in that demographic, so each panelist in Cohort B carries a SAG score of 500 (100,000 / 200).
Each of Cohort A’s visits will constitute 100 population visits, and each of Cohort B’s will constitute 500.
So, the total normalized visit volume to that Starbucks in New York on that day would be:
(100 × 3) + (500 × 4) = 300 + 2,000 = 2,300
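The arithmetic above can be reproduced in a short Python sketch. The cohort figures are the ones from the example; `sag_score` is a hypothetical helper that encodes the ratio described there (census population divided by active panelists in the cohort) and is not Foursquare's production code.

```python
def sag_score(census_population: int, active_panelists: int) -> float:
    """Population represented by each active panelist in a cohort (hypothetical helper)."""
    return census_population / active_panelists


# Figures from the example: (census population, active panelists, raw visits).
cohorts = {
    "A": (50_000, 500, 3),   # female, 20-24, New York State
    "B": (100_000, 200, 4),  # male, 35-39, New York State
}

normalized_total = 0.0
for name, (census, panelists, raw_visits) in cohorts.items():
    score = sag_score(census, panelists)
    normalized_total += score * raw_visits  # each raw visit counts `score` times
    print(f"Cohort {name}: SAG score {score:g}, {raw_visits} raw visits")

print(f"Normalized visits: {normalized_total:,.0f}")  # Normalized visits: 2,300
```

Note that Cohort A supplies 3 of the 7 raw visits but only 300 of the 2,300 normalized visits, because it is over-represented in the panel relative to the Census.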
Because of how SAG scores are derived, raw visits and normalized visits will not necessarily follow the same trend at a given point in time; for instance, normalized visits can increase or remain stable while raw visits decrease. SAG scores not only de-bias our data but also scale and stabilize total normalized visit volume. When backend panel changes occur, we may see an increase or decrease in raw active user volume, which is usually accompanied by a corresponding increase or decrease in raw visit volume. The remaining users’ SAG scores then move in the opposite direction (rising when active users decrease, falling when they increase), which stabilizes projected normalized visit volume and counteracts the panel-specific change in visit volume.
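The stabilizing effect can be illustrated with a hypothetical scenario: Cohort A's census population stays at 50,000 while its active panelists halve from 500 to 250, and its raw visits fall proportionally from 4 to 2. These figures are invented for illustration, and the sketch assumes visits fall exactly in proportion to the panel, which real panel changes will only approximate. `sag_score` is a hypothetical helper, not Foursquare code.

```python
def sag_score(census_population: int, active_panelists: int) -> float:
    """Population represented by each active panelist (hypothetical helper)."""
    return census_population / active_panelists


CENSUS = 50_000  # Cohort A's census population, from the example above

# Hypothetical panel change: active panelists halve, and raw visits halve with them.
scenarios = {"before": (500, 4), "after": (250, 2)}

results = {}
for label, (panelists, raw_visits) in scenarios.items():
    score = sag_score(CENSUS, panelists)
    results[label] = score * raw_visits
    print(f"{label}: {panelists} panelists, {raw_visits} raw visits, "
          f"SAG score {score:g}, {results[label]:g} normalized visits")
```

Raw visits drop by half, but the doubled SAG score offsets the drop, so normalized visits stay at 400 in both scenarios.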