The Cluster and Outlier Analysis module lets you calculate statistically significant hot spots, cold spots, and spatial outliers, then quickly visualize the results.
Use Cluster and Outlier Analysis to answer questions such as:
- Which areas of California are disproportionately affected by pollution?
- Are there any significant spatial patterns in the listing price of Berlin's Airbnbs?
- Which restaurants in New York have significantly more/fewer visits than nearby competitors?
To conduct the Cluster and Outlier analysis, Studio applies Anselin's Local Indicators of Spatial Association (LISA), specifically the local Moran statistic, to identify geographical clusters of values or find geographical outliers.
This method has been widely used in spatial applications including environmental and natural resource analysis, real estate analysis, criminology studies, public health research, political geography and demographics studies, and much more.
Find examples and more information in the Cluster and Outlier Analysis use case article.
Follow these steps to perform a cluster and outlier analysis in Studio.
Cluster and Outlier Analysis requires your map to contain a
Navigate to the Analysis tab in Studio, then click Cluster and Outlier Analysis.
The input layer must be a geojson layer with polygon geometries. This is the layer on which Studio will conduct the analysis of local Moran statistics.
Select a field to use as values for the analysis of local Moran statistics. The attribute field must be from a dataset associated with your input layer.
For local Moran statistics, we recommend you select an attribute field containing quantitative variables.
Studio provides two types of spatial weights:
Input the number of nearest neighbors to ensure all spatial objects have the same number of neighbors. Defaults to
Input a distance and select a unit (KM or Miles) to create a distance threshold that determines neighbors.
By default, Studio will suggest a distance that ensures each spatial object has at least one neighbor.
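As a rough illustration of the two spatial weight types, the following Python sketch builds neighbor lists from point coordinates. It is not Studio's implementation: the function names are hypothetical, and it assumes planar Euclidean distances, whereas a tool working in KM or Miles would need geodesic distances. The `suggested_threshold` helper shows one way to pick a distance at which every object has at least one neighbor, namely the largest of the nearest-neighbor distances.

```python
import math


def distance(a, b):
    """Planar Euclidean distance between two (x, y) points (a simplifying assumption)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def knn_neighbors(points, k):
    """Give every object exactly k neighbors: its k closest other objects."""
    neighbors = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: distance(p, points[j]))
        neighbors.append(sorted(others[:k]))
    return neighbors


def distance_band_neighbors(points, threshold):
    """Neighbors are all other objects within `threshold` of each object."""
    return [
        [j for j, q in enumerate(points) if j != i and distance(p, q) <= threshold]
        for i, p in enumerate(points)
    ]


def suggested_threshold(points):
    """Smallest threshold at which every object has at least one neighbor."""
    return max(
        min(distance(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    )


# Hypothetical sample points; the last one is far from the others.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 0.0)]

knn = knn_neighbors(points, k=2)
threshold = suggested_threshold(points)
band = distance_band_neighbors(points, threshold)
```

With nearest neighbors, every object ends up with the same number of neighbors regardless of how far away they are. With a distance threshold, the neighbor count varies: here the remote point gets a single neighbor, while the point closest to it gets three.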
In local Moran statistics, permutation-based inference generates a pseudo p-value used to evaluate the significance of each cluster.
Studio allows you to modify the following local Moran parameters:
Permutations are used to estimate how likely it is that the observed spatial distribution of values arose by chance. This is done by comparing the local Moran's I of your original data against the values computed from many randomly permuted datasets.
Input the number of permutations to compute the pseudo p-value. Defaults to
Input a number serving as a P-value threshold, allowing you to only display significant clusters on the map. Defaults to
Note: In permutation-based inference, the smallest possible pseudo p-value is 1/(permutations + 1). For example, given 999 permutations, the smallest pseudo p-value is 1/(999 + 1) = 0.001.
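The relationship between the number of permutations and the pseudo p-value can be sketched in Python. This is a minimal illustration under stated assumptions, not Studio's code: the function names are hypothetical, the values are assumed to be standardized already, and the tail-folding rule is one common convention for permutation tests. The only detail taken from the text above is the lower bound of 1/(permutations + 1).

```python
import random


def pseudo_p_value(observed, simulated):
    """Pseudo p-value from a reference distribution of simulated statistics.

    Counts simulated values at least as large as the observed one, folds the
    count to the smaller tail, and adds 1 to numerator and denominator so the
    observed statistic counts as one of the outcomes.
    """
    larger = sum(1 for s in simulated if s >= observed)
    extreme = min(larger, len(simulated) - larger)
    return (extreme + 1) / (len(simulated) + 1)


def conditional_permutation_p(z, i, neighbor_ids, permutations, seed=0):
    """Pseudo p-value for location i, holding z[i] fixed.

    Assumes `z` holds standardized values and that location i has at least
    one neighbor. Each permutation draws random stand-in neighbors from the
    other locations and recomputes the local statistic.
    """
    rng = random.Random(seed)
    k = len(neighbor_ids)
    others = [v for j, v in enumerate(z) if j != i]
    observed = z[i] * sum(z[j] for j in neighbor_ids) / k
    simulated = [
        z[i] * sum(rng.sample(others, k)) / k for _ in range(permutations)
    ]
    return pseudo_p_value(observed, simulated)


# No simulated value is as extreme as the observed one: the lower bound.
smallest = pseudo_p_value(10.0, [0.0] * 999)

# One of four simulated values is at least as large as the observed one.
example = pseudo_p_value(2.5, [0.1, -0.3, 3.0, 1.2])

# Hypothetical standardized values; location 5 has neighbors 3 and 4.
z = [-1.2, -0.9, -1.1, 0.8, 1.0, 1.4]
p = conditional_permutation_p(z, i=5, neighbor_ids=[3, 4], permutations=999)
```

Because of the added 1 in the numerator, the result can never reach zero, so more permutations are needed to resolve smaller pseudo p-values.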
Click Run to generate the results of your cluster and outlier analysis.
The results of the analysis are shown in a preview table. If you are not satisfied with the results, tweak the parameters and click Rerun.
The results are stored in a data table containing the following columns:
| Column | Description |
|---|---|
| Attribute Field | The value of the selected |
| latitude (optional) | The latitude value, only when |
| longitude (optional) | The longitude value, only when |
| lisa | The local Moran's I value |
| spatial_lag | The average (standardized) value of the neighbors |
| cluster | The type of spatial association - 0 for not significant, 1 for High-High, 2 for Low-Low, 3 for High-Low, 4 for Low-High, 5 for isolated (no neighbors) |
| pvalue | The pseudo p-value is the significance value computed from the random permutations |
| neighbors | The array of row indices of the neighbors |
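To show how the `lisa`, `spatial_lag`, and `cluster` columns relate to one another, here is a minimal Python sketch. It assumes one common formulation of the local Moran statistic (standardized value multiplied by the average standardized value of its neighbors); Studio's exact formula and scaling may differ. The function names and sample data are hypothetical, and the pseudo p-values are supplied by hand rather than computed. Only the cluster codes are taken from the table above.

```python
import math


def local_moran(values, neighbors):
    """Compute standardized values, spatial lags, and local Moran's I.

    Assumes the values are not all identical (non-zero standard deviation).
    Objects without neighbors get None for `spatial_lag` and `lisa`.
    """
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    z = [(v - mean) / std for v in values]

    rows = []
    for i, ids in enumerate(neighbors):
        if ids:
            lag = sum(z[j] for j in ids) / len(ids)  # average neighbor value
            lisa = z[i] * lag
        else:
            lag, lisa = None, None
        rows.append({"z": z[i], "spatial_lag": lag, "lisa": lisa,
                     "neighbors": list(ids)})
    return rows


def cluster_code(z_i, lag, pvalue, threshold):
    """Map one object to the cluster codes used in the results table."""
    if lag is None:
        return 5  # isolated (no neighbors)
    if pvalue > threshold:
        return 0  # not significant
    if z_i > 0:
        return 1 if lag > 0 else 3  # High-High or High-Low
    return 4 if lag > 0 else 2      # Low-High or Low-Low


# Hypothetical data: two low values, two high values, last object isolated.
values = [2, 2, 6, 6]
neighbors = [[1], [2], [3], []]
pvalues = [0.01, 0.01, 0.01, 0.01]  # placeholders, not computed here

rows = local_moran(values, neighbors)
codes = [
    cluster_code(r["z"], r["spatial_lag"], p, threshold=0.05)
    for r, p in zip(rows, pvalues)
]
```

A positive `lisa` means an object resembles its neighbors (High-High or Low-Low), and a negative `lisa` marks a potential outlier (High-Low or Low-High). The pseudo p-value then decides whether the object keeps that label or falls back to not significant.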
When you are satisfied with the results of your analysis, click Confirm to proceed to the visualization.
Upon completing the cluster and outlier analysis, a new layer and dataset are generated.
If the input layer was a point layer, a connectivity graph will appear to visualize the neighboring/connectivity relationship among spatial objects. Mouse over a point to highlight neighboring points (defined by the spatial weights configuration).
Cluster types are visualized by color-coding geometries according to their cluster type. A chart is also generated, serving as a legend for the cluster types.
The local Moran statistic takes the data values and their associated geographical locations as input, then returns statistically significant clusters of four types:
| Cluster type | Description |
|---|---|
| High-High | Hot spot clusters with high values surrounded by other high values. |
| Low-Low | Cold spot clusters with low values surrounded by other low values. |
| High-Low | Spatial outlier with high values surrounded by low values. |
| Low-High | Spatial outlier with low values surrounded by high values. |
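If you work with the result rows outside of Studio, the numeric `cluster` codes can be translated back into these cluster types. The Python sketch below assumes the results are available as a list of dictionaries with a `cluster` key; that structure, the function names, and the sample rows are hypothetical. The code-to-label mapping follows the description of the `cluster` column.

```python
from collections import Counter

# Codes as described for the `cluster` column of the results table.
CLUSTER_LABELS = {
    0: "Not significant",
    1: "High-High",
    2: "Low-Low",
    3: "High-Low",
    4: "Low-High",
    5: "Isolated",
}


def summarize_clusters(rows):
    """Count result rows per cluster label."""
    return Counter(CLUSTER_LABELS[row["cluster"]] for row in rows)


def spatial_outliers(rows):
    """Keep only High-Low and Low-High rows."""
    return [row for row in rows if row["cluster"] in (3, 4)]


# Hypothetical result rows.
rows = [
    {"cluster": 1, "pvalue": 0.002},
    {"cluster": 1, "pvalue": 0.010},
    {"cluster": 2, "pvalue": 0.004},
    {"cluster": 4, "pvalue": 0.030},
    {"cluster": 0, "pvalue": 0.410},
]

summary = summarize_clusters(rows)
outliers = spatial_outliers(rows)
```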
This visualization can be customized via the Layer configuration.
CalEnviroScreen is a screening methodology that can be used to help identify California communities that are disproportionately burdened by multiple sources of pollution. Use the slider to view statewide data on the left, and a cluster/outlier analysis on the right.
Find examples and other information in the Cluster and Outlier Analysis use case article.