Given a set of features (Input Feature Class) and an analysis field (Input Field), the Cluster and Outlier Analysis tool identifies spatial clusters of features with high or low values. The tool also identifies spatial outliers. To do this, the tool calculates a local Moran's I value, a z-score, a pseudo p-value, and a code representing the cluster type for each statistically significant feature. The z-scores and pseudo p-values represent the statistical significance of the computed index values.
Calculations
View additional mathematics for the local Moran's I statistic.
Interpretation
A positive value for I indicates that a feature has neighboring features with similarly high or low attribute values; this feature is part of a cluster. A negative value for I indicates that a feature has neighboring features with dissimilar values; this feature is an outlier. In either instance, the p-value for the feature must be small enough for the cluster or outlier to be considered statistically significant. For more information on determining statistical significance, see What is a z-score? What is a p-value? Note that the local Moran's I index (I) is a relative measure and can only be interpreted within the context of its computed z-score or p-value. The z-scores and p-values reported in the output feature class are uncorrected for multiple testing or spatial dependency.
The cluster/outlier type (COType) field distinguishes between a statistically significant cluster of high values (HH), cluster of low values (LL), outlier in which a high value is surrounded primarily by low values (HL), and outlier in which a low value is surrounded primarily by high values (LH). Statistical significance is set at the 95 percent confidence level. When no FDR correction is applied, features with p-values smaller than 0.05 are considered statistically significant. The FDR correction reduces this p-value threshold from 0.05 to a value that better reflects the 95 percent confidence level given multiple testing.
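The effect of an FDR correction on the p-value threshold can be illustrated with a short sketch. The text does not say which FDR procedure the tool uses, so the example below assumes the Benjamini-Hochberg procedure, a common choice; the function name `bh_threshold` and the p-values are hypothetical.

```python
def bh_threshold(p_values, alpha=0.05):
    """Largest p-value that passes Benjamini-Hochberg at level alpha.

    Returns None when no p-value passes. This is an assumed procedure,
    not necessarily the one the tool applies.
    """
    m = len(p_values)
    threshold = None
    for rank, p in enumerate(sorted(p_values), start=1):
        # The k-th smallest p-value is compared with alpha * k / m.
        if p <= alpha * rank / m:
            threshold = p
    return threshold


# Hypothetical p-values for six features.
p_values = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74]

uncorrected = [p for p in p_values if p < 0.05]
cutoff = bh_threshold(p_values)
corrected = [p for p in p_values if cutoff is not None and p <= cutoff]

print(len(uncorrected), "features significant without correction")
print(len(corrected), "features significant with the assumed FDR correction")
```

In this example four features fall below 0.05 before correction, but only two remain once the threshold is lowered to account for multiple testing.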
Output
This tool creates a new output feature class with the following attributes for each feature in the input feature class: local Moran's I index, z-score, p-value, and COType.
When this tool runs in ArcMap, the output feature class is automatically added to the table of contents (TOC) with default rendering applied to the COType field. The rendering applied is defined by a layer file in <ArcGIS>/ArcToolbox/Templates/Layers. You can reapply the default rendering, if needed, by importing the template layer symbology.
Permutations
Permutations are used to determine how likely it would be to find the actual spatial distribution of the values that you are analyzing by comparing your values to a set of randomly generated values. Even with complete spatial randomness (CSR), some degree of clustering will always be observed simply due to randomness. Permutations will generate many random datasets and compare these values to the Local Moran's I of your original data. To do this, each permutation randomly rearranges the neighborhood values around each feature and calculates the Local Moran's I value of this random data. By looking at the distribution of the Local Moran's I generated from permutations, you can see the range of Local Moran's I values that could reasonably be due to randomness. If there is a statistically significant spatial pattern in your data, you expect the Local Moran's I values generated from permutations to display less clustering than the Local Moran's I value from your original data. A pseudo p-value is then calculated by determining the proportion of Local Moran's I values generated from permutations that display more clustering than your original data. If this proportion (the pseudo p-value) is small (less than 0.05), you can conclude that your data does display statistically significant clustering.
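The permutation logic described above can be sketched in a few lines of plain Python. This is a simplified illustration, not the tool's implementation: it assumes row-standardized weights, standardizes the values with the population standard deviation, and counts only simulated values at least as large as the observed one (a one-sided test for clustering). The function name, the sample values, and the neighbor structure are hypothetical.

```python
import random
import statistics


def local_moran_pseudo_p(values, neighbors, i, permutations=999, seed=0):
    """Local Moran's I for feature i and a one-sided pseudo p-value."""
    rng = random.Random(seed)
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    z = [(v - mean) / sd for v in values]

    k = len(neighbors[i])
    # Observed statistic: own z-score times the mean z-score of the neighbors.
    observed = z[i] * sum(z[j] for j in neighbors[i]) / k

    # Hold feature i fixed and draw its neighborhood from all other features.
    others = [z[j] for j in range(len(z)) if j != i]
    at_least_as_large = 0
    for _ in range(permutations):
        sample = rng.sample(others, k)
        simulated = z[i] * sum(sample) / k
        if simulated >= observed:
            at_least_as_large += 1

    pseudo_p = (at_least_as_large + 1) / (permutations + 1)
    return observed, pseudo_p


# Hypothetical data: 12 features in a row, high values clustered on the left.
values = [9, 10, 11, 10, 2, 1, 2, 3, 1, 2, 3, 2]
n = len(values)
neighbors = {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}

observed, pseudo_p = local_moran_pseudo_p(values, neighbors, i=1)
print(f"I_1 = {observed:.3f}, pseudo p-value = {pseudo_p:.3f}")
```

Adding 1 to both the numerator and the denominator counts the observed arrangement as one of the possible outcomes, so the pseudo p-value can never be exactly zero.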
Choosing the number of permutations is a balance between precision and processing time. Increasing the number of permutations increases precision by increasing the range of possible values for the pseudo p-value. For example, with 99 permutations, the precision of the pseudo p-value is 0.01 (1/(99+1)), and with 999 permutations, the precision is 0.001 (1/(999+1)). A lower number of permutations can be used when first exploring a problem, but it is best practice to increase the permutations to the highest number feasible for final results.
Best practice guidelines
- Results are only reliable if the input feature class contains at least 30 features.
- This tool requires an input field such as a count, rate, or other numeric measurement. If you are analyzing point data, where each point represents a single event or incident, you might not have a specific numeric attribute to evaluate (a severity ranking, count, or other measurement). If you are interested in finding locations with many incidents (hot spots) and/or locations with very few incidents (cold spots), you will need to aggregate your incident data prior to analysis. The Hot Spot Analysis (Getis-Ord Gi*) tool is also effective for finding hot and cold spots. Only the Cluster and Outlier Analysis (Anselin Local Moran's I) tool, however, will identify statistically significant spatial outliers (a high value surrounded by low values or a low value surrounded by high values).
- Select an appropriate conceptualization of spatial relationships.
- When you select the SPACE_TIME_WINDOW conceptualization, you can identify space-time clusters and outliers. See Space-Time Analysis for more information.
- Select an appropriate distance band or threshold distance.
- All features should have at least one neighbor.
- No feature should have all other features as a neighbor.
- Especially if the values for the input field are skewed, each feature should have about eight neighbors.
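The neighbor-count guidelines above can be checked before running the analysis. The following is a minimal sketch, assuming point features and a fixed distance band; the function names and the grid of sample points are hypothetical and are not part of the tool.

```python
import math


def neighbor_counts(points, threshold):
    """Number of other points within the threshold distance of each point."""
    counts = []
    for i, (xi, yi) in enumerate(points):
        count = sum(
            1
            for j, (xj, yj) in enumerate(points)
            if j != i and math.hypot(xi - xj, yi - yj) <= threshold
        )
        counts.append(count)
    return counts


def check_distance_band(points, threshold):
    """Summarize neighbor counts against the best practice guidelines."""
    counts = neighbor_counts(points, threshold)
    return {
        "min_neighbors": min(counts),
        "max_neighbors": max(counts),
        "mean_neighbors": sum(counts) / len(counts),
        # A feature with no neighbors violates the first guideline.
        "has_isolated_feature": min(counts) == 0,
        # A feature neighboring every other feature violates the second.
        "has_fully_connected_feature": max(counts) == len(points) - 1,
    }


# Hypothetical input: 36 points on a regular 6 x 6 unit grid.
points = [(x, y) for x in range(6) for y in range(6)]
report = check_distance_band(points, threshold=1.5)
print(report)
```

If the report shows isolated features, or an average far from eight neighbors when the data is skewed, the threshold distance is a candidate for adjustment.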
Potential applications
The Cluster and Outlier Analysis (Anselin Local Moran's I) tool identifies concentrations of high values, concentrations of low values, and spatial outliers. It can help you answer questions such as these:
- Where are the sharpest boundaries between affluence and poverty in a study area?
- Are there locations in a study area with anomalous spending patterns?
- Where are the unexpectedly high rates of diabetes across the study area?
Applications can be found in many fields including economics, resource management, biogeography, political geography, and demographics.
Additional resources
Anselin, Luc. 'Local Indicators of Spatial Association—LISA,' Geographical Analysis 27(2): 93–115, 1995.
Mitchell, Andy. The ESRI Guide to GIS Analysis, Volume 2. ESRI Press, 2005.
The previous notebook provided several illustrations of the power of visualization in the analysis of spatial data. This power stems from visualization's ability to tap into our human pattern recognition machinery.
In this notebook we introduce methods of exploratory spatial data analysis that are intended to complement geovisualization through formal univariate and multivariate statistical tests for spatial clustering.
Imports
Spatial Autocorrelation
Visual inspection of the map pattern for the prices allows us to search for spatial structure. If the spatial distribution of the prices was random, then we should not see any clustering of similar values on the map. However, our visual system is drawn to the darker clusters in the south west as well as the center, and a concentration of the lighter hues (lower prices) in the north central and south east.
Our brains are very powerful pattern recognition machines. However, sometimes they can be too powerful and lead us to detect false positives, or patterns where there are no statistical patterns. This is a particular concern when dealing with visualization of irregular polygons of differing sizes and shapes.
The concept of spatial autocorrelation relates to the combination of two types of similarity: spatial similarity and attribute similarity. Although there are many different measures of spatial autocorrelation, they all combine these two types of similarity into a summary measure.
Let’s use PySAL to generate these two types of similarity measures.
Spatial Similarity
We have already encountered spatial weights in a previous notebook. In spatial autocorrelation analysis, the spatial weights are used to formalize the notion of spatial similarity. As we have seen, there are many ways to define spatial weights; here we will use queen contiguity:
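To make the idea of queen contiguity concrete, here is a minimal standalone sketch on a regular grid, where two cells are neighbors if they share an edge or a corner. It is an illustration only: the notebook's data are irregular polygons handled by PySAL, and the function name `queen_neighbors` is hypothetical.

```python
def queen_neighbors(nrows, ncols):
    """Queen-contiguity neighbor lists for the cells of a regular grid.

    Cells are numbered row by row, starting at 0.
    """
    neighbors = {}
    for r in range(nrows):
        for c in range(ncols):
            cell = r * ncols + c
            neighbors[cell] = [
                rr * ncols + cc
                for rr in range(r - 1, r + 2)
                for cc in range(c - 1, c + 2)
                if (rr, cc) != (r, c) and 0 <= rr < nrows and 0 <= cc < ncols
            ]
    return neighbors


w = queen_neighbors(3, 3)
print(w[4])  # center cell: all eight surrounding cells
print(w[0])  # corner cell: three neighbors
```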
Attribute Similarity
So the spatial weight between neighborhoods $i$ and $j$ indicates if the two are neighbors (i.e., geographically similar). What we also need is a measure of attribute similarity to pair up with this concept of spatial similarity. The spatial lag is a derived variable that accomplishes this for us. For neighborhood $i$ the spatial lag is defined as:
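The usual definition of the spatial lag is $ylag_i = \sum_j w_{i,j} y_j$, where $w_{i,j}$ is the weight between $i$ and $j$ and $y_j$ is the attribute value at $j$. The sketch below assumes row-standardized weights, so the lag reduces to the average of the neighboring values; the function name, the values, and the neighbor lists are hypothetical.

```python
def spatial_lag(values, neighbors):
    """Average of neighboring values for each observation.

    Assumes row-standardized weights and at least one neighbor per observation.
    """
    return [
        sum(values[j] for j in neighbors[i]) / len(neighbors[i])
        for i in range(len(values))
    ]


# Hypothetical example: four neighborhoods arranged in a row.
values = [10, 20, 30, 40]
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

lag = spatial_lag(values, neighbors)
print(lag)
```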
The quintile map for the spatial lag tends to enhance the impression of value similarity in space. It is, in effect, a local smoother.
However, we still have the challenge of visually associating the value of the prices in a neighborhood with the value of the spatial lag of values for the focal unit. The latter is a weighted average of prices in the focal unit’s neighborhood.
To complement the geovisualization of these associations we can turn to formal statistical measures of spatial autocorrelation.
Global Spatial Autocorrelation
We begin with a simple case where the variable under consideration is binary. This is useful to unpack the logic of spatial autocorrelation tests. So even though our attribute is a continuously valued one, we will convert it to a binary case to illustrate the key concepts:
Binary Case
We have 22 neighborhoods with list prices above the median and the remainder below the median (recall the issue with ties).
The spatial distribution of the binary variable immediately raises questions about the juxtaposition of the “black” and “white” areas.
Join counts
One way to formalize a test for spatial autocorrelation in a binary attribute is to consider the so-called joins. A join exists for each neighbor pair of observations, and the joins are reflected in our binary spatial weights object wq.
Each unit can take on one of two values, “Black” or “White”, and so for a given pair of neighboring locations there are three different types of joins that can arise:
- Black Black (BB)
- White White (WW)
- Black White (or White Black) (BW)
Given that we have 68 Black polygons on our map, what is the number of Black Black (BB) joins we could expect if the process were such that the Black polygons were randomly assigned on the map? This is the logic of join count statistics.
We can use the esda package from PySAL to carry out join count analysis:
The resulting object stores the observed counts for the different types of joins:
Note that the three cases exhaust all possibilities: the BB, WW, and BW counts sum to the unique number of joins in the spatial weights object.
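The bookkeeping behind join counts can be sketched without any library. This is a standalone illustration rather than the esda implementation; the function name `join_counts`, the labels, and the neighbor lists are hypothetical.

```python
def join_counts(labels, neighbors):
    """Count BB, WW, and BW joins, where 1 = Black and 0 = White."""
    bb = ww = bw = 0
    for i, nbrs in neighbors.items():
        for j in nbrs:
            if i < j:  # each join appears twice in the lists; count it once
                if labels[i] == 1 and labels[j] == 1:
                    bb += 1
                elif labels[i] == 0 and labels[j] == 0:
                    ww += 1
                else:
                    bw += 1
    return bb, ww, bw


# Hypothetical example: six units arranged in a row.
labels = [1, 1, 1, 0, 0, 1]
n = len(labels)
neighbors = {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}

bb, ww, bw = join_counts(labels, neighbors)
total_joins = sum(len(nbrs) for nbrs in neighbors.values()) // 2
print(bb, ww, bw, total_joins)
```

The three counts always add up to the number of unique joins, which is half the total length of the neighbor lists because each join is recorded from both sides.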
Our object tells us we have observed 44 BB joins:
The critical question for us is whether this is a departure from what we would expect if the process generating the spatial distribution of the Black polygons were a completely random one. To answer this, PySAL uses random spatial permutations of the observed attribute values to generate a realization under the null of complete spatial randomness (CSR). This is repeated a large number of times (999 by default) to construct a reference distribution to evaluate the statistical significance of our observed counts.
The average number of BB joins from the synthetic realizations is:
which is less than our observed count. The question is whether our observed value is so different from the expectation that we would reject the null of CSR?
The density portrays the distribution of the BB counts, with the black vertical line indicating the mean BB count from the synthetic realizations and the red line the observed BB count for our prices. Clearly our observed value is extremely high. A pseudo p-value summarizes this:
Since this is below conventional significance levels, we would reject the null of complete spatial randomness in favor of spatial autocorrelation in market prices.
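The permutation test for BB joins can be sketched as follows. It is a simplified stand-in for what the library does, assuming a one-sided test that counts simulated BB values at least as large as the observed one; the grid layout and function names are hypothetical.

```python
import random


def bb_joins(labels, joins):
    """Number of joins whose two ends are both Black (label 1)."""
    return sum(1 for i, j in joins if labels[i] == 1 and labels[j] == 1)


def bb_permutation_test(labels, joins, permutations=999, seed=0):
    """Observed BB count, mean simulated BB count, and pseudo p-value."""
    rng = random.Random(seed)
    observed = bb_joins(labels, joins)
    shuffled = list(labels)
    simulated = []
    for _ in range(permutations):
        rng.shuffle(shuffled)  # reassign the same labels to random locations
        simulated.append(bb_joins(shuffled, joins))
    at_least_as_large = sum(1 for s in simulated if s >= observed)
    pseudo_p = (at_least_as_large + 1) / (permutations + 1)
    return observed, sum(simulated) / permutations, pseudo_p


# Hypothetical 4 x 4 grid with edge-sharing joins; the top two rows are Black.
size = 4
joins = []
for r in range(size):
    for c in range(size):
        cell = r * size + c
        if c + 1 < size:
            joins.append((cell, cell + 1))     # join to the right
        if r + 1 < size:
            joins.append((cell, cell + size))  # join below
labels = [1] * 8 + [0] * 8

observed, mean_simulated, pseudo_p = bb_permutation_test(labels, joins)
print(observed, round(mean_simulated, 2), pseudo_p)
```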
Continuous Case
The join count analysis is based on a binary attribute, which can cover many interesting empirical applications where one is interested in presence and absence type phenomena. In our case, we artificially created the binary variable, and in the process we throw away a lot of information in our originally continuous attribute. Turning back to the original variable, we can explore other tests for spatial autocorrelation for the continuous case.
First, we transform our weights to be row-standardized, from the current binary state:
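Row standardization divides each weight by the sum of the weights in its row, so that every observation's weights sum to 1. A minimal sketch, assuming weights stored as nested dictionaries and at least one neighbor per observation (the function name and the sample weights are hypothetical):

```python
def row_standardize(weights):
    """Scale each row of weights so that it sums to 1."""
    standardized = {}
    for i, row in weights.items():
        total = sum(row.values())
        standardized[i] = {j: w / total for j, w in row.items()}
    return standardized


# Hypothetical binary weights for three units in a row.
binary = {0: {1: 1.0}, 1: {0: 1.0, 2: 1.0}, 2: {1: 1.0}}
standardized = row_standardize(binary)
print(standardized)
```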
Moran’s I is a test for global autocorrelation for a continuous attribute:
Again, our value for the statistic needs to be interpreted against a reference distribution under the null of CSR. PySAL uses a similar approach as we saw in the join count analysis: random spatial permutations.
Here our observed value is again in the upper tail, and thus statistically significant:
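For reference, global Moran's I and its permutation-based pseudo p-value can be sketched in plain Python. This is an illustration under stated assumptions (row-standardized weights, a one-sided test in the upper tail), not the library's code; the function names and the sample data are hypothetical.

```python
import random


def morans_i(values, neighbors):
    """Global Moran's I with row-standardized weights."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    cross_product = 0.0
    s0 = 0.0  # sum of all weights
    for i, nbrs in neighbors.items():
        w = 1.0 / len(nbrs)
        for j in nbrs:
            cross_product += w * z[i] * z[j]
            s0 += w
    return (n / s0) * cross_product / sum(d * d for d in z)


def moran_pseudo_p(values, neighbors, permutations=999, seed=0):
    """Observed I and the share of permuted I values at least as large."""
    rng = random.Random(seed)
    observed = morans_i(values, neighbors)
    shuffled = list(values)
    at_least_as_large = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if morans_i(shuffled, neighbors) >= observed:
            at_least_as_large += 1
    return observed, (at_least_as_large + 1) / (permutations + 1)


# Hypothetical example: ten units in a row with steadily increasing values.
values = list(range(1, 11))
n = len(values)
neighbors = {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}

observed, pseudo_p = moran_pseudo_p(values, neighbors)
print(f"I = {observed:.4f}, pseudo p-value = {pseudo_p:.3f}")
```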
Local Autocorrelation: Hot Spots, Cold Spots, and Spatial Outliers
In addition to the global autocorrelation statistics, PySAL has many local autocorrelation statistics. Let’s compute a local Moran statistic for the same data:
Now, instead of a single $I$ statistic, we have an array of local $I_i$ statistics, stored in the .Is attribute, and p-values from the simulation are in p_sim.
We can again test for local clustering using permutations, but here we use conditional random permutations (different distributions for each focal location).
We can distinguish the specific type of local spatial association reflected in the four quadrants of the Moran Scatterplot above:
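The quadrant logic can be written out as a small function. This sketch assumes the value and its spatial lag are both expressed as deviations from the mean, uses the HH, LL, HL, and LH labels defined earlier, and applies a 0.05 cutoff to the pseudo p-value; the function names and the sample inputs are hypothetical.

```python
def moran_quadrant(z_value, z_lag):
    """Quadrant of the Moran scatterplot for one observation."""
    if z_value > 0 and z_lag > 0:
        return "HH"  # high value surrounded by high values
    if z_value < 0 and z_lag < 0:
        return "LL"  # low value surrounded by low values
    if z_value > 0 and z_lag < 0:
        return "HL"  # high value surrounded by low values
    if z_value < 0 and z_lag > 0:
        return "LH"  # low value surrounded by high values
    return "undefined"  # value or lag exactly at the mean


def classify(z_value, z_lag, p_value, alpha=0.05):
    """Quadrant label for significant observations, 'ns' otherwise."""
    if p_value >= alpha:
        return "ns"
    return moran_quadrant(z_value, z_lag)


# Hypothetical (z value, z lag, pseudo p-value) triples.
observations = [(1.8, 1.2, 0.004), (-1.1, -0.9, 0.021), (1.5, -0.7, 0.012), (-0.3, 0.4, 0.310)]
labels = [classify(z, lag, p) for z, lag, p in observations]
print(labels)
```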