I am attempting to explore the scale of spatial autocorrelation in a raster (eventually across a stack of 10, but for now a single layer) and consequently in a potential sample of points across the landscape (i.e., if we wanted to know what sampling design, in terms of distance, would minimize autocorrelation). I’ve spent a couple of days trying to understand the various ways to evaluate spatial autocorrelation for a raster or points dataset but am struggling with a few questions. I hope someone can kindly shed some light on the following (in my example I’m playing with a single WorldClim layer at a resolution of 1 km, cropped to the eastern third of the USA):
1) In spdep, I’ve done the following (with my [potentially erroneous] thinking laid out in the comments and two questions at the end):
### use the raster package to get a regular sample of points across the raster
### because using the full set of cells or their centroids on a large raster seems to crash R downstream
### tidy point dataset
### in particular, missing values (e.g. the ocean) in the raster and thus in the points lead to errors later, so remove these
### make nb object (provides list of nearest neighbours for lower lag class)
### here I’ve chosen k=8 which I’m assuming given the regular sampling of points is almost akin to the “queens” design in the raster-specific cell2nb command (except for cells near the ocean)
### make correlogram
Two questions here:
a) I’ve been able to set the order to 15, but not 20, before empty neighbour sets are found for this particular dataset. Is there a way, other than trial and error, to tell the maximum order possible?
b) After plotting the correlogram, I get Moran’s I as a function of lag order. I see it crosses the 0 line between lags 13 and 14 … is there a way to tell what distance this amounts to in km?
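On question 1b, it may help to see that a lag order on a regular sample corresponds to a band of distances rather than one distance. Below is a minimal base-R sketch, assuming a hypothetical 21 x 21 regular sample with 25 km spacing in projected coordinates, queen-style neighbours, and no gaps. None of these numbers come from the WorldClim layer, and a k=8 nearest-neighbour object near the coast will not behave exactly like this.

```r
# Hypothetical regular sample: points on a 21 x 21 grid, 25 km apart.
# Grid indices are measured from a focal point at the centre (0, 0).
spacing_km <- 25
half <- 10
grid <- expand.grid(col = -half:half, row = -half:half)

# With queen-style contiguity and no gaps, the neighbour order between the
# focal point and any other point is the Chebyshev distance in grid steps.
lag_order <- pmax(abs(grid$col), abs(grid$row))

# Straight-line distance from the focal point, in km.
dist_km <- spacing_km * sqrt(grid$col^2 + grid$row^2)

# Summarise the distances covered by each lag order (dropping the focal point).
keep <- lag_order > 0
lag_summary <- data.frame(
  order   = sort(unique(lag_order[keep])),
  min_km  = as.numeric(tapply(dist_km[keep], lag_order[keep], min)),
  mean_km = as.numeric(tapply(dist_km[keep], lag_order[keep], mean)),
  max_km  = as.numeric(tapply(dist_km[keep], lag_order[keep], max))
)
print(lag_summary)
```

Under these assumptions, order n spans from n to about 1.41 n sample spacings, so a zero crossing between lags 13 and 14 would translate into a fairly wide distance band. On the real nb object, spdep's nblag() followed by nbdists() should give the corresponding distances directly, including the effect of gaps near the ocean.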
2) Using the pgirmess package (which I understand to be calculating the lags in a fundamentally different way) I can get a correlogram with distances…
### so now I reproject the raster to albers equal area in order to have the units on the x axis be metres (and actually the projection I want to use in the end anyways)
### the rest of the steps to create dd2 are the same as above
### use the correlog function to create correlogram
pgi.cor <- correlog(coords=dd2[,1:2], z=dd2$V3, method="Moran", nbclass=20)
plot(pgi.cor)
a) In my new plot, the distance class at which Moran’s I is no longer significantly different from zero is around 600 km. That seems really far to me … am I wrong in my interpretation that this is the distance beyond which sample sites would be relatively free from autocorrelation? Or is this truly representative of the scale of autocorrelation that I can expect in climate data over the relatively modest topographic complexity of the eastern USA?
b) In general, when, or for what types of questions or datasets, is the approach used by spdep to generate the lag steps more appropriate than the (fixed bins?) method of pgirmess?
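To make the "fixed bins" idea concrete, here is a minimal base-R sketch of a distance-class correlogram: Moran's I computed separately for pairs of points falling in each distance bin. It is only an illustration of the general technique, not pgirmess's actual code, and the coordinates, the east-west trend, and the 10 bins are all made-up assumptions.

```r
set.seed(42)

# Hypothetical sample: 200 points in a 1000 x 1000 km window (projected, km),
# with a smooth east-west trend plus noise standing in for a climate variable.
n  <- 200
xy <- cbind(x = runif(n, 0, 1000), y = runif(n, 0, 1000))
z  <- xy[, "x"] / 100 + rnorm(n)

# Pairwise distances and 10 equal-width distance classes.
d      <- as.matrix(dist(xy))
breaks <- seq(0, max(d) + 1e-9, length.out = 11)
zc     <- z - mean(z)

# Moran's I per class: binary weights are 1 for pairs whose distance falls
# in the class and 0 otherwise (self-pairs have distance 0, so are excluded).
moran_by_class <- sapply(seq_len(length(breaks) - 1), function(k) {
  w <- (d > breaks[k] & d <= breaks[k + 1]) * 1
  (n / sum(w)) * sum(w * outer(zc, zc)) / sum(zc^2)
})

correlogram <- data.frame(
  dist_mid_km = (head(breaks, -1) + tail(breaks, -1)) / 2,
  moran_I     = moran_by_class
)
print(correlogram)
```

Because the simulated variable is dominated by a broad trend, Moran's I stays positive over long distances here. A smooth, large-scale gradient in a real climate layer could produce a similarly long apparent range, which may be part of why 600 km shows up in the plot above.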
Please forgive me if I’m approaching this problem incorrectly altogether! I’m eventually hoping to say something along the lines of “if we take sites x distance apart, we can be fairly sure that the amount of spatial autocorrelation in our climate data will be minimal”. But maybe this is completely ridiculous? I’d be really happy to have some suggestions. (and on a side note, I’m currently looking for a good introduction to spatial statistics course or textbook…something for the truly uninitiated. Any recommendations?)
PS. Online sources for some of the code above:
Re: evaluating spatial autocorrelation in a raster
Julie Lee-Yaw <julleeyaw <at> yahoo.ca> writes:
> I am attempting to explore the scale of spatial autocorrelation in a raster (eventually across a stack of 10
> but for now a single layer) and consequently in a potential sample of points across the landscape (ie. if we
> wanted to know what sampling design in terms of distance would minimize
This could with advantage have been posted on R-sig-geo rather than on R-help.
The main problem will be in the raster resolution, as this will most likely
not match the "natural" resolution of the phenomena of interest, thus
spuriously generating apparent spatial autocorrelation. It may be hard to
choose an appropriate sampling design.
> Please forgive me if I’m approaching this problem incorrectly altogether! I’m eventually hoping to
> say something along the lines of “if we take sites x distance apart, we can be fairly sure that the amount
> of spatial autocorrelation in our climate data will be minimal”. But maybe this is completely
> ridiculous? I’d be really happy to have some suggestions. (and on a side note, I’m currently
> looking for a good introduction to spatial statistics course or textbook…something for the truly
> uninitiated. Any recommendations?)
Look for work by Werner Mueller on sampling design, and work citing his work.