

Hi R users,
I have a question about data processing. I have a dataset in which each
black grid cell has a few attributes and the corresponding attribute
values. The latitude and longitude of the center of each grid cell are
also given.
I want to average the attribute values from four adjacent black grid cells
to get the value at the center of each red grid cell. The red cells thus
have the same attributes as the black cells, but different values. The red
grid cells do not overlap. I was thinking of writing a script that assigns
an ID to each black grid cell (1, 2, 3, 4, ...) and then averages the
corresponding four black cells for each red grid cell. But I only have
the latitude, longitude, and attribute values for the black cells, and the
latitude and longitude for the red cells. How can I write such a script in R?
Could anyone give me suggestions about the workflow? Thanks very much.
I have attached a picture of the grid cells.
______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Hi lily,
There are one or two assumptions to be made here. First is that the
latitude and longitude values of the "black" cells are equally spaced
as in your illustration. Second, that all latitude and longitude
values for the "red" cells fall at the corners of four "black" cells.
You can get the four "black" cells by finding the lat/lon values that
are closest to the "red" lat/lon values. Here's a basic example:
# centers of the "black" cells on a 1-degree grid
lat<-rep(28:38,11)
lon<-rep(98:108,each=11)
pop<-sample(80:200,121)
# one list element per "black" cell
blackcells<-list()
for(i in 1:121) blackcells[[i]]<-list(lat=lat[i],lon=lon[i],pop=pop[i])
redcell<-list(lat=33.5,lon=100.5,pop=NA)
# indices of the four "black" cells surrounding the "red" cell
close4<-rep(NA,4)
closen<-1
for(i in 1:121) {
 if(abs(blackcells[[i]]$lat-redcell$lat) < 1 &&
  abs(blackcells[[i]]$lon-redcell$lon) < 1) {
  close4[closen]<-i
  closen<-closen+1
 }
}
cat(close4,"\n")
# mean of the four neighbouring values
redcell$pop<-(blackcells[[close4[1]]]$pop +
 blackcells[[close4[2]]]$pop + blackcells[[close4[3]]]$pop +
 blackcells[[close4[4]]]$pop)/4
print(blackcells[[close4[1]]])
print(blackcells[[close4[2]]])
print(blackcells[[close4[3]]])
print(blackcells[[close4[4]]])
print(redcell)
As you can see, this has picked out the four "black" cells closest to
the "red" cell's coordinates and calculated the mean.
Jim
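
The same selection can also be written without the explicit loop. What
follows is only a minimal sketch, assuming the same 1-degree grid and the
same red-cell center as in the example above; the fixed seed and the
stopifnot() guard are additions for illustration and are not part of the
original code.

```r
# Black-cell centers on a 1-degree grid, as in the example above
set.seed(1)  # assumption: fixed seed so that the run is repeatable
lat <- rep(28:38, 11)
lon <- rep(98:108, each = 11)
pop <- sample(80:200, 121)
redcell <- list(lat = 33.5, lon = 100.5, pop = NA)

# TRUE for black cells less than one grid step away in both directions
near <- abs(lat - redcell$lat) < 1 & abs(lon - redcell$lon) < 1
close4 <- which(near)

# Guard: a red cell that does not sit on a shared corner would not
# match exactly four black cells
stopifnot(length(close4) == 4)

redcell$pop <- mean(pop[close4])
print(data.frame(lat = lat[close4], lon = lon[close4], pop = pop[close4]))
print(redcell)
```

Because the comparison is applied to the whole lat and lon vectors at once,
which() returns the indices of all matching black cells in one step.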
On Wed, May 16, 2018 at 2:23 PM, lily li <[hidden email]> wrote:
> [...]


Hi Jim,
Thanks. Yes, the two assumptions are correct, and they reflect the
datasets. I have two questions about the code below. First, why do you use
abs(blackcells[[i]]$lat - redcell$lat) < 1 rather than some threshold
other than 1? Second, why construct blackcells as a list rather than a
data frame? In a data frame, each row can represent one grid cell,
while the three columns can represent lat, lon, and pop. Thanks again
for your help.
for(i in 1:121) {
 if(abs(blackcells[[i]]$lat-redcell$lat) < 1 &&
  abs(blackcells[[i]]$lon-redcell$lon) < 1) {
  close4[closen]<-i
  closen<-closen+1
 }
}
On Wed, May 16, 2018 at 2:45 AM, Jim Lemon <[hidden email]> wrote:
> [...]


Hi lily,
You could also create "blackcells" as a dataframe (which is itself a
type of list). I used a list as I thought it would be a more general
solution if there were different numbers of values for different grid
cells. The use of 1 for the comparison was due to the grid increments
being 1. If you had larger or smaller grid increments, you would use
the grid increment size for the comparison. That guarantees that only
the four nearest "black" cells will be identified as within the "red"
cell.
Jim
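
To illustrate the point about the grid increment, here is a small sketch
for a grid that is not spaced at 1 degree. The helper nearest_black(), the
0.25-degree spacing, and the red-cell coordinates are all hypothetical and
chosen only for illustration.

```r
# Hypothetical helper: indices of the black cells that lie less than one
# grid increment from a red-cell center in both latitude and longitude
nearest_black <- function(black_lat, black_lon, red_lat, red_lon, step) {
  which(abs(black_lat - red_lat) < step &
        abs(black_lon - red_lon) < step)
}

# Black cells on a 0.25-degree grid (assumed spacing)
step <- 0.25
grid <- expand.grid(lat = seq(28, 30, by = step),
                    lon = seq(98, 100, by = step))

# A red cell centered on the corner shared by four black cells
idx <- nearest_black(grid$lat, grid$lon,
                     red_lat = 28.125, red_lon = 98.125, step = step)
print(grid[idx, ])
```

A red-cell center on a shared corner is half an increment away from its
four neighbours and one and a half increments away from the next ring of
cells, so comparing against the increment itself separates the two groups.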
On Sat, May 19, 2018 at 4:07 AM, lily li <[hidden email]> wrote:
> [...]


Hi lily,
It's not too hard to do it using data frames. Getting the indexing
right is usually the hardest part:
# these values are the centers of the black cells
lat<-rep(28:38,11)
lon<-rep(98:108,each=11)
pop<-sample(80:200,121)
# just use the data.frame function
blackcells<-data.frame(lat=lat,lon=lon,pop=pop)
plot(0,type="n",xlim=c(97.5,108.5),ylim=c(27.5,38.5),
 xlab="Longitude",ylab="Latitude")
abline(h=27.5)
abline(h=lat+0.5)
abline(v=97.5)
abline(v=lon+0.5)
text(blackcells$lon,blackcells$lat,pop)
# the red cells will be centered on the corners of 4 black cells
lat2<-rep(seq(28.5,34.5,by=2),4)
lon2<-rep(seq(99.5,105.5,by=2),each=4)
redcells<-data.frame(lat=lat2,lon=lon2,value=NA)
# display the red cells
rect(lon2-1,lat2-1,lon2+1,lat2+1,border="red",lwd=2)
nblackcells<-dim(blackcells)[1]
nredcells<-dim(redcells)[1]
for(redcell in 1:nredcells) {
 close4<-rep(NA,4)
 closen<-1
 for(blackcell in 1:nblackcells) {
  if(abs(blackcells[blackcell,"lat"]-redcells[redcell,"lat"]) < 1 &&
   abs(blackcells[blackcell,"lon"]-redcells[redcell,"lon"]) < 1) {
   close4[closen]<-blackcells[blackcell,"pop"]
   closen<-closen + 1
  }
 }
 cat(close4,"\n")
 redcells[redcell,"value"]<-sum(close4)/4
}
library(plotrix)
boxed.labels(redcells$lon,redcells$lat,redcells$value,col="red")
Jim


Hi Jim,
Thanks. It works. I now have a more complex problem. Suppose that at each
black cell there are two variables, such as pop and mood. For each variable
there are daily records for one year, so 365 records for pop and 365 records
for mood. The averaged values for the red cells should be daily records too.
What kind of format do you recommend for this problem? Right now, I only
have the latitudes and longitudes in a data frame. Thanks.
On Sun, May 20, 2018 at 3:52 AM, Jim Lemon <[hidden email]> wrote:
> [...]


Hi lily,
You can extend it like this. I checked this with two values each for
pop and mood, and it looked okay. Obviously I didn't check the result
with 365 values for each, but it ran okay.
# these values are the centers of the black cells
lat<-rep(28:38,11)
lon<-rep(98:108,each=11)
# 121 cells by 365 days = 44165 values per variable
pop<-matrix(sample(80:200,44165,replace=TRUE),ncol=365)
colnames(pop)<-paste0("pop",1:365)
mood<-matrix(sample(1:10,44165,replace=TRUE),ncol=365)
colnames(mood)<-paste0("mood",1:365)
# create the data frame for the black cells
blackcells<-cbind(data.frame(lat=lat,lon=lon),pop,mood)
plot(0,type="n",xlim=c(97.5,108.5),ylim=c(27.5,38.5),
 xlab="Longitude",ylab="Latitude")
abline(h=27.5)
abline(h=lat+0.5)
abline(v=97.5)
abline(v=lon+0.5)
# the red cells will be centered on the corners of 4 black cells
lat2<-rep(seq(28.5,34.5,by=2),4)
lon2<-rep(seq(99.5,105.5,by=2),each=4)
popmat<-matrix(NA,nrow=16,ncol=365)
colnames(popmat)<-paste0("pop",1:365)
moodmat<-matrix(NA,nrow=16,ncol=365)
colnames(moodmat)<-paste0("mood",1:365)
redcells<-cbind(data.frame(lat=lat2,lon=lon2),popmat,moodmat)
# display the red cells
rect(lon2-1,lat2-1,lon2+1,lat2+1,border="red",lwd=2)
nblackcells<-dim(blackcells)[1]
nredcells<-dim(redcells)[1]
for(redcell in 1:nredcells) {
 close4<-rep(NA,4)
 closen<-1
 for(blackcell in 1:nblackcells) {
  if(abs(blackcells[blackcell,"lat"]-redcells[redcell,"lat"]) < 1 &&
   abs(blackcells[blackcell,"lon"]-redcells[redcell,"lon"]) < 1) {
   close4[closen]<-blackcell
   closen<-closen + 1
  }
 }
 # columns 3 to 732 hold the 365 pop and 365 mood values
 redcells[redcell,3:732]<-colMeans(blackcells[close4,3:732])
}
Jim
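
One way to gain confidence in the wide layout is to run it on a few days
and compare one value against a mean computed by hand. The sketch below
assumes 3 days instead of 365, a fixed seed, and a single red cell at
(28.5, 99.5); these choices are for illustration only.

```r
set.seed(42)  # assumption: fixed seed, only for repeatability
ndays <- 3    # assumption: 3 days instead of 365 to keep the check small
lat <- rep(28:38, 11)
lon <- rep(98:108, each = 11)
pop <- matrix(sample(80:200, 121 * ndays, replace = TRUE), ncol = ndays)
colnames(pop) <- paste0("pop", 1:ndays)
mood <- matrix(sample(1:10, 121 * ndays, replace = TRUE), ncol = ndays)
colnames(mood) <- paste0("mood", 1:ndays)
blackcells <- cbind(data.frame(lat = lat, lon = lon), pop, mood)

# One red cell centered on the corner shared by four black cells
red_lat <- 28.5
red_lon <- 99.5
close4 <- which(abs(blackcells$lat - red_lat) < 1 &
                abs(blackcells$lon - red_lon) < 1)

# Every column after lat and lon holds a daily value
valcols <- 3:ncol(blackcells)
red_values <- colMeans(blackcells[close4, valcols])
print(red_values)

# Cross-check day 1 of pop against a mean computed directly
manual <- mean(blackcells$pop1[close4])
print(manual)
```

Writing the value columns as 3:ncol(blackcells) avoids hard-coding the
column count, which changes with the number of days and variables.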
On Tue, May 22, 2018 at 1:37 PM, lily li <[hidden email]> wrote:
> [...]

