Dear all,
I have weekly data by city (variable citycode). I would like to take the average of the previous two, three, four weeks (without the current week) of the variable called value. This is what I have tried to compute the average of the two previous weeks; df = df %>% mutate(value.lag1 = lag(value, n = 1)) %>% mutate(value .2.previous = rollapply(data = value.lag1, width = 2, FUN = mean, align = "right", fill = NA, na.rm = T)) I crated the lag of the variable first and then attempted to compute the average but this does not seem to to what I want. What I am doing wrong? Any help will be appreciated. The data is below. Thank you. Sincerely, Milu dput(droplevels(head(df, 10))) structure(list(year = c(1970L, 1970L, 1970L, 1970L, 1970L, 1970L, 1970L, 1970L, 1970L, 1970L), citycode = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), month = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L), week = c(1L, 2L, 3L, 4L, 5L, 5L, 6L, 7L, 8L, 9L), date = structure(c(1L, 2L, 3L, 4L, 5L, 5L, 6L, 7L, 8L, 9L), .Label = c("1970-01-10", "1970-01-17", "1970-01-24", "1970-01-31", "1970-02-07", "1970-02-14", "1970-02-21", "1970-02-28", "1970-03-07"), class = "factor"), value = c(-15.035, -20.478, -22.245, -23.576, -8.84099999999995, -18.497, -13.892, -18.974, -15.919, -13.576)), .Names = c("year", "citycode", "month", "week", "date", "tmin"), row.names = c(NA, 10L), class = "data.frame") [[alternative HTML version deleted]] ______________________________________________ [hidden email] mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
There is no `value` column in the `dput` output shown in the
question so using `tmin` instead note that the `width=` argument of `rollapply` can be a list containing a vector of offsets (-1 is prior value, -2 is value before that, etc.) and that we can use `rollapplyr` with an `r` on the end to get right alignment. See `?rollapply` library(dplyr) library(zoo) roll <- function(x, k) rollapplyr(x, list(-seq(1:k)), mean, fill = NA) df %>% group_by(citycode) %>% mutate(mean2 = roll(tmin, 2), mean3 = roll(tmin, 3), mean4 = roll(tmin, 4)) %>% ungroup (The code above has been indented 2 spaces so you can identify inadvertent line wrapping by the email system.) On Sun, Mar 25, 2018 at 10:48 AM, Miluji Sb <[hidden email]> wrote: > Dear all, > > I have weekly data by city (variable citycode). I would like to take the > average of the previous two, three, four weeks (without the current week) > of the variable called value. > > This is what I have tried to compute the average of the two previous weeks; > > df = df %>% > mutate(value.lag1 = lag(value, n = 1)) %>% > mutate(value .2.previous = rollapply(data = value.lag1, > width = 2, > FUN = mean, > align = "right", > fill = NA, > na.rm = T)) > > I crated the lag of the variable first and then attempted to compute the > average but this does not seem to to what I want. What I am doing wrong? > Any help will be appreciated. The data is below. Thank you. > > Sincerely, > > Milu > > dput(droplevels(head(df, 10))) > structure(list(year = c(1970L, 1970L, 1970L, 1970L, 1970L, 1970L, > 1970L, 1970L, 1970L, 1970L), citycode = c(1L, 1L, 1L, 1L, 1L, > 1L, 1L, 1L, 1L, 1L), month = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, > 2L, 3L), week = c(1L, 2L, 3L, 4L, 5L, 5L, 6L, 7L, 8L, 9L), date = > structure(c(1L, > 2L, 3L, 4L, 5L, 5L, 6L, 7L, 8L, 9L), .Label = c("1970-01-10", > "1970-01-17", "1970-01-24", "1970-01-31", "1970-02-07", "1970-02-14", > "1970-02-21", "1970-02-28", "1970-03-07"), class = "factor"), > value = c(-15.035, -20.478, -22.245, -23.576, -8.84099999999995, > -18.497, -13.892, -18.974, -15.919, -13.576)), .Names = c("year", > "citycode", "month", "week", "date", "tmin"), row.names = c(NA, > 10L), class = "data.frame") > > [[alternative HTML version deleted]] > > ______________________________________________ > [hidden email] mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com ______________________________________________ [hidden email] mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
In reply to this post by Miluji Sb
I am sure that this sort of thing has been asked and answered before,
so in case my suggestions don't work for you, just search the archives a bit more. I am also sure that it can be handled directly by numerous functions in numerous packages, e.g. via time series methods or by calculating running means of suitably shifted series. However, as it seems to be a straightforward task, I'll provide what I think is a simple solution in base R. Adjust to your situation. ## First I need a little utility function to offset rows. Lots of ways to do this,many nicer than this I'm sure. > shift <- function(x,k) + ## x is a vector of values -- e.g. of a column in your df + { + sapply(seq_len(k),function(i)c(rep(NA,i),head(x,-i))) + } > > > ## Testit > x <- c(1,3,5,7,8:11) > m <- shift(x,3) ## matrix of prior values up to lag 3 > m ## note rows have been omitted where lags don't exist [,1] [,2] [,3] [1,] NA NA NA [2,] 1 NA NA [3,] 3 1 NA [4,] 5 3 1 [5,] 7 5 3 [6,] 8 7 5 [7,] 9 8 7 [8,] 10 9 8 > rowMeans(m) ## means of previous 3 [1] NA NA NA 3.000000 5.000000 6.666667 8.000000 9.000000 > rowMeans(m[,1:2]) ## means of previous 2 [1] NA NA 2.0 4.0 6.0 7.5 8.5 9.5 Cheers, Bert Bert Gunter "The trouble with having an open mind is that people keep coming along and sticking things into it." -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) On Sun, Mar 25, 2018 at 7:48 AM, Miluji Sb <[hidden email]> wrote: > Dear all, > > I have weekly data by city (variable citycode). I would like to take the > average of the previous two, three, four weeks (without the current week) > of the variable called value. > > This is what I have tried to compute the average of the two previous weeks; > > df = df %>% > mutate(value.lag1 = lag(value, n = 1)) %>% > mutate(value .2.previous = rollapply(data = value.lag1, > width = 2, > FUN = mean, > align = "right", > fill = NA, > na.rm = T)) > > I crated the lag of the variable first and then attempted to compute the > average but this does not seem to to what I want. What I am doing wrong? > Any help will be appreciated. The data is below. Thank you. > > Sincerely, > > Milu > > dput(droplevels(head(df, 10))) > structure(list(year = c(1970L, 1970L, 1970L, 1970L, 1970L, 1970L, > 1970L, 1970L, 1970L, 1970L), citycode = c(1L, 1L, 1L, 1L, 1L, > 1L, 1L, 1L, 1L, 1L), month = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, > 2L, 3L), week = c(1L, 2L, 3L, 4L, 5L, 5L, 6L, 7L, 8L, 9L), date = > structure(c(1L, > 2L, 3L, 4L, 5L, 5L, 6L, 7L, 8L, 9L), .Label = c("1970-01-10", > "1970-01-17", "1970-01-24", "1970-01-31", "1970-02-07", "1970-02-14", > "1970-02-21", "1970-02-28", "1970-03-07"), class = "factor"), > value = c(-15.035, -20.478, -22.245, -23.576, -8.84099999999995, > -18.497, -13.892, -18.974, -15.919, -13.576)), .Names = c("year", > "citycode", "month", "week", "date", "tmin"), row.names = c(NA, > 10L), class = "data.frame") > > [[alternative HTML version deleted]] > > ______________________________________________ > [hidden email] mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. ______________________________________________ [hidden email] mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
Dear Bert,
Thank you very much.This works. I was wondering if the fact that I want to create new variables (sorry for not stating that fact) makes any difference? Thank you again. Sincerely, Milu On Sun, Mar 25, 2018 at 10:05 PM, Bert Gunter <[hidden email]> wrote: > I am sure that this sort of thing has been asked and answered before, > so in case my suggestions don't work for you, just search the archives > a bit more. > I am also sure that it can be handled directly by numerous functions > in numerous packages, e.g. via time series methods or by calculating > running means of suitably shifted series. > > However, as it seems to be a straightforward task, I'll provide what I > think is a simple solution in base R. Adjust to your situation. > > ## First I need a little utility function to offset rows. Lots of ways > to do this,many nicer than this I'm sure. > > > shift <- function(x,k) > + ## x is a vector of values -- e.g. of a column in your df > + { > + sapply(seq_len(k),function(i)c(rep(NA,i),head(x,-i))) > + } > > > > > > ## Testit > > x <- c(1,3,5,7,8:11) > > m <- shift(x,3) ## matrix of prior values up to lag 3 > > m ## note rows have been omitted where lags don't exist > [,1] [,2] [,3] > [1,] NA NA NA > [2,] 1 NA NA > [3,] 3 1 NA > [4,] 5 3 1 > [5,] 7 5 3 > [6,] 8 7 5 > [7,] 9 8 7 > [8,] 10 9 8 > > rowMeans(m) ## means of previous 3 > [1] NA NA NA 3.000000 5.000000 6.666667 8.000000 9.000000 > > rowMeans(m[,1:2]) ## means of previous 2 > [1] NA NA 2.0 4.0 6.0 7.5 8.5 9.5 > > > Cheers, > Bert > > > > > > > Bert Gunter > > "The trouble with having an open mind is that people keep coming along > and sticking things into it." > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) > > > On Sun, Mar 25, 2018 at 7:48 AM, Miluji Sb <[hidden email]> wrote: > > Dear all, > > > > I have weekly data by city (variable citycode). I would like to take the > > average of the previous two, three, four weeks (without the current week) > > of the variable called value. > > > > This is what I have tried to compute the average of the two previous > weeks; > > > > df = df %>% > > mutate(value.lag1 = lag(value, n = 1)) %>% > > mutate(value .2.previous = rollapply(data = value.lag1, > > width = 2, > > FUN = mean, > > align = "right", > > fill = NA, > > na.rm = T)) > > > > I crated the lag of the variable first and then attempted to compute the > > average but this does not seem to to what I want. What I am doing wrong? > > Any help will be appreciated. The data is below. Thank you. > > > > Sincerely, > > > > Milu > > > > dput(droplevels(head(df, 10))) > > structure(list(year = c(1970L, 1970L, 1970L, 1970L, 1970L, 1970L, > > 1970L, 1970L, 1970L, 1970L), citycode = c(1L, 1L, 1L, 1L, 1L, > > 1L, 1L, 1L, 1L, 1L), month = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, > > 2L, 3L), week = c(1L, 2L, 3L, 4L, 5L, 5L, 6L, 7L, 8L, 9L), date = > > structure(c(1L, > > 2L, 3L, 4L, 5L, 5L, 6L, 7L, 8L, 9L), .Label = c("1970-01-10", > > "1970-01-17", "1970-01-24", "1970-01-31", "1970-02-07", "1970-02-14", > > "1970-02-21", "1970-02-28", "1970-03-07"), class = "factor"), > > value = c(-15.035, -20.478, -22.245, -23.576, -8.84099999999995, > > -18.497, -13.892, -18.974, -15.919, -13.576)), .Names = c("year", > > "citycode", "month", "week", "date", "tmin"), row.names = c(NA, > > 10L), class = "data.frame") > > > > [[alternative HTML version deleted]] > > > > ______________________________________________ > > [hidden email] mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/ > posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] ______________________________________________ [hidden email] mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
The row means **are** new variables and can be put wherever you like.
But all columns in a data frame **must** have the same number of rows, so you'll have to fill in missing values as appropriate. That's where you need to "adjust as necessary to your needs." -- Bert Bert Gunter "The trouble with having an open mind is that people keep coming along and sticking things into it." -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) On Mon, Mar 26, 2018 at 6:22 AM, Miluji Sb <[hidden email]> wrote: > Dear Bert, > > Thank you very much.This works. I was wondering if the fact that I want to > create new variables (sorry for not stating that fact) makes any difference? > Thank you again. > > Sincerely, > > Milu > > On Sun, Mar 25, 2018 at 10:05 PM, Bert Gunter <[hidden email]> > wrote: >> >> I am sure that this sort of thing has been asked and answered before, >> so in case my suggestions don't work for you, just search the archives >> a bit more. >> I am also sure that it can be handled directly by numerous functions >> in numerous packages, e.g. via time series methods or by calculating >> running means of suitably shifted series. >> >> However, as it seems to be a straightforward task, I'll provide what I >> think is a simple solution in base R. Adjust to your situation. >> >> ## First I need a little utility function to offset rows. Lots of ways >> to do this,many nicer than this I'm sure. >> >> > shift <- function(x,k) >> + ## x is a vector of values -- e.g. of a column in your df >> + { >> + sapply(seq_len(k),function(i)c(rep(NA,i),head(x,-i))) >> + } >> > >> > >> > ## Testit >> > x <- c(1,3,5,7,8:11) >> > m <- shift(x,3) ## matrix of prior values up to lag 3 >> > m ## note rows have been omitted where lags don't exist >> [,1] [,2] [,3] >> [1,] NA NA NA >> [2,] 1 NA NA >> [3,] 3 1 NA >> [4,] 5 3 1 >> [5,] 7 5 3 >> [6,] 8 7 5 >> [7,] 9 8 7 >> [8,] 10 9 8 >> > rowMeans(m) ## means of previous 3 >> [1] NA NA NA 3.000000 5.000000 6.666667 8.000000 >> 9.000000 >> > rowMeans(m[,1:2]) ## means of previous 2 >> [1] NA NA 2.0 4.0 6.0 7.5 8.5 9.5 >> >> >> Cheers, >> Bert >> >> >> >> >> >> >> Bert Gunter >> >> "The trouble with having an open mind is that people keep coming along >> and sticking things into it." >> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) >> >> >> On Sun, Mar 25, 2018 at 7:48 AM, Miluji Sb <[hidden email]> wrote: >> > Dear all, >> > >> > I have weekly data by city (variable citycode). I would like to take the >> > average of the previous two, three, four weeks (without the current >> > week) >> > of the variable called value. >> > >> > This is what I have tried to compute the average of the two previous >> > weeks; >> > >> > df = df %>% >> > mutate(value.lag1 = lag(value, n = 1)) %>% >> > mutate(value .2.previous = rollapply(data = value.lag1, >> > width = 2, >> > FUN = mean, >> > align = "right", >> > fill = NA, >> > na.rm = T)) >> > >> > I crated the lag of the variable first and then attempted to compute the >> > average but this does not seem to to what I want. What I am doing wrong? >> > Any help will be appreciated. The data is below. Thank you. >> > >> > Sincerely, >> > >> > Milu >> > >> > dput(droplevels(head(df, 10))) >> > structure(list(year = c(1970L, 1970L, 1970L, 1970L, 1970L, 1970L, >> > 1970L, 1970L, 1970L, 1970L), citycode = c(1L, 1L, 1L, 1L, 1L, >> > 1L, 1L, 1L, 1L, 1L), month = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, >> > 2L, 3L), week = c(1L, 2L, 3L, 4L, 5L, 5L, 6L, 7L, 8L, 9L), date = >> > structure(c(1L, >> > 2L, 3L, 4L, 5L, 5L, 6L, 7L, 8L, 9L), .Label = c("1970-01-10", >> > "1970-01-17", "1970-01-24", "1970-01-31", "1970-02-07", "1970-02-14", >> > "1970-02-21", "1970-02-28", "1970-03-07"), class = "factor"), >> > value = c(-15.035, -20.478, -22.245, -23.576, -8.84099999999995, >> > -18.497, -13.892, -18.974, -15.919, -13.576)), .Names = c("year", >> > "citycode", "month", "week", "date", "tmin"), row.names = c(NA, >> > 10L), class = "data.frame") >> > >> > [[alternative HTML version deleted]] >> > >> > ______________________________________________ >> > [hidden email] mailing list -- To UNSUBSCRIBE and more, see >> > https://stat.ethz.ch/mailman/listinfo/r-help >> > PLEASE do read the posting guide >> > http://www.R-project.org/posting-guide.html >> > and provide commented, minimal, self-contained, reproducible code. > > ______________________________________________ [hidden email] mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
Free forum by Nabble | Edit this page |