Take average of previous weeks

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

Take average of previous weeks

Miluji Sb
Dear all,

I have weekly data by city (variable citycode). I would like to take the
average of the previous two, three, four weeks (without the current week)
of the variable called value.

This is what I have tried to compute the average of the two previous weeks;

df = df %>%
  mutate(value.lag1 = lag(value, n = 1)) %>%
  mutate(value .2.previous = rollapply(data = value.lag1,
                                     width = 2,
                                     FUN = mean,
                                     align = "right",
                                     fill = NA,
                                     na.rm = T))

I crated the lag of the variable first and then attempted to compute the
average but this does not seem to to what I want. What I am doing wrong?
Any help will be appreciated. The data is below. Thank you.

Sincerely,

Milu

dput(droplevels(head(df, 10)))
structure(list(year = c(1970L, 1970L, 1970L, 1970L, 1970L, 1970L,
1970L, 1970L, 1970L, 1970L), citycode = c(1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L), month = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 3L), week = c(1L, 2L, 3L, 4L, 5L, 5L, 6L, 7L, 8L, 9L), date =
structure(c(1L,
2L, 3L, 4L, 5L, 5L, 6L, 7L, 8L, 9L), .Label = c("1970-01-10",
"1970-01-17", "1970-01-24", "1970-01-31", "1970-02-07", "1970-02-14",
"1970-02-21", "1970-02-28", "1970-03-07"), class = "factor"),
    value = c(-15.035, -20.478, -22.245, -23.576, -8.84099999999995,
    -18.497, -13.892, -18.974, -15.919, -13.576)), .Names = c("year",
"citycode", "month", "week", "date", "tmin"), row.names = c(NA,
10L), class = "data.frame")

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Take average of previous weeks

Gabor Grothendieck
There is no  `value` column in the `dput` output shown in the
question so using `tmin` instead note that the `width=` argument
of `rollapply` can be a list containing a vector of offsets (-1 is prior
value, -2 is value before that, etc.) and that we can use `rollapplyr`
with an `r` on the end to get right alignment.  See `?rollapply`

  library(dplyr)
  library(zoo)

  roll <- function(x, k) rollapplyr(x, list(-seq(1:k)), mean, fill = NA)
  df %>%
      group_by(citycode) %>%
      mutate(mean2 = roll(tmin, 2), mean3 = roll(tmin, 3), mean4 =
roll(tmin, 4)) %>%
      ungroup

(The code above has been indented 2 spaces so you can
identify inadvertent line wrapping by the email system.)


On Sun, Mar 25, 2018 at 10:48 AM, Miluji Sb <[hidden email]> wrote:

> Dear all,
>
> I have weekly data by city (variable citycode). I would like to take the
> average of the previous two, three, four weeks (without the current week)
> of the variable called value.
>
> This is what I have tried to compute the average of the two previous weeks;
>
> df = df %>%
>   mutate(value.lag1 = lag(value, n = 1)) %>%
>   mutate(value .2.previous = rollapply(data = value.lag1,
>                                      width = 2,
>                                      FUN = mean,
>                                      align = "right",
>                                      fill = NA,
>                                      na.rm = T))
>
> I crated the lag of the variable first and then attempted to compute the
> average but this does not seem to to what I want. What I am doing wrong?
> Any help will be appreciated. The data is below. Thank you.
>
> Sincerely,
>
> Milu
>
> dput(droplevels(head(df, 10)))
> structure(list(year = c(1970L, 1970L, 1970L, 1970L, 1970L, 1970L,
> 1970L, 1970L, 1970L, 1970L), citycode = c(1L, 1L, 1L, 1L, 1L,
> 1L, 1L, 1L, 1L, 1L), month = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
> 2L, 3L), week = c(1L, 2L, 3L, 4L, 5L, 5L, 6L, 7L, 8L, 9L), date =
> structure(c(1L,
> 2L, 3L, 4L, 5L, 5L, 6L, 7L, 8L, 9L), .Label = c("1970-01-10",
> "1970-01-17", "1970-01-24", "1970-01-31", "1970-02-07", "1970-02-14",
> "1970-02-21", "1970-02-28", "1970-03-07"), class = "factor"),
>     value = c(-15.035, -20.478, -22.245, -23.576, -8.84099999999995,
>     -18.497, -13.892, -18.974, -15.919, -13.576)), .Names = c("year",
> "citycode", "month", "week", "date", "tmin"), row.names = c(NA,
> 10L), class = "data.frame")
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Take average of previous weeks

Bert Gunter-2
In reply to this post by Miluji Sb
I am sure that this sort of thing has been asked and answered before,
so in case my suggestions don't work for you, just search the archives
a bit more.
I am also sure that it can be handled directly by numerous functions
in numerous packages, e.g. via time series methods or by calculating
running means of suitably shifted series.

However, as it seems to be a straightforward task, I'll provide what I
think is a simple solution in base R. Adjust to your situation.

## First I need a little utility function to offset rows. Lots of ways
to do this,many nicer than this I'm sure.

> shift <- function(x,k)
+    ## x is a vector of values -- e.g. of a column in your df
+ {
+    sapply(seq_len(k),function(i)c(rep(NA,i),head(x,-i)))
+ }
>
>
> ## Testit
> x <- c(1,3,5,7,8:11)
> m <- shift(x,3) ## matrix of prior values up to lag 3
> m ## note rows have been omitted where lags don't exist
     [,1] [,2] [,3]
[1,]   NA   NA   NA
[2,]    1   NA   NA
[3,]    3    1   NA
[4,]    5    3    1
[5,]    7    5    3
[6,]    8    7    5
[7,]    9    8    7
[8,]   10    9    8
> rowMeans(m) ## means of previous 3
[1]       NA       NA       NA 3.000000 5.000000 6.666667 8.000000 9.000000
> rowMeans(m[,1:2]) ## means of previous 2
[1]  NA  NA 2.0 4.0 6.0 7.5 8.5 9.5


Cheers,
Bert






Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Sun, Mar 25, 2018 at 7:48 AM, Miluji Sb <[hidden email]> wrote:

> Dear all,
>
> I have weekly data by city (variable citycode). I would like to take the
> average of the previous two, three, four weeks (without the current week)
> of the variable called value.
>
> This is what I have tried to compute the average of the two previous weeks;
>
> df = df %>%
>   mutate(value.lag1 = lag(value, n = 1)) %>%
>   mutate(value .2.previous = rollapply(data = value.lag1,
>                                      width = 2,
>                                      FUN = mean,
>                                      align = "right",
>                                      fill = NA,
>                                      na.rm = T))
>
> I crated the lag of the variable first and then attempted to compute the
> average but this does not seem to to what I want. What I am doing wrong?
> Any help will be appreciated. The data is below. Thank you.
>
> Sincerely,
>
> Milu
>
> dput(droplevels(head(df, 10)))
> structure(list(year = c(1970L, 1970L, 1970L, 1970L, 1970L, 1970L,
> 1970L, 1970L, 1970L, 1970L), citycode = c(1L, 1L, 1L, 1L, 1L,
> 1L, 1L, 1L, 1L, 1L), month = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
> 2L, 3L), week = c(1L, 2L, 3L, 4L, 5L, 5L, 6L, 7L, 8L, 9L), date =
> structure(c(1L,
> 2L, 3L, 4L, 5L, 5L, 6L, 7L, 8L, 9L), .Label = c("1970-01-10",
> "1970-01-17", "1970-01-24", "1970-01-31", "1970-02-07", "1970-02-14",
> "1970-02-21", "1970-02-28", "1970-03-07"), class = "factor"),
>     value = c(-15.035, -20.478, -22.245, -23.576, -8.84099999999995,
>     -18.497, -13.892, -18.974, -15.919, -13.576)), .Names = c("year",
> "citycode", "month", "week", "date", "tmin"), row.names = c(NA,
> 10L), class = "data.frame")
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Take average of previous weeks

Miluji Sb
Dear Bert,

Thank you very much.This works. I was wondering if the fact that I want to
create new variables (sorry for not stating that fact) makes any
difference? Thank you again.

Sincerely,

Milu

On Sun, Mar 25, 2018 at 10:05 PM, Bert Gunter <[hidden email]>
wrote:

> I am sure that this sort of thing has been asked and answered before,
> so in case my suggestions don't work for you, just search the archives
> a bit more.
> I am also sure that it can be handled directly by numerous functions
> in numerous packages, e.g. via time series methods or by calculating
> running means of suitably shifted series.
>
> However, as it seems to be a straightforward task, I'll provide what I
> think is a simple solution in base R. Adjust to your situation.
>
> ## First I need a little utility function to offset rows. Lots of ways
> to do this,many nicer than this I'm sure.
>
> > shift <- function(x,k)
> +    ## x is a vector of values -- e.g. of a column in your df
> + {
> +    sapply(seq_len(k),function(i)c(rep(NA,i),head(x,-i)))
> + }
> >
> >
> > ## Testit
> > x <- c(1,3,5,7,8:11)
> > m <- shift(x,3) ## matrix of prior values up to lag 3
> > m ## note rows have been omitted where lags don't exist
>      [,1] [,2] [,3]
> [1,]   NA   NA   NA
> [2,]    1   NA   NA
> [3,]    3    1   NA
> [4,]    5    3    1
> [5,]    7    5    3
> [6,]    8    7    5
> [7,]    9    8    7
> [8,]   10    9    8
> > rowMeans(m) ## means of previous 3
> [1]       NA       NA       NA 3.000000 5.000000 6.666667 8.000000 9.000000
> > rowMeans(m[,1:2]) ## means of previous 2
> [1]  NA  NA 2.0 4.0 6.0 7.5 8.5 9.5
>
>
> Cheers,
> Bert
>
>
>
>
>
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Sun, Mar 25, 2018 at 7:48 AM, Miluji Sb <[hidden email]> wrote:
> > Dear all,
> >
> > I have weekly data by city (variable citycode). I would like to take the
> > average of the previous two, three, four weeks (without the current week)
> > of the variable called value.
> >
> > This is what I have tried to compute the average of the two previous
> weeks;
> >
> > df = df %>%
> >   mutate(value.lag1 = lag(value, n = 1)) %>%
> >   mutate(value .2.previous = rollapply(data = value.lag1,
> >                                      width = 2,
> >                                      FUN = mean,
> >                                      align = "right",
> >                                      fill = NA,
> >                                      na.rm = T))
> >
> > I crated the lag of the variable first and then attempted to compute the
> > average but this does not seem to to what I want. What I am doing wrong?
> > Any help will be appreciated. The data is below. Thank you.
> >
> > Sincerely,
> >
> > Milu
> >
> > dput(droplevels(head(df, 10)))
> > structure(list(year = c(1970L, 1970L, 1970L, 1970L, 1970L, 1970L,
> > 1970L, 1970L, 1970L, 1970L), citycode = c(1L, 1L, 1L, 1L, 1L,
> > 1L, 1L, 1L, 1L, 1L), month = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
> > 2L, 3L), week = c(1L, 2L, 3L, 4L, 5L, 5L, 6L, 7L, 8L, 9L), date =
> > structure(c(1L,
> > 2L, 3L, 4L, 5L, 5L, 6L, 7L, 8L, 9L), .Label = c("1970-01-10",
> > "1970-01-17", "1970-01-24", "1970-01-31", "1970-02-07", "1970-02-14",
> > "1970-02-21", "1970-02-28", "1970-03-07"), class = "factor"),
> >     value = c(-15.035, -20.478, -22.245, -23.576, -8.84099999999995,
> >     -18.497, -13.892, -18.974, -15.919, -13.576)), .Names = c("year",
> > "citycode", "month", "week", "date", "tmin"), row.names = c(NA,
> > 10L), class = "data.frame")
> >
> >         [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Take average of previous weeks

Bert Gunter-2
The row means **are** new variables and can be put wherever you like.
But all columns in a data frame **must** have the same number of rows,
so you'll have to fill in missing values as appropriate. That's where
you need to "adjust as necessary to your needs."

-- Bert



Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Mon, Mar 26, 2018 at 6:22 AM, Miluji Sb <[hidden email]> wrote:

> Dear Bert,
>
> Thank you very much.This works. I was wondering if the fact that I want to
> create new variables (sorry for not stating that fact) makes any difference?
> Thank you again.
>
> Sincerely,
>
> Milu
>
> On Sun, Mar 25, 2018 at 10:05 PM, Bert Gunter <[hidden email]>
> wrote:
>>
>> I am sure that this sort of thing has been asked and answered before,
>> so in case my suggestions don't work for you, just search the archives
>> a bit more.
>> I am also sure that it can be handled directly by numerous functions
>> in numerous packages, e.g. via time series methods or by calculating
>> running means of suitably shifted series.
>>
>> However, as it seems to be a straightforward task, I'll provide what I
>> think is a simple solution in base R. Adjust to your situation.
>>
>> ## First I need a little utility function to offset rows. Lots of ways
>> to do this,many nicer than this I'm sure.
>>
>> > shift <- function(x,k)
>> +    ## x is a vector of values -- e.g. of a column in your df
>> + {
>> +    sapply(seq_len(k),function(i)c(rep(NA,i),head(x,-i)))
>> + }
>> >
>> >
>> > ## Testit
>> > x <- c(1,3,5,7,8:11)
>> > m <- shift(x,3) ## matrix of prior values up to lag 3
>> > m ## note rows have been omitted where lags don't exist
>>      [,1] [,2] [,3]
>> [1,]   NA   NA   NA
>> [2,]    1   NA   NA
>> [3,]    3    1   NA
>> [4,]    5    3    1
>> [5,]    7    5    3
>> [6,]    8    7    5
>> [7,]    9    8    7
>> [8,]   10    9    8
>> > rowMeans(m) ## means of previous 3
>> [1]       NA       NA       NA 3.000000 5.000000 6.666667 8.000000
>> 9.000000
>> > rowMeans(m[,1:2]) ## means of previous 2
>> [1]  NA  NA 2.0 4.0 6.0 7.5 8.5 9.5
>>
>>
>> Cheers,
>> Bert
>>
>>
>>
>>
>>
>>
>> Bert Gunter
>>
>> "The trouble with having an open mind is that people keep coming along
>> and sticking things into it."
>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>
>>
>> On Sun, Mar 25, 2018 at 7:48 AM, Miluji Sb <[hidden email]> wrote:
>> > Dear all,
>> >
>> > I have weekly data by city (variable citycode). I would like to take the
>> > average of the previous two, three, four weeks (without the current
>> > week)
>> > of the variable called value.
>> >
>> > This is what I have tried to compute the average of the two previous
>> > weeks;
>> >
>> > df = df %>%
>> >   mutate(value.lag1 = lag(value, n = 1)) %>%
>> >   mutate(value .2.previous = rollapply(data = value.lag1,
>> >                                      width = 2,
>> >                                      FUN = mean,
>> >                                      align = "right",
>> >                                      fill = NA,
>> >                                      na.rm = T))
>> >
>> > I crated the lag of the variable first and then attempted to compute the
>> > average but this does not seem to to what I want. What I am doing wrong?
>> > Any help will be appreciated. The data is below. Thank you.
>> >
>> > Sincerely,
>> >
>> > Milu
>> >
>> > dput(droplevels(head(df, 10)))
>> > structure(list(year = c(1970L, 1970L, 1970L, 1970L, 1970L, 1970L,
>> > 1970L, 1970L, 1970L, 1970L), citycode = c(1L, 1L, 1L, 1L, 1L,
>> > 1L, 1L, 1L, 1L, 1L), month = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
>> > 2L, 3L), week = c(1L, 2L, 3L, 4L, 5L, 5L, 6L, 7L, 8L, 9L), date =
>> > structure(c(1L,
>> > 2L, 3L, 4L, 5L, 5L, 6L, 7L, 8L, 9L), .Label = c("1970-01-10",
>> > "1970-01-17", "1970-01-24", "1970-01-31", "1970-02-07", "1970-02-14",
>> > "1970-02-21", "1970-02-28", "1970-03-07"), class = "factor"),
>> >     value = c(-15.035, -20.478, -22.245, -23.576, -8.84099999999995,
>> >     -18.497, -13.892, -18.974, -15.919, -13.576)), .Names = c("year",
>> > "citycode", "month", "week", "date", "tmin"), row.names = c(NA,
>> > 10L), class = "data.frame")
>> >
>> >         [[alternative HTML version deleted]]
>> >
>> > ______________________________________________
>> > [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> > http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>
>

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.