Query about calculating the monthly average of daily data columns

classic Classic list List threaded Threaded
17 messages Options
Reply | Threaded
Open this post in threaded view
|

Query about calculating the monthly average of daily data columns

Subhamitra Patra
Dear R-users,

I have daily data from 03-01-1994 to 29-12-2000. In my datafile, he first
column is date and the second and third columns are the returns of the
country A, and B. Here, the date column is same for both countries. I want
to calculate the monthly average of both country's returns by using a loop,
and then, I want to export the results into excel.

Please help me in this regard.

Please find the attached datasheet.

Thank you.

--
*Best Regards,*
*Subhamitra Patra*
*Phd. Research Scholar*
*Department of Humanities and Social Sciences*
*Indian Institute of Technology, Kharagpur*
*INDIA*

[image: Mailtrack]
<https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&>
Sender
notified by
Mailtrack
<https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&>
09/12/19,
08:23:07 PM
______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Query about calculating the monthly average of daily data columns

Bert Gunter-2
No reproducible example, so hard to say. What class is your "date" column?
-- factor, character, Date?  See ?Date
Once you have an object of appropriate class -- see ?format.Date -- ?months
can extract the month and ?ave can do your averaging. No explicit looping
is needed.

The tidydata alternative universe can also do all these things if that's
where you prefer to live.

As usual, any attached data was stripped. See ?dput for one way to include
data in your post.

Cheers,
Bert


On Thu, Sep 12, 2019 at 7:54 AM Subhamitra Patra <[hidden email]>
wrote:

> Dear R-users,
>
> I have daily data from 03-01-1994 to 29-12-2000. In my datafile, he first
> column is date and the second and third columns are the returns of the
> country A, and B. Here, the date column is same for both countries. I want
> to calculate the monthly average of both country's returns by using a loop,
> and then, I want to export the results into excel.
>
> Please help me in this regard.
>
> Please find the attached datasheet.
>
> Thank you.
>
> --
> *Best Regards,*
> *Subhamitra Patra*
> *Phd. Research Scholar*
> *Department of Humanities and Social Sciences*
> *Indian Institute of Technology, Kharagpur*
> *INDIA*
>
> [image: Mailtrack]
> <
> https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&
> >
> Sender
> notified by
> Mailtrack
> <
> https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&
> >
> 09/12/19,
> 08:23:07 PM
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Query about calculating the monthly average of daily data columns

Rui Barradas
In reply to this post by Subhamitra Patra
Hello,

Please include data, say

dput(head(data, 20))  # post the output of this


But, is the problem as simple as

rowMeans(data[2:3], na.rm = TRUE)

?

Hope this helps,

Rui Barradas


Às 15:53 de 12/09/19, Subhamitra Patra escreveu:

> Dear R-users,
>
> I have daily data from 03-01-1994 to 29-12-2000. In my datafile, he first
> column is date and the second and third columns are the returns of the
> country A, and B. Here, the date column is same for both countries. I want
> to calculate the monthly average of both country's returns by using a loop,
> and then, I want to export the results into excel.
>
> Please help me in this regard.
>
> Please find the attached datasheet.
>
> Thank you.
>

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Query about calculating the monthly average of daily data columns

Rui Barradas
Hello,

Inline.

Às 17:33 de 12/09/19, Bert Gunter escreveu:
> But she wants *monthly* averages, Rui.

Thanks, my mistake.

Ergo ave() or tidyData
> equivalent, right?

Maybe. But ave() returns as many values as the input length, this seems
more suited for tapply or aggregate.


I will first create an example data set.

set.seed(1234)
start <- as.Date("03-01-1994", "%d-%m-%Y")
end <- as.Date("29-12-2000", "%d-%m-%Y")
date <- seq(start, end, by = "day")
date <- date[as.integer(format(date, "%u")) %in% 1:5]
df1 <- data.frame(date,
                   CountryA = rnorm(length(date)),
                   CountryB = rnorm(length(date)))


Now the averages by month

month <- zoo::as.yearmon(df1[[1]])
aggA <- aggregate(CountryA ~ month, df1, mean)
aggB <- aggregate(CountryB ~ month, df1, mean)
MonthReturns <- merge(aggA, aggB)
head(MonthReturns)


Final clean up.

rm(date, month, aggA, aggB)


Hope this helps,

Rui Barradas

>
> -- Bert
>
> On Thu, Sep 12, 2019 at 8:41 AM Rui Barradas <[hidden email]
> <mailto:[hidden email]>> wrote:
>
>     Hello,
>
>     Please include data, say
>
>     dput(head(data, 20))  # post the output of this
>
>
>     But, is the problem as simple as
>
>     rowMeans(data[2:3], na.rm = TRUE)
>
>     ?
>
>     Hope this helps,
>
>     Rui Barradas
>
>
>     Às 15:53 de 12/09/19, Subhamitra Patra escreveu:
>      > Dear R-users,
>      >
>      > I have daily data from 03-01-1994 to 29-12-2000. In my datafile,
>     he first
>      > column is date and the second and third columns are the returns
>     of the
>      > country A, and B. Here, the date column is same for both
>     countries. I want
>      > to calculate the monthly average of both country's returns by
>     using a loop,
>      > and then, I want to export the results into excel.
>      >
>      > Please help me in this regard.
>      >
>      > Please find the attached datasheet.
>      >
>      > Thank you.
>      >
>
>     ______________________________________________
>     [hidden email] <mailto:[hidden email]> mailing list --
>     To UNSUBSCRIBE and more, see
>     https://stat.ethz.ch/mailman/listinfo/r-help
>     PLEASE do read the posting guide
>     http://www.R-project.org/posting-guide.html
>     and provide commented, minimal, self-contained, reproducible code.
>

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Query about calculating the monthly average of daily data columns

Jim Lemon-4
In reply to this post by Subhamitra Patra
Hi Subhamitra,
Your data didn't make it through, so I guess the first thing is to
guess what it looks like. Here's a try at just January and February of
1994 so that we can see the result on the screen. The logic will work
just as well for the whole seven years.

# create fake data for the first two months
spdat<-data.frame(
 dates=paste(c(1:30,1:28),c(rep(1,30),rep(2,28)),rep(1994,58),sep="-"),
 returnA=sample(15:50,58,TRUE),returnB=sample(10:45,58,TRUE))
# I'll assume that the dates in your file are character, not factor
spdat$dates<-as.character(spdat$dates)
# if you only have to get the monthly averages, it can be done this way
spdat$month<-sapply(strsplit(spdat$dates,"-"),"[",2)
spdat$year<-sapply(strsplit(spdat$dates,"-"),"[",3)
# get the averages by month and year - is this correct?
monthlyA<-by(spdat$returnA,spdat[,c("month","year")],mean)
monthlyB<-by(spdat$returnB,spdat[,c("month","year")],mean)

Now you have what you say you want:

monthlyA
month: 1
year: 1994
[1] 34.1
------------------------------------------------------------
month: 2
year: 1994
[1] 33.32143

monthlyB
month: 1
year: 1994
[1] 29.7
------------------------------------------------------------
month: 2
year: 1994
[1] 27.28571

Sorry I didn't use a loop (for(month in 1:12) ... for (year in
1994:2000) ...), too lazy.
Now you have to let us know how this information is to be formatted to
go into Excel. Excel will import the text as above, but I think you
want something that you can use for further calculations.

Jim

On Fri, Sep 13, 2019 at 12:54 AM Subhamitra Patra
<[hidden email]> wrote:

>
> Dear R-users,
>
> I have daily data from 03-01-1994 to 29-12-2000. In my datafile, he first
> column is date and the second and third columns are the returns of the
> country A, and B. Here, the date column is same for both countries. I want
> to calculate the monthly average of both country's returns by using a loop,
> and then, I want to export the results into excel.
>
> Please help me in this regard.
>
> Please find the attached datasheet.
>
> Thank you.
>
> --
> *Best Regards,*
> *Subhamitra Patra*
> *Phd. Research Scholar*
> *Department of Humanities and Social Sciences*
> *Indian Institute of Technology, Kharagpur*
> *INDIA*
>
> [image: Mailtrack]
> <https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&>
> Sender
> notified by
> Mailtrack
> <https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&>
> 09/12/19,
> 08:23:07 PM
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Query about calculating the monthly average of daily data columns

Subhamitra Patra
Dear Sir,

Thank you very much for your suggestion.

Yes, your suggested code worked. But, actually, I have data from 3rd
January 1994 to 3rd August 2017 for very large (i.e. for 84 countries)
sample. From this, I have given the example of the years up to 2000. Before
applying the same code for the long 24 years, I want to learn the logic
behind the code. Actually, some part of the code is not understandable to
me which I mentioned in the bold letter as follows.

"spdat<-data.frame(
  dates=paste(c(1:30,1:28),c(rep(1,30),rep(2,28)),rep(1994,58),sep="-"),
  returnA=sample(*15:50*,58,TRUE),returnB=sample(*10:45*,58,TRUE))"

A. Here, I need to define the no. of days in a month, and the no. of
countries name separately, right? But, what is meant by 15:50, and 10:45 in
return A, and B respectively?

"# if you only have to get the monthly averages, it can be done this way
spdat$month<-sapply(strsplit(spdat$dates,"-"),"["*,2*)
spdat$year<-sapply(strsplit(spdat$dates,"-"),"[",*3*)"

B. Here, I need to define the no. of months, and years separately, right?
or else what 2, and 3 (in bold) indicates?

"# get the averages by month and year - is this correct?
monthlyA<-by(*spdat$returnA*,spdat[,c("month","year")],mean)
monthlyB<-by(*spdat$returnB*,spdat[,c("month","year")],mean)"

C. From this part, I got the exact average values of both January and
February of 1994 for country A, and B. But, in code, I have a query that I
need to define  spdat$returnA, and  spdat$returnB separately before writing
this code, right? Like this, I need to define for each 84 countries
separately with their respective number of months, and years before writing
this code, right?

Yes, after obtaining the monthly average for each country's data, I need to
use them for further calculations. So, I want to export the result to
excel. But, until understanding the code, I think I willn't able to apply
for the entire sample, and cannot be able to discuss the format of the
resulted column to export to excel.

Therefore, kindly help me to understand the code.

Thank you very much, Sir, and thanks to this R forum for helping the
R-beginners.



[image: Mailtrack]
<https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&>
Sender
notified by
Mailtrack
<https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&>
09/13/19,
12:57:58 PM

On Fri, Sep 13, 2019 at 3:15 AM Jim Lemon <[hidden email]> wrote:

> Hi Subhamitra,
> Your data didn't make it through, so I guess the first thing is to
> guess what it looks like. Here's a try at just January and February of
> 1994 so that we can see the result on the screen. The logic will work
> just as well for the whole seven years.
>
> # create fake data for the first two months
> spdat<-data.frame(
>  dates=paste(c(1:30,1:28),c(rep(1,30),rep(2,28)),rep(1994,58),sep="-"),
>  returnA=sample(15:50,58,TRUE),returnB=sample(10:45,58,TRUE))
> # I'll assume that the dates in your file are character, not factor
> spdat$dates<-as.character(spdat$dates)
> # if you only have to get the monthly averages, it can be done this way
> spdat$month<-sapply(strsplit(spdat$dates,"-"),"[",2)
> spdat$year<-sapply(strsplit(spdat$dates,"-"),"[",3)
> # get the averages by month and year - is this correct?
> monthlyA<-by(spdat$returnA,spdat[,c("month","year")],mean)
> monthlyB<-by(spdat$returnB,spdat[,c("month","year")],mean)
>
> Now you have what you say you want:
>
> monthlyA
> month: 1
> year: 1994
> [1] 34.1
> ------------------------------------------------------------
> month: 2
> year: 1994
> [1] 33.32143
>
> monthlyB
> month: 1
> year: 1994
> [1] 29.7
> ------------------------------------------------------------
> month: 2
> year: 1994
> [1] 27.28571
>
> Sorry I didn't use a loop (for(month in 1:12) ... for (year in
> 1994:2000) ...), too lazy.
> Now you have to let us know how this information is to be formatted to
> go into Excel. Excel will import the text as above, but I think you
> want something that you can use for further calculations.
>
> Jim
>
> On Fri, Sep 13, 2019 at 12:54 AM Subhamitra Patra
> <[hidden email]> wrote:
> >
> > Dear R-users,
> >
> > I have daily data from 03-01-1994 to 29-12-2000. In my datafile, he first
> > column is date and the second and third columns are the returns of the
> > country A, and B. Here, the date column is same for both countries. I
> want
> > to calculate the monthly average of both country's returns by using a
> loop,
> > and then, I want to export the results into excel.
> >
> > Please help me in this regard.
> >
> > Please find the attached datasheet.
> >
> > Thank you.
> >
> > --
> > *Best Regards,*
> > *Subhamitra Patra*
> > *Phd. Research Scholar*
> > *Department of Humanities and Social Sciences*
> > *Indian Institute of Technology, Kharagpur*
> > *INDIA*
> >
> > [image: Mailtrack]
> > <
> https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&
> >
> > Sender
> > notified by
> > Mailtrack
> > <
> https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&
> >
> > 09/12/19,
> > 08:23:07 PM
> > ______________________________________________
> > [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>


--
*Best Regards,*
*Subhamitra Patra*
*Phd. Research Scholar*
*Department of Humanities and Social Sciences*
*Indian Institute of Technology, Kharagpur*
*INDIA*

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Query about calculating the monthly average of daily data columns

PIKAL Petr
Hi

I may be completely wrong but reshape/aggregate should by what you want
spdat
       dates returnA returnB
1   1-1-1994      16      13
2   2-1-1994      44      10
3   3-1-1994      24      32
.....
> library(reshape2)
> spdat.m <- melt(spdat)
Using dates as id variables
> str(spdat.m)
'data.frame':   116 obs. of  3 variables:
 $ dates   : Factor w/ 58 levels "1-1-1994","1-2-1994",..: 1 23 44 47 49 51 53 55 57 3 ...
 $ variable: Factor w/ 2 levels "returnA","returnB": 1 1 1 1 1 1 1 1 1 1 ...
 $ value   : int  16 44 24 47 16 35 34 34 26 36 ...
> spdat.m$realdate <- as.Date(spdat.m[,1], format="%d-%m-%Y")
> aggregate(spdat.m$value, list(format(spdat.m$realdate, "%m.%Y"), spdat.m$variable), mean)
  Group.1 Group.2        x
1 01.1994 returnA 31.93333
2 02.1994 returnA 32.39286
3 01.1994 returnB 24.26667
4 02.1994 returnB 30.03571

Cheers
Petr

> -----Original Message-----
> From: R-help <[hidden email]> On Behalf Of Subhamitra
> Patra
> Sent: Friday, September 13, 2019 10:08 AM
> To: Jim Lemon <[hidden email]>
> Cc: r-help mailing list <[hidden email]>
> Subject: Re: [R] Query about calculating the monthly average of daily data
> columns
>
> Dear Sir,
>
> Thank you very much for your suggestion.
>
> Yes, your suggested code worked. But, actually, I have data from 3rd January
> 1994 to 3rd August 2017 for very large (i.e. for 84 countries) sample. From
> this, I have given the example of the years up to 2000. Before applying the
> same code for the long 24 years, I want to learn the logic behind the code.
> Actually, some part of the code is not understandable to me which I
> mentioned in the bold letter as follows.
>
> "spdat<-data.frame(
>   dates=paste(c(1:30,1:28),c(rep(1,30),rep(2,28)),rep(1994,58),sep="-"),
>   returnA=sample(*15:50*,58,TRUE),returnB=sample(*10:45*,58,TRUE))"
>
> A. Here, I need to define the no. of days in a month, and the no. of countries
> name separately, right? But, what is meant by 15:50, and 10:45 in return A,
> and B respectively?
>
> "# if you only have to get the monthly averages, it can be done this way
> spdat$month<-sapply(strsplit(spdat$dates,"-"),"["*,2*)
> spdat$year<-sapply(strsplit(spdat$dates,"-"),"[",*3*)"
>
> B. Here, I need to define the no. of months, and years separately, right?
> or else what 2, and 3 (in bold) indicates?
>
> "# get the averages by month and year - is this correct?
> monthlyA<-by(*spdat$returnA*,spdat[,c("month","year")],mean)
> monthlyB<-by(*spdat$returnB*,spdat[,c("month","year")],mean)"
>
> C. From this part, I got the exact average values of both January and
> February of 1994 for country A, and B. But, in code, I have a query that I
> need to define  spdat$returnA, and  spdat$returnB separately before writing
> this code, right? Like this, I need to define for each 84 countries separately
> with their respective number of months, and years before writing this code,
> right?
>
> Yes, after obtaining the monthly average for each country's data, I need to
> use them for further calculations. So, I want to export the result to excel. But,
> until understanding the code, I think I willn't able to apply for the entire
> sample, and cannot be able to discuss the format of the resulted column to
> export to excel.
>
> Therefore, kindly help me to understand the code.
>
> Thank you very much, Sir, and thanks to this R forum for helping the R-
> beginners.
>
>
>
> [image: Mailtrack]
> <https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_ca
> mpaign=signaturevirality5&>
> Sender
> notified by
> Mailtrack
> <https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_ca
> mpaign=signaturevirality5&>
> 09/13/19,
> 12:57:58 PM
>
> On Fri, Sep 13, 2019 at 3:15 AM Jim Lemon <[hidden email]> wrote:
>
> > Hi Subhamitra,
> > Your data didn't make it through, so I guess the first thing is to
> > guess what it looks like. Here's a try at just January and February of
> > 1994 so that we can see the result on the screen. The logic will work
> > just as well for the whole seven years.
> >
> > # create fake data for the first two months spdat<-data.frame(
> > dates=paste(c(1:30,1:28),c(rep(1,30),rep(2,28)),rep(1994,58),sep="-"),
> >  returnA=sample(15:50,58,TRUE),returnB=sample(10:45,58,TRUE))
> > # I'll assume that the dates in your file are character, not factor
> > spdat$dates<-as.character(spdat$dates)
> > # if you only have to get the monthly averages, it can be done this
> > way
> > spdat$month<-sapply(strsplit(spdat$dates,"-"),"[",2)
> > spdat$year<-sapply(strsplit(spdat$dates,"-"),"[",3)
> > # get the averages by month and year - is this correct?
> > monthlyA<-by(spdat$returnA,spdat[,c("month","year")],mean)
> > monthlyB<-by(spdat$returnB,spdat[,c("month","year")],mean)
> >
> > Now you have what you say you want:
> >
> > monthlyA
> > month: 1
> > year: 1994
> > [1] 34.1
> > ------------------------------------------------------------
> > month: 2
> > year: 1994
> > [1] 33.32143
> >
> > monthlyB
> > month: 1
> > year: 1994
> > [1] 29.7
> > ------------------------------------------------------------
> > month: 2
> > year: 1994
> > [1] 27.28571
> >
> > Sorry I didn't use a loop (for(month in 1:12) ... for (year in
> > 1994:2000) ...), too lazy.
> > Now you have to let us know how this information is to be formatted to
> > go into Excel. Excel will import the text as above, but I think you
> > want something that you can use for further calculations.
> >
> > Jim
> >
> > On Fri, Sep 13, 2019 at 12:54 AM Subhamitra Patra
> > <[hidden email]> wrote:
> > >
> > > Dear R-users,
> > >
> > > I have daily data from 03-01-1994 to 29-12-2000. In my datafile, he
> > > first column is date and the second and third columns are the
> > > returns of the country A, and B. Here, the date column is same for
> > > both countries. I
> > want
> > > to calculate the monthly average of both country's returns by using
> > > a
> > loop,
> > > and then, I want to export the results into excel.
> > >
> > > Please help me in this regard.
> > >
> > > Please find the attached datasheet.
> > >
> > > Thank you.
> > >
> > > --
> > > *Best Regards,*
> > > *Subhamitra Patra*
> > > *Phd. Research Scholar*
> > > *Department of Humanities and Social Sciences* *Indian Institute of
> > > Technology, Kharagpur*
> > > *INDIA*
> > >
> > > [image: Mailtrack]
> > > <
> >
> https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_cam
> paig
> > n=signaturevirality5&
> > >
> > > Sender
> > > notified by
> > > Mailtrack
> > > <
> >
> https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_cam
> paig
> > n=signaturevirality5&
> > >
> > > 09/12/19,
> > > 08:23:07 PM
> > > ______________________________________________
> > > [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> >
>
>
> --
> *Best Regards,*
> *Subhamitra Patra*
> *Phd. Research Scholar*
> *Department of Humanities and Social Sciences* *Indian Institute of
> Technology, Kharagpur*
> *INDIA*
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.
Osobní údaje: Informace o zpracování a ochraně osobních údajů obchodních partnerů PRECHEZA a.s. jsou zveřejněny na: https://www.precheza.cz/zasady-ochrany-osobnich-udaju/ | Information about processing and protection of business partner’s personal data are available on website: https://www.precheza.cz/en/personal-data-protection-principles/
Důvěrnost: Tento e-mail a jakékoliv k němu připojené dokumenty jsou důvěrné a podléhají tomuto právně závaznému prohláąení o vyloučení odpovědnosti: https://www.precheza.cz/01-dovetek/ | This email and any documents attached to it may be confidential and are subject to the legally binding disclaimer: https://www.precheza.cz/en/01-disclaimer/

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Query about calculating the monthly average of daily data columns

Jim Lemon-4
In reply to this post by Subhamitra Patra
Hi Subhamitra,
I'll try to write my answers adjacent to your questions below.

On Fri, Sep 13, 2019 at 6:08 PM Subhamitra Patra <[hidden email]>
wrote:

> Dear Sir,
>
> Thank you very much for your suggestion.
>
> Yes, your suggested code worked. But, actually, I have data from 3rd
> January 1994 to 3rd August 2017 for very large (i.e. for 84 countries)
> sample. From this, I have given the example of the years up to 2000. Before
> applying the same code for the long 24 years, I want to learn the logic
> behind the code. Actually, some part of the code is not understandable to
> me which I mentioned in the bold letter as follows.
>
> "spdat<-data.frame(
>   dates=paste(c(1:30,1:28),c(rep(1,30),rep(2,28)),rep(1994,58),sep="-"),
>   returnA=sample(*15:50*,58,TRUE),returnB=sample(*10:45*,58,TRUE))"
>
> A. Here, I need to define the no. of days in a month, and the no. of
> countries name separately, right? But, what is meant by 15:50, and 10:45 in
> return A, and B respectively?
>

To paraphrase Donald Trump, this is FAKE DATA! I have no idea what the real
values of return are, so I made them up using the "sample" function.
However, this is not meant to mislead anyone, just to show how whatever
numbers are in your data can be used in calculations. The colon (":")
operator creates a sequence of numbers starting with the one to the left
and ending with the one to the right.

>
> "# if you only have to get the monthly averages, it can be done this way
> spdat$month<-sapply(strsplit(spdat$dates,"-"),"["*,2*)
> spdat$year<-sapply(strsplit(spdat$dates,"-"),"[",*3*)"
>
> B. Here, I need to define the no. of months, and years separately, right?
> or else what 2, and 3 (in bold) indicates?
>

To get the grouping variable of sequential months that you want, you only
need the month and year values of the dates in the first column. First I
used the "strsplit" function to split the date field at the hyphens, then
used "sapply" to extract ("[") the second (month) and third (year) parts as
two new columns. Because you have more than one year of data, you need the
year values or you will group all Januarys, all Februarys and so on. Notice
how I pass both of the new columns as a list (a data frame is a type of
list) in the call to get the mean of each month.

>
> "# get the averages by month and year - is this correct?
> monthlyA<-by(*spdat$returnA*,spdat[,c("month","year")],mean)
> monthlyB<-by(*spdat$returnB*,spdat[,c("month","year")],mean)"
>
> C. From this part, I got the exact average values of both January and
> February of 1994 for country A, and B. But, in code, I have a query that I
> need to define  spdat$returnA, and  spdat$returnB separately before writing
> this code, right? Like this, I need to define for each 84 countries
> separately with their respective number of months, and years before writing
> this code, right?
>

I don't think so. Because I don't know what your data looks like, I am
guessing that for each row, it has columns for each of the 84 countries. I
don't know what these columns are named, either. Maybe:

date             Australia   Belarus   ...    Zambia
01/01/1994   20             21                 22
...


> Yes, after obtaining the monthly average for each country's data, I need
> to use them for further calculations. So, I want to export the result to
> excel. But, until understanding the code, I think I willn't able to apply
> for the entire sample, and cannot be able to discuss the format of the
> resulted column to export to excel.
>

Say that we perform the grouped mean calculation for the first two country
columns like this:
monmeans<-sapply(spdat[,2:3],by,spdat[,c("month","year")],mean)
monmeans
    Australia  Belarus
[1,]  29.70000 30.43333
[2,]  34.17857 27.39286

We are presented with a 2x2 matrix of monthly means in just the format
someone might use for importing into Excel. The first row is January 1994,
the second February 1994 and so on. By expanding the columns to include all
the countries in your data, You should have the result you want.

Jim

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Query about calculating the monthly average of daily data columns

Subhamitra Patra
Dear Sir,

Yes, I understood the logic. But, still, I have a few queries that I
mentioned below your answers.

"# if you only have to get the monthly averages, it can be done this way
> spdat$month<-sapply(strsplit(spdat$dates,"-"),"["*,2*)
> spdat$year<-sapply(strsplit(spdat$dates,"-"),"[",*3*)"
>
> B. Here, I need to define the no. of months, and years separately, right?
> or else what 2, and 3 (in bold) indicates?
>

To get the grouping variable of sequential months that you want, you only
need the month and year values of the dates in the first column. First I
used the "strsplit" function to split the date field at the hyphens, then
used "sapply" to extract ("[") the second (month) and *third (year)* parts
as two new columns. Because you have more than one year of data, you need
the year values or you will group all Januarys, all Februarys and so on.
Notice how I pass both of the new columns as a list (a data frame is a type
of list) in the call to get the mean of each month.

1. Here, as per my understanding, the "3" indicates the 3rd year, right?
But, you showed an average for 2 months of the same year. Then, what "3" in
the  spdat$year object indicate?


C. From this part, I got the exact average values of both January and
> February of 1994 for country A, and B. But, in code, I have a query that I
> need to define  spdat$returnA, and  spdat$returnB separately before writing
> this code, right? Like this, I need to define for each 84 countries
> separately with their respective number of months, and years before writing
> this code, right?
>

I don't think so. Because I don't know what your data looks like, I am
guessing that for each row, it has columns for each of the 84 countries. I
don't know what these columns are named, either. Maybe:

date             Australia   Belarus   ...    Zambia
01/01/1994   20             21                 22
...

Here, due to my misunderstanding about the code, I was wrong. But, what
data structure you guessed, it is absolutely right that for each row, I
have columns for each of the 84 countries. So, I think, I need to define
the date column with no. of months, and years once for all the countries.
Therefore, I got my answer to the first and third question in the previous
email (what you suggested) that I no need to define the column of each
country, as the date, and no. of observations are same for all countries.
But, the no. of days are different for each month, and similarly, for each
year. So, I think I need to define date for each year separately.  Hence, I
have given an example of 12 months, for 2 years (i.e. 1994, and 1995), and
have written the following code. Please correct me in case I am wrong.

 spdat<-data.frame(

dates=paste(c(1:21,1:20,1:23,1:21,1:22,1:22,1:21,1:23,1:22,1:21,1:22,1:22),c(rep(1,21),rep(2,20),
rep(3,23), rep(4,21),
rep(5,22),rep(6,22),rep(7,21),rep(8,23),rep(9,22),rep(10,21),rep(11,22),rep(12,22)
),rep(1994,260)
 dates1=
paste(c(1:22,1:20,1:23,1:20,1:23,1:22,1:21,1:23,1:21,1:22,1:22,1:21),c(rep(1,22),rep(2,20),
rep(3,23), rep(4,20),
rep(5,23),rep(6,22),rep(7,21),rep(8,23),rep(9,21),rep(10,21),rep(11,22),rep(12,21)
),rep(1995,259) ,sep="-")

Concerning the exporting of structure of the dataset to excel, I will have
12*84 matrix. But, please suggest me the way to proceed for the large
sample. I have mentioned below what I understood from your code. Please
correct me if I am wrong.
1. I need to define the date for each year as the no. of days in each month
are different for each year (as mentioned in my above code). For instance,
in my data file, Jan 1994 has 21 days while Jan 1995 has 22 days.
2. Need to define the date column as character.
3. Need to define the monthly average for each month, and year. So, now
code will be as follows.
spdat$month<-sapply(strsplit(spdat$dates,"-"),"[",2,3,4,5,6,7,8,9,10,11,12)
  %%%%As I need all months average sequentially.
spdat$year<-sapply(strsplit(spdat$dates,"-"),"[",3)

Here, this meaning of "3", I am really unable to get.

4. Need to define each country with each month and year as mentioned in the
last part of your code.

Please suggest me in this regard.

Thank you.







[image: Mailtrack]
<https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&>
Sender
notified by
Mailtrack
<https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&>
09/13/19,
06:41:41 PM

On Fri, Sep 13, 2019 at 4:24 PM Jim Lemon <[hidden email]> wrote:

> Hi Subhamitra,
> I'll try to write my answers adjacent to your questions below.
>
> On Fri, Sep 13, 2019 at 6:08 PM Subhamitra Patra <
> [hidden email]> wrote:
>
>> Dear Sir,
>>
>> Thank you very much for your suggestion.
>>
>> Yes, your suggested code worked. But, actually, I have data from 3rd
>> January 1994 to 3rd August 2017 for very large (i.e. for 84 countries)
>> sample. From this, I have given the example of the years up to 2000. Before
>> applying the same code for the long 24 years, I want to learn the logic
>> behind the code. Actually, some part of the code is not understandable to
>> me which I mentioned in the bold letter as follows.
>>
>> "spdat<-data.frame(
>>   dates=paste(c(1:30,1:28),c(rep(1,30),rep(2,28)),rep(1994,58),sep="-"),
>>   returnA=sample(*15:50*,58,TRUE),returnB=sample(*10:45*,58,TRUE))"
>>
>> A. Here, I need to define the no. of days in a month, and the no. of
>> countries name separately, right? But, what is meant by 15:50, and 10:45 in
>> return A, and B respectively?
>>
>
> To paraphrase Donald Trump, this is FAKE DATA! I have no idea what the
> real values of return are, so I made them up using the "sample" function.
> However, this is not meant to mislead anyone, just to show how whatever
> numbers are in your data can be used in calculations. The colon (":")
> operator creates a sequence of numbers starting with the one to the left
> and ending with the one to the right.
>
>>
>> "# if you only have to get the monthly averages, it can be done this way
>> spdat$month<-sapply(strsplit(spdat$dates,"-"),"["*,2*)
>> spdat$year<-sapply(strsplit(spdat$dates,"-"),"[",*3*)"
>>
>> B. Here, I need to define the no. of months, and years separately, right?
>> or else what 2, and 3 (in bold) indicates?
>>
>
> To get the grouping variable of sequential months that you want, you only
> need the month and year values of the dates in the first column. First I
> used the "strsplit" function to split the date field at the hyphens, then
> used "sapply" to extract ("[") the second (month) and third (year) parts as
> two new columns. Because you have more than one year of data, you need the
> year values or you will group all Januarys, all Februarys and so on. Notice
> how I pass both of the new columns as a list (a data frame is a type of
> list) in the call to get the mean of each month.
>
>>
>> "# get the averages by month and year - is this correct?
>> monthlyA<-by(*spdat$returnA*,spdat[,c("month","year")],mean)
>> monthlyB<-by(*spdat$returnB*,spdat[,c("month","year")],mean)"
>>
>> C. From this part, I got the exact average values of both January and
>> February of 1994 for country A, and B. But, in code, I have a query that I
>> need to define  spdat$returnA, and  spdat$returnB separately before writing
>> this code, right? Like this, I need to define for each 84 countries
>> separately with their respective number of months, and years before writing
>> this code, right?
>>
>
> I don't think so. Because I don't know what your data looks like, I am
> guessing that for each row, it has columns for each of the 84 countries. I
> don't know what these columns are named, either. Maybe:
>
> date             Australia   Belarus   ...    Zambia
> 01/01/1994   20             21                 22
> ...
>
>
>> Yes, after obtaining the monthly average for each country's data, I need
>> to use them for further calculations. So, I want to export the result to
>> excel. But, until understanding the code, I think I willn't able to apply
>> for the entire sample, and cannot be able to discuss the format of the
>> resulted column to export to excel.
>>
>
> Say that we perform the grouped mean calculation for the first two country
> columns like this:
> monmeans<-sapply(spdat[,2:3],by,spdat[,c("month","year")],mean)
> monmeans
>     Australia  Belarus
> [1,]  29.70000 30.43333
> [2,]  34.17857 27.39286
>
> We are presented with a 2x2 matrix of monthly means in just the format
> someone might use for importing into Excel. The first row is January 1994,
> the second February 1994 and so on. By expanding the columns to include all
> the countries in your data, You should have the result you want.
>
> Jim
>


--
*Best Regards,*
*Subhamitra Patra*
*Phd. Research Scholar*
*Department of Humanities and Social Sciences*
*Indian Institute of Technology, Kharagpur*
*INDIA*

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Query about calculating the monthly average of daily data columns

PIKAL Petr
Hi

I am almost 100% sure that you would spare yourself much trouble if you changed your date column to real date

?as.Date

reshape your wide format to long one
library(reshape2)
?melt

to get 3 column data.frame with one date column, one country column and one value column

use ?aggregate and ?format to get summary value

something like
aggregate(value column, list(format(date column, "%m.%Y"), country column), mean)

But if you insist to scratch your left ear with right hand accross your head, you could continue your way.

Cheers
Petr

> -----Original Message-----
> From: R-help <[hidden email]> On Behalf Of Subhamitra
> Patra
> Sent: Friday, September 13, 2019 3:20 PM
> To: Jim Lemon <[hidden email]>; r-help mailing list <r-help@r-
> project.org>
> Subject: Re: [R] Query about calculating the monthly average of daily data
> columns
>
> Dear Sir,
>
> Yes, I understood the logic. But, still, I have a few queries that I mentioned
> below your answers.
>
> "# if you only have to get the monthly averages, it can be done this way
> > spdat$month<-sapply(strsplit(spdat$dates,"-"),"["*,2*)
> > spdat$year<-sapply(strsplit(spdat$dates,"-"),"[",*3*)"
> >
> > B. Here, I need to define the no. of months, and years separately, right?
> > or else what 2, and 3 (in bold) indicates?
> >
>
> To get the grouping variable of sequential months that you want, you only
> need the month and year values of the dates in the first column. First I used
> the "strsplit" function to split the date field at the hyphens, then used
> "sapply" to extract ("[") the second (month) and *third (year)* parts as two
> new columns. Because you have more than one year of data, you need the
> year values or you will group all Januarys, all Februarys and so on.
> Notice how I pass both of the new columns as a list (a data frame is a type of
> list) in the call to get the mean of each month.
>
> 1. Here, as per my understanding, the "3" indicates the 3rd year, right?
> But, you showed an average for 2 months of the same year. Then, what "3"
> in the  spdat$year object indicate?
>
>
> C. From this part, I got the exact average values of both January and
> > February of 1994 for country A, and B. But, in code, I have a query
> > that I need to define  spdat$returnA, and  spdat$returnB separately
> > before writing this code, right? Like this, I need to define for each
> > 84 countries separately with their respective number of months, and
> > years before writing this code, right?
> >
>
> I don't think so. Because I don't know what your data looks like, I am
> guessing that for each row, it has columns for each of the 84 countries. I
> don't know what these columns are named, either. Maybe:
>
> date             Australia   Belarus   ...    Zambia
> 01/01/1994   20             21                 22
> ...
>
> Here, due to my misunderstanding about the code, I was wrong. But, what
> data structure you guessed, it is absolutely right that for each row, I have
> columns for each of the 84 countries. So, I think, I need to define the date
> column with no. of months, and years once for all the countries.
> Therefore, I got my answer to the first and third question in the previous
> email (what you suggested) that I no need to define the column of each
> country, as the date, and no. of observations are same for all countries.
> But, the no. of days are different for each month, and similarly, for each
> year. So, I think I need to define date for each year separately.  Hence, I have
> given an example of 12 months, for 2 years (i.e. 1994, and 1995), and have
> written the following code. Please correct me in case I am wrong.
>
>  spdat<-data.frame(
>
> dates=paste(c(1:21,1:20,1:23,1:21,1:22,1:22,1:21,1:23,1:22,1:21,1:22,1:22),c(r
> ep(1,21),rep(2,20),
> rep(3,23), rep(4,21),
> rep(5,22),rep(6,22),rep(7,21),rep(8,23),rep(9,22),rep(10,21),rep(11,22),rep(12
> ,22)
> ),rep(1994,260)
>  dates1=
> paste(c(1:22,1:20,1:23,1:20,1:23,1:22,1:21,1:23,1:21,1:22,1:22,1:21),c(rep(1,2
> 2),rep(2,20),
> rep(3,23), rep(4,20),
> rep(5,23),rep(6,22),rep(7,21),rep(8,23),rep(9,21),rep(10,21),rep(11,22),rep(12
> ,21)
> ),rep(1995,259) ,sep="-")
>
> Concerning the exporting of structure of the dataset to excel, I will have
> 12*84 matrix. But, please suggest me the way to proceed for the large
> sample. I have mentioned below what I understood from your code. Please
> correct me if I am wrong.
> 1. I need to define the date for each year as the no. of days in each month
> are different for each year (as mentioned in my above code). For instance, in
> my data file, Jan 1994 has 21 days while Jan 1995 has 22 days.
> 2. Need to define the date column as character.
> 3. Need to define the monthly average for each month, and year. So, now
> code will be as follows.
> spdat$month<-sapply(strsplit(spdat$dates,"-"),"[",2,3,4,5,6,7,8,9,10,11,12)
>   %%%%As I need all months average sequentially.
> spdat$year<-sapply(strsplit(spdat$dates,"-"),"[",3)
>
> Here, this meaning of "3", I am really unable to get.
>
> 4. Need to define each country with each month and year as mentioned in
> the last part of your code.
>
> Please suggest me in this regard.
>
> Thank you.
>
>
>
>
>
>
>
> [image: Mailtrack]
> <https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_ca
> mpaign=signaturevirality5&>
> Sender
> notified by
> Mailtrack
> <https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_ca
> mpaign=signaturevirality5&>
> 09/13/19,
> 06:41:41 PM
>
> On Fri, Sep 13, 2019 at 4:24 PM Jim Lemon <[hidden email]> wrote:
>
> > Hi Subhamitra,
> > I'll try to write my answers adjacent to your questions below.
> >
> > On Fri, Sep 13, 2019 at 6:08 PM Subhamitra Patra <
> > [hidden email]> wrote:
> >
> >> Dear Sir,
> >>
> >> Thank you very much for your suggestion.
> >>
> >> Yes, your suggested code worked. But, actually, I have data from 3rd
> >> January 1994 to 3rd August 2017 for very large (i.e. for 84
> >> countries) sample. From this, I have given the example of the years
> >> up to 2000. Before applying the same code for the long 24 years, I
> >> want to learn the logic behind the code. Actually, some part of the
> >> code is not understandable to me which I mentioned in the bold letter as
> follows.
> >>
> >> "spdat<-data.frame(
> >>   dates=paste(c(1:30,1:28),c(rep(1,30),rep(2,28)),rep(1994,58),sep="-"),
> >>   returnA=sample(*15:50*,58,TRUE),returnB=sample(*10:45*,58,TRUE))"
> >>
> >> A. Here, I need to define the no. of days in a month, and the no. of
> >> countries name separately, right? But, what is meant by 15:50, and
> >> 10:45 in return A, and B respectively?
> >>
> >
> > To paraphrase Donald Trump, this is FAKE DATA! I have no idea what the
> > real values of return are, so I made them up using the "sample" function.
> > However, this is not meant to mislead anyone, just to show how
> > whatever numbers are in your data can be used in calculations. The
> > colon (":") operator creates a sequence of numbers starting with the
> > one to the left and ending with the one to the right.
> >
> >>
> >> "# if you only have to get the monthly averages, it can be done this
> >> way
> >> spdat$month<-sapply(strsplit(spdat$dates,"-"),"["*,2*)
> >> spdat$year<-sapply(strsplit(spdat$dates,"-"),"[",*3*)"
> >>
> >> B. Here, I need to define the no. of months, and years separately, right?
> >> or else what 2, and 3 (in bold) indicates?
> >>
> >
> > To get the grouping variable of sequential months that you want, you
> > only need the month and year values of the dates in the first column.
> > First I used the "strsplit" function to split the date field at the
> > hyphens, then used "sapply" to extract ("[") the second (month) and
> > third (year) parts as two new columns. Because you have more than one
> > year of data, you need the year values or you will group all Januarys,
> > all Februarys and so on. Notice how I pass both of the new columns as
> > a list (a data frame is a type of
> > list) in the call to get the mean of each month.
> >
> >>
> >> "# get the averages by month and year - is this correct?
> >> monthlyA<-by(*spdat$returnA*,spdat[,c("month","year")],mean)
> >> monthlyB<-by(*spdat$returnB*,spdat[,c("month","year")],mean)"
> >>
> >> C. From this part, I got the exact average values of both January and
> >> February of 1994 for country A, and B. But, in code, I have a query
> >> that I need to define  spdat$returnA, and  spdat$returnB separately
> >> before writing this code, right? Like this, I need to define for each
> >> 84 countries separately with their respective number of months, and
> >> years before writing this code, right?
> >>
> >
> > I don't think so. Because I don't know what your data looks like, I am
> > guessing that for each row, it has columns for each of the 84
> > countries. I don't know what these columns are named, either. Maybe:
> >
> > date             Australia   Belarus   ...    Zambia
> > 01/01/1994   20             21                 22
> > ...
> >
> >
> >> Yes, after obtaining the monthly average for each country's data, I
> >> need to use them for further calculations. So, I want to export the
> >> result to excel. But, until understanding the code, I think I willn't
> >> able to apply for the entire sample, and cannot be able to discuss
> >> the format of the resulted column to export to excel.
> >>
> >
> > Say that we perform the grouped mean calculation for the first two
> > country columns like this:
> > monmeans<-sapply(spdat[,2:3],by,spdat[,c("month","year")],mean)
> > monmeans
> >     Australia  Belarus
> > [1,]  29.70000 30.43333
> > [2,]  34.17857 27.39286
> >
> > We are presented with a 2x2 matrix of monthly means in just the format
> > someone might use for importing into Excel. The first row is January
> > 1994, the second February 1994 and so on. By expanding the columns to
> > include all the countries in your data, You should have the result you want.
> >
> > Jim
> >
>
>
> --
> *Best Regards,*
> *Subhamitra Patra*
> *Phd. Research Scholar*
> *Department of Humanities and Social Sciences* *Indian Institute of
> Technology, Kharagpur*
> *INDIA*
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.
Osobní údaje: Informace o zpracování a ochraně osobních údajů obchodních partnerů PRECHEZA a.s. jsou zveřejněny na: https://www.precheza.cz/zasady-ochrany-osobnich-udaju/ | Information about processing and protection of business partner’s personal data are available on website: https://www.precheza.cz/en/personal-data-protection-principles/
Důvěrnost: Tento e-mail a jakékoliv k němu připojené dokumenty jsou důvěrné a podléhají tomuto právně závaznému prohláąení o vyloučení odpovědnosti: https://www.precheza.cz/01-dovetek/ | This email and any documents attached to it may be confidential and are subject to the legally binding disclaimer: https://www.precheza.cz/en/01-disclaimer/

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Query about calculating the monthly average of daily data columns

Subhamitra Patra
Dear PIKAL,

Thank you very much for your suggestion.

I tried your previous suggested code and getting the average value for each
month for both country A, and B. But in your recent email, you are
suggesting not to change the date column to real date. If I am going
through your recently suggested code, i.e.

 "aggregate(value column, list(format(date column, "%m.%Y"), country
column), mean)"

I am getting an Error that "*aggregate(value, list(format(date, "%m.%Y"),
country), mean) : **object 'value' not found"*.

Here, my query "*may I need to define the date column, country column, and
value column separately?"*

Further, I need something the average value result like below in the data
frame

Month       Country A   Country B
Jan 1994    26.66         35.78
Feb 1994    26.13         29.14

so that it will be easy for me to export to excel, and to use for the
further calculations.

Please suggest me in this regard.

Thank you.







[image: Mailtrack]
<https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&>
Sender
notified by
Mailtrack
<https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&>
09/13/19,
07:22:53 PM

On Fri, Sep 13, 2019 at 7:03 PM PIKAL Petr <[hidden email]> wrote:

> Hi
>
> I am almost 100% sure that you would spare yourself much trouble if you
> changed your date column to real date
>
> ?as.Date
>
> reshape your wide format to long one
> library(reshape2)
> ?melt
>
> to get 3 column data.frame with one date column, one country column and
> one value column
>
> use ?aggregate and ?format to get summary value
>
> something like
> aggregate(value column, list(format(date column, "%m.%Y"), country
> column), mean)
>
> But if you insist to scratch your left ear with right hand accross your
> head, you could continue your way.
>
> Cheers
> Petr
>
> > -----Original Message-----
> > From: R-help <[hidden email]> On Behalf Of Subhamitra
> > Patra
> > Sent: Friday, September 13, 2019 3:20 PM
> > To: Jim Lemon <[hidden email]>; r-help mailing list <r-help@r-
> > project.org>
> > Subject: Re: [R] Query about calculating the monthly average of daily
> data
> > columns
> >
> > Dear Sir,
> >
> > Yes, I understood the logic. But, still, I have a few queries that I
> mentioned
> > below your answers.
> >
> > "# if you only have to get the monthly averages, it can be done this way
> > > spdat$month<-sapply(strsplit(spdat$dates,"-"),"["*,2*)
> > > spdat$year<-sapply(strsplit(spdat$dates,"-"),"[",*3*)"
> > >
> > > B. Here, I need to define the no. of months, and years separately,
> right?
> > > or else what 2, and 3 (in bold) indicates?
> > >
> >
> > To get the grouping variable of sequential months that you want, you only
> > need the month and year values of the dates in the first column. First I
> used
> > the "strsplit" function to split the date field at the hyphens, then used
> > "sapply" to extract ("[") the second (month) and *third (year)* parts as
> two
> > new columns. Because you have more than one year of data, you need the
> > year values or you will group all Januarys, all Februarys and so on.
> > Notice how I pass both of the new columns as a list (a data frame is a
> type of
> > list) in the call to get the mean of each month.
> >
> > 1. Here, as per my understanding, the "3" indicates the 3rd year, right?
> > But, you showed an average for 2 months of the same year. Then, what "3"
> > in the  spdat$year object indicate?
> >
> >
> > C. From this part, I got the exact average values of both January and
> > > February of 1994 for country A, and B. But, in code, I have a query
> > > that I need to define  spdat$returnA, and  spdat$returnB separately
> > > before writing this code, right? Like this, I need to define for each
> > > 84 countries separately with their respective number of months, and
> > > years before writing this code, right?
> > >
> >
> > I don't think so. Because I don't know what your data looks like, I am
> > guessing that for each row, it has columns for each of the 84 countries.
> I
> > don't know what these columns are named, either. Maybe:
> >
> > date             Australia   Belarus   ...    Zambia
> > 01/01/1994   20             21                 22
> > ...
> >
> > Here, due to my misunderstanding about the code, I was wrong. But, what
> > data structure you guessed, it is absolutely right that for each row, I
> have
> > columns for each of the 84 countries. So, I think, I need to define the
> date
> > column with no. of months, and years once for all the countries.
> > Therefore, I got my answer to the first and third question in the
> previous
> > email (what you suggested) that I no need to define the column of each
> > country, as the date, and no. of observations are same for all countries.
> > But, the no. of days are different for each month, and similarly, for
> each
> > year. So, I think I need to define date for each year separately.
> Hence, I have
> > given an example of 12 months, for 2 years (i.e. 1994, and 1995), and
> have
> > written the following code. Please correct me in case I am wrong.
> >
> >  spdat<-data.frame(
> >
> >
> dates=paste(c(1:21,1:20,1:23,1:21,1:22,1:22,1:21,1:23,1:22,1:21,1:22,1:22),c(r
> > ep(1,21),rep(2,20),
> > rep(3,23), rep(4,21),
> >
> rep(5,22),rep(6,22),rep(7,21),rep(8,23),rep(9,22),rep(10,21),rep(11,22),rep(12
> > ,22)
> > ),rep(1994,260)
> >  dates1=
> >
> paste(c(1:22,1:20,1:23,1:20,1:23,1:22,1:21,1:23,1:21,1:22,1:22,1:21),c(rep(1,2
> > 2),rep(2,20),
> > rep(3,23), rep(4,20),
> >
> rep(5,23),rep(6,22),rep(7,21),rep(8,23),rep(9,21),rep(10,21),rep(11,22),rep(12
> > ,21)
> > ),rep(1995,259) ,sep="-")
> >
> > Concerning the exporting of structure of the dataset to excel, I will
> have
> > 12*84 matrix. But, please suggest me the way to proceed for the large
> > sample. I have mentioned below what I understood from your code. Please
> > correct me if I am wrong.
> > 1. I need to define the date for each year as the no. of days in each
> month
> > are different for each year (as mentioned in my above code). For
> instance, in
> > my data file, Jan 1994 has 21 days while Jan 1995 has 22 days.
> > 2. Need to define the date column as character.
> > 3. Need to define the monthly average for each month, and year. So, now
> > code will be as follows.
> >
> spdat$month<-sapply(strsplit(spdat$dates,"-"),"[",2,3,4,5,6,7,8,9,10,11,12)
> >   %%%%As I need all months average sequentially.
> > spdat$year<-sapply(strsplit(spdat$dates,"-"),"[",3)
> >
> > Here, this meaning of "3", I am really unable to get.
> >
> > 4. Need to define each country with each month and year as mentioned in
> > the last part of your code.
> >
> > Please suggest me in this regard.
> >
> > Thank you.
> >
> >
> >
> >
> >
> >
> >
> > [image: Mailtrack]
> > <https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_ca
> > mpaign=signaturevirality5&>
> > Sender
> > notified by
> > Mailtrack
> > <https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_ca
> > mpaign=signaturevirality5&>
> > 09/13/19,
> > 06:41:41 PM
> >
> > On Fri, Sep 13, 2019 at 4:24 PM Jim Lemon <[hidden email]> wrote:
> >
> > > Hi Subhamitra,
> > > I'll try to write my answers adjacent to your questions below.
> > >
> > > On Fri, Sep 13, 2019 at 6:08 PM Subhamitra Patra <
> > > [hidden email]> wrote:
> > >
> > >> Dear Sir,
> > >>
> > >> Thank you very much for your suggestion.
> > >>
> > >> Yes, your suggested code worked. But, actually, I have data from 3rd
> > >> January 1994 to 3rd August 2017 for very large (i.e. for 84
> > >> countries) sample. From this, I have given the example of the years
> > >> up to 2000. Before applying the same code for the long 24 years, I
> > >> want to learn the logic behind the code. Actually, some part of the
> > >> code is not understandable to me which I mentioned in the bold letter
> as
> > follows.
> > >>
> > >> "spdat<-data.frame(
> > >>
>  dates=paste(c(1:30,1:28),c(rep(1,30),rep(2,28)),rep(1994,58),sep="-"),
> > >>   returnA=sample(*15:50*,58,TRUE),returnB=sample(*10:45*,58,TRUE))"
> > >>
> > >> A. Here, I need to define the no. of days in a month, and the no. of
> > >> countries name separately, right? But, what is meant by 15:50, and
> > >> 10:45 in return A, and B respectively?
> > >>
> > >
> > > To paraphrase Donald Trump, this is FAKE DATA! I have no idea what the
> > > real values of return are, so I made them up using the "sample"
> function.
> > > However, this is not meant to mislead anyone, just to show how
> > > whatever numbers are in your data can be used in calculations. The
> > > colon (":") operator creates a sequence of numbers starting with the
> > > one to the left and ending with the one to the right.
> > >
> > >>
> > >> "# if you only have to get the monthly averages, it can be done this
> > >> way
> > >> spdat$month<-sapply(strsplit(spdat$dates,"-"),"["*,2*)
> > >> spdat$year<-sapply(strsplit(spdat$dates,"-"),"[",*3*)"
> > >>
> > >> B. Here, I need to define the no. of months, and years separately,
> right?
> > >> or else what 2, and 3 (in bold) indicates?
> > >>
> > >
> > > To get the grouping variable of sequential months that you want, you
> > > only need the month and year values of the dates in the first column.
> > > First I used the "strsplit" function to split the date field at the
> > > hyphens, then used "sapply" to extract ("[") the second (month) and
> > > third (year) parts as two new columns. Because you have more than one
> > > year of data, you need the year values or you will group all Januarys,
> > > all Februarys and so on. Notice how I pass both of the new columns as
> > > a list (a data frame is a type of
> > > list) in the call to get the mean of each month.
> > >
> > >>
> > >> "# get the averages by month and year - is this correct?
> > >> monthlyA<-by(*spdat$returnA*,spdat[,c("month","year")],mean)
> > >> monthlyB<-by(*spdat$returnB*,spdat[,c("month","year")],mean)"
> > >>
> > >> C. From this part, I got the exact average values of both January and
> > >> February of 1994 for country A, and B. But, in code, I have a query
> > >> that I need to define  spdat$returnA, and  spdat$returnB separately
> > >> before writing this code, right? Like this, I need to define for each
> > >> 84 countries separately with their respective number of months, and
> > >> years before writing this code, right?
> > >>
> > >
> > > I don't think so. Because I don't know what your data looks like, I am
> > > guessing that for each row, it has columns for each of the 84
> > > countries. I don't know what these columns are named, either. Maybe:
> > >
> > > date             Australia   Belarus   ...    Zambia
> > > 01/01/1994   20             21                 22
> > > ...
> > >
> > >
> > >> Yes, after obtaining the monthly average for each country's data, I
> > >> need to use them for further calculations. So, I want to export the
> > >> result to excel. But, until understanding the code, I think I willn't
> > >> able to apply for the entire sample, and cannot be able to discuss
> > >> the format of the resulted column to export to excel.
> > >>
> > >
> > > Say that we perform the grouped mean calculation for the first two
> > > country columns like this:
> > > monmeans<-sapply(spdat[,2:3],by,spdat[,c("month","year")],mean)
> > > monmeans
> > >     Australia  Belarus
> > > [1,]  29.70000 30.43333
> > > [2,]  34.17857 27.39286
> > >
> > > We are presented with a 2x2 matrix of monthly means in just the format
> > > someone might use for importing into Excel. The first row is January
> > > 1994, the second February 1994 and so on. By expanding the columns to
> > > include all the countries in your data, You should have the result you
> want.
> > >
> > > Jim
> > >
> >
> >
> > --
> > *Best Regards,*
> > *Subhamitra Patra*
> > *Phd. Research Scholar*
> > *Department of Humanities and Social Sciences* *Indian Institute of
> > Technology, Kharagpur*
> > *INDIA*
> >
> > [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-
> > guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> Osobní údaje: Informace o zpracování a ochraně osobních údajů obchodních
> partnerů PRECHEZA a.s. jsou zveřejněny na:
> https://www.precheza.cz/zasady-ochrany-osobnich-udaju/ | Information
> about processing and protection of business partner’s personal data are
> available on website:
> https://www.precheza.cz/en/personal-data-protection-principles/
> Důvěrnost: Tento e-mail a jakékoliv k němu připojené dokumenty jsou
> důvěrné a podléhají tomuto právně závaznému prohláąení o vyloučení
> odpovědnosti: https://www.precheza.cz/01-dovetek/ | This email and any
> documents attached to it may be confidential and are subject to the legally
> binding disclaimer: https://www.precheza.cz/en/01-disclaimer/
>
>

--
*Best Regards,*
*Subhamitra Patra*
*Phd. Research Scholar*
*Department of Humanities and Social Sciences*
*Indian Institute of Technology, Kharagpur*
*INDIA*

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Query about calculating the monthly average of daily data columns

Jim Lemon-4
In reply to this post by Subhamitra Patra
Sorry, forgot to include the list.

On Sat, Sep 14, 2019 at 10:27 AM Jim Lemon <[hidden email]> wrote:

>
> See inline
>
> On Fri, Sep 13, 2019 at 11:20 PM Subhamitra Patra <[hidden email]> wrote:
>>
>> Dear Sir,
>>
>> Yes, I understood the logic. But, still, I have a few queries that I mentioned below your answers.
>>
>>> "# if you only have to get the monthly averages, it can be done this way
>>> spdat$month<-sapply(strsplit(spdat$dates,"-"),"[",2)
>>> spdat$year<-sapply(strsplit(spdat$dates,"-"),"[",3)"
>>>
>>> B. Here, I need to define the no. of months, and years separately, right? or else what 2, and 3 (in bold) indicates?
>>
>>
>> To get the grouping variable of sequential months that you want, you only need the month and year values of the dates in the first column. First I used the "strsplit" function to split the date field at the hyphens, then used "sapply" to extract ("[") the second (month) and third (year) parts as two new columns. Because you have more than one year of data, you need the year values or you will group all Januarys, all Februarys and so on. Notice how I pass both of the new columns as a list (a data frame is a type of list) in the call to get the mean of each month.
>>
>> 1. Here, as per my understanding, the "3" indicates the 3rd year, right? But, you showed an average for 2 months of the same year. Then, what "3" in the  spdat$year object indicate?
>
>
> No, as I explained in the initial email and below, the "strsplit" function takes one or more strings (your dates) and breaks them at the specified character ("-"), So
>
> strsplit("1-1-1994","-")
> [[1]]
> [1] "1"    "1"    "1994"
>
> That is passed to the "sapply" function that applies the extraction ("[") operator to the result of "strsplit". The "3" indicates that you want to extract the third element, in this case, the year.
>
> > sapply(strsplit("1-1-1994","-"),"[",3)
> [1] "1994"
>
> So by splitting the dates and extracting the second (month) and third (year) element from each date, we have all the information needed to create a grouping variable for monthly averages.
>
>>
>>
>>> C. From this part, I got the exact average values of both January and February of 1994 for country A, and B. But, in code, I have a query that I need to define  spdat$returnA, and  spdat$returnB separately before writing this code, right? Like this, I need to define for each 84 countries separately with their respective number of months, and years before writing this code, right?
>>
>>
>> I don't think so. Because I don't know what your data looks like, I am guessing that for each row, it has columns for each of the 84 countries. I don't know what these columns are named, either. Maybe:
>>
>> date             Australia   Belarus   ...    Zambia
>> 01/01/1994   20             21                 22
>> ...
>>
>> Here, due to my misunderstanding about the code, I was wrong. But, what data structure you guessed, it is absolutely right that for each row, I have columns for each of the 84 countries. So, I think, I need to define the date column with no. of months, and years once for all the countries. Therefore, I got my answer to the first and third question in the previous email (what you suggested) that I no need to define the column of each country, as the date, and no. of observations are same for all countries. But, the no. of days are different for each month, and similarly, for each year. So, I think I need to define date for each year separately.  Hence, I have given an example of 12 months, for 2 years (i.e. 1994, and 1995), and have written the following code. Please correct me in case I am wrong.
>>
>>  spdat<-data.frame(
>>   dates=paste(c(1:21,1:20,1:23,1:21,1:22,1:22,1:21,1:23,1:22,1:21,1:22,1:22),c(rep(1,21),rep(2,20),rep(3,23), rep(4,21), rep(5,22),rep(6,22),rep(7,21),rep(8,23),rep(9,22),rep(10,21),rep(11,22),rep(12,22)),rep(1994,260)
>>  dates1=paste(c(1:22,1:20,1:23,1:20,1:23,1:22,1:21,1:23,1:21,1:22,1:22,1:21),c(rep(1,22),rep(2,20),rep(3,23), rep(4,20), rep(5,23),rep(6,22),rep(7,21),rep(8,23),rep(9,21),rep(10,21),rep(11,22),rep(12,21)),rep(1995,259) ,sep="-")
>>
> First, you don't have to recreate the data that you already have. I did because I don't have it and have to guess what it looks like. Remember neither I nor any of the others who have offered help have your data or even a representative sample. If you tried the code above, you surely must know that it doesn't work. I could create code that would produce the dates from 1-1-1994 to 31/12/1995 or any other stretch you would like, but it would only confuse you more.  _You already have the dates in your data file._ What I have shown you is how to use those dates to create the grouping variable that you want.
>
>> Concerning the exporting of structure of the dataset to excel, I will have 12*84 matrix. But, please suggest me the way to proceed for the large sample. I have mentioned below what I understood from your code. Please correct me if I am wrong.
>> 1. I need to define the date for each year as the no. of days in each month are different for each year (as mentioned in my above code). For instance, in my data file, Jan 1994 has 21 days while Jan 1995 has 22 days.
>> 2. Need to define the date column as character.
>> 3. Need to define the monthly average for each month, and year. So, now code will be as follows.
>> spdat$month<-sapply(strsplit(spdat$dates,"-"),"[",2,3,4,5,6,7,8,9,10,11,12)    %%%%As I need all months average sequentially.
>> spdat$year<-sapply(strsplit(spdat$dates,"-"),"[",3)
>>
>> Here, this meaning of "3", I am really unable to get
>
>
> You have missed the point here, as above. I didn't mean to suggest that you had to recreate the dates that you already have. What I did was to show how you could use the dates that you already have to create a grouping variable for your calculation.
>>
>> .
>>
>> 4. Need to define each country with each month and year as mentioned in the last part of your code.
>>
> What I did was to add the month and year of each row as two separate columns of data. You should be able to see that by looking at spdat after the "strsplit/sapply" operation. Then you have in each row the returns from your 84 countries _and_ the month/year for that row. When I used the "by" function to get the monthly means from spdat, I showed you how to use the same code on your data frame, after creating the month and year columns, to get the monthly average return for the 84 countries for as many years as you have. As this should return a matrix of successive months as rows and countries as columns, you should easily be able to import this into Excel.
>
> The reason I did it this way was to illustrate how to define the grouping variable from the existing information and perform an easily understood calculation. I thought that using methods that automatically perform the operations I used might allow you to get the result without understanding how you had gotten it. Like the data I didn't have, I took a guess at how much my example would help you understand what was happening and give you the skills to do it yourself. I do hope that you get to the point where you are able to think of my example as unsophisticated, for I could have done it with dplyr/ts/reshape and the rest.
>
> Jim

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Query about calculating the monthly average of daily data columns

PIKAL Petr
In reply to this post by Subhamitra Patra
Original email did not come through (some problems with formating).

Hi

No, on contrary. I **am** suggesting to change date column to real date asi it is easy to handle with appropriate functions.

Here are some fake data

> str(spdat)
'data.frame':   260 obs. of  3 variables:
$ dates   : Date, format: "1995-01-01" "1995-01-02" "1995-01-03" "1995-01-04" ...
$ coutryA : num  0.188 0.405 -0.107 -0.596 -0.529 ...
$ countryB: num  9.4 10.76 11.24 8.26 10.71 ..

> head(spdat)
       dates    coutryA  countryB
1 1995-01-01  0.1875060  9.402851
2 1995-01-02  0.4045193 10.755112
3 1995-01-03 -0.1073904 11.243663
4 1995-01-04 -0.5959683  8.256424
5 1995-01-05 -0.5293772 10.705431
6 1995-01-06 -0.2228029 10.171461

First I melt it
spdat.m <- melt(spdat, id.var="dates")

> head(spdat.m)
       dates variable      value
1 1995-01-01  coutryA  0.1875060
2 1995-01-02  coutryA  0.4045193
3 1995-01-03  coutryA -0.1073904
4 1995-01-04  coutryA -0.5959683
5 1995-01-05  coutryA -0.5293772
6 1995-01-06  coutryA -0.2228029

I do aggregation

> spdat.ag <- aggregate(spdat.m$value, list(format(spdat.m$dates, "%m.%Y"), spdat.m$variable), mean)

And now I use dcast  to get required result.

> dcast(spdat.ag, Group.1~Group.2)
Using x as value column: use value.var to override.
   Group.1      coutryA  countryB
1  01.1995  0.098688137 10.177696
2  02.1995  0.352264682  9.609261
3  03.1995  0.155521876 10.043503
4  04.1995 -0.166092393 10.129844
5  05.1995  0.164665188 10.308275
6  06.1995  0.260633585 10.210129
7  07.1995  0.003671979 10.549016
8  08.1995  0.045295990 10.087435
9  09.1995 -0.145488206  9.689876
10 10.1995 -0.225645950  9.743744
11 11.1995  0.030273383 10.025435
12 12.1995  0.043557468 10.105626

Cheers
Petr

Here are the data.

> dput(spdat)
spdat  <- structure(list(dates = structure(c(9131, 9132, 9133, 9134, 9135,
9136, 9137, 9138, 9139, 9140, 9141, 9142, 9143, 9144, 9145, 9146,
9147, 9148, 9149, 9150, 9151, 9152, 9162, 9163, 9164, 9165, 9166,
9167, 9168, 9169, 9170, 9171, 9172, 9173, 9174, 9175, 9176, 9177,
9178, 9179, 9180, 9181, 9190, 9191, 9192, 9193, 9194, 9195, 9196,
9197, 9198, 9199, 9200, 9201, 9202, 9203, 9204, 9205, 9206, 9207,
9208, 9209, 9210, 9211, 9212, 9221, 9222, 9223, 9224, 9225, 9226,
9227, 9228, 9229, 9230, 9231, 9232, 9233, 9234, 9235, 9236, 9237,
9238, 9239, 9240, 9251, 9252, 9253, 9254, 9255, 9256, 9257, 9258,
9259, 9260, 9261, 9262, 9263, 9264, 9265, 9266, 9267, 9268, 9269,
9270, 9271, 9272, 9273, 9282, 9283, 9284, 9285, 9286, 9287, 9288,
9289, 9290, 9291, 9292, 9293, 9294, 9295, 9296, 9297, 9298, 9299,
9300, 9301, 9302, 9303, 9312, 9313, 9314, 9315, 9316, 9317, 9318,
9319, 9320, 9321, 9322, 9323, 9324, 9325, 9326, 9327, 9328, 9329,
9330, 9331, 9332, 9343, 9344, 9345, 9346, 9347, 9348, 9349, 9350,
9351, 9352, 9353, 9354, 9355, 9356, 9357, 9358, 9359, 9360, 9361,
9362, 9363, 9364, 9365, 9374, 9375, 9376, 9377, 9378, 9379, 9380,
9381, 9382, 9383, 9384, 9385, 9386, 9387, 9388, 9389, 9390, 9391,
9392, 9393, 9394, 9404, 9405, 9406, 9407, 9408, 9409, 9410, 9411,
9412, 9413, 9414, 9415, 9416, 9417, 9418, 9419, 9420, 9421, 9422,
9423, 9424, 9456, 9435, 9436, 9437, 9438, 9439, 9440, 9441, 9442,
9443, 9444, 9445, 9446, 9447, 9448, 9449, 9450, 9451, 9452, 9453,
9454, 9455, 9486, 9465, 9466, 9467, 9468, 9469, 9470, 9471, 9472,
9473, 9474, 9475, 9476, 9477, 9478, 9479, 9480, 9481, 9482, 9483,
9484, 9151), class = "Date"), coutryA = c(0.187506004416315,
0.404519257417805, -0.107390371811605, -0.595968278805544, -0.529377240936012,
-0.222802921207767, 0.413182392872818, 0.689673026532298, -1.2768723266992,
-0.506308625809406, 0.113859233745174, -0.0963423819877653, 0.323987304768398,
1.63846917270538, 0.893233423250338, 0.297732439150487, 0.949323101836486,
-0.599518074708052, 0.366372319197032, -2.25734971953878, -0.190971733204918,
-0.0874143568874351, 1.46699645184047, 0.00702170238687361, 0.11221346278474,
-0.8060359607624, 0.340842350476532, 0.798838328074708, 0.449214745851041,
-0.664972890558734, 0.521830282184173, -1.35020467264521, -0.95240631225826,
1.25607320999849, 1.57018988549379, 0.99477900888445, -0.936218787378207,
1.48489932847779, 0.529222943794807, 0.0995675049147771, 0.477770516727839,
1.64567253670186, -0.0212651530684566, 0.558952796713992, 0.0409979382929057,
0.428675380654606, 0.0919422583362682, -0.819694497340459, 1.23998830450888,
0.607498144489643, -1.27724580163097, 1.41634774644371, -0.579094515769707,
2.02039606694223, 0.0740478208705996, -1.69826944583929, -0.321482399813063,
-0.489198601400924, 2.0066750686302, -1.90624857752322, 0.46762405849973,
1.31264724137396, -0.0473627194710677, 0.141362267796145, 0.329709761206515,
0.518454586458572, -1.39489985851779, -0.388303591187678, -0.668922704543522,
0.0735115674875065, 1.30737242978235, 0.198503397980751, 0.257831448122427,
-1.31173539205588, -1.45147941969116, 0.359725782295977, 0.612882118056585,
-0.0733768753346202, -0.508349204402508, 1.35776663767231, 0.997807735669086,
-1.41717534266382, -0.894170593324238, -0.68578120845151, -0.211509378018794,
0.436738904337909, -1.46932152770435, 0.0817388759874159, -0.0389350881653141,
0.709198476466861, -0.963669144724435, -0.548607422521798, -0.896886885575286,
0.322231150840934, 1.37327611339939, 0.0310213133870952, 0.796577750757324,
-0.2010067423637, -0.241723752424226, 1.37547329580654, -1.15382202538982,
0.101454200596915, 0.273663839664217, 1.8315140887841, 1.86096518756473,
-0.536393730924719, -0.45845011727266, 1.10226256157127, -0.385596991265563,
3.20218061566932, -1.25865250042183, -0.13613128784276, 0.483329357746514,
-0.597187329618306, 0.710977603908319, -1.07945708269043, -0.477626236401394,
1.51034914684104, 2.35886426985999, -0.0250526828683629, -0.29439443478131,
0.665774016744828, 0.464027472251246, 0.226658374792016, -0.802597030454373,
0.825517059805602, -1.11293193130819, -1.27677400513873, 1.60776237113347,
1.12490009531342, 0.95767047134623, 0.0475745549797055, -0.0591587460876868,
-0.690617365240342, -1.62111622938126, -1.3545210707469, 1.8607927043106,
0.764367674339969, 1.49261525602638, 0.549570728337346, -1.29658399741794,
-1.6289903797869, 0.00573336252135834, 0.0300702149640632, 0.440810830115721,
0.663568666361326, -0.126685900835146, -0.00221628368438927,
0.815321995886579, -0.499280888368945, -0.271814047751667, -0.071025546459042,
1.73165491816826, -0.0294770299043331, 0.833605607221529, -0.670108794857159,
-0.303323318026829, 1.29039844459134, -0.818806702120603, -0.445515595649677,
-0.0128796557666887, 0.320923705586147, 0.230597275812536, -1.54009153212366,
-0.294702981688559, 0.581209734391958, 0.121384768986639, 0.502914098451111,
-1.59018268505718, -0.635101104166451, 1.48005776676403, -0.25631761189957,
0.171947814411552, 0.444646195980014, 0.172655758440111, -0.00432159794094836,
-0.549321974240026, 0.585055026451421, -1.22813371480849, 0.846807540195381,
0.319629441352597, 0.393525732059709, -1.40275675444594, 1.11062585584811,
0.214809571213853, -0.636432711800391, -0.283087127251573, -1.46385553207618,
0.436928676930225, -1.34231945433777, 0.451281957595763, -0.523155001924496,
-2.69416779107545, 1.5513477373689, 0.989632029400036, 1.34636075948993,
0.346147428691405, -0.464527560160041, 0.337233933370495, 1.11331396366389,
-1.00060600083316, -0.734784444487169, 1.40476315358621, 1.01671092179193,
-0.0144306250829694, -0.923555930346906, -1.02275966525015, 0.619422010219383,
0.603484309754755, -0.774553813657576, 0.0932792545556387, -0.651884521428279,
-0.61965612647073, -1.22104834441579, -1.31439612639271, -2.87707752518163,
-0.0343801084491906, -0.640678302378492, -1.38653452986558, 0.884963139028743,
-0.657454283462004, 0.462842665244993, -0.20881674837534, 0.6345884135548,
0.707165108434729, -0.162090928425892, -0.998662309785188, 1.3130254639318,
0.191890764940071, -0.0493619237876962, -0.55183232511689, 0.470263932874487,
-0.217088645692971, 0.231550037620628, -0.530406537266415, -0.616522469083808,
0.329347084038265, 1.49420692610475, 1.91750823142859, 0.753536143872474,
0.766584887163714, -0.259803384094296, -0.402463714097741, -0.0229799209735185,
-0.259677990559218, -1.41529707261105, 0.191362852138627, 1.54483266684747,
-1.17947655378489, -0.426265411073274, 0.723010460481118, 1.37405142869537,
-0.374771207936141, 0.0513905365832423, -0.369432731236118, -0.945441984794364,
0.179506648255554, 0.31971255725438, -1.25117960937319, 2.46228549580083
), countryB = c(9.4028512714591, 10.7551115504652, 11.2436629228434,
8.25642360904389, 10.7054313972395, 10.1714609666091, 10.3726975056206,
10.6594299429162, 8.56250595443296, 10.5612153841067, 8.07612112826519,
9.94704207511951, 11.392407273156, 10.4257690445268, 10.6339442533038,
10.5314883825356, 10.3506665399033, 10.2426403362978, 10.8437715647564,
10.8247200587034, 11.2449815064171, 9.2898697883112, 9.05418978124619,
10.6080277672463, 9.19882175737148, 11.3589722806948, 10.4139334238189,
9.44305216810892, 9.58426470056472, 8.84208362003176, 10.8125431356391,
7.71357872650814, 8.73526671289828, 10.714693958224, 9.49976972594189,
9.41896864941478, 7.33073349261249, 10.5268398982262, 9.92255671125184,
10.5665378092202, 10.5035704895405, 7.93682068228084, 10.882421050424,
9.3237610577468, 8.42128120513304, 9.46103753451174, 10.3472215515392,
11.0483414005193, 10.3421689244599, 7.85120280889754, 11.6327644046254,
9.57620745972098, 10.6553844719749, 10.8490250129346, 10.2742492933876,
9.55428072119304, 9.0976351049804, 10.0731951766966, 10.6956488509516,
11.1530744146062, 10.3496303024767, 9.52734980693306, 9.64478424894734,
9.28301632295047, 10.9568790570688, 11.6052870914912, 9.58530202776537,
11.1338134902295, 8.66685735149472, 11.0230863576875, 10.8000609212302,
10.6510296259782, 11.831292326569, 9.53836122448974, 9.55805411377422,
9.90336204189518, 9.36377040999133, 11.7041009433341, 9.95628297574831,
10.718111342931, 10.4562688422472, 8.85976383099186, 8.94085496683824,
8.19538394018434, 10.1058448260449, 9.70821289789561, 9.08757962570738,
10.6599997541876, 10.0521137258219, 9.9924295699559, 11.8730269098299,
11.2634470795907, 11.3854762443416, 9.56742053529845, 10.4101561978503,
9.53376547865009, 9.75410966995361, 9.92804558924886, 8.36231430067066,
10.7486459346681, 12.0143881312685, 11.0083060332839, 9.32820954213586,
10.8420346742049, 9.73064414798223, 10.7593902723319, 10.976622155215,
10.1039774975157, 8.36317871802524, 9.21809894958653, 10.1015362220683,
11.4655736295123, 9.65528297274543, 9.67844310028008, 10.1516820910267,
8.38764450852642, 10.163558398201, 11.1432463477237, 12.0509818193223,
10.9896913965091, 11.1772406550953, 9.14396687337779, 9.93338627749979,
10.9548864433126, 8.64911301751956, 11.706463972364, 11.1012846649741,
8.7805267197408, 11.5802098773954, 10.2268513542863, 10.3509617168731,
9.09646558899397, 11.2706666647314, 11.3984335011704, 11.4808985388742,
10.5608771133999, 10.3684356806175, 10.4815588822618, 10.5818867877558,
12.2561035284691, 8.6464271477849, 10.3412351841865, 10.7577574534162,
11.1124067479261, 9.91627943243343, 10.6356898895291, 10.2107566441478,
10.0672734202575, 10.2385787014999, 11.7112606160069, 10.0453801263575,
8.84654136100724, 10.2173421609193, 9.27919801705716, 10.4755578829547,
7.69340209082122, 9.24705253848083, 10.8415406794597, 8.69603117680965,
11.2589214416702, 10.5425642239737, 10.1389355042458, 9.17267675180435,
12.3052338002213, 10.0181674985326, 12.2715476751051, 9.64516268052739,
10.6305299379912, 10.1829347684655, 9.97983942366781, 10.2559847744715,
10.3092266661814, 9.75215330673072, 10.250464278709, 9.31118800061454,
10.3310666767171, 9.09703848990093, 10.241195105962, 8.57290406448477,
8.98090855172704, 8.64653101832229, 12.6791587435376, 9.56000538681993,
10.4062255533723, 11.067091476284, 10.5255014737268, 10.2240941949978,
9.13081571869084, 9.5942352120783, 9.2753466212409, 10.2789293993548,
8.10255065585342, 9.48751297655077, 8.51198576785003, 9.46310532206947,
9.86727270762806, 11.5149248124739, 9.31557156735022, 9.34351230206303,
10.022139448869, 11.4111350893792, 8.57891783464065, 10.3761090924661,
9.38300408584683, 9.33694577526158, 9.25815555686085, 9.29856853889735,
8.4250073823245, 8.83022950824832, 9.1510846172981, 10.2553042376765,
10.0739540955956, 9.04955917463259, 10.8927827168631, 9.44611041690694,
10.7883395708593, 10.6010088332078, 7.72560864006592, 10.1760839916637,
11.5576569894392, 11.384809257294, 8.73504353987083, 9.00585942714512,
9.62327893504013, 10.3527072699866, 10.5220100705827, 8.74921668696853,
8.56415116683662, 12.1348451793815, 10.9496674323819, 9.64443817181322,
9.52977454697087, 10.4281877186725, 8.52701721410292, 11.6911584965782,
10.2300108250139, 8.65368821276485, 11.7733431942379, 10.2060233777681,
9.57291673029552, 9.82687667895106, 10.5939736188493, 11.2510605726337,
10.3383384488323, 9.92301237292945, 10.0164623230529, 10.4939857044034,
10.5631769648289, 10.935731043532, 11.0659359187168, 8.51697010486427,
9.79512310587405, 9.35132038807071, 11.3286703149903, 10.4621597293933,
10.4099459919071, 8.86246315190942, 9.30054044639769, 9.40346575227191,
9.59278722974697)), row.names = c(NA, -260L), class = "data.frame")





From: Subhamitra Patra <mailto:[hidden email]>
Sent: Friday, September 13, 2019 3:59 PM
To: PIKAL Petr <mailto:[hidden email]>; r-help mailing list <mailto:[hidden email]>
Subject: Re: [R] Query about calculating the monthly average of daily data columns

Dear PIKAL,

Thank you very much for your suggestion.

I tried your previous suggested code and getting the average value for each month for both country A, and B. But in your recent email, you are suggesting not to change the date column to real date. If I am going through your recently suggested code, i.e.

 "aggregate(value column, list(format(date column, "%m.%Y"), country column), mean)"

I am getting an Error that "aggregate(value, list(format(date, "%m.%Y"), country), mean) : object 'value' not found". 

Here, my query "may I need to define the date column, country column, and value column separately?"

Further, I need something the average value result like below in the data frame

Month       Country A   Country B
Jan 1994    26.66         35.78
Feb 1994    26.13         29.14

so that it will be easy for me to export to excel, and to use for the further calculations.

Please suggest me in this regard.

Thank you.





https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&
Sender notified by
https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5& 09/13/19, 07:22:53 PM



On Fri, Sep 13, 2019 at 7:03 PM PIKAL Petr <mailto:[hidden email]> wrote:
Hi

I am almost 100% sure that you would spare yourself much trouble if you changed your date column to real date

?as.Date

reshape your wide format to long one
library(reshape2)
?melt

to get 3 column data.frame with one date column, one country column and one value column

use ?aggregate and ?format to get summary value

something like
aggregate(value column, list(format(date column, "%m.%Y"), country column), mean)

But if you insist to scratch your left ear with right hand accross your head, you could continue your way.

Cheers
Petr

> -----Original Message-----
> From: R-help <mailto:[hidden email]> On Behalf Of Subhamitra
> Patra
> Sent: Friday, September 13, 2019 3:20 PM
> To: Jim Lemon <mailto:[hidden email]>; r-help mailing list <r-help@r-
> http://project.org>
> Subject: Re: [R] Query about calculating the monthly average of daily data
> columns
>
> Dear Sir,
>
> Yes, I understood the logic. But, still, I have a few queries that I mentioned
> below your answers.
>
> "# if you only have to get the monthly averages, it can be done this way
> > spdat$month<-sapply(strsplit(spdat$dates,"-"),"["*,2*)
> > spdat$year<-sapply(strsplit(spdat$dates,"-"),"[",*3*)"
> >
> > B. Here, I need to define the no. of months, and years separately, right?
> > or else what 2, and 3 (in bold) indicates?
> >
>
> To get the grouping variable of sequential months that you want, you only
> need the month and year values of the dates in the first column. First I used
> the "strsplit" function to split the date field at the hyphens, then used
> "sapply" to extract ("[") the second (month) and *third (year)* parts as two
> new columns. Because you have more than one year of data, you need the
> year values or you will group all Januarys, all Februarys and so on.
> Notice how I pass both of the new columns as a list (a data frame is a type of
> list) in the call to get the mean of each month.
>
> 1. Here, as per my understanding, the "3" indicates the 3rd year, right?
> But, you showed an average for 2 months of the same year. Then, what "3"
> in the  spdat$year object indicate?
>
>
> C. From this part, I got the exact average values of both January and
> > February of 1994 for country A, and B. But, in code, I have a query
> > that I need to define  spdat$returnA, and  spdat$returnB separately
> > before writing this code, right? Like this, I need to define for each
> > 84 countries separately with their respective number of months, and
> > years before writing this code, right?
> >
>
> I don't think so. Because I don't know what your data looks like, I am
> guessing that for each row, it has columns for each of the 84 countries. I
> don't know what these columns are named, either. Maybe:
>
> date             Australia   Belarus   ...    Zambia
> 01/01/1994   20             21                 22
> ...
>
> Here, due to my misunderstanding about the code, I was wrong. But, what
> data structure you guessed, it is absolutely right that for each row, I have
> columns for each of the 84 countries. So, I think, I need to define the date
> column with no. of months, and years once for all the countries.
> Therefore, I got my answer to the first and third question in the previous
> email (what you suggested) that I no need to define the column of each
> country, as the date, and no. of observations are same for all countries.
> But, the no. of days are different for each month, and similarly, for each
> year. So, I think I need to define date for each year separately.  Hence, I have
> given an example of 12 months, for 2 years (i.e. 1994, and 1995), and have
> written the following code. Please correct me in case I am wrong.
>
>  spdat<-data.frame(
>
> dates=paste(c(1:21,1:20,1:23,1:21,1:22,1:22,1:21,1:23,1:22,1:21,1:22,1:22),c(r
> ep(1,21),rep(2,20),
> rep(3,23), rep(4,21),
> rep(5,22),rep(6,22),rep(7,21),rep(8,23),rep(9,22),rep(10,21),rep(11,22),rep(12
> ,22)
> ),rep(1994,260)
>  dates1=
> paste(c(1:22,1:20,1:23,1:20,1:23,1:22,1:21,1:23,1:21,1:22,1:22,1:21),c(rep(1,2
> 2),rep(2,20),
> rep(3,23), rep(4,20),
> rep(5,23),rep(6,22),rep(7,21),rep(8,23),rep(9,21),rep(10,21),rep(11,22),rep(12
> ,21)
> ),rep(1995,259) ,sep="-")
>
> Concerning the exporting of structure of the dataset to excel, I will have
> 12*84 matrix. But, please suggest me the way to proceed for the large
> sample. I have mentioned below what I understood from your code. Please
> correct me if I am wrong.
> 1. I need to define the date for each year as the no. of days in each month
> are different for each year (as mentioned in my above code). For instance, in
> my data file, Jan 1994 has 21 days while Jan 1995 has 22 days.
> 2. Need to define the date column as character.
> 3. Need to define the monthly average for each month, and year. So, now
> code will be as follows.
> spdat$month<-sapply(strsplit(spdat$dates,"-"),"[",2,3,4,5,6,7,8,9,10,11,12)
>   %%%%As I need all months average sequentially.
> spdat$year<-sapply(strsplit(spdat$dates,"-"),"[",3)
>
> Here, this meaning of "3", I am really unable to get.
>
> 4. Need to define each country with each month and year as mentioned in
> the last part of your code.
>
> Please suggest me in this regard.
>
> Thank you.
>
>
>
>
>
>
>
> [image: Mailtrack]
> <https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_ca
> mpaign=signaturevirality5&>
> Sender
> notified by
> Mailtrack
> <https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_ca
> mpaign=signaturevirality5&>
> 09/13/19,
> 06:41:41 PM
>
> On Fri, Sep 13, 2019 at 4:24 PM Jim Lemon <mailto:[hidden email]> wrote:
>
> > Hi Subhamitra,
> > I'll try to write my answers adjacent to your questions below.
> >
> > On Fri, Sep 13, 2019 at 6:08 PM Subhamitra Patra <
> > mailto:[hidden email]> wrote:
> >
> >> Dear Sir,
> >>
> >> Thank you very much for your suggestion.
> >>
> >> Yes, your suggested code worked. But, actually, I have data from 3rd
> >> January 1994 to 3rd August 2017 for very large (i.e. for 84
> >> countries) sample. From this, I have given the example of the years
> >> up to 2000. Before applying the same code for the long 24 years, I
> >> want to learn the logic behind the code. Actually, some part of the
> >> code is not understandable to me which I mentioned in the bold letter as
> follows.
> >>
> >> "spdat<-data.frame(
> >>   dates=paste(c(1:30,1:28),c(rep(1,30),rep(2,28)),rep(1994,58),sep="-"),
> >>   returnA=sample(*15:50*,58,TRUE),returnB=sample(*10:45*,58,TRUE))"
> >>
> >> A. Here, I need to define the no. of days in a month, and the no. of
> >> countries name separately, right? But, what is meant by 15:50, and
> >> 10:45 in return A, and B respectively?
> >>
> >
> > To paraphrase Donald Trump, this is FAKE DATA! I have no idea what the
> > real values of return are, so I made them up using the "sample" function.
> > However, this is not meant to mislead anyone, just to show how
> > whatever numbers are in your data can be used in calculations. The
> > colon (":") operator creates a sequence of numbers starting with the
> > one to the left and ending with the one to the right.
> >
> >>
> >> "# if you only have to get the monthly averages, it can be done this
> >> way
> >> spdat$month<-sapply(strsplit(spdat$dates,"-"),"["*,2*)
> >> spdat$year<-sapply(strsplit(spdat$dates,"-"),"[",*3*)"
> >>
> >> B. Here, I need to define the no. of months, and years separately, right?
> >> or else what 2, and 3 (in bold) indicates?
> >>
> >
> > To get the grouping variable of sequential months that you want, you
> > only need the month and year values of the dates in the first column.
> > First I used the "strsplit" function to split the date field at the
> > hyphens, then used "sapply" to extract ("[") the second (month) and
> > third (year) parts as two new columns. Because you have more than one
> > year of data, you need the year values or you will group all Januarys,
> > all Februarys and so on. Notice how I pass both of the new columns as
> > a list (a data frame is a type of
> > list) in the call to get the mean of each month.
> >
> >>
> >> "# get the averages by month and year - is this correct?
> >> monthlyA<-by(*spdat$returnA*,spdat[,c("month","year")],mean)
> >> monthlyB<-by(*spdat$returnB*,spdat[,c("month","year")],mean)"
> >>
> >> C. From this part, I got the exact average values of both January and
> >> February of 1994 for country A, and B. But, in code, I have a query
> >> that I need to define  spdat$returnA, and  spdat$returnB separately
> >> before writing this code, right? Like this, I need to define for each
> >> 84 countries separately with their respective number of months, and
> >> years before writing this code, right?
> >>
> >
> > I don't think so. Because I don't know what your data looks like, I am
> > guessing that for each row, it has columns for each of the 84
> > countries. I don't know what these columns are named, either. Maybe:
> >
> > date             Australia   Belarus   ...    Zambia
> > 01/01/1994   20             21                 22
> > ...
> >
> >
> >> Yes, after obtaining the monthly average for each country's data, I
> >> need to use them for further calculations. So, I want to export the
> >> result to excel. But, until understanding the code, I think I willn't
> >> able to apply for the entire sample, and cannot be able to discuss
> >> the format of the resulted column to export to excel.
> >>
> >
> > Say that we perform the grouped mean calculation for the first two
> > country columns like this:
> > monmeans<-sapply(spdat[,2:3],by,spdat[,c("month","year")],mean)
> > monmeans
> >     Australia  Belarus
> > [1,]  29.70000 30.43333
> > [2,]  34.17857 27.39286
> >
> > We are presented with a 2x2 matrix of monthly means in just the format
> > someone might use for importing into Excel. The first row is January
> > 1994, the second February 1994 and so on. By expanding the columns to
> > include all the countries in your data, You should have the result you want.
> >
> > Jim
> >
>
>
> --
> *Best Regards,*
> *Subhamitra Patra*
> *Phd. Research Scholar*
> *Department of Humanities and Social Sciences* *Indian Institute of
> Technology, Kharagpur*
> *INDIA*
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> mailto:[hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.
Osobní údaje: Informace o zpracování a ochraně osobních údajů obchodních partnerů PRECHEZA a.s. jsou zveřejněny na: https://www.precheza.cz/zasady-ochrany-osobnich-udaju/ | Information about processing and protection of business partner’s personal data are available on website: https://www.precheza.cz/en/personal-data-protection-principles/
Důvěrnost: Tento e-mail a jakékoliv k němu připojené dokumenty jsou důvěrné a podléhají tomuto právně závaznému prohláąení o vyloučení odpovědnosti: https://www.precheza.cz/01-dovetek/ | This email and any documents attached to it may be confidential and are subject to the legally binding disclaimer: https://www.precheza.cz/en/01-disclaimer/



--
Best Regards,
Subhamitra Patra
Phd. Research Scholar
Department of Humanities and Social Sciences
Indian Institute of Technology, Kharagpur
INDIA
______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Query about calculating the monthly average of daily data columns

Subhamitra Patra
Dear Sir,

Thank you very much for your suggestions.

Due to certain inconveniences, I was unable to work on your suggestions.

Today I worked on both suggestions and got the result that I really wanted
that monthly averages for each country.

Here, I am asking one more query (just for learning purpose) that if my
country name and its respective variable is in the panel format, and I want
to take the monthly average for each country, how the code will be
arranged. For your convenience, I am providing a small data sample below.

PERMNO DATE Spread
111 19940103 0.025464308
111 19940104 0.064424296
111 19940105 0.018579337
111 19940106 0.018872211
111 19940107 0.065279782
111 19940110 0.063485905
111 19940111 0.018355453
111 19940112 0.064135683
111 19940113 0.063519987
111 19940114 0.018277351
111 19940117 0.018628417
111 19940118 0.065630229
111 19940119 0.018713152
111 19940120 0.019119037
111 19940121 0.068342043
111 19940124 0.020843244
111 19940125 0.019954211
111 19940126 0.018980321
111 19940127 0.066827165
111 19940128 0.067459235
111 19940131 0.068682559
111 19940201 0.02081465
111 19940202 0.068236091
111 19940203 0.068821406
111 19940204 0.020075648
111 19940207 0.066070584
111 19940208 0.066068837
111 19940209 0.019077072
111 19940210 0.065894875
111 19940211 0.018847478
111 19940214 0.065040844
111 19940215 0.01880332
111 19940216 0.018836199
111 19940217 0.066888865
111 19940218 0.067116793
111 19940221 0.068809742
111 19940222 0.068230213
111 19940223 0.069502855
111 19940224 0.070383523
111 19940225 0.020430811
111 19940228 0.067087257
111 19940301 0.066776479
111 19940302 0.019959031
111 19940303 0.066596469
111 19940304 0.019131334
111 19940307 0.019312528
111 19940308 0.067349909
111 19940309 0.068916431
111 19940310 0.068620043
111 19940311 0.070494844
111 19940314 0.071056842
111 19940315 0.071042517
111 19940316 0.072401771
111 19940317 0.071940001
111 19940318 0.07352884
111 19940321 0.072671688
111 19940322 0.072652595
111 19940323 0.021352138
111 19940324 0.069933727
111 19940325 0.068717467
111 19940328 0.020470748
111 19940329 0.020003748
111 19940330 0.065833717
111 19940331 0.065268388
111 19940401 0.018762356
111 19940404 0.064914179
111 19940405 0.064706743
111 19940406 0.018764175
111 19940407 0.06524806
111 19940408 0.018593449
111 19940411 0.064913949
111 19940412 0.01872089
111 19940413 0.018729328
111 19940414 0.018978773
111 19940415 0.065477137
111 19940418 0.064614365
111 19940419 0.064184148
111 19940420 0.018553192
111 19940421 0.066872771
111 19940422 0.06680782
111 19940425 0.067467961
111 19940426 0.02014297
111 19940427 0.062464016
111 19940428 0.062357052
112 19940429 0.000233993
112 19940103 0.000815264
112 19940104 0.000238165
112 19940105 0.000813632
112 19940106 0.000236915
112 19940107 0.000809102
112 19940110 0.000801642
112 19940111 0.000797932
112 19940112 0.000795251
112 19940113 0.000795186
112 19940114 0.000231359
112 19940117 0.000232134
112 19940118 0.000233718
112 19940119 0.000233993
112 19940120 0.000234694
112 19940121 0.000235753
112 19940124 0.000808653
112 19940125 0.000235604
112 19940126 0.000805068
112 19940127 0.000802337
112 19940128 0.000801768
112 19940131 0.000233517
112 19940201 0.000797431
112 19940202 0.000233338
112 19940203 0.000233826
112 19940204 0.000799519
112 19940207 0.000798105
112 19940208 0.000792245
112 19940209 0.000231113
112 19940210 0.000233413
112 19940211 0.000798168
112 19940214 0.000233282
112 19940215 0.000797848
112 19940216 0.000785165
112 19940217 0.000228426
112 19940218 0.000786783
112 19940221 0.00078343
112 19940222 0.000781459
112 19940223 0.000776264
112 19940224 0.000226399
112 19940225 0.000779066
112 19940228 0.000773603
112 19940301 0.000226487
112 19940302 0.000775233
112 19940303 0.000227017
112 19940304 0.000227854
112 19940307 0.000782814
112 19940308 0.000229164
112 19940309 0.000787033
112 19940310 0.000784049
112 19940311 0.000228984
112 19940314 0.00078697
112 19940315 0.000782567
112 19940316 0.000228516
112 19940317 0.000786347
112 19940318 0.000229236
112 19940321 0.000230107
112 19940322 0.000792689
112 19940323 0.000787284
112 19940324 0.000787221
112 19940325 0.000227978
112 19940328 0.000228713
112 19940329 0.000228894
112 19940330 0.000229255
112 19940331 0.000231003
112 19940401 0.000796567
112 19940404 0.000790668
112 19940405 0.00078195
112 19940406 0.000780475
112 19940407 0.000228355
112 19940408 0.000781723
112 19940411 0.000775741
112 19940412 0.000226647
112 19940413 0.000778876
112 19940414 0.000777336
112 19940415 0.000775253
112 19940418 0.000226362
112 19940419 0.000779554
112 19940420 0.000774824
112 19940421 0.000225582
112 19940422 0.000225724
112 19940425 0.000773361
112 19940426 0.0002256
112 19940427 0.000776416
113 19940428 0.000280542
113 19940429 0.000964148
113 19940103 0.000962654
113 19940104 0.000281768
113 19940105 0.000962219
113 19940106 0.000961965
113 19940107 0.000958602
113 19940110 0.000280056
113 19940111 0.000956348
113 19940112 0.000952171
113 19940113 0.000948176
113 19940114 0.000275607
113 19940117 0.000275773
113 19940118 0.000276738
113 19940119 0.000947068
113 19940120 0.000940959
113 19940121 0.000275224
113 19940124 0.000948489
113 19940125 0.000940076
113 19940126 0.0009309
113 19940127 0.000269955
113 19940128 0.000270328
113 19940131 0.000924234
113 19940201 0.000924038
113 19940202 0.000269088
113 19940203 0.000270247
113 19940204 0.000270562
113 19940207 0.00092656
113 19940208 0.000921819
113 19940209 0.000920361
113 19940210 0.000268958
113 19940211 0.000924758
113 19940214 0.000266768
113 19940215 0.000911325
113 19940216 0.000909294
113 19940217 0.000905887
113 19940218 0.000262919
113 19940221 0.000262978
113 19940222 0.000263189
113 19940223 0.000904439
113 19940224 0.000263512
113 19940225 0.000906184
113 19940228 0.000265198
113 19940301 0.000906126
113 19940302 0.000264357
113 19940303 0.000265392
113 19940304 0.000912495
113 19940307 0.000910641
113 19940308 0.000266143
113 19940309 0.000910113
113 19940310 0.000909277
113 19940311 0.000905056
113 19940314 0.00090285
113 19940315 0.000898831
113 19940316 0.000896118
113 19940317 0.000261294
113 19940318 0.000892563
113 19940321 0.000890852
113 19940322 0.00088639
113 19940323 0.000258509
113 19940324 0.000260286
113 19940325 0.000889354
113 19940328 0.000888373
113 19940329 0.000885049
113 19940330 0.000259116
113 19940331 0.000259474
113 19940401 0.000260316
113 19940404 0.000897493
113 19940405 0.000894592
113 19940406 0.000260435
113 19940407 0.000260989
113 19940408 0.000262061
113 19940411 0.000262262
113 19940412 0.000263604
113 19940413 0.000908682
113 19940414 0.000265348
113 19940415 0.000265637
113 19940418 0.00026862
113 19940419 0.000918882
113 19940420 0.000909904
113 19940421 0.000901725
113 19940422 0.000900062
113 19940425 0.000893547
113 19940426 0.000260899
Here, the 1st column is the name of the countries panel which I identified
in number, 2nd is the date column arranged in yyyy-mm-dd format, and 3rd is
the daily variable for each country for 4 months of 1994. I need to take
the monthly average of spread variable for each country (i.e. noted as 111,
112, and 113) in the above example. In short, my monthly spread variable
should be sorted on the basis of both PERMNO (i.e. country identifier), and
months of the particular year.

Please educate me that in this data format, how the average code can be
written?

Thank you very much.



[image: Mailtrack]
<https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&>
Sender
notified by
Mailtrack
<https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&>
10/20/19,
05:31:23 PM

On Mon, Sep 16, 2019 at 4:31 PM PIKAL Petr <[hidden email]> wrote:

> Original email did not come through (some problems with formating).
>
> Hi
>
> No, on contrary. I **am** suggesting to change date column to real date
> asi it is easy to handle with appropriate functions.
>
> Here are some fake data
>
> > str(spdat)
> 'data.frame':   260 obs. of  3 variables:
> $ dates   : Date, format: "1995-01-01" "1995-01-02" "1995-01-03"
> "1995-01-04" ...
> $ coutryA : num  0.188 0.405 -0.107 -0.596 -0.529 ...
> $ countryB: num  9.4 10.76 11.24 8.26 10.71 ..
>
> > head(spdat)
>        dates    coutryA  countryB
> 1 1995-01-01  0.1875060  9.402851
> 2 1995-01-02  0.4045193 10.755112
> 3 1995-01-03 -0.1073904 11.243663
> 4 1995-01-04 -0.5959683  8.256424
> 5 1995-01-05 -0.5293772 10.705431
> 6 1995-01-06 -0.2228029 10.171461
>
> First I melt it
> spdat.m <- melt(spdat, id.var="dates")
>
> > head(spdat.m)
>        dates variable      value
> 1 1995-01-01  coutryA  0.1875060
> 2 1995-01-02  coutryA  0.4045193
> 3 1995-01-03  coutryA -0.1073904
> 4 1995-01-04  coutryA -0.5959683
> 5 1995-01-05  coutryA -0.5293772
> 6 1995-01-06  coutryA -0.2228029
>
> I do aggregation
>
> > spdat.ag <- aggregate(spdat.m$value, list(format(spdat.m$dates,
> "%m.%Y"), spdat.m$variable), mean)
>
> And now I use dcast  to get required result.
>
> > dcast(spdat.ag, Group.1~Group.2)
> Using x as value column: use value.var to override.
>    Group.1      coutryA  countryB
> 1  01.1995  0.098688137 10.177696
> 2  02.1995  0.352264682  9.609261
> 3  03.1995  0.155521876 10.043503
> 4  04.1995 -0.166092393 10.129844
> 5  05.1995  0.164665188 10.308275
> 6  06.1995  0.260633585 10.210129
> 7  07.1995  0.003671979 10.549016
> 8  08.1995  0.045295990 10.087435
> 9  09.1995 -0.145488206  9.689876
> 10 10.1995 -0.225645950  9.743744
> 11 11.1995  0.030273383 10.025435
> 12 12.1995  0.043557468 10.105626
>
> Cheers
> Petr
>
> Here are the data.
>
> > dput(spdat)
> spdat  <- structure(list(dates = structure(c(9131, 9132, 9133, 9134, 9135,
> 9136, 9137, 9138, 9139, 9140, 9141, 9142, 9143, 9144, 9145, 9146,
> 9147, 9148, 9149, 9150, 9151, 9152, 9162, 9163, 9164, 9165, 9166,
> 9167, 9168, 9169, 9170, 9171, 9172, 9173, 9174, 9175, 9176, 9177,
> 9178, 9179, 9180, 9181, 9190, 9191, 9192, 9193, 9194, 9195, 9196,
> 9197, 9198, 9199, 9200, 9201, 9202, 9203, 9204, 9205, 9206, 9207,
> 9208, 9209, 9210, 9211, 9212, 9221, 9222, 9223, 9224, 9225, 9226,
> 9227, 9228, 9229, 9230, 9231, 9232, 9233, 9234, 9235, 9236, 9237,
> 9238, 9239, 9240, 9251, 9252, 9253, 9254, 9255, 9256, 9257, 9258,
> 9259, 9260, 9261, 9262, 9263, 9264, 9265, 9266, 9267, 9268, 9269,
> 9270, 9271, 9272, 9273, 9282, 9283, 9284, 9285, 9286, 9287, 9288,
> 9289, 9290, 9291, 9292, 9293, 9294, 9295, 9296, 9297, 9298, 9299,
> 9300, 9301, 9302, 9303, 9312, 9313, 9314, 9315, 9316, 9317, 9318,
> 9319, 9320, 9321, 9322, 9323, 9324, 9325, 9326, 9327, 9328, 9329,
> 9330, 9331, 9332, 9343, 9344, 9345, 9346, 9347, 9348, 9349, 9350,
> 9351, 9352, 9353, 9354, 9355, 9356, 9357, 9358, 9359, 9360, 9361,
> 9362, 9363, 9364, 9365, 9374, 9375, 9376, 9377, 9378, 9379, 9380,
> 9381, 9382, 9383, 9384, 9385, 9386, 9387, 9388, 9389, 9390, 9391,
> 9392, 9393, 9394, 9404, 9405, 9406, 9407, 9408, 9409, 9410, 9411,
> 9412, 9413, 9414, 9415, 9416, 9417, 9418, 9419, 9420, 9421, 9422,
> 9423, 9424, 9456, 9435, 9436, 9437, 9438, 9439, 9440, 9441, 9442,
> 9443, 9444, 9445, 9446, 9447, 9448, 9449, 9450, 9451, 9452, 9453,
> 9454, 9455, 9486, 9465, 9466, 9467, 9468, 9469, 9470, 9471, 9472,
> 9473, 9474, 9475, 9476, 9477, 9478, 9479, 9480, 9481, 9482, 9483,
> 9484, 9151), class = "Date"), coutryA = c(0.187506004416315,
> 0.404519257417805, -0.107390371811605, -0.595968278805544,
> -0.529377240936012,
> -0.222802921207767, 0.413182392872818, 0.689673026532298,
> -1.2768723266992,
> -0.506308625809406, 0.113859233745174, -0.0963423819877653,
> 0.323987304768398,
> 1.63846917270538, 0.893233423250338, 0.297732439150487, 0.949323101836486,
> -0.599518074708052, 0.366372319197032, -2.25734971953878,
> -0.190971733204918,
> -0.0874143568874351, 1.46699645184047, 0.00702170238687361,
> 0.11221346278474,
> -0.8060359607624, 0.340842350476532, 0.798838328074708, 0.449214745851041,
> -0.664972890558734, 0.521830282184173, -1.35020467264521,
> -0.95240631225826,
> 1.25607320999849, 1.57018988549379, 0.99477900888445, -0.936218787378207,
> 1.48489932847779, 0.529222943794807, 0.0995675049147771,
> 0.477770516727839,
> 1.64567253670186, -0.0212651530684566, 0.558952796713992,
> 0.0409979382929057,
> 0.428675380654606, 0.0919422583362682, -0.819694497340459,
> 1.23998830450888,
> 0.607498144489643, -1.27724580163097, 1.41634774644371,
> -0.579094515769707,
> 2.02039606694223, 0.0740478208705996, -1.69826944583929,
> -0.321482399813063,
> -0.489198601400924, 2.0066750686302, -1.90624857752322, 0.46762405849973,
> 1.31264724137396, -0.0473627194710677, 0.141362267796145,
> 0.329709761206515,
> 0.518454586458572, -1.39489985851779, -0.388303591187678,
> -0.668922704543522,
> 0.0735115674875065, 1.30737242978235, 0.198503397980751,
> 0.257831448122427,
> -1.31173539205588, -1.45147941969116, 0.359725782295977,
> 0.612882118056585,
> -0.0733768753346202, -0.508349204402508, 1.35776663767231,
> 0.997807735669086,
> -1.41717534266382, -0.894170593324238, -0.68578120845151,
> -0.211509378018794,
> 0.436738904337909, -1.46932152770435, 0.0817388759874159,
> -0.0389350881653141,
> 0.709198476466861, -0.963669144724435, -0.548607422521798,
> -0.896886885575286,
> 0.322231150840934, 1.37327611339939, 0.0310213133870952,
> 0.796577750757324,
> -0.2010067423637, -0.241723752424226, 1.37547329580654, -1.15382202538982,
> 0.101454200596915, 0.273663839664217, 1.8315140887841, 1.86096518756473,
> -0.536393730924719, -0.45845011727266, 1.10226256157127,
> -0.385596991265563,
> 3.20218061566932, -1.25865250042183, -0.13613128784276, 0.483329357746514,
> -0.597187329618306, 0.710977603908319, -1.07945708269043,
> -0.477626236401394,
> 1.51034914684104, 2.35886426985999, -0.0250526828683629,
> -0.29439443478131,
> 0.665774016744828, 0.464027472251246, 0.226658374792016,
> -0.802597030454373,
> 0.825517059805602, -1.11293193130819, -1.27677400513873, 1.60776237113347,
> 1.12490009531342, 0.95767047134623, 0.0475745549797055,
> -0.0591587460876868,
> -0.690617365240342, -1.62111622938126, -1.3545210707469, 1.8607927043106,
> 0.764367674339969, 1.49261525602638, 0.549570728337346, -1.29658399741794,
> -1.6289903797869, 0.00573336252135834, 0.0300702149640632,
> 0.440810830115721,
> 0.663568666361326, -0.126685900835146, -0.00221628368438927,
> 0.815321995886579, -0.499280888368945, -0.271814047751667,
> -0.071025546459042,
> 1.73165491816826, -0.0294770299043331, 0.833605607221529,
> -0.670108794857159,
> -0.303323318026829, 1.29039844459134, -0.818806702120603,
> -0.445515595649677,
> -0.0128796557666887, 0.320923705586147, 0.230597275812536,
> -1.54009153212366,
> -0.294702981688559, 0.581209734391958, 0.121384768986639,
> 0.502914098451111,
> -1.59018268505718, -0.635101104166451, 1.48005776676403,
> -0.25631761189957,
> 0.171947814411552, 0.444646195980014, 0.172655758440111,
> -0.00432159794094836,
> -0.549321974240026, 0.585055026451421, -1.22813371480849,
> 0.846807540195381,
> 0.319629441352597, 0.393525732059709, -1.40275675444594, 1.11062585584811,
> 0.214809571213853, -0.636432711800391, -0.283087127251573,
> -1.46385553207618,
> 0.436928676930225, -1.34231945433777, 0.451281957595763,
> -0.523155001924496,
> -2.69416779107545, 1.5513477373689, 0.989632029400036, 1.34636075948993,
> 0.346147428691405, -0.464527560160041, 0.337233933370495,
> 1.11331396366389,
> -1.00060600083316, -0.734784444487169, 1.40476315358621, 1.01671092179193,
> -0.0144306250829694, -0.923555930346906, -1.02275966525015,
> 0.619422010219383,
> 0.603484309754755, -0.774553813657576, 0.0932792545556387,
> -0.651884521428279,
> -0.61965612647073, -1.22104834441579, -1.31439612639271,
> -2.87707752518163,
> -0.0343801084491906, -0.640678302378492, -1.38653452986558,
> 0.884963139028743,
> -0.657454283462004, 0.462842665244993, -0.20881674837534, 0.6345884135548,
> 0.707165108434729, -0.162090928425892, -0.998662309785188,
> 1.3130254639318,
> 0.191890764940071, -0.0493619237876962, -0.55183232511689,
> 0.470263932874487,
> -0.217088645692971, 0.231550037620628, -0.530406537266415,
> -0.616522469083808,
> 0.329347084038265, 1.49420692610475, 1.91750823142859, 0.753536143872474,
> 0.766584887163714, -0.259803384094296, -0.402463714097741,
> -0.0229799209735185,
> -0.259677990559218, -1.41529707261105, 0.191362852138627,
> 1.54483266684747,
> -1.17947655378489, -0.426265411073274, 0.723010460481118,
> 1.37405142869537,
> -0.374771207936141, 0.0513905365832423, -0.369432731236118,
> -0.945441984794364,
> 0.179506648255554, 0.31971255725438, -1.25117960937319, 2.46228549580083
> ), countryB = c(9.4028512714591, 10.7551115504652, 11.2436629228434,
> 8.25642360904389, 10.7054313972395, 10.1714609666091, 10.3726975056206,
> 10.6594299429162, 8.56250595443296, 10.5612153841067, 8.07612112826519,
> 9.94704207511951, 11.392407273156, 10.4257690445268, 10.6339442533038,
> 10.5314883825356, 10.3506665399033, 10.2426403362978, 10.8437715647564,
> 10.8247200587034, 11.2449815064171, 9.2898697883112, 9.05418978124619,
> 10.6080277672463, 9.19882175737148, 11.3589722806948, 10.4139334238189,
> 9.44305216810892, 9.58426470056472, 8.84208362003176, 10.8125431356391,
> 7.71357872650814, 8.73526671289828, 10.714693958224, 9.49976972594189,
> 9.41896864941478, 7.33073349261249, 10.5268398982262, 9.92255671125184,
> 10.5665378092202, 10.5035704895405, 7.93682068228084, 10.882421050424,
> 9.3237610577468, 8.42128120513304, 9.46103753451174, 10.3472215515392,
> 11.0483414005193, 10.3421689244599, 7.85120280889754, 11.6327644046254,
> 9.57620745972098, 10.6553844719749, 10.8490250129346, 10.2742492933876,
> 9.55428072119304, 9.0976351049804, 10.0731951766966, 10.6956488509516,
> 11.1530744146062, 10.3496303024767, 9.52734980693306, 9.64478424894734,
> 9.28301632295047, 10.9568790570688, 11.6052870914912, 9.58530202776537,
> 11.1338134902295, 8.66685735149472, 11.0230863576875, 10.8000609212302,
> 10.6510296259782, 11.831292326569, 9.53836122448974, 9.55805411377422,
> 9.90336204189518, 9.36377040999133, 11.7041009433341, 9.95628297574831,
> 10.718111342931, 10.4562688422472, 8.85976383099186, 8.94085496683824,
> 8.19538394018434, 10.1058448260449, 9.70821289789561, 9.08757962570738,
> 10.6599997541876, 10.0521137258219, 9.9924295699559, 11.8730269098299,
> 11.2634470795907, 11.3854762443416, 9.56742053529845, 10.4101561978503,
> 9.53376547865009, 9.75410966995361, 9.92804558924886, 8.36231430067066,
> 10.7486459346681, 12.0143881312685, 11.0083060332839, 9.32820954213586,
> 10.8420346742049, 9.73064414798223, 10.7593902723319, 10.976622155215,
> 10.1039774975157, 8.36317871802524, 9.21809894958653, 10.1015362220683,
> 11.4655736295123, 9.65528297274543, 9.67844310028008, 10.1516820910267,
> 8.38764450852642, 10.163558398201, 11.1432463477237, 12.0509818193223,
> 10.9896913965091, 11.1772406550953, 9.14396687337779, 9.93338627749979,
> 10.9548864433126, 8.64911301751956, 11.706463972364, 11.1012846649741,
> 8.7805267197408, 11.5802098773954, 10.2268513542863, 10.3509617168731,
> 9.09646558899397, 11.2706666647314, 11.3984335011704, 11.4808985388742,
> 10.5608771133999, 10.3684356806175, 10.4815588822618, 10.5818867877558,
> 12.2561035284691, 8.6464271477849, 10.3412351841865, 10.7577574534162,
> 11.1124067479261, 9.91627943243343, 10.6356898895291, 10.2107566441478,
> 10.0672734202575, 10.2385787014999, 11.7112606160069, 10.0453801263575,
> 8.84654136100724, 10.2173421609193, 9.27919801705716, 10.4755578829547,
> 7.69340209082122, 9.24705253848083, 10.8415406794597, 8.69603117680965,
> 11.2589214416702, 10.5425642239737, 10.1389355042458, 9.17267675180435,
> 12.3052338002213, 10.0181674985326, 12.2715476751051, 9.64516268052739,
> 10.6305299379912, 10.1829347684655, 9.97983942366781, 10.2559847744715,
> 10.3092266661814, 9.75215330673072, 10.250464278709, 9.31118800061454,
> 10.3310666767171, 9.09703848990093, 10.241195105962, 8.57290406448477,
> 8.98090855172704, 8.64653101832229, 12.6791587435376, 9.56000538681993,
> 10.4062255533723, 11.067091476284, 10.5255014737268, 10.2240941949978,
> 9.13081571869084, 9.5942352120783, 9.2753466212409, 10.2789293993548,
> 8.10255065585342, 9.48751297655077, 8.51198576785003, 9.46310532206947,
> 9.86727270762806, 11.5149248124739, 9.31557156735022, 9.34351230206303,
> 10.022139448869, 11.4111350893792, 8.57891783464065, 10.3761090924661,
> 9.38300408584683, 9.33694577526158, 9.25815555686085, 9.29856853889735,
> 8.4250073823245, 8.83022950824832, 9.1510846172981, 10.2553042376765,
> 10.0739540955956, 9.04955917463259, 10.8927827168631, 9.44611041690694,
> 10.7883395708593, 10.6010088332078, 7.72560864006592, 10.1760839916637,
> 11.5576569894392, 11.384809257294, 8.73504353987083, 9.00585942714512,
> 9.62327893504013, 10.3527072699866, 10.5220100705827, 8.74921668696853,
> 8.56415116683662, 12.1348451793815, 10.9496674323819, 9.64443817181322,
> 9.52977454697087, 10.4281877186725, 8.52701721410292, 11.6911584965782,
> 10.2300108250139, 8.65368821276485, 11.7733431942379, 10.2060233777681,
> 9.57291673029552, 9.82687667895106, 10.5939736188493, 11.2510605726337,
> 10.3383384488323, 9.92301237292945, 10.0164623230529, 10.4939857044034,
> 10.5631769648289, 10.935731043532, 11.0659359187168, 8.51697010486427,
> 9.79512310587405, 9.35132038807071, 11.3286703149903, 10.4621597293933,
> 10.4099459919071, 8.86246315190942, 9.30054044639769, 9.40346575227191,
> 9.59278722974697)), row.names = c(NA, -260L), class = "data.frame")
>
>
>
>
>
> From: Subhamitra Patra <mailto:[hidden email]>
> Sent: Friday, September 13, 2019 3:59 PM
> To: PIKAL Petr <mailto:[hidden email]>; r-help mailing list
> <mailto:[hidden email]>
> Subject: Re: [R] Query about calculating the monthly average of daily data
> columns
>
> Dear PIKAL,
>
> Thank you very much for your suggestion.
>
> I tried your previous suggested code and getting the average value for
> each month for both country A, and B. But in your recent email, you are
> suggesting not to change the date column to real date. If I am going
> through your recently suggested code, i.e.
>
>  "aggregate(value column, list(format(date column, "%m.%Y"), country
> column), mean)"
>
> I am getting an Error that "aggregate(value, list(format(date, "%m.%Y"),
> country), mean) : object 'value' not found".
>
> Here, my query "may I need to define the date column, country column, and
> value column separately?"
>
> Further, I need something the average value result like below in the data
> frame
>
> Month       Country A   Country B
> Jan 1994    26.66         35.78
> Feb 1994    26.13         29.14
>
> so that it will be easy for me to export to excel, and to use for the
> further calculations.
>
> Please suggest me in this regard.
>
> Thank you.
>
>
>
>
>
>
> https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&
> Sender notified by
>
> https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&
> 09/13/19, 07:22:53 PM
>
>
>
> On Fri, Sep 13, 2019 at 7:03 PM PIKAL Petr <mailto:[hidden email]>
> wrote:
> Hi
>
> I am almost 100% sure that you would spare yourself much trouble if you
> changed your date column to real date
>
> ?as.Date
>
> reshape your wide format to long one
> library(reshape2)
> ?melt
>
> to get 3 column data.frame with one date column, one country column and
> one value column
>
> use ?aggregate and ?format to get summary value
>
> something like
> aggregate(value column, list(format(date column, "%m.%Y"), country
> column), mean)
>
> But if you insist to scratch your left ear with right hand accross your
> head, you could continue your way.
>
> Cheers
> Petr
>
> > -----Original Message-----
> > From: R-help <mailto:[hidden email]> On Behalf Of
> Subhamitra
> > Patra
> > Sent: Friday, September 13, 2019 3:20 PM
> > To: Jim Lemon <mailto:[hidden email]>; r-help mailing list
> <r-help@r-
> > http://project.org>
> > Subject: Re: [R] Query about calculating the monthly average of daily
> data
> > columns
> >
> > Dear Sir,
> >
> > Yes, I understood the logic. But, still, I have a few queries that I
> mentioned
> > below your answers.
> >
> > "# if you only have to get the monthly averages, it can be done this way
> > > spdat$month<-sapply(strsplit(spdat$dates,"-"),"["*,2*)
> > > spdat$year<-sapply(strsplit(spdat$dates,"-"),"[",*3*)"
> > >
> > > B. Here, I need to define the no. of months, and years separately,
> right?
> > > or else what 2, and 3 (in bold) indicates?
> > >
> >
> > To get the grouping variable of sequential months that you want, you only
> > need the month and year values of the dates in the first column. First I
> used
> > the "strsplit" function to split the date field at the hyphens, then used
> > "sapply" to extract ("[") the second (month) and *third (year)* parts as
> two
> > new columns. Because you have more than one year of data, you need the
> > year values or you will group all Januarys, all Februarys and so on.
> > Notice how I pass both of the new columns as a list (a data frame is a
> type of
> > list) in the call to get the mean of each month.
> >
> > 1. Here, as per my understanding, the "3" indicates the 3rd year, right?
> > But, you showed an average for 2 months of the same year. Then, what "3"
> > in the  spdat$year object indicate?
> >
> >
> > C. From this part, I got the exact average values of both January and
> > > February of 1994 for country A, and B. But, in code, I have a query
> > > that I need to define  spdat$returnA, and  spdat$returnB separately
> > > before writing this code, right? Like this, I need to define for each
> > > 84 countries separately with their respective number of months, and
> > > years before writing this code, right?
> > >
> >
> > I don't think so. Because I don't know what your data looks like, I am
> > guessing that for each row, it has columns for each of the 84 countries.
> I
> > don't know what these columns are named, either. Maybe:
> >
> > date             Australia   Belarus   ...    Zambia
> > 01/01/1994   20             21                 22
> > ...
> >
> > Here, due to my misunderstanding about the code, I was wrong. But, what
> > data structure you guessed, it is absolutely right that for each row, I
> have
> > columns for each of the 84 countries. So, I think, I need to define the
> date
> > column with no. of months, and years once for all the countries.
> > Therefore, I got my answer to the first and third question in the
> previous
> > email (what you suggested) that I no need to define the column of each
> > country, as the date, and no. of observations are same for all countries.
> > But, the no. of days are different for each month, and similarly, for
> each
> > year. So, I think I need to define date for each year separately.
> Hence, I have
> > given an example of 12 months, for 2 years (i.e. 1994, and 1995), and
> have
> > written the following code. Please correct me in case I am wrong.
> >
> >  spdat<-data.frame(
> >
> >
> dates=paste(c(1:21,1:20,1:23,1:21,1:22,1:22,1:21,1:23,1:22,1:21,1:22,1:22),c(r
> > ep(1,21),rep(2,20),
> > rep(3,23), rep(4,21),
> >
> rep(5,22),rep(6,22),rep(7,21),rep(8,23),rep(9,22),rep(10,21),rep(11,22),rep(12
> > ,22)
> > ),rep(1994,260)
> >  dates1=
> >
> paste(c(1:22,1:20,1:23,1:20,1:23,1:22,1:21,1:23,1:21,1:22,1:22,1:21),c(rep(1,2
> > 2),rep(2,20),
> > rep(3,23), rep(4,20),
> >
> rep(5,23),rep(6,22),rep(7,21),rep(8,23),rep(9,21),rep(10,21),rep(11,22),rep(12
> > ,21)
> > ),rep(1995,259) ,sep="-")
> >
> > Concerning the exporting of structure of the dataset to excel, I will
> have
> > 12*84 matrix. But, please suggest me the way to proceed for the large
> > sample. I have mentioned below what I understood from your code. Please
> > correct me if I am wrong.
> > 1. I need to define the date for each year as the no. of days in each
> month
> > are different for each year (as mentioned in my above code). For
> instance, in
> > my data file, Jan 1994 has 21 days while Jan 1995 has 22 days.
> > 2. Need to define the date column as character.
> > 3. Need to define the monthly average for each month, and year. So, now
> > code will be as follows.
> >
> spdat$month<-sapply(strsplit(spdat$dates,"-"),"[",2,3,4,5,6,7,8,9,10,11,12)
> >   %%%%As I need all months average sequentially.
> > spdat$year<-sapply(strsplit(spdat$dates,"-"),"[",3)
> >
> > Here, this meaning of "3", I am really unable to get.
> >
> > 4. Need to define each country with each month and year as mentioned in
> > the last part of your code.
> >
> > Please suggest me in this regard.
> >
> > Thank you.
> >
> >
> >
> >
> >
> >
> >
> > [image: Mailtrack]
> > <https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_ca
> > mpaign=signaturevirality5&>
> > Sender
> > notified by
> > Mailtrack
> > <https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_ca
> > mpaign=signaturevirality5&>
> > 09/13/19,
> > 06:41:41 PM
> >
> > On Fri, Sep 13, 2019 at 4:24 PM Jim Lemon <mailto:[hidden email]>
> wrote:
> >
> > > Hi Subhamitra,
> > > I'll try to write my answers adjacent to your questions below.
> > >
> > > On Fri, Sep 13, 2019 at 6:08 PM Subhamitra Patra <
> > > mailto:[hidden email]> wrote:
> > >
> > >> Dear Sir,
> > >>
> > >> Thank you very much for your suggestion.
> > >>
> > >> Yes, your suggested code worked. But, actually, I have data from 3rd
> > >> January 1994 to 3rd August 2017 for very large (i.e. for 84
> > >> countries) sample. From this, I have given the example of the years
> > >> up to 2000. Before applying the same code for the long 24 years, I
> > >> want to learn the logic behind the code. Actually, some part of the
> > >> code is not understandable to me which I mentioned in the bold letter
> as
> > follows.
> > >>
> > >> "spdat<-data.frame(
> > >>
>  dates=paste(c(1:30,1:28),c(rep(1,30),rep(2,28)),rep(1994,58),sep="-"),
> > >>   returnA=sample(*15:50*,58,TRUE),returnB=sample(*10:45*,58,TRUE))"
> > >>
> > >> A. Here, I need to define the no. of days in a month, and the no. of
> > >> countries name separately, right? But, what is meant by 15:50, and
> > >> 10:45 in return A, and B respectively?
> > >>
> > >
> > > To paraphrase Donald Trump, this is FAKE DATA! I have no idea what the
> > > real values of return are, so I made them up using the "sample"
> function.
> > > However, this is not meant to mislead anyone, just to show how
> > > whatever numbers are in your data can be used in calculations. The
> > > colon (":") operator creates a sequence of numbers starting with the
> > > one to the left and ending with the one to the right.
> > >
> > >>
> > >> "# if you only have to get the monthly averages, it can be done this
> > >> way
> > >> spdat$month<-sapply(strsplit(spdat$dates,"-"),"["*,2*)
> > >> spdat$year<-sapply(strsplit(spdat$dates,"-"),"[",*3*)"
> > >>
> > >> B. Here, I need to define the no. of months, and years separately,
> right?
> > >> or else what 2, and 3 (in bold) indicates?
> > >>
> > >
> > > To get the grouping variable of sequential months that you want, you
> > > only need the month and year values of the dates in the first column.
> > > First I used the "strsplit" function to split the date field at the
> > > hyphens, then used "sapply" to extract ("[") the second (month) and
> > > third (year) parts as two new columns. Because you have more than one
> > > year of data, you need the year values or you will group all Januarys,
> > > all Februarys and so on. Notice how I pass both of the new columns as
> > > a list (a data frame is a type of
> > > list) in the call to get the mean of each month.
> > >
> > >>
> > >> "# get the averages by month and year - is this correct?
> > >> monthlyA<-by(*spdat$returnA*,spdat[,c("month","year")],mean)
> > >> monthlyB<-by(*spdat$returnB*,spdat[,c("month","year")],mean)"
> > >>
> > >> C. From this part, I got the exact average values of both January and
> > >> February of 1994 for country A, and B. But, in code, I have a query
> > >> that I need to define  spdat$returnA, and  spdat$returnB separately
> > >> before writing this code, right? Like this, I need to define for each
> > >> 84 countries separately with their respective number of months, and
> > >> years before writing this code, right?
> > >>
> > >
> > > I don't think so. Because I don't know what your data looks like, I am
> > > guessing that for each row, it has columns for each of the 84
> > > countries. I don't know what these columns are named, either. Maybe:
> > >
> > > date             Australia   Belarus   ...    Zambia
> > > 01/01/1994   20             21                 22
> > > ...
> > >
> > >
> > >> Yes, after obtaining the monthly average for each country's data, I
> > >> need to use them for further calculations. So, I want to export the
> > >> result to excel. But, until understanding the code, I think I willn't
> > >> able to apply for the entire sample, and cannot be able to discuss
> > >> the format of the resulted column to export to excel.
> > >>
> > >
> > > Say that we perform the grouped mean calculation for the first two
> > > country columns like this:
> > > monmeans<-sapply(spdat[,2:3],by,spdat[,c("month","year")],mean)
> > > monmeans
> > >     Australia  Belarus
> > > [1,]  29.70000 30.43333
> > > [2,]  34.17857 27.39286
> > >
> > > We are presented with a 2x2 matrix of monthly means in just the format
> > > someone might use for importing into Excel. The first row is January
> > > 1994, the second February 1994 and so on. By expanding the columns to
> > > include all the countries in your data, You should have the result you
> want.
> > >
> > > Jim
> > >
> >
> >
> > --
> > *Best Regards,*
> > *Subhamitra Patra*
> > *Phd. Research Scholar*
> > *Department of Humanities and Social Sciences* *Indian Institute of
> > Technology, Kharagpur*
> > *INDIA*
> >
> > [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > mailto:[hidden email] mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-
> > guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> Osobní údaje: Informace o zpracování a ochraně osobních údajů obchodních
> partnerů PRECHEZA a.s. jsou zveřejněny na:
> https://www.precheza.cz/zasady-ochrany-osobnich-udaju/ | Information
> about processing and protection of business partner’s personal data are
> available on website:
> https://www.precheza.cz/en/personal-data-protection-principles/
> Důvěrnost: Tento e-mail a jakékoliv k němu připojené dokumenty jsou
> důvěrné a podléhají tomuto právně závaznému prohláąení o vyloučení
> odpovědnosti: https://www.precheza.cz/01-dovetek/ | This email and any
> documents attached to it may be confidential and are subject to the legally
> binding disclaimer: https://www.precheza.cz/en/01-disclaimer/
>
>
>
> --
> Best Regards,
> Subhamitra Patra
> Phd. Research Scholar
> Department of Humanities and Social Sciences
> Indian Institute of Technology, Kharagpur
> INDIA
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


--
*Best Regards,*
*Subhamitra Patra*
*Phd. Research Scholar*
*Department of Humanities and Social Sciences*
*Indian Institute of Technology, Kharagpur*
*INDIA*

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Query about calculating the monthly average of daily data columns

jholtman
Does this do what you want:

> library(tidyverse)

> input <- read_delim("PERMNO DATE Spread
+ 111 19940103 0.025464308
+ 111 19940104 0.064424296
+ 111 19940105 0.018579337
+ 111 19940106 0.018872211
 ..." ... [TRUNCATED]

> # drop last two digits to get the month
> monthly <- input %>%
+   group_by(PERMNO, month = DATE %/% 100) %>%
+   summarise(avg = mean(Spread))
> monthly
# A tibble: 12 x 3
# Groups:   PERMNO [3]
   PERMNO  month      avg
    <dbl>  <dbl>    <dbl>
 1    111 199401 0.0416
 2    111 199402 0.0508
 3    111 199403 0.0567
 4    111 199404 0.0466
 5    112 199401 0.000533
 6    112 199402 0.000593
 7    112 199403 0.000471
 8    112 199404 0.000587
 9    113 199401 0.000692
10    113 199402 0.000591
11    113 199403 0.000677
12    113 199404 0.000555
>


Jim Holtman
*Data Munger Guru*


*What is the problem that you are trying to solve?Tell me what you want to
do, not how you want to do it.*


On Sun, Oct 20, 2019 at 5:10 AM Subhamitra Patra <[hidden email]>
wrote:

> Dear Sir,
>
> Thank you very much for your suggestions.
>
> Due to certain inconveniences, I was unable to work on your suggestions.
>
> Today I worked on both suggestions and got the result that I really wanted
> that monthly averages for each country.
>
> Here, I am asking one more query (just for learning purpose) that if my
> country name and its respective variable is in the panel format, and I want
> to take the monthly average for each country, how the code will be
> arranged. For your convenience, I am providing a small data sample below.
>
> PERMNO DATE Spread
> 111 19940103 0.025464308
> 111 19940104 0.064424296
> 111 19940105 0.018579337
> 111 19940106 0.018872211
> 111 19940107 0.065279782
> 111 19940110 0.063485905
> 111 19940111 0.018355453
> 111 19940112 0.064135683
> 111 19940113 0.063519987
> 111 19940114 0.018277351
> 111 19940117 0.018628417
> 111 19940118 0.065630229
> 111 19940119 0.018713152
> 111 19940120 0.019119037
> 111 19940121 0.068342043
> 111 19940124 0.020843244
> 111 19940125 0.019954211
> 111 19940126 0.018980321
> 111 19940127 0.066827165
> 111 19940128 0.067459235
> 111 19940131 0.068682559
> 111 19940201 0.02081465
> 111 19940202 0.068236091
> 111 19940203 0.068821406
> 111 19940204 0.020075648
> 111 19940207 0.066070584
> 111 19940208 0.066068837
> 111 19940209 0.019077072
> 111 19940210 0.065894875
> 111 19940211 0.018847478
> 111 19940214 0.065040844
> 111 19940215 0.01880332
> 111 19940216 0.018836199
> 111 19940217 0.066888865
> 111 19940218 0.067116793
> 111 19940221 0.068809742
> 111 19940222 0.068230213
> 111 19940223 0.069502855
> 111 19940224 0.070383523
> 111 19940225 0.020430811
> 111 19940228 0.067087257
> 111 19940301 0.066776479
> 111 19940302 0.019959031
> 111 19940303 0.066596469
> 111 19940304 0.019131334
> 111 19940307 0.019312528
> 111 19940308 0.067349909
> 111 19940309 0.068916431
> 111 19940310 0.068620043
> 111 19940311 0.070494844
> 111 19940314 0.071056842
> 111 19940315 0.071042517
> 111 19940316 0.072401771
> 111 19940317 0.071940001
> 111 19940318 0.07352884
> 111 19940321 0.072671688
> 111 19940322 0.072652595
> 111 19940323 0.021352138
> 111 19940324 0.069933727
> 111 19940325 0.068717467
> 111 19940328 0.020470748
> 111 19940329 0.020003748
> 111 19940330 0.065833717
> 111 19940331 0.065268388
> 111 19940401 0.018762356
> 111 19940404 0.064914179
> 111 19940405 0.064706743
> 111 19940406 0.018764175
> 111 19940407 0.06524806
> 111 19940408 0.018593449
> 111 19940411 0.064913949
> 111 19940412 0.01872089
> 111 19940413 0.018729328
> 111 19940414 0.018978773
> 111 19940415 0.065477137
> 111 19940418 0.064614365
> 111 19940419 0.064184148
> 111 19940420 0.018553192
> 111 19940421 0.066872771
> 111 19940422 0.06680782
> 111 19940425 0.067467961
> 111 19940426 0.02014297
> 111 19940427 0.062464016
> 111 19940428 0.062357052
> 112 19940429 0.000233993
> 112 19940103 0.000815264
> 112 19940104 0.000238165
> 112 19940105 0.000813632
> 112 19940106 0.000236915
> 112 19940107 0.000809102
> 112 19940110 0.000801642
> 112 19940111 0.000797932
> 112 19940112 0.000795251
> 112 19940113 0.000795186
> 112 19940114 0.000231359
> 112 19940117 0.000232134
> 112 19940118 0.000233718
> 112 19940119 0.000233993
> 112 19940120 0.000234694
> 112 19940121 0.000235753
> 112 19940124 0.000808653
> 112 19940125 0.000235604
> 112 19940126 0.000805068
> 112 19940127 0.000802337
> 112 19940128 0.000801768
> 112 19940131 0.000233517
> 112 19940201 0.000797431
> 112 19940202 0.000233338
> 112 19940203 0.000233826
> 112 19940204 0.000799519
> 112 19940207 0.000798105
> 112 19940208 0.000792245
> 112 19940209 0.000231113
> 112 19940210 0.000233413
> 112 19940211 0.000798168
> 112 19940214 0.000233282
> 112 19940215 0.000797848
> 112 19940216 0.000785165
> 112 19940217 0.000228426
> 112 19940218 0.000786783
> 112 19940221 0.00078343
> 112 19940222 0.000781459
> 112 19940223 0.000776264
> 112 19940224 0.000226399
> 112 19940225 0.000779066
> 112 19940228 0.000773603
> 112 19940301 0.000226487
> 112 19940302 0.000775233
> 112 19940303 0.000227017
> 112 19940304 0.000227854
> 112 19940307 0.000782814
> 112 19940308 0.000229164
> 112 19940309 0.000787033
> 112 19940310 0.000784049
> 112 19940311 0.000228984
> 112 19940314 0.00078697
> 112 19940315 0.000782567
> 112 19940316 0.000228516
> 112 19940317 0.000786347
> 112 19940318 0.000229236
> 112 19940321 0.000230107
> 112 19940322 0.000792689
> 112 19940323 0.000787284
> 112 19940324 0.000787221
> 112 19940325 0.000227978
> 112 19940328 0.000228713
> 112 19940329 0.000228894
> 112 19940330 0.000229255
> 112 19940331 0.000231003
> 112 19940401 0.000796567
> 112 19940404 0.000790668
> 112 19940405 0.00078195
> 112 19940406 0.000780475
> 112 19940407 0.000228355
> 112 19940408 0.000781723
> 112 19940411 0.000775741
> 112 19940412 0.000226647
> 112 19940413 0.000778876
> 112 19940414 0.000777336
> 112 19940415 0.000775253
> 112 19940418 0.000226362
> 112 19940419 0.000779554
> 112 19940420 0.000774824
> 112 19940421 0.000225582
> 112 19940422 0.000225724
> 112 19940425 0.000773361
> 112 19940426 0.0002256
> 112 19940427 0.000776416
> 113 19940428 0.000280542
> 113 19940429 0.000964148
> 113 19940103 0.000962654
> 113 19940104 0.000281768
> 113 19940105 0.000962219
> 113 19940106 0.000961965
> 113 19940107 0.000958602
> 113 19940110 0.000280056
> 113 19940111 0.000956348
> 113 19940112 0.000952171
> 113 19940113 0.000948176
> 113 19940114 0.000275607
> 113 19940117 0.000275773
> 113 19940118 0.000276738
> 113 19940119 0.000947068
> 113 19940120 0.000940959
> 113 19940121 0.000275224
> 113 19940124 0.000948489
> 113 19940125 0.000940076
> 113 19940126 0.0009309
> 113 19940127 0.000269955
> 113 19940128 0.000270328
> 113 19940131 0.000924234
> 113 19940201 0.000924038
> 113 19940202 0.000269088
> 113 19940203 0.000270247
> 113 19940204 0.000270562
> 113 19940207 0.00092656
> 113 19940208 0.000921819
> 113 19940209 0.000920361
> 113 19940210 0.000268958
> 113 19940211 0.000924758
> 113 19940214 0.000266768
> 113 19940215 0.000911325
> 113 19940216 0.000909294
> 113 19940217 0.000905887
> 113 19940218 0.000262919
> 113 19940221 0.000262978
> 113 19940222 0.000263189
> 113 19940223 0.000904439
> 113 19940224 0.000263512
> 113 19940225 0.000906184
> 113 19940228 0.000265198
> 113 19940301 0.000906126
> 113 19940302 0.000264357
> 113 19940303 0.000265392
> 113 19940304 0.000912495
> 113 19940307 0.000910641
> 113 19940308 0.000266143
> 113 19940309 0.000910113
> 113 19940310 0.000909277
> 113 19940311 0.000905056
> 113 19940314 0.00090285
> 113 19940315 0.000898831
> 113 19940316 0.000896118
> 113 19940317 0.000261294
> 113 19940318 0.000892563
> 113 19940321 0.000890852
> 113 19940322 0.00088639
> 113 19940323 0.000258509
> 113 19940324 0.000260286
> 113 19940325 0.000889354
> 113 19940328 0.000888373
> 113 19940329 0.000885049
> 113 19940330 0.000259116
> 113 19940331 0.000259474
> 113 19940401 0.000260316
> 113 19940404 0.000897493
> 113 19940405 0.000894592
> 113 19940406 0.000260435
> 113 19940407 0.000260989
> 113 19940408 0.000262061
> 113 19940411 0.000262262
> 113 19940412 0.000263604
> 113 19940413 0.000908682
> 113 19940414 0.000265348
> 113 19940415 0.000265637
> 113 19940418 0.00026862
> 113 19940419 0.000918882
> 113 19940420 0.000909904
> 113 19940421 0.000901725
> 113 19940422 0.000900062
> 113 19940425 0.000893547
> 113 19940426 0.000260899
> Here, the 1st column is the name of the countries panel which I identified
> in number, 2nd is the date column arranged in yyyy-mm-dd format, and 3rd is
> the daily variable for each country for 4 months of 1994. I need to take
> the monthly average of spread variable for each country (i.e. noted as 111,
> 112, and 113) in the above example. In short, my monthly spread variable
> should be sorted on the basis of both PERMNO (i.e. country identifier), and
> months of the particular year.
>
> Please educate me that in this data format, how the average code can be
> written?
>
> Thank you very much.
>
>
>
> [image: Mailtrack]
> <
> https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&
> >
> Sender
> notified by
> Mailtrack
> <
> https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&
> >
> 10/20/19,
> 05:31:23 PM
>
> On Mon, Sep 16, 2019 at 4:31 PM PIKAL Petr <[hidden email]> wrote:
>
> > Original email did not come through (some problems with formating).
> >
> > Hi
> >
> > No, on contrary. I **am** suggesting to change date column to real date
> > asi it is easy to handle with appropriate functions.
> >
> > Here are some fake data
> >
> > > str(spdat)
> > 'data.frame':   260 obs. of  3 variables:
> > $ dates   : Date, format: "1995-01-01" "1995-01-02" "1995-01-03"
> > "1995-01-04" ...
> > $ coutryA : num  0.188 0.405 -0.107 -0.596 -0.529 ...
> > $ countryB: num  9.4 10.76 11.24 8.26 10.71 ..
> >
> > > head(spdat)
> >        dates    coutryA  countryB
> > 1 1995-01-01  0.1875060  9.402851
> > 2 1995-01-02  0.4045193 10.755112
> > 3 1995-01-03 -0.1073904 11.243663
> > 4 1995-01-04 -0.5959683  8.256424
> > 5 1995-01-05 -0.5293772 10.705431
> > 6 1995-01-06 -0.2228029 10.171461
> >
> > First I melt it
> > spdat.m <- melt(spdat, id.var="dates")
> >
> > > head(spdat.m)
> >        dates variable      value
> > 1 1995-01-01  coutryA  0.1875060
> > 2 1995-01-02  coutryA  0.4045193
> > 3 1995-01-03  coutryA -0.1073904
> > 4 1995-01-04  coutryA -0.5959683
> > 5 1995-01-05  coutryA -0.5293772
> > 6 1995-01-06  coutryA -0.2228029
> >
> > I do aggregation
> >
> > > spdat.ag <- aggregate(spdat.m$value, list(format(spdat.m$dates,
> > "%m.%Y"), spdat.m$variable), mean)
> >
> > And now I use dcast  to get required result.
> >
> > > dcast(spdat.ag, Group.1~Group.2)
> > Using x as value column: use value.var to override.
> >    Group.1      coutryA  countryB
> > 1  01.1995  0.098688137 10.177696
> > 2  02.1995  0.352264682  9.609261
> > 3  03.1995  0.155521876 10.043503
> > 4  04.1995 -0.166092393 10.129844
> > 5  05.1995  0.164665188 10.308275
> > 6  06.1995  0.260633585 10.210129
> > 7  07.1995  0.003671979 10.549016
> > 8  08.1995  0.045295990 10.087435
> > 9  09.1995 -0.145488206  9.689876
> > 10 10.1995 -0.225645950  9.743744
> > 11 11.1995  0.030273383 10.025435
> > 12 12.1995  0.043557468 10.105626
> >
> > Cheers
> > Petr
> >
> > Here are the data.
> >
> > > dput(spdat)
> > spdat  <- structure(list(dates = structure(c(9131, 9132, 9133, 9134,
> 9135,
> > 9136, 9137, 9138, 9139, 9140, 9141, 9142, 9143, 9144, 9145, 9146,
> > 9147, 9148, 9149, 9150, 9151, 9152, 9162, 9163, 9164, 9165, 9166,
> > 9167, 9168, 9169, 9170, 9171, 9172, 9173, 9174, 9175, 9176, 9177,
> > 9178, 9179, 9180, 9181, 9190, 9191, 9192, 9193, 9194, 9195, 9196,
> > 9197, 9198, 9199, 9200, 9201, 9202, 9203, 9204, 9205, 9206, 9207,
> > 9208, 9209, 9210, 9211, 9212, 9221, 9222, 9223, 9224, 9225, 9226,
> > 9227, 9228, 9229, 9230, 9231, 9232, 9233, 9234, 9235, 9236, 9237,
> > 9238, 9239, 9240, 9251, 9252, 9253, 9254, 9255, 9256, 9257, 9258,
> > 9259, 9260, 9261, 9262, 9263, 9264, 9265, 9266, 9267, 9268, 9269,
> > 9270, 9271, 9272, 9273, 9282, 9283, 9284, 9285, 9286, 9287, 9288,
> > 9289, 9290, 9291, 9292, 9293, 9294, 9295, 9296, 9297, 9298, 9299,
> > 9300, 9301, 9302, 9303, 9312, 9313, 9314, 9315, 9316, 9317, 9318,
> > 9319, 9320, 9321, 9322, 9323, 9324, 9325, 9326, 9327, 9328, 9329,
> > 9330, 9331, 9332, 9343, 9344, 9345, 9346, 9347, 9348, 9349, 9350,
> > 9351, 9352, 9353, 9354, 9355, 9356, 9357, 9358, 9359, 9360, 9361,
> > 9362, 9363, 9364, 9365, 9374, 9375, 9376, 9377, 9378, 9379, 9380,
> > 9381, 9382, 9383, 9384, 9385, 9386, 9387, 9388, 9389, 9390, 9391,
> > 9392, 9393, 9394, 9404, 9405, 9406, 9407, 9408, 9409, 9410, 9411,
> > 9412, 9413, 9414, 9415, 9416, 9417, 9418, 9419, 9420, 9421, 9422,
> > 9423, 9424, 9456, 9435, 9436, 9437, 9438, 9439, 9440, 9441, 9442,
> > 9443, 9444, 9445, 9446, 9447, 9448, 9449, 9450, 9451, 9452, 9453,
> > 9454, 9455, 9486, 9465, 9466, 9467, 9468, 9469, 9470, 9471, 9472,
> > 9473, 9474, 9475, 9476, 9477, 9478, 9479, 9480, 9481, 9482, 9483,
> > 9484, 9151), class = "Date"), coutryA = c(0.187506004416315,
> > 0.404519257417805, -0.107390371811605, -0.595968278805544,
> > -0.529377240936012,
> > -0.222802921207767, 0.413182392872818, 0.689673026532298,
> > -1.2768723266992,
> > -0.506308625809406, 0.113859233745174, -0.0963423819877653,
> > 0.323987304768398,
> > 1.63846917270538, 0.893233423250338, 0.297732439150487,
> 0.949323101836486,
> > -0.599518074708052, 0.366372319197032, -2.25734971953878,
> > -0.190971733204918,
> > -0.0874143568874351, 1.46699645184047, 0.00702170238687361,
> > 0.11221346278474,
> > -0.8060359607624, 0.340842350476532, 0.798838328074708,
> 0.449214745851041,
> > -0.664972890558734, 0.521830282184173, -1.35020467264521,
> > -0.95240631225826,
> > 1.25607320999849, 1.57018988549379, 0.99477900888445, -0.936218787378207,
> > 1.48489932847779, 0.529222943794807, 0.0995675049147771,
> > 0.477770516727839,
> > 1.64567253670186, -0.0212651530684566, 0.558952796713992,
> > 0.0409979382929057,
> > 0.428675380654606, 0.0919422583362682, -0.819694497340459,
> > 1.23998830450888,
> > 0.607498144489643, -1.27724580163097, 1.41634774644371,
> > -0.579094515769707,
> > 2.02039606694223, 0.0740478208705996, -1.69826944583929,
> > -0.321482399813063,
> > -0.489198601400924, 2.0066750686302, -1.90624857752322, 0.46762405849973,
> > 1.31264724137396, -0.0473627194710677, 0.141362267796145,
> > 0.329709761206515,
> > 0.518454586458572, -1.39489985851779, -0.388303591187678,
> > -0.668922704543522,
> > 0.0735115674875065, 1.30737242978235, 0.198503397980751,
> > 0.257831448122427,
> > -1.31173539205588, -1.45147941969116, 0.359725782295977,
> > 0.612882118056585,
> > -0.0733768753346202, -0.508349204402508, 1.35776663767231,
> > 0.997807735669086,
> > -1.41717534266382, -0.894170593324238, -0.68578120845151,
> > -0.211509378018794,
> > 0.436738904337909, -1.46932152770435, 0.0817388759874159,
> > -0.0389350881653141,
> > 0.709198476466861, -0.963669144724435, -0.548607422521798,
> > -0.896886885575286,
> > 0.322231150840934, 1.37327611339939, 0.0310213133870952,
> > 0.796577750757324,
> > -0.2010067423637, -0.241723752424226, 1.37547329580654,
> -1.15382202538982,
> > 0.101454200596915, 0.273663839664217, 1.8315140887841, 1.86096518756473,
> > -0.536393730924719, -0.45845011727266, 1.10226256157127,
> > -0.385596991265563,
> > 3.20218061566932, -1.25865250042183, -0.13613128784276,
> 0.483329357746514,
> > -0.597187329618306, 0.710977603908319, -1.07945708269043,
> > -0.477626236401394,
> > 1.51034914684104, 2.35886426985999, -0.0250526828683629,
> > -0.29439443478131,
> > 0.665774016744828, 0.464027472251246, 0.226658374792016,
> > -0.802597030454373,
> > 0.825517059805602, -1.11293193130819, -1.27677400513873,
> 1.60776237113347,
> > 1.12490009531342, 0.95767047134623, 0.0475745549797055,
> > -0.0591587460876868,
> > -0.690617365240342, -1.62111622938126, -1.3545210707469, 1.8607927043106,
> > 0.764367674339969, 1.49261525602638, 0.549570728337346,
> -1.29658399741794,
> > -1.6289903797869, 0.00573336252135834, 0.0300702149640632,
> > 0.440810830115721,
> > 0.663568666361326, -0.126685900835146, -0.00221628368438927,
> > 0.815321995886579, -0.499280888368945, -0.271814047751667,
> > -0.071025546459042,
> > 1.73165491816826, -0.0294770299043331, 0.833605607221529,
> > -0.670108794857159,
> > -0.303323318026829, 1.29039844459134, -0.818806702120603,
> > -0.445515595649677,
> > -0.0128796557666887, 0.320923705586147, 0.230597275812536,
> > -1.54009153212366,
> > -0.294702981688559, 0.581209734391958, 0.121384768986639,
> > 0.502914098451111,
> > -1.59018268505718, -0.635101104166451, 1.48005776676403,
> > -0.25631761189957,
> > 0.171947814411552, 0.444646195980014, 0.172655758440111,
> > -0.00432159794094836,
> > -0.549321974240026, 0.585055026451421, -1.22813371480849,
> > 0.846807540195381,
> > 0.319629441352597, 0.393525732059709, -1.40275675444594,
> 1.11062585584811,
> > 0.214809571213853, -0.636432711800391, -0.283087127251573,
> > -1.46385553207618,
> > 0.436928676930225, -1.34231945433777, 0.451281957595763,
> > -0.523155001924496,
> > -2.69416779107545, 1.5513477373689, 0.989632029400036, 1.34636075948993,
> > 0.346147428691405, -0.464527560160041, 0.337233933370495,
> > 1.11331396366389,
> > -1.00060600083316, -0.734784444487169, 1.40476315358621,
> 1.01671092179193,
> > -0.0144306250829694, -0.923555930346906, -1.02275966525015,
> > 0.619422010219383,
> > 0.603484309754755, -0.774553813657576, 0.0932792545556387,
> > -0.651884521428279,
> > -0.61965612647073, -1.22104834441579, -1.31439612639271,
> > -2.87707752518163,
> > -0.0343801084491906, -0.640678302378492, -1.38653452986558,
> > 0.884963139028743,
> > -0.657454283462004, 0.462842665244993, -0.20881674837534,
> 0.6345884135548,
> > 0.707165108434729, -0.162090928425892, -0.998662309785188,
> > 1.3130254639318,
> > 0.191890764940071, -0.0493619237876962, -0.55183232511689,
> > 0.470263932874487,
> > -0.217088645692971, 0.231550037620628, -0.530406537266415,
> > -0.616522469083808,
> > 0.329347084038265, 1.49420692610475, 1.91750823142859, 0.753536143872474,
> > 0.766584887163714, -0.259803384094296, -0.402463714097741,
> > -0.0229799209735185,
> > -0.259677990559218, -1.41529707261105, 0.191362852138627,
> > 1.54483266684747,
> > -1.17947655378489, -0.426265411073274, 0.723010460481118,
> > 1.37405142869537,
> > -0.374771207936141, 0.0513905365832423, -0.369432731236118,
> > -0.945441984794364,
> > 0.179506648255554, 0.31971255725438, -1.25117960937319, 2.46228549580083
> > ), countryB = c(9.4028512714591, 10.7551115504652, 11.2436629228434,
> > 8.25642360904389, 10.7054313972395, 10.1714609666091, 10.3726975056206,
> > 10.6594299429162, 8.56250595443296, 10.5612153841067, 8.07612112826519,
> > 9.94704207511951, 11.392407273156, 10.4257690445268, 10.6339442533038,
> > 10.5314883825356, 10.3506665399033, 10.2426403362978, 10.8437715647564,
> > 10.8247200587034, 11.2449815064171, 9.2898697883112, 9.05418978124619,
> > 10.6080277672463, 9.19882175737148, 11.3589722806948, 10.4139334238189,
> > 9.44305216810892, 9.58426470056472, 8.84208362003176, 10.8125431356391,
> > 7.71357872650814, 8.73526671289828, 10.714693958224, 9.49976972594189,
> > 9.41896864941478, 7.33073349261249, 10.5268398982262, 9.92255671125184,
> > 10.5665378092202, 10.5035704895405, 7.93682068228084, 10.882421050424,
> > 9.3237610577468, 8.42128120513304, 9.46103753451174, 10.3472215515392,
> > 11.0483414005193, 10.3421689244599, 7.85120280889754, 11.6327644046254,
> > 9.57620745972098, 10.6553844719749, 10.8490250129346, 10.2742492933876,
> > 9.55428072119304, 9.0976351049804, 10.0731951766966, 10.6956488509516,
> > 11.1530744146062, 10.3496303024767, 9.52734980693306, 9.64478424894734,
> > 9.28301632295047, 10.9568790570688, 11.6052870914912, 9.58530202776537,
> > 11.1338134902295, 8.66685735149472, 11.0230863576875, 10.8000609212302,
> > 10.6510296259782, 11.831292326569, 9.53836122448974, 9.55805411377422,
> > 9.90336204189518, 9.36377040999133, 11.7041009433341, 9.95628297574831,
> > 10.718111342931, 10.4562688422472, 8.85976383099186, 8.94085496683824,
> > 8.19538394018434, 10.1058448260449, 9.70821289789561, 9.08757962570738,
> > 10.6599997541876, 10.0521137258219, 9.9924295699559, 11.8730269098299,
> > 11.2634470795907, 11.3854762443416, 9.56742053529845, 10.4101561978503,
> > 9.53376547865009, 9.75410966995361, 9.92804558924886, 8.36231430067066,
> > 10.7486459346681, 12.0143881312685, 11.0083060332839, 9.32820954213586,
> > 10.8420346742049, 9.73064414798223, 10.7593902723319, 10.976622155215,
> > 10.1039774975157, 8.36317871802524, 9.21809894958653, 10.1015362220683,
> > 11.4655736295123, 9.65528297274543, 9.67844310028008, 10.1516820910267,
> > 8.38764450852642, 10.163558398201, 11.1432463477237, 12.0509818193223,
> > 10.9896913965091, 11.1772406550953, 9.14396687337779, 9.93338627749979,
> > 10.9548864433126, 8.64911301751956, 11.706463972364, 11.1012846649741,
> > 8.7805267197408, 11.5802098773954, 10.2268513542863, 10.3509617168731,
> > 9.09646558899397, 11.2706666647314, 11.3984335011704, 11.4808985388742,
> > 10.5608771133999, 10.3684356806175, 10.4815588822618, 10.5818867877558,
> > 12.2561035284691, 8.6464271477849, 10.3412351841865, 10.7577574534162,
> > 11.1124067479261, 9.91627943243343, 10.6356898895291, 10.2107566441478,
> > 10.0672734202575, 10.2385787014999, 11.7112606160069, 10.0453801263575,
> > 8.84654136100724, 10.2173421609193, 9.27919801705716, 10.4755578829547,
> > 7.69340209082122, 9.24705253848083, 10.8415406794597, 8.69603117680965,
> > 11.2589214416702, 10.5425642239737, 10.1389355042458, 9.17267675180435,
> > 12.3052338002213, 10.0181674985326, 12.2715476751051, 9.64516268052739,
> > 10.6305299379912, 10.1829347684655, 9.97983942366781, 10.2559847744715,
> > 10.3092266661814, 9.75215330673072, 10.250464278709, 9.31118800061454,
> > 10.3310666767171, 9.09703848990093, 10.241195105962, 8.57290406448477,
> > 8.98090855172704, 8.64653101832229, 12.6791587435376, 9.56000538681993,
> > 10.4062255533723, 11.067091476284, 10.5255014737268, 10.2240941949978,
> > 9.13081571869084, 9.5942352120783, 9.2753466212409, 10.2789293993548,
> > 8.10255065585342, 9.48751297655077, 8.51198576785003, 9.46310532206947,
> > 9.86727270762806, 11.5149248124739, 9.31557156735022, 9.34351230206303,
> > 10.022139448869, 11.4111350893792, 8.57891783464065, 10.3761090924661,
> > 9.38300408584683, 9.33694577526158, 9.25815555686085, 9.29856853889735,
> > 8.4250073823245, 8.83022950824832, 9.1510846172981, 10.2553042376765,
> > 10.0739540955956, 9.04955917463259, 10.8927827168631, 9.44611041690694,
> > 10.7883395708593, 10.6010088332078, 7.72560864006592, 10.1760839916637,
> > 11.5576569894392, 11.384809257294, 8.73504353987083, 9.00585942714512,
> > 9.62327893504013, 10.3527072699866, 10.5220100705827, 8.74921668696853,
> > 8.56415116683662, 12.1348451793815, 10.9496674323819, 9.64443817181322,
> > 9.52977454697087, 10.4281877186725, 8.52701721410292, 11.6911584965782,
> > 10.2300108250139, 8.65368821276485, 11.7733431942379, 10.2060233777681,
> > 9.57291673029552, 9.82687667895106, 10.5939736188493, 11.2510605726337,
> > 10.3383384488323, 9.92301237292945, 10.0164623230529, 10.4939857044034,
> > 10.5631769648289, 10.935731043532, 11.0659359187168, 8.51697010486427,
> > 9.79512310587405, 9.35132038807071, 11.3286703149903, 10.4621597293933,
> > 10.4099459919071, 8.86246315190942, 9.30054044639769, 9.40346575227191,
> > 9.59278722974697)), row.names = c(NA, -260L), class = "data.frame")
> >
> >
> >
> >
> >
> > From: Subhamitra Patra <mailto:[hidden email]>
> > Sent: Friday, September 13, 2019 3:59 PM
> > To: PIKAL Petr <mailto:[hidden email]>; r-help mailing list
> > <mailto:[hidden email]>
> > Subject: Re: [R] Query about calculating the monthly average of daily
> data
> > columns
> >
> > Dear PIKAL,
> >
> > Thank you very much for your suggestion.
> >
> > I tried your previous suggested code and getting the average value for
> > each month for both country A, and B. But in your recent email, you are
> > suggesting not to change the date column to real date. If I am going
> > through your recently suggested code, i.e.
> >
> >  "aggregate(value column, list(format(date column, "%m.%Y"), country
> > column), mean)"
> >
> > I am getting an Error that "aggregate(value, list(format(date, "%m.%Y"),
> > country), mean) : object 'value' not found".
> >
> > Here, my query "may I need to define the date column, country column, and
> > value column separately?"
> >
> > Further, I need something the average value result like below in the data
> > frame
> >
> > Month       Country A   Country B
> > Jan 1994    26.66         35.78
> > Feb 1994    26.13         29.14
> >
> > so that it will be easy for me to export to excel, and to use for the
> > further calculations.
> >
> > Please suggest me in this regard.
> >
> > Thank you.
> >
> >
> >
> >
> >
> >
> >
> https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&
> > Sender notified by
> >
> >
> https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&
> > 09/13/19, 07:22:53 PM
> >
> >
> >
> > On Fri, Sep 13, 2019 at 7:03 PM PIKAL Petr <mailto:
> [hidden email]>
> > wrote:
> > Hi
> >
> > I am almost 100% sure that you would spare yourself much trouble if you
> > changed your date column to real date
> >
> > ?as.Date
> >
> > reshape your wide format to long one
> > library(reshape2)
> > ?melt
> >
> > to get 3 column data.frame with one date column, one country column and
> > one value column
> >
> > use ?aggregate and ?format to get summary value
> >
> > something like
> > aggregate(value column, list(format(date column, "%m.%Y"), country
> > column), mean)
> >
> > But if you insist to scratch your left ear with right hand accross your
> > head, you could continue your way.
> >
> > Cheers
> > Petr
> >
> > > -----Original Message-----
> > > From: R-help <mailto:[hidden email]> On Behalf Of
> > Subhamitra
> > > Patra
> > > Sent: Friday, September 13, 2019 3:20 PM
> > > To: Jim Lemon <mailto:[hidden email]>; r-help mailing list
> > <r-help@r-
> > > http://project.org>
> > > Subject: Re: [R] Query about calculating the monthly average of daily
> > data
> > > columns
> > >
> > > Dear Sir,
> > >
> > > Yes, I understood the logic. But, still, I have a few queries that I
> > mentioned
> > > below your answers.
> > >
> > > "# if you only have to get the monthly averages, it can be done this
> way
> > > > spdat$month<-sapply(strsplit(spdat$dates,"-"),"["*,2*)
> > > > spdat$year<-sapply(strsplit(spdat$dates,"-"),"[",*3*)"
> > > >
> > > > B. Here, I need to define the no. of months, and years separately,
> > right?
> > > > or else what 2, and 3 (in bold) indicates?
> > > >
> > >
> > > To get the grouping variable of sequential months that you want, you
> only
> > > need the month and year values of the dates in the first column. First
> I
> > used
> > > the "strsplit" function to split the date field at the hyphens, then
> used
> > > "sapply" to extract ("[") the second (month) and *third (year)* parts
> as
> > two
> > > new columns. Because you have more than one year of data, you need the
> > > year values or you will group all Januarys, all Februarys and so on.
> > > Notice how I pass both of the new columns as a list (a data frame is a
> > type of
> > > list) in the call to get the mean of each month.
> > >
> > > 1. Here, as per my understanding, the "3" indicates the 3rd year,
> right?
> > > But, you showed an average for 2 months of the same year. Then, what
> "3"
> > > in the  spdat$year object indicate?
> > >
> > >
> > > C. From this part, I got the exact average values of both January and
> > > > February of 1994 for country A, and B. But, in code, I have a query
> > > > that I need to define  spdat$returnA, and  spdat$returnB separately
> > > > before writing this code, right? Like this, I need to define for each
> > > > 84 countries separately with their respective number of months, and
> > > > years before writing this code, right?
> > > >
> > >
> > > I don't think so. Because I don't know what your data looks like, I am
> > > guessing that for each row, it has columns for each of the 84
> countries.
> > I
> > > don't know what these columns are named, either. Maybe:
> > >
> > > date             Australia   Belarus   ...    Zambia
> > > 01/01/1994   20             21                 22
> > > ...
> > >
> > > Here, due to my misunderstanding about the code, I was wrong. But, what
> > > data structure you guessed, it is absolutely right that for each row, I
> > have
> > > columns for each of the 84 countries. So, I think, I need to define the
> > date
> > > column with no. of months, and years once for all the countries.
> > > Therefore, I got my answer to the first and third question in the
> > previous
> > > email (what you suggested) that I no need to define the column of each
> > > country, as the date, and no. of observations are same for all
> countries.
> > > But, the no. of days are different for each month, and similarly, for
> > each
> > > year. So, I think I need to define date for each year separately.
> > Hence, I have
> > > given an example of 12 months, for 2 years (i.e. 1994, and 1995), and
> > have
> > > written the following code. Please correct me in case I am wrong.
> > >
> > >  spdat<-data.frame(
> > >
> > >
> >
> dates=paste(c(1:21,1:20,1:23,1:21,1:22,1:22,1:21,1:23,1:22,1:21,1:22,1:22),c(r
> > > ep(1,21),rep(2,20),
> > > rep(3,23), rep(4,21),
> > >
> >
> rep(5,22),rep(6,22),rep(7,21),rep(8,23),rep(9,22),rep(10,21),rep(11,22),rep(12
> > > ,22)
> > > ),rep(1994,260)
> > >  dates1=
> > >
> >
> paste(c(1:22,1:20,1:23,1:20,1:23,1:22,1:21,1:23,1:21,1:22,1:22,1:21),c(rep(1,2
> > > 2),rep(2,20),
> > > rep(3,23), rep(4,20),
> > >
> >
> rep(5,23),rep(6,22),rep(7,21),rep(8,23),rep(9,21),rep(10,21),rep(11,22),rep(12
> > > ,21)
> > > ),rep(1995,259) ,sep="-")
> > >
> > > Concerning the exporting of structure of the dataset to excel, I will
> > have
> > > 12*84 matrix. But, please suggest me the way to proceed for the large
> > > sample. I have mentioned below what I understood from your code. Please
> > > correct me if I am wrong.
> > > 1. I need to define the date for each year as the no. of days in each
> > month
> > > are different for each year (as mentioned in my above code). For
> > instance, in
> > > my data file, Jan 1994 has 21 days while Jan 1995 has 22 days.
> > > 2. Need to define the date column as character.
> > > 3. Need to define the monthly average for each month, and year. So, now
> > > code will be as follows.
> > >
> >
> spdat$month<-sapply(strsplit(spdat$dates,"-"),"[",2,3,4,5,6,7,8,9,10,11,12)
> > >   %%%%As I need all months average sequentially.
> > > spdat$year<-sapply(strsplit(spdat$dates,"-"),"[",3)
> > >
> > > Here, this meaning of "3", I am really unable to get.
> > >
> > > 4. Need to define each country with each month and year as mentioned in
> > > the last part of your code.
> > >
> > > Please suggest me in this regard.
> > >
> > > Thank you.
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > [image: Mailtrack]
> > > <https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_ca
> > > mpaign=signaturevirality5&>
> > > Sender
> > > notified by
> > > Mailtrack
> > > <https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_ca
> > > mpaign=signaturevirality5&>
> > > 09/13/19,
> > > 06:41:41 PM
> > >
> > > On Fri, Sep 13, 2019 at 4:24 PM Jim Lemon <mailto:[hidden email]
> >
> > wrote:
> > >
> > > > Hi Subhamitra,
> > > > I'll try to write my answers adjacent to your questions below.
> > > >
> > > > On Fri, Sep 13, 2019 at 6:08 PM Subhamitra Patra <
> > > > mailto:[hidden email]> wrote:
> > > >
> > > >> Dear Sir,
> > > >>
> > > >> Thank you very much for your suggestion.
> > > >>
> > > >> Yes, your suggested code worked. But, actually, I have data from 3rd
> > > >> January 1994 to 3rd August 2017 for very large (i.e. for 84
> > > >> countries) sample. From this, I have given the example of the years
> > > >> up to 2000. Before applying the same code for the long 24 years, I
> > > >> want to learn the logic behind the code. Actually, some part of the
> > > >> code is not understandable to me which I mentioned in the bold
> letter
> > as
> > > follows.
> > > >>
> > > >> "spdat<-data.frame(
> > > >>
> >  dates=paste(c(1:30,1:28),c(rep(1,30),rep(2,28)),rep(1994,58),sep="-"),
> > > >>   returnA=sample(*15:50*,58,TRUE),returnB=sample(*10:45*,58,TRUE))"
> > > >>
> > > >> A. Here, I need to define the no. of days in a month, and the no. of
> > > >> countries name separately, right? But, what is meant by 15:50, and
> > > >> 10:45 in return A, and B respectively?
> > > >>
> > > >
> > > > To paraphrase Donald Trump, this is FAKE DATA! I have no idea what
> the
> > > > real values of return are, so I made them up using the "sample"
> > function.
> > > > However, this is not meant to mislead anyone, just to show how
> > > > whatever numbers are in your data can be used in calculations. The
> > > > colon (":") operator creates a sequence of numbers starting with the
> > > > one to the left and ending with the one to the right.
> > > >
> > > >>
> > > >> "# if you only have to get the monthly averages, it can be done this
> > > >> way
> > > >> spdat$month<-sapply(strsplit(spdat$dates,"-"),"["*,2*)
> > > >> spdat$year<-sapply(strsplit(spdat$dates,"-"),"[",*3*)"
> > > >>
> > > >> B. Here, I need to define the no. of months, and years separately,
> > right?
> > > >> or else what 2, and 3 (in bold) indicates?
> > > >>
> > > >
> > > > To get the grouping variable of sequential months that you want, you
> > > > only need the month and year values of the dates in the first column.
> > > > First I used the "strsplit" function to split the date field at the
> > > > hyphens, then used "sapply" to extract ("[") the second (month) and
> > > > third (year) parts as two new columns. Because you have more than one
> > > > year of data, you need the year values or you will group all
> Januarys,
> > > > all Februarys and so on. Notice how I pass both of the new columns as
> > > > a list (a data frame is a type of
> > > > list) in the call to get the mean of each month.
> > > >
> > > >>
> > > >> "# get the averages by month and year - is this correct?
> > > >> monthlyA<-by(*spdat$returnA*,spdat[,c("month","year")],mean)
> > > >> monthlyB<-by(*spdat$returnB*,spdat[,c("month","year")],mean)"
> > > >>
> > > >> C. From this part, I got the exact average values of both January
> and
> > > >> February of 1994 for country A, and B. But, in code, I have a query
> > > >> that I need to define  spdat$returnA, and  spdat$returnB separately
> > > >> before writing this code, right? Like this, I need to define for
> each
> > > >> 84 countries separately with their respective number of months, and
> > > >> years before writing this code, right?
> > > >>
> > > >
> > > > I don't think so. Because I don't know what your data looks like, I
> am
> > > > guessing that for each row, it has columns for each of the 84
> > > > countries. I don't know what these columns are named, either. Maybe:
> > > >
> > > > date             Australia   Belarus   ...    Zambia
> > > > 01/01/1994   20             21                 22
> > > > ...
> > > >
> > > >
> > > >> Yes, after obtaining the monthly average for each country's data, I
> > > >> need to use them for further calculations. So, I want to export the
> > > >> result to excel. But, until understanding the code, I think I
> willn't
> > > >> able to apply for the entire sample, and cannot be able to discuss
> > > >> the format of the resulted column to export to excel.
> > > >>
> > > >
> > > > Say that we perform the grouped mean calculation for the first two
> > > > country columns like this:
> > > > monmeans<-sapply(spdat[,2:3],by,spdat[,c("month","year")],mean)
> > > > monmeans
> > > >     Australia  Belarus
> > > > [1,]  29.70000 30.43333
> > > > [2,]  34.17857 27.39286
> > > >
> > > > We are presented with a 2x2 matrix of monthly means in just the
> format
> > > > someone might use for importing into Excel. The first row is January
> > > > 1994, the second February 1994 and so on. By expanding the columns to
> > > > include all the countries in your data, You should have the result
> you
> > want.
> > > >
> > > > Jim
> > > >
> > >
> > >
> > > --
> > > *Best Regards,*
> > > *Subhamitra Patra*
> > > *Phd. Research Scholar*
> > > *Department of Humanities and Social Sciences* *Indian Institute of
> > > Technology, Kharagpur*
> > > *INDIA*
> > >
> > > [[alternative HTML version deleted]]
> > >
> > > ______________________________________________
> > > mailto:[hidden email] mailing list -- To UNSUBSCRIBE and more,
> see
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide http://www.R-project.org/posting-
> > > guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> > Osobní údaje: Informace o zpracování a ochraně osobních údajů obchodních
> > partnerů PRECHEZA a.s. jsou zveřejněny na:
> > https://www.precheza.cz/zasady-ochrany-osobnich-udaju/ | Information
> > about processing and protection of business partner’s personal data are
> > available on website:
> > https://www.precheza.cz/en/personal-data-protection-principles/
> > Důvěrnost: Tento e-mail a jakékoliv k němu připojené dokumenty jsou
> > důvěrné a podléhají tomuto právně závaznému prohláąení o vyloučení
> > odpovědnosti: https://www.precheza.cz/01-dovetek/ | This email and any
> > documents attached to it may be confidential and are subject to the
> legally
> > binding disclaimer: https://www.precheza.cz/en/01-disclaimer/
> >
> >
> >
> > --
> > Best Regards,
> > Subhamitra Patra
> > Phd. Research Scholar
> > Department of Humanities and Social Sciences
> > Indian Institute of Technology, Kharagpur
> > INDIA
> > ______________________________________________
> > [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
>
> --
> *Best Regards,*
> *Subhamitra Patra*
> *Phd. Research Scholar*
> *Department of Humanities and Social Sciences*
> *Indian Institute of Technology, Kharagpur*
> *INDIA*
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Query about calculating the monthly average of daily data columns

Rui Barradas
Hello,

Here are two other ways using aggregate.
The difference is in the way to create a MONTH grouping column.
The second way is base R only.


df1$MONTH <- zoo::as.yearmon(as.Date(as.character(df1$DATE), '%Y%m%d'))
aggregate(Spread ~ PERMNO + MONTH, df1, mean)

df1$MONTH <- df1$DATE %/% 100
aggregate(Spread ~ PERMNO + MONTH, df1, mean)


Hope this helps,

Rui Barradas


Às 17:17 de 20/10/19, jim holtman escreveu:

> Does this do what you want:
>
>> library(tidyverse)
>
>> input <- read_delim("PERMNO DATE Spread
> + 111 19940103 0.025464308
> + 111 19940104 0.064424296
> + 111 19940105 0.018579337
> + 111 19940106 0.018872211
>   ..." ... [TRUNCATED]
>
>> # drop last two digits to get the month
>> monthly <- input %>%
> +   group_by(PERMNO, month = DATE %/% 100) %>%
> +   summarise(avg = mean(Spread))
>> monthly
> # A tibble: 12 x 3
> # Groups:   PERMNO [3]
>     PERMNO  month      avg
>      <dbl>  <dbl>    <dbl>
>   1    111 199401 0.0416
>   2    111 199402 0.0508
>   3    111 199403 0.0567
>   4    111 199404 0.0466
>   5    112 199401 0.000533
>   6    112 199402 0.000593
>   7    112 199403 0.000471
>   8    112 199404 0.000587
>   9    113 199401 0.000692
> 10    113 199402 0.000591
> 11    113 199403 0.000677
> 12    113 199404 0.000555
>>
>
>
> Jim Holtman
> *Data Munger Guru*
>
>
> *What is the problem that you are trying to solve?Tell me what you want to
> do, not how you want to do it.*
>
>
> On Sun, Oct 20, 2019 at 5:10 AM Subhamitra Patra <[hidden email]>
> wrote:
>
>> Dear Sir,
>>
>> Thank you very much for your suggestions.
>>
>> Due to certain inconveniences, I was unable to work on your suggestions.
>>
>> Today I worked on both suggestions and got the result that I really wanted
>> that monthly averages for each country.
>>
>> Here, I am asking one more query (just for learning purpose) that if my
>> country name and its respective variable is in the panel format, and I want
>> to take the monthly average for each country, how the code will be
>> arranged. For your convenience, I am providing a small data sample below.
>>
>> PERMNO DATE Spread
>> 111 19940103 0.025464308
>> 111 19940104 0.064424296
>> 111 19940105 0.018579337
>> 111 19940106 0.018872211
>> 111 19940107 0.065279782
>> 111 19940110 0.063485905
>> 111 19940111 0.018355453
>> 111 19940112 0.064135683
>> 111 19940113 0.063519987
>> 111 19940114 0.018277351
>> 111 19940117 0.018628417
>> 111 19940118 0.065630229
>> 111 19940119 0.018713152
>> 111 19940120 0.019119037
>> 111 19940121 0.068342043
>> 111 19940124 0.020843244
>> 111 19940125 0.019954211
>> 111 19940126 0.018980321
>> 111 19940127 0.066827165
>> 111 19940128 0.067459235
>> 111 19940131 0.068682559
>> 111 19940201 0.02081465
>> 111 19940202 0.068236091
>> 111 19940203 0.068821406
>> 111 19940204 0.020075648
>> 111 19940207 0.066070584
>> 111 19940208 0.066068837
>> 111 19940209 0.019077072
>> 111 19940210 0.065894875
>> 111 19940211 0.018847478
>> 111 19940214 0.065040844
>> 111 19940215 0.01880332
>> 111 19940216 0.018836199
>> 111 19940217 0.066888865
>> 111 19940218 0.067116793
>> 111 19940221 0.068809742
>> 111 19940222 0.068230213
>> 111 19940223 0.069502855
>> 111 19940224 0.070383523
>> 111 19940225 0.020430811
>> 111 19940228 0.067087257
>> 111 19940301 0.066776479
>> 111 19940302 0.019959031
>> 111 19940303 0.066596469
>> 111 19940304 0.019131334
>> 111 19940307 0.019312528
>> 111 19940308 0.067349909
>> 111 19940309 0.068916431
>> 111 19940310 0.068620043
>> 111 19940311 0.070494844
>> 111 19940314 0.071056842
>> 111 19940315 0.071042517
>> 111 19940316 0.072401771
>> 111 19940317 0.071940001
>> 111 19940318 0.07352884
>> 111 19940321 0.072671688
>> 111 19940322 0.072652595
>> 111 19940323 0.021352138
>> 111 19940324 0.069933727
>> 111 19940325 0.068717467
>> 111 19940328 0.020470748
>> 111 19940329 0.020003748
>> 111 19940330 0.065833717
>> 111 19940331 0.065268388
>> 111 19940401 0.018762356
>> 111 19940404 0.064914179
>> 111 19940405 0.064706743
>> 111 19940406 0.018764175
>> 111 19940407 0.06524806
>> 111 19940408 0.018593449
>> 111 19940411 0.064913949
>> 111 19940412 0.01872089
>> 111 19940413 0.018729328
>> 111 19940414 0.018978773
>> 111 19940415 0.065477137
>> 111 19940418 0.064614365
>> 111 19940419 0.064184148
>> 111 19940420 0.018553192
>> 111 19940421 0.066872771
>> 111 19940422 0.06680782
>> 111 19940425 0.067467961
>> 111 19940426 0.02014297
>> 111 19940427 0.062464016
>> 111 19940428 0.062357052
>> 112 19940429 0.000233993
>> 112 19940103 0.000815264
>> 112 19940104 0.000238165
>> 112 19940105 0.000813632
>> 112 19940106 0.000236915
>> 112 19940107 0.000809102
>> 112 19940110 0.000801642
>> 112 19940111 0.000797932
>> 112 19940112 0.000795251
>> 112 19940113 0.000795186
>> 112 19940114 0.000231359
>> 112 19940117 0.000232134
>> 112 19940118 0.000233718
>> 112 19940119 0.000233993
>> 112 19940120 0.000234694
>> 112 19940121 0.000235753
>> 112 19940124 0.000808653
>> 112 19940125 0.000235604
>> 112 19940126 0.000805068
>> 112 19940127 0.000802337
>> 112 19940128 0.000801768
>> 112 19940131 0.000233517
>> 112 19940201 0.000797431
>> 112 19940202 0.000233338
>> 112 19940203 0.000233826
>> 112 19940204 0.000799519
>> 112 19940207 0.000798105
>> 112 19940208 0.000792245
>> 112 19940209 0.000231113
>> 112 19940210 0.000233413
>> 112 19940211 0.000798168
>> 112 19940214 0.000233282
>> 112 19940215 0.000797848
>> 112 19940216 0.000785165
>> 112 19940217 0.000228426
>> 112 19940218 0.000786783
>> 112 19940221 0.00078343
>> 112 19940222 0.000781459
>> 112 19940223 0.000776264
>> 112 19940224 0.000226399
>> 112 19940225 0.000779066
>> 112 19940228 0.000773603
>> 112 19940301 0.000226487
>> 112 19940302 0.000775233
>> 112 19940303 0.000227017
>> 112 19940304 0.000227854
>> 112 19940307 0.000782814
>> 112 19940308 0.000229164
>> 112 19940309 0.000787033
>> 112 19940310 0.000784049
>> 112 19940311 0.000228984
>> 112 19940314 0.00078697
>> 112 19940315 0.000782567
>> 112 19940316 0.000228516
>> 112 19940317 0.000786347
>> 112 19940318 0.000229236
>> 112 19940321 0.000230107
>> 112 19940322 0.000792689
>> 112 19940323 0.000787284
>> 112 19940324 0.000787221
>> 112 19940325 0.000227978
>> 112 19940328 0.000228713
>> 112 19940329 0.000228894
>> 112 19940330 0.000229255
>> 112 19940331 0.000231003
>> 112 19940401 0.000796567
>> 112 19940404 0.000790668
>> 112 19940405 0.00078195
>> 112 19940406 0.000780475
>> 112 19940407 0.000228355
>> 112 19940408 0.000781723
>> 112 19940411 0.000775741
>> 112 19940412 0.000226647
>> 112 19940413 0.000778876
>> 112 19940414 0.000777336
>> 112 19940415 0.000775253
>> 112 19940418 0.000226362
>> 112 19940419 0.000779554
>> 112 19940420 0.000774824
>> 112 19940421 0.000225582
>> 112 19940422 0.000225724
>> 112 19940425 0.000773361
>> 112 19940426 0.0002256
>> 112 19940427 0.000776416
>> 113 19940428 0.000280542
>> 113 19940429 0.000964148
>> 113 19940103 0.000962654
>> 113 19940104 0.000281768
>> 113 19940105 0.000962219
>> 113 19940106 0.000961965
>> 113 19940107 0.000958602
>> 113 19940110 0.000280056
>> 113 19940111 0.000956348
>> 113 19940112 0.000952171
>> 113 19940113 0.000948176
>> 113 19940114 0.000275607
>> 113 19940117 0.000275773
>> 113 19940118 0.000276738
>> 113 19940119 0.000947068
>> 113 19940120 0.000940959
>> 113 19940121 0.000275224
>> 113 19940124 0.000948489
>> 113 19940125 0.000940076
>> 113 19940126 0.0009309
>> 113 19940127 0.000269955
>> 113 19940128 0.000270328
>> 113 19940131 0.000924234
>> 113 19940201 0.000924038
>> 113 19940202 0.000269088
>> 113 19940203 0.000270247
>> 113 19940204 0.000270562
>> 113 19940207 0.00092656
>> 113 19940208 0.000921819
>> 113 19940209 0.000920361
>> 113 19940210 0.000268958
>> 113 19940211 0.000924758
>> 113 19940214 0.000266768
>> 113 19940215 0.000911325
>> 113 19940216 0.000909294
>> 113 19940217 0.000905887
>> 113 19940218 0.000262919
>> 113 19940221 0.000262978
>> 113 19940222 0.000263189
>> 113 19940223 0.000904439
>> 113 19940224 0.000263512
>> 113 19940225 0.000906184
>> 113 19940228 0.000265198
>> 113 19940301 0.000906126
>> 113 19940302 0.000264357
>> 113 19940303 0.000265392
>> 113 19940304 0.000912495
>> 113 19940307 0.000910641
>> 113 19940308 0.000266143
>> 113 19940309 0.000910113
>> 113 19940310 0.000909277
>> 113 19940311 0.000905056
>> 113 19940314 0.00090285
>> 113 19940315 0.000898831
>> 113 19940316 0.000896118
>> 113 19940317 0.000261294
>> 113 19940318 0.000892563
>> 113 19940321 0.000890852
>> 113 19940322 0.00088639
>> 113 19940323 0.000258509
>> 113 19940324 0.000260286
>> 113 19940325 0.000889354
>> 113 19940328 0.000888373
>> 113 19940329 0.000885049
>> 113 19940330 0.000259116
>> 113 19940331 0.000259474
>> 113 19940401 0.000260316
>> 113 19940404 0.000897493
>> 113 19940405 0.000894592
>> 113 19940406 0.000260435
>> 113 19940407 0.000260989
>> 113 19940408 0.000262061
>> 113 19940411 0.000262262
>> 113 19940412 0.000263604
>> 113 19940413 0.000908682
>> 113 19940414 0.000265348
>> 113 19940415 0.000265637
>> 113 19940418 0.00026862
>> 113 19940419 0.000918882
>> 113 19940420 0.000909904
>> 113 19940421 0.000901725
>> 113 19940422 0.000900062
>> 113 19940425 0.000893547
>> 113 19940426 0.000260899
>> Here, the 1st column is the name of the countries panel which I identified
>> in number, 2nd is the date column arranged in yyyy-mm-dd format, and 3rd is
>> the daily variable for each country for 4 months of 1994. I need to take
>> the monthly average of spread variable for each country (i.e. noted as 111,
>> 112, and 113) in the above example. In short, my monthly spread variable
>> should be sorted on the basis of both PERMNO (i.e. country identifier), and
>> months of the particular year.
>>
>> Please educate me that in this data format, how the average code can be
>> written?
>>
>> Thank you very much.
>>
>>
>>
>> [image: Mailtrack]
>> <
>> https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&
>>>
>> Sender
>> notified by
>> Mailtrack
>> <
>> https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&
>>>
>> 10/20/19,
>> 05:31:23 PM
>>
>> On Mon, Sep 16, 2019 at 4:31 PM PIKAL Petr <[hidden email]> wrote:
>>
>>> Original email did not come through (some problems with formating).
>>>
>>> Hi
>>>
>>> No, on contrary. I **am** suggesting to change date column to real date
>>> asi it is easy to handle with appropriate functions.
>>>
>>> Here are some fake data
>>>
>>>> str(spdat)
>>> 'data.frame':   260 obs. of  3 variables:
>>> $ dates   : Date, format: "1995-01-01" "1995-01-02" "1995-01-03"
>>> "1995-01-04" ...
>>> $ coutryA : num  0.188 0.405 -0.107 -0.596 -0.529 ...
>>> $ countryB: num  9.4 10.76 11.24 8.26 10.71 ..
>>>
>>>> head(spdat)
>>>         dates    coutryA  countryB
>>> 1 1995-01-01  0.1875060  9.402851
>>> 2 1995-01-02  0.4045193 10.755112
>>> 3 1995-01-03 -0.1073904 11.243663
>>> 4 1995-01-04 -0.5959683  8.256424
>>> 5 1995-01-05 -0.5293772 10.705431
>>> 6 1995-01-06 -0.2228029 10.171461
>>>
>>> First I melt it
>>> spdat.m <- melt(spdat, id.var="dates")
>>>
>>>> head(spdat.m)
>>>         dates variable      value
>>> 1 1995-01-01  coutryA  0.1875060
>>> 2 1995-01-02  coutryA  0.4045193
>>> 3 1995-01-03  coutryA -0.1073904
>>> 4 1995-01-04  coutryA -0.5959683
>>> 5 1995-01-05  coutryA -0.5293772
>>> 6 1995-01-06  coutryA -0.2228029
>>>
>>> I do aggregation
>>>
>>>> spdat.ag <- aggregate(spdat.m$value, list(format(spdat.m$dates,
>>> "%m.%Y"), spdat.m$variable), mean)
>>>
>>> And now I use dcast  to get required result.
>>>
>>>> dcast(spdat.ag, Group.1~Group.2)
>>> Using x as value column: use value.var to override.
>>>     Group.1      coutryA  countryB
>>> 1  01.1995  0.098688137 10.177696
>>> 2  02.1995  0.352264682  9.609261
>>> 3  03.1995  0.155521876 10.043503
>>> 4  04.1995 -0.166092393 10.129844
>>> 5  05.1995  0.164665188 10.308275
>>> 6  06.1995  0.260633585 10.210129
>>> 7  07.1995  0.003671979 10.549016
>>> 8  08.1995  0.045295990 10.087435
>>> 9  09.1995 -0.145488206  9.689876
>>> 10 10.1995 -0.225645950  9.743744
>>> 11 11.1995  0.030273383 10.025435
>>> 12 12.1995  0.043557468 10.105626
>>>
>>> Cheers
>>> Petr
>>>
>>> Here are the data.
>>>
>>>> dput(spdat)
>>> spdat  <- structure(list(dates = structure(c(9131, 9132, 9133, 9134,
>> 9135,
>>> 9136, 9137, 9138, 9139, 9140, 9141, 9142, 9143, 9144, 9145, 9146,
>>> 9147, 9148, 9149, 9150, 9151, 9152, 9162, 9163, 9164, 9165, 9166,
>>> 9167, 9168, 9169, 9170, 9171, 9172, 9173, 9174, 9175, 9176, 9177,
>>> 9178, 9179, 9180, 9181, 9190, 9191, 9192, 9193, 9194, 9195, 9196,
>>> 9197, 9198, 9199, 9200, 9201, 9202, 9203, 9204, 9205, 9206, 9207,
>>> 9208, 9209, 9210, 9211, 9212, 9221, 9222, 9223, 9224, 9225, 9226,
>>> 9227, 9228, 9229, 9230, 9231, 9232, 9233, 9234, 9235, 9236, 9237,
>>> 9238, 9239, 9240, 9251, 9252, 9253, 9254, 9255, 9256, 9257, 9258,
>>> 9259, 9260, 9261, 9262, 9263, 9264, 9265, 9266, 9267, 9268, 9269,
>>> 9270, 9271, 9272, 9273, 9282, 9283, 9284, 9285, 9286, 9287, 9288,
>>> 9289, 9290, 9291, 9292, 9293, 9294, 9295, 9296, 9297, 9298, 9299,
>>> 9300, 9301, 9302, 9303, 9312, 9313, 9314, 9315, 9316, 9317, 9318,
>>> 9319, 9320, 9321, 9322, 9323, 9324, 9325, 9326, 9327, 9328, 9329,
>>> 9330, 9331, 9332, 9343, 9344, 9345, 9346, 9347, 9348, 9349, 9350,
>>> 9351, 9352, 9353, 9354, 9355, 9356, 9357, 9358, 9359, 9360, 9361,
>>> 9362, 9363, 9364, 9365, 9374, 9375, 9376, 9377, 9378, 9379, 9380,
>>> 9381, 9382, 9383, 9384, 9385, 9386, 9387, 9388, 9389, 9390, 9391,
>>> 9392, 9393, 9394, 9404, 9405, 9406, 9407, 9408, 9409, 9410, 9411,
>>> 9412, 9413, 9414, 9415, 9416, 9417, 9418, 9419, 9420, 9421, 9422,
>>> 9423, 9424, 9456, 9435, 9436, 9437, 9438, 9439, 9440, 9441, 9442,
>>> 9443, 9444, 9445, 9446, 9447, 9448, 9449, 9450, 9451, 9452, 9453,
>>> 9454, 9455, 9486, 9465, 9466, 9467, 9468, 9469, 9470, 9471, 9472,
>>> 9473, 9474, 9475, 9476, 9477, 9478, 9479, 9480, 9481, 9482, 9483,
>>> 9484, 9151), class = "Date"), coutryA = c(0.187506004416315,
>>> 0.404519257417805, -0.107390371811605, -0.595968278805544,
>>> -0.529377240936012,
>>> -0.222802921207767, 0.413182392872818, 0.689673026532298,
>>> -1.2768723266992,
>>> -0.506308625809406, 0.113859233745174, -0.0963423819877653,
>>> 0.323987304768398,
>>> 1.63846917270538, 0.893233423250338, 0.297732439150487,
>> 0.949323101836486,
>>> -0.599518074708052, 0.366372319197032, -2.25734971953878,
>>> -0.190971733204918,
>>> -0.0874143568874351, 1.46699645184047, 0.00702170238687361,
>>> 0.11221346278474,
>>> -0.8060359607624, 0.340842350476532, 0.798838328074708,
>> 0.449214745851041,
>>> -0.664972890558734, 0.521830282184173, -1.35020467264521,
>>> -0.95240631225826,
>>> 1.25607320999849, 1.57018988549379, 0.99477900888445, -0.936218787378207,
>>> 1.48489932847779, 0.529222943794807, 0.0995675049147771,
>>> 0.477770516727839,
>>> 1.64567253670186, -0.0212651530684566, 0.558952796713992,
>>> 0.0409979382929057,
>>> 0.428675380654606, 0.0919422583362682, -0.819694497340459,
>>> 1.23998830450888,
>>> 0.607498144489643, -1.27724580163097, 1.41634774644371,
>>> -0.579094515769707,
>>> 2.02039606694223, 0.0740478208705996, -1.69826944583929,
>>> -0.321482399813063,
>>> -0.489198601400924, 2.0066750686302, -1.90624857752322, 0.46762405849973,
>>> 1.31264724137396, -0.0473627194710677, 0.141362267796145,
>>> 0.329709761206515,
>>> 0.518454586458572, -1.39489985851779, -0.388303591187678,
>>> -0.668922704543522,
>>> 0.0735115674875065, 1.30737242978235, 0.198503397980751,
>>> 0.257831448122427,
>>> -1.31173539205588, -1.45147941969116, 0.359725782295977,
>>> 0.612882118056585,
>>> -0.0733768753346202, -0.508349204402508, 1.35776663767231,
>>> 0.997807735669086,
>>> -1.41717534266382, -0.894170593324238, -0.68578120845151,
>>> -0.211509378018794,
>>> 0.436738904337909, -1.46932152770435, 0.0817388759874159,
>>> -0.0389350881653141,
>>> 0.709198476466861, -0.963669144724435, -0.548607422521798,
>>> -0.896886885575286,
>>> 0.322231150840934, 1.37327611339939, 0.0310213133870952,
>>> 0.796577750757324,
>>> -0.2010067423637, -0.241723752424226, 1.37547329580654,
>> -1.15382202538982,
>>> 0.101454200596915, 0.273663839664217, 1.8315140887841, 1.86096518756473,
>>> -0.536393730924719, -0.45845011727266, 1.10226256157127,
>>> -0.385596991265563,
>>> 3.20218061566932, -1.25865250042183, -0.13613128784276,
>> 0.483329357746514,
>>> -0.597187329618306, 0.710977603908319, -1.07945708269043,
>>> -0.477626236401394,
>>> 1.51034914684104, 2.35886426985999, -0.0250526828683629,
>>> -0.29439443478131,
>>> 0.665774016744828, 0.464027472251246, 0.226658374792016,
>>> -0.802597030454373,
>>> 0.825517059805602, -1.11293193130819, -1.27677400513873,
>> 1.60776237113347,
>>> 1.12490009531342, 0.95767047134623, 0.0475745549797055,
>>> -0.0591587460876868,
>>> -0.690617365240342, -1.62111622938126, -1.3545210707469, 1.8607927043106,
>>> 0.764367674339969, 1.49261525602638, 0.549570728337346,
>> -1.29658399741794,
>>> -1.6289903797869, 0.00573336252135834, 0.0300702149640632,
>>> 0.440810830115721,
>>> 0.663568666361326, -0.126685900835146, -0.00221628368438927,
>>> 0.815321995886579, -0.499280888368945, -0.271814047751667,
>>> -0.071025546459042,
>>> 1.73165491816826, -0.0294770299043331, 0.833605607221529,
>>> -0.670108794857159,
>>> -0.303323318026829, 1.29039844459134, -0.818806702120603,
>>> -0.445515595649677,
>>> -0.0128796557666887, 0.320923705586147, 0.230597275812536,
>>> -1.54009153212366,
>>> -0.294702981688559, 0.581209734391958, 0.121384768986639,
>>> 0.502914098451111,
>>> -1.59018268505718, -0.635101104166451, 1.48005776676403,
>>> -0.25631761189957,
>>> 0.171947814411552, 0.444646195980014, 0.172655758440111,
>>> -0.00432159794094836,
>>> -0.549321974240026, 0.585055026451421, -1.22813371480849,
>>> 0.846807540195381,
>>> 0.319629441352597, 0.393525732059709, -1.40275675444594,
>> 1.11062585584811,
>>> 0.214809571213853, -0.636432711800391, -0.283087127251573,
>>> -1.46385553207618,
>>> 0.436928676930225, -1.34231945433777, 0.451281957595763,
>>> -0.523155001924496,
>>> -2.69416779107545, 1.5513477373689, 0.989632029400036, 1.34636075948993,
>>> 0.346147428691405, -0.464527560160041, 0.337233933370495,
>>> 1.11331396366389,
>>> -1.00060600083316, -0.734784444487169, 1.40476315358621,
>> 1.01671092179193,
>>> -0.0144306250829694, -0.923555930346906, -1.02275966525015,
>>> 0.619422010219383,
>>> 0.603484309754755, -0.774553813657576, 0.0932792545556387,
>>> -0.651884521428279,
>>> -0.61965612647073, -1.22104834441579, -1.31439612639271,
>>> -2.87707752518163,
>>> -0.0343801084491906, -0.640678302378492, -1.38653452986558,
>>> 0.884963139028743,
>>> -0.657454283462004, 0.462842665244993, -0.20881674837534,
>> 0.6345884135548,
>>> 0.707165108434729, -0.162090928425892, -0.998662309785188,
>>> 1.3130254639318,
>>> 0.191890764940071, -0.0493619237876962, -0.55183232511689,
>>> 0.470263932874487,
>>> -0.217088645692971, 0.231550037620628, -0.530406537266415,
>>> -0.616522469083808,
>>> 0.329347084038265, 1.49420692610475, 1.91750823142859, 0.753536143872474,
>>> 0.766584887163714, -0.259803384094296, -0.402463714097741,
>>> -0.0229799209735185,
>>> -0.259677990559218, -1.41529707261105, 0.191362852138627,
>>> 1.54483266684747,
>>> -1.17947655378489, -0.426265411073274, 0.723010460481118,
>>> 1.37405142869537,
>>> -0.374771207936141, 0.0513905365832423, -0.369432731236118,
>>> -0.945441984794364,
>>> 0.179506648255554, 0.31971255725438, -1.25117960937319, 2.46228549580083
>>> ), countryB = c(9.4028512714591, 10.7551115504652, 11.2436629228434,
>>> 8.25642360904389, 10.7054313972395, 10.1714609666091, 10.3726975056206,
>>> 10.6594299429162, 8.56250595443296, 10.5612153841067, 8.07612112826519,
>>> 9.94704207511951, 11.392407273156, 10.4257690445268, 10.6339442533038,
>>> 10.5314883825356, 10.3506665399033, 10.2426403362978, 10.8437715647564,
>>> 10.8247200587034, 11.2449815064171, 9.2898697883112, 9.05418978124619,
>>> 10.6080277672463, 9.19882175737148, 11.3589722806948, 10.4139334238189,
>>> 9.44305216810892, 9.58426470056472, 8.84208362003176, 10.8125431356391,
>>> 7.71357872650814, 8.73526671289828, 10.714693958224, 9.49976972594189,
>>> 9.41896864941478, 7.33073349261249, 10.5268398982262, 9.92255671125184,
>>> 10.5665378092202, 10.5035704895405, 7.93682068228084, 10.882421050424,
>>> 9.3237610577468, 8.42128120513304, 9.46103753451174, 10.3472215515392,
>>> 11.0483414005193, 10.3421689244599, 7.85120280889754, 11.6327644046254,
>>> 9.57620745972098, 10.6553844719749, 10.8490250129346, 10.2742492933876,
>>> 9.55428072119304, 9.0976351049804, 10.0731951766966, 10.6956488509516,
>>> 11.1530744146062, 10.3496303024767, 9.52734980693306, 9.64478424894734,
>>> 9.28301632295047, 10.9568790570688, 11.6052870914912, 9.58530202776537,
>>> 11.1338134902295, 8.66685735149472, 11.0230863576875, 10.8000609212302,
>>> 10.6510296259782, 11.831292326569, 9.53836122448974, 9.55805411377422,
>>> 9.90336204189518, 9.36377040999133, 11.7041009433341, 9.95628297574831,
>>> 10.718111342931, 10.4562688422472, 8.85976383099186, 8.94085496683824,
>>> 8.19538394018434, 10.1058448260449, 9.70821289789561, 9.08757962570738,
>>> 10.6599997541876, 10.0521137258219, 9.9924295699559, 11.8730269098299,
>>> 11.2634470795907, 11.3854762443416, 9.56742053529845, 10.4101561978503,
>>> 9.53376547865009, 9.75410966995361, 9.92804558924886, 8.36231430067066,
>>> 10.7486459346681, 12.0143881312685, 11.0083060332839, 9.32820954213586,
>>> 10.8420346742049, 9.73064414798223, 10.7593902723319, 10.976622155215,
>>> 10.1039774975157, 8.36317871802524, 9.21809894958653, 10.1015362220683,
>>> 11.4655736295123, 9.65528297274543, 9.67844310028008, 10.1516820910267,
>>> 8.38764450852642, 10.163558398201, 11.1432463477237, 12.0509818193223,
>>> 10.9896913965091, 11.1772406550953, 9.14396687337779, 9.93338627749979,
>>> 10.9548864433126, 8.64911301751956, 11.706463972364, 11.1012846649741,
>>> 8.7805267197408, 11.5802098773954, 10.2268513542863, 10.3509617168731,
>>> 9.09646558899397, 11.2706666647314, 11.3984335011704, 11.4808985388742,
>>> 10.5608771133999, 10.3684356806175, 10.4815588822618, 10.5818867877558,
>>> 12.2561035284691, 8.6464271477849, 10.3412351841865, 10.7577574534162,
>>> 11.1124067479261, 9.91627943243343, 10.6356898895291, 10.2107566441478,
>>> 10.0672734202575, 10.2385787014999, 11.7112606160069, 10.0453801263575,
>>> 8.84654136100724, 10.2173421609193, 9.27919801705716, 10.4755578829547,
>>> 7.69340209082122, 9.24705253848083, 10.8415406794597, 8.69603117680965,
>>> 11.2589214416702, 10.5425642239737, 10.1389355042458, 9.17267675180435,
>>> 12.3052338002213, 10.0181674985326, 12.2715476751051, 9.64516268052739,
>>> 10.6305299379912, 10.1829347684655, 9.97983942366781, 10.2559847744715,
>>> 10.3092266661814, 9.75215330673072, 10.250464278709, 9.31118800061454,
>>> 10.3310666767171, 9.09703848990093, 10.241195105962, 8.57290406448477,
>>> 8.98090855172704, 8.64653101832229, 12.6791587435376, 9.56000538681993,
>>> 10.4062255533723, 11.067091476284, 10.5255014737268, 10.2240941949978,
>>> 9.13081571869084, 9.5942352120783, 9.2753466212409, 10.2789293993548,
>>> 8.10255065585342, 9.48751297655077, 8.51198576785003, 9.46310532206947,
>>> 9.86727270762806, 11.5149248124739, 9.31557156735022, 9.34351230206303,
>>> 10.022139448869, 11.4111350893792, 8.57891783464065, 10.3761090924661,
>>> 9.38300408584683, 9.33694577526158, 9.25815555686085, 9.29856853889735,
>>> 8.4250073823245, 8.83022950824832, 9.1510846172981, 10.2553042376765,
>>> 10.0739540955956, 9.04955917463259, 10.8927827168631, 9.44611041690694,
>>> 10.7883395708593, 10.6010088332078, 7.72560864006592, 10.1760839916637,
>>> 11.5576569894392, 11.384809257294, 8.73504353987083, 9.00585942714512,
>>> 9.62327893504013, 10.3527072699866, 10.5220100705827, 8.74921668696853,
>>> 8.56415116683662, 12.1348451793815, 10.9496674323819, 9.64443817181322,
>>> 9.52977454697087, 10.4281877186725, 8.52701721410292, 11.6911584965782,
>>> 10.2300108250139, 8.65368821276485, 11.7733431942379, 10.2060233777681,
>>> 9.57291673029552, 9.82687667895106, 10.5939736188493, 11.2510605726337,
>>> 10.3383384488323, 9.92301237292945, 10.0164623230529, 10.4939857044034,
>>> 10.5631769648289, 10.935731043532, 11.0659359187168, 8.51697010486427,
>>> 9.79512310587405, 9.35132038807071, 11.3286703149903, 10.4621597293933,
>>> 10.4099459919071, 8.86246315190942, 9.30054044639769, 9.40346575227191,
>>> 9.59278722974697)), row.names = c(NA, -260L), class = "data.frame")
>>>
>>>
>>>
>>>
>>>
>>> From: Subhamitra Patra <mailto:[hidden email]>
>>> Sent: Friday, September 13, 2019 3:59 PM
>>> To: PIKAL Petr <mailto:[hidden email]>; r-help mailing list
>>> <mailto:[hidden email]>
>>> Subject: Re: [R] Query about calculating the monthly average of daily
>> data
>>> columns
>>>
>>> Dear PIKAL,
>>>
>>> Thank you very much for your suggestion.
>>>
>>> I tried your previous suggested code and getting the average value for
>>> each month for both country A, and B. But in your recent email, you are
>>> suggesting not to change the date column to real date. If I am going
>>> through your recently suggested code, i.e.
>>>
>>>   "aggregate(value column, list(format(date column, "%m.%Y"), country
>>> column), mean)"
>>>
>>> I am getting an Error that "aggregate(value, list(format(date, "%m.%Y"),
>>> country), mean) : object 'value' not found".
>>>
>>> Here, my query "may I need to define the date column, country column, and
>>> value column separately?"
>>>
>>> Further, I need something the average value result like below in the data
>>> frame
>>>
>>> Month       Country A   Country B
>>> Jan 1994    26.66         35.78
>>> Feb 1994    26.13         29.14
>>>
>>> so that it will be easy for me to export to excel, and to use for the
>>> further calculations.
>>>
>>> Please suggest me in this regard.
>>>
>>> Thank you.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>> https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&
>>> Sender notified by
>>>
>>>
>> https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&
>>> 09/13/19, 07:22:53 PM
>>>
>>>
>>>
>>> On Fri, Sep 13, 2019 at 7:03 PM PIKAL Petr <mailto:
>> [hidden email]>
>>> wrote:
>>> Hi
>>>
>>> I am almost 100% sure that you would spare yourself much trouble if you
>>> changed your date column to real date
>>>
>>> ?as.Date
>>>
>>> reshape your wide format to long one
>>> library(reshape2)
>>> ?melt
>>>
>>> to get 3 column data.frame with one date column, one country column and
>>> one value column
>>>
>>> use ?aggregate and ?format to get summary value
>>>
>>> something like
>>> aggregate(value column, list(format(date column, "%m.%Y"), country
>>> column), mean)
>>>
>>> But if you insist to scratch your left ear with right hand accross your
>>> head, you could continue your way.
>>>
>>> Cheers
>>> Petr
>>>
>>>> -----Original Message-----
>>>> From: R-help <mailto:[hidden email]> On Behalf Of
>>> Subhamitra
>>>> Patra
>>>> Sent: Friday, September 13, 2019 3:20 PM
>>>> To: Jim Lemon <mailto:[hidden email]>; r-help mailing list
>>> <r-help@r-
>>>> http://project.org>
>>>> Subject: Re: [R] Query about calculating the monthly average of daily
>>> data
>>>> columns
>>>>
>>>> Dear Sir,
>>>>
>>>> Yes, I understood the logic. But, still, I have a few queries that I
>>> mentioned
>>>> below your answers.
>>>>
>>>> "# if you only have to get the monthly averages, it can be done this
>> way
>>>>> spdat$month<-sapply(strsplit(spdat$dates,"-"),"["*,2*)
>>>>> spdat$year<-sapply(strsplit(spdat$dates,"-"),"[",*3*)"
>>>>>
>>>>> B. Here, I need to define the no. of months, and years separately,
>>> right?
>>>>> or else what 2, and 3 (in bold) indicates?
>>>>>
>>>>
>>>> To get the grouping variable of sequential months that you want, you
>> only
>>>> need the month and year values of the dates in the first column. First
>> I
>>> used
>>>> the "strsplit" function to split the date field at the hyphens, then
>> used
>>>> "sapply" to extract ("[") the second (month) and *third (year)* parts
>> as
>>> two
>>>> new columns. Because you have more than one year of data, you need the
>>>> year values or you will group all Januarys, all Februarys and so on.
>>>> Notice how I pass both of the new columns as a list (a data frame is a
>>> type of
>>>> list) in the call to get the mean of each month.
>>>>
>>>> 1. Here, as per my understanding, the "3" indicates the 3rd year,
>> right?
>>>> But, you showed an average for 2 months of the same year. Then, what
>> "3"
>>>> in the  spdat$year object indicate?
>>>>
>>>>
>>>> C. From this part, I got the exact average values of both January and
>>>>> February of 1994 for country A, and B. But, in code, I have a query
>>>>> that I need to define  spdat$returnA, and  spdat$returnB separately
>>>>> before writing this code, right? Like this, I need to define for each
>>>>> 84 countries separately with their respective number of months, and
>>>>> years before writing this code, right?
>>>>>
>>>>
>>>> I don't think so. Because I don't know what your data looks like, I am
>>>> guessing that for each row, it has columns for each of the 84
>> countries.
>>> I
>>>> don't know what these columns are named, either. Maybe:
>>>>
>>>> date             Australia   Belarus   ...    Zambia
>>>> 01/01/1994   20             21                 22
>>>> ...
>>>>
>>>> Here, due to my misunderstanding about the code, I was wrong. But, what
>>>> data structure you guessed, it is absolutely right that for each row, I
>>> have
>>>> columns for each of the 84 countries. So, I think, I need to define the
>>> date
>>>> column with no. of months, and years once for all the countries.
>>>> Therefore, I got my answer to the first and third question in the
>>> previous
>>>> email (what you suggested) that I no need to define the column of each
>>>> country, as the date, and no. of observations are same for all
>> countries.
>>>> But, the no. of days are different for each month, and similarly, for
>>> each
>>>> year. So, I think I need to define date for each year separately.
>>> Hence, I have
>>>> given an example of 12 months, for 2 years (i.e. 1994, and 1995), and
>>> have
>>>> written the following code. Please correct me in case I am wrong.
>>>>
>>>>   spdat<-data.frame(
>>>>
>>>>
>>>
>> dates=paste(c(1:21,1:20,1:23,1:21,1:22,1:22,1:21,1:23,1:22,1:21,1:22,1:22),c(r
>>>> ep(1,21),rep(2,20),
>>>> rep(3,23), rep(4,21),
>>>>
>>>
>> rep(5,22),rep(6,22),rep(7,21),rep(8,23),rep(9,22),rep(10,21),rep(11,22),rep(12
>>>> ,22)
>>>> ),rep(1994,260)
>>>>   dates1=
>>>>
>>>
>> paste(c(1:22,1:20,1:23,1:20,1:23,1:22,1:21,1:23,1:21,1:22,1:22,1:21),c(rep(1,2
>>>> 2),rep(2,20),
>>>> rep(3,23), rep(4,20),
>>>>
>>>
>> rep(5,23),rep(6,22),rep(7,21),rep(8,23),rep(9,21),rep(10,21),rep(11,22),rep(12
>>>> ,21)
>>>> ),rep(1995,259) ,sep="-")
>>>>
>>>> Concerning the exporting of structure of the dataset to excel, I will
>>> have
>>>> 12*84 matrix. But, please suggest me the way to proceed for the large
>>>> sample. I have mentioned below what I understood from your code. Please
>>>> correct me if I am wrong.
>>>> 1. I need to define the date for each year as the no. of days in each
>>> month
>>>> are different for each year (as mentioned in my above code). For
>>> instance, in
>>>> my data file, Jan 1994 has 21 days while Jan 1995 has 22 days.
>>>> 2. Need to define the date column as character.
>>>> 3. Need to define the monthly average for each month, and year. So, now
>>>> code will be as follows.
>>>>
>>>
>> spdat$month<-sapply(strsplit(spdat$dates,"-"),"[",2,3,4,5,6,7,8,9,10,11,12)
>>>>    %%%%As I need all months average sequentially.
>>>> spdat$year<-sapply(strsplit(spdat$dates,"-"),"[",3)
>>>>
>>>> Here, this meaning of "3", I am really unable to get.
>>>>
>>>> 4. Need to define each country with each month and year as mentioned in
>>>> the last part of your code.
>>>>
>>>> Please suggest me in this regard.
>>>>
>>>> Thank you.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> [image: Mailtrack]
>>>> <https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_ca
>>>> mpaign=signaturevirality5&>
>>>> Sender
>>>> notified by
>>>> Mailtrack
>>>> <https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_ca
>>>> mpaign=signaturevirality5&>
>>>> 09/13/19,
>>>> 06:41:41 PM
>>>>
>>>> On Fri, Sep 13, 2019 at 4:24 PM Jim Lemon <mailto:[hidden email]
>>>
>>> wrote:
>>>>
>>>>> Hi Subhamitra,
>>>>> I'll try to write my answers adjacent to your questions below.
>>>>>
>>>>> On Fri, Sep 13, 2019 at 6:08 PM Subhamitra Patra <
>>>>> mailto:[hidden email]> wrote:
>>>>>
>>>>>> Dear Sir,
>>>>>>
>>>>>> Thank you very much for your suggestion.
>>>>>>
>>>>>> Yes, your suggested code worked. But, actually, I have data from 3rd
>>>>>> January 1994 to 3rd August 2017 for very large (i.e. for 84
>>>>>> countries) sample. From this, I have given the example of the years
>>>>>> up to 2000. Before applying the same code for the long 24 years, I
>>>>>> want to learn the logic behind the code. Actually, some part of the
>>>>>> code is not understandable to me which I mentioned in the bold
>> letter
>>> as
>>>> follows.
>>>>>>
>>>>>> "spdat<-data.frame(
>>>>>>
>>>   dates=paste(c(1:30,1:28),c(rep(1,30),rep(2,28)),rep(1994,58),sep="-"),
>>>>>>    returnA=sample(*15:50*,58,TRUE),returnB=sample(*10:45*,58,TRUE))"
>>>>>>
>>>>>> A. Here, I need to define the no. of days in a month, and the no. of
>>>>>> countries name separately, right? But, what is meant by 15:50, and
>>>>>> 10:45 in return A, and B respectively?
>>>>>>
>>>>>
>>>>> To paraphrase Donald Trump, this is FAKE DATA! I have no idea what
>> the
>>>>> real values of return are, so I made them up using the "sample"
>>> function.
>>>>> However, this is not meant to mislead anyone, just to show how
>>>>> whatever numbers are in your data can be used in calculations. The
>>>>> colon (":") operator creates a sequence of numbers starting with the
>>>>> one to the left and ending with the one to the right.
>>>>>
>>>>>>
>>>>>> "# if you only have to get the monthly averages, it can be done this
>>>>>> way
>>>>>> spdat$month<-sapply(strsplit(spdat$dates,"-"),"["*,2*)
>>>>>> spdat$year<-sapply(strsplit(spdat$dates,"-"),"[",*3*)"
>>>>>>
>>>>>> B. Here, I need to define the no. of months, and years separately,
>>> right?
>>>>>> or else what 2, and 3 (in bold) indicates?
>>>>>>
>>>>>
>>>>> To get the grouping variable of sequential months that you want, you
>>>>> only need the month and year values of the dates in the first column.
>>>>> First I used the "strsplit" function to split the date field at the
>>>>> hyphens, then used "sapply" to extract ("[") the second (month) and
>>>>> third (year) parts as two new columns. Because you have more than one
>>>>> year of data, you need the year values or you will group all
>> Januarys,
>>>>> all Februarys and so on. Notice how I pass both of the new columns as
>>>>> a list (a data frame is a type of
>>>>> list) in the call to get the mean of each month.
>>>>>
>>>>>>
>>>>>> "# get the averages by month and year - is this correct?
>>>>>> monthlyA<-by(*spdat$returnA*,spdat[,c("month","year")],mean)
>>>>>> monthlyB<-by(*spdat$returnB*,spdat[,c("month","year")],mean)"
>>>>>>
>>>>>> C. From this part, I got the exact average values of both January
>> and
>>>>>> February of 1994 for country A, and B. But, in code, I have a query
>>>>>> that I need to define  spdat$returnA, and  spdat$returnB separately
>>>>>> before writing this code, right? Like this, I need to define for
>> each
>>>>>> 84 countries separately with their respective number of months, and
>>>>>> years before writing this code, right?
>>>>>>
>>>>>
>>>>> I don't think so. Because I don't know what your data looks like, I
>> am
>>>>> guessing that for each row, it has columns for each of the 84
>>>>> countries. I don't know what these columns are named, either. Maybe:
>>>>>
>>>>> date             Australia   Belarus   ...    Zambia
>>>>> 01/01/1994   20             21                 22
>>>>> ...
>>>>>
>>>>>
>>>>>> Yes, after obtaining the monthly average for each country's data, I
>>>>>> need to use them for further calculations. So, I want to export the
>>>>>> result to excel. But, until understanding the code, I think I
>> willn't
>>>>>> able to apply for the entire sample, and cannot be able to discuss
>>>>>> the format of the resulted column to export to excel.
>>>>>>
>>>>>
>>>>> Say that we perform the grouped mean calculation for the first two
>>>>> country columns like this:
>>>>> monmeans<-sapply(spdat[,2:3],by,spdat[,c("month","year")],mean)
>>>>> monmeans
>>>>>      Australia  Belarus
>>>>> [1,]  29.70000 30.43333
>>>>> [2,]  34.17857 27.39286
>>>>>
>>>>> We are presented with a 2x2 matrix of monthly means in just the
>> format
>>>>> someone might use for importing into Excel. The first row is January
>>>>> 1994, the second February 1994 and so on. By expanding the columns to
>>>>> include all the countries in your data, You should have the result
>> you
>>> want.
>>>>>
>>>>> Jim
>>>>>
>>>>
>>>>
>>>> --
>>>> *Best Regards,*
>>>> *Subhamitra Patra*
>>>> *Phd. Research Scholar*
>>>> *Department of Humanities and Social Sciences* *Indian Institute of
>>>> Technology, Kharagpur*
>>>> *INDIA*
>>>>
>>>> [[alternative HTML version deleted]]
>>>>
>>>> ______________________________________________
>>>> mailto:[hidden email] mailing list -- To UNSUBSCRIBE and more,
>> see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide http://www.R-project.org/posting-
>>>> guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>> Osobní údaje: Informace o zpracování a ochraně osobních údajů obchodních
>>> partnerů PRECHEZA a.s. jsou zveřejněny na:
>>> https://www.precheza.cz/zasady-ochrany-osobnich-udaju/ | Information
>>> about processing and protection of business partner’s personal data are
>>> available on website:
>>> https://www.precheza.cz/en/personal-data-protection-principles/
>>> Důvěrnost: Tento e-mail a jakékoliv k němu připojené dokumenty jsou
>>> důvěrné a podléhají tomuto právně závaznému prohláąení o vyloučení
>>> odpovědnosti: https://www.precheza.cz/01-dovetek/ | This email and any
>>> documents attached to it may be confidential and are subject to the
>> legally
>>> binding disclaimer: https://www.precheza.cz/en/01-disclaimer/
>>>
>>>
>>>
>>> --
>>> Best Regards,
>>> Subhamitra Patra
>>> Phd. Research Scholar
>>> Department of Humanities and Social Sciences
>>> Indian Institute of Technology, Kharagpur
>>> INDIA
>>> ______________________________________________
>>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
>> --
>> *Best Regards,*
>> *Subhamitra Patra*
>> *Phd. Research Scholar*
>> *Department of Humanities and Social Sciences*
>> *Indian Institute of Technology, Kharagpur*
>> *INDIA*
>>
>>          [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply | Threaded
Open this post in threaded view
|

Re: Query about calculating the monthly average of daily data columns

Jim Lemon-4
In reply to this post by Subhamitra Patra
Hi Subhamitra,
This is not the only way to do this, but if you only want the monthly
averages, it is simple:

# I had to change the "soft" tabs in your email to commas
# in order to read the data in
spdf<-read.table(text="PERMNO,DATE,Spread
111,19940103,0.025464308
111,19940104,0.064424296
111,19940105,0.018579337
111,19940106,0.018872211
111,19940107,0.065279782
111,19940110,0.063485905
111,19940111,0.018355453
111,19940112,0.064135683
111,19940113,0.063519987
111,19940114,0.018277351
111,19940117,0.018628417
111,19940118,0.065630229
111,19940119,0.018713152
111,19940120,0.019119037
111,19940121,0.068342043
111,19940124,0.020843244
111,19940125,0.019954211
111,19940126,0.018980321
111,19940127,0.066827165
111,19940128,0.067459235
111,19940131,0.068682559
111,19940201,0.02081465
111,19940202,0.068236091
111,19940203,0.068821406
111,19940204,0.020075648
111,19940207,0.066070584
111,19940208,0.066068837
111,19940209,0.019077072
111,19940210,0.065894875
111,19940211,0.018847478
111,19940214,0.065040844
111,19940215,0.01880332
111,19940216,0.018836199
111,19940217,0.066888865
111,19940218,0.067116793
111,19940221,0.068809742
111,19940222,0.068230213
111,19940223,0.069502855
111,19940224,0.070383523
111,19940225,0.020430811
111,19940228,0.067087257
111,19940301,0.066776479
111,19940302,0.019959031
111,19940303,0.066596469
111,19940304,0.019131334
111,19940307,0.019312528
111,19940308,0.067349909
111,19940309,0.068916431
111,19940310,0.068620043
111,19940311,0.070494844
111,19940314,0.071056842
111,19940315,0.071042517
111,19940316,0.072401771
111,19940317,0.071940001
111,19940318,0.07352884
111,19940321,0.072671688
111,19940322,0.072652595
111,19940323,0.021352138
111,19940324,0.069933727
111,19940325,0.068717467
111,19940328,0.020470748
111,19940329,0.020003748
111,19940330,0.065833717
111,19940331,0.065268388
111,19940401,0.018762356
111,19940404,0.064914179
111,19940405,0.064706743
111,19940406,0.018764175
111,19940407,0.06524806
111,19940408,0.018593449
111,19940411,0.064913949
111,19940412,0.01872089
111,19940413,0.018729328
111,19940414,0.018978773
111,19940415,0.065477137
111,19940418,0.064614365
111,19940419,0.064184148
111,19940420,0.018553192
111,19940421,0.066872771
111,19940422,0.06680782
111,19940425,0.067467961
111,19940426,0.02014297
111,19940427,0.062464016
111,19940428,0.062357052
112,19940429,0.000233993
112,19940103,0.000815264
112,19940104,0.000238165
112,19940105,0.000813632
112,19940106,0.000236915
112,19940107,0.000809102
112,19940110,0.000801642
112,19940111,0.000797932
112,19940112,0.000795251
112,19940113,0.000795186
112,19940114,0.000231359
112,19940117,0.000232134
112,19940118,0.000233718
112,19940119,0.000233993
112,19940120,0.000234694
112,19940121,0.000235753
112,19940124,0.000808653
112,19940125,0.000235604
112,19940126,0.000805068
112,19940127,0.000802337
112,19940128,0.000801768
112,19940131,0.000233517
112,19940201,0.000797431
112,19940202,0.000233338
112,19940203,0.000233826
112,19940204,0.000799519
112,19940207,0.000798105
112,19940208,0.000792245
112,19940209,0.000231113
112,19940210,0.000233413
112,19940211,0.000798168
112,19940214,0.000233282
112,19940215,0.000797848
112,19940216,0.000785165
112,19940217,0.000228426
112,19940218,0.000786783
112,19940221,0.00078343
112,19940222,0.000781459
112,19940223,0.000776264
112,19940224,0.000226399
112,19940225,0.000779066
112,19940228,0.000773603
112,19940301,0.000226487
112,19940302,0.000775233
112,19940303,0.000227017
112,19940304,0.000227854
112,19940307,0.000782814
112,19940308,0.000229164
112,19940309,0.000787033
112,19940310,0.000784049
112,19940311,0.000228984
112,19940314,0.00078697
112,19940315,0.000782567
112,19940316,0.000228516
112,19940317,0.000786347
112,19940318,0.000229236
112,19940321,0.000230107
112,19940322,0.000792689
112,19940323,0.000787284
112,19940324,0.000787221
112,19940325,0.000227978",
header=TRUE,sep=",",stringsAsFactors=FALSE)
# split the year and month out of the date string
# as you have more than one year in your complete
# data set
spdf$yrmon<-substr(spdf$DATE,1,6)
# get the mean for each PERMNO and year/month
by(spdf$Spread,spdf[,c("PERMNO","yrmon")],mean)

Jim

On Sun, Oct 20, 2019 at 11:09 PM Subhamitra Patra <
[hidden email]> wrote:

>
> Here, I am asking one more query (just for learning purpose) that if my
> country name and its respective variable is in the panel format, and I want
> to take the monthly average for each country, how the code will be
> arranged. For your convenience, I am providing a small data sample below.
>
>
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.