

Dear Rusers,
I have daily data from 03011994 to 29122000. In my datafile, he first
column is date and the second and third columns are the returns of the
country A, and B. Here, the date column is same for both countries. I want
to calculate the monthly average of both country's returns by using a loop,
and then, I want to export the results into excel.
Please help me in this regard.
Please find the attached datasheet.
Thank you.

*Best Regards,*
*Subhamitra Patra*
*Phd. Research Scholar*
*Department of Humanities and Social Sciences*
*Indian Institute of Technology, Kharagpur*
*INDIA*
[image: Mailtrack]
< https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&>
Sender
notified by
Mailtrack
< https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&>
09/12/19,
08:23:07 PM
______________________________________________
[hidden email] mailing list  To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


No reproducible example, so hard to say. What class is your "date" column?
 factor, character, Date? See ?Date
Once you have an object of appropriate class  see ?format.Date  ?months
can extract the month and ?ave can do your averaging. No explicit looping
is needed.
The tidydata alternative universe can also do all these things if that's
where you prefer to live.
As usual, any attached data was stripped. See ?dput for one way to include
data in your post.
Cheers,
Bert
On Thu, Sep 12, 2019 at 7:54 AM Subhamitra Patra < [hidden email]>
wrote:
> Dear Rusers,
>
> I have daily data from 03011994 to 29122000. In my datafile, he first
> column is date and the second and third columns are the returns of the
> country A, and B. Here, the date column is same for both countries. I want
> to calculate the monthly average of both country's returns by using a loop,
> and then, I want to export the results into excel.
>
> Please help me in this regard.
>
> Please find the attached datasheet.
>
> Thank you.
>
> 
> *Best Regards,*
> *Subhamitra Patra*
> *Phd. Research Scholar*
> *Department of Humanities and Social Sciences*
> *Indian Institute of Technology, Kharagpur*
> *INDIA*
>
> [image: Mailtrack]
> <
> https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&> >
> Sender
> notified by
> Mailtrack
> <
> https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&> >
> 09/12/19,
> 08:23:07 PM
> ______________________________________________
> [hidden email] mailing list  To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/rhelp> PLEASE do read the posting guide
> http://www.Rproject.org/postingguide.html> and provide commented, minimal, selfcontained, reproducible code.
>
[[alternative HTML version deleted]]
______________________________________________
[hidden email] mailing list  To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


Hello,
Please include data, say
dput(head(data, 20)) # post the output of this
But, is the problem as simple as
rowMeans(data[2:3], na.rm = TRUE)
?
Hope this helps,
Rui Barradas
Às 15:53 de 12/09/19, Subhamitra Patra escreveu:
> Dear Rusers,
>
> I have daily data from 03011994 to 29122000. In my datafile, he first
> column is date and the second and third columns are the returns of the
> country A, and B. Here, the date column is same for both countries. I want
> to calculate the monthly average of both country's returns by using a loop,
> and then, I want to export the results into excel.
>
> Please help me in this regard.
>
> Please find the attached datasheet.
>
> Thank you.
>
______________________________________________
[hidden email] mailing list  To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


Hello,
Inline.
Às 17:33 de 12/09/19, Bert Gunter escreveu:
> But she wants *monthly* averages, Rui.
Thanks, my mistake.
Ergo ave() or tidyData
> equivalent, right?
Maybe. But ave() returns as many values as the input length, this seems
more suited for tapply or aggregate.
I will first create an example data set.
set.seed(1234)
start < as.Date("03011994", "%d%m%Y")
end < as.Date("29122000", "%d%m%Y")
date < seq(start, end, by = "day")
date < date[as.integer(format(date, "%u")) %in% 1:5]
df1 < data.frame(date,
CountryA = rnorm(length(date)),
CountryB = rnorm(length(date)))
Now the averages by month
month < zoo::as.yearmon(df1[[1]])
aggA < aggregate(CountryA ~ month, df1, mean)
aggB < aggregate(CountryB ~ month, df1, mean)
MonthReturns < merge(aggA, aggB)
head(MonthReturns)
Final clean up.
rm(date, month, aggA, aggB)
Hope this helps,
Rui Barradas
>
>  Bert
>
> On Thu, Sep 12, 2019 at 8:41 AM Rui Barradas < [hidden email]
> <mailto: [hidden email]>> wrote:
>
> Hello,
>
> Please include data, say
>
> dput(head(data, 20)) # post the output of this
>
>
> But, is the problem as simple as
>
> rowMeans(data[2:3], na.rm = TRUE)
>
> ?
>
> Hope this helps,
>
> Rui Barradas
>
>
> Às 15:53 de 12/09/19, Subhamitra Patra escreveu:
> > Dear Rusers,
> >
> > I have daily data from 03011994 to 29122000. In my datafile,
> he first
> > column is date and the second and third columns are the returns
> of the
> > country A, and B. Here, the date column is same for both
> countries. I want
> > to calculate the monthly average of both country's returns by
> using a loop,
> > and then, I want to export the results into excel.
> >
> > Please help me in this regard.
> >
> > Please find the attached datasheet.
> >
> > Thank you.
> >
>
> ______________________________________________
> [hidden email] <mailto: [hidden email]> mailing list 
> To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/rhelp> PLEASE do read the posting guide
> http://www.Rproject.org/postingguide.html> and provide commented, minimal, selfcontained, reproducible code.
>
______________________________________________
[hidden email] mailing list  To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


Hi Subhamitra,
Your data didn't make it through, so I guess the first thing is to
guess what it looks like. Here's a try at just January and February of
1994 so that we can see the result on the screen. The logic will work
just as well for the whole seven years.
# create fake data for the first two months
spdat<data.frame(
dates=paste(c(1:30,1:28),c(rep(1,30),rep(2,28)),rep(1994,58),sep=""),
returnA=sample(15:50,58,TRUE),returnB=sample(10:45,58,TRUE))
# I'll assume that the dates in your file are character, not factor
spdat$dates<as.character(spdat$dates)
# if you only have to get the monthly averages, it can be done this way
spdat$month<sapply(strsplit(spdat$dates,""),"[",2)
spdat$year<sapply(strsplit(spdat$dates,""),"[",3)
# get the averages by month and year  is this correct?
monthlyA<by(spdat$returnA,spdat[,c("month","year")],mean)
monthlyB<by(spdat$returnB,spdat[,c("month","year")],mean)
Now you have what you say you want:
monthlyA
month: 1
year: 1994
[1] 34.1

month: 2
year: 1994
[1] 33.32143
monthlyB
month: 1
year: 1994
[1] 29.7

month: 2
year: 1994
[1] 27.28571
Sorry I didn't use a loop (for(month in 1:12) ... for (year in
1994:2000) ...), too lazy.
Now you have to let us know how this information is to be formatted to
go into Excel. Excel will import the text as above, but I think you
want something that you can use for further calculations.
Jim
On Fri, Sep 13, 2019 at 12:54 AM Subhamitra Patra
< [hidden email]> wrote:
>
> Dear Rusers,
>
> I have daily data from 03011994 to 29122000. In my datafile, he first
> column is date and the second and third columns are the returns of the
> country A, and B. Here, the date column is same for both countries. I want
> to calculate the monthly average of both country's returns by using a loop,
> and then, I want to export the results into excel.
>
> Please help me in this regard.
>
> Please find the attached datasheet.
>
> Thank you.
>
> 
> *Best Regards,*
> *Subhamitra Patra*
> *Phd. Research Scholar*
> *Department of Humanities and Social Sciences*
> *Indian Institute of Technology, Kharagpur*
> *INDIA*
>
> [image: Mailtrack]
> < https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&>
> Sender
> notified by
> Mailtrack
> < https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&>
> 09/12/19,
> 08:23:07 PM
> ______________________________________________
> [hidden email] mailing list  To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/rhelp> PLEASE do read the posting guide http://www.Rproject.org/postingguide.html> and provide commented, minimal, selfcontained, reproducible code.
______________________________________________
[hidden email] mailing list  To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


Dear Sir,
Thank you very much for your suggestion.
Yes, your suggested code worked. But, actually, I have data from 3rd
January 1994 to 3rd August 2017 for very large (i.e. for 84 countries)
sample. From this, I have given the example of the years up to 2000. Before
applying the same code for the long 24 years, I want to learn the logic
behind the code. Actually, some part of the code is not understandable to
me which I mentioned in the bold letter as follows.
"spdat<data.frame(
dates=paste(c(1:30,1:28),c(rep(1,30),rep(2,28)),rep(1994,58),sep=""),
returnA=sample(*15:50*,58,TRUE),returnB=sample(*10:45*,58,TRUE))"
A. Here, I need to define the no. of days in a month, and the no. of
countries name separately, right? But, what is meant by 15:50, and 10:45 in
return A, and B respectively?
"# if you only have to get the monthly averages, it can be done this way
spdat$month<sapply(strsplit(spdat$dates,""),"["*,2*)
spdat$year<sapply(strsplit(spdat$dates,""),"[",*3*)"
B. Here, I need to define the no. of months, and years separately, right?
or else what 2, and 3 (in bold) indicates?
"# get the averages by month and year  is this correct?
monthlyA<by(*spdat$returnA*,spdat[,c("month","year")],mean)
monthlyB<by(*spdat$returnB*,spdat[,c("month","year")],mean)"
C. From this part, I got the exact average values of both January and
February of 1994 for country A, and B. But, in code, I have a query that I
need to define spdat$returnA, and spdat$returnB separately before writing
this code, right? Like this, I need to define for each 84 countries
separately with their respective number of months, and years before writing
this code, right?
Yes, after obtaining the monthly average for each country's data, I need to
use them for further calculations. So, I want to export the result to
excel. But, until understanding the code, I think I willn't able to apply
for the entire sample, and cannot be able to discuss the format of the
resulted column to export to excel.
Therefore, kindly help me to understand the code.
Thank you very much, Sir, and thanks to this R forum for helping the
Rbeginners.
[image: Mailtrack]
< https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&>
Sender
notified by
Mailtrack
< https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&>
09/13/19,
12:57:58 PM
On Fri, Sep 13, 2019 at 3:15 AM Jim Lemon < [hidden email]> wrote:
> Hi Subhamitra,
> Your data didn't make it through, so I guess the first thing is to
> guess what it looks like. Here's a try at just January and February of
> 1994 so that we can see the result on the screen. The logic will work
> just as well for the whole seven years.
>
> # create fake data for the first two months
> spdat<data.frame(
> dates=paste(c(1:30,1:28),c(rep(1,30),rep(2,28)),rep(1994,58),sep=""),
> returnA=sample(15:50,58,TRUE),returnB=sample(10:45,58,TRUE))
> # I'll assume that the dates in your file are character, not factor
> spdat$dates<as.character(spdat$dates)
> # if you only have to get the monthly averages, it can be done this way
> spdat$month<sapply(strsplit(spdat$dates,""),"[",2)
> spdat$year<sapply(strsplit(spdat$dates,""),"[",3)
> # get the averages by month and year  is this correct?
> monthlyA<by(spdat$returnA,spdat[,c("month","year")],mean)
> monthlyB<by(spdat$returnB,spdat[,c("month","year")],mean)
>
> Now you have what you say you want:
>
> monthlyA
> month: 1
> year: 1994
> [1] 34.1
> 
> month: 2
> year: 1994
> [1] 33.32143
>
> monthlyB
> month: 1
> year: 1994
> [1] 29.7
> 
> month: 2
> year: 1994
> [1] 27.28571
>
> Sorry I didn't use a loop (for(month in 1:12) ... for (year in
> 1994:2000) ...), too lazy.
> Now you have to let us know how this information is to be formatted to
> go into Excel. Excel will import the text as above, but I think you
> want something that you can use for further calculations.
>
> Jim
>
> On Fri, Sep 13, 2019 at 12:54 AM Subhamitra Patra
> < [hidden email]> wrote:
> >
> > Dear Rusers,
> >
> > I have daily data from 03011994 to 29122000. In my datafile, he first
> > column is date and the second and third columns are the returns of the
> > country A, and B. Here, the date column is same for both countries. I
> want
> > to calculate the monthly average of both country's returns by using a
> loop,
> > and then, I want to export the results into excel.
> >
> > Please help me in this regard.
> >
> > Please find the attached datasheet.
> >
> > Thank you.
> >
> > 
> > *Best Regards,*
> > *Subhamitra Patra*
> > *Phd. Research Scholar*
> > *Department of Humanities and Social Sciences*
> > *Indian Institute of Technology, Kharagpur*
> > *INDIA*
> >
> > [image: Mailtrack]
> > <
> https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&> >
> > Sender
> > notified by
> > Mailtrack
> > <
> https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&> >
> > 09/12/19,
> > 08:23:07 PM
> > ______________________________________________
> > [hidden email] mailing list  To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/rhelp> > PLEASE do read the posting guide
> http://www.Rproject.org/postingguide.html> > and provide commented, minimal, selfcontained, reproducible code.
>

*Best Regards,*
*Subhamitra Patra*
*Phd. Research Scholar*
*Department of Humanities and Social Sciences*
*Indian Institute of Technology, Kharagpur*
*INDIA*
[[alternative HTML version deleted]]
______________________________________________
[hidden email] mailing list  To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


Hi
I may be completely wrong but reshape/aggregate should by what you want
spdat
dates returnA returnB
1 111994 16 13
2 211994 44 10
3 311994 24 32
.....
> library(reshape2)
> spdat.m < melt(spdat)
Using dates as id variables
> str(spdat.m)
'data.frame': 116 obs. of 3 variables:
$ dates : Factor w/ 58 levels "111994","121994",..: 1 23 44 47 49 51 53 55 57 3 ...
$ variable: Factor w/ 2 levels "returnA","returnB": 1 1 1 1 1 1 1 1 1 1 ...
$ value : int 16 44 24 47 16 35 34 34 26 36 ...
> spdat.m$realdate < as.Date(spdat.m[,1], format="%d%m%Y")
> aggregate(spdat.m$value, list(format(spdat.m$realdate, "%m.%Y"), spdat.m$variable), mean)
Group.1 Group.2 x
1 01.1994 returnA 31.93333
2 02.1994 returnA 32.39286
3 01.1994 returnB 24.26667
4 02.1994 returnB 30.03571
Cheers
Petr
> Original Message
> From: Rhelp < [hidden email]> On Behalf Of Subhamitra
> Patra
> Sent: Friday, September 13, 2019 10:08 AM
> To: Jim Lemon < [hidden email]>
> Cc: rhelp mailing list < [hidden email]>
> Subject: Re: [R] Query about calculating the monthly average of daily data
> columns
>
> Dear Sir,
>
> Thank you very much for your suggestion.
>
> Yes, your suggested code worked. But, actually, I have data from 3rd January
> 1994 to 3rd August 2017 for very large (i.e. for 84 countries) sample. From
> this, I have given the example of the years up to 2000. Before applying the
> same code for the long 24 years, I want to learn the logic behind the code.
> Actually, some part of the code is not understandable to me which I
> mentioned in the bold letter as follows.
>
> "spdat<data.frame(
> dates=paste(c(1:30,1:28),c(rep(1,30),rep(2,28)),rep(1994,58),sep=""),
> returnA=sample(*15:50*,58,TRUE),returnB=sample(*10:45*,58,TRUE))"
>
> A. Here, I need to define the no. of days in a month, and the no. of countries
> name separately, right? But, what is meant by 15:50, and 10:45 in return A,
> and B respectively?
>
> "# if you only have to get the monthly averages, it can be done this way
> spdat$month<sapply(strsplit(spdat$dates,""),"["*,2*)
> spdat$year<sapply(strsplit(spdat$dates,""),"[",*3*)"
>
> B. Here, I need to define the no. of months, and years separately, right?
> or else what 2, and 3 (in bold) indicates?
>
> "# get the averages by month and year  is this correct?
> monthlyA<by(*spdat$returnA*,spdat[,c("month","year")],mean)
> monthlyB<by(*spdat$returnB*,spdat[,c("month","year")],mean)"
>
> C. From this part, I got the exact average values of both January and
> February of 1994 for country A, and B. But, in code, I have a query that I
> need to define spdat$returnA, and spdat$returnB separately before writing
> this code, right? Like this, I need to define for each 84 countries separately
> with their respective number of months, and years before writing this code,
> right?
>
> Yes, after obtaining the monthly average for each country's data, I need to
> use them for further calculations. So, I want to export the result to excel. But,
> until understanding the code, I think I willn't able to apply for the entire
> sample, and cannot be able to discuss the format of the resulted column to
> export to excel.
>
> Therefore, kindly help me to understand the code.
>
> Thank you very much, Sir, and thanks to this R forum for helping the R
> beginners.
>
>
>
> [image: Mailtrack]
> < https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_ca> mpaign=signaturevirality5&>
> Sender
> notified by
> Mailtrack
> < https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_ca> mpaign=signaturevirality5&>
> 09/13/19,
> 12:57:58 PM
>
> On Fri, Sep 13, 2019 at 3:15 AM Jim Lemon < [hidden email]> wrote:
>
> > Hi Subhamitra,
> > Your data didn't make it through, so I guess the first thing is to
> > guess what it looks like. Here's a try at just January and February of
> > 1994 so that we can see the result on the screen. The logic will work
> > just as well for the whole seven years.
> >
> > # create fake data for the first two months spdat<data.frame(
> > dates=paste(c(1:30,1:28),c(rep(1,30),rep(2,28)),rep(1994,58),sep=""),
> > returnA=sample(15:50,58,TRUE),returnB=sample(10:45,58,TRUE))
> > # I'll assume that the dates in your file are character, not factor
> > spdat$dates<as.character(spdat$dates)
> > # if you only have to get the monthly averages, it can be done this
> > way
> > spdat$month<sapply(strsplit(spdat$dates,""),"[",2)
> > spdat$year<sapply(strsplit(spdat$dates,""),"[",3)
> > # get the averages by month and year  is this correct?
> > monthlyA<by(spdat$returnA,spdat[,c("month","year")],mean)
> > monthlyB<by(spdat$returnB,spdat[,c("month","year")],mean)
> >
> > Now you have what you say you want:
> >
> > monthlyA
> > month: 1
> > year: 1994
> > [1] 34.1
> > 
> > month: 2
> > year: 1994
> > [1] 33.32143
> >
> > monthlyB
> > month: 1
> > year: 1994
> > [1] 29.7
> > 
> > month: 2
> > year: 1994
> > [1] 27.28571
> >
> > Sorry I didn't use a loop (for(month in 1:12) ... for (year in
> > 1994:2000) ...), too lazy.
> > Now you have to let us know how this information is to be formatted to
> > go into Excel. Excel will import the text as above, but I think you
> > want something that you can use for further calculations.
> >
> > Jim
> >
> > On Fri, Sep 13, 2019 at 12:54 AM Subhamitra Patra
> > < [hidden email]> wrote:
> > >
> > > Dear Rusers,
> > >
> > > I have daily data from 03011994 to 29122000. In my datafile, he
> > > first column is date and the second and third columns are the
> > > returns of the country A, and B. Here, the date column is same for
> > > both countries. I
> > want
> > > to calculate the monthly average of both country's returns by using
> > > a
> > loop,
> > > and then, I want to export the results into excel.
> > >
> > > Please help me in this regard.
> > >
> > > Please find the attached datasheet.
> > >
> > > Thank you.
> > >
> > > 
> > > *Best Regards,*
> > > *Subhamitra Patra*
> > > *Phd. Research Scholar*
> > > *Department of Humanities and Social Sciences* *Indian Institute of
> > > Technology, Kharagpur*
> > > *INDIA*
> > >
> > > [image: Mailtrack]
> > > <
> >
> https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_cam> paig
> > n=signaturevirality5&
> > >
> > > Sender
> > > notified by
> > > Mailtrack
> > > <
> >
> https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_cam> paig
> > n=signaturevirality5&
> > >
> > > 09/12/19,
> > > 08:23:07 PM
> > > ______________________________________________
> > > [hidden email] mailing list  To UNSUBSCRIBE and more, see
> > > https://stat.ethz.ch/mailman/listinfo/rhelp> > > PLEASE do read the posting guide
> > http://www.Rproject.org/postingguide.html> > > and provide commented, minimal, selfcontained, reproducible code.
> >
>
>
> 
> *Best Regards,*
> *Subhamitra Patra*
> *Phd. Research Scholar*
> *Department of Humanities and Social Sciences* *Indian Institute of
> Technology, Kharagpur*
> *INDIA*
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list  To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/rhelp> PLEASE do read the posting guide http://www.Rproject.org/posting> guide.html
> and provide commented, minimal, selfcontained, reproducible code.
Osobní údaje: Informace o zpracování a ochraně osobních údajů obchodních partnerů PRECHEZA a.s. jsou zveřejněny na: https://www.precheza.cz/zasadyochranyosobnichudaju/  Information about processing and protection of business partner’s personal data are available on website: https://www.precheza.cz/en/personaldataprotectionprinciples/Důvěrnost: Tento email a jakékoliv k němu připojené dokumenty jsou důvěrné a podléhají tomuto právně závaznému prohláąení o vyloučení odpovědnosti: https://www.precheza.cz/01dovetek/  This email and any documents attached to it may be confidential and are subject to the legally binding disclaimer: https://www.precheza.cz/en/01disclaimer/______________________________________________
[hidden email] mailing list  To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


Hi Subhamitra,
I'll try to write my answers adjacent to your questions below.
On Fri, Sep 13, 2019 at 6:08 PM Subhamitra Patra < [hidden email]>
wrote:
> Dear Sir,
>
> Thank you very much for your suggestion.
>
> Yes, your suggested code worked. But, actually, I have data from 3rd
> January 1994 to 3rd August 2017 for very large (i.e. for 84 countries)
> sample. From this, I have given the example of the years up to 2000. Before
> applying the same code for the long 24 years, I want to learn the logic
> behind the code. Actually, some part of the code is not understandable to
> me which I mentioned in the bold letter as follows.
>
> "spdat<data.frame(
> dates=paste(c(1:30,1:28),c(rep(1,30),rep(2,28)),rep(1994,58),sep=""),
> returnA=sample(*15:50*,58,TRUE),returnB=sample(*10:45*,58,TRUE))"
>
> A. Here, I need to define the no. of days in a month, and the no. of
> countries name separately, right? But, what is meant by 15:50, and 10:45 in
> return A, and B respectively?
>
To paraphrase Donald Trump, this is FAKE DATA! I have no idea what the real
values of return are, so I made them up using the "sample" function.
However, this is not meant to mislead anyone, just to show how whatever
numbers are in your data can be used in calculations. The colon (":")
operator creates a sequence of numbers starting with the one to the left
and ending with the one to the right.
>
> "# if you only have to get the monthly averages, it can be done this way
> spdat$month<sapply(strsplit(spdat$dates,""),"["*,2*)
> spdat$year<sapply(strsplit(spdat$dates,""),"[",*3*)"
>
> B. Here, I need to define the no. of months, and years separately, right?
> or else what 2, and 3 (in bold) indicates?
>
To get the grouping variable of sequential months that you want, you only
need the month and year values of the dates in the first column. First I
used the "strsplit" function to split the date field at the hyphens, then
used "sapply" to extract ("[") the second (month) and third (year) parts as
two new columns. Because you have more than one year of data, you need the
year values or you will group all Januarys, all Februarys and so on. Notice
how I pass both of the new columns as a list (a data frame is a type of
list) in the call to get the mean of each month.
>
> "# get the averages by month and year  is this correct?
> monthlyA<by(*spdat$returnA*,spdat[,c("month","year")],mean)
> monthlyB<by(*spdat$returnB*,spdat[,c("month","year")],mean)"
>
> C. From this part, I got the exact average values of both January and
> February of 1994 for country A, and B. But, in code, I have a query that I
> need to define spdat$returnA, and spdat$returnB separately before writing
> this code, right? Like this, I need to define for each 84 countries
> separately with their respective number of months, and years before writing
> this code, right?
>
I don't think so. Because I don't know what your data looks like, I am
guessing that for each row, it has columns for each of the 84 countries. I
don't know what these columns are named, either. Maybe:
date Australia Belarus ... Zambia
01/01/1994 20 21 22
...
> Yes, after obtaining the monthly average for each country's data, I need
> to use them for further calculations. So, I want to export the result to
> excel. But, until understanding the code, I think I willn't able to apply
> for the entire sample, and cannot be able to discuss the format of the
> resulted column to export to excel.
>
Say that we perform the grouped mean calculation for the first two country
columns like this:
monmeans<sapply(spdat[,2:3],by,spdat[,c("month","year")],mean)
monmeans
Australia Belarus
[1,] 29.70000 30.43333
[2,] 34.17857 27.39286
We are presented with a 2x2 matrix of monthly means in just the format
someone might use for importing into Excel. The first row is January 1994,
the second February 1994 and so on. By expanding the columns to include all
the countries in your data, You should have the result you want.
Jim
[[alternative HTML version deleted]]
______________________________________________
[hidden email] mailing list  To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


Dear Sir,
Yes, I understood the logic. But, still, I have a few queries that I
mentioned below your answers.
"# if you only have to get the monthly averages, it can be done this way
> spdat$month<sapply(strsplit(spdat$dates,""),"["*,2*)
> spdat$year<sapply(strsplit(spdat$dates,""),"[",*3*)"
>
> B. Here, I need to define the no. of months, and years separately, right?
> or else what 2, and 3 (in bold) indicates?
>
To get the grouping variable of sequential months that you want, you only
need the month and year values of the dates in the first column. First I
used the "strsplit" function to split the date field at the hyphens, then
used "sapply" to extract ("[") the second (month) and *third (year)* parts
as two new columns. Because you have more than one year of data, you need
the year values or you will group all Januarys, all Februarys and so on.
Notice how I pass both of the new columns as a list (a data frame is a type
of list) in the call to get the mean of each month.
1. Here, as per my understanding, the "3" indicates the 3rd year, right?
But, you showed an average for 2 months of the same year. Then, what "3" in
the spdat$year object indicate?
C. From this part, I got the exact average values of both January and
> February of 1994 for country A, and B. But, in code, I have a query that I
> need to define spdat$returnA, and spdat$returnB separately before writing
> this code, right? Like this, I need to define for each 84 countries
> separately with their respective number of months, and years before writing
> this code, right?
>
I don't think so. Because I don't know what your data looks like, I am
guessing that for each row, it has columns for each of the 84 countries. I
don't know what these columns are named, either. Maybe:
date Australia Belarus ... Zambia
01/01/1994 20 21 22
...
Here, due to my misunderstanding about the code, I was wrong. But, what
data structure you guessed, it is absolutely right that for each row, I
have columns for each of the 84 countries. So, I think, I need to define
the date column with no. of months, and years once for all the countries.
Therefore, I got my answer to the first and third question in the previous
email (what you suggested) that I no need to define the column of each
country, as the date, and no. of observations are same for all countries.
But, the no. of days are different for each month, and similarly, for each
year. So, I think I need to define date for each year separately. Hence, I
have given an example of 12 months, for 2 years (i.e. 1994, and 1995), and
have written the following code. Please correct me in case I am wrong.
spdat<data.frame(
dates=paste(c(1:21,1:20,1:23,1:21,1:22,1:22,1:21,1:23,1:22,1:21,1:22,1:22),c(rep(1,21),rep(2,20),
rep(3,23), rep(4,21),
rep(5,22),rep(6,22),rep(7,21),rep(8,23),rep(9,22),rep(10,21),rep(11,22),rep(12,22)
),rep(1994,260)
dates1=
paste(c(1:22,1:20,1:23,1:20,1:23,1:22,1:21,1:23,1:21,1:22,1:22,1:21),c(rep(1,22),rep(2,20),
rep(3,23), rep(4,20),
rep(5,23),rep(6,22),rep(7,21),rep(8,23),rep(9,21),rep(10,21),rep(11,22),rep(12,21)
),rep(1995,259) ,sep="")
Concerning the exporting of structure of the dataset to excel, I will have
12*84 matrix. But, please suggest me the way to proceed for the large
sample. I have mentioned below what I understood from your code. Please
correct me if I am wrong.
1. I need to define the date for each year as the no. of days in each month
are different for each year (as mentioned in my above code). For instance,
in my data file, Jan 1994 has 21 days while Jan 1995 has 22 days.
2. Need to define the date column as character.
3. Need to define the monthly average for each month, and year. So, now
code will be as follows.
spdat$month<sapply(strsplit(spdat$dates,""),"[",2,3,4,5,6,7,8,9,10,11,12)
%%%%As I need all months average sequentially.
spdat$year<sapply(strsplit(spdat$dates,""),"[",3)
Here, this meaning of "3", I am really unable to get.
4. Need to define each country with each month and year as mentioned in the
last part of your code.
Please suggest me in this regard.
Thank you.
[image: Mailtrack]
< https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&>
Sender
notified by
Mailtrack
< https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&>
09/13/19,
06:41:41 PM
On Fri, Sep 13, 2019 at 4:24 PM Jim Lemon < [hidden email]> wrote:
> Hi Subhamitra,
> I'll try to write my answers adjacent to your questions below.
>
> On Fri, Sep 13, 2019 at 6:08 PM Subhamitra Patra <
> [hidden email]> wrote:
>
>> Dear Sir,
>>
>> Thank you very much for your suggestion.
>>
>> Yes, your suggested code worked. But, actually, I have data from 3rd
>> January 1994 to 3rd August 2017 for very large (i.e. for 84 countries)
>> sample. From this, I have given the example of the years up to 2000. Before
>> applying the same code for the long 24 years, I want to learn the logic
>> behind the code. Actually, some part of the code is not understandable to
>> me which I mentioned in the bold letter as follows.
>>
>> "spdat<data.frame(
>> dates=paste(c(1:30,1:28),c(rep(1,30),rep(2,28)),rep(1994,58),sep=""),
>> returnA=sample(*15:50*,58,TRUE),returnB=sample(*10:45*,58,TRUE))"
>>
>> A. Here, I need to define the no. of days in a month, and the no. of
>> countries name separately, right? But, what is meant by 15:50, and 10:45 in
>> return A, and B respectively?
>>
>
> To paraphrase Donald Trump, this is FAKE DATA! I have no idea what the
> real values of return are, so I made them up using the "sample" function.
> However, this is not meant to mislead anyone, just to show how whatever
> numbers are in your data can be used in calculations. The colon (":")
> operator creates a sequence of numbers starting with the one to the left
> and ending with the one to the right.
>
>>
>> "# if you only have to get the monthly averages, it can be done this way
>> spdat$month<sapply(strsplit(spdat$dates,""),"["*,2*)
>> spdat$year<sapply(strsplit(spdat$dates,""),"[",*3*)"
>>
>> B. Here, I need to define the no. of months, and years separately, right?
>> or else what 2, and 3 (in bold) indicates?
>>
>
> To get the grouping variable of sequential months that you want, you only
> need the month and year values of the dates in the first column. First I
> used the "strsplit" function to split the date field at the hyphens, then
> used "sapply" to extract ("[") the second (month) and third (year) parts as
> two new columns. Because you have more than one year of data, you need the
> year values or you will group all Januarys, all Februarys and so on. Notice
> how I pass both of the new columns as a list (a data frame is a type of
> list) in the call to get the mean of each month.
>
>>
>> "# get the averages by month and year  is this correct?
>> monthlyA<by(*spdat$returnA*,spdat[,c("month","year")],mean)
>> monthlyB<by(*spdat$returnB*,spdat[,c("month","year")],mean)"
>>
>> C. From this part, I got the exact average values of both January and
>> February of 1994 for country A, and B. But, in code, I have a query that I
>> need to define spdat$returnA, and spdat$returnB separately before writing
>> this code, right? Like this, I need to define for each 84 countries
>> separately with their respective number of months, and years before writing
>> this code, right?
>>
>
> I don't think so. Because I don't know what your data looks like, I am
> guessing that for each row, it has columns for each of the 84 countries. I
> don't know what these columns are named, either. Maybe:
>
> date Australia Belarus ... Zambia
> 01/01/1994 20 21 22
> ...
>
>
>> Yes, after obtaining the monthly average for each country's data, I need
>> to use them for further calculations. So, I want to export the result to
>> excel. But, until understanding the code, I think I willn't able to apply
>> for the entire sample, and cannot be able to discuss the format of the
>> resulted column to export to excel.
>>
>
> Say that we perform the grouped mean calculation for the first two country
> columns like this:
> monmeans<sapply(spdat[,2:3],by,spdat[,c("month","year")],mean)
> monmeans
> Australia Belarus
> [1,] 29.70000 30.43333
> [2,] 34.17857 27.39286
>
> We are presented with a 2x2 matrix of monthly means in just the format
> someone might use for importing into Excel. The first row is January 1994,
> the second February 1994 and so on. By expanding the columns to include all
> the countries in your data, You should have the result you want.
>
> Jim
>

*Best Regards,*
*Subhamitra Patra*
*Phd. Research Scholar*
*Department of Humanities and Social Sciences*
*Indian Institute of Technology, Kharagpur*
*INDIA*
[[alternative HTML version deleted]]
______________________________________________
[hidden email] mailing list  To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


Hi
I am almost 100% sure that you would spare yourself much trouble if you changed your date column to real date
?as.Date
reshape your wide format to long one
library(reshape2)
?melt
to get 3 column data.frame with one date column, one country column and one value column
use ?aggregate and ?format to get summary value
something like
aggregate(value column, list(format(date column, "%m.%Y"), country column), mean)
But if you insist to scratch your left ear with right hand accross your head, you could continue your way.
Cheers
Petr
> Original Message
> From: Rhelp < [hidden email]> On Behalf Of Subhamitra
> Patra
> Sent: Friday, September 13, 2019 3:20 PM
> To: Jim Lemon < [hidden email]>; rhelp mailing list <rhelp@r
> project.org>
> Subject: Re: [R] Query about calculating the monthly average of daily data
> columns
>
> Dear Sir,
>
> Yes, I understood the logic. But, still, I have a few queries that I mentioned
> below your answers.
>
> "# if you only have to get the monthly averages, it can be done this way
> > spdat$month<sapply(strsplit(spdat$dates,""),"["*,2*)
> > spdat$year<sapply(strsplit(spdat$dates,""),"[",*3*)"
> >
> > B. Here, I need to define the no. of months, and years separately, right?
> > or else what 2, and 3 (in bold) indicates?
> >
>
> To get the grouping variable of sequential months that you want, you only
> need the month and year values of the dates in the first column. First I used
> the "strsplit" function to split the date field at the hyphens, then used
> "sapply" to extract ("[") the second (month) and *third (year)* parts as two
> new columns. Because you have more than one year of data, you need the
> year values or you will group all Januarys, all Februarys and so on.
> Notice how I pass both of the new columns as a list (a data frame is a type of
> list) in the call to get the mean of each month.
>
> 1. Here, as per my understanding, the "3" indicates the 3rd year, right?
> But, you showed an average for 2 months of the same year. Then, what "3"
> in the spdat$year object indicate?
>
>
> C. From this part, I got the exact average values of both January and
> > February of 1994 for country A, and B. But, in code, I have a query
> > that I need to define spdat$returnA, and spdat$returnB separately
> > before writing this code, right? Like this, I need to define for each
> > 84 countries separately with their respective number of months, and
> > years before writing this code, right?
> >
>
> I don't think so. Because I don't know what your data looks like, I am
> guessing that for each row, it has columns for each of the 84 countries. I
> don't know what these columns are named, either. Maybe:
>
> date Australia Belarus ... Zambia
> 01/01/1994 20 21 22
> ...
>
> Here, due to my misunderstanding about the code, I was wrong. But, what
> data structure you guessed, it is absolutely right that for each row, I have
> columns for each of the 84 countries. So, I think, I need to define the date
> column with no. of months, and years once for all the countries.
> Therefore, I got my answer to the first and third question in the previous
> email (what you suggested) that I no need to define the column of each
> country, as the date, and no. of observations are same for all countries.
> But, the no. of days are different for each month, and similarly, for each
> year. So, I think I need to define date for each year separately. Hence, I have
> given an example of 12 months, for 2 years (i.e. 1994, and 1995), and have
> written the following code. Please correct me in case I am wrong.
>
> spdat<data.frame(
>
> dates=paste(c(1:21,1:20,1:23,1:21,1:22,1:22,1:21,1:23,1:22,1:21,1:22,1:22),c(r
> ep(1,21),rep(2,20),
> rep(3,23), rep(4,21),
> rep(5,22),rep(6,22),rep(7,21),rep(8,23),rep(9,22),rep(10,21),rep(11,22),rep(12
> ,22)
> ),rep(1994,260)
> dates1=
> paste(c(1:22,1:20,1:23,1:20,1:23,1:22,1:21,1:23,1:21,1:22,1:22,1:21),c(rep(1,2
> 2),rep(2,20),
> rep(3,23), rep(4,20),
> rep(5,23),rep(6,22),rep(7,21),rep(8,23),rep(9,21),rep(10,21),rep(11,22),rep(12
> ,21)
> ),rep(1995,259) ,sep="")
>
> Concerning the exporting of structure of the dataset to excel, I will have
> 12*84 matrix. But, please suggest me the way to proceed for the large
> sample. I have mentioned below what I understood from your code. Please
> correct me if I am wrong.
> 1. I need to define the date for each year as the no. of days in each month
> are different for each year (as mentioned in my above code). For instance, in
> my data file, Jan 1994 has 21 days while Jan 1995 has 22 days.
> 2. Need to define the date column as character.
> 3. Need to define the monthly average for each month, and year. So, now
> code will be as follows.
> spdat$month<sapply(strsplit(spdat$dates,""),"[",2,3,4,5,6,7,8,9,10,11,12)
> %%%%As I need all months average sequentially.
> spdat$year<sapply(strsplit(spdat$dates,""),"[",3)
>
> Here, this meaning of "3", I am really unable to get.
>
> 4. Need to define each country with each month and year as mentioned in
> the last part of your code.
>
> Please suggest me in this regard.
>
> Thank you.
>
>
>
>
>
>
>
> [image: Mailtrack]
> < https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_ca> mpaign=signaturevirality5&>
> Sender
> notified by
> Mailtrack
> < https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_ca> mpaign=signaturevirality5&>
> 09/13/19,
> 06:41:41 PM
>
> On Fri, Sep 13, 2019 at 4:24 PM Jim Lemon < [hidden email]> wrote:
>
> > Hi Subhamitra,
> > I'll try to write my answers adjacent to your questions below.
> >
> > On Fri, Sep 13, 2019 at 6:08 PM Subhamitra Patra <
> > [hidden email]> wrote:
> >
> >> Dear Sir,
> >>
> >> Thank you very much for your suggestion.
> >>
> >> Yes, your suggested code worked. But, actually, I have data from 3rd
> >> January 1994 to 3rd August 2017 for very large (i.e. for 84
> >> countries) sample. From this, I have given the example of the years
> >> up to 2000. Before applying the same code for the long 24 years, I
> >> want to learn the logic behind the code. Actually, some part of the
> >> code is not understandable to me which I mentioned in the bold letter as
> follows.
> >>
> >> "spdat<data.frame(
> >> dates=paste(c(1:30,1:28),c(rep(1,30),rep(2,28)),rep(1994,58),sep=""),
> >> returnA=sample(*15:50*,58,TRUE),returnB=sample(*10:45*,58,TRUE))"
> >>
> >> A. Here, I need to define the no. of days in a month, and the no. of
> >> countries name separately, right? But, what is meant by 15:50, and
> >> 10:45 in return A, and B respectively?
> >>
> >
> > To paraphrase Donald Trump, this is FAKE DATA! I have no idea what the
> > real values of return are, so I made them up using the "sample" function.
> > However, this is not meant to mislead anyone, just to show how
> > whatever numbers are in your data can be used in calculations. The
> > colon (":") operator creates a sequence of numbers starting with the
> > one to the left and ending with the one to the right.
> >
> >>
> >> "# if you only have to get the monthly averages, it can be done this
> >> way
> >> spdat$month<sapply(strsplit(spdat$dates,""),"["*,2*)
> >> spdat$year<sapply(strsplit(spdat$dates,""),"[",*3*)"
> >>
> >> B. Here, I need to define the no. of months, and years separately, right?
> >> or else what 2, and 3 (in bold) indicates?
> >>
> >
> > To get the grouping variable of sequential months that you want, you
> > only need the month and year values of the dates in the first column.
> > First I used the "strsplit" function to split the date field at the
> > hyphens, then used "sapply" to extract ("[") the second (month) and
> > third (year) parts as two new columns. Because you have more than one
> > year of data, you need the year values or you will group all Januarys,
> > all Februarys and so on. Notice how I pass both of the new columns as
> > a list (a data frame is a type of
> > list) in the call to get the mean of each month.
> >
> >>
> >> "# get the averages by month and year  is this correct?
> >> monthlyA<by(*spdat$returnA*,spdat[,c("month","year")],mean)
> >> monthlyB<by(*spdat$returnB*,spdat[,c("month","year")],mean)"
> >>
> >> C. From this part, I got the exact average values of both January and
> >> February of 1994 for country A, and B. But, in code, I have a query
> >> that I need to define spdat$returnA, and spdat$returnB separately
> >> before writing this code, right? Like this, I need to define for each
> >> 84 countries separately with their respective number of months, and
> >> years before writing this code, right?
> >>
> >
> > I don't think so. Because I don't know what your data looks like, I am
> > guessing that for each row, it has columns for each of the 84
> > countries. I don't know what these columns are named, either. Maybe:
> >
> > date Australia Belarus ... Zambia
> > 01/01/1994 20 21 22
> > ...
> >
> >
> >> Yes, after obtaining the monthly average for each country's data, I
> >> need to use them for further calculations. So, I want to export the
> >> result to excel. But, until understanding the code, I think I willn't
> >> able to apply for the entire sample, and cannot be able to discuss
> >> the format of the resulted column to export to excel.
> >>
> >
> > Say that we perform the grouped mean calculation for the first two
> > country columns like this:
> > monmeans<sapply(spdat[,2:3],by,spdat[,c("month","year")],mean)
> > monmeans
> > Australia Belarus
> > [1,] 29.70000 30.43333
> > [2,] 34.17857 27.39286
> >
> > We are presented with a 2x2 matrix of monthly means in just the format
> > someone might use for importing into Excel. The first row is January
> > 1994, the second February 1994 and so on. By expanding the columns to
> > include all the countries in your data, You should have the result you want.
> >
> > Jim
> >
>
>
> 
> *Best Regards,*
> *Subhamitra Patra*
> *Phd. Research Scholar*
> *Department of Humanities and Social Sciences* *Indian Institute of
> Technology, Kharagpur*
> *INDIA*
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list  To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/rhelp> PLEASE do read the posting guide http://www.Rproject.org/posting> guide.html
> and provide commented, minimal, selfcontained, reproducible code.
Osobní údaje: Informace o zpracování a ochraně osobních údajů obchodních partnerů PRECHEZA a.s. jsou zveřejněny na: https://www.precheza.cz/zasadyochranyosobnichudaju/  Information about processing and protection of business partner’s personal data are available on website: https://www.precheza.cz/en/personaldataprotectionprinciples/Důvěrnost: Tento email a jakékoliv k němu připojené dokumenty jsou důvěrné a podléhají tomuto právně závaznému prohláąení o vyloučení odpovědnosti: https://www.precheza.cz/01dovetek/  This email and any documents attached to it may be confidential and are subject to the legally binding disclaimer: https://www.precheza.cz/en/01disclaimer/______________________________________________
[hidden email] mailing list  To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


Dear PIKAL,
Thank you very much for your suggestion.
I tried your previous suggested code and getting the average value for each
month for both country A, and B. But in your recent email, you are
suggesting not to change the date column to real date. If I am going
through your recently suggested code, i.e.
"aggregate(value column, list(format(date column, "%m.%Y"), country
column), mean)"
I am getting an Error that "*aggregate(value, list(format(date, "%m.%Y"),
country), mean) : **object 'value' not found"*.
Here, my query "*may I need to define the date column, country column, and
value column separately?"*
Further, I need something the average value result like below in the data
frame
Month Country A Country B
Jan 1994 26.66 35.78
Feb 1994 26.13 29.14
so that it will be easy for me to export to excel, and to use for the
further calculations.
Please suggest me in this regard.
Thank you.
[image: Mailtrack]
< https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&>
Sender
notified by
Mailtrack
< https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&>
09/13/19,
07:22:53 PM
On Fri, Sep 13, 2019 at 7:03 PM PIKAL Petr < [hidden email]> wrote:
> Hi
>
> I am almost 100% sure that you would spare yourself much trouble if you
> changed your date column to real date
>
> ?as.Date
>
> reshape your wide format to long one
> library(reshape2)
> ?melt
>
> to get 3 column data.frame with one date column, one country column and
> one value column
>
> use ?aggregate and ?format to get summary value
>
> something like
> aggregate(value column, list(format(date column, "%m.%Y"), country
> column), mean)
>
> But if you insist to scratch your left ear with right hand accross your
> head, you could continue your way.
>
> Cheers
> Petr
>
> > Original Message
> > From: Rhelp < [hidden email]> On Behalf Of Subhamitra
> > Patra
> > Sent: Friday, September 13, 2019 3:20 PM
> > To: Jim Lemon < [hidden email]>; rhelp mailing list <rhelp@r
> > project.org>
> > Subject: Re: [R] Query about calculating the monthly average of daily
> data
> > columns
> >
> > Dear Sir,
> >
> > Yes, I understood the logic. But, still, I have a few queries that I
> mentioned
> > below your answers.
> >
> > "# if you only have to get the monthly averages, it can be done this way
> > > spdat$month<sapply(strsplit(spdat$dates,""),"["*,2*)
> > > spdat$year<sapply(strsplit(spdat$dates,""),"[",*3*)"
> > >
> > > B. Here, I need to define the no. of months, and years separately,
> right?
> > > or else what 2, and 3 (in bold) indicates?
> > >
> >
> > To get the grouping variable of sequential months that you want, you only
> > need the month and year values of the dates in the first column. First I
> used
> > the "strsplit" function to split the date field at the hyphens, then used
> > "sapply" to extract ("[") the second (month) and *third (year)* parts as
> two
> > new columns. Because you have more than one year of data, you need the
> > year values or you will group all Januarys, all Februarys and so on.
> > Notice how I pass both of the new columns as a list (a data frame is a
> type of
> > list) in the call to get the mean of each month.
> >
> > 1. Here, as per my understanding, the "3" indicates the 3rd year, right?
> > But, you showed an average for 2 months of the same year. Then, what "3"
> > in the spdat$year object indicate?
> >
> >
> > C. From this part, I got the exact average values of both January and
> > > February of 1994 for country A, and B. But, in code, I have a query
> > > that I need to define spdat$returnA, and spdat$returnB separately
> > > before writing this code, right? Like this, I need to define for each
> > > 84 countries separately with their respective number of months, and
> > > years before writing this code, right?
> > >
> >
> > I don't think so. Because I don't know what your data looks like, I am
> > guessing that for each row, it has columns for each of the 84 countries.
> I
> > don't know what these columns are named, either. Maybe:
> >
> > date Australia Belarus ... Zambia
> > 01/01/1994 20 21 22
> > ...
> >
> > Here, due to my misunderstanding about the code, I was wrong. But, what
> > data structure you guessed, it is absolutely right that for each row, I
> have
> > columns for each of the 84 countries. So, I think, I need to define the
> date
> > column with no. of months, and years once for all the countries.
> > Therefore, I got my answer to the first and third question in the
> previous
> > email (what you suggested) that I no need to define the column of each
> > country, as the date, and no. of observations are same for all countries.
> > But, the no. of days are different for each month, and similarly, for
> each
> > year. So, I think I need to define date for each year separately.
> Hence, I have
> > given an example of 12 months, for 2 years (i.e. 1994, and 1995), and
> have
> > written the following code. Please correct me in case I am wrong.
> >
> > spdat<data.frame(
> >
> >
> dates=paste(c(1:21,1:20,1:23,1:21,1:22,1:22,1:21,1:23,1:22,1:21,1:22,1:22),c(r
> > ep(1,21),rep(2,20),
> > rep(3,23), rep(4,21),
> >
> rep(5,22),rep(6,22),rep(7,21),rep(8,23),rep(9,22),rep(10,21),rep(11,22),rep(12
> > ,22)
> > ),rep(1994,260)
> > dates1=
> >
> paste(c(1:22,1:20,1:23,1:20,1:23,1:22,1:21,1:23,1:21,1:22,1:22,1:21),c(rep(1,2
> > 2),rep(2,20),
> > rep(3,23), rep(4,20),
> >
> rep(5,23),rep(6,22),rep(7,21),rep(8,23),rep(9,21),rep(10,21),rep(11,22),rep(12
> > ,21)
> > ),rep(1995,259) ,sep="")
> >
> > Concerning the exporting of structure of the dataset to excel, I will
> have
> > 12*84 matrix. But, please suggest me the way to proceed for the large
> > sample. I have mentioned below what I understood from your code. Please
> > correct me if I am wrong.
> > 1. I need to define the date for each year as the no. of days in each
> month
> > are different for each year (as mentioned in my above code). For
> instance, in
> > my data file, Jan 1994 has 21 days while Jan 1995 has 22 days.
> > 2. Need to define the date column as character.
> > 3. Need to define the monthly average for each month, and year. So, now
> > code will be as follows.
> >
> spdat$month<sapply(strsplit(spdat$dates,""),"[",2,3,4,5,6,7,8,9,10,11,12)
> > %%%%As I need all months average sequentially.
> > spdat$year<sapply(strsplit(spdat$dates,""),"[",3)
> >
> > Here, this meaning of "3", I am really unable to get.
> >
> > 4. Need to define each country with each month and year as mentioned in
> > the last part of your code.
> >
> > Please suggest me in this regard.
> >
> > Thank you.
> >
> >
> >
> >
> >
> >
> >
> > [image: Mailtrack]
> > < https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_ca> > mpaign=signaturevirality5&>
> > Sender
> > notified by
> > Mailtrack
> > < https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_ca> > mpaign=signaturevirality5&>
> > 09/13/19,
> > 06:41:41 PM
> >
> > On Fri, Sep 13, 2019 at 4:24 PM Jim Lemon < [hidden email]> wrote:
> >
> > > Hi Subhamitra,
> > > I'll try to write my answers adjacent to your questions below.
> > >
> > > On Fri, Sep 13, 2019 at 6:08 PM Subhamitra Patra <
> > > [hidden email]> wrote:
> > >
> > >> Dear Sir,
> > >>
> > >> Thank you very much for your suggestion.
> > >>
> > >> Yes, your suggested code worked. But, actually, I have data from 3rd
> > >> January 1994 to 3rd August 2017 for very large (i.e. for 84
> > >> countries) sample. From this, I have given the example of the years
> > >> up to 2000. Before applying the same code for the long 24 years, I
> > >> want to learn the logic behind the code. Actually, some part of the
> > >> code is not understandable to me which I mentioned in the bold letter
> as
> > follows.
> > >>
> > >> "spdat<data.frame(
> > >>
> dates=paste(c(1:30,1:28),c(rep(1,30),rep(2,28)),rep(1994,58),sep=""),
> > >> returnA=sample(*15:50*,58,TRUE),returnB=sample(*10:45*,58,TRUE))"
> > >>
> > >> A. Here, I need to define the no. of days in a month, and the no. of
> > >> countries name separately, right? But, what is meant by 15:50, and
> > >> 10:45 in return A, and B respectively?
> > >>
> > >
> > > To paraphrase Donald Trump, this is FAKE DATA! I have no idea what the
> > > real values of return are, so I made them up using the "sample"
> function.
> > > However, this is not meant to mislead anyone, just to show how
> > > whatever numbers are in your data can be used in calculations. The
> > > colon (":") operator creates a sequence of numbers starting with the
> > > one to the left and ending with the one to the right.
> > >
> > >>
> > >> "# if you only have to get the monthly averages, it can be done this
> > >> way
> > >> spdat$month<sapply(strsplit(spdat$dates,""),"["*,2*)
> > >> spdat$year<sapply(strsplit(spdat$dates,""),"[",*3*)"
> > >>
> > >> B. Here, I need to define the no. of months, and years separately,
> right?
> > >> or else what 2, and 3 (in bold) indicates?
> > >>
> > >
> > > To get the grouping variable of sequential months that you want, you
> > > only need the month and year values of the dates in the first column.
> > > First I used the "strsplit" function to split the date field at the
> > > hyphens, then used "sapply" to extract ("[") the second (month) and
> > > third (year) parts as two new columns. Because you have more than one
> > > year of data, you need the year values or you will group all Januarys,
> > > all Februarys and so on. Notice how I pass both of the new columns as
> > > a list (a data frame is a type of
> > > list) in the call to get the mean of each month.
> > >
> > >>
> > >> "# get the averages by month and year  is this correct?
> > >> monthlyA<by(*spdat$returnA*,spdat[,c("month","year")],mean)
> > >> monthlyB<by(*spdat$returnB*,spdat[,c("month","year")],mean)"
> > >>
> > >> C. From this part, I got the exact average values of both January and
> > >> February of 1994 for country A, and B. But, in code, I have a query
> > >> that I need to define spdat$returnA, and spdat$returnB separately
> > >> before writing this code, right? Like this, I need to define for each
> > >> 84 countries separately with their respective number of months, and
> > >> years before writing this code, right?
> > >>
> > >
> > > I don't think so. Because I don't know what your data looks like, I am
> > > guessing that for each row, it has columns for each of the 84
> > > countries. I don't know what these columns are named, either. Maybe:
> > >
> > > date Australia Belarus ... Zambia
> > > 01/01/1994 20 21 22
> > > ...
> > >
> > >
> > >> Yes, after obtaining the monthly average for each country's data, I
> > >> need to use them for further calculations. So, I want to export the
> > >> result to excel. But, until understanding the code, I think I willn't
> > >> able to apply for the entire sample, and cannot be able to discuss
> > >> the format of the resulted column to export to excel.
> > >>
> > >
> > > Say that we perform the grouped mean calculation for the first two
> > > country columns like this:
> > > monmeans<sapply(spdat[,2:3],by,spdat[,c("month","year")],mean)
> > > monmeans
> > > Australia Belarus
> > > [1,] 29.70000 30.43333
> > > [2,] 34.17857 27.39286
> > >
> > > We are presented with a 2x2 matrix of monthly means in just the format
> > > someone might use for importing into Excel. The first row is January
> > > 1994, the second February 1994 and so on. By expanding the columns to
> > > include all the countries in your data, You should have the result you
> want.
> > >
> > > Jim
> > >
> >
> >
> > 
> > *Best Regards,*
> > *Subhamitra Patra*
> > *Phd. Research Scholar*
> > *Department of Humanities and Social Sciences* *Indian Institute of
> > Technology, Kharagpur*
> > *INDIA*
> >
> > [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > [hidden email] mailing list  To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/rhelp> > PLEASE do read the posting guide http://www.Rproject.org/posting> > guide.html
> > and provide commented, minimal, selfcontained, reproducible code.
> Osobní údaje: Informace o zpracování a ochraně osobních údajů obchodních
> partnerů PRECHEZA a.s. jsou zveřejněny na:
> https://www.precheza.cz/zasadyochranyosobnichudaju/  Information
> about processing and protection of business partner’s personal data are
> available on website:
> https://www.precheza.cz/en/personaldataprotectionprinciples/> Důvěrnost: Tento email a jakékoliv k němu připojené dokumenty jsou
> důvěrné a podléhají tomuto právně závaznému prohláąení o vyloučení
> odpovědnosti: https://www.precheza.cz/01dovetek/  This email and any
> documents attached to it may be confidential and are subject to the legally
> binding disclaimer: https://www.precheza.cz/en/01disclaimer/>
>

*Best Regards,*
*Subhamitra Patra*
*Phd. Research Scholar*
*Department of Humanities and Social Sciences*
*Indian Institute of Technology, Kharagpur*
*INDIA*
[[alternative HTML version deleted]]
______________________________________________
[hidden email] mailing list  To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


Sorry, forgot to include the list.
On Sat, Sep 14, 2019 at 10:27 AM Jim Lemon < [hidden email]> wrote:
>
> See inline
>
> On Fri, Sep 13, 2019 at 11:20 PM Subhamitra Patra < [hidden email]> wrote:
>>
>> Dear Sir,
>>
>> Yes, I understood the logic. But, still, I have a few queries that I mentioned below your answers.
>>
>>> "# if you only have to get the monthly averages, it can be done this way
>>> spdat$month<sapply(strsplit(spdat$dates,""),"[",2)
>>> spdat$year<sapply(strsplit(spdat$dates,""),"[",3)"
>>>
>>> B. Here, I need to define the no. of months, and years separately, right? or else what 2, and 3 (in bold) indicates?
>>
>>
>> To get the grouping variable of sequential months that you want, you only need the month and year values of the dates in the first column. First I used the "strsplit" function to split the date field at the hyphens, then used "sapply" to extract ("[") the second (month) and third (year) parts as two new columns. Because you have more than one year of data, you need the year values or you will group all Januarys, all Februarys and so on. Notice how I pass both of the new columns as a list (a data frame is a type of list) in the call to get the mean of each month.
>>
>> 1. Here, as per my understanding, the "3" indicates the 3rd year, right? But, you showed an average for 2 months of the same year. Then, what "3" in the spdat$year object indicate?
>
>
> No, as I explained in the initial email and below, the "strsplit" function takes one or more strings (your dates) and breaks them at the specified character (""), So
>
> strsplit("111994","")
> [[1]]
> [1] "1" "1" "1994"
>
> That is passed to the "sapply" function that applies the extraction ("[") operator to the result of "strsplit". The "3" indicates that you want to extract the third element, in this case, the year.
>
> > sapply(strsplit("111994",""),"[",3)
> [1] "1994"
>
> So by splitting the dates and extracting the second (month) and third (year) element from each date, we have all the information needed to create a grouping variable for monthly averages.
>
>>
>>
>>> C. From this part, I got the exact average values of both January and February of 1994 for country A, and B. But, in code, I have a query that I need to define spdat$returnA, and spdat$returnB separately before writing this code, right? Like this, I need to define for each 84 countries separately with their respective number of months, and years before writing this code, right?
>>
>>
>> I don't think so. Because I don't know what your data looks like, I am guessing that for each row, it has columns for each of the 84 countries. I don't know what these columns are named, either. Maybe:
>>
>> date Australia Belarus ... Zambia
>> 01/01/1994 20 21 22
>> ...
>>
>> Here, due to my misunderstanding about the code, I was wrong. But, what data structure you guessed, it is absolutely right that for each row, I have columns for each of the 84 countries. So, I think, I need to define the date column with no. of months, and years once for all the countries. Therefore, I got my answer to the first and third question in the previous email (what you suggested) that I no need to define the column of each country, as the date, and no. of observations are same for all countries. But, the no. of days are different for each month, and similarly, for each year. So, I think I need to define date for each year separately. Hence, I have given an example of 12 months, for 2 years (i.e. 1994, and 1995), and have written the following code. Please correct me in case I am wrong.
>>
>> spdat<data.frame(
>> dates=paste(c(1:21,1:20,1:23,1:21,1:22,1:22,1:21,1:23,1:22,1:21,1:22,1:22),c(rep(1,21),rep(2,20),rep(3,23), rep(4,21), rep(5,22),rep(6,22),rep(7,21),rep(8,23),rep(9,22),rep(10,21),rep(11,22),rep(12,22)),rep(1994,260)
>> dates1=paste(c(1:22,1:20,1:23,1:20,1:23,1:22,1:21,1:23,1:21,1:22,1:22,1:21),c(rep(1,22),rep(2,20),rep(3,23), rep(4,20), rep(5,23),rep(6,22),rep(7,21),rep(8,23),rep(9,21),rep(10,21),rep(11,22),rep(12,21)),rep(1995,259) ,sep="")
>>
> First, you don't have to recreate the data that you already have. I did because I don't have it and have to guess what it looks like. Remember neither I nor any of the others who have offered help have your data or even a representative sample. If you tried the code above, you surely must know that it doesn't work. I could create code that would produce the dates from 111994 to 31/12/1995 or any other stretch you would like, but it would only confuse you more. _You already have the dates in your data file._ What I have shown you is how to use those dates to create the grouping variable that you want.
>
>> Concerning the exporting of structure of the dataset to excel, I will have 12*84 matrix. But, please suggest me the way to proceed for the large sample. I have mentioned below what I understood from your code. Please correct me if I am wrong.
>> 1. I need to define the date for each year as the no. of days in each month are different for each year (as mentioned in my above code). For instance, in my data file, Jan 1994 has 21 days while Jan 1995 has 22 days.
>> 2. Need to define the date column as character.
>> 3. Need to define the monthly average for each month, and year. So, now code will be as follows.
>> spdat$month<sapply(strsplit(spdat$dates,""),"[",2,3,4,5,6,7,8,9,10,11,12) %%%%As I need all months average sequentially.
>> spdat$year<sapply(strsplit(spdat$dates,""),"[",3)
>>
>> Here, this meaning of "3", I am really unable to get
>
>
> You have missed the point here, as above. I didn't mean to suggest that you had to recreate the dates that you already have. What I did was to show how you could use the dates that you already have to create a grouping variable for your calculation.
>>
>> .
>>
>> 4. Need to define each country with each month and year as mentioned in the last part of your code.
>>
> What I did was to add the month and year of each row as two separate columns of data. You should be able to see that by looking at spdat after the "strsplit/sapply" operation. Then you have in each row the returns from your 84 countries _and_ the month/year for that row. When I used the "by" function to get the monthly means from spdat, I showed you how to use the same code on your data frame, after creating the month and year columns, to get the monthly average return for the 84 countries for as many years as you have. As this should return a matrix of successive months as rows and countries as columns, you should easily be able to import this into Excel.
>
> The reason I did it this way was to illustrate how to define the grouping variable from the existing information and perform an easily understood calculation. I thought that using methods that automatically perform the operations I used might allow you to get the result without understanding how you had gotten it. Like the data I didn't have, I took a guess at how much my example would help you understand what was happening and give you the skills to do it yourself. I do hope that you get to the point where you are able to think of my example as unsophisticated, for I could have done it with dplyr/ts/reshape and the rest.
>
> Jim
______________________________________________
[hidden email] mailing list  To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


Original email did not come through (some problems with formating).
Hi
No, on contrary. I **am** suggesting to change date column to real date asi it is easy to handle with appropriate functions.
Here are some fake data
> str(spdat)
'data.frame': 260 obs. of 3 variables:
$ dates : Date, format: "19950101" "19950102" "19950103" "19950104" ...
$ coutryA : num 0.188 0.405 0.107 0.596 0.529 ...
$ countryB: num 9.4 10.76 11.24 8.26 10.71 ..
> head(spdat)
dates coutryA countryB
1 19950101 0.1875060 9.402851
2 19950102 0.4045193 10.755112
3 19950103 0.1073904 11.243663
4 19950104 0.5959683 8.256424
5 19950105 0.5293772 10.705431
6 19950106 0.2228029 10.171461
First I melt it
spdat.m < melt(spdat, id.var="dates")
> head(spdat.m)
dates variable value
1 19950101 coutryA 0.1875060
2 19950102 coutryA 0.4045193
3 19950103 coutryA 0.1073904
4 19950104 coutryA 0.5959683
5 19950105 coutryA 0.5293772
6 19950106 coutryA 0.2228029
I do aggregation
> spdat.ag < aggregate(spdat.m$value, list(format(spdat.m$dates, "%m.%Y"), spdat.m$variable), mean)
And now I use dcast to get required result.
> dcast(spdat.ag, Group.1~Group.2)
Using x as value column: use value.var to override.
Group.1 coutryA countryB
1 01.1995 0.098688137 10.177696
2 02.1995 0.352264682 9.609261
3 03.1995 0.155521876 10.043503
4 04.1995 0.166092393 10.129844
5 05.1995 0.164665188 10.308275
6 06.1995 0.260633585 10.210129
7 07.1995 0.003671979 10.549016
8 08.1995 0.045295990 10.087435
9 09.1995 0.145488206 9.689876
10 10.1995 0.225645950 9.743744
11 11.1995 0.030273383 10.025435
12 12.1995 0.043557468 10.105626
Cheers
Petr
Here are the data.
> dput(spdat)
spdat < structure(list(dates = structure(c(9131, 9132, 9133, 9134, 9135,
9136, 9137, 9138, 9139, 9140, 9141, 9142, 9143, 9144, 9145, 9146,
9147, 9148, 9149, 9150, 9151, 9152, 9162, 9163, 9164, 9165, 9166,
9167, 9168, 9169, 9170, 9171, 9172, 9173, 9174, 9175, 9176, 9177,
9178, 9179, 9180, 9181, 9190, 9191, 9192, 9193, 9194, 9195, 9196,
9197, 9198, 9199, 9200, 9201, 9202, 9203, 9204, 9205, 9206, 9207,
9208, 9209, 9210, 9211, 9212, 9221, 9222, 9223, 9224, 9225, 9226,
9227, 9228, 9229, 9230, 9231, 9232, 9233, 9234, 9235, 9236, 9237,
9238, 9239, 9240, 9251, 9252, 9253, 9254, 9255, 9256, 9257, 9258,
9259, 9260, 9261, 9262, 9263, 9264, 9265, 9266, 9267, 9268, 9269,
9270, 9271, 9272, 9273, 9282, 9283, 9284, 9285, 9286, 9287, 9288,
9289, 9290, 9291, 9292, 9293, 9294, 9295, 9296, 9297, 9298, 9299,
9300, 9301, 9302, 9303, 9312, 9313, 9314, 9315, 9316, 9317, 9318,
9319, 9320, 9321, 9322, 9323, 9324, 9325, 9326, 9327, 9328, 9329,
9330, 9331, 9332, 9343, 9344, 9345, 9346, 9347, 9348, 9349, 9350,
9351, 9352, 9353, 9354, 9355, 9356, 9357, 9358, 9359, 9360, 9361,
9362, 9363, 9364, 9365, 9374, 9375, 9376, 9377, 9378, 9379, 9380,
9381, 9382, 9383, 9384, 9385, 9386, 9387, 9388, 9389, 9390, 9391,
9392, 9393, 9394, 9404, 9405, 9406, 9407, 9408, 9409, 9410, 9411,
9412, 9413, 9414, 9415, 9416, 9417, 9418, 9419, 9420, 9421, 9422,
9423, 9424, 9456, 9435, 9436, 9437, 9438, 9439, 9440, 9441, 9442,
9443, 9444, 9445, 9446, 9447, 9448, 9449, 9450, 9451, 9452, 9453,
9454, 9455, 9486, 9465, 9466, 9467, 9468, 9469, 9470, 9471, 9472,
9473, 9474, 9475, 9476, 9477, 9478, 9479, 9480, 9481, 9482, 9483,
9484, 9151), class = "Date"), coutryA = c(0.187506004416315,
0.404519257417805, 0.107390371811605, 0.595968278805544, 0.529377240936012,
0.222802921207767, 0.413182392872818, 0.689673026532298, 1.2768723266992,
0.506308625809406, 0.113859233745174, 0.0963423819877653, 0.323987304768398,
1.63846917270538, 0.893233423250338, 0.297732439150487, 0.949323101836486,
0.599518074708052, 0.366372319197032, 2.25734971953878, 0.190971733204918,
0.0874143568874351, 1.46699645184047, 0.00702170238687361, 0.11221346278474,
0.8060359607624, 0.340842350476532, 0.798838328074708, 0.449214745851041,
0.664972890558734, 0.521830282184173, 1.35020467264521, 0.95240631225826,
1.25607320999849, 1.57018988549379, 0.99477900888445, 0.936218787378207,
1.48489932847779, 0.529222943794807, 0.0995675049147771, 0.477770516727839,
1.64567253670186, 0.0212651530684566, 0.558952796713992, 0.0409979382929057,
0.428675380654606, 0.0919422583362682, 0.819694497340459, 1.23998830450888,
0.607498144489643, 1.27724580163097, 1.41634774644371, 0.579094515769707,
2.02039606694223, 0.0740478208705996, 1.69826944583929, 0.321482399813063,
0.489198601400924, 2.0066750686302, 1.90624857752322, 0.46762405849973,
1.31264724137396, 0.0473627194710677, 0.141362267796145, 0.329709761206515,
0.518454586458572, 1.39489985851779, 0.388303591187678, 0.668922704543522,
0.0735115674875065, 1.30737242978235, 0.198503397980751, 0.257831448122427,
1.31173539205588, 1.45147941969116, 0.359725782295977, 0.612882118056585,
0.0733768753346202, 0.508349204402508, 1.35776663767231, 0.997807735669086,
1.41717534266382, 0.894170593324238, 0.68578120845151, 0.211509378018794,
0.436738904337909, 1.46932152770435, 0.0817388759874159, 0.0389350881653141,
0.709198476466861, 0.963669144724435, 0.548607422521798, 0.896886885575286,
0.322231150840934, 1.37327611339939, 0.0310213133870952, 0.796577750757324,
0.2010067423637, 0.241723752424226, 1.37547329580654, 1.15382202538982,
0.101454200596915, 0.273663839664217, 1.8315140887841, 1.86096518756473,
0.536393730924719, 0.45845011727266, 1.10226256157127, 0.385596991265563,
3.20218061566932, 1.25865250042183, 0.13613128784276, 0.483329357746514,
0.597187329618306, 0.710977603908319, 1.07945708269043, 0.477626236401394,
1.51034914684104, 2.35886426985999, 0.0250526828683629, 0.29439443478131,
0.665774016744828, 0.464027472251246, 0.226658374792016, 0.802597030454373,
0.825517059805602, 1.11293193130819, 1.27677400513873, 1.60776237113347,
1.12490009531342, 0.95767047134623, 0.0475745549797055, 0.0591587460876868,
0.690617365240342, 1.62111622938126, 1.3545210707469, 1.8607927043106,
0.764367674339969, 1.49261525602638, 0.549570728337346, 1.29658399741794,
1.6289903797869, 0.00573336252135834, 0.0300702149640632, 0.440810830115721,
0.663568666361326, 0.126685900835146, 0.00221628368438927,
0.815321995886579, 0.499280888368945, 0.271814047751667, 0.071025546459042,
1.73165491816826, 0.0294770299043331, 0.833605607221529, 0.670108794857159,
0.303323318026829, 1.29039844459134, 0.818806702120603, 0.445515595649677,
0.0128796557666887, 0.320923705586147, 0.230597275812536, 1.54009153212366,
0.294702981688559, 0.581209734391958, 0.121384768986639, 0.502914098451111,
1.59018268505718, 0.635101104166451, 1.48005776676403, 0.25631761189957,
0.171947814411552, 0.444646195980014, 0.172655758440111, 0.00432159794094836,
0.549321974240026, 0.585055026451421, 1.22813371480849, 0.846807540195381,
0.319629441352597, 0.393525732059709, 1.40275675444594, 1.11062585584811,
0.214809571213853, 0.636432711800391, 0.283087127251573, 1.46385553207618,
0.436928676930225, 1.34231945433777, 0.451281957595763, 0.523155001924496,
2.69416779107545, 1.5513477373689, 0.989632029400036, 1.34636075948993,
0.346147428691405, 0.464527560160041, 0.337233933370495, 1.11331396366389,
1.00060600083316, 0.734784444487169, 1.40476315358621, 1.01671092179193,
0.0144306250829694, 0.923555930346906, 1.02275966525015, 0.619422010219383,
0.603484309754755, 0.774553813657576, 0.0932792545556387, 0.651884521428279,
0.61965612647073, 1.22104834441579, 1.31439612639271, 2.87707752518163,
0.0343801084491906, 0.640678302378492, 1.38653452986558, 0.884963139028743,
0.657454283462004, 0.462842665244993, 0.20881674837534, 0.6345884135548,
0.707165108434729, 0.162090928425892, 0.998662309785188, 1.3130254639318,
0.191890764940071, 0.0493619237876962, 0.55183232511689, 0.470263932874487,
0.217088645692971, 0.231550037620628, 0.530406537266415, 0.616522469083808,
0.329347084038265, 1.49420692610475, 1.91750823142859, 0.753536143872474,
0.766584887163714, 0.259803384094296, 0.402463714097741, 0.0229799209735185,
0.259677990559218, 1.41529707261105, 0.191362852138627, 1.54483266684747,
1.17947655378489, 0.426265411073274, 0.723010460481118, 1.37405142869537,
0.374771207936141, 0.0513905365832423, 0.369432731236118, 0.945441984794364,
0.179506648255554, 0.31971255725438, 1.25117960937319, 2.46228549580083
), countryB = c(9.4028512714591, 10.7551115504652, 11.2436629228434,
8.25642360904389, 10.7054313972395, 10.1714609666091, 10.3726975056206,
10.6594299429162, 8.56250595443296, 10.5612153841067, 8.07612112826519,
9.94704207511951, 11.392407273156, 10.4257690445268, 10.6339442533038,
10.5314883825356, 10.3506665399033, 10.2426403362978, 10.8437715647564,
10.8247200587034, 11.2449815064171, 9.2898697883112, 9.05418978124619,
10.6080277672463, 9.19882175737148, 11.3589722806948, 10.4139334238189,
9.44305216810892, 9.58426470056472, 8.84208362003176, 10.8125431356391,
7.71357872650814, 8.73526671289828, 10.714693958224, 9.49976972594189,
9.41896864941478, 7.33073349261249, 10.5268398982262, 9.92255671125184,
10.5665378092202, 10.5035704895405, 7.93682068228084, 10.882421050424,
9.3237610577468, 8.42128120513304, 9.46103753451174, 10.3472215515392,
11.0483414005193, 10.3421689244599, 7.85120280889754, 11.6327644046254,
9.57620745972098, 10.6553844719749, 10.8490250129346, 10.2742492933876,
9.55428072119304, 9.0976351049804, 10.0731951766966, 10.6956488509516,
11.1530744146062, 10.3496303024767, 9.52734980693306, 9.64478424894734,
9.28301632295047, 10.9568790570688, 11.6052870914912, 9.58530202776537,
11.1338134902295, 8.66685735149472, 11.0230863576875, 10.8000609212302,
10.6510296259782, 11.831292326569, 9.53836122448974, 9.55805411377422,
9.90336204189518, 9.36377040999133, 11.7041009433341, 9.95628297574831,
10.718111342931, 10.4562688422472, 8.85976383099186, 8.94085496683824,
8.19538394018434, 10.1058448260449, 9.70821289789561, 9.08757962570738,
10.6599997541876, 10.0521137258219, 9.9924295699559, 11.8730269098299,
11.2634470795907, 11.3854762443416, 9.56742053529845, 10.4101561978503,
9.53376547865009, 9.75410966995361, 9.92804558924886, 8.36231430067066,
10.7486459346681, 12.0143881312685, 11.0083060332839, 9.32820954213586,
10.8420346742049, 9.73064414798223, 10.7593902723319, 10.976622155215,
10.1039774975157, 8.36317871802524, 9.21809894958653, 10.1015362220683,
11.4655736295123, 9.65528297274543, 9.67844310028008, 10.1516820910267,
8.38764450852642, 10.163558398201, 11.1432463477237, 12.0509818193223,
10.9896913965091, 11.1772406550953, 9.14396687337779, 9.93338627749979,
10.9548864433126, 8.64911301751956, 11.706463972364, 11.1012846649741,
8.7805267197408, 11.5802098773954, 10.2268513542863, 10.3509617168731,
9.09646558899397, 11.2706666647314, 11.3984335011704, 11.4808985388742,
10.5608771133999, 10.3684356806175, 10.4815588822618, 10.5818867877558,
12.2561035284691, 8.6464271477849, 10.3412351841865, 10.7577574534162,
11.1124067479261, 9.91627943243343, 10.6356898895291, 10.2107566441478,
10.0672734202575, 10.2385787014999, 11.7112606160069, 10.0453801263575,
8.84654136100724, 10.2173421609193, 9.27919801705716, 10.4755578829547,
7.69340209082122, 9.24705253848083, 10.8415406794597, 8.69603117680965,
11.2589214416702, 10.5425642239737, 10.1389355042458, 9.17267675180435,
12.3052338002213, 10.0181674985326, 12.2715476751051, 9.64516268052739,
10.6305299379912, 10.1829347684655, 9.97983942366781, 10.2559847744715,
10.3092266661814, 9.75215330673072, 10.250464278709, 9.31118800061454,
10.3310666767171, 9.09703848990093, 10.241195105962, 8.57290406448477,
8.98090855172704, 8.64653101832229, 12.6791587435376, 9.56000538681993,
10.4062255533723, 11.067091476284, 10.5255014737268, 10.2240941949978,
9.13081571869084, 9.5942352120783, 9.2753466212409, 10.2789293993548,
8.10255065585342, 9.48751297655077, 8.51198576785003, 9.46310532206947,
9.86727270762806, 11.5149248124739, 9.31557156735022, 9.34351230206303,
10.022139448869, 11.4111350893792, 8.57891783464065, 10.3761090924661,
9.38300408584683, 9.33694577526158, 9.25815555686085, 9.29856853889735,
8.4250073823245, 8.83022950824832, 9.1510846172981, 10.2553042376765,
10.0739540955956, 9.04955917463259, 10.8927827168631, 9.44611041690694,
10.7883395708593, 10.6010088332078, 7.72560864006592, 10.1760839916637,
11.5576569894392, 11.384809257294, 8.73504353987083, 9.00585942714512,
9.62327893504013, 10.3527072699866, 10.5220100705827, 8.74921668696853,
8.56415116683662, 12.1348451793815, 10.9496674323819, 9.64443817181322,
9.52977454697087, 10.4281877186725, 8.52701721410292, 11.6911584965782,
10.2300108250139, 8.65368821276485, 11.7733431942379, 10.2060233777681,
9.57291673029552, 9.82687667895106, 10.5939736188493, 11.2510605726337,
10.3383384488323, 9.92301237292945, 10.0164623230529, 10.4939857044034,
10.5631769648289, 10.935731043532, 11.0659359187168, 8.51697010486427,
9.79512310587405, 9.35132038807071, 11.3286703149903, 10.4621597293933,
10.4099459919071, 8.86246315190942, 9.30054044639769, 9.40346575227191,
9.59278722974697)), row.names = c(NA, 260L), class = "data.frame")
From: Subhamitra Patra <mailto: [hidden email]>
Sent: Friday, September 13, 2019 3:59 PM
To: PIKAL Petr <mailto: [hidden email]>; rhelp mailing list <mailto: [hidden email]>
Subject: Re: [R] Query about calculating the monthly average of daily data columns
Dear PIKAL,
Thank you very much for your suggestion.
I tried your previous suggested code and getting the average value for each month for both country A, and B. But in your recent email, you are suggesting not to change the date column to real date. If I am going through your recently suggested code, i.e.
"aggregate(value column, list(format(date column, "%m.%Y"), country column), mean)"
I am getting an Error that "aggregate(value, list(format(date, "%m.%Y"), country), mean) : object 'value' not found".
Here, my query "may I need to define the date column, country column, and value column separately?"
Further, I need something the average value result like below in the data frame
Month Country A Country B
Jan 1994 26.66 35.78
Feb 1994 26.13 29.14
so that it will be easy for me to export to excel, and to use for the further calculations.
Please suggest me in this regard.
Thank you.
https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&Sender notified by
https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5& 09/13/19, 07:22:53 PM
On Fri, Sep 13, 2019 at 7:03 PM PIKAL Petr <mailto: [hidden email]> wrote:
Hi
I am almost 100% sure that you would spare yourself much trouble if you changed your date column to real date
?as.Date
reshape your wide format to long one
library(reshape2)
?melt
to get 3 column data.frame with one date column, one country column and one value column
use ?aggregate and ?format to get summary value
something like
aggregate(value column, list(format(date column, "%m.%Y"), country column), mean)
But if you insist to scratch your left ear with right hand accross your head, you could continue your way.
Cheers
Petr
> Original Message
> From: Rhelp <mailto: [hidden email]> On Behalf Of Subhamitra
> Patra
> Sent: Friday, September 13, 2019 3:20 PM
> To: Jim Lemon <mailto: [hidden email]>; rhelp mailing list <rhelp@r
> http://project.org>
> Subject: Re: [R] Query about calculating the monthly average of daily data
> columns
>
> Dear Sir,
>
> Yes, I understood the logic. But, still, I have a few queries that I mentioned
> below your answers.
>
> "# if you only have to get the monthly averages, it can be done this way
> > spdat$month<sapply(strsplit(spdat$dates,""),"["*,2*)
> > spdat$year<sapply(strsplit(spdat$dates,""),"[",*3*)"
> >
> > B. Here, I need to define the no. of months, and years separately, right?
> > or else what 2, and 3 (in bold) indicates?
> >
>
> To get the grouping variable of sequential months that you want, you only
> need the month and year values of the dates in the first column. First I used
> the "strsplit" function to split the date field at the hyphens, then used
> "sapply" to extract ("[") the second (month) and *third (year)* parts as two
> new columns. Because you have more than one year of data, you need the
> year values or you will group all Januarys, all Februarys and so on.
> Notice how I pass both of the new columns as a list (a data frame is a type of
> list) in the call to get the mean of each month.
>
> 1. Here, as per my understanding, the "3" indicates the 3rd year, right?
> But, you showed an average for 2 months of the same year. Then, what "3"
> in the spdat$year object indicate?
>
>
> C. From this part, I got the exact average values of both January and
> > February of 1994 for country A, and B. But, in code, I have a query
> > that I need to define spdat$returnA, and spdat$returnB separately
> > before writing this code, right? Like this, I need to define for each
> > 84 countries separately with their respective number of months, and
> > years before writing this code, right?
> >
>
> I don't think so. Because I don't know what your data looks like, I am
> guessing that for each row, it has columns for each of the 84 countries. I
> don't know what these columns are named, either. Maybe:
>
> date Australia Belarus ... Zambia
> 01/01/1994 20 21 22
> ...
>
> Here, due to my misunderstanding about the code, I was wrong. But, what
> data structure you guessed, it is absolutely right that for each row, I have
> columns for each of the 84 countries. So, I think, I need to define the date
> column with no. of months, and years once for all the countries.
> Therefore, I got my answer to the first and third question in the previous
> email (what you suggested) that I no need to define the column of each
> country, as the date, and no. of observations are same for all countries.
> But, the no. of days are different for each month, and similarly, for each
> year. So, I think I need to define date for each year separately. Hence, I have
> given an example of 12 months, for 2 years (i.e. 1994, and 1995), and have
> written the following code. Please correct me in case I am wrong.
>
> spdat<data.frame(
>
> dates=paste(c(1:21,1:20,1:23,1:21,1:22,1:22,1:21,1:23,1:22,1:21,1:22,1:22),c(r
> ep(1,21),rep(2,20),
> rep(3,23), rep(4,21),
> rep(5,22),rep(6,22),rep(7,21),rep(8,23),rep(9,22),rep(10,21),rep(11,22),rep(12
> ,22)
> ),rep(1994,260)
> dates1=
> paste(c(1:22,1:20,1:23,1:20,1:23,1:22,1:21,1:23,1:21,1:22,1:22,1:21),c(rep(1,2
> 2),rep(2,20),
> rep(3,23), rep(4,20),
> rep(5,23),rep(6,22),rep(7,21),rep(8,23),rep(9,21),rep(10,21),rep(11,22),rep(12
> ,21)
> ),rep(1995,259) ,sep="")
>
> Concerning the exporting of structure of the dataset to excel, I will have
> 12*84 matrix. But, please suggest me the way to proceed for the large
> sample. I have mentioned below what I understood from your code. Please
> correct me if I am wrong.
> 1. I need to define the date for each year as the no. of days in each month
> are different for each year (as mentioned in my above code). For instance, in
> my data file, Jan 1994 has 21 days while Jan 1995 has 22 days.
> 2. Need to define the date column as character.
> 3. Need to define the monthly average for each month, and year. So, now
> code will be as follows.
> spdat$month<sapply(strsplit(spdat$dates,""),"[",2,3,4,5,6,7,8,9,10,11,12)
> %%%%As I need all months average sequentially.
> spdat$year<sapply(strsplit(spdat$dates,""),"[",3)
>
> Here, this meaning of "3", I am really unable to get.
>
> 4. Need to define each country with each month and year as mentioned in
> the last part of your code.
>
> Please suggest me in this regard.
>
> Thank you.
>
>
>
>
>
>
>
> [image: Mailtrack]
> < https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_ca> mpaign=signaturevirality5&>
> Sender
> notified by
> Mailtrack
> < https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_ca> mpaign=signaturevirality5&>
> 09/13/19,
> 06:41:41 PM
>
> On Fri, Sep 13, 2019 at 4:24 PM Jim Lemon <mailto: [hidden email]> wrote:
>
> > Hi Subhamitra,
> > I'll try to write my answers adjacent to your questions below.
> >
> > On Fri, Sep 13, 2019 at 6:08 PM Subhamitra Patra <
> > mailto: [hidden email]> wrote:
> >
> >> Dear Sir,
> >>
> >> Thank you very much for your suggestion.
> >>
> >> Yes, your suggested code worked. But, actually, I have data from 3rd
> >> January 1994 to 3rd August 2017 for very large (i.e. for 84
> >> countries) sample. From this, I have given the example of the years
> >> up to 2000. Before applying the same code for the long 24 years, I
> >> want to learn the logic behind the code. Actually, some part of the
> >> code is not understandable to me which I mentioned in the bold letter as
> follows.
> >>
> >> "spdat<data.frame(
> >> dates=paste(c(1:30,1:28),c(rep(1,30),rep(2,28)),rep(1994,58),sep=""),
> >> returnA=sample(*15:50*,58,TRUE),returnB=sample(*10:45*,58,TRUE))"
> >>
> >> A. Here, I need to define the no. of days in a month, and the no. of
> >> countries name separately, right? But, what is meant by 15:50, and
> >> 10:45 in return A, and B respectively?
> >>
> >
> > To paraphrase Donald Trump, this is FAKE DATA! I have no idea what the
> > real values of return are, so I made them up using the "sample" function.
> > However, this is not meant to mislead anyone, just to show how
> > whatever numbers are in your data can be used in calculations. The
> > colon (":") operator creates a sequence of numbers starting with the
> > one to the left and ending with the one to the right.
> >
> >>
> >> "# if you only have to get the monthly averages, it can be done this
> >> way
> >> spdat$month<sapply(strsplit(spdat$dates,""),"["*,2*)
> >> spdat$year<sapply(strsplit(spdat$dates,""),"[",*3*)"
> >>
> >> B. Here, I need to define the no. of months, and years separately, right?
> >> or else what 2, and 3 (in bold) indicates?
> >>
> >
> > To get the grouping variable of sequential months that you want, you
> > only need the month and year values of the dates in the first column.
> > First I used the "strsplit" function to split the date field at the
> > hyphens, then used "sapply" to extract ("[") the second (month) and
> > third (year) parts as two new columns. Because you have more than one
> > year of data, you need the year values or you will group all Januarys,
> > all Februarys and so on. Notice how I pass both of the new columns as
> > a list (a data frame is a type of
> > list) in the call to get the mean of each month.
> >
> >>
> >> "# get the averages by month and year  is this correct?
> >> monthlyA<by(*spdat$returnA*,spdat[,c("month","year")],mean)
> >> monthlyB<by(*spdat$returnB*,spdat[,c("month","year")],mean)"
> >>
> >> C. From this part, I got the exact average values of both January and
> >> February of 1994 for country A, and B. But, in code, I have a query
> >> that I need to define spdat$returnA, and spdat$returnB separately
> >> before writing this code, right? Like this, I need to define for each
> >> 84 countries separately with their respective number of months, and
> >> years before writing this code, right?
> >>
> >
> > I don't think so. Because I don't know what your data looks like, I am
> > guessing that for each row, it has columns for each of the 84
> > countries. I don't know what these columns are named, either. Maybe:
> >
> > date Australia Belarus ... Zambia
> > 01/01/1994 20 21 22
> > ...
> >
> >
> >> Yes, after obtaining the monthly average for each country's data, I
> >> need to use them for further calculations. So, I want to export the
> >> result to excel. But, until understanding the code, I think I willn't
> >> able to apply for the entire sample, and cannot be able to discuss
> >> the format of the resulted column to export to excel.
> >>
> >
> > Say that we perform the grouped mean calculation for the first two
> > country columns like this:
> > monmeans<sapply(spdat[,2:3],by,spdat[,c("month","year")],mean)
> > monmeans
> > Australia Belarus
> > [1,] 29.70000 30.43333
> > [2,] 34.17857 27.39286
> >
> > We are presented with a 2x2 matrix of monthly means in just the format
> > someone might use for importing into Excel. The first row is January
> > 1994, the second February 1994 and so on. By expanding the columns to
> > include all the countries in your data, You should have the result you want.
> >
> > Jim
> >
>
>
> 
> *Best Regards,*
> *Subhamitra Patra*
> *Phd. Research Scholar*
> *Department of Humanities and Social Sciences* *Indian Institute of
> Technology, Kharagpur*
> *INDIA*
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> mailto: [hidden email] mailing list  To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/rhelp> PLEASE do read the posting guide http://www.Rproject.org/posting> guide.html
> and provide commented, minimal, selfcontained, reproducible code.
Osobní údaje: Informace o zpracování a ochraně osobních údajů obchodních partnerů PRECHEZA a.s. jsou zveřejněny na: https://www.precheza.cz/zasadyochranyosobnichudaju/  Information about processing and protection of business partner’s personal data are available on website: https://www.precheza.cz/en/personaldataprotectionprinciples/Důvěrnost: Tento email a jakékoliv k němu připojené dokumenty jsou důvěrné a podléhají tomuto právně závaznému prohláąení o vyloučení odpovědnosti: https://www.precheza.cz/01dovetek/  This email and any documents attached to it may be confidential and are subject to the legally binding disclaimer: https://www.precheza.cz/en/01disclaimer/
Best Regards,
Subhamitra Patra
Phd. Research Scholar
Department of Humanities and Social Sciences
Indian Institute of Technology, Kharagpur
INDIA
______________________________________________
[hidden email] mailing list  To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.

