Quantcast

Return calculation for panel data structure

classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Return calculation for panel data structure

Arsenio
Return calculation for panel data structure Dear all,

I have a very simple question and i haven't been able to code it out. It's a simple return calculation of a form:
R(t)=(P(t)-P(t-1))/P(t-1) or the same as ratio of prices minus one. However it's in the panel data, where stocks belong to say a fund or some other grouping variable.


ticker date_f
date prc iticker
A 20-Jun-03
15876 19.06 IXT
A 23-Jun-03
15879 18.89 IXT
A 24-Jun-03
15880 18.75 IXT
A 25-Jun-03
15881 18.92 IXT
A 26-Jun-03
15882 19.35 IXT
A 27-Jun-03
15883 19.55 IXT
AA 20-Jun-03
15876 26.24 IXB
AA 23-Jun-03
15879 25.07 IXB
AA 24-Jun-03
15880 24.96 IXB
AA 25-Jun-03
15881 24.51 IXB
AA 26-Jun-03
15882 25.24 IXB
AA 27-Jun-03
15883 25.09 IXB
AA 30-Jun-03
15886 25.5 IXB


In  SAS, I would do it this way:

data work.1; /* This Calcs the excluded index return for INDEX constructed with PRICES (PR)!!!! */
       set work.1;
       by ticker iticker date_f ; /*sorting by stock, fund and date */

       lag_prc=lag(prc); /* creating the lagged price variable

       if first.iticker then /* if the it's a first date for the grouped unit then.. */
               lag_prc= ; /* setting the first lagged value to missing if the first date for the grouped unit */
       return = (prc - lag_prc)/ lag_prc;
       drop lag_prc;
run;

The end result would be the same dataset with just one more column for returns for each stock, fund, date combination with the first return for the date set to missing.
I tried Return.calculate type of funcs from performance analytics and other solutions using plyr or loops, but they are all too bulky.

Any ideas how to accomplish this? Would appreciate any hints,

Arsenio
_______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-finance
-- Subscriber-posting only. If you want to post, subscribe first.
-- Also note that this is not the r-help list where general R questions should go.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Return calculation for panel data structure

rbali
Return calculation for panel data structureUse diff and lag from plm:

Lines="ticker date_f  date prc iticker
A 20-Jun-03  15876 19.06 IXT
A 23-Jun-03  15879 18.89 IXT
A 24-Jun-03  15880 18.75 IXT
A 25-Jun-03  15881 18.92 IXT
A 26-Jun-03  15882 19.35 IXT
A 27-Jun-03  15883 19.55 IXT
AA 20-Jun-03 15876 26.24 IXB
AA 23-Jun-03 15879 25.07 IXB
AA 24-Jun-03 15880 24.96 IXB
AA 25-Jun-03 15881 24.51 IXB
AA 26-Jun-03 15882 25.24 IXB
AA 27-Jun-03 15883 25.09 IXB
AA 30-Jun-03 15886 25.5 IXB
"
z <- read.table(textConnection(Lines), header = TRUE)
library(plm)
zz=pdata.frame(z, c("ticker","date"))
pr=zz$prc; str(pr)
zz$return=diff(pr)/lag(pr)
zz

Good luck!

Robert

From: Arsenio
Sent: Tuesday, March 06, 2012 2:47 AM
To: [hidden email]
Subject: [R-SIG-Finance] Return calculation for panel data structure

Dear all,

I have a very simple question and i haven't been able to code it out. It's a simple return calculation of a form:
R(t)=(P(t)-P(t-1))/P(t-1) or the same as ratio of prices minus one. However it's in the panel data, where stocks belong to say a fund or some other grouping variable.


      ticker date_f
     date prc iticker
      A 20-Jun-03
     15876 19.06 IXT
      A 23-Jun-03
     15879 18.89 IXT
      A 24-Jun-03
     15880 18.75 IXT
      A 25-Jun-03
     15881 18.92 IXT
      A 26-Jun-03
     15882 19.35 IXT
      A 27-Jun-03
     15883 19.55 IXT
      AA 20-Jun-03
     15876 26.24 IXB
      AA 23-Jun-03
     15879 25.07 IXB
      AA 24-Jun-03
     15880 24.96 IXB
      AA 25-Jun-03
     15881 24.51 IXB
      AA 26-Jun-03
     15882 25.24 IXB
      AA 27-Jun-03
     15883 25.09 IXB
      AA 30-Jun-03
     15886 25.5 IXB


In  SAS, I would do it this way:

data work.1; /* This Calcs the excluded index return for INDEX constructed with PRICES (PR)!!!! */
       set work.1;
       by ticker iticker date_f ; /*sorting by stock, fund and date */

       lag_prc=lag(prc); /* creating the lagged price variable

       if first.iticker then /* if the it's a first date for the grouped unit then.. */
               lag_prc= . ; /* setting the first lagged value to missing if the first date for the grouped unit */
       return = (prc - lag_prc)/ lag_prc;
       drop lag_prc;
run;

The end result would be the same dataset with just one more column for returns for each stock, fund, date combination with the first return for the date set to missing.
I tried Return.calculate type of funcs from performance analytics and other solutions using plyr or loops, but they are all too bulky.

Any ideas how to accomplish this? Would appreciate any hints,

Arsenio


--------------------------------------------------------------------------------
_______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-finance
-- Subscriber-posting only. If you want to post, subscribe first.
-- Also note that this is not the r-help list where general R questions should go.
        [[alternative HTML version deleted]]

_______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-finance
-- Subscriber-posting only. If you want to post, subscribe first.
-- Also note that this is not the r-help list where general R questions should go.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Return calculation for panel data structure

Arsenio
Robert,

Thanks for the reply. I tried plm approach but apparently it only accepts one individual id and one time id, because this fails

> zz=pdata.frame(z, c("Stock","Fund","Date"))
Error in match(x, table, nomatch = 0L) :
  'match' requires vector arguments

Any approach that accepts several individual observation identifiers? I will take a look at the econometrics view on CRAN

Btw, sorry if the post came out mangled due to HTML, didn't check.

Thanks,

Arsenio

Tuesday, March 6, 2012, 5:24:35 AM, you wrote:


Use diff and lag from plm:
 
Lines="ticker date_f  date prc iticker
A 20-Jun-03  15876 19.06 IXT
A 23-Jun-03  15879 18.89 IXT
A 24-Jun-03  15880 18.75 IXT
A 25-Jun-03  15881 18.92 IXT
A 26-Jun-03  15882 19.35 IXT
A 27-Jun-03  15883 19.55 IXT
AA 20-Jun-03 15876 26.24 IXB
AA 23-Jun-03 15879 25.07 IXB
AA 24-Jun-03 15880 24.96 IXB
AA 25-Jun-03 15881 24.51 IXB
AA 26-Jun-03 15882 25.24 IXB
AA 27-Jun-03 15883 25.09 IXB
AA 30-Jun-03 15886 25.5 IXB
"
z <- read.table(textConnection(Lines), header = TRUE)
library(plm)
zz=pdata.frame(z, c("ticker","date"))
pr=zz$prc; str(pr)
zz$return=diff(pr)/lag(pr)
zz
 
Good luck!
 
Robert
 
From: Arsenio
Sent: Tuesday, March 06, 2012 2:47 AM
To: [hidden email]
Subject: [R-SIG-Finance] Return calculation for panel data structure
 
Dear all,

I have a very simple question and i haven't been able to code it out. It's a simple return calculation of a form:
R(t)=(P(t)-P(t-1))/P(t-1) or the same as ratio of prices minus one. However it's in the panel data, where stocks belong to say a fund or some other grouping variable.


ticker
date_f

date
prc
iticker
A
20-Jun-03

15876
19.06
IXT
A
23-Jun-03

15879
18.89
IXT
A
24-Jun-03

15880
18.75
IXT
A
25-Jun-03

15881
18.92
IXT
A
26-Jun-03

15882
19.35
IXT
A
27-Jun-03

15883
19.55
IXT
AA
20-Jun-03

15876
26.24
IXB
AA
23-Jun-03

15879
25.07
IXB
AA
24-Jun-03

15880
24.96
IXB
AA
25-Jun-03

15881
24.51
IXB
AA
26-Jun-03

15882
25.24
IXB
AA
27-Jun-03

15883
25.09
IXB
AA
30-Jun-03

15886
25.5
IXB



In  SAS, I would do it this way:

data work.1; /* This Calcs the excluded index return for INDEX constructed with PRICES (PR)!!!! */
       set work.1;
       by ticker iticker date_f ; /*sorting by stock, fund and date */

       lag_prc=lag(prc); /* creating the lagged price variable

       if first.iticker then /* if the it's a first date for the grouped unit then.. */
               lag_prc= . ; /* setting the first lagged value to missing if the first date for the grouped unit */
       return = (prc - lag_prc)/ lag_prc;
       drop lag_prc;
run;

The end result would be the same dataset with just one more column for returns for each stock, fund, date combination with the first return for the date set to missing.
I tried Return.calculate type of funcs from performance analytics and other solutions using plyr or loops, but they are all too bulky.

Any ideas how to accomplish this? Would appreciate any hints,

Arsenio

_______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-finance
-- Subscriber-posting only. If you want to post, subscribe first.
-- Also note that this is not the r-help list where general R questions should go.




--
Best regards,
 Arsenio                            mailto:[hidden email]

_______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-finance
-- Subscriber-posting only. If you want to post, subscribe first.
-- Also note that this is not the r-help list where general R questions should go.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Return calculation for panel data structure

Matthew Clegg
Here is another approach that might work for you.

apply_by_groups <- function (T, cols, F, ...) {
  # On input, T is a dataframe, and cols is a list of column names.
  # Groups the dataframe T into groups according to the specified
  # column names, e.g., all rows in a given group will have identical
  # values in the specified columns.  Within each group, applies the
  # function F to the dataframe representing that group.  Returns the
  # list of results.

  NT <- T[do.call("order", lapply(cols, function(c) { T[,c]})),]
  change <- function(k) { !all(NT[k-1,cols] == NT[k, cols]) }
  starts <- which(c(TRUE, sapply(2:nrow(NT), change)))
  ends <- c(starts[2:length(starts)]-1, nrow(NT))
  lapply(1:length(starts), function(k) {
F(NT[starts[k]:ends[k],,drop=FALSE], ...) })
}
ret <- function(X) { c(NA, X[2:length(X)]/X[1:(length(X)-1)]-1)}

> do.call("rbind", apply_by_groups(z, "ticker", function(df) { df$ret <-
ret(df$prc); df} ))
   ticker    date_f  date   prc iticker          ret
1       A 20-Jun-03 15876 19.06     IXT           NA
2       A 23-Jun-03 15879 18.89     IXT -0.008919203
3       A 24-Jun-03 15880 18.75     IXT -0.007411329
4       A 25-Jun-03 15881 18.92     IXT  0.009066667
5       A 26-Jun-03 15882 19.35     IXT  0.022727273
6       A 27-Jun-03 15883 19.55     IXT  0.010335917
7      AA 20-Jun-03 15876 26.24     IXB           NA
8      AA 23-Jun-03 15879 25.07     IXB -0.044588415
9      AA 24-Jun-03 15880 24.96     IXB -0.004387714
10     AA 25-Jun-03 15881 24.51     IXB -0.018028846
11     AA 26-Jun-03 15882 25.24     IXB  0.029783762
12     AA 27-Jun-03 15883 25.09     IXB -0.005942948
13     AA 30-Jun-03 15886 25.50     IXB  0.016341172

In the call to apply_by_groups, you can replace "ticker" with a list of
names, e.g., c("ticker","fund","manager"), etc.

Matthew Clegg

On Tue, Mar 6, 2012 at 7:03 PM, Arsenio <[hidden email]> wrote:

> Robert,
>
> Thanks for the reply. I tried plm approach but apparently it only accepts
> one individual id and one time id, because this fails
>
> > zz=pdata.frame(z, c("Stock","Fund","Date"))
> Error in match(x, table, nomatch = 0L) :
>  'match' requires vector arguments
>
> Any approach that accepts several individual observation identifiers? I
> will take a look at the econometrics view on CRAN
>
> Btw, sorry if the post came out mangled due to HTML, didn't check.
>
> Thanks,
>
> Arsenio
>
> Tuesday, March 6, 2012, 5:24:35 AM, you wrote:
>
>
> Use diff and lag from plm:
>
> Lines="ticker date_f  date prc iticker
> A 20-Jun-03  15876 19.06 IXT
> A 23-Jun-03  15879 18.89 IXT
> A 24-Jun-03  15880 18.75 IXT
> A 25-Jun-03  15881 18.92 IXT
> A 26-Jun-03  15882 19.35 IXT
> A 27-Jun-03  15883 19.55 IXT
> AA 20-Jun-03 15876 26.24 IXB
> AA 23-Jun-03 15879 25.07 IXB
> AA 24-Jun-03 15880 24.96 IXB
> AA 25-Jun-03 15881 24.51 IXB
> AA 26-Jun-03 15882 25.24 IXB
> AA 27-Jun-03 15883 25.09 IXB
> AA 30-Jun-03 15886 25.5 IXB
> "
> z <- read.table(textConnection(Lines), header = TRUE)
> library(plm)
> zz=pdata.frame(z, c("ticker","date"))
> pr=zz$prc; str(pr)
> zz$return=diff(pr)/lag(pr)
> zz
>
> Good luck!
>
> Robert
>
> From: Arsenio
> Sent: Tuesday, March 06, 2012 2:47 AM
> To: [hidden email]
> Subject: [R-SIG-Finance] Return calculation for panel data structure
>
> Dear all,
>
> I have a very simple question and i haven't been able to code it out. It's
> a simple return calculation of a form:
> R(t)=(P(t)-P(t-1))/P(t-1) or the same as ratio of prices minus one.
> However it's in the panel data, where stocks belong to say a fund or some
> other grouping variable.
>
>
> ticker
> date_f
>
> date
> prc
> iticker
> A
> 20-Jun-03
>
> 15876
> 19.06
> IXT
> A
> 23-Jun-03
>
> 15879
> 18.89
> IXT
> A
> 24-Jun-03
>
> 15880
> 18.75
> IXT
> A
> 25-Jun-03
>
> 15881
> 18.92
> IXT
> A
> 26-Jun-03
>
> 15882
> 19.35
> IXT
> A
> 27-Jun-03
>
> 15883
> 19.55
> IXT
> AA
> 20-Jun-03
>
> 15876
> 26.24
> IXB
> AA
> 23-Jun-03
>
> 15879
> 25.07
> IXB
> AA
> 24-Jun-03
>
> 15880
> 24.96
> IXB
> AA
> 25-Jun-03
>
> 15881
> 24.51
> IXB
> AA
> 26-Jun-03
>
> 15882
> 25.24
> IXB
> AA
> 27-Jun-03
>
> 15883
> 25.09
> IXB
> AA
> 30-Jun-03
>
> 15886
> 25.5
> IXB
>
>
>
> In  SAS, I would do it this way:
>
> data work.1; /* This Calcs the excluded index return for INDEX constructed
> with PRICES (PR)!!!! */
>       set work.1;
>       by ticker iticker date_f ; /*sorting by stock, fund and date */
>
>       lag_prc=lag(prc); /* creating the lagged price variable
>
>       if first.iticker then /* if the it's a first date for the grouped
> unit then.. */
>               lag_prc= . ; /* setting the first lagged value to missing if
> the first date for the grouped unit */
>       return = (prc - lag_prc)/ lag_prc;
>       drop lag_prc;
> run;
>
> The end result would be the same dataset with just one more column for
> returns for each stock, fund, date combination with the first return for
> the date set to missing.
> I tried Return.calculate type of funcs from performance analytics and
> other solutions using plyr or loops, but they are all too bulky.
>
> Any ideas how to accomplish this? Would appreciate any hints,
>
> Arsenio
>
> _______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-finance
> -- Subscriber-posting only. If you want to post, subscribe first.
> -- Also note that this is not the r-help list where general R questions
> should go.
>
>
>
>
> --
> Best regards,
>  Arsenio                            mailto:[hidden email]
>
> _______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-finance
> -- Subscriber-posting only. If you want to post, subscribe first.
> -- Also note that this is not the r-help list where general R questions
> should go.
>



--
Matthew Clegg
[hidden email]

        [[alternative HTML version deleted]]

_______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-finance
-- Subscriber-posting only. If you want to post, subscribe first.
-- Also note that this is not the r-help list where general R questions should go.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Return calculation for panel data structure

Jeffrey Ryan-2
Built into R is ?split or ?aggregate and ?lapply which should serve you well.

Using base R is often the fastest way to skin the cat, if not the prettiest ;-)


> library(quantmod)   # for Delt
> cbind(z,do.call(rbind,lapply(split(z,z$ticker), function(x) Delt(x$prc))))

 ticker    date_f  date   prc iticker Delt.1.arithmetic
1       A 20-Jun-03 15876 19.06     IXT                NA
2       A 23-Jun-03 15879 18.89     IXT      -0.008919203
3       A 24-Jun-03 15880 18.75     IXT      -0.007411329
4       A 25-Jun-03 15881 18.92     IXT       0.009066667
5       A 26-Jun-03 15882 19.35     IXT       0.022727273
6       A 27-Jun-03 15883 19.55     IXT       0.010335917
7      AA 20-Jun-03 15876 26.24     IXB                NA
8      AA 23-Jun-03 15879 25.07     IXB      -0.044588415
9      AA 24-Jun-03 15880 24.96     IXB      -0.004387714
10     AA 25-Jun-03 15881 24.51     IXB      -0.018028846
11     AA 26-Jun-03 15882 25.24     IXB       0.029783762
12     AA 27-Jun-03 15883 25.09     IXB      -0.005942948
13     AA 30-Jun-03 15886 25.50     IXB       0.016341172

See ?split for details on multiple identifiers ( just a list, but your
example didn't include it so I won't either)

The do.call(rbind/cbind stuff is just to get the original data.frame
back - there are more verbose ways to do the same thing.

Jeff


On Tue, Mar 6, 2012 at 7:38 PM, Matthew Clegg <[hidden email]> wrote:

> Here is another approach that might work for you.
>
> apply_by_groups <- function (T, cols, F, ...) {
>  # On input, T is a dataframe, and cols is a list of column names.
>  # Groups the dataframe T into groups according to the specified
>  # column names, e.g., all rows in a given group will have identical
>  # values in the specified columns.  Within each group, applies the
>  # function F to the dataframe representing that group.  Returns the
>  # list of results.
>
>  NT <- T[do.call("order", lapply(cols, function(c) { T[,c]})),]
>  change <- function(k) { !all(NT[k-1,cols] == NT[k, cols]) }
>  starts <- which(c(TRUE, sapply(2:nrow(NT), change)))
>  ends <- c(starts[2:length(starts)]-1, nrow(NT))
>  lapply(1:length(starts), function(k) {
> F(NT[starts[k]:ends[k],,drop=FALSE], ...) })
> }
> ret <- function(X) { c(NA, X[2:length(X)]/X[1:(length(X)-1)]-1)}
>
>> do.call("rbind", apply_by_groups(z, "ticker", function(df) { df$ret <-
> ret(df$prc); df} ))
>   ticker    date_f  date   prc iticker          ret
> 1       A 20-Jun-03 15876 19.06     IXT           NA
> 2       A 23-Jun-03 15879 18.89     IXT -0.008919203
> 3       A 24-Jun-03 15880 18.75     IXT -0.007411329
> 4       A 25-Jun-03 15881 18.92     IXT  0.009066667
> 5       A 26-Jun-03 15882 19.35     IXT  0.022727273
> 6       A 27-Jun-03 15883 19.55     IXT  0.010335917
> 7      AA 20-Jun-03 15876 26.24     IXB           NA
> 8      AA 23-Jun-03 15879 25.07     IXB -0.044588415
> 9      AA 24-Jun-03 15880 24.96     IXB -0.004387714
> 10     AA 25-Jun-03 15881 24.51     IXB -0.018028846
> 11     AA 26-Jun-03 15882 25.24     IXB  0.029783762
> 12     AA 27-Jun-03 15883 25.09     IXB -0.005942948
> 13     AA 30-Jun-03 15886 25.50     IXB  0.016341172
>
> In the call to apply_by_groups, you can replace "ticker" with a list of
> names, e.g., c("ticker","fund","manager"), etc.
>
> Matthew Clegg
>
> On Tue, Mar 6, 2012 at 7:03 PM, Arsenio <[hidden email]> wrote:
>
>> Robert,
>>
>> Thanks for the reply. I tried plm approach but apparently it only accepts
>> one individual id and one time id, because this fails
>>
>> > zz=pdata.frame(z, c("Stock","Fund","Date"))
>> Error in match(x, table, nomatch = 0L) :
>>  'match' requires vector arguments
>>
>> Any approach that accepts several individual observation identifiers? I
>> will take a look at the econometrics view on CRAN
>>
>> Btw, sorry if the post came out mangled due to HTML, didn't check.
>>
>> Thanks,
>>
>> Arsenio
>>
>> Tuesday, March 6, 2012, 5:24:35 AM, you wrote:
>>
>>
>> Use diff and lag from plm:
>>
>> Lines="ticker date_f  date prc iticker
>> A 20-Jun-03  15876 19.06 IXT
>> A 23-Jun-03  15879 18.89 IXT
>> A 24-Jun-03  15880 18.75 IXT
>> A 25-Jun-03  15881 18.92 IXT
>> A 26-Jun-03  15882 19.35 IXT
>> A 27-Jun-03  15883 19.55 IXT
>> AA 20-Jun-03 15876 26.24 IXB
>> AA 23-Jun-03 15879 25.07 IXB
>> AA 24-Jun-03 15880 24.96 IXB
>> AA 25-Jun-03 15881 24.51 IXB
>> AA 26-Jun-03 15882 25.24 IXB
>> AA 27-Jun-03 15883 25.09 IXB
>> AA 30-Jun-03 15886 25.5 IXB
>> "
>> z <- read.table(textConnection(Lines), header = TRUE)
>> library(plm)
>> zz=pdata.frame(z, c("ticker","date"))
>> pr=zz$prc; str(pr)
>> zz$return=diff(pr)/lag(pr)
>> zz
>>
>> Good luck!
>>
>> Robert
>>
>> From: Arsenio
>> Sent: Tuesday, March 06, 2012 2:47 AM
>> To: [hidden email]
>> Subject: [R-SIG-Finance] Return calculation for panel data structure
>>
>> Dear all,
>>
>> I have a very simple question and i haven't been able to code it out. It's
>> a simple return calculation of a form:
>> R(t)=(P(t)-P(t-1))/P(t-1) or the same as ratio of prices minus one.
>> However it's in the panel data, where stocks belong to say a fund or some
>> other grouping variable.
>>
>>
>> ticker
>> date_f
>>
>> date
>> prc
>> iticker
>> A
>> 20-Jun-03
>>
>> 15876
>> 19.06
>> IXT
>> A
>> 23-Jun-03
>>
>> 15879
>> 18.89
>> IXT
>> A
>> 24-Jun-03
>>
>> 15880
>> 18.75
>> IXT
>> A
>> 25-Jun-03
>>
>> 15881
>> 18.92
>> IXT
>> A
>> 26-Jun-03
>>
>> 15882
>> 19.35
>> IXT
>> A
>> 27-Jun-03
>>
>> 15883
>> 19.55
>> IXT
>> AA
>> 20-Jun-03
>>
>> 15876
>> 26.24
>> IXB
>> AA
>> 23-Jun-03
>>
>> 15879
>> 25.07
>> IXB
>> AA
>> 24-Jun-03
>>
>> 15880
>> 24.96
>> IXB
>> AA
>> 25-Jun-03
>>
>> 15881
>> 24.51
>> IXB
>> AA
>> 26-Jun-03
>>
>> 15882
>> 25.24
>> IXB
>> AA
>> 27-Jun-03
>>
>> 15883
>> 25.09
>> IXB
>> AA
>> 30-Jun-03
>>
>> 15886
>> 25.5
>> IXB
>>
>>
>>
>> In  SAS, I would do it this way:
>>
>> data work.1; /* This Calcs the excluded index return for INDEX constructed
>> with PRICES (PR)!!!! */
>>       set work.1;
>>       by ticker iticker date_f ; /*sorting by stock, fund and date */
>>
>>       lag_prc=lag(prc); /* creating the lagged price variable
>>
>>       if first.iticker then /* if the it's a first date for the grouped
>> unit then.. */
>>               lag_prc= . ; /* setting the first lagged value to missing if
>> the first date for the grouped unit */
>>       return = (prc - lag_prc)/ lag_prc;
>>       drop lag_prc;
>> run;
>>
>> The end result would be the same dataset with just one more column for
>> returns for each stock, fund, date combination with the first return for
>> the date set to missing.
>> I tried Return.calculate type of funcs from performance analytics and
>> other solutions using plyr or loops, but they are all too bulky.
>>
>> Any ideas how to accomplish this? Would appreciate any hints,
>>
>> Arsenio
>>
>> _______________________________________________
>> [hidden email] mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-finance
>> -- Subscriber-posting only. If you want to post, subscribe first.
>> -- Also note that this is not the r-help list where general R questions
>> should go.
>>
>>
>>
>>
>> --
>> Best regards,
>>  Arsenio                            mailto:[hidden email]
>>
>> _______________________________________________
>> [hidden email] mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-finance
>> -- Subscriber-posting only. If you want to post, subscribe first.
>> -- Also note that this is not the r-help list where general R questions
>> should go.
>>
>
>
>
> --
> Matthew Clegg
> [hidden email]
>
>        [[alternative HTML version deleted]]
>
> _______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-finance
> -- Subscriber-posting only. If you want to post, subscribe first.
> -- Also note that this is not the r-help list where general R questions should go.



--
Jeffrey Ryan
[hidden email]

www.lemnica.com
www.esotericR.com

R/Finance 2012: Applied Finance with R
www.RinFinance.com

See you in Chicago!!!!

_______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-finance
-- Subscriber-posting only. If you want to post, subscribe first.
-- Also note that this is not the r-help list where general R questions should go.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Return calculation for panel data structure

Arsenio
Jeffrey,

Thanks a lot!
I tried this code worked for the single grouping factor variable, now testing it with two which means apparently they have to cover the whole dataset, maybe NA in one of the variable not allowed by the ?split ?

Warning message:
In split.default(seq_len(nrow(x)), f, drop = drop, ...) :
  data length is not a multiple of split variable

I was looking for something simpler that would also help to understand the code for peers working in SAS, so base R would be a bit too complicated.

Thanks for the Delt function that's quite a shortcut!

Arsenio



Wednesday, March 7, 2012, 8:14:31 AM, you wrote:

> Built into R is ?split or ?aggregate and ?lapply which should serve you well.

> Using base R is often the fastest way to skin the cat, if not the prettiest ;-)


>> library(quantmod)   # for Delt
>> cbind(z,do.call(rbind,lapply(split(z,z$ticker), function(x) Delt(x$prc))))

>  ticker    date_f  date   prc iticker Delt.1.arithmetic
> 1       A 20-Jun-03 15876 19.06     IXT                NA
> 2       A 23-Jun-03 15879 18.89     IXT      -0.008919203
> 3       A 24-Jun-03 15880 18.75     IXT      -0.007411329
> 4       A 25-Jun-03 15881 18.92     IXT       0.009066667
> 5       A 26-Jun-03 15882 19.35     IXT       0.022727273
> 6       A 27-Jun-03 15883 19.55     IXT       0.010335917
> 7      AA 20-Jun-03 15876 26.24     IXB                NA
> 8      AA 23-Jun-03 15879 25.07     IXB      -0.044588415
> 9      AA 24-Jun-03 15880 24.96     IXB      -0.004387714
> 10     AA 25-Jun-03 15881 24.51     IXB      -0.018028846
> 11     AA 26-Jun-03 15882 25.24     IXB       0.029783762
> 12     AA 27-Jun-03 15883 25.09     IXB      -0.005942948
> 13     AA 30-Jun-03 15886 25.50     IXB       0.016341172

> See ?split for details on multiple identifiers ( just a list, but your
> example didn't include it so I won't either)

> The do.call(rbind/cbind stuff is just to get the original data.frame
> back - there are more verbose ways to do the same thing.

> Jeff


> On Tue, Mar 6, 2012 at 7:38 PM, Matthew Clegg
> <[hidden email]> wrote:
>> Here is another approach that might work for you.

>> apply_by_groups <- function (T, cols, F, ...) {
>>  # On input, T is a dataframe, and cols is a list of column names.
>>  # Groups the dataframe T into groups according to the specified
>>  # column names, e.g., all rows in a given group will have identical
>>  # values in the specified columns.  Within each group, applies the
>>  # function F to the dataframe representing that group.  Returns the
>>  # list of results.

>>  NT <- T[do.call("order", lapply(cols, function(c) { T[,c]})),]
>>  change <- function(k) { !all(NT[k-1,cols] == NT[k, cols]) }
>>  starts <- which(c(TRUE, sapply(2:nrow(NT), change)))
>>  ends <- c(starts[2:length(starts)]-1, nrow(NT))
>>  lapply(1:length(starts), function(k) {
>> F(NT[starts[k]:ends[k],,drop=FALSE], ...) })
>> }
>> ret <- function(X) { c(NA, X[2:length(X)]/X[1:(length(X)-1)]-1)}

>>> do.call("rbind", apply_by_groups(z, "ticker", function(df) { df$ret <-
>> ret(df$prc); df} ))
>>   ticker    date_f  date   prc iticker          ret
>> 1       A 20-Jun-03 15876 19.06     IXT           NA
>> 2       A 23-Jun-03 15879 18.89     IXT -0.008919203
>> 3       A 24-Jun-03 15880 18.75     IXT -0.007411329
>> 4       A 25-Jun-03 15881 18.92     IXT  0.009066667
>> 5       A 26-Jun-03 15882 19.35     IXT  0.022727273
>> 6       A 27-Jun-03 15883 19.55     IXT  0.010335917
>> 7      AA 20-Jun-03 15876 26.24     IXB           NA
>> 8      AA 23-Jun-03 15879 25.07     IXB -0.044588415
>> 9      AA 24-Jun-03 15880 24.96     IXB -0.004387714
>> 10     AA 25-Jun-03 15881 24.51     IXB -0.018028846
>> 11     AA 26-Jun-03 15882 25.24     IXB  0.029783762
>> 12     AA 27-Jun-03 15883 25.09     IXB -0.005942948
>> 13     AA 30-Jun-03 15886 25.50     IXB  0.016341172

>> In the call to apply_by_groups, you can replace "ticker" with a list of
>> names, e.g., c("ticker","fund","manager"), etc.

>> Matthew Clegg

>> On Tue, Mar 6, 2012 at 7:03 PM, Arsenio <[hidden email]> wrote:

>>> Robert,

>>> Thanks for the reply. I tried plm approach but apparently it only accepts
>>> one individual id and one time id, because this fails

>>> > zz=pdata.frame(z, c("Stock","Fund","Date"))
>>> Error in match(x, table, nomatch = 0L) :
>>>  'match' requires vector arguments

>>> Any approach that accepts several individual observation identifiers? I
>>> will take a look at the econometrics view on CRAN

>>> Btw, sorry if the post came out mangled due to HTML, didn't check.

>>> Thanks,

>>> Arsenio

>>> Tuesday, March 6, 2012, 5:24:35 AM, you wrote:


>>> Use diff and lag from plm:

>>> Lines="ticker date_f  date prc iticker
>>> A 20-Jun-03  15876 19.06 IXT
>>> A 23-Jun-03  15879 18.89 IXT
>>> A 24-Jun-03  15880 18.75 IXT
>>> A 25-Jun-03  15881 18.92 IXT
>>> A 26-Jun-03  15882 19.35 IXT
>>> A 27-Jun-03  15883 19.55 IXT
>>> AA 20-Jun-03 15876 26.24 IXB
>>> AA 23-Jun-03 15879 25.07 IXB
>>> AA 24-Jun-03 15880 24.96 IXB
>>> AA 25-Jun-03 15881 24.51 IXB
>>> AA 26-Jun-03 15882 25.24 IXB
>>> AA 27-Jun-03 15883 25.09 IXB
>>> AA 30-Jun-03 15886 25.5 IXB
>>> "
>>> z <- read.table(textConnection(Lines), header = TRUE)
>>> library(plm)
>>> zz=pdata.frame(z, c("ticker","date"))
>>> pr=zz$prc; str(pr)
>>> zz$return=diff(pr)/lag(pr)
>>> zz

>>> Good luck!

>>> Robert

>>> From: Arsenio
>>> Sent: Tuesday, March 06, 2012 2:47 AM
>>> To: [hidden email]
>>> Subject: [R-SIG-Finance] Return calculation for panel data structure

>>> Dear all,

>>> I have a very simple question and i haven't been able to code it out. It's
>>> a simple return calculation of a form:
>>> R(t)=(P(t)-P(t-1))/P(t-1) or the same as ratio of prices minus one.
>>> However it's in the panel data, where stocks belong to say a fund or some
>>> other grouping variable.


>>> ticker
>>> date_f

>>> date
>>> prc
>>> iticker
>>> A
>>> 20-Jun-03

>>> 15876
>>> 19.06
>>> IXT
>>> A
>>> 23-Jun-03

>>> 15879
>>> 18.89
>>> IXT
>>> A
>>> 24-Jun-03

>>> 15880
>>> 18.75
>>> IXT
>>> A
>>> 25-Jun-03

>>> 15881
>>> 18.92
>>> IXT
>>> A
>>> 26-Jun-03

>>> 15882
>>> 19.35
>>> IXT
>>> A
>>> 27-Jun-03

>>> 15883
>>> 19.55
>>> IXT
>>> AA
>>> 20-Jun-03

>>> 15876
>>> 26.24
>>> IXB
>>> AA
>>> 23-Jun-03

>>> 15879
>>> 25.07
>>> IXB
>>> AA
>>> 24-Jun-03

>>> 15880
>>> 24.96
>>> IXB
>>> AA
>>> 25-Jun-03

>>> 15881
>>> 24.51
>>> IXB
>>> AA
>>> 26-Jun-03

>>> 15882
>>> 25.24
>>> IXB
>>> AA
>>> 27-Jun-03

>>> 15883
>>> 25.09
>>> IXB
>>> AA
>>> 30-Jun-03

>>> 15886
>>> 25.5
>>> IXB



>>> In  SAS, I would do it this way:

>>> data work.1; /* This Calcs the excluded index return for INDEX constructed
>>> with PRICES (PR)!!!! */
>>>       set work.1;
>>>       by ticker iticker date_f ; /*sorting by stock, fund and date */

>>>       lag_prc=lag(prc); /* creating the lagged price variable

>>>       if first.iticker then /* if the it's a first date for the grouped
>>> unit then.. */
>>>               lag_prc= . ; /* setting the first lagged value to missing if
>>> the first date for the grouped unit */
>>>       return = (prc - lag_prc)/ lag_prc;
>>>       drop lag_prc;
>>> run;

>>> The end result would be the same dataset with just one more column for
>>> returns for each stock, fund, date combination with the first return for
>>> the date set to missing.
>>> I tried Return.calculate type of funcs from performance analytics and
>>> other solutions using plyr or loops, but they are all too bulky.

>>> Any ideas how to accomplish this? Would appreciate any hints,

>>> Arsenio

>>> _______________________________________________
>>> [hidden email] mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-finance
>>> -- Subscriber-posting only. If you want to post, subscribe first.
>>> -- Also note that this is not the r-help list where general R questions
>>> should go.




>>> --
>>> Best regards,
>>>  Arsenio                            mailto:[hidden email]

>>> _______________________________________________
>>> [hidden email] mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-finance
>>> -- Subscriber-posting only. If you want to post, subscribe first.
>>> -- Also note that this is not the r-help list where general R questions
>>> should go.




>> --
>> Matthew Clegg
>> [hidden email]

>>        [[alternative HTML version deleted]]

>> _______________________________________________
>> [hidden email] mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-finance
>> -- Subscriber-posting only. If you want to post, subscribe first.
>> -- Also note that this is not the r-help list where general R questions should go.






--
Best regards,
 Arsenio                            mailto:[hidden email]

_______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-finance
-- Subscriber-posting only. If you want to post, subscribe first.
-- Also note that this is not the r-help list where general R questions should go.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Return calculation for panel data structure

Arsenio
In reply to this post by Matthew Clegg
Matthew,

Thanks for the code! I have been trying to analyze this, seems to work ok. It's a bit hard to understand this for me, still a newbie at R.

Thanks,

Arsenio

Tuesday, March 6, 2012, 5:38:47 PM, you wrote:

> Here is another approach that might work for you.

> apply_by_groups <- function (T, cols, F, ...) {
>   # On input, T is a dataframe, and cols is a list of column names.
>   # Groups the dataframe T into groups according to the specified
>   # column names, e.g., all rows in a given group will have identical
>   # values in the specified columns.  Within each group, applies the
>   # function F to the dataframe representing that group.  Returns the
>   # list of results.
>  
>   NT <- T[do.call("order", lapply(cols, function(c) { T[,c]})),]
>   change <- function(k) { !all(NT[k-1,cols] == NT[k, cols]) }
>   starts <- which(c(TRUE, sapply(2:nrow(NT), change)))
>   ends <- c(starts[2:length(starts)]-1, nrow(NT))
>   lapply(1:length(starts), function(k) {
> F(NT[starts[k]:ends[k],,drop=FALSE], ...) })
> }
> ret <- function(X) { c(NA, X[2:length(X)]/X[1:(length(X)-1)]-1)}

>> do.call("rbind", apply_by_groups(z, "ticker", function(df) { df$ret <- ret(df$prc); df} ))
>    ticker    date_f  date   prc iticker          ret
> 1       A 20-Jun-03 15876 19.06     IXT           NA
> 2       A 23-Jun-03 15879 18.89     IXT -0.008919203
> 3       A 24-Jun-03 15880 18.75     IXT -0.007411329
> 4       A 25-Jun-03 15881 18.92     IXT  0.009066667
> 5       A 26-Jun-03 15882 19.35     IXT  0.022727273
> 6       A 27-Jun-03 15883 19.55     IXT  0.010335917
> 7      AA 20-Jun-03 15876 26.24     IXB           NA
> 8      AA 23-Jun-03 15879 25.07     IXB -0.044588415
> 9      AA 24-Jun-03 15880 24.96     IXB -0.004387714
> 10     AA 25-Jun-03 15881 24.51     IXB -0.018028846
> 11     AA 26-Jun-03 15882 25.24     IXB  0.029783762
> 12     AA 27-Jun-03 15883 25.09     IXB -0.005942948
> 13     AA 30-Jun-03 15886 25.50     IXB  0.016341172

> In the call to apply_by_groups, you can replace "ticker" with a
> list of names, e.g., c("ticker","fund","manager"), etc.

> Matthew Clegg

> On Tue, Mar 6, 2012 at 7:03 PM, Arsenio <[hidden email]> wrote:
> Robert,

> Thanks for the reply. I tried plm approach but apparently it only
> accepts one individual id and one time id, because this fails

>> zz=pdata.frame(z, c("Stock","Fund","Date"))
> Error in match(x, table, nomatch = 0L) :
>  'match' requires vector arguments

> Any approach that accepts several individual observation
> identifiers? I will take a look at the econometrics view on CRAN

> Btw, sorry if the post came out mangled due to HTML, didn't check.

> Thanks,

> Arsenio

> Tuesday, March 6, 2012, 5:24:35 AM, you wrote:


> Use diff and lag from plm:

> Lines="ticker date_f  date prc iticker
> A 20-Jun-03  15876 19.06 IXT
> A 23-Jun-03  15879 18.89 IXT
> A 24-Jun-03  15880 18.75 IXT
> A 25-Jun-03  15881 18.92 IXT
> A 26-Jun-03  15882 19.35 IXT
> A 27-Jun-03  15883 19.55 IXT
> AA 20-Jun-03 15876 26.24 IXB
> AA 23-Jun-03 15879 25.07 IXB
> AA 24-Jun-03 15880 24.96 IXB
> AA 25-Jun-03 15881 24.51 IXB
> AA 26-Jun-03 15882 25.24 IXB
> AA 27-Jun-03 15883 25.09 IXB
> AA 30-Jun-03 15886 25.5 IXB
> "
> z <- read.table(textConnection(Lines), header = TRUE)
> library(plm)
> zz=pdata.frame(z, c("ticker","date"))
> pr=zz$prc; str(pr)
> zz$return=diff(pr)/lag(pr)
> zz

> Good luck!

> Robert

> From: Arsenio
> Sent: Tuesday, March 06, 2012 2:47 AM
> To: [hidden email]
> Subject: [R-SIG-Finance] Return calculation for panel data structure

> Dear all,

> I have a very simple question and i haven't been able to code it
> out. It's a simple return calculation of a form:
> R(t)=(P(t)-P(t-1))/P(t-1) or the same as ratio of prices minus one.
> However it's in the panel data, where stocks belong to say a fund or some other grouping variable.


> ticker
> date_f

> date
> prc
> iticker
> A
> 20-Jun-03

> 15876
> 19.06
> IXT
> A
> 23-Jun-03

> 15879
> 18.89
> IXT
> A
> 24-Jun-03

> 15880
> 18.75
> IXT
> A
> 25-Jun-03

> 15881
> 18.92
> IXT
> A
> 26-Jun-03

> 15882
> 19.35
> IXT
> A
> 27-Jun-03

> 15883
> 19.55
> IXT
> AA
> 20-Jun-03

> 15876
> 26.24
> IXB
> AA
> 23-Jun-03

> 15879
> 25.07
> IXB
> AA
> 24-Jun-03

> 15880
> 24.96
> IXB
> AA
> 25-Jun-03

> 15881
> 24.51
> IXB
> AA
> 26-Jun-03

> 15882
> 25.24
> IXB
> AA
> 27-Jun-03

> 15883
> 25.09
> IXB
> AA
> 30-Jun-03

> 15886
> 25.5
> IXB



> In  SAS, I would do it this way:

> data work.1; /* This Calcs the excluded index return for INDEX
> constructed with PRICES (PR)!!!! */
>       set work.1;
>       by ticker iticker date_f ; /*sorting by stock, fund and date */

>       lag_prc=lag(prc); /* creating the lagged price variable

>       if first.iticker then /* if the it's a first date for the grouped unit then.. */
>               lag_prc= . ; /* setting the first lagged value to
> missing if the first date for the grouped unit */
>       return = (prc - lag_prc)/ lag_prc;
>       drop lag_prc;
> run;

> The end result would be the same dataset with just one more column
> for returns for each stock, fund, date combination with the first
> return for the date set to missing.
> I tried Return.calculate type of funcs from performance analytics
> and other solutions using plyr or loops, but they are all too bulky.

> Any ideas how to accomplish this? Would appreciate any hints,

> Arsenio

> _______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-finance
> -- Subscriber-posting only. If you want to post, subscribe first.
> -- Also note that this is not the r-help list where general R questions should go.




> --
> Best regards,
>  Arsenio                            mailto:[hidden email]

> _______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-finance
> -- Subscriber-posting only. If you want to post, subscribe first.
> -- Also note that this is not the r-help list where general R questions should go.






--
Best regards,
 Arsenio                            mailto:[hidden email]

_______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-finance
-- Subscriber-posting only. If you want to post, subscribe first.
-- Also note that this is not the r-help list where general R questions should go.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Return calculation for panel data structure

Matthew Clegg
Hi Arsenio,

Glad to help.  Here is how the code works.  First, it sorts the rows of the
dataframe T according to the columns that will be used for grouping.  After
this step is complete, all of the elements of a group will be stored in
consecutive rows in the new table NT.  Then, it computes the starting and
ending indexes of each group.  Row i represents the start of a new group if
one of the elements of the grouping columns differs from its value in the
previous row i-1.  Finally, the function F is applied to each of the
groups.  If you have further questions, I would be happy to help, although
perhaps it would be best to email me directly so as to avoid taking up
bandwidth on the list.

Matthew Clegg

On Thu, Mar 8, 2012 at 11:46 PM, Arsenio <[hidden email]> wrote:

> Matthew,
>
> Thanks for the code! I have been trying to analyze this, seems to work ok.
> It's a bit hard to understand this for me, still a newbie at R.
>
> Thanks,
>
> Arsenio
>
> Tuesday, March 6, 2012, 5:38:47 PM, you wrote:
>
> > Here is another approach that might work for you.
>
> > apply_by_groups <- function (T, cols, F, ...) {
> >   # On input, T is a dataframe, and cols is a list of column names.
> >   # Groups the dataframe T into groups according to the specified
> >   # column names, e.g., all rows in a given group will have identical
> >   # values in the specified columns.  Within each group, applies the
> >   # function F to the dataframe representing that group.  Returns the
> >   # list of results.
> >
> >   NT <- T[do.call("order", lapply(cols, function(c) { T[,c]})),]
> >   change <- function(k) { !all(NT[k-1,cols] == NT[k, cols]) }
> >   starts <- which(c(TRUE, sapply(2:nrow(NT), change)))
> >   ends <- c(starts[2:length(starts)]-1, nrow(NT))
> >   lapply(1:length(starts), function(k) {
> > F(NT[starts[k]:ends[k],,drop=FALSE], ...) })
> > }
> > ret <- function(X) { c(NA, X[2:length(X)]/X[1:(length(X)-1)]-1)}
>
> >> do.call("rbind", apply_by_groups(z, "ticker", function(df) { df$ret <-
> ret(df$prc); df} ))
> >    ticker    date_f  date   prc iticker          ret
> > 1       A 20-Jun-03 15876 19.06     IXT           NA
> > 2       A 23-Jun-03 15879 18.89     IXT -0.008919203
> > 3       A 24-Jun-03 15880 18.75     IXT -0.007411329
> > 4       A 25-Jun-03 15881 18.92     IXT  0.009066667
> > 5       A 26-Jun-03 15882 19.35     IXT  0.022727273
> > 6       A 27-Jun-03 15883 19.55     IXT  0.010335917
> > 7      AA 20-Jun-03 15876 26.24     IXB           NA
> > 8      AA 23-Jun-03 15879 25.07     IXB -0.044588415
> > 9      AA 24-Jun-03 15880 24.96     IXB -0.004387714
> > 10     AA 25-Jun-03 15881 24.51     IXB -0.018028846
> > 11     AA 26-Jun-03 15882 25.24     IXB  0.029783762
> > 12     AA 27-Jun-03 15883 25.09     IXB -0.005942948
> > 13     AA 30-Jun-03 15886 25.50     IXB  0.016341172
>
> > In the call to apply_by_groups, you can replace "ticker" with a
> > list of names, e.g., c("ticker","fund","manager"), etc.
>
> > Matthew Clegg
>
> > On Tue, Mar 6, 2012 at 7:03 PM, Arsenio <[hidden email]> wrote:
> > Robert,
>
> > Thanks for the reply. I tried plm approach but apparently it only
> > accepts one individual id and one time id, because this fails
>
> >> zz=pdata.frame(z, c("Stock","Fund","Date"))
> > Error in match(x, table, nomatch = 0L) :
> >  'match' requires vector arguments
>
> > Any approach that accepts several individual observation
> > identifiers? I will take a look at the econometrics view on CRAN
>
> > Btw, sorry if the post came out mangled due to HTML, didn't check.
>
> > Thanks,
>
> > Arsenio
>
> > Tuesday, March 6, 2012, 5:24:35 AM, you wrote:
>
>
> > Use diff and lag from plm:
>
> > Lines="ticker date_f  date prc iticker
> > A 20-Jun-03  15876 19.06 IXT
> > A 23-Jun-03  15879 18.89 IXT
> > A 24-Jun-03  15880 18.75 IXT
> > A 25-Jun-03  15881 18.92 IXT
> > A 26-Jun-03  15882 19.35 IXT
> > A 27-Jun-03  15883 19.55 IXT
> > AA 20-Jun-03 15876 26.24 IXB
> > AA 23-Jun-03 15879 25.07 IXB
> > AA 24-Jun-03 15880 24.96 IXB
> > AA 25-Jun-03 15881 24.51 IXB
> > AA 26-Jun-03 15882 25.24 IXB
> > AA 27-Jun-03 15883 25.09 IXB
> > AA 30-Jun-03 15886 25.5 IXB
> > "
> > z <- read.table(textConnection(Lines), header = TRUE)
> > library(plm)
> > zz=pdata.frame(z, c("ticker","date"))
> > pr=zz$prc; str(pr)
> > zz$return=diff(pr)/lag(pr)
> > zz
>
> > Good luck!
>
> > Robert
>
> > From: Arsenio
> > Sent: Tuesday, March 06, 2012 2:47 AM
> > To: [hidden email]
> > Subject: [R-SIG-Finance] Return calculation for panel data structure
>
> > Dear all,
>
> > I have a very simple question and i haven't been able to code it
> > out. It's a simple return calculation of a form:
> > R(t)=(P(t)-P(t-1))/P(t-1) or the same as ratio of prices minus one.
> > However it's in the panel data, where stocks belong to say a fund or
> some other grouping variable.
>
>
> > ticker
> > date_f
>
> > date
> > prc
> > iticker
> > A
> > 20-Jun-03
>
> > 15876
> > 19.06
> > IXT
> > A
> > 23-Jun-03
>
> > 15879
> > 18.89
> > IXT
> > A
> > 24-Jun-03
>
> > 15880
> > 18.75
> > IXT
> > A
> > 25-Jun-03
>
> > 15881
> > 18.92
> > IXT
> > A
> > 26-Jun-03
>
> > 15882
> > 19.35
> > IXT
> > A
> > 27-Jun-03
>
> > 15883
> > 19.55
> > IXT
> > AA
> > 20-Jun-03
>
> > 15876
> > 26.24
> > IXB
> > AA
> > 23-Jun-03
>
> > 15879
> > 25.07
> > IXB
> > AA
> > 24-Jun-03
>
> > 15880
> > 24.96
> > IXB
> > AA
> > 25-Jun-03
>
> > 15881
> > 24.51
> > IXB
> > AA
> > 26-Jun-03
>
> > 15882
> > 25.24
> > IXB
> > AA
> > 27-Jun-03
>
> > 15883
> > 25.09
> > IXB
> > AA
> > 30-Jun-03
>
> > 15886
> > 25.5
> > IXB
>
>
>
> > In  SAS, I would do it this way:
>
> > data work.1; /* This Calcs the excluded index return for INDEX
> > constructed with PRICES (PR)!!!! */
> >       set work.1;
> >       by ticker iticker date_f ; /*sorting by stock, fund and date */
>
> >       lag_prc=lag(prc); /* creating the lagged price variable
>
> >       if first.iticker then /* if the it's a first date for the grouped
> unit then.. */
> >               lag_prc= . ; /* setting the first lagged value to
> > missing if the first date for the grouped unit */
> >       return = (prc - lag_prc)/ lag_prc;
> >       drop lag_prc;
> > run;
>
> > The end result would be the same dataset with just one more column
> > for returns for each stock, fund, date combination with the first
> > return for the date set to missing.
> > I tried Return.calculate type of funcs from performance analytics
> > and other solutions using plyr or loops, but they are all too bulky.
>
> > Any ideas how to accomplish this? Would appreciate any hints,
>
> > Arsenio
>
> > _______________________________________________
> > [hidden email] mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-sig-finance
> > -- Subscriber-posting only. If you want to post, subscribe first.
> > -- Also note that this is not the r-help list where general R questions
> should go.
>
>
>
>
> > --
> > Best regards,
> >  Arsenio                            mailto:[hidden email]
>
> > _______________________________________________
> > [hidden email] mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-sig-finance
> > -- Subscriber-posting only. If you want to post, subscribe first.
> > -- Also note that this is not the r-help list where general R questions
> should go.
>
>
>
>
>
>
> --
> Best regards,
>  Arsenio                            mailto:[hidden email]
>
>


--
Matthew Clegg
[hidden email]

        [[alternative HTML version deleted]]

_______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-finance
-- Subscriber-posting only. If you want to post, subscribe first.
-- Also note that this is not the r-help list where general R questions should go.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Return calculation for panel data structure

Jeffrey Ryan-2
In reply to this post by Arsenio
It should work, but you didn't provide the data that it is breaking on, so...

Hijacking the iticker column, we can make that our second identifier
to group with:

z <- rbind(z,transform(z,iticker=paste(iticker,2,sep="")))

#... some z to see what this is doing
#
#11     AA 26-Jun-03 15882 25.24     IXB
#12     AA 27-Jun-03 15883 25.09     IXB
#13     AA 30-Jun-03 15886 25.50     IXB
#14      A 20-Jun-03 15876 19.06    IXT2
#15      A 23-Jun-03 15879 18.89    IXT2
#16      A 24-Jun-03 15880 18.75    IXT2
#17      A 25-Jun-03 15881 18.92    IXT2
#...

And now to get return calculations by ticker, iticker do this:

> cbind(z,do.call(rbind,lapply(split(z,list(z$ticker,z$iticker)), function(x) Delt(x$prc))))

   ticker    date_f  date   prc iticker Delt.1.arithmetic
1       A 20-Jun-03 15876 19.06     IXT                NA
2       A 23-Jun-03 15879 18.89     IXT      -0.044588415
3       A 24-Jun-03 15880 18.75     IXT      -0.004387714
4       A 25-Jun-03 15881 18.92     IXT      -0.018028846
5       A 26-Jun-03 15882 19.35     IXT       0.029783762
6       A 27-Jun-03 15883 19.55     IXT      -0.005942948
7      AA 20-Jun-03 15876 26.24     IXB       0.016341172
8      AA 23-Jun-03 15879 25.07     IXB                NA
9      AA 24-Jun-03 15880 24.96     IXB      -0.008919203
10     AA 25-Jun-03 15881 24.51     IXB      -0.007411329
11     AA 26-Jun-03 15882 25.24     IXB       0.009066667
12     AA 27-Jun-03 15883 25.09     IXB       0.022727273
13     AA 30-Jun-03 15886 25.50     IXB       0.010335917
14      A 20-Jun-03 15876 19.06    IXT2                NA
15      A 23-Jun-03 15879 18.89    IXT2      -0.008919203
16      A 24-Jun-03 15880 18.75    IXT2      -0.007411329
17      A 25-Jun-03 15881 18.92    IXT2       0.009066667
18      A 26-Jun-03 15882 19.35    IXT2       0.022727273
19      A 27-Jun-03 15883 19.55    IXT2       0.010335917
20     AA 20-Jun-03 15876 26.24    IXB2                NA
21     AA 23-Jun-03 15879 25.07    IXB2      -0.044588415
22     AA 24-Jun-03 15880 24.96    IXB2      -0.004387714
23     AA 25-Jun-03 15881 24.51    IXB2      -0.018028846
24     AA 26-Jun-03 15882 25.24    IXB2       0.029783762
25     AA 27-Jun-03 15883 25.09    IXB2      -0.005942948
26     AA 30-Jun-03 15886 25.50    IXB2       0.016341172

If you have a missing value in one of your identifiers (a factor
here), you will get an error, but na.omit() called on 'z' will remove
it before processing. If you don't have the value I don't see how you
could properly group it anyway though.

As far as 'base R being too complicated for SAS users' ;-) ... I still
think you are much better off sticking as close to the core language
as you can - as that gives you the most understanding and flexibility.
 You could also functionalize the scary parts:

make_it_happen <- function(z) {
   require("quantmod")
   cbind(z,do.call(rbind,lapply(split(z,list(z$ticker,z$iticker)),
function(x) Delt(x$prc))))
}
make_it_happen(z)


Joking aside, the code is actually not bad if you take it apart as
Matthew did with his.

1) split(z, list(z$ticker, z$iticker))
returns a list of data.frames that are grouped by your identifiers
(factors in R-speak)

2) lapply uses each element in the list and applies the function(x) ... to it.

3) do.call(rbind takes the list returned by lapply (columns of
returns) and row binds them

4) cbind just attached the new column from (3) to the original 'z'

With that, we've diverged pretty hard from "finance" - so further
questions should probably be sent to R-help.

HTH
Jeff

On Thu, Mar 8, 2012 at 10:43 PM, Arsenio <[hidden email]> wrote:

> Jeffrey,
>
> Thanks a lot!
> I tried this code worked for the single grouping factor variable, now testing it with two which means apparently they have to cover the whole dataset, maybe NA in one of the variable not allowed by the ?split ?
>
> Warning message:
> In split.default(seq_len(nrow(x)), f, drop = drop, ...) :
>  data length is not a multiple of split variable
>
> I was looking for something simpler that would also help to understand the code for peers working in SAS, so base R would be a bit too complicated.
>
> Thanks for the Delt function that's quite a shortcut!
>
> Arsenio
>
>
>
> Wednesday, March 7, 2012, 8:14:31 AM, you wrote:
>
>> Built into R is ?split or ?aggregate and ?lapply which should serve you well.
>
>> Using base R is often the fastest way to skin the cat, if not the prettiest ;-)
>
>
>>> library(quantmod)   # for Delt
>>> cbind(z,do.call(rbind,lapply(split(z,z$ticker), function(x) Delt(x$prc))))
>
>>  ticker    date_f  date   prc iticker Delt.1.arithmetic
>> 1       A 20-Jun-03 15876 19.06     IXT                NA
>> 2       A 23-Jun-03 15879 18.89     IXT      -0.008919203
>> 3       A 24-Jun-03 15880 18.75     IXT      -0.007411329
>> 4       A 25-Jun-03 15881 18.92     IXT       0.009066667
>> 5       A 26-Jun-03 15882 19.35     IXT       0.022727273
>> 6       A 27-Jun-03 15883 19.55     IXT       0.010335917
>> 7      AA 20-Jun-03 15876 26.24     IXB                NA
>> 8      AA 23-Jun-03 15879 25.07     IXB      -0.044588415
>> 9      AA 24-Jun-03 15880 24.96     IXB      -0.004387714
>> 10     AA 25-Jun-03 15881 24.51     IXB      -0.018028846
>> 11     AA 26-Jun-03 15882 25.24     IXB       0.029783762
>> 12     AA 27-Jun-03 15883 25.09     IXB      -0.005942948
>> 13     AA 30-Jun-03 15886 25.50     IXB       0.016341172
>
>> See ?split for details on multiple identifiers ( just a list, but your
>> example didn't include it so I won't either)
>
>> The do.call(rbind/cbind stuff is just to get the original data.frame
>> back - there are more verbose ways to do the same thing.
>
>> Jeff
>
>
>> On Tue, Mar 6, 2012 at 7:38 PM, Matthew Clegg
>> <[hidden email]> wrote:
>>> Here is another approach that might work for you.
>
>>> apply_by_groups <- function (T, cols, F, ...) {
>>>  # On input, T is a dataframe, and cols is a list of column names.
>>>  # Groups the dataframe T into groups according to the specified
>>>  # column names, e.g., all rows in a given group will have identical
>>>  # values in the specified columns.  Within each group, applies the
>>>  # function F to the dataframe representing that group.  Returns the
>>>  # list of results.
>
>>>  NT <- T[do.call("order", lapply(cols, function(c) { T[,c]})),]
>>>  change <- function(k) { !all(NT[k-1,cols] == NT[k, cols]) }
>>>  starts <- which(c(TRUE, sapply(2:nrow(NT), change)))
>>>  ends <- c(starts[2:length(starts)]-1, nrow(NT))
>>>  lapply(1:length(starts), function(k) {
>>> F(NT[starts[k]:ends[k],,drop=FALSE], ...) })
>>> }
>>> ret <- function(X) { c(NA, X[2:length(X)]/X[1:(length(X)-1)]-1)}
>
>>>> do.call("rbind", apply_by_groups(z, "ticker", function(df) { df$ret <-
>>> ret(df$prc); df} ))
>>>   ticker    date_f  date   prc iticker          ret
>>> 1       A 20-Jun-03 15876 19.06     IXT           NA
>>> 2       A 23-Jun-03 15879 18.89     IXT -0.008919203
>>> 3       A 24-Jun-03 15880 18.75     IXT -0.007411329
>>> 4       A 25-Jun-03 15881 18.92     IXT  0.009066667
>>> 5       A 26-Jun-03 15882 19.35     IXT  0.022727273
>>> 6       A 27-Jun-03 15883 19.55     IXT  0.010335917
>>> 7      AA 20-Jun-03 15876 26.24     IXB           NA
>>> 8      AA 23-Jun-03 15879 25.07     IXB -0.044588415
>>> 9      AA 24-Jun-03 15880 24.96     IXB -0.004387714
>>> 10     AA 25-Jun-03 15881 24.51     IXB -0.018028846
>>> 11     AA 26-Jun-03 15882 25.24     IXB  0.029783762
>>> 12     AA 27-Jun-03 15883 25.09     IXB -0.005942948
>>> 13     AA 30-Jun-03 15886 25.50     IXB  0.016341172
>
>>> In the call to apply_by_groups, you can replace "ticker" with a list of
>>> names, e.g., c("ticker","fund","manager"), etc.
>
>>> Matthew Clegg
>
>>> On Tue, Mar 6, 2012 at 7:03 PM, Arsenio <[hidden email]> wrote:
>
>>>> Robert,
>
>>>> Thanks for the reply. I tried plm approach but apparently it only accepts
>>>> one individual id and one time id, because this fails
>
>>>> > zz=pdata.frame(z, c("Stock","Fund","Date"))
>>>> Error in match(x, table, nomatch = 0L) :
>>>>  'match' requires vector arguments
>
>>>> Any approach that accepts several individual observation identifiers? I
>>>> will take a look at the econometrics view on CRAN
>
>>>> Btw, sorry if the post came out mangled due to HTML, didn't check.
>
>>>> Thanks,
>
>>>> Arsenio
>
>>>> Tuesday, March 6, 2012, 5:24:35 AM, you wrote:
>
>
>>>> Use diff and lag from plm:
>
>>>> Lines="ticker date_f  date prc iticker
>>>> A 20-Jun-03  15876 19.06 IXT
>>>> A 23-Jun-03  15879 18.89 IXT
>>>> A 24-Jun-03  15880 18.75 IXT
>>>> A 25-Jun-03  15881 18.92 IXT
>>>> A 26-Jun-03  15882 19.35 IXT
>>>> A 27-Jun-03  15883 19.55 IXT
>>>> AA 20-Jun-03 15876 26.24 IXB
>>>> AA 23-Jun-03 15879 25.07 IXB
>>>> AA 24-Jun-03 15880 24.96 IXB
>>>> AA 25-Jun-03 15881 24.51 IXB
>>>> AA 26-Jun-03 15882 25.24 IXB
>>>> AA 27-Jun-03 15883 25.09 IXB
>>>> AA 30-Jun-03 15886 25.5 IXB
>>>> "
>>>> z <- read.table(textConnection(Lines), header = TRUE)
>>>> library(plm)
>>>> zz=pdata.frame(z, c("ticker","date"))
>>>> pr=zz$prc; str(pr)
>>>> zz$return=diff(pr)/lag(pr)
>>>> zz
>
>>>> Good luck!
>
>>>> Robert
>
>>>> From: Arsenio
>>>> Sent: Tuesday, March 06, 2012 2:47 AM
>>>> To: [hidden email]
>>>> Subject: [R-SIG-Finance] Return calculation for panel data structure
>
>>>> Dear all,
>
>>>> I have a very simple question and i haven't been able to code it out. It's
>>>> a simple return calculation of a form:
>>>> R(t)=(P(t)-P(t-1))/P(t-1) or the same as ratio of prices minus one.
>>>> However it's in the panel data, where stocks belong to say a fund or some
>>>> other grouping variable.
>
>
>>>> ticker
>>>> date_f
>
>>>> date
>>>> prc
>>>> iticker
>>>> A
>>>> 20-Jun-03
>
>>>> 15876
>>>> 19.06
>>>> IXT
>>>> A
>>>> 23-Jun-03
>
>>>> 15879
>>>> 18.89
>>>> IXT
>>>> A
>>>> 24-Jun-03
>
>>>> 15880
>>>> 18.75
>>>> IXT
>>>> A
>>>> 25-Jun-03
>
>>>> 15881
>>>> 18.92
>>>> IXT
>>>> A
>>>> 26-Jun-03
>
>>>> 15882
>>>> 19.35
>>>> IXT
>>>> A
>>>> 27-Jun-03
>
>>>> 15883
>>>> 19.55
>>>> IXT
>>>> AA
>>>> 20-Jun-03
>
>>>> 15876
>>>> 26.24
>>>> IXB
>>>> AA
>>>> 23-Jun-03
>
>>>> 15879
>>>> 25.07
>>>> IXB
>>>> AA
>>>> 24-Jun-03
>
>>>> 15880
>>>> 24.96
>>>> IXB
>>>> AA
>>>> 25-Jun-03
>
>>>> 15881
>>>> 24.51
>>>> IXB
>>>> AA
>>>> 26-Jun-03
>
>>>> 15882
>>>> 25.24
>>>> IXB
>>>> AA
>>>> 27-Jun-03
>
>>>> 15883
>>>> 25.09
>>>> IXB
>>>> AA
>>>> 30-Jun-03
>
>>>> 15886
>>>> 25.5
>>>> IXB
>
>
>
>>>> In  SAS, I would do it this way:
>
>>>> data work.1; /* This Calcs the excluded index return for INDEX constructed
>>>> with PRICES (PR)!!!! */
>>>>       set work.1;
>>>>       by ticker iticker date_f ; /*sorting by stock, fund and date */
>
>>>>       lag_prc=lag(prc); /* creating the lagged price variable
>
>>>>       if first.iticker then /* if the it's a first date for the grouped
>>>> unit then.. */
>>>>               lag_prc= . ; /* setting the first lagged value to missing if
>>>> the first date for the grouped unit */
>>>>       return = (prc - lag_prc)/ lag_prc;
>>>>       drop lag_prc;
>>>> run;
>
>>>> The end result would be the same dataset with just one more column for
>>>> returns for each stock, fund, date combination with the first return for
>>>> the date set to missing.
>>>> I tried Return.calculate type of funcs from performance analytics and
>>>> other solutions using plyr or loops, but they are all too bulky.
>
>>>> Any ideas how to accomplish this? Would appreciate any hints,
>
>>>> Arsenio
>
>>>> _______________________________________________
>>>> [hidden email] mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-finance
>>>> -- Subscriber-posting only. If you want to post, subscribe first.
>>>> -- Also note that this is not the r-help list where general R questions
>>>> should go.
>
>
>
>
>>>> --
>>>> Best regards,
>>>>  Arsenio                            mailto:[hidden email]
>
>>>> _______________________________________________
>>>> [hidden email] mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-finance
>>>> -- Subscriber-posting only. If you want to post, subscribe first.
>>>> -- Also note that this is not the r-help list where general R questions
>>>> should go.
>
>
>
>
>>> --
>>> Matthew Clegg
>>> [hidden email]
>
>>>        [[alternative HTML version deleted]]
>
>>> _______________________________________________
>>> [hidden email] mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-finance
>>> -- Subscriber-posting only. If you want to post, subscribe first.
>>> -- Also note that this is not the r-help list where general R questions should go.
>
>
>
>
>
>
> --
> Best regards,
>  Arsenio                            mailto:[hidden email]
>
> _______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-finance
> -- Subscriber-posting only. If you want to post, subscribe first.
> -- Also note that this is not the r-help list where general R questions should go.



--
Jeffrey Ryan
[hidden email]

www.lemnica.com
www.esotericR.com

R/Finance 2012: Applied Finance with R
www.RinFinance.com

See you in Chicago!!!!

_______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-finance
-- Subscriber-posting only. If you want to post, subscribe first.
-- Also note that this is not the r-help list where general R questions should go.
Loading...