# Return calculation for panel data structure

9 messages
Open this post in threaded view
|

## Return calculation for panel data structure

Return calculation for panel data structure Dear all,

I have a very simple question and i haven't been able to code it out. It's a simple return calculation of a form:
R(t)=(P(t)-P(t-1))/P(t-1) or the same as ratio of prices minus one. However it's in the panel data, where stocks belong to say a fund or some other grouping variable.

 ticker date_f date prc iticker A 20-Jun-03 15876 19.06 IXT A 23-Jun-03 15879 18.89 IXT A 24-Jun-03 15880 18.75 IXT A 25-Jun-03 15881 18.92 IXT A 26-Jun-03 15882 19.35 IXT A 27-Jun-03 15883 19.55 IXT AA 20-Jun-03 15876 26.24 IXB AA 23-Jun-03 15879 25.07 IXB AA 24-Jun-03 15880 24.96 IXB AA 25-Jun-03 15881 24.51 IXB AA 26-Jun-03 15882 25.24 IXB AA 27-Jun-03 15883 25.09 IXB AA 30-Jun-03 15886 25.5 IXB

In  SAS, I would do it this way:

data work.1; /* This Calcs the excluded index return for INDEX constructed with PRICES (PR)!!!! */
set work.1;
by ticker iticker date_f ; /*sorting by stock, fund and date */

lag_prc=lag(prc); /* creating the lagged price variable

if first.iticker then /* if the it's a first date for the grouped unit then.. */
lag_prc= ; /* setting the first lagged value to missing if the first date for the grouped unit */
return = (prc - lag_prc)/ lag_prc;
drop lag_prc;
run;

The end result would be the same dataset with just one more column for returns for each stock, fund, date combination with the first return for the date set to missing.
I tried Return.calculate type of funcs from performance analytics and other solutions using plyr or loops, but they are all too bulky.

Any ideas how to accomplish this? Would appreciate any hints,

Arsenio
_______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-finance
-- Subscriber-posting only. If you want to post, subscribe first.
-- Also note that this is not the r-help list where general R questions should go.
Open this post in threaded view
|

## Re: Return calculation for panel data structure

 Return calculation for panel data structureUse diff and lag from plm: Lines="ticker date_f  date prc iticker A 20-Jun-03  15876 19.06 IXT A 23-Jun-03  15879 18.89 IXT A 24-Jun-03  15880 18.75 IXT A 25-Jun-03  15881 18.92 IXT A 26-Jun-03  15882 19.35 IXT A 27-Jun-03  15883 19.55 IXT AA 20-Jun-03 15876 26.24 IXB AA 23-Jun-03 15879 25.07 IXB AA 24-Jun-03 15880 24.96 IXB AA 25-Jun-03 15881 24.51 IXB AA 26-Jun-03 15882 25.24 IXB AA 27-Jun-03 15883 25.09 IXB AA 30-Jun-03 15886 25.5 IXB " z <- read.table(textConnection(Lines), header = TRUE) library(plm) zz=pdata.frame(z, c("ticker","date")) pr=zz\$prc; str(pr) zz\$return=diff(pr)/lag(pr) zz Good luck! Robert From: Arsenio Sent: Tuesday, March 06, 2012 2:47 AM To: [hidden email] Subject: [R-SIG-Finance] Return calculation for panel data structure Dear all, I have a very simple question and i haven't been able to code it out. It's a simple return calculation of a form: R(t)=(P(t)-P(t-1))/P(t-1) or the same as ratio of prices minus one. However it's in the panel data, where stocks belong to say a fund or some other grouping variable.       ticker date_f      date prc iticker       A 20-Jun-03      15876 19.06 IXT       A 23-Jun-03      15879 18.89 IXT       A 24-Jun-03      15880 18.75 IXT       A 25-Jun-03      15881 18.92 IXT       A 26-Jun-03      15882 19.35 IXT       A 27-Jun-03      15883 19.55 IXT       AA 20-Jun-03      15876 26.24 IXB       AA 23-Jun-03      15879 25.07 IXB       AA 24-Jun-03      15880 24.96 IXB       AA 25-Jun-03      15881 24.51 IXB       AA 26-Jun-03      15882 25.24 IXB       AA 27-Jun-03      15883 25.09 IXB       AA 30-Jun-03      15886 25.5 IXB In  SAS, I would do it this way: data work.1; /* This Calcs the excluded index return for INDEX constructed with PRICES (PR)!!!! */        set work.1;        by ticker iticker date_f ; /*sorting by stock, fund and date */        lag_prc=lag(prc); /* creating the lagged price variable        if first.iticker then /* if the it's a first date for the grouped unit then.. */                lag_prc= . ; /* setting the first lagged value to missing if the first date for the grouped unit */        return = (prc - lag_prc)/ lag_prc;        drop lag_prc; run; The end result would be the same dataset with just one more column for returns for each stock, fund, date combination with the first return for the date set to missing. I tried Return.calculate type of funcs from performance analytics and other solutions using plyr or loops, but they are all too bulky. Any ideas how to accomplish this? Would appreciate any hints, Arsenio -------------------------------------------------------------------------------- _______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-finance-- Subscriber-posting only. If you want to post, subscribe first. -- Also note that this is not the r-help list where general R questions should go.         [[alternative HTML version deleted]] _______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-finance-- Subscriber-posting only. If you want to post, subscribe first. -- Also note that this is not the r-help list where general R questions should go.
Open this post in threaded view
|

## Re: Return calculation for panel data structure

 Robert, Thanks for the reply. I tried plm approach but apparently it only accepts one individual id and one time id, because this fails > zz=pdata.frame(z, c("Stock","Fund","Date")) Error in match(x, table, nomatch = 0L) :   'match' requires vector arguments Any approach that accepts several individual observation identifiers? I will take a look at the econometrics view on CRAN Btw, sorry if the post came out mangled due to HTML, didn't check. Thanks, Arsenio Tuesday, March 6, 2012, 5:24:35 AM, you wrote: Use diff and lag from plm:   Lines="ticker date_f  date prc iticker A 20-Jun-03  15876 19.06 IXT A 23-Jun-03  15879 18.89 IXT A 24-Jun-03  15880 18.75 IXT A 25-Jun-03  15881 18.92 IXT A 26-Jun-03  15882 19.35 IXT A 27-Jun-03  15883 19.55 IXT AA 20-Jun-03 15876 26.24 IXB AA 23-Jun-03 15879 25.07 IXB AA 24-Jun-03 15880 24.96 IXB AA 25-Jun-03 15881 24.51 IXB AA 26-Jun-03 15882 25.24 IXB AA 27-Jun-03 15883 25.09 IXB AA 30-Jun-03 15886 25.5 IXB " z <- read.table(textConnection(Lines), header = TRUE) library(plm) zz=pdata.frame(z, c("ticker","date")) pr=zz\$prc; str(pr) zz\$return=diff(pr)/lag(pr) zz   Good luck!   Robert   From: Arsenio Sent: Tuesday, March 06, 2012 2:47 AM To: [hidden email] Subject: [R-SIG-Finance] Return calculation for panel data structure   Dear all, I have a very simple question and i haven't been able to code it out. It's a simple return calculation of a form: R(t)=(P(t)-P(t-1))/P(t-1) or the same as ratio of prices minus one. However it's in the panel data, where stocks belong to say a fund or some other grouping variable. ticker date_f date prc iticker A 20-Jun-03 15876 19.06 IXT A 23-Jun-03 15879 18.89 IXT A 24-Jun-03 15880 18.75 IXT A 25-Jun-03 15881 18.92 IXT A 26-Jun-03 15882 19.35 IXT A 27-Jun-03 15883 19.55 IXT AA 20-Jun-03 15876 26.24 IXB AA 23-Jun-03 15879 25.07 IXB AA 24-Jun-03 15880 24.96 IXB AA 25-Jun-03 15881 24.51 IXB AA 26-Jun-03 15882 25.24 IXB AA 27-Jun-03 15883 25.09 IXB AA 30-Jun-03 15886 25.5 IXB In  SAS, I would do it this way: data work.1; /* This Calcs the excluded index return for INDEX constructed with PRICES (PR)!!!! */        set work.1;        by ticker iticker date_f ; /*sorting by stock, fund and date */        lag_prc=lag(prc); /* creating the lagged price variable        if first.iticker then /* if the it's a first date for the grouped unit then.. */                lag_prc= . ; /* setting the first lagged value to missing if the first date for the grouped unit */        return = (prc - lag_prc)/ lag_prc;        drop lag_prc; run; The end result would be the same dataset with just one more column for returns for each stock, fund, date combination with the first return for the date set to missing. I tried Return.calculate type of funcs from performance analytics and other solutions using plyr or loops, but they are all too bulky. Any ideas how to accomplish this? Would appreciate any hints, Arsenio _______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-finance-- Subscriber-posting only. If you want to post, subscribe first. -- Also note that this is not the r-help list where general R questions should go. -- Best regards,  Arsenio                            mailto:[hidden email] _______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-finance-- Subscriber-posting only. If you want to post, subscribe first. -- Also note that this is not the r-help list where general R questions should go.
Open this post in threaded view
|

## Re: Return calculation for panel data structure

 Here is another approach that might work for you. apply_by_groups <- function (T, cols, F, ...) {   # On input, T is a dataframe, and cols is a list of column names.   # Groups the dataframe T into groups according to the specified   # column names, e.g., all rows in a given group will have identical   # values in the specified columns.  Within each group, applies the   # function F to the dataframe representing that group.  Returns the   # list of results.   NT <- T[do.call("order", lapply(cols, function(c) { T[,c]})),]   change <- function(k) { !all(NT[k-1,cols] == NT[k, cols]) }   starts <- which(c(TRUE, sapply(2:nrow(NT), change)))   ends <- c(starts[2:length(starts)]-1, nrow(NT))   lapply(1:length(starts), function(k) { F(NT[starts[k]:ends[k],,drop=FALSE], ...) }) } ret <- function(X) { c(NA, X[2:length(X)]/X[1:(length(X)-1)]-1)} > do.call("rbind", apply_by_groups(z, "ticker", function(df) { df\$ret <- ret(df\$prc); df} ))    ticker    date_f  date   prc iticker          ret 1       A 20-Jun-03 15876 19.06     IXT           NA 2       A 23-Jun-03 15879 18.89     IXT -0.008919203 3       A 24-Jun-03 15880 18.75     IXT -0.007411329 4       A 25-Jun-03 15881 18.92     IXT  0.009066667 5       A 26-Jun-03 15882 19.35     IXT  0.022727273 6       A 27-Jun-03 15883 19.55     IXT  0.010335917 7      AA 20-Jun-03 15876 26.24     IXB           NA 8      AA 23-Jun-03 15879 25.07     IXB -0.044588415 9      AA 24-Jun-03 15880 24.96     IXB -0.004387714 10     AA 25-Jun-03 15881 24.51     IXB -0.018028846 11     AA 26-Jun-03 15882 25.24     IXB  0.029783762 12     AA 27-Jun-03 15883 25.09     IXB -0.005942948 13     AA 30-Jun-03 15886 25.50     IXB  0.016341172 In the call to apply_by_groups, you can replace "ticker" with a list of names, e.g., c("ticker","fund","manager"), etc. Matthew Clegg On Tue, Mar 6, 2012 at 7:03 PM, Arsenio <[hidden email]> wrote: > Robert, > > Thanks for the reply. I tried plm approach but apparently it only accepts > one individual id and one time id, because this fails > > > zz=pdata.frame(z, c("Stock","Fund","Date")) > Error in match(x, table, nomatch = 0L) : >  'match' requires vector arguments > > Any approach that accepts several individual observation identifiers? I > will take a look at the econometrics view on CRAN > > Btw, sorry if the post came out mangled due to HTML, didn't check. > > Thanks, > > Arsenio > > Tuesday, March 6, 2012, 5:24:35 AM, you wrote: > > > Use diff and lag from plm: > > Lines="ticker date_f  date prc iticker > A 20-Jun-03  15876 19.06 IXT > A 23-Jun-03  15879 18.89 IXT > A 24-Jun-03  15880 18.75 IXT > A 25-Jun-03  15881 18.92 IXT > A 26-Jun-03  15882 19.35 IXT > A 27-Jun-03  15883 19.55 IXT > AA 20-Jun-03 15876 26.24 IXB > AA 23-Jun-03 15879 25.07 IXB > AA 24-Jun-03 15880 24.96 IXB > AA 25-Jun-03 15881 24.51 IXB > AA 26-Jun-03 15882 25.24 IXB > AA 27-Jun-03 15883 25.09 IXB > AA 30-Jun-03 15886 25.5 IXB > " > z <- read.table(textConnection(Lines), header = TRUE) > library(plm) > zz=pdata.frame(z, c("ticker","date")) > pr=zz\$prc; str(pr) > zz\$return=diff(pr)/lag(pr) > zz > > Good luck! > > Robert > > From: Arsenio > Sent: Tuesday, March 06, 2012 2:47 AM > To: [hidden email] > Subject: [R-SIG-Finance] Return calculation for panel data structure > > Dear all, > > I have a very simple question and i haven't been able to code it out. It's > a simple return calculation of a form: > R(t)=(P(t)-P(t-1))/P(t-1) or the same as ratio of prices minus one. > However it's in the panel data, where stocks belong to say a fund or some > other grouping variable. > > > ticker > date_f > > date > prc > iticker > A > 20-Jun-03 > > 15876 > 19.06 > IXT > A > 23-Jun-03 > > 15879 > 18.89 > IXT > A > 24-Jun-03 > > 15880 > 18.75 > IXT > A > 25-Jun-03 > > 15881 > 18.92 > IXT > A > 26-Jun-03 > > 15882 > 19.35 > IXT > A > 27-Jun-03 > > 15883 > 19.55 > IXT > AA > 20-Jun-03 > > 15876 > 26.24 > IXB > AA > 23-Jun-03 > > 15879 > 25.07 > IXB > AA > 24-Jun-03 > > 15880 > 24.96 > IXB > AA > 25-Jun-03 > > 15881 > 24.51 > IXB > AA > 26-Jun-03 > > 15882 > 25.24 > IXB > AA > 27-Jun-03 > > 15883 > 25.09 > IXB > AA > 30-Jun-03 > > 15886 > 25.5 > IXB > > > > In  SAS, I would do it this way: > > data work.1; /* This Calcs the excluded index return for INDEX constructed > with PRICES (PR)!!!! */ >       set work.1; >       by ticker iticker date_f ; /*sorting by stock, fund and date */ > >       lag_prc=lag(prc); /* creating the lagged price variable > >       if first.iticker then /* if the it's a first date for the grouped > unit then.. */ >               lag_prc= . ; /* setting the first lagged value to missing if > the first date for the grouped unit */ >       return = (prc - lag_prc)/ lag_prc; >       drop lag_prc; > run; > > The end result would be the same dataset with just one more column for > returns for each stock, fund, date combination with the first return for > the date set to missing. > I tried Return.calculate type of funcs from performance analytics and > other solutions using plyr or loops, but they are all too bulky. > > Any ideas how to accomplish this? Would appreciate any hints, > > Arsenio > > _______________________________________________ > [hidden email] mailing list > https://stat.ethz.ch/mailman/listinfo/r-sig-finance> -- Subscriber-posting only. If you want to post, subscribe first. > -- Also note that this is not the r-help list where general R questions > should go. > > > > > -- > Best regards, >  Arsenio                            mailto:[hidden email] > > _______________________________________________ > [hidden email] mailing list > https://stat.ethz.ch/mailman/listinfo/r-sig-finance> -- Subscriber-posting only. If you want to post, subscribe first. > -- Also note that this is not the r-help list where general R questions > should go. > -- Matthew Clegg [hidden email]         [[alternative HTML version deleted]] _______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-finance-- Subscriber-posting only. If you want to post, subscribe first. -- Also note that this is not the r-help list where general R questions should go.
Open this post in threaded view
|

## Re: Return calculation for panel data structure

 Built into R is ?split or ?aggregate and ?lapply which should serve you well. Using base R is often the fastest way to skin the cat, if not the prettiest ;-) > library(quantmod)   # for Delt > cbind(z,do.call(rbind,lapply(split(z,z\$ticker), function(x) Delt(x\$prc))))  ticker    date_f  date   prc iticker Delt.1.arithmetic 1       A 20-Jun-03 15876 19.06     IXT                NA 2       A 23-Jun-03 15879 18.89     IXT      -0.008919203 3       A 24-Jun-03 15880 18.75     IXT      -0.007411329 4       A 25-Jun-03 15881 18.92     IXT       0.009066667 5       A 26-Jun-03 15882 19.35     IXT       0.022727273 6       A 27-Jun-03 15883 19.55     IXT       0.010335917 7      AA 20-Jun-03 15876 26.24     IXB                NA 8      AA 23-Jun-03 15879 25.07     IXB      -0.044588415 9      AA 24-Jun-03 15880 24.96     IXB      -0.004387714 10     AA 25-Jun-03 15881 24.51     IXB      -0.018028846 11     AA 26-Jun-03 15882 25.24     IXB       0.029783762 12     AA 27-Jun-03 15883 25.09     IXB      -0.005942948 13     AA 30-Jun-03 15886 25.50     IXB       0.016341172 See ?split for details on multiple identifiers ( just a list, but your example didn't include it so I won't either) The do.call(rbind/cbind stuff is just to get the original data.frame back - there are more verbose ways to do the same thing. Jeff On Tue, Mar 6, 2012 at 7:38 PM, Matthew Clegg <[hidden email]> wrote: > Here is another approach that might work for you. > > apply_by_groups <- function (T, cols, F, ...) { >  # On input, T is a dataframe, and cols is a list of column names. >  # Groups the dataframe T into groups according to the specified >  # column names, e.g., all rows in a given group will have identical >  # values in the specified columns.  Within each group, applies the >  # function F to the dataframe representing that group.  Returns the >  # list of results. > >  NT <- T[do.call("order", lapply(cols, function(c) { T[,c]})),] >  change <- function(k) { !all(NT[k-1,cols] == NT[k, cols]) } >  starts <- which(c(TRUE, sapply(2:nrow(NT), change))) >  ends <- c(starts[2:length(starts)]-1, nrow(NT)) >  lapply(1:length(starts), function(k) { > F(NT[starts[k]:ends[k],,drop=FALSE], ...) }) > } > ret <- function(X) { c(NA, X[2:length(X)]/X[1:(length(X)-1)]-1)} > >> do.call("rbind", apply_by_groups(z, "ticker", function(df) { df\$ret <- > ret(df\$prc); df} )) >   ticker    date_f  date   prc iticker          ret > 1       A 20-Jun-03 15876 19.06     IXT           NA > 2       A 23-Jun-03 15879 18.89     IXT -0.008919203 > 3       A 24-Jun-03 15880 18.75     IXT -0.007411329 > 4       A 25-Jun-03 15881 18.92     IXT  0.009066667 > 5       A 26-Jun-03 15882 19.35     IXT  0.022727273 > 6       A 27-Jun-03 15883 19.55     IXT  0.010335917 > 7      AA 20-Jun-03 15876 26.24     IXB           NA > 8      AA 23-Jun-03 15879 25.07     IXB -0.044588415 > 9      AA 24-Jun-03 15880 24.96     IXB -0.004387714 > 10     AA 25-Jun-03 15881 24.51     IXB -0.018028846 > 11     AA 26-Jun-03 15882 25.24     IXB  0.029783762 > 12     AA 27-Jun-03 15883 25.09     IXB -0.005942948 > 13     AA 30-Jun-03 15886 25.50     IXB  0.016341172 > > In the call to apply_by_groups, you can replace "ticker" with a list of > names, e.g., c("ticker","fund","manager"), etc. > > Matthew Clegg > > On Tue, Mar 6, 2012 at 7:03 PM, Arsenio <[hidden email]> wrote: > >> Robert, >> >> Thanks for the reply. I tried plm approach but apparently it only accepts >> one individual id and one time id, because this fails >> >> > zz=pdata.frame(z, c("Stock","Fund","Date")) >> Error in match(x, table, nomatch = 0L) : >>  'match' requires vector arguments >> >> Any approach that accepts several individual observation identifiers? I >> will take a look at the econometrics view on CRAN >> >> Btw, sorry if the post came out mangled due to HTML, didn't check. >> >> Thanks, >> >> Arsenio >> >> Tuesday, March 6, 2012, 5:24:35 AM, you wrote: >> >> >> Use diff and lag from plm: >> >> Lines="ticker date_f  date prc iticker >> A 20-Jun-03  15876 19.06 IXT >> A 23-Jun-03  15879 18.89 IXT >> A 24-Jun-03  15880 18.75 IXT >> A 25-Jun-03  15881 18.92 IXT >> A 26-Jun-03  15882 19.35 IXT >> A 27-Jun-03  15883 19.55 IXT >> AA 20-Jun-03 15876 26.24 IXB >> AA 23-Jun-03 15879 25.07 IXB >> AA 24-Jun-03 15880 24.96 IXB >> AA 25-Jun-03 15881 24.51 IXB >> AA 26-Jun-03 15882 25.24 IXB >> AA 27-Jun-03 15883 25.09 IXB >> AA 30-Jun-03 15886 25.5 IXB >> " >> z <- read.table(textConnection(Lines), header = TRUE) >> library(plm) >> zz=pdata.frame(z, c("ticker","date")) >> pr=zz\$prc; str(pr) >> zz\$return=diff(pr)/lag(pr) >> zz >> >> Good luck! >> >> Robert >> >> From: Arsenio >> Sent: Tuesday, March 06, 2012 2:47 AM >> To: [hidden email] >> Subject: [R-SIG-Finance] Return calculation for panel data structure >> >> Dear all, >> >> I have a very simple question and i haven't been able to code it out. It's >> a simple return calculation of a form: >> R(t)=(P(t)-P(t-1))/P(t-1) or the same as ratio of prices minus one. >> However it's in the panel data, where stocks belong to say a fund or some >> other grouping variable. >> >> >> ticker >> date_f >> >> date >> prc >> iticker >> A >> 20-Jun-03 >> >> 15876 >> 19.06 >> IXT >> A >> 23-Jun-03 >> >> 15879 >> 18.89 >> IXT >> A >> 24-Jun-03 >> >> 15880 >> 18.75 >> IXT >> A >> 25-Jun-03 >> >> 15881 >> 18.92 >> IXT >> A >> 26-Jun-03 >> >> 15882 >> 19.35 >> IXT >> A >> 27-Jun-03 >> >> 15883 >> 19.55 >> IXT >> AA >> 20-Jun-03 >> >> 15876 >> 26.24 >> IXB >> AA >> 23-Jun-03 >> >> 15879 >> 25.07 >> IXB >> AA >> 24-Jun-03 >> >> 15880 >> 24.96 >> IXB >> AA >> 25-Jun-03 >> >> 15881 >> 24.51 >> IXB >> AA >> 26-Jun-03 >> >> 15882 >> 25.24 >> IXB >> AA >> 27-Jun-03 >> >> 15883 >> 25.09 >> IXB >> AA >> 30-Jun-03 >> >> 15886 >> 25.5 >> IXB >> >> >> >> In  SAS, I would do it this way: >> >> data work.1; /* This Calcs the excluded index return for INDEX constructed >> with PRICES (PR)!!!! */ >>       set work.1; >>       by ticker iticker date_f ; /*sorting by stock, fund and date */ >> >>       lag_prc=lag(prc); /* creating the lagged price variable >> >>       if first.iticker then /* if the it's a first date for the grouped >> unit then.. */ >>               lag_prc= . ; /* setting the first lagged value to missing if >> the first date for the grouped unit */ >>       return = (prc - lag_prc)/ lag_prc; >>       drop lag_prc; >> run; >> >> The end result would be the same dataset with just one more column for >> returns for each stock, fund, date combination with the first return for >> the date set to missing. >> I tried Return.calculate type of funcs from performance analytics and >> other solutions using plyr or loops, but they are all too bulky. >> >> Any ideas how to accomplish this? Would appreciate any hints, >> >> Arsenio >> >> _______________________________________________ >> [hidden email] mailing list >> https://stat.ethz.ch/mailman/listinfo/r-sig-finance>> -- Subscriber-posting only. If you want to post, subscribe first. >> -- Also note that this is not the r-help list where general R questions >> should go. >> >> >> >> >> -- >> Best regards, >>  Arsenio                            mailto:[hidden email] >> >> _______________________________________________ >> [hidden email] mailing list >> https://stat.ethz.ch/mailman/listinfo/r-sig-finance>> -- Subscriber-posting only. If you want to post, subscribe first. >> -- Also note that this is not the r-help list where general R questions >> should go. >> > > > > -- > Matthew Clegg > [hidden email] > >        [[alternative HTML version deleted]] > > _______________________________________________ > [hidden email] mailing list > https://stat.ethz.ch/mailman/listinfo/r-sig-finance> -- Subscriber-posting only. If you want to post, subscribe first. > -- Also note that this is not the r-help list where general R questions should go. -- Jeffrey Ryan [hidden email] www.lemnica.com www.esotericR.com R/Finance 2012: Applied Finance with R www.RinFinance.com See you in Chicago!!!! _______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-finance-- Subscriber-posting only. If you want to post, subscribe first. -- Also note that this is not the r-help list where general R questions should go.
Open this post in threaded view
|

## Re: Return calculation for panel data structure

 Jeffrey, Thanks a lot! I tried this code worked for the single grouping factor variable, now testing it with two which means apparently they have to cover the whole dataset, maybe NA in one of the variable not allowed by the ?split ? Warning message: In split.default(seq_len(nrow(x)), f, drop = drop, ...) :   data length is not a multiple of split variable I was looking for something simpler that would also help to understand the code for peers working in SAS, so base R would be a bit too complicated. Thanks for the Delt function that's quite a shortcut! Arsenio Wednesday, March 7, 2012, 8:14:31 AM, you wrote: > Built into R is ?split or ?aggregate and ?lapply which should serve you well. > Using base R is often the fastest way to skin the cat, if not the prettiest ;-) >> library(quantmod)   # for Delt >> cbind(z,do.call(rbind,lapply(split(z,z\$ticker), function(x) Delt(x\$prc)))) >  ticker    date_f  date   prc iticker Delt.1.arithmetic > 1       A 20-Jun-03 15876 19.06     IXT                NA > 2       A 23-Jun-03 15879 18.89     IXT      -0.008919203 > 3       A 24-Jun-03 15880 18.75     IXT      -0.007411329 > 4       A 25-Jun-03 15881 18.92     IXT       0.009066667 > 5       A 26-Jun-03 15882 19.35     IXT       0.022727273 > 6       A 27-Jun-03 15883 19.55     IXT       0.010335917 > 7      AA 20-Jun-03 15876 26.24     IXB                NA > 8      AA 23-Jun-03 15879 25.07     IXB      -0.044588415 > 9      AA 24-Jun-03 15880 24.96     IXB      -0.004387714 > 10     AA 25-Jun-03 15881 24.51     IXB      -0.018028846 > 11     AA 26-Jun-03 15882 25.24     IXB       0.029783762 > 12     AA 27-Jun-03 15883 25.09     IXB      -0.005942948 > 13     AA 30-Jun-03 15886 25.50     IXB       0.016341172 > See ?split for details on multiple identifiers ( just a list, but your > example didn't include it so I won't either) > The do.call(rbind/cbind stuff is just to get the original data.frame > back - there are more verbose ways to do the same thing. > Jeff > On Tue, Mar 6, 2012 at 7:38 PM, Matthew Clegg > <[hidden email]> wrote: >> Here is another approach that might work for you. >> apply_by_groups <- function (T, cols, F, ...) { >>  # On input, T is a dataframe, and cols is a list of column names. >>  # Groups the dataframe T into groups according to the specified >>  # column names, e.g., all rows in a given group will have identical >>  # values in the specified columns.  Within each group, applies the >>  # function F to the dataframe representing that group.  Returns the >>  # list of results. >>  NT <- T[do.call("order", lapply(cols, function(c) { T[,c]})),] >>  change <- function(k) { !all(NT[k-1,cols] == NT[k, cols]) } >>  starts <- which(c(TRUE, sapply(2:nrow(NT), change))) >>  ends <- c(starts[2:length(starts)]-1, nrow(NT)) >>  lapply(1:length(starts), function(k) { >> F(NT[starts[k]:ends[k],,drop=FALSE], ...) }) >> } >> ret <- function(X) { c(NA, X[2:length(X)]/X[1:(length(X)-1)]-1)} >>> do.call("rbind", apply_by_groups(z, "ticker", function(df) { df\$ret <- >> ret(df\$prc); df} )) >>   ticker    date_f  date   prc iticker          ret >> 1       A 20-Jun-03 15876 19.06     IXT           NA >> 2       A 23-Jun-03 15879 18.89     IXT -0.008919203 >> 3       A 24-Jun-03 15880 18.75     IXT -0.007411329 >> 4       A 25-Jun-03 15881 18.92     IXT  0.009066667 >> 5       A 26-Jun-03 15882 19.35     IXT  0.022727273 >> 6       A 27-Jun-03 15883 19.55     IXT  0.010335917 >> 7      AA 20-Jun-03 15876 26.24     IXB           NA >> 8      AA 23-Jun-03 15879 25.07     IXB -0.044588415 >> 9      AA 24-Jun-03 15880 24.96     IXB -0.004387714 >> 10     AA 25-Jun-03 15881 24.51     IXB -0.018028846 >> 11     AA 26-Jun-03 15882 25.24     IXB  0.029783762 >> 12     AA 27-Jun-03 15883 25.09     IXB -0.005942948 >> 13     AA 30-Jun-03 15886 25.50     IXB  0.016341172 >> In the call to apply_by_groups, you can replace "ticker" with a list of >> names, e.g., c("ticker","fund","manager"), etc. >> Matthew Clegg >> On Tue, Mar 6, 2012 at 7:03 PM, Arsenio <[hidden email]> wrote: >>> Robert, >>> Thanks for the reply. I tried plm approach but apparently it only accepts >>> one individual id and one time id, because this fails >>> > zz=pdata.frame(z, c("Stock","Fund","Date")) >>> Error in match(x, table, nomatch = 0L) : >>>  'match' requires vector arguments >>> Any approach that accepts several individual observation identifiers? I >>> will take a look at the econometrics view on CRAN >>> Btw, sorry if the post came out mangled due to HTML, didn't check. >>> Thanks, >>> Arsenio >>> Tuesday, March 6, 2012, 5:24:35 AM, you wrote: >>> Use diff and lag from plm: >>> Lines="ticker date_f  date prc iticker >>> A 20-Jun-03  15876 19.06 IXT >>> A 23-Jun-03  15879 18.89 IXT >>> A 24-Jun-03  15880 18.75 IXT >>> A 25-Jun-03  15881 18.92 IXT >>> A 26-Jun-03  15882 19.35 IXT >>> A 27-Jun-03  15883 19.55 IXT >>> AA 20-Jun-03 15876 26.24 IXB >>> AA 23-Jun-03 15879 25.07 IXB >>> AA 24-Jun-03 15880 24.96 IXB >>> AA 25-Jun-03 15881 24.51 IXB >>> AA 26-Jun-03 15882 25.24 IXB >>> AA 27-Jun-03 15883 25.09 IXB >>> AA 30-Jun-03 15886 25.5 IXB >>> " >>> z <- read.table(textConnection(Lines), header = TRUE) >>> library(plm) >>> zz=pdata.frame(z, c("ticker","date")) >>> pr=zz\$prc; str(pr) >>> zz\$return=diff(pr)/lag(pr) >>> zz >>> Good luck! >>> Robert >>> From: Arsenio >>> Sent: Tuesday, March 06, 2012 2:47 AM >>> To: [hidden email] >>> Subject: [R-SIG-Finance] Return calculation for panel data structure >>> Dear all, >>> I have a very simple question and i haven't been able to code it out. It's >>> a simple return calculation of a form: >>> R(t)=(P(t)-P(t-1))/P(t-1) or the same as ratio of prices minus one. >>> However it's in the panel data, where stocks belong to say a fund or some >>> other grouping variable. >>> ticker >>> date_f >>> date >>> prc >>> iticker >>> A >>> 20-Jun-03 >>> 15876 >>> 19.06 >>> IXT >>> A >>> 23-Jun-03 >>> 15879 >>> 18.89 >>> IXT >>> A >>> 24-Jun-03 >>> 15880 >>> 18.75 >>> IXT >>> A >>> 25-Jun-03 >>> 15881 >>> 18.92 >>> IXT >>> A >>> 26-Jun-03 >>> 15882 >>> 19.35 >>> IXT >>> A >>> 27-Jun-03 >>> 15883 >>> 19.55 >>> IXT >>> AA >>> 20-Jun-03 >>> 15876 >>> 26.24 >>> IXB >>> AA >>> 23-Jun-03 >>> 15879 >>> 25.07 >>> IXB >>> AA >>> 24-Jun-03 >>> 15880 >>> 24.96 >>> IXB >>> AA >>> 25-Jun-03 >>> 15881 >>> 24.51 >>> IXB >>> AA >>> 26-Jun-03 >>> 15882 >>> 25.24 >>> IXB >>> AA >>> 27-Jun-03 >>> 15883 >>> 25.09 >>> IXB >>> AA >>> 30-Jun-03 >>> 15886 >>> 25.5 >>> IXB >>> In  SAS, I would do it this way: >>> data work.1; /* This Calcs the excluded index return for INDEX constructed >>> with PRICES (PR)!!!! */ >>>       set work.1; >>>       by ticker iticker date_f ; /*sorting by stock, fund and date */ >>>       lag_prc=lag(prc); /* creating the lagged price variable >>>       if first.iticker then /* if the it's a first date for the grouped >>> unit then.. */ >>>               lag_prc= . ; /* setting the first lagged value to missing if >>> the first date for the grouped unit */ >>>       return = (prc - lag_prc)/ lag_prc; >>>       drop lag_prc; >>> run; >>> The end result would be the same dataset with just one more column for >>> returns for each stock, fund, date combination with the first return for >>> the date set to missing. >>> I tried Return.calculate type of funcs from performance analytics and >>> other solutions using plyr or loops, but they are all too bulky. >>> Any ideas how to accomplish this? Would appreciate any hints, >>> Arsenio >>> _______________________________________________ >>> [hidden email] mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-sig-finance>>> -- Subscriber-posting only. If you want to post, subscribe first. >>> -- Also note that this is not the r-help list where general R questions >>> should go. >>> -- >>> Best regards, >>>  Arsenio                            mailto:[hidden email] >>> _______________________________________________ >>> [hidden email] mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-sig-finance>>> -- Subscriber-posting only. If you want to post, subscribe first. >>> -- Also note that this is not the r-help list where general R questions >>> should go. >> -- >> Matthew Clegg >> [hidden email] >>        [[alternative HTML version deleted]] >> _______________________________________________ >> [hidden email] mailing list >> https://stat.ethz.ch/mailman/listinfo/r-sig-finance>> -- Subscriber-posting only. If you want to post, subscribe first. >> -- Also note that this is not the r-help list where general R questions should go. -- Best regards,  Arsenio                            mailto:[hidden email] _______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-finance-- Subscriber-posting only. If you want to post, subscribe first. -- Also note that this is not the r-help list where general R questions should go.
Open this post in threaded view
|

## Re: Return calculation for panel data structure

 In reply to this post by Matthew Clegg Matthew, Thanks for the code! I have been trying to analyze this, seems to work ok. It's a bit hard to understand this for me, still a newbie at R. Thanks, Arsenio Tuesday, March 6, 2012, 5:38:47 PM, you wrote: > Here is another approach that might work for you. > apply_by_groups <- function (T, cols, F, ...) { >   # On input, T is a dataframe, and cols is a list of column names. >   # Groups the dataframe T into groups according to the specified >   # column names, e.g., all rows in a given group will have identical >   # values in the specified columns.  Within each group, applies the >   # function F to the dataframe representing that group.  Returns the >   # list of results. >   >   NT <- T[do.call("order", lapply(cols, function(c) { T[,c]})),] >   change <- function(k) { !all(NT[k-1,cols] == NT[k, cols]) } >   starts <- which(c(TRUE, sapply(2:nrow(NT), change))) >   ends <- c(starts[2:length(starts)]-1, nrow(NT)) >   lapply(1:length(starts), function(k) { > F(NT[starts[k]:ends[k],,drop=FALSE], ...) }) > } > ret <- function(X) { c(NA, X[2:length(X)]/X[1:(length(X)-1)]-1)} >> do.call("rbind", apply_by_groups(z, "ticker", function(df) { df\$ret <- ret(df\$prc); df} )) >    ticker    date_f  date   prc iticker          ret > 1       A 20-Jun-03 15876 19.06     IXT           NA > 2       A 23-Jun-03 15879 18.89     IXT -0.008919203 > 3       A 24-Jun-03 15880 18.75     IXT -0.007411329 > 4       A 25-Jun-03 15881 18.92     IXT  0.009066667 > 5       A 26-Jun-03 15882 19.35     IXT  0.022727273 > 6       A 27-Jun-03 15883 19.55     IXT  0.010335917 > 7      AA 20-Jun-03 15876 26.24     IXB           NA > 8      AA 23-Jun-03 15879 25.07     IXB -0.044588415 > 9      AA 24-Jun-03 15880 24.96     IXB -0.004387714 > 10     AA 25-Jun-03 15881 24.51     IXB -0.018028846 > 11     AA 26-Jun-03 15882 25.24     IXB  0.029783762 > 12     AA 27-Jun-03 15883 25.09     IXB -0.005942948 > 13     AA 30-Jun-03 15886 25.50     IXB  0.016341172 > In the call to apply_by_groups, you can replace "ticker" with a > list of names, e.g., c("ticker","fund","manager"), etc. > Matthew Clegg > On Tue, Mar 6, 2012 at 7:03 PM, Arsenio <[hidden email]> wrote: > Robert, > Thanks for the reply. I tried plm approach but apparently it only > accepts one individual id and one time id, because this fails >> zz=pdata.frame(z, c("Stock","Fund","Date")) > Error in match(x, table, nomatch = 0L) : >  'match' requires vector arguments > Any approach that accepts several individual observation > identifiers? I will take a look at the econometrics view on CRAN > Btw, sorry if the post came out mangled due to HTML, didn't check. > Thanks, > Arsenio > Tuesday, March 6, 2012, 5:24:35 AM, you wrote: > Use diff and lag from plm: > Lines="ticker date_f  date prc iticker > A 20-Jun-03  15876 19.06 IXT > A 23-Jun-03  15879 18.89 IXT > A 24-Jun-03  15880 18.75 IXT > A 25-Jun-03  15881 18.92 IXT > A 26-Jun-03  15882 19.35 IXT > A 27-Jun-03  15883 19.55 IXT > AA 20-Jun-03 15876 26.24 IXB > AA 23-Jun-03 15879 25.07 IXB > AA 24-Jun-03 15880 24.96 IXB > AA 25-Jun-03 15881 24.51 IXB > AA 26-Jun-03 15882 25.24 IXB > AA 27-Jun-03 15883 25.09 IXB > AA 30-Jun-03 15886 25.5 IXB > " > z <- read.table(textConnection(Lines), header = TRUE) > library(plm) > zz=pdata.frame(z, c("ticker","date")) > pr=zz\$prc; str(pr) > zz\$return=diff(pr)/lag(pr) > zz > Good luck! > Robert > From: Arsenio > Sent: Tuesday, March 06, 2012 2:47 AM > To: [hidden email] > Subject: [R-SIG-Finance] Return calculation for panel data structure > Dear all, > I have a very simple question and i haven't been able to code it > out. It's a simple return calculation of a form: > R(t)=(P(t)-P(t-1))/P(t-1) or the same as ratio of prices minus one. > However it's in the panel data, where stocks belong to say a fund or some other grouping variable. > ticker > date_f > date > prc > iticker > A > 20-Jun-03 > 15876 > 19.06 > IXT > A > 23-Jun-03 > 15879 > 18.89 > IXT > A > 24-Jun-03 > 15880 > 18.75 > IXT > A > 25-Jun-03 > 15881 > 18.92 > IXT > A > 26-Jun-03 > 15882 > 19.35 > IXT > A > 27-Jun-03 > 15883 > 19.55 > IXT > AA > 20-Jun-03 > 15876 > 26.24 > IXB > AA > 23-Jun-03 > 15879 > 25.07 > IXB > AA > 24-Jun-03 > 15880 > 24.96 > IXB > AA > 25-Jun-03 > 15881 > 24.51 > IXB > AA > 26-Jun-03 > 15882 > 25.24 > IXB > AA > 27-Jun-03 > 15883 > 25.09 > IXB > AA > 30-Jun-03 > 15886 > 25.5 > IXB > In  SAS, I would do it this way: > data work.1; /* This Calcs the excluded index return for INDEX > constructed with PRICES (PR)!!!! */ >       set work.1; >       by ticker iticker date_f ; /*sorting by stock, fund and date */ >       lag_prc=lag(prc); /* creating the lagged price variable >       if first.iticker then /* if the it's a first date for the grouped unit then.. */ >               lag_prc= . ; /* setting the first lagged value to > missing if the first date for the grouped unit */ >       return = (prc - lag_prc)/ lag_prc; >       drop lag_prc; > run; > The end result would be the same dataset with just one more column > for returns for each stock, fund, date combination with the first > return for the date set to missing. > I tried Return.calculate type of funcs from performance analytics > and other solutions using plyr or loops, but they are all too bulky. > Any ideas how to accomplish this? Would appreciate any hints, > Arsenio > _______________________________________________ > [hidden email] mailing list > https://stat.ethz.ch/mailman/listinfo/r-sig-finance> -- Subscriber-posting only. If you want to post, subscribe first. > -- Also note that this is not the r-help list where general R questions should go. > -- > Best regards, >  Arsenio                            mailto:[hidden email] > _______________________________________________ > [hidden email] mailing list > https://stat.ethz.ch/mailman/listinfo/r-sig-finance> -- Subscriber-posting only. If you want to post, subscribe first. > -- Also note that this is not the r-help list where general R questions should go. -- Best regards,  Arsenio                            mailto:[hidden email] _______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-finance-- Subscriber-posting only. If you want to post, subscribe first. -- Also note that this is not the r-help list where general R questions should go.