I have a very simple question and i haven't been able to code it out. It's a simple return calculation of a form: R(t)=(P(t)-P(t-1))/P(t-1) or the same as ratio of prices minus one. However it's in the panel data, where stocks belong to say a fund or some other grouping variable.
In SAS, I would do it this way: data work.1; /* This Calcs the excluded index return for INDEX constructed with PRICES (PR)!!!! */ set work.1; by ticker iticker date_f ; /*sorting by stock, fund and date */ lag_prc=lag(prc); /* creating the lagged price variable if first.iticker then /* if the it's a first date for the grouped unit then.. */ lag_prc= . ; /* setting the first lagged value to missing if the first date for the grouped unit */ return = (prc - lag_prc)/ lag_prc; drop lag_prc; run; The end result would be the same dataset with just one more column for returns for each stock, fund, date combination with the first return for the date set to missing. I tried Return.calculate type of funcs from performance analytics and other solutions using plyr or loops, but they are all too bulky. Any ideas how to accomplish this? Would appreciate any hints, Arsenio _______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-finance -- Subscriber-posting only. If you want to post, subscribe first. -- Also note that this is not the r-help list where general R questions should go. |
Return calculation for panel data structureUse diff and lag from plm:
Lines="ticker date_f date prc iticker A 20-Jun-03 15876 19.06 IXT A 23-Jun-03 15879 18.89 IXT A 24-Jun-03 15880 18.75 IXT A 25-Jun-03 15881 18.92 IXT A 26-Jun-03 15882 19.35 IXT A 27-Jun-03 15883 19.55 IXT AA 20-Jun-03 15876 26.24 IXB AA 23-Jun-03 15879 25.07 IXB AA 24-Jun-03 15880 24.96 IXB AA 25-Jun-03 15881 24.51 IXB AA 26-Jun-03 15882 25.24 IXB AA 27-Jun-03 15883 25.09 IXB AA 30-Jun-03 15886 25.5 IXB " z <- read.table(textConnection(Lines), header = TRUE) library(plm) zz=pdata.frame(z, c("ticker","date")) pr=zz$prc; str(pr) zz$return=diff(pr)/lag(pr) zz Good luck! Robert From: Arsenio Sent: Tuesday, March 06, 2012 2:47 AM To: [hidden email] Subject: [R-SIG-Finance] Return calculation for panel data structure Dear all, I have a very simple question and i haven't been able to code it out. It's a simple return calculation of a form: R(t)=(P(t)-P(t-1))/P(t-1) or the same as ratio of prices minus one. However it's in the panel data, where stocks belong to say a fund or some other grouping variable. ticker date_f date prc iticker A 20-Jun-03 15876 19.06 IXT A 23-Jun-03 15879 18.89 IXT A 24-Jun-03 15880 18.75 IXT A 25-Jun-03 15881 18.92 IXT A 26-Jun-03 15882 19.35 IXT A 27-Jun-03 15883 19.55 IXT AA 20-Jun-03 15876 26.24 IXB AA 23-Jun-03 15879 25.07 IXB AA 24-Jun-03 15880 24.96 IXB AA 25-Jun-03 15881 24.51 IXB AA 26-Jun-03 15882 25.24 IXB AA 27-Jun-03 15883 25.09 IXB AA 30-Jun-03 15886 25.5 IXB In SAS, I would do it this way: data work.1; /* This Calcs the excluded index return for INDEX constructed with PRICES (PR)!!!! */ set work.1; by ticker iticker date_f ; /*sorting by stock, fund and date */ lag_prc=lag(prc); /* creating the lagged price variable if first.iticker then /* if the it's a first date for the grouped unit then.. */ lag_prc= . ; /* setting the first lagged value to missing if the first date for the grouped unit */ return = (prc - lag_prc)/ lag_prc; drop lag_prc; run; The end result would be the same dataset with just one more column for returns for each stock, fund, date combination with the first return for the date set to missing. I tried Return.calculate type of funcs from performance analytics and other solutions using plyr or loops, but they are all too bulky. Any ideas how to accomplish this? Would appreciate any hints, Arsenio -------------------------------------------------------------------------------- _______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-finance -- Subscriber-posting only. If you want to post, subscribe first. -- Also note that this is not the r-help list where general R questions should go. [[alternative HTML version deleted]] _______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-finance -- Subscriber-posting only. If you want to post, subscribe first. -- Also note that this is not the r-help list where general R questions should go. |
Robert,
Thanks for the reply. I tried plm approach but apparently it only accepts one individual id and one time id, because this fails > zz=pdata.frame(z, c("Stock","Fund","Date")) Error in match(x, table, nomatch = 0L) : 'match' requires vector arguments Any approach that accepts several individual observation identifiers? I will take a look at the econometrics view on CRAN Btw, sorry if the post came out mangled due to HTML, didn't check. Thanks, Arsenio Tuesday, March 6, 2012, 5:24:35 AM, you wrote: Use diff and lag from plm: Lines="ticker date_f date prc iticker A 20-Jun-03 15876 19.06 IXT A 23-Jun-03 15879 18.89 IXT A 24-Jun-03 15880 18.75 IXT A 25-Jun-03 15881 18.92 IXT A 26-Jun-03 15882 19.35 IXT A 27-Jun-03 15883 19.55 IXT AA 20-Jun-03 15876 26.24 IXB AA 23-Jun-03 15879 25.07 IXB AA 24-Jun-03 15880 24.96 IXB AA 25-Jun-03 15881 24.51 IXB AA 26-Jun-03 15882 25.24 IXB AA 27-Jun-03 15883 25.09 IXB AA 30-Jun-03 15886 25.5 IXB " z <- read.table(textConnection(Lines), header = TRUE) library(plm) zz=pdata.frame(z, c("ticker","date")) pr=zz$prc; str(pr) zz$return=diff(pr)/lag(pr) zz Good luck! Robert From: Arsenio Sent: Tuesday, March 06, 2012 2:47 AM To: [hidden email] Subject: [R-SIG-Finance] Return calculation for panel data structure Dear all, I have a very simple question and i haven't been able to code it out. It's a simple return calculation of a form: R(t)=(P(t)-P(t-1))/P(t-1) or the same as ratio of prices minus one. However it's in the panel data, where stocks belong to say a fund or some other grouping variable. ticker date_f date prc iticker A 20-Jun-03 15876 19.06 IXT A 23-Jun-03 15879 18.89 IXT A 24-Jun-03 15880 18.75 IXT A 25-Jun-03 15881 18.92 IXT A 26-Jun-03 15882 19.35 IXT A 27-Jun-03 15883 19.55 IXT AA 20-Jun-03 15876 26.24 IXB AA 23-Jun-03 15879 25.07 IXB AA 24-Jun-03 15880 24.96 IXB AA 25-Jun-03 15881 24.51 IXB AA 26-Jun-03 15882 25.24 IXB AA 27-Jun-03 15883 25.09 IXB AA 30-Jun-03 15886 25.5 IXB In SAS, I would do it this way: data work.1; /* This Calcs the excluded index return for INDEX constructed with PRICES (PR)!!!! */ set work.1; by ticker iticker date_f ; /*sorting by stock, fund and date */ lag_prc=lag(prc); /* creating the lagged price variable if first.iticker then /* if the it's a first date for the grouped unit then.. */ lag_prc= . ; /* setting the first lagged value to missing if the first date for the grouped unit */ return = (prc - lag_prc)/ lag_prc; drop lag_prc; run; The end result would be the same dataset with just one more column for returns for each stock, fund, date combination with the first return for the date set to missing. I tried Return.calculate type of funcs from performance analytics and other solutions using plyr or loops, but they are all too bulky. Any ideas how to accomplish this? Would appreciate any hints, Arsenio _______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-finance -- Subscriber-posting only. If you want to post, subscribe first. -- Also note that this is not the r-help list where general R questions should go. -- Best regards, Arsenio mailto:[hidden email] _______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-finance -- Subscriber-posting only. If you want to post, subscribe first. -- Also note that this is not the r-help list where general R questions should go. |
Here is another approach that might work for you.
apply_by_groups <- function (T, cols, F, ...) { # On input, T is a dataframe, and cols is a list of column names. # Groups the dataframe T into groups according to the specified # column names, e.g., all rows in a given group will have identical # values in the specified columns. Within each group, applies the # function F to the dataframe representing that group. Returns the # list of results. NT <- T[do.call("order", lapply(cols, function(c) { T[,c]})),] change <- function(k) { !all(NT[k-1,cols] == NT[k, cols]) } starts <- which(c(TRUE, sapply(2:nrow(NT), change))) ends <- c(starts[2:length(starts)]-1, nrow(NT)) lapply(1:length(starts), function(k) { F(NT[starts[k]:ends[k],,drop=FALSE], ...) }) } ret <- function(X) { c(NA, X[2:length(X)]/X[1:(length(X)-1)]-1)} > do.call("rbind", apply_by_groups(z, "ticker", function(df) { df$ret <- ret(df$prc); df} )) ticker date_f date prc iticker ret 1 A 20-Jun-03 15876 19.06 IXT NA 2 A 23-Jun-03 15879 18.89 IXT -0.008919203 3 A 24-Jun-03 15880 18.75 IXT -0.007411329 4 A 25-Jun-03 15881 18.92 IXT 0.009066667 5 A 26-Jun-03 15882 19.35 IXT 0.022727273 6 A 27-Jun-03 15883 19.55 IXT 0.010335917 7 AA 20-Jun-03 15876 26.24 IXB NA 8 AA 23-Jun-03 15879 25.07 IXB -0.044588415 9 AA 24-Jun-03 15880 24.96 IXB -0.004387714 10 AA 25-Jun-03 15881 24.51 IXB -0.018028846 11 AA 26-Jun-03 15882 25.24 IXB 0.029783762 12 AA 27-Jun-03 15883 25.09 IXB -0.005942948 13 AA 30-Jun-03 15886 25.50 IXB 0.016341172 In the call to apply_by_groups, you can replace "ticker" with a list of names, e.g., c("ticker","fund","manager"), etc. Matthew Clegg On Tue, Mar 6, 2012 at 7:03 PM, Arsenio <[hidden email]> wrote: > Robert, > > Thanks for the reply. I tried plm approach but apparently it only accepts > one individual id and one time id, because this fails > > > zz=pdata.frame(z, c("Stock","Fund","Date")) > Error in match(x, table, nomatch = 0L) : > 'match' requires vector arguments > > Any approach that accepts several individual observation identifiers? I > will take a look at the econometrics view on CRAN > > Btw, sorry if the post came out mangled due to HTML, didn't check. > > Thanks, > > Arsenio > > Tuesday, March 6, 2012, 5:24:35 AM, you wrote: > > > Use diff and lag from plm: > > Lines="ticker date_f date prc iticker > A 20-Jun-03 15876 19.06 IXT > A 23-Jun-03 15879 18.89 IXT > A 24-Jun-03 15880 18.75 IXT > A 25-Jun-03 15881 18.92 IXT > A 26-Jun-03 15882 19.35 IXT > A 27-Jun-03 15883 19.55 IXT > AA 20-Jun-03 15876 26.24 IXB > AA 23-Jun-03 15879 25.07 IXB > AA 24-Jun-03 15880 24.96 IXB > AA 25-Jun-03 15881 24.51 IXB > AA 26-Jun-03 15882 25.24 IXB > AA 27-Jun-03 15883 25.09 IXB > AA 30-Jun-03 15886 25.5 IXB > " > z <- read.table(textConnection(Lines), header = TRUE) > library(plm) > zz=pdata.frame(z, c("ticker","date")) > pr=zz$prc; str(pr) > zz$return=diff(pr)/lag(pr) > zz > > Good luck! > > Robert > > From: Arsenio > Sent: Tuesday, March 06, 2012 2:47 AM > To: [hidden email] > Subject: [R-SIG-Finance] Return calculation for panel data structure > > Dear all, > > I have a very simple question and i haven't been able to code it out. It's > a simple return calculation of a form: > R(t)=(P(t)-P(t-1))/P(t-1) or the same as ratio of prices minus one. > However it's in the panel data, where stocks belong to say a fund or some > other grouping variable. > > > ticker > date_f > > date > prc > iticker > A > 20-Jun-03 > > 15876 > 19.06 > IXT > A > 23-Jun-03 > > 15879 > 18.89 > IXT > A > 24-Jun-03 > > 15880 > 18.75 > IXT > A > 25-Jun-03 > > 15881 > 18.92 > IXT > A > 26-Jun-03 > > 15882 > 19.35 > IXT > A > 27-Jun-03 > > 15883 > 19.55 > IXT > AA > 20-Jun-03 > > 15876 > 26.24 > IXB > AA > 23-Jun-03 > > 15879 > 25.07 > IXB > AA > 24-Jun-03 > > 15880 > 24.96 > IXB > AA > 25-Jun-03 > > 15881 > 24.51 > IXB > AA > 26-Jun-03 > > 15882 > 25.24 > IXB > AA > 27-Jun-03 > > 15883 > 25.09 > IXB > AA > 30-Jun-03 > > 15886 > 25.5 > IXB > > > > In SAS, I would do it this way: > > data work.1; /* This Calcs the excluded index return for INDEX constructed > with PRICES (PR)!!!! */ > set work.1; > by ticker iticker date_f ; /*sorting by stock, fund and date */ > > lag_prc=lag(prc); /* creating the lagged price variable > > if first.iticker then /* if the it's a first date for the grouped > unit then.. */ > lag_prc= . ; /* setting the first lagged value to missing if > the first date for the grouped unit */ > return = (prc - lag_prc)/ lag_prc; > drop lag_prc; > run; > > The end result would be the same dataset with just one more column for > returns for each stock, fund, date combination with the first return for > the date set to missing. > I tried Return.calculate type of funcs from performance analytics and > other solutions using plyr or loops, but they are all too bulky. > > Any ideas how to accomplish this? Would appreciate any hints, > > Arsenio > > _______________________________________________ > [hidden email] mailing list > https://stat.ethz.ch/mailman/listinfo/r-sig-finance > -- Subscriber-posting only. If you want to post, subscribe first. > -- Also note that this is not the r-help list where general R questions > should go. > > > > > -- > Best regards, > Arsenio mailto:[hidden email] > > _______________________________________________ > [hidden email] mailing list > https://stat.ethz.ch/mailman/listinfo/r-sig-finance > -- Subscriber-posting only. If you want to post, subscribe first. > -- Also note that this is not the r-help list where general R questions > should go. > -- Matthew Clegg [hidden email] [[alternative HTML version deleted]] _______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-finance -- Subscriber-posting only. If you want to post, subscribe first. -- Also note that this is not the r-help list where general R questions should go. |
Built into R is ?split or ?aggregate and ?lapply which should serve you well.
Using base R is often the fastest way to skin the cat, if not the prettiest ;-) > library(quantmod) # for Delt > cbind(z,do.call(rbind,lapply(split(z,z$ticker), function(x) Delt(x$prc)))) ticker date_f date prc iticker Delt.1.arithmetic 1 A 20-Jun-03 15876 19.06 IXT NA 2 A 23-Jun-03 15879 18.89 IXT -0.008919203 3 A 24-Jun-03 15880 18.75 IXT -0.007411329 4 A 25-Jun-03 15881 18.92 IXT 0.009066667 5 A 26-Jun-03 15882 19.35 IXT 0.022727273 6 A 27-Jun-03 15883 19.55 IXT 0.010335917 7 AA 20-Jun-03 15876 26.24 IXB NA 8 AA 23-Jun-03 15879 25.07 IXB -0.044588415 9 AA 24-Jun-03 15880 24.96 IXB -0.004387714 10 AA 25-Jun-03 15881 24.51 IXB -0.018028846 11 AA 26-Jun-03 15882 25.24 IXB 0.029783762 12 AA 27-Jun-03 15883 25.09 IXB -0.005942948 13 AA 30-Jun-03 15886 25.50 IXB 0.016341172 See ?split for details on multiple identifiers ( just a list, but your example didn't include it so I won't either) The do.call(rbind/cbind stuff is just to get the original data.frame back - there are more verbose ways to do the same thing. Jeff On Tue, Mar 6, 2012 at 7:38 PM, Matthew Clegg <[hidden email]> wrote: > Here is another approach that might work for you. > > apply_by_groups <- function (T, cols, F, ...) { > # On input, T is a dataframe, and cols is a list of column names. > # Groups the dataframe T into groups according to the specified > # column names, e.g., all rows in a given group will have identical > # values in the specified columns. Within each group, applies the > # function F to the dataframe representing that group. Returns the > # list of results. > > NT <- T[do.call("order", lapply(cols, function(c) { T[,c]})),] > change <- function(k) { !all(NT[k-1,cols] == NT[k, cols]) } > starts <- which(c(TRUE, sapply(2:nrow(NT), change))) > ends <- c(starts[2:length(starts)]-1, nrow(NT)) > lapply(1:length(starts), function(k) { > F(NT[starts[k]:ends[k],,drop=FALSE], ...) }) > } > ret <- function(X) { c(NA, X[2:length(X)]/X[1:(length(X)-1)]-1)} > >> do.call("rbind", apply_by_groups(z, "ticker", function(df) { df$ret <- > ret(df$prc); df} )) > ticker date_f date prc iticker ret > 1 A 20-Jun-03 15876 19.06 IXT NA > 2 A 23-Jun-03 15879 18.89 IXT -0.008919203 > 3 A 24-Jun-03 15880 18.75 IXT -0.007411329 > 4 A 25-Jun-03 15881 18.92 IXT 0.009066667 > 5 A 26-Jun-03 15882 19.35 IXT 0.022727273 > 6 A 27-Jun-03 15883 19.55 IXT 0.010335917 > 7 AA 20-Jun-03 15876 26.24 IXB NA > 8 AA 23-Jun-03 15879 25.07 IXB -0.044588415 > 9 AA 24-Jun-03 15880 24.96 IXB -0.004387714 > 10 AA 25-Jun-03 15881 24.51 IXB -0.018028846 > 11 AA 26-Jun-03 15882 25.24 IXB 0.029783762 > 12 AA 27-Jun-03 15883 25.09 IXB -0.005942948 > 13 AA 30-Jun-03 15886 25.50 IXB 0.016341172 > > In the call to apply_by_groups, you can replace "ticker" with a list of > names, e.g., c("ticker","fund","manager"), etc. > > Matthew Clegg > > On Tue, Mar 6, 2012 at 7:03 PM, Arsenio <[hidden email]> wrote: > >> Robert, >> >> Thanks for the reply. I tried plm approach but apparently it only accepts >> one individual id and one time id, because this fails >> >> > zz=pdata.frame(z, c("Stock","Fund","Date")) >> Error in match(x, table, nomatch = 0L) : >> 'match' requires vector arguments >> >> Any approach that accepts several individual observation identifiers? I >> will take a look at the econometrics view on CRAN >> >> Btw, sorry if the post came out mangled due to HTML, didn't check. >> >> Thanks, >> >> Arsenio >> >> Tuesday, March 6, 2012, 5:24:35 AM, you wrote: >> >> >> Use diff and lag from plm: >> >> Lines="ticker date_f date prc iticker >> A 20-Jun-03 15876 19.06 IXT >> A 23-Jun-03 15879 18.89 IXT >> A 24-Jun-03 15880 18.75 IXT >> A 25-Jun-03 15881 18.92 IXT >> A 26-Jun-03 15882 19.35 IXT >> A 27-Jun-03 15883 19.55 IXT >> AA 20-Jun-03 15876 26.24 IXB >> AA 23-Jun-03 15879 25.07 IXB >> AA 24-Jun-03 15880 24.96 IXB >> AA 25-Jun-03 15881 24.51 IXB >> AA 26-Jun-03 15882 25.24 IXB >> AA 27-Jun-03 15883 25.09 IXB >> AA 30-Jun-03 15886 25.5 IXB >> " >> z <- read.table(textConnection(Lines), header = TRUE) >> library(plm) >> zz=pdata.frame(z, c("ticker","date")) >> pr=zz$prc; str(pr) >> zz$return=diff(pr)/lag(pr) >> zz >> >> Good luck! >> >> Robert >> >> From: Arsenio >> Sent: Tuesday, March 06, 2012 2:47 AM >> To: [hidden email] >> Subject: [R-SIG-Finance] Return calculation for panel data structure >> >> Dear all, >> >> I have a very simple question and i haven't been able to code it out. It's >> a simple return calculation of a form: >> R(t)=(P(t)-P(t-1))/P(t-1) or the same as ratio of prices minus one. >> However it's in the panel data, where stocks belong to say a fund or some >> other grouping variable. >> >> >> ticker >> date_f >> >> date >> prc >> iticker >> A >> 20-Jun-03 >> >> 15876 >> 19.06 >> IXT >> A >> 23-Jun-03 >> >> 15879 >> 18.89 >> IXT >> A >> 24-Jun-03 >> >> 15880 >> 18.75 >> IXT >> A >> 25-Jun-03 >> >> 15881 >> 18.92 >> IXT >> A >> 26-Jun-03 >> >> 15882 >> 19.35 >> IXT >> A >> 27-Jun-03 >> >> 15883 >> 19.55 >> IXT >> AA >> 20-Jun-03 >> >> 15876 >> 26.24 >> IXB >> AA >> 23-Jun-03 >> >> 15879 >> 25.07 >> IXB >> AA >> 24-Jun-03 >> >> 15880 >> 24.96 >> IXB >> AA >> 25-Jun-03 >> >> 15881 >> 24.51 >> IXB >> AA >> 26-Jun-03 >> >> 15882 >> 25.24 >> IXB >> AA >> 27-Jun-03 >> >> 15883 >> 25.09 >> IXB >> AA >> 30-Jun-03 >> >> 15886 >> 25.5 >> IXB >> >> >> >> In SAS, I would do it this way: >> >> data work.1; /* This Calcs the excluded index return for INDEX constructed >> with PRICES (PR)!!!! */ >> set work.1; >> by ticker iticker date_f ; /*sorting by stock, fund and date */ >> >> lag_prc=lag(prc); /* creating the lagged price variable >> >> if first.iticker then /* if the it's a first date for the grouped >> unit then.. */ >> lag_prc= . ; /* setting the first lagged value to missing if >> the first date for the grouped unit */ >> return = (prc - lag_prc)/ lag_prc; >> drop lag_prc; >> run; >> >> The end result would be the same dataset with just one more column for >> returns for each stock, fund, date combination with the first return for >> the date set to missing. >> I tried Return.calculate type of funcs from performance analytics and >> other solutions using plyr or loops, but they are all too bulky. >> >> Any ideas how to accomplish this? Would appreciate any hints, >> >> Arsenio >> >> _______________________________________________ >> [hidden email] mailing list >> https://stat.ethz.ch/mailman/listinfo/r-sig-finance >> -- Subscriber-posting only. If you want to post, subscribe first. >> -- Also note that this is not the r-help list where general R questions >> should go. >> >> >> >> >> -- >> Best regards, >> Arsenio mailto:[hidden email] >> >> _______________________________________________ >> [hidden email] mailing list >> https://stat.ethz.ch/mailman/listinfo/r-sig-finance >> -- Subscriber-posting only. If you want to post, subscribe first. >> -- Also note that this is not the r-help list where general R questions >> should go. >> > > > > -- > Matthew Clegg > [hidden email] > > [[alternative HTML version deleted]] > > _______________________________________________ > [hidden email] mailing list > https://stat.ethz.ch/mailman/listinfo/r-sig-finance > -- Subscriber-posting only. If you want to post, subscribe first. > -- Also note that this is not the r-help list where general R questions should go. -- Jeffrey Ryan [hidden email] www.lemnica.com www.esotericR.com R/Finance 2012: Applied Finance with R www.RinFinance.com See you in Chicago!!!! _______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-finance -- Subscriber-posting only. If you want to post, subscribe first. -- Also note that this is not the r-help list where general R questions should go. |
Jeffrey,
Thanks a lot! I tried this code worked for the single grouping factor variable, now testing it with two which means apparently they have to cover the whole dataset, maybe NA in one of the variable not allowed by the ?split ? Warning message: In split.default(seq_len(nrow(x)), f, drop = drop, ...) : data length is not a multiple of split variable I was looking for something simpler that would also help to understand the code for peers working in SAS, so base R would be a bit too complicated. Thanks for the Delt function that's quite a shortcut! Arsenio Wednesday, March 7, 2012, 8:14:31 AM, you wrote: > Built into R is ?split or ?aggregate and ?lapply which should serve you well. > Using base R is often the fastest way to skin the cat, if not the prettiest ;-) >> library(quantmod) # for Delt >> cbind(z,do.call(rbind,lapply(split(z,z$ticker), function(x) Delt(x$prc)))) > ticker date_f date prc iticker Delt.1.arithmetic > 1 A 20-Jun-03 15876 19.06 IXT NA > 2 A 23-Jun-03 15879 18.89 IXT -0.008919203 > 3 A 24-Jun-03 15880 18.75 IXT -0.007411329 > 4 A 25-Jun-03 15881 18.92 IXT 0.009066667 > 5 A 26-Jun-03 15882 19.35 IXT 0.022727273 > 6 A 27-Jun-03 15883 19.55 IXT 0.010335917 > 7 AA 20-Jun-03 15876 26.24 IXB NA > 8 AA 23-Jun-03 15879 25.07 IXB -0.044588415 > 9 AA 24-Jun-03 15880 24.96 IXB -0.004387714 > 10 AA 25-Jun-03 15881 24.51 IXB -0.018028846 > 11 AA 26-Jun-03 15882 25.24 IXB 0.029783762 > 12 AA 27-Jun-03 15883 25.09 IXB -0.005942948 > 13 AA 30-Jun-03 15886 25.50 IXB 0.016341172 > See ?split for details on multiple identifiers ( just a list, but your > example didn't include it so I won't either) > The do.call(rbind/cbind stuff is just to get the original data.frame > back - there are more verbose ways to do the same thing. > Jeff > On Tue, Mar 6, 2012 at 7:38 PM, Matthew Clegg > <[hidden email]> wrote: >> Here is another approach that might work for you. >> apply_by_groups <- function (T, cols, F, ...) { >> # On input, T is a dataframe, and cols is a list of column names. >> # Groups the dataframe T into groups according to the specified >> # column names, e.g., all rows in a given group will have identical >> # values in the specified columns. Within each group, applies the >> # function F to the dataframe representing that group. Returns the >> # list of results. >> NT <- T[do.call("order", lapply(cols, function(c) { T[,c]})),] >> change <- function(k) { !all(NT[k-1,cols] == NT[k, cols]) } >> starts <- which(c(TRUE, sapply(2:nrow(NT), change))) >> ends <- c(starts[2:length(starts)]-1, nrow(NT)) >> lapply(1:length(starts), function(k) { >> F(NT[starts[k]:ends[k],,drop=FALSE], ...) }) >> } >> ret <- function(X) { c(NA, X[2:length(X)]/X[1:(length(X)-1)]-1)} >>> do.call("rbind", apply_by_groups(z, "ticker", function(df) { df$ret <- >> ret(df$prc); df} )) >> ticker date_f date prc iticker ret >> 1 A 20-Jun-03 15876 19.06 IXT NA >> 2 A 23-Jun-03 15879 18.89 IXT -0.008919203 >> 3 A 24-Jun-03 15880 18.75 IXT -0.007411329 >> 4 A 25-Jun-03 15881 18.92 IXT 0.009066667 >> 5 A 26-Jun-03 15882 19.35 IXT 0.022727273 >> 6 A 27-Jun-03 15883 19.55 IXT 0.010335917 >> 7 AA 20-Jun-03 15876 26.24 IXB NA >> 8 AA 23-Jun-03 15879 25.07 IXB -0.044588415 >> 9 AA 24-Jun-03 15880 24.96 IXB -0.004387714 >> 10 AA 25-Jun-03 15881 24.51 IXB -0.018028846 >> 11 AA 26-Jun-03 15882 25.24 IXB 0.029783762 >> 12 AA 27-Jun-03 15883 25.09 IXB -0.005942948 >> 13 AA 30-Jun-03 15886 25.50 IXB 0.016341172 >> In the call to apply_by_groups, you can replace "ticker" with a list of >> names, e.g., c("ticker","fund","manager"), etc. >> Matthew Clegg >> On Tue, Mar 6, 2012 at 7:03 PM, Arsenio <[hidden email]> wrote: >>> Robert, >>> Thanks for the reply. I tried plm approach but apparently it only accepts >>> one individual id and one time id, because this fails >>> > zz=pdata.frame(z, c("Stock","Fund","Date")) >>> Error in match(x, table, nomatch = 0L) : >>> 'match' requires vector arguments >>> Any approach that accepts several individual observation identifiers? I >>> will take a look at the econometrics view on CRAN >>> Btw, sorry if the post came out mangled due to HTML, didn't check. >>> Thanks, >>> Arsenio >>> Tuesday, March 6, 2012, 5:24:35 AM, you wrote: >>> Use diff and lag from plm: >>> Lines="ticker date_f date prc iticker >>> A 20-Jun-03 15876 19.06 IXT >>> A 23-Jun-03 15879 18.89 IXT >>> A 24-Jun-03 15880 18.75 IXT >>> A 25-Jun-03 15881 18.92 IXT >>> A 26-Jun-03 15882 19.35 IXT >>> A 27-Jun-03 15883 19.55 IXT >>> AA 20-Jun-03 15876 26.24 IXB >>> AA 23-Jun-03 15879 25.07 IXB >>> AA 24-Jun-03 15880 24.96 IXB >>> AA 25-Jun-03 15881 24.51 IXB >>> AA 26-Jun-03 15882 25.24 IXB >>> AA 27-Jun-03 15883 25.09 IXB >>> AA 30-Jun-03 15886 25.5 IXB >>> " >>> z <- read.table(textConnection(Lines), header = TRUE) >>> library(plm) >>> zz=pdata.frame(z, c("ticker","date")) >>> pr=zz$prc; str(pr) >>> zz$return=diff(pr)/lag(pr) >>> zz >>> Good luck! >>> Robert >>> From: Arsenio >>> Sent: Tuesday, March 06, 2012 2:47 AM >>> To: [hidden email] >>> Subject: [R-SIG-Finance] Return calculation for panel data structure >>> Dear all, >>> I have a very simple question and i haven't been able to code it out. It's >>> a simple return calculation of a form: >>> R(t)=(P(t)-P(t-1))/P(t-1) or the same as ratio of prices minus one. >>> However it's in the panel data, where stocks belong to say a fund or some >>> other grouping variable. >>> ticker >>> date_f >>> date >>> prc >>> iticker >>> A >>> 20-Jun-03 >>> 15876 >>> 19.06 >>> IXT >>> A >>> 23-Jun-03 >>> 15879 >>> 18.89 >>> IXT >>> A >>> 24-Jun-03 >>> 15880 >>> 18.75 >>> IXT >>> A >>> 25-Jun-03 >>> 15881 >>> 18.92 >>> IXT >>> A >>> 26-Jun-03 >>> 15882 >>> 19.35 >>> IXT >>> A >>> 27-Jun-03 >>> 15883 >>> 19.55 >>> IXT >>> AA >>> 20-Jun-03 >>> 15876 >>> 26.24 >>> IXB >>> AA >>> 23-Jun-03 >>> 15879 >>> 25.07 >>> IXB >>> AA >>> 24-Jun-03 >>> 15880 >>> 24.96 >>> IXB >>> AA >>> 25-Jun-03 >>> 15881 >>> 24.51 >>> IXB >>> AA >>> 26-Jun-03 >>> 15882 >>> 25.24 >>> IXB >>> AA >>> 27-Jun-03 >>> 15883 >>> 25.09 >>> IXB >>> AA >>> 30-Jun-03 >>> 15886 >>> 25.5 >>> IXB >>> In SAS, I would do it this way: >>> data work.1; /* This Calcs the excluded index return for INDEX constructed >>> with PRICES (PR)!!!! */ >>> set work.1; >>> by ticker iticker date_f ; /*sorting by stock, fund and date */ >>> lag_prc=lag(prc); /* creating the lagged price variable >>> if first.iticker then /* if the it's a first date for the grouped >>> unit then.. */ >>> lag_prc= . ; /* setting the first lagged value to missing if >>> the first date for the grouped unit */ >>> return = (prc - lag_prc)/ lag_prc; >>> drop lag_prc; >>> run; >>> The end result would be the same dataset with just one more column for >>> returns for each stock, fund, date combination with the first return for >>> the date set to missing. >>> I tried Return.calculate type of funcs from performance analytics and >>> other solutions using plyr or loops, but they are all too bulky. >>> Any ideas how to accomplish this? Would appreciate any hints, >>> Arsenio >>> _______________________________________________ >>> [hidden email] mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-sig-finance >>> -- Subscriber-posting only. If you want to post, subscribe first. >>> -- Also note that this is not the r-help list where general R questions >>> should go. >>> -- >>> Best regards, >>> Arsenio mailto:[hidden email] >>> _______________________________________________ >>> [hidden email] mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-sig-finance >>> -- Subscriber-posting only. If you want to post, subscribe first. >>> -- Also note that this is not the r-help list where general R questions >>> should go. >> -- >> Matthew Clegg >> [hidden email] >> [[alternative HTML version deleted]] >> _______________________________________________ >> [hidden email] mailing list >> https://stat.ethz.ch/mailman/listinfo/r-sig-finance >> -- Subscriber-posting only. If you want to post, subscribe first. >> -- Also note that this is not the r-help list where general R questions should go. -- Best regards, Arsenio mailto:[hidden email] _______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-finance -- Subscriber-posting only. If you want to post, subscribe first. -- Also note that this is not the r-help list where general R questions should go. |
In reply to this post by Matthew Clegg
Matthew,
Thanks for the code! I have been trying to analyze this, seems to work ok. It's a bit hard to understand this for me, still a newbie at R. Thanks, Arsenio Tuesday, March 6, 2012, 5:38:47 PM, you wrote: > Here is another approach that might work for you. > apply_by_groups <- function (T, cols, F, ...) { > # On input, T is a dataframe, and cols is a list of column names. > # Groups the dataframe T into groups according to the specified > # column names, e.g., all rows in a given group will have identical > # values in the specified columns. Within each group, applies the > # function F to the dataframe representing that group. Returns the > # list of results. > > NT <- T[do.call("order", lapply(cols, function(c) { T[,c]})),] > change <- function(k) { !all(NT[k-1,cols] == NT[k, cols]) } > starts <- which(c(TRUE, sapply(2:nrow(NT), change))) > ends <- c(starts[2:length(starts)]-1, nrow(NT)) > lapply(1:length(starts), function(k) { > F(NT[starts[k]:ends[k],,drop=FALSE], ...) }) > } > ret <- function(X) { c(NA, X[2:length(X)]/X[1:(length(X)-1)]-1)} >> do.call("rbind", apply_by_groups(z, "ticker", function(df) { df$ret <- ret(df$prc); df} )) > ticker date_f date prc iticker ret > 1 A 20-Jun-03 15876 19.06 IXT NA > 2 A 23-Jun-03 15879 18.89 IXT -0.008919203 > 3 A 24-Jun-03 15880 18.75 IXT -0.007411329 > 4 A 25-Jun-03 15881 18.92 IXT 0.009066667 > 5 A 26-Jun-03 15882 19.35 IXT 0.022727273 > 6 A 27-Jun-03 15883 19.55 IXT 0.010335917 > 7 AA 20-Jun-03 15876 26.24 IXB NA > 8 AA 23-Jun-03 15879 25.07 IXB -0.044588415 > 9 AA 24-Jun-03 15880 24.96 IXB -0.004387714 > 10 AA 25-Jun-03 15881 24.51 IXB -0.018028846 > 11 AA 26-Jun-03 15882 25.24 IXB 0.029783762 > 12 AA 27-Jun-03 15883 25.09 IXB -0.005942948 > 13 AA 30-Jun-03 15886 25.50 IXB 0.016341172 > In the call to apply_by_groups, you can replace "ticker" with a > list of names, e.g., c("ticker","fund","manager"), etc. > Matthew Clegg > On Tue, Mar 6, 2012 at 7:03 PM, Arsenio <[hidden email]> wrote: > Robert, > Thanks for the reply. I tried plm approach but apparently it only > accepts one individual id and one time id, because this fails >> zz=pdata.frame(z, c("Stock","Fund","Date")) > Error in match(x, table, nomatch = 0L) : > 'match' requires vector arguments > Any approach that accepts several individual observation > identifiers? I will take a look at the econometrics view on CRAN > Btw, sorry if the post came out mangled due to HTML, didn't check. > Thanks, > Arsenio > Tuesday, March 6, 2012, 5:24:35 AM, you wrote: > Use diff and lag from plm: > Lines="ticker date_f date prc iticker > A 20-Jun-03 15876 19.06 IXT > A 23-Jun-03 15879 18.89 IXT > A 24-Jun-03 15880 18.75 IXT > A 25-Jun-03 15881 18.92 IXT > A 26-Jun-03 15882 19.35 IXT > A 27-Jun-03 15883 19.55 IXT > AA 20-Jun-03 15876 26.24 IXB > AA 23-Jun-03 15879 25.07 IXB > AA 24-Jun-03 15880 24.96 IXB > AA 25-Jun-03 15881 24.51 IXB > AA 26-Jun-03 15882 25.24 IXB > AA 27-Jun-03 15883 25.09 IXB > AA 30-Jun-03 15886 25.5 IXB > " > z <- read.table(textConnection(Lines), header = TRUE) > library(plm) > zz=pdata.frame(z, c("ticker","date")) > pr=zz$prc; str(pr) > zz$return=diff(pr)/lag(pr) > zz > Good luck! > Robert > From: Arsenio > Sent: Tuesday, March 06, 2012 2:47 AM > To: [hidden email] > Subject: [R-SIG-Finance] Return calculation for panel data structure > Dear all, > I have a very simple question and i haven't been able to code it > out. It's a simple return calculation of a form: > R(t)=(P(t)-P(t-1))/P(t-1) or the same as ratio of prices minus one. > However it's in the panel data, where stocks belong to say a fund or some other grouping variable. > ticker > date_f > date > prc > iticker > A > 20-Jun-03 > 15876 > 19.06 > IXT > A > 23-Jun-03 > 15879 > 18.89 > IXT > A > 24-Jun-03 > 15880 > 18.75 > IXT > A > 25-Jun-03 > 15881 > 18.92 > IXT > A > 26-Jun-03 > 15882 > 19.35 > IXT > A > 27-Jun-03 > 15883 > 19.55 > IXT > AA > 20-Jun-03 > 15876 > 26.24 > IXB > AA > 23-Jun-03 > 15879 > 25.07 > IXB > AA > 24-Jun-03 > 15880 > 24.96 > IXB > AA > 25-Jun-03 > 15881 > 24.51 > IXB > AA > 26-Jun-03 > 15882 > 25.24 > IXB > AA > 27-Jun-03 > 15883 > 25.09 > IXB > AA > 30-Jun-03 > 15886 > 25.5 > IXB > In SAS, I would do it this way: > data work.1; /* This Calcs the excluded index return for INDEX > constructed with PRICES (PR)!!!! */ > set work.1; > by ticker iticker date_f ; /*sorting by stock, fund and date */ > lag_prc=lag(prc); /* creating the lagged price variable > if first.iticker then /* if the it's a first date for the grouped unit then.. */ > lag_prc= . ; /* setting the first lagged value to > missing if the first date for the grouped unit */ > return = (prc - lag_prc)/ lag_prc; > drop lag_prc; > run; > The end result would be the same dataset with just one more column > for returns for each stock, fund, date combination with the first > return for the date set to missing. > I tried Return.calculate type of funcs from performance analytics > and other solutions using plyr or loops, but they are all too bulky. > Any ideas how to accomplish this? Would appreciate any hints, > Arsenio > _______________________________________________ > [hidden email] mailing list > https://stat.ethz.ch/mailman/listinfo/r-sig-finance > -- Subscriber-posting only. If you want to post, subscribe first. > -- Also note that this is not the r-help list where general R questions should go. > -- > Best regards, > Arsenio mailto:[hidden email] > _______________________________________________ > [hidden email] mailing list > https://stat.ethz.ch/mailman/listinfo/r-sig-finance > -- Subscriber-posting only. If you want to post, subscribe first. > -- Also note that this is not the r-help list where general R questions should go. -- Best regards, Arsenio mailto:[hidden email] _______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-finance -- Subscriber-posting only. If you want to post, subscribe first. -- Also note that this is not the r-help list where general R questions should go. |
Hi Arsenio,
Glad to help. Here is how the code works. First, it sorts the rows of the dataframe T according to the columns that will be used for grouping. After this step is complete, all of the elements of a group will be stored in consecutive rows in the new table NT. Then, it computes the starting and ending indexes of each group. Row i represents the start of a new group if one of the elements of the grouping columns differs from its value in the previous row i-1. Finally, the function F is applied to each of the groups. If you have further questions, I would be happy to help, although perhaps it would be best to email me directly so as to avoid taking up bandwidth on the list. Matthew Clegg On Thu, Mar 8, 2012 at 11:46 PM, Arsenio <[hidden email]> wrote: > Matthew, > > Thanks for the code! I have been trying to analyze this, seems to work ok. > It's a bit hard to understand this for me, still a newbie at R. > > Thanks, > > Arsenio > > Tuesday, March 6, 2012, 5:38:47 PM, you wrote: > > > Here is another approach that might work for you. > > > apply_by_groups <- function (T, cols, F, ...) { > > # On input, T is a dataframe, and cols is a list of column names. > > # Groups the dataframe T into groups according to the specified > > # column names, e.g., all rows in a given group will have identical > > # values in the specified columns. Within each group, applies the > > # function F to the dataframe representing that group. Returns the > > # list of results. > > > > NT <- T[do.call("order", lapply(cols, function(c) { T[,c]})),] > > change <- function(k) { !all(NT[k-1,cols] == NT[k, cols]) } > > starts <- which(c(TRUE, sapply(2:nrow(NT), change))) > > ends <- c(starts[2:length(starts)]-1, nrow(NT)) > > lapply(1:length(starts), function(k) { > > F(NT[starts[k]:ends[k],,drop=FALSE], ...) }) > > } > > ret <- function(X) { c(NA, X[2:length(X)]/X[1:(length(X)-1)]-1)} > > >> do.call("rbind", apply_by_groups(z, "ticker", function(df) { df$ret <- > ret(df$prc); df} )) > > ticker date_f date prc iticker ret > > 1 A 20-Jun-03 15876 19.06 IXT NA > > 2 A 23-Jun-03 15879 18.89 IXT -0.008919203 > > 3 A 24-Jun-03 15880 18.75 IXT -0.007411329 > > 4 A 25-Jun-03 15881 18.92 IXT 0.009066667 > > 5 A 26-Jun-03 15882 19.35 IXT 0.022727273 > > 6 A 27-Jun-03 15883 19.55 IXT 0.010335917 > > 7 AA 20-Jun-03 15876 26.24 IXB NA > > 8 AA 23-Jun-03 15879 25.07 IXB -0.044588415 > > 9 AA 24-Jun-03 15880 24.96 IXB -0.004387714 > > 10 AA 25-Jun-03 15881 24.51 IXB -0.018028846 > > 11 AA 26-Jun-03 15882 25.24 IXB 0.029783762 > > 12 AA 27-Jun-03 15883 25.09 IXB -0.005942948 > > 13 AA 30-Jun-03 15886 25.50 IXB 0.016341172 > > > In the call to apply_by_groups, you can replace "ticker" with a > > list of names, e.g., c("ticker","fund","manager"), etc. > > > Matthew Clegg > > > On Tue, Mar 6, 2012 at 7:03 PM, Arsenio <[hidden email]> wrote: > > Robert, > > > Thanks for the reply. I tried plm approach but apparently it only > > accepts one individual id and one time id, because this fails > > >> zz=pdata.frame(z, c("Stock","Fund","Date")) > > Error in match(x, table, nomatch = 0L) : > > 'match' requires vector arguments > > > Any approach that accepts several individual observation > > identifiers? I will take a look at the econometrics view on CRAN > > > Btw, sorry if the post came out mangled due to HTML, didn't check. > > > Thanks, > > > Arsenio > > > Tuesday, March 6, 2012, 5:24:35 AM, you wrote: > > > > Use diff and lag from plm: > > > Lines="ticker date_f date prc iticker > > A 20-Jun-03 15876 19.06 IXT > > A 23-Jun-03 15879 18.89 IXT > > A 24-Jun-03 15880 18.75 IXT > > A 25-Jun-03 15881 18.92 IXT > > A 26-Jun-03 15882 19.35 IXT > > A 27-Jun-03 15883 19.55 IXT > > AA 20-Jun-03 15876 26.24 IXB > > AA 23-Jun-03 15879 25.07 IXB > > AA 24-Jun-03 15880 24.96 IXB > > AA 25-Jun-03 15881 24.51 IXB > > AA 26-Jun-03 15882 25.24 IXB > > AA 27-Jun-03 15883 25.09 IXB > > AA 30-Jun-03 15886 25.5 IXB > > " > > z <- read.table(textConnection(Lines), header = TRUE) > > library(plm) > > zz=pdata.frame(z, c("ticker","date")) > > pr=zz$prc; str(pr) > > zz$return=diff(pr)/lag(pr) > > zz > > > Good luck! > > > Robert > > > From: Arsenio > > Sent: Tuesday, March 06, 2012 2:47 AM > > To: [hidden email] > > Subject: [R-SIG-Finance] Return calculation for panel data structure > > > Dear all, > > > I have a very simple question and i haven't been able to code it > > out. It's a simple return calculation of a form: > > R(t)=(P(t)-P(t-1))/P(t-1) or the same as ratio of prices minus one. > > However it's in the panel data, where stocks belong to say a fund or > some other grouping variable. > > > > ticker > > date_f > > > date > > prc > > iticker > > A > > 20-Jun-03 > > > 15876 > > 19.06 > > IXT > > A > > 23-Jun-03 > > > 15879 > > 18.89 > > IXT > > A > > 24-Jun-03 > > > 15880 > > 18.75 > > IXT > > A > > 25-Jun-03 > > > 15881 > > 18.92 > > IXT > > A > > 26-Jun-03 > > > 15882 > > 19.35 > > IXT > > A > > 27-Jun-03 > > > 15883 > > 19.55 > > IXT > > AA > > 20-Jun-03 > > > 15876 > > 26.24 > > IXB > > AA > > 23-Jun-03 > > > 15879 > > 25.07 > > IXB > > AA > > 24-Jun-03 > > > 15880 > > 24.96 > > IXB > > AA > > 25-Jun-03 > > > 15881 > > 24.51 > > IXB > > AA > > 26-Jun-03 > > > 15882 > > 25.24 > > IXB > > AA > > 27-Jun-03 > > > 15883 > > 25.09 > > IXB > > AA > > 30-Jun-03 > > > 15886 > > 25.5 > > IXB > > > > > In SAS, I would do it this way: > > > data work.1; /* This Calcs the excluded index return for INDEX > > constructed with PRICES (PR)!!!! */ > > set work.1; > > by ticker iticker date_f ; /*sorting by stock, fund and date */ > > > lag_prc=lag(prc); /* creating the lagged price variable > > > if first.iticker then /* if the it's a first date for the grouped > unit then.. */ > > lag_prc= . ; /* setting the first lagged value to > > missing if the first date for the grouped unit */ > > return = (prc - lag_prc)/ lag_prc; > > drop lag_prc; > > run; > > > The end result would be the same dataset with just one more column > > for returns for each stock, fund, date combination with the first > > return for the date set to missing. > > I tried Return.calculate type of funcs from performance analytics > > and other solutions using plyr or loops, but they are all too bulky. > > > Any ideas how to accomplish this? Would appreciate any hints, > > > Arsenio > > > _______________________________________________ > > [hidden email] mailing list > > https://stat.ethz.ch/mailman/listinfo/r-sig-finance > > -- Subscriber-posting only. If you want to post, subscribe first. > > -- Also note that this is not the r-help list where general R questions > should go. > > > > > > -- > > Best regards, > > Arsenio mailto:[hidden email] > > > _______________________________________________ > > [hidden email] mailing list > > https://stat.ethz.ch/mailman/listinfo/r-sig-finance > > -- Subscriber-posting only. If you want to post, subscribe first. > > -- Also note that this is not the r-help list where general R questions > should go. > > > > > > > -- > Best regards, > Arsenio mailto:[hidden email] > > -- Matthew Clegg [hidden email] [[alternative HTML version deleted]] _______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-finance -- Subscriber-posting only. If you want to post, subscribe first. -- Also note that this is not the r-help list where general R questions should go. |
In reply to this post by Arsenio
It should work, but you didn't provide the data that it is breaking on, so...
Hijacking the iticker column, we can make that our second identifier to group with: z <- rbind(z,transform(z,iticker=paste(iticker,2,sep=""))) #... some z to see what this is doing # #11 AA 26-Jun-03 15882 25.24 IXB #12 AA 27-Jun-03 15883 25.09 IXB #13 AA 30-Jun-03 15886 25.50 IXB #14 A 20-Jun-03 15876 19.06 IXT2 #15 A 23-Jun-03 15879 18.89 IXT2 #16 A 24-Jun-03 15880 18.75 IXT2 #17 A 25-Jun-03 15881 18.92 IXT2 #... And now to get return calculations by ticker, iticker do this: > cbind(z,do.call(rbind,lapply(split(z,list(z$ticker,z$iticker)), function(x) Delt(x$prc)))) ticker date_f date prc iticker Delt.1.arithmetic 1 A 20-Jun-03 15876 19.06 IXT NA 2 A 23-Jun-03 15879 18.89 IXT -0.044588415 3 A 24-Jun-03 15880 18.75 IXT -0.004387714 4 A 25-Jun-03 15881 18.92 IXT -0.018028846 5 A 26-Jun-03 15882 19.35 IXT 0.029783762 6 A 27-Jun-03 15883 19.55 IXT -0.005942948 7 AA 20-Jun-03 15876 26.24 IXB 0.016341172 8 AA 23-Jun-03 15879 25.07 IXB NA 9 AA 24-Jun-03 15880 24.96 IXB -0.008919203 10 AA 25-Jun-03 15881 24.51 IXB -0.007411329 11 AA 26-Jun-03 15882 25.24 IXB 0.009066667 12 AA 27-Jun-03 15883 25.09 IXB 0.022727273 13 AA 30-Jun-03 15886 25.50 IXB 0.010335917 14 A 20-Jun-03 15876 19.06 IXT2 NA 15 A 23-Jun-03 15879 18.89 IXT2 -0.008919203 16 A 24-Jun-03 15880 18.75 IXT2 -0.007411329 17 A 25-Jun-03 15881 18.92 IXT2 0.009066667 18 A 26-Jun-03 15882 19.35 IXT2 0.022727273 19 A 27-Jun-03 15883 19.55 IXT2 0.010335917 20 AA 20-Jun-03 15876 26.24 IXB2 NA 21 AA 23-Jun-03 15879 25.07 IXB2 -0.044588415 22 AA 24-Jun-03 15880 24.96 IXB2 -0.004387714 23 AA 25-Jun-03 15881 24.51 IXB2 -0.018028846 24 AA 26-Jun-03 15882 25.24 IXB2 0.029783762 25 AA 27-Jun-03 15883 25.09 IXB2 -0.005942948 26 AA 30-Jun-03 15886 25.50 IXB2 0.016341172 If you have a missing value in one of your identifiers (a factor here), you will get an error, but na.omit() called on 'z' will remove it before processing. If you don't have the value I don't see how you could properly group it anyway though. As far as 'base R being too complicated for SAS users' ;-) ... I still think you are much better off sticking as close to the core language as you can - as that gives you the most understanding and flexibility. You could also functionalize the scary parts: make_it_happen <- function(z) { require("quantmod") cbind(z,do.call(rbind,lapply(split(z,list(z$ticker,z$iticker)), function(x) Delt(x$prc)))) } make_it_happen(z) Joking aside, the code is actually not bad if you take it apart as Matthew did with his. 1) split(z, list(z$ticker, z$iticker)) returns a list of data.frames that are grouped by your identifiers (factors in R-speak) 2) lapply uses each element in the list and applies the function(x) ... to it. 3) do.call(rbind takes the list returned by lapply (columns of returns) and row binds them 4) cbind just attached the new column from (3) to the original 'z' With that, we've diverged pretty hard from "finance" - so further questions should probably be sent to R-help. HTH Jeff On Thu, Mar 8, 2012 at 10:43 PM, Arsenio <[hidden email]> wrote: > Jeffrey, > > Thanks a lot! > I tried this code worked for the single grouping factor variable, now testing it with two which means apparently they have to cover the whole dataset, maybe NA in one of the variable not allowed by the ?split ? > > Warning message: > In split.default(seq_len(nrow(x)), f, drop = drop, ...) : > data length is not a multiple of split variable > > I was looking for something simpler that would also help to understand the code for peers working in SAS, so base R would be a bit too complicated. > > Thanks for the Delt function that's quite a shortcut! > > Arsenio > > > > Wednesday, March 7, 2012, 8:14:31 AM, you wrote: > >> Built into R is ?split or ?aggregate and ?lapply which should serve you well. > >> Using base R is often the fastest way to skin the cat, if not the prettiest ;-) > > >>> library(quantmod) # for Delt >>> cbind(z,do.call(rbind,lapply(split(z,z$ticker), function(x) Delt(x$prc)))) > >> ticker date_f date prc iticker Delt.1.arithmetic >> 1 A 20-Jun-03 15876 19.06 IXT NA >> 2 A 23-Jun-03 15879 18.89 IXT -0.008919203 >> 3 A 24-Jun-03 15880 18.75 IXT -0.007411329 >> 4 A 25-Jun-03 15881 18.92 IXT 0.009066667 >> 5 A 26-Jun-03 15882 19.35 IXT 0.022727273 >> 6 A 27-Jun-03 15883 19.55 IXT 0.010335917 >> 7 AA 20-Jun-03 15876 26.24 IXB NA >> 8 AA 23-Jun-03 15879 25.07 IXB -0.044588415 >> 9 AA 24-Jun-03 15880 24.96 IXB -0.004387714 >> 10 AA 25-Jun-03 15881 24.51 IXB -0.018028846 >> 11 AA 26-Jun-03 15882 25.24 IXB 0.029783762 >> 12 AA 27-Jun-03 15883 25.09 IXB -0.005942948 >> 13 AA 30-Jun-03 15886 25.50 IXB 0.016341172 > >> See ?split for details on multiple identifiers ( just a list, but your >> example didn't include it so I won't either) > >> The do.call(rbind/cbind stuff is just to get the original data.frame >> back - there are more verbose ways to do the same thing. > >> Jeff > > >> On Tue, Mar 6, 2012 at 7:38 PM, Matthew Clegg >> <[hidden email]> wrote: >>> Here is another approach that might work for you. > >>> apply_by_groups <- function (T, cols, F, ...) { >>> # On input, T is a dataframe, and cols is a list of column names. >>> # Groups the dataframe T into groups according to the specified >>> # column names, e.g., all rows in a given group will have identical >>> # values in the specified columns. Within each group, applies the >>> # function F to the dataframe representing that group. Returns the >>> # list of results. > >>> NT <- T[do.call("order", lapply(cols, function(c) { T[,c]})),] >>> change <- function(k) { !all(NT[k-1,cols] == NT[k, cols]) } >>> starts <- which(c(TRUE, sapply(2:nrow(NT), change))) >>> ends <- c(starts[2:length(starts)]-1, nrow(NT)) >>> lapply(1:length(starts), function(k) { >>> F(NT[starts[k]:ends[k],,drop=FALSE], ...) }) >>> } >>> ret <- function(X) { c(NA, X[2:length(X)]/X[1:(length(X)-1)]-1)} > >>>> do.call("rbind", apply_by_groups(z, "ticker", function(df) { df$ret <- >>> ret(df$prc); df} )) >>> ticker date_f date prc iticker ret >>> 1 A 20-Jun-03 15876 19.06 IXT NA >>> 2 A 23-Jun-03 15879 18.89 IXT -0.008919203 >>> 3 A 24-Jun-03 15880 18.75 IXT -0.007411329 >>> 4 A 25-Jun-03 15881 18.92 IXT 0.009066667 >>> 5 A 26-Jun-03 15882 19.35 IXT 0.022727273 >>> 6 A 27-Jun-03 15883 19.55 IXT 0.010335917 >>> 7 AA 20-Jun-03 15876 26.24 IXB NA >>> 8 AA 23-Jun-03 15879 25.07 IXB -0.044588415 >>> 9 AA 24-Jun-03 15880 24.96 IXB -0.004387714 >>> 10 AA 25-Jun-03 15881 24.51 IXB -0.018028846 >>> 11 AA 26-Jun-03 15882 25.24 IXB 0.029783762 >>> 12 AA 27-Jun-03 15883 25.09 IXB -0.005942948 >>> 13 AA 30-Jun-03 15886 25.50 IXB 0.016341172 > >>> In the call to apply_by_groups, you can replace "ticker" with a list of >>> names, e.g., c("ticker","fund","manager"), etc. > >>> Matthew Clegg > >>> On Tue, Mar 6, 2012 at 7:03 PM, Arsenio <[hidden email]> wrote: > >>>> Robert, > >>>> Thanks for the reply. I tried plm approach but apparently it only accepts >>>> one individual id and one time id, because this fails > >>>> > zz=pdata.frame(z, c("Stock","Fund","Date")) >>>> Error in match(x, table, nomatch = 0L) : >>>> 'match' requires vector arguments > >>>> Any approach that accepts several individual observation identifiers? I >>>> will take a look at the econometrics view on CRAN > >>>> Btw, sorry if the post came out mangled due to HTML, didn't check. > >>>> Thanks, > >>>> Arsenio > >>>> Tuesday, March 6, 2012, 5:24:35 AM, you wrote: > > >>>> Use diff and lag from plm: > >>>> Lines="ticker date_f date prc iticker >>>> A 20-Jun-03 15876 19.06 IXT >>>> A 23-Jun-03 15879 18.89 IXT >>>> A 24-Jun-03 15880 18.75 IXT >>>> A 25-Jun-03 15881 18.92 IXT >>>> A 26-Jun-03 15882 19.35 IXT >>>> A 27-Jun-03 15883 19.55 IXT >>>> AA 20-Jun-03 15876 26.24 IXB >>>> AA 23-Jun-03 15879 25.07 IXB >>>> AA 24-Jun-03 15880 24.96 IXB >>>> AA 25-Jun-03 15881 24.51 IXB >>>> AA 26-Jun-03 15882 25.24 IXB >>>> AA 27-Jun-03 15883 25.09 IXB >>>> AA 30-Jun-03 15886 25.5 IXB >>>> " >>>> z <- read.table(textConnection(Lines), header = TRUE) >>>> library(plm) >>>> zz=pdata.frame(z, c("ticker","date")) >>>> pr=zz$prc; str(pr) >>>> zz$return=diff(pr)/lag(pr) >>>> zz > >>>> Good luck! > >>>> Robert > >>>> From: Arsenio >>>> Sent: Tuesday, March 06, 2012 2:47 AM >>>> To: [hidden email] >>>> Subject: [R-SIG-Finance] Return calculation for panel data structure > >>>> Dear all, > >>>> I have a very simple question and i haven't been able to code it out. It's >>>> a simple return calculation of a form: >>>> R(t)=(P(t)-P(t-1))/P(t-1) or the same as ratio of prices minus one. >>>> However it's in the panel data, where stocks belong to say a fund or some >>>> other grouping variable. > > >>>> ticker >>>> date_f > >>>> date >>>> prc >>>> iticker >>>> A >>>> 20-Jun-03 > >>>> 15876 >>>> 19.06 >>>> IXT >>>> A >>>> 23-Jun-03 > >>>> 15879 >>>> 18.89 >>>> IXT >>>> A >>>> 24-Jun-03 > >>>> 15880 >>>> 18.75 >>>> IXT >>>> A >>>> 25-Jun-03 > >>>> 15881 >>>> 18.92 >>>> IXT >>>> A >>>> 26-Jun-03 > >>>> 15882 >>>> 19.35 >>>> IXT >>>> A >>>> 27-Jun-03 > >>>> 15883 >>>> 19.55 >>>> IXT >>>> AA >>>> 20-Jun-03 > >>>> 15876 >>>> 26.24 >>>> IXB >>>> AA >>>> 23-Jun-03 > >>>> 15879 >>>> 25.07 >>>> IXB >>>> AA >>>> 24-Jun-03 > >>>> 15880 >>>> 24.96 >>>> IXB >>>> AA >>>> 25-Jun-03 > >>>> 15881 >>>> 24.51 >>>> IXB >>>> AA >>>> 26-Jun-03 > >>>> 15882 >>>> 25.24 >>>> IXB >>>> AA >>>> 27-Jun-03 > >>>> 15883 >>>> 25.09 >>>> IXB >>>> AA >>>> 30-Jun-03 > >>>> 15886 >>>> 25.5 >>>> IXB > > > >>>> In SAS, I would do it this way: > >>>> data work.1; /* This Calcs the excluded index return for INDEX constructed >>>> with PRICES (PR)!!!! */ >>>> set work.1; >>>> by ticker iticker date_f ; /*sorting by stock, fund and date */ > >>>> lag_prc=lag(prc); /* creating the lagged price variable > >>>> if first.iticker then /* if the it's a first date for the grouped >>>> unit then.. */ >>>> lag_prc= . ; /* setting the first lagged value to missing if >>>> the first date for the grouped unit */ >>>> return = (prc - lag_prc)/ lag_prc; >>>> drop lag_prc; >>>> run; > >>>> The end result would be the same dataset with just one more column for >>>> returns for each stock, fund, date combination with the first return for >>>> the date set to missing. >>>> I tried Return.calculate type of funcs from performance analytics and >>>> other solutions using plyr or loops, but they are all too bulky. > >>>> Any ideas how to accomplish this? Would appreciate any hints, > >>>> Arsenio > >>>> _______________________________________________ >>>> [hidden email] mailing list >>>> https://stat.ethz.ch/mailman/listinfo/r-sig-finance >>>> -- Subscriber-posting only. If you want to post, subscribe first. >>>> -- Also note that this is not the r-help list where general R questions >>>> should go. > > > > >>>> -- >>>> Best regards, >>>> Arsenio mailto:[hidden email] > >>>> _______________________________________________ >>>> [hidden email] mailing list >>>> https://stat.ethz.ch/mailman/listinfo/r-sig-finance >>>> -- Subscriber-posting only. If you want to post, subscribe first. >>>> -- Also note that this is not the r-help list where general R questions >>>> should go. > > > > >>> -- >>> Matthew Clegg >>> [hidden email] > >>> [[alternative HTML version deleted]] > >>> _______________________________________________ >>> [hidden email] mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-sig-finance >>> -- Subscriber-posting only. If you want to post, subscribe first. >>> -- Also note that this is not the r-help list where general R questions should go. > > > > > > > -- > Best regards, > Arsenio mailto:[hidden email] > > _______________________________________________ > [hidden email] mailing list > https://stat.ethz.ch/mailman/listinfo/r-sig-finance > -- Subscriber-posting only. If you want to post, subscribe first. > -- Also note that this is not the r-help list where general R questions should go. -- Jeffrey Ryan [hidden email] www.lemnica.com www.esotericR.com R/Finance 2012: Applied Finance with R www.RinFinance.com See you in Chicago!!!! _______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-finance -- Subscriber-posting only. If you want to post, subscribe first. -- Also note that this is not the r-help list where general R questions should go. |
Free forum by Nabble | Edit this page |