Hi,

This question is of a general nature: how do people handle panel data in R? For example, I have returns of firms, and each firm has daily observations. One way is to use the plm package; another is to use plyr and do the operations on (date, firmid) units, using something like zoo as a container for each firm so that lagging and differencing can be done. For regression it seems that plm might be the better option? Just curious whether somebody has a well-worked-out system for this.

Thanks,
Alex

_______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-finance
-- Subscriber-posting only. If you want to post, subscribe first.
-- Also note that this is not the r-help list where general R questions should go.
What kind of models do you plan on using?

If you plan on using time series models, then I suggest generating a list where each entry is one firm. This will make it easy to fit models with lapply.

If you plan on using panel models, then I suggest using plm. It is easy enough to manually code the within and between estimators, but if you use clustered standard errors or dynamic panel models, then plm will make your life a lot easier.

Richard Herron

On Fri, May 4, 2012 at 6:30 PM, Alexander Chernyakov <[hidden email]> wrote:
> [...]
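A minimal sketch of the list-plus-lapply approach, using made-up firm IDs, dates, and a toy market-model regression (base R only):

```r
# Toy panel: three firms, 50 daily observations each (all values made up).
set.seed(1)
panel <- data.frame(
  firmid = rep(c("A", "B", "C"), each = 50),
  date   = rep(seq(as.Date("2012-01-02"), by = "day", length.out = 50), 3),
  ret    = rnorm(150),
  mktret = rnorm(150)
)

# One list entry per firm ...
by_firm <- split(panel, panel$firmid)

# ... so a per-firm time series model is a single lapply call.
fits  <- lapply(by_firm, function(d) lm(ret ~ mktret, data = d))

# Collect a per-firm statistic across the list.
betas <- sapply(fits, function(f) coef(f)[["mktret"]])
```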
Hi Richard,

Thanks for your response. One issue I have run into with plm is that it seems to be fairly slow with large data sets (14 million (date, firm) points). Any tricks for this? Also, it seems not to handle irregularly spaced time points: it fills in the missing ones with NA, so lagging and differencing don't work correctly. Do you have any advice on fixing this?

Thanks,
Alex

On Sat, May 5, 2012 at 8:43 AM, Richard Herron <[hidden email]> wrote:
> [...]
What kind of models are you estimating? I would use plm if I were doing models with firm fixed effects (FE), but I don't think I see firm FE with daily observations; I usually see firm FE at the annual level.

If you're either estimating time series models or aggregating daily observations to the month level for cross-sectional models, then a list of firm-level time series would be best (or, if you're only using the return series, you could put everything in one wide xts or zoo object).

Re: missing data. xts has na.locf() for carrying forward the last non-missing observation. I tend to leave missing observations as missing.

Could you provide an example of what you would like to estimate?

Richard Herron

On Sat, May 5, 2012 at 11:30 AM, Alexander Chernyakov <[hidden email]> wrote:
> [...]
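For what it's worth, a minimal illustration of the carry-forward idea (na.locf() is defined in the zoo package, which xts builds on; the series values here are made up):

```r
library(zoo)

# Toy daily series with missing observations.
x <- zoo(c(1.2, NA, NA, 1.5, NA), as.Date("2012-05-01") + 0:4)

# Carry the last non-missing value forward; leading NAs (none here) are
# dropped by default unless you pass na.rm = FALSE.
x_filled <- na.locf(x)
```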
Sure. I will be using fixed effects for some things. I will mostly be running regressions (sometimes fixed effects, but a lot of the time they will be Fama-MacBeth type), but the key thing I am looking for is the ability to lag things on the fly, without having to run an apply statement to split everything up into a list, lag each firm individually, recombine the list, and then run a regression.

I currently use zoo, but I was under the impression that there is some limit to the number of columns one can have, no? With 30k firms it might not be possible to have such a wide zoo object... am I incorrect about this?

On Sat, May 5, 2012 at 3:08 PM, Richard Herron <[hidden email]> wrote:
> [...]
I think I would work with the daily data in lists of xts objects (or one wide xts object if you only have return series), but once I aggregated to the month/year level I would use plm. I don't know of a width limit for xts or data.frame, but I never go close to 30,000; I put each security into a list as its own xts object.

I typically see people aggregate daily data to monthly data, requiring at least 15 or so daily observations per month (e.g., when generating idiosyncratic volatility from a daily return series). This should generate an (unbalanced) panel that you can feed to plm. You can specify formulas with lags on the fly; if you try to lag a variable but the lag isn't there, plm drops the observation.

To use the time series operators in plm estimators you just have to format your data properly (either put the individual and time indexes in the first two columns, or use plm.data()).

Richard Herron

On Sat, May 5, 2012 at 3:14 PM, Alexander Chernyakov <[hidden email]> wrote:
> [...]
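A minimal sketch of lagging on the fly in plm, using the Grunfeld data shipped with the package (the particular regression is just for illustration):

```r
library(plm)

data("Grunfeld", package = "plm")  # columns: firm, year, inv, value, capital

# Within (firm fixed effects) estimator with a one-period lag specified
# directly in the formula; observations whose lag is unavailable (each
# firm's first year) are dropped rather than filled in.
fe <- plm(inv ~ lag(value, 1) + capital,
          data  = Grunfeld,
          index = c("firm", "year"),
          model = "within")
summary(fe)
```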
Interesting, thank you! Do you find it faster to use lists of xts objects, or to use plyr on data frames (converting each piece to a zoo object, doing what you want, converting back to a data frame, and returning it, which is what I currently do)?

Thanks,
Alex

On Sat, May 5, 2012 at 6:47 PM, Richard Herron <[hidden email]> wrote:
> [...]
I think the splitting is fairly expensive computationally (especially if you have 14 million observations and 30k firms), so you probably want to minimize the number of splits and recombines that you do. If you can wrap everything in one function (i.e., convert to xts/zoo and do all transformations) that you call once from ddply(), then the two techniques should be roughly equivalent. If you call ddply() over and over again, then you're better off creating a list and then applying the functions to the list.

Richard Herron

On Sat, May 5, 2012 at 7:06 PM, Alexander Chernyakov <[hidden email]> wrote:
> [...]
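A sketch of the one-pass idea: fold every per-firm transformation into a single function and call ddply() once (toy data; the column and function names are made up):

```r
library(plyr)

# Toy panel (values made up).
set.seed(2)
df <- data.frame(
  firmid = rep(c("X", "Y"), each = 5),
  date   = rep(as.Date("2012-05-01") + 0:4, 2),
  ret    = rnorm(10)
)

# All per-firm transformations in one function, so the panel is split
# and recombined exactly once.
transform_firm <- function(d) {
  d <- d[order(d$date), ]
  d$ret_lag  <- c(NA, head(d$ret, -1))  # one-day lag within the firm
  d$ret_diff <- c(NA, diff(d$ret))      # one-day difference within the firm
  d
}

out <- ddply(df, .(firmid), transform_firm)
```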