using yahoo and other data to calculate CAPM and FF betas


Andrew West
Thanks to help from others on the list last year, I
was able to improve a function for calculating CAPM
beta coefficients and Fama-French coefficients, using
data gathered from the internet.

I have begun working on a more challenging calculation
of Fama-French betas, using a panel data set. After a
couple of days, I now have a roughly working function
that takes a list of stocks (which should be in the
same industry), estimates coefficients using
mixed-effects models, and compares them with
least-squares estimates. The problem right now is that the function
is not very smart, and I haven't been able to figure
out how to prevent the function from crashing when one
of the companies in the list has a late start or
missing data. Aligning multiple time series into a
panel data dataframe is tough for non-programmers like
me!

The first function is getrffBeta, requiring a number
of packages. It's activated by typing something like:
getrffBeta("GE", 60)
[this indicates you want to analyse GE, using a
rolling 60 month window.]
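The getrffBeta code itself is only in the attachment, so what follows is a hypothetical base-R sketch of what a rolling 60-month CAPM beta involves: in each trailing window, beta is the OLS slope of the stock's excess return on the market's excess return, i.e. cov(r_i, r_m) / var(r_m). The function name rolling_beta and the simulated returns are illustrative assumptions, not Andrew's code or data.

```r
# Hypothetical helper, NOT the attached getrffBeta: rolling CAPM beta as the
# OLS slope cov(r_i, r_m) / var(r_m) over a trailing window of `width` months.
rolling_beta <- function(stock, market, width) {
  n <- length(stock)
  stopifnot(length(market) == n, width >= 2, width <= n)
  # One estimate per window, indexed by the month the window ends in
  vapply(width:n, function(end) {
    idx <- (end - width + 1):end
    cov(stock[idx], market[idx]) / var(market[idx])
  }, numeric(1))
}

# Simulated monthly excess returns (illustrative data only)
set.seed(1)
market <- rnorm(120, mean = 0.005, sd = 0.04)
stock  <- 1.2 * market + rnorm(120, sd = 0.02)

betas <- rolling_beta(stock, market, width = 60)
length(betas)  # 61 overlapping 60-month windows
```

zoo's rollapply() is a common alternative for the windowing; the explicit loop is used here only to keep the sketch free of package dependencies.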

The second, rougher function is getpanelbeta. It's
activated by typing something like:
getpanelbeta(c("BNI","CSX","NSC","UNP"),"2000-01-01")
[the list of stocks and the starting date of your
analysis]
I hope this proves to be of use to someone, and would
welcome any feedback and/or improvements regarding
this code.
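One way to keep late starters from crashing a panel function is to build the panel in long format (one row per ticker and month) and let a join on the date do the alignment. The sketch below is a hypothetical base-R helper, make_panel(), not the attached getpanelbeta; the return and market-factor values are made up for illustration, with CSX given a late start and BNI a missing month.

```r
# Hypothetical helper, NOT the attached getpanelbeta: stack per-ticker
# return data frames into one long panel and join factor data by date.
make_panel <- function(returns_list, factors) {
  pieces <- lapply(names(returns_list), function(tk) {
    d <- returns_list[[tk]]
    d$ticker <- tk                      # tag each row with its ticker
    d
  })
  long  <- do.call(rbind, pieces)
  # Inner join on date: rows whose date has no factor data are dropped
  panel <- merge(long, factors, by = "date")
  # Drop rows with any NA, e.g. a month with no return for that stock
  panel <- panel[complete.cases(panel), ]
  panel <- panel[order(panel$ticker, panel$date), ]
  rownames(panel) <- NULL
  panel
}

# Illustrative data: CSX starts two months late, BNI lacks its last return
months  <- seq(as.Date("2000-01-01"), by = "month", length.out = 4)
factors <- data.frame(date = months, mkt = c(0.01, -0.02, 0.03, 0.01))
rets <- list(
  BNI = data.frame(date = months,      ret = c(0.02, -0.01, 0.04, NA)),
  CSX = data.frame(date = months[3:4], ret = c(0.05, 0.00))
)
panel <- make_panel(rets, factors)
table(panel$ticker)  # BNI 3, CSX 2
```

With a real sample, the resulting data frame can go straight into lm(ret ~ mkt, data = panel) or a mixed-effects fit such as nlme::lme(ret ~ mkt, random = ~ mkt | ticker, data = panel); the four-month toy panel here is too small for the latter.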

Because YahooMail destroys the code formatting, I'm
attaching the functions as two text (.r) files.

Regards,
Andrew West


_______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-finance

Attachments: getrffBeta.r (6K), getpanelbeta.r (2K)

Re: using yahoo and other data to calculate CAPM and FF betas

Gabor Grothendieck
On 4/6/06, Andrew West <[hidden email]> wrote:
> I haven't been able to figure
> out how to prevent the function from crashing when one
> of the companies in the list has a late start or
> missing data. Aligning multiple time series into a
> panel data dataframe is tough for non-programmers like
> me!

If t1 and t2 are two ts class time series or two zoo series,
then cbind(t1, t2) will create a multivariate series (2 columns).
In the case of zoo, merge(t1, t2) will also work.

na.omit(cbind(t1, t2)) or na.omit(merge(t1,t2))
will eliminate rows that have any NAs in the case of zoo series.
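A small illustration of this, assuming the zoo package is installed: two made-up monthly return series, the second starting two months late, are merged on their dates and then reduced to the months where both are observed.

```r
library(zoo)

# Illustrative monthly returns; t2 starts two months after t1
months <- seq(as.Date("2000-01-01"), by = "month", length.out = 6)
t1 <- zoo(c(0.01, -0.02, 0.03, 0.00, 0.01, 0.02), months)
t2 <- zoo(c(0.05, -0.01, 0.02, 0.04), months[3:6])

both    <- merge(t1, t2)   # union of dates; t2 is NA for Jan and Feb 2000
aligned <- na.omit(both)   # keep only months where both series are observed

nrow(both)     # 6
nrow(aligned)  # 4
```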


Re: using yahoo and other data to calculate CAPM and FF betas

Krishna Kumar-2

To add to Gabor's suggestion, you could do the following to get an
interpolated series. If mydata is a vector containing NAs, then

 > mydata <- approx(mydata, xout = seq(along = mydata))$y

replaces the NAs by linear interpolation, and you can then combine
the series with ts.union.
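The approx() one-liner and the ts.union step, sketched end to end in base R. The vectors and the January 2000 start date are illustrative. Note that approx() with its default rule = 1 only fills interior gaps; leading or trailing NAs stay NA.

```r
# Linear interpolation of interior NAs with base R's approx()
mydata <- c(1.0, NA, 3.0, NA, 5.0)
filled <- approx(mydata, xout = seq_along(mydata))$y
filled  # 1 2 3 4 5

# Leading/trailing NAs are not filled under the default rule = 1
approx(c(NA, 2, NA, 6), xout = 1:4)$y  # NA 2 4 6

# Combine with a shorter monthly series; ts.union pads the gaps with NA
s1 <- ts(filled, start = c(2000, 1), frequency = 12)
s2 <- ts(c(2, 4, 6), start = c(2000, 3), frequency = 12)
u  <- ts.union(s1, s2)
dim(u)  # 5 2
```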

Also, there was a very interesting paper showing that the Fama-French
effect was not really an anomaly when you estimate using
robust regression instead of OLS. I can't remember the reference, but it
was Doug Martin and someone else from UW ...
R has some nice facilities in rrcov for robust regression!
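To make the OLS-versus-robust contrast concrete, the following compares an OLS beta with a robust one on simulated returns containing a few gross outliers. It swaps in MASS::rlm() with method = "MM" (MASS ships with R) as a stand-in for the rrcov routines mentioned above; the data and the true beta of 1.2 are assumptions of the example, not results from the paper.

```r
library(MASS)  # recommended package shipped with R; provides rlm()

set.seed(2)
market <- rnorm(60, sd = 0.04)                 # illustrative monthly excess returns
stock  <- 1.2 * market + rnorm(60, sd = 0.01)  # true beta is 1.2 by construction
# Contaminate three months with gross outliers
market[1:3] <- 0.08
stock[1:3]  <- 0.40

ols <- lm(stock ~ market)
rob <- rlm(stock ~ market, method = "MM")  # MM-estimator downweights outliers

c(ols = coef(ols)[["market"]], robust = coef(rob)[["market"]])
```

The three contaminated months pull the OLS slope well away from 1.2, while the MM fit stays close to it.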

Best,
Krishna


Re: using yahoo and other data to calculate CAPM and FF betas

Dirk Eddelbuettel

On 6 April 2006 at 23:27, Krishna Kumar wrote:
| To add to Gabor's suggestion you could do the following to get an
| approximated series..
| so if mydata is a vector with "NA" 's then doing
|
|  >mydata<-approx(mydata,xout=seq(along=mydata))$y

Personally, I'd be careful about interpolating / imputing.  

The zoo class has fine features such as na.locf() and merge() which handle most
common operations.  One will probably learn a lot just from studying the
documents supplied with the zoo package, and the R News articles.
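One concrete reason for caution with imputation: if prices are merged and a missing month is filled with na.locf() before returns are computed, the filled month shows an artificial zero return and the following month absorbs two months of movement. A minimal zoo sketch with made-up prices, assuming zoo is installed; na.rm = FALSE keeps the leading NA of the late-starting series instead of dropping it.

```r
library(zoo)

# Illustrative month-start closing prices; b starts late and misses March
dates <- seq(as.Date("2000-01-01"), by = "month", length.out = 5)
a <- zoo(c(10, 11, 12, 13, 14), dates)
b <- zoo(c(20, 22, 23), dates[c(2, 4, 5)])

prices <- merge(a, b)                     # union of dates, NA where no quote
filled <- na.locf(prices, na.rm = FALSE)  # carry last price forward, keep leading NA
rets   <- diff(log(filled))               # monthly log returns, Feb..May

coredata(rets)[, 2]  # NA, 0, log(22/20), log(23/22)
```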

| Also there was a very interesting paper that showed that the Fama-French
| effect was not really a anamoly when you estimate using Robust regression
| instead of OLS. I can't remember the reference but it

IIRC it was mentioned in the Scherer/Martin book on 'Modern Portfolio
Optimization'. Googling for 'doug martin fama french robust' leads to a few
pages at Insightful and UW.

Dirk

--
Hell, there are no rules here - we're trying to accomplish something.
                                                  -- Thomas A. Edison


Re: using yahoo and other data to calculate CAPM and FF betas

Gabor Grothendieck
In reply to this post by Krishna Kumar-2
I mentioned omitting the missing values via na.omit, and Krishna
mentioned using linear approximation.  Note that
the zoo package actually has 4 missing value routines:

na.omit - omit missing values
na.approx - replace missing values with linear approximations
na.locf - replace missing values with the last occurrence carried forward
na.contiguous - remove all but a contiguous stretch of non-missing values

> library(zoo)
> z <- zoo(c(1,NA,3,NA,5))
> na.omit(z)
1 3 5
1 3 5
> na.locf(z)
1 2 3 4 5
1 1 3 3 5
> na.approx(z)
1 2 3 4 5
1 2 3 4 5
> na.contiguous(z)
3
3



Re: using yahoo and other data to calculate CAPM and FF betas

Martin Maechler
In reply to this post by Krishna Kumar-2
>>>>> "Krishna" == Krishna Kumar <[hidden email]>
>>>>>     on Thu, 06 Apr 2006 23:27:06 -0400 writes:

  .........

    Krishna> Also there was a very interesting paper that showed
    Krishna> that the Fama-French effect was not really a
    Krishna> anamoly when you estimate using Robust regression
    Krishna> instead of OLS. I can't remember the reference but
    Krishna> it was Doug Martin and someone else from UW ...  R
    Krishna> has some nice facilities with rrcov to do the
    Krishna> robust regressions!!

Apropos  "Robust regression":

- Note that 'rrcov' (by Valentin Todorov) has recently been merged
  into the new package "robustbase", and the latest rrcov
  version will be merged again.
  The goal of "robustbase" is to provide ``basic robust
  statistics'' to R, in addition to what's already in 'stats'
  and 'MASS', and to be closer to the "state of the art".
  The latest version of robustbase, 0.1-5, hit CRAN yesterday
  and should become more widely available shortly.

  In addition to the fast ltsReg() {originally from 'rrcov'},
  "robustbase" now also contains lmrob(), implementing a
  "fast MM" estimator (based on fast-S) from Matias
  Salibian-Barrera and Victor Yohai.

- Further note that there's also a `young' R-SIG-robust mailing
  list with quite a few "robustniks" subscribed, some of whom
  do not read other R lists AFAIK.

Martin Maechler, ETH Zurich


Re: using yahoo and other data to calculate CAPM and FF betas

Andrew West
In reply to this post by Krishna Kumar-2
Incidentally, one of the functions I attached,
getrffBeta, uses the robust-estimation package wle to
calculate 3-factor coefficients. Even if one believes
the return differentials accruing to value-growth and
large-cap/small-cap spreads are an anomaly that robust
statistics erases, that does not mean one could not
include those factors to get a better calculation of
true market beta, does it? You could take the 3-factor
regression output, assign zero premiums to the size and
value factors, and use the market beta, which has
already been controlled for size and value effects,
right?

For most of the companies I've looked at, the AIC values
of the 3-factor models are a lot higher than those of the
CAPM models. You can see it for yourself graphically by
running the getrffBeta function on various stocks. For
example, under CAPM, some of the dramatic swings in the
beta of low-tech stocks (which fell sharply during the
internet surge) may have been due more to value-growth
return trends than to a true change in market beta. The
3-factor model thus seems to give more stable market
beta estimates than the CAPM model.
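The argument about controlling for size and value can be checked on simulated data. In the sketch below (base R only; every series is made up, not actual Fama-French data), HML is constructed to be negatively correlated with the market, so the CAPM market beta suffers omitted-variable bias toward roughly 0.9 + 0.6 * (-0.5) = 0.6 in expectation, while the 3-factor regression recovers a market beta near the true 0.9. AIC() is applied to both fits only to show how such a comparison can be computed.

```r
# Illustrative simulated monthly factors (NOT real Fama-French data).
# HML is built to co-move negatively with the market.
set.seed(42)
n   <- 120
mkt <- rnorm(n, mean = 0.005, sd = 0.04)
smb <- rnorm(n, sd = 0.03)
hml <- -0.5 * mkt + rnorm(n, sd = 0.02)
ret <- 0.9 * mkt + 0.4 * smb + 0.6 * hml + rnorm(n, sd = 0.01)  # true market beta 0.9

capm <- lm(ret ~ mkt)
ff3  <- lm(ret ~ mkt + smb + hml)

# Market beta without and with controls for size and value
c(capm = coef(capm)[["mkt"]], ff3 = coef(ff3)[["mkt"]])
AIC(capm, ff3)  # data frame with df and AIC for each fit
```

Following the suggestion above, one would then take coef(ff3)[["mkt"]] as the market beta and set the size and value premiums to zero when computing expected returns.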

Regards,
Andrew

Re: using yahoo and other data to calculate CAPM and FF betas

Gabor Grothendieck
In reply to this post by Gabor Grothendieck
Since I wrote my earlier post, a new version of zoo has come
out.  It adds a fifth NA-handling routine, na.trim, to the four
illustrated in that post.  na.trim trims leading and/or trailing
NAs but leaves interior NAs as they are:

> library(zoo)
> zz <- zoo(c(NA,1,NA,3,NA,5,NA))
> na.trim(zz)
 2  3  4  5  6
 1 NA  3 NA  5
