Forcing results from lm into datframe

Small Sandy (NHS Greater Glasgow & Clyde)
Hi

I need some help getting results from multiple linear models into a dataframe.
Let me explain the problem.

I have a dataframe with ejection fraction results measured over a number of quartiles and grouped by base_study.
My dataframe (800 different base_studies) looks like

> afvtprelvefs
basestudy     quartile   ef        ef_std   entropy
CBP0908020  1           21.6    0.53        3.27
CBP0908020  2           32.5    0.61        3.27
CBP0908020  3           30.8    0.63        3.27
CBP0908020  4           33.6    0.37        3.27
CBP0908022  1           42.4    0.52        1.80
CBP0908021  1           29.4    0.70        2.63
CBP0908021  2           29.2    0.42        2.63
CBP0908021  3           29.7    0.89        2.63
CBP0908021  4           29.3    0.50        2.63
CBP0908022  2           45.7    1.30        1.80
...
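
To run the code in this thread, the ten rows listed above can be typed in as a small data frame. This is only a sketch of the sample: the full data set (about 800 base studies) is not reproduced here, and `stringsAsFactors = FALSE` is an assumption about how `basestudy` is stored.

```r
# The ten sample rows from the listing above, in the same order.
afvtprelvefs <- data.frame(
  basestudy = c("CBP0908020", "CBP0908020", "CBP0908020", "CBP0908020",
                "CBP0908022",
                "CBP0908021", "CBP0908021", "CBP0908021", "CBP0908021",
                "CBP0908022"),
  quartile  = c(1, 2, 3, 4, 1, 1, 2, 3, 4, 2),
  ef        = c(21.6, 32.5, 30.8, 33.6, 42.4, 29.4, 29.2, 29.7, 29.3, 45.7),
  ef_std    = c(0.53, 0.61, 0.63, 0.37, 0.52, 0.70, 0.42, 0.89, 0.50, 1.30),
  entropy   = c(3.27, 3.27, 3.27, 3.27, 1.80, 2.63, 2.63, 2.63, 2.63, 1.80),
  stringsAsFactors = FALSE  # assumption: keep study IDs as plain strings
)

# Number of rows available per base study in this sample.
rows_per_study <- table(afvtprelvefs$basestudy)
```

In this sample CBP0908022 has only two rows, so any straight-line fit for that study passes exactly through both points.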

What I want to do is apply a weighted linear fit to the results from each base study and get the gradient out of it. I then want to plot the gradient against the entropy (which is constant for each base study).

I can apply a linear fit with

> fits <- by(afvtprelvefs, afvtprelvefs$basestudy, function (x) lm (ef ~ quartile, data=x, weights=1/ef_std))

but how do I get the results from that into a dataframe which I can use?

I thought I might get somewhere with
> sapply(fits, "[[", "coefficients")

But that doesn't give me the basestudy separately so that I can match up the results with the entropy results.

I am sure this must have been answered somewhere before but I have been unable to find a solution.
Many thanks for your help

Sandy Small
NHS Greater Glasgow and Clyde


______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: Forcing results from lm into datframe

David Winsemius

On Oct 26, 2010, at 8:08 AM, Small Sandy (NHS Greater Glasgow & Clyde) wrote:

> I can apply a linear fit with
>
>> fits <- by(afvtprelvefs, afvtprelvefs$basestudy, function (x) lm (ef ~ quartile, data=x, weights=1/ef_std))
>
> but how do I get the results from that into a dataframe which I can use?
>
> I thought I might get somewhere with
>> sapply(fits, "[[", "coefficients")
>
> But that doesn't give me the basestudy separately so that I can match up the results with the entropy results.

The by objects don't play nicely with as.data.frame, so I went to a
more "classical" way of running the lm call, and I added a coef()
wrapper to get just the coefficients:

 > splits <- split(afvtprelvefs, afvtprelvefs$basestudy)
 > lapply(splits, function (x) coef(lm (ef ~ quartile, data=x, weights=1/ef_std)))
$CBP0908020
(Intercept)    quartile
   20.921397    3.385469

$CBP0908021
(Intercept)    quartile
29.31632071  0.01372604

$CBP0908022
(Intercept)    quartile
        39.1         3.3

 > fits <- lapply(splits, function (x) coef(lm (ef ~ quartile, data=x, weights=1/ef_std)))
 > as.data.frame(fits)
             CBP0908020  CBP0908021 CBP0908022
(Intercept)  20.921397 29.31632071       39.1
quartile      3.385469  0.01372604        3.3


The split-lapply strategy is reasonably general. You may need to use
t() if you were hoping for studies to be by rows. In this case sapply
would have obviated the need for the as.data.frame step, at the cost of
returning a matrix rather than a data.frame.
--
David
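
The original question also asked how to match each gradient with its entropy. A minimal sketch of one way to do that with the split/sapply approach above, assuming the ten sample rows from the first post; the names `coef_df`, `entropy_df` and `result` are made up for illustration.

```r
# Sample rows from the first post (the full data set is not reproduced).
afvtprelvefs <- data.frame(
  basestudy = c(rep("CBP0908020", 4), "CBP0908022",
                rep("CBP0908021", 4), "CBP0908022"),
  quartile  = c(1, 2, 3, 4, 1, 1, 2, 3, 4, 2),
  ef        = c(21.6, 32.5, 30.8, 33.6, 42.4, 29.4, 29.2, 29.7, 29.3, 45.7),
  ef_std    = c(0.53, 0.61, 0.63, 0.37, 0.52, 0.70, 0.42, 0.89, 0.50, 1.30),
  entropy   = c(3.27, 3.27, 3.27, 3.27, 1.80, 2.63, 2.63, 2.63, 2.63, 1.80),
  stringsAsFactors = FALSE
)

# One weighted fit per study; sapply() returns a matrix with one column
# per study and one row per coefficient.
splits <- split(afvtprelvefs, afvtprelvefs$basestudy)
coefs  <- sapply(splits, function(d)
  coef(lm(ef ~ quartile, data = d, weights = 1 / ef_std)))

# Rearrange so that each study is a row and the study ID is a column.
coef_df <- data.frame(
  basestudy = colnames(coefs),
  intercept = coefs["(Intercept)", ],
  slope     = coefs["quartile", ],
  row.names = NULL,
  stringsAsFactors = FALSE
)

# Entropy is constant within a study, so one row per study is enough.
entropy_df <- unique(afvtprelvefs[c("basestudy", "entropy")])

# Match gradients to entropies by study ID.
result <- merge(coef_df, entropy_df, by = "basestudy")
```

From there, something like `plot(slope ~ entropy, data = result)` should give the gradient-versus-entropy plot. Keeping the study ID as an ordinary column, rather than as row names, is what makes the `merge()` step straightforward.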

Re: Forcing results from lm into datframe

Small Sandy (NHS Greater Glasgow & Clyde)
Thanks David
That's great

As a matter of interest, to get a data frame by studies why do you have to do

fitsdf <- as.data.frame(t(as.data.frame(fits)))

Why doesn't
fitsdf <- as.data.frame(t(fits))
work?

Sandy Small

Re: Forcing results from lm into datframe

David Winsemius

On Oct 26, 2010, at 10:22 AM, Small Sandy (NHS Greater Glasgow & Clyde) wrote:

> Thanks David
> That's great
>
> As a matter of interest, to get a data frame by studies why do you  
> have to do
>
> fitsdf <- as.data.frame(t(as.data.frame(fits)))

The apply family of functions often returns results rotated from what
new users expect. I am not really sure why that is so in this case.
If you wanted to trace it out, you could look at the code; I
just looked at as.data.frame.list (since there are over 20
as.data.frame methods) and the answer is not immediately apparent to
me. I thought maybe I should see a cbind() call in there, and I suppose
this section .... as.call(c(expression(data.frame), x ... may have
that effect.

--
David.
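
A small sketch that may help with the t(fits) question. It is not from the thread; the `fits` list is rebuilt by hand from the coefficients printed earlier, and `by_rows` is an illustrative name. The point is that `fits` is a plain list of three numeric vectors, so t() treats it as a vector of three items and returns a 1 x 3 matrix whose cells each hold a whole coefficient vector. as.data.frame(fits), on the other hand, turns each list element into a column, which is why the studies come out as columns and need a second t().

```r
# Coefficients as printed earlier in the thread, stored the way
# lapply() returns them: a named list of named numeric vectors.
fits <- list(
  CBP0908020 = c("(Intercept)" = 20.921397,   quartile = 3.385469),
  CBP0908021 = c("(Intercept)" = 29.31632071, quartile = 0.01372604),
  CBP0908022 = c("(Intercept)" = 39.1,        quartile = 3.3)
)

# t() on the bare list: one row, three columns, and still a list.
tf <- t(fits)
tf_dim  <- dim(tf)
tf_type <- typeof(tf)

# Each element becomes a column: coefficients in rows, studies in columns.
by_cols <- as.data.frame(fits)

# Binding the vectors row-wise gives studies in rows directly.
by_rows <- as.data.frame(do.call(rbind, fits))
```

`do.call(rbind, fits)` is one way to skip the double conversion: it stacks the coefficient vectors into a numeric matrix with one row per study, and as.data.frame() then keeps that orientation.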

>
> Why doesn't
> fitsdf <- as.data.frame(t(fits))
> work?
>
Re: Forcing results from lm into datframe

djmuseR
In reply to this post by Small Sandy (NHS Greater Glasgow & Clyde)
Hi:

When it comes to split, apply, combine, think plyr.

library(plyr)
ldply(split(afvtprelvefs, afvtprelvefs$basestudy),
         function(x) coef(lm (ef ~ quartile, data=x, weights=1/ef_std)))

         .id (Intercept)   quartile
1 CBP0908020    20.92140 3.38546887
2 CBP0908021    29.31632 0.01372604
3 CBP0908022    39.10000 3.30000000

ldply() takes a list as input and outputs a data frame. Like lapply, the
first argument is the list and the second argument is the apply function.

HTH,
Dennis


Re: Forcing results from lm into datframe

Henrique Dallazuanna
In reply to this post by Small Sandy (NHS Greater Glasgow & Clyde)
Try this:

sapply(by(afvtprelvefs, afvtprelvefs$basestudy, lm, formula = ef ~ quartile), coef)

Note that this version fits unweighted models.

Re: Forcing results from lm into datframe

Hadley Wickham-2
In reply to this post by djmuseR
On Tue, Oct 26, 2010 at 11:55 AM, Dennis Murphy <[hidden email]> wrote:
> Hi:
>
> When it comes to split, apply, combine, think plyr.
>
> library(plyr)
> ldply(split(afvtprelvefs, afvtprelvefs$basestudy),
>         function(x) coef(lm (ef ~ quartile, data=x, weights=1/ef_std)))

Or do it in two steps:

models <- dlply(afvtprelvefs, "basestudy", function(x)
  lm(ef ~ quartile, data=x, weights=1/ef_std))
coefs <- ldply(models, coef)

That way you can easily pull out other info:

rsq <- function(x) summary(x)$r.squared
ldply(models, rsq)

Hadley

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/
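
For readers without plyr installed, the same two-step idea (keep the fitted models, then extract whatever is needed) can be sketched in base R. This is an illustration only, using the ten sample rows from the first post; `models`, `summary_df` and the column names are invented here.

```r
# Sample rows from the first post (the full data set is not reproduced).
afvtprelvefs <- data.frame(
  basestudy = c(rep("CBP0908020", 4), "CBP0908022",
                rep("CBP0908021", 4), "CBP0908022"),
  quartile  = c(1, 2, 3, 4, 1, 1, 2, 3, 4, 2),
  ef        = c(21.6, 32.5, 30.8, 33.6, 42.4, 29.4, 29.2, 29.7, 29.3, 45.7),
  ef_std    = c(0.53, 0.61, 0.63, 0.37, 0.52, 0.70, 0.42, 0.89, 0.50, 1.30),
  entropy   = c(3.27, 3.27, 3.27, 3.27, 1.80, 2.63, 2.63, 2.63, 2.63, 1.80),
  stringsAsFactors = FALSE
)

# Step 1: keep the full lm objects, one per study.
models <- lapply(split(afvtprelvefs, afvtprelvefs$basestudy),
                 function(d) lm(ef ~ quartile, data = d, weights = 1 / ef_std))

# Step 2: pull out whichever summaries are needed from the stored models.
slopes <- sapply(models, function(m) coef(m)[["quartile"]])
rsq    <- sapply(models, function(m) summary(m)$r.squared)
n_obs  <- sapply(models, nobs)

# Collect everything in one data frame, one row per study.
summary_df <- data.frame(
  basestudy = names(models),
  slope     = slopes,
  r_squared = rsq,
  n         = n_obs,
  row.names = NULL,
  stringsAsFactors = FALSE
)
```

Carrying the number of observations along is useful here: a study with only two rows (CBP0908022 in the sample) is fitted exactly, so its R-squared says nothing about how well a line describes that study.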
