Trying to understand the magic of lm

Sorkin, John
Can someone point me to something I can read about passing parameters, so that I can understand how lm manages to take a data frame passed to it and use columns from that data frame to set up a regression? I have looked at the code for lm and don't understand what I am reading. What I want to do is something like the following:


myfunction <- function(y,x,dataframe){

  fit0 <- lm(y~x,data=dataframe)
  print (summary(fit0))
}

# Run the function using dep and ind as dependent and independent variables.
mydata <- data.frame(dep=c(1,2,3,4,5),ind=c(1,2,4,5,7))
myfunction(dep,ind)
# Run the function using outcome and predictor as dependent and independent variables.
newdata <- data.frame(outcome=c(1,2,3,4,5),predictor=c(1,2,4,5,7))
myfunction(outcome,predictor)





John David Sorkin M.D., Ph.D.
Professor of Medicine
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology and Geriatric Medicine
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)



______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: Trying to understand the magic of lm

Ben Tupper
Hi,

I'm not sure if this is what you are after, but instead of defining arguments for elements of the formula why not simply pass your desired formula to your function?

Cheers,
Ben



myfunction <- function(frmla,dataframe){
 fit0 <- lm(frmla,data=dataframe)
 print (summary(fit0))
}

# Run the function with ind as the response and dep as the predictor (formula: ind ~ dep).
mydata <- data.frame(dep=c(1,2,3,4,5),ind=c(1,2,4,5,7))
myfunction(ind ~ dep, mydata)

# Call:
# lm(formula = frmla, data = dataframe)
#
# Residuals:
#    1    2    3    4    5
#  0.2 -0.3  0.2 -0.3  0.2
#
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept)  -0.7000     0.3317  -2.111 0.125298
# dep           1.5000     0.1000  15.000 0.000643 ***
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 0.3162 on 3 degrees of freedom
# Multiple R-squared:  0.9868, Adjusted R-squared:  0.9825
# F-statistic:   225 on 1 and 3 DF,  p-value: 0.0006431


# Run the function with predictor as the response and outcome as the predictor (formula: predictor ~ outcome).
newdata <- data.frame(outcome=c(1,2,3,4,5),predictor=c(1,2,4,5,7))
myfunction(predictor ~ outcome, newdata)

# Call:
# lm(formula = frmla, data = dataframe)
#
# Residuals:
#    1    2    3    4    5
#  0.2 -0.3  0.2 -0.3  0.2
#
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept)  -0.7000     0.3317  -2.111 0.125298
# outcome       1.5000     0.1000  15.000 0.000643 ***
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 0.3162 on 3 degrees of freedom
# Multiple R-squared:  0.9868, Adjusted R-squared:  0.9825
# F-statistic:   225 on 1 and 3 DF,  p-value: 0.0006431





Ben Tupper
Bigelow Laboratory for Ocean Sciences
60 Bigelow Drive, P.O. Box 380
East Boothbay, Maine 04544
http://www.bigelow.org

Ecological Forecasting: https://eco.bigelow.org/







Re: Trying to understand the magic of lm

Rui Barradas
In reply to this post by Sorkin, John
Hello,

There is a "standard" deparse/substitute trick that gets the names of
the variables passed to a function. There are more sophisticated ways
but maybe that is what you are looking for.


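myfunction <- function(y, x, dataframe){
   # Capture the unevaluated argument expressions as character strings
   y <- deparse(substitute(y))
   x <- deparse(substitute(x))
   # Build the formula from those names, e.g. dep ~ ind
   fmla <- as.formula(paste(y, '~', x))
   fit0 <- lm(fmla, data = dataframe)
   summary(fit0)
}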

# Run the function using dep and ind as dependent and independent variables.
mydata <- data.frame(dep = c(1,2,3,4,5),ind=c(1,2,4,5,7))
myfunction(dep, ind, mydata)

# Run the function using outcome and predictor as dependent and independent variables.
newdata <- data.frame(outcome=c(1,2,3,4,5),predictor=c(1,2,4,5,7))
myfunction(outcome, predictor, newdata)


Note: your function has an argument 'dataframe' that you did not supply in
either of your two calls.
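
As a small illustration of the trick above (a sketch, not code from the thread): substitute() returns the unevaluated expression supplied for an argument, and deparse() turns that expression into a character string. The helper names show_names and fit_by_name are hypothetical. reformulate() is a base R function (package stats) that builds a formula from character strings, which may be a convenient alternative when the column names are already available as strings.

```r
# Hypothetical helper: report the expressions supplied for y and x.
# The arguments are never evaluated, so dep and ind need not exist
# as objects in the workspace.
show_names <- function(y, x) {
  c(y = deparse(substitute(y)), x = deparse(substitute(x)))
}
show_names(dep, ind)

# Hypothetical variant that takes the column names as strings.
fit_by_name <- function(yname, xname, dataframe) {
  fmla <- reformulate(xname, response = yname)  # builds yname ~ xname
  lm(fmla, data = dataframe)
}

mydata <- data.frame(dep = c(1, 2, 3, 4, 5), ind = c(1, 2, 4, 5, 7))
fit <- fit_by_name("dep", "ind", mydata)
coef(fit)
```

The string-based variant avoids non-standard evaluation altogether, which tends to be easier to reason about when the function is called from other functions or in a loop over column names.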


Hope this helps,

Rui Barradas


Re: Trying to understand the magic of lm

William Michels via R-help
In reply to this post by Sorkin, John
Hello John,

Others have commented on the first half of your question, but the
second half of your question looks very much like R's built-in
predict() functions:

>?predict
>?predict.lm

Best Regards,

Bill.

W. Michels, Ph.D.
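
For readers unfamiliar with predict(), here is a minimal sketch of how it is typically used with a fitted lm object. It reuses the data from the original question; the name new_obs and its values are made up for illustration. Note that the newdata argument has to contain columns named like the predictors in the formula, so a data frame with differently named columns (such as outcome and predictor) would need to be renamed or refitted first.

```r
mydata <- data.frame(dep = c(1, 2, 3, 4, 5), ind = c(1, 2, 4, 5, 7))
fit <- lm(dep ~ ind, data = mydata)

# Hypothetical new observations; the column name must match the
# predictor used in the formula (ind).
new_obs <- data.frame(ind = c(3, 6))
pred <- predict(fit, newdata = new_obs)
pred
```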




Re: Trying to understand the magic of lm

Bert Gunter
I don't think previous responses have addressed the question, which appears
to be: "How does R know to look in the "data" object for the variable names
in the formula?" And, of course, I could be wrong -- in which case ignore
all the following.

My answer to that question is: it's quite complicated. I think you have to
know about calls, function closures, evaluation environments, and the
details of model.frame.default -- and perhaps more. The following **might** be a
start:

> dat <- data.frame(x = 1:10, y = rnorm(10))
>
> ## substitute() is used to return the unevaluated expression for the call
> mc <- match.call(lm, call = substitute(lm(y~x,data = dat)))
> class(mc)
[1] "call"
> as.list(mc)
[[1]]
lm

$formula
y ~ x

$data
dat
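
Building on the match.call() output above, the following is a rough sketch of the mechanism, loosely modeled on my understanding of the opening lines of lm() rather than copied from its source. The function name my_frame is hypothetical. The idea is to capture the call as the user wrote it, swap the function being called for stats::model.frame, and evaluate the modified call in the caller's environment.

```r
# Hypothetical function imitating how lm() obtains its model frame.
my_frame <- function(formula, data) {
  mf <- match.call()                     # my_frame(formula = y ~ x, data = dat)
  mf[[1L]] <- quote(stats::model.frame)  # now a call to model.frame instead
  eval(mf, parent.frame())               # evaluate where the user called us
}

dat <- data.frame(x = 1:10, y = rnorm(10))
mf <- my_frame(y ~ x, data = dat)
str(mf)
```

model.frame() looks up the variables named in the formula first in data and then in the environment of the formula. That is why lm(y ~ x, data = dataframe) inside a wrapper searches for columns literally named y and x, and not for the names the wrapper's arguments were called with.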

Cheers,
Bert Gunter
