Hi all,
Could you please help me? I am trying to understand why this line works:

lm1x = lm(y~X-1, tmp)

Here it seems that I was combining the design matrix and the data frame... And X below is not a single column; in fact, it is a bunch of columns in matrix form. I don't understand why this line works. Is it just luck, i.e. if we change the data set and/or formulas to something else, could it fail? (That is something I would like to catch and avoid.)

Thank you!

-------------------------------

The data is located at:
http://www.ling.uni-potsdam.de/~vasishth/book.html
in the section "downloadable errata, code, datasets":
ERRATA
Corrected Pages
VasishthBroebook.R
beauty.txt
mathachieve.txt
mathachschool.txt

------------------------------

# Student-level data
MathAchieve <- read.table("mathachieve.txt")
colnames(MathAchieve) <- c("School", "Minority", "Sex", "SES", "MathAch", "MEANSES")
head(MathAchieve)

# School-level data, merged by school (the two MEANSES columns become MEANSES.x and MEANSES.y)
MathAchSchool <- read.table("mathachschool.txt")
colnames(MathAchSchool) <- c("School", "Size", "Sector", "PRACAD", "DISCLIM", "HIMINTY", "MEANSES")
MathScores <- merge(MathAchieve, MathAchSchool, by = "School")

# Ordinary formula fit
lm1 = lm(MathAch ~ SES + factor(Sector), MathScores)

# The same model through an explicit design matrix stored inside the data frame
X = model.matrix(MathAch ~ SES + factor(Sector), MathScores)
y = MathScores$MathAch

tmp = MathScores
tmp$y = y
tmp$X = X

lm1x = lm(y~X-1, tmp)

# Compare the fitted values of the two fits
plot(fitted(lm1), fitted(lm1x))
max(abs(fitted(lm1) - fitted(lm1x)))

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
______________________________________________
On May 12, 2012, at 04:40, Michael wrote:

> Hi all,
>
> Could you please help me?
>
> I am trying to understand why this line works:
>
> lm1x = lm(y~X-1, tmp)
>
> Here it seems that I was combining the design matrix and the data frame...
>
> And X below is not a single column, in fact, it's a bunch of columns in
> matrix form...
>
> I don't understand why this line works...

It works because you can have a matrix on the right-hand side of a model formula, and it will be interpreted as a set of columns. If X is constructed via model.matrix(), it usually contains an all-1 column, which is why you need to remove the intercept with "-1".

A data frame can contain matrices; you just need to be careful with some functions, notably data.frame(), which will split a matrix into its constituent columns unless it is protected with I(). Notice the difference between these two examples:

  > d <- data.frame(a=1:3,m=matrix(4:9,,2))
  > d$m
  NULL
  > names(d)
  [1] "a"   "m.1" "m.2"
  > d <- data.frame(a=1:3,m=I(matrix(4:9,,2)))
  > d$m
       [,1] [,2]
  [1,]    4    7
  [2,]    5    8
  [3,]    6    9
  > names(d)
  [1] "a" "m"

(Creating the data frame first and assigning the matrix afterwards, as in d <- data.frame(a=1:3); d$m <- matrix(4:9,,2), behaves like the second version.)

Your example has the weakness that you do "tmp$X <- X", so X exists both in the data frame tmp and in the global workspace, and it is not really clear that the one in the data frame is the one used. I am pretty sure that this is in fact the case, but for your peace of mind you should try renaming one of them.

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: [hidden email]  Priv: [hidden email]
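A minimal sketch of the explanation above, assuming synthetic data in place of the mathachieve/mathachschool files (the data frame `dat`, its values, and the name `X_global` are made up for illustration). It fits the model once with an ordinary formula and once through the stored design matrix, then follows the suggestion of renaming the global copy of X, so that only tmp$X can be found when the model is refitted.

```r
# Hypothetical synthetic stand-in for MathScores (the thread's data files are not used)
n <- 40
set.seed(1)
dat <- data.frame(
  SES    = rnorm(n),
  Sector = rep(c("Catholic", "Public"), length.out = n)
)
dat$MathAch <- 12 + 2 * dat$SES - 1.5 * (dat$Sector == "Public") + rnorm(n)

# Ordinary formula fit
lm1 <- lm(MathAch ~ SES + factor(Sector), data = dat)

# Explicit design matrix; it already contains the all-1 "(Intercept)" column
X <- model.matrix(MathAch ~ SES + factor(Sector), data = dat)

tmp <- dat
tmp$y <- dat$MathAch
tmp$X <- X                      # stored as ONE matrix-valued column

# "- 1" stops lm() from adding a second intercept on top of X's own
lm1x <- lm(y ~ X - 1, data = tmp)

print(coef(lm1x))               # names "X(Intercept)", "XSES", "Xfactor(Sector)Public"
print(max(abs(fitted(lm1) - fitted(lm1x))))   # tiny: floating-point round-off at most

# Peace-of-mind check: rename the global copy so only tmp$X can be found
X_global <- X
rm(X)
lm1x_renamed <- lm(y ~ X - 1, data = tmp)
stopifnot(isTRUE(all.equal(coef(lm1x), coef(lm1x_renamed))))
```

The coefficient names show how model.matrix() labels a matrix term: the variable name followed by each column name.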
______________________________________________
> I am trying to understand why this line works:
>
> lm1x = lm(y~X-1, tmp)

Well, I would not normally define a data frame element as a matrix myself (though I might well define a list element as one). But specifying a matrix as the terms part of an lm is documented in lm's details:

"If response is a matrix a linear model is fitted separately by least-squares to each column of the matrix"

So _something_ will happen. Whether the something is useful depends on the intent.

> Here it seems that I was combining the design matrix and the data frame...

Did you inspect tmp after adding the design matrix? Was it an odd-looking data frame or a list? What seems to have been done is that the design matrix has been added to a list. I wouldn't normally do that if tmp is a data frame, and R would not do so unless the lengths all matched. But a list should be OK. And lm takes a list or environment as its data argument, so a list of things will work even if they are of different types. In other words, tmp is just a ragbag of things, each of which lm understands.
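One way to answer the "data frame or list?" question is to inspect the object directly. The sketch below uses a made-up matrix `m`, data frames `d`, `d_split`, `d_asis` and list `lst` (none of them from the thread). It shows that `$<-` keeps a data frame with a single matrix-valued column, that data.frame() splits an unprotected matrix unless it is wrapped in I(), and that lm() accepts a plain list as its data argument.

```r
# Hypothetical toy objects (not from the thread)
m <- matrix(4:9, ncol = 2)

# 1. Assigning with $<- keeps a data frame whose "m" element is the whole matrix
d <- data.frame(a = 1:3)
d$m <- m
print(class(d))          # "data.frame"
print(names(d))          # "a" "m"
print(dim(d$m))          # 3 2

# 2. data.frame() splits an unprotected matrix into separate columns ...
d_split <- data.frame(a = 1:3, m = m)
print(names(d_split))    # "a" "m.1" "m.2"

# 3. ... but keeps it whole when it is wrapped in I()
d_asis <- data.frame(a = 1:3, m = I(m))
print(names(d_asis))     # "a" "m"

# 4. lm() also accepts a plain list as 'data'; here the elements have different shapes
lst <- list(y = c(1.1, 1.9, 3.2), m = m)
fit <- lm(y ~ m - 1, data = lst)
print(coef(fit))
```

So in the original code tmp stays a data frame; the matrix simply occupies one of its elements.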
______________________________________________
Thank you!
But the line you cited was about "response" being a matrix, which is not our case. And also I have checked:

> names(tmp)
 [1] "School"    "Minority"  "Sex"       "SES"       "MathAch"   "MEANSES.x"
 [7] "Size"      "Sector"    "PRACAD"    "DISCLIM"   "HIMINTY"   "MEANSES.y"
[13] "Intercept" "y"         "X"
> names(MathScores)
 [1] "School"    "Minority"  "Sex"       "SES"       "MathAch"   "MEANSES.x"
 [7] "Size"      "Sector"    "PRACAD"    "DISCLIM"   "HIMINTY"   "MEANSES.y"
> class(tmp)
[1] "data.frame"
> head(tmp)
  School Minority    Sex         SES MathAch MEANSES.x Size Sector PRACAD
1   1224       No Female -1.52800000   5.876    -0.428  842 Public   0.35
2   1224       No Female -0.83155757  19.708    -0.428  842 Public   0.35
3   1224       No   Male -0.91452283  20.349    -0.428  842 Public   0.35
4   1224       No   Male -1.33600000   8.781    -0.428  842 Public   0.35
5   1224       No   Male -0.35329874  17.898    -0.428  842 Public   0.35
6   1224       No   Male  0.05388877   4.583    -0.428  842 Public   0.35
  DISCLIM HIMINTY MEANSES.y Intercept        y X.(Intercept)       X.SES
1   1.597       0    -0.428  1.000000  5.87600    1.00000000 -1.52800000
2   1.597       0    -0.428  1.414214 27.87132    1.41421356 -0.83155757
3   1.597       0    -0.428  1.732051 35.24550    1.73205081 -0.91452283
4   1.597       0    -0.428  2.000000 17.56200    2.00000000 -1.33600000
5   1.597       0    -0.428  2.236068 40.02114    2.23606798 -0.35329874
6   1.597       0    -0.428  2.449490 11.22601    2.44948974  0.05388877
  X.factor(Sector)Public
1             1.00000000
2             1.41421356
3             1.73205081
4             2.00000000
5             2.23606798
6             2.44948974

Any more thoughts?

Thank you!
______________________________________________
It's very interesting that names(tmp) has only 14 items and dim(tmp) shows 14 columns too, but the number of columns in tmp should actually be 16 (the 12 columns of MathScores, plus y, plus the 3 columns of X). This is very strange...

> dim(X)
[1] 7185    3
> dim(MathScores)
[1] 7185   12
> dim(tmp)
[1] 7185   14
> names(tmp)
 [1] "School"    "Minority"  "Sex"       "SES"       "MathAch"   "MEANSES.x"
 [7] "Size"      "Sector"    "PRACAD"    "DISCLIM"   "HIMINTY"   "MEANSES.y"
[13] "y"         "X"
______________________________________________
> But the line you cited was about "response" being a matrix, which is not our case.

Yes, you're right; I picked the wrong thing to cite. The only documentation I found about lm accepting a matrix among the predictors is a one-line statement in "An Introduction to R", which says that term_i is either

  a vector or matrix expression, or 1,
  a factor, or
  a formula expression consisting of factors, vectors or matrices connected by formula operators.

Not the most informative documentation. But Peter Dalgaard is a most authoritative source!

> And also I have checked:
>
> Any more thoughts?

Data frames are odd things; a column need not contain only a vector, as long as the number of rows is OK. I am half surprised that including a matrix in one works. But the gods of R are powerful and their magic is strong. Here, names(tmp) shows that the data frame has one element called X (in effect, the whole matrix is regarded as one element of the data frame), but on display the magic has expanded X to show all the columns of X.

This is the main reason I generally keep to simple things in data frames; complicated things make behaviour less easy to predict.
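A small sketch of both points in this reply, on hypothetical toy data (`d`, `M` and `fit` are made up for illustration). names() and ncol() count a matrix column once, while summing NCOL() over the elements gives the expanded count that Michael expected; and, as the quoted "An Introduction to R" wording allows, a matrix term can be combined with an ordinary factor term using "+".

```r
# Hypothetical toy data: two ordinary columns plus one 2-column matrix column
d <- data.frame(
  y = c(2.0, 3.5, 1.2, 4.8, 3.3, 2.9),
  g = factor(c("u", "v", "u", "v", "u", "v"))
)
d$M <- cbind(x1 = c(0.5, 1.0, 1.5, 2.0, 2.5, 3.0),
             x2 = c(1, 0, 2, 1, 0, 3))

# names() and ncol() see M as ONE element ...
print(names(d))                 # "y" "g" "M"
print(ncol(d))                  # 3
# ... while the underlying columns number 4 once M is expanded
print(sum(sapply(d, NCOL)))     # 4

# A matrix term combined with an ordinary factor term via "+"
fit <- lm(y ~ M + g, data = d)
print(names(coef(fit)))         # "(Intercept)" "Mx1" "Mx2" "gv"
```

Applied to tmp, the same NCOL() sum would give the 16 columns Michael counted, even though dim(tmp) reports 14.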
______________________________________________
Thanks!
Do you think the correctness of such results generalizes to other, future cases?
______________________________________________
On May 14, 2012, at 02:24, Luna wrote:

> Thanks!
>
> Do you think if the correctness of the such results could be generalized to
> other future cases?

If correctly generalized, yes...

(Apologies for being slightly facetious; the point is that the properties you build on are part of the software design for model formulas and model matrices. They are not fortuitous buglets, so they are not going to go away unless the actual design is changed.)

-- 
Peter Dalgaard, Professor
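Since the behaviour is part of the design, one way to "catch and avoid" surprises, as asked at the start of the thread, is to refit a formula through its own model matrix and compare the fitted values. The helper check_matrix_refit() below is hypothetical (it is not from the thread or from base R), and the data are synthetic. Building X from model.frame() keeps exactly the rows lm() used when there are missing values.

```r
# Hypothetical helper: fit `formula` normally, then refit it through its own
# model matrix stored as a matrix column, and report whether fitted values agree.
check_matrix_refit <- function(formula, data, tolerance = 1e-8) {
  fit_formula <- lm(formula, data = data)

  mf  <- model.frame(formula, data = data)     # same rows lm() used (NAs dropped)
  X   <- model.matrix(attr(mf, "terms"), mf)   # includes the intercept column, if any
  tmp <- data.frame(y = model.response(mf))
  tmp$X <- X                                   # one matrix-valued column

  fit_matrix <- lm(y ~ X - 1, data = tmp)      # "-1": X already encodes the intercept
  isTRUE(all.equal(unname(fitted(fit_formula)),
                   unname(fitted(fit_matrix)),
                   tolerance = tolerance))
}

# Usage on synthetic data (hypothetical), including one missing value
set.seed(42)
dat <- data.frame(
  SES    = rnorm(40),
  Sector = rep(c("Catholic", "Public"), length.out = 40)
)
dat$MathAch <- 10 + 3 * dat$SES + rnorm(40)
dat$SES[5] <- NA

print(check_matrix_refit(MathAch ~ SES + factor(Sector), dat))
print(check_matrix_refit(MathAch ~ SES + factor(Sector) - 1, dat))
```

Keeping "- 1" in the refit is safe either way: when the original formula has no intercept, X simply has no all-1 column, and nothing is added back.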
______________________________________________
Oh, so we can always combine model matrices and formulas in regression in R?
Thanks!
