MSE Cross-validation with factor interactions terms MARS regression


R help mailing list-2

Dear R-experts,
I am having trouble doing cross-validation with a MARS regression that includes an interaction term between a factor variable (education) and a continuous variable (age). How can I solve my problem?

Here is my reproducible example.

#######

install.packages("ISLR")
library(ISLR)
install.packages("earth")
library(earth)

a<-as.factor(Wage$education)

# Create a list to store the results
lst<-list()

# This statement does the repetitions (looping)
for(i in 1:200) {
  n=dim(Wage)[1]
  p=0.667
  sam=sample(1:n,floor(p*n),replace=FALSE)
  Training=Wage[sam,]
  Testing=Wage[-sam,]
  mars5<-earth(wage~age+education+year+age*a, data=Wage)
  ypred=predict(mars5,newdata=Testing)
  y=Testing$wage
  y=Wage[-sam,]$wage
  MSE = mean(y-ypred)^2
  MSE
  lst[i]<-MSE
}

mean(unlist(lst))
summary(mars5)

#######

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: MSE Cross-validation with factor interactions terms MARS regression

Bert Gunter-2
I did no analysis of your code or thought process, but noticed that you had
the following two successive lines in your code:


y=Testing$wage

y=Wage[-sam,]$wage

This obviously makes no sense, so maybe you should fix this first and then
proceed.
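
A quick way to see why the second assignment is redundant is to compare the two expressions directly. The sketch below uses the built-in mtcars data as a stand-in for Wage, so it runs without ISLR; the seed and the mpg column are assumptions made only for illustration.

```r
set.seed(1)                         # assumed seed, for reproducibility only
n <- nrow(mtcars)                   # mtcars stands in for Wage (assumption)
sam <- sample(seq_len(n), floor(0.667 * n), replace = FALSE)

Testing <- mtcars[-sam, ]           # held-out rows, as in the original loop

y1 <- Testing$mpg                   # analogue of y=Testing$wage
y2 <- mtcars[-sam, ]$mpg            # analogue of y=Wage[-sam,]$wage

identical(y1, y2)                   # TRUE: both select the same held-out rows
```

Because Testing is defined as the rows excluded by sam, either line alone is enough.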

-- Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Mon, Oct 29, 2018 at 1:46 PM varin sacha via R-help <[hidden email]>
wrote:


Re: MSE Cross-validation with factor interactions terms MARS regression

R help mailing list-2
Hi Bert,

Many thanks, I have fixed it, but it still doesn't work.
Best,

On Monday, 29 October 2018 at 22:07:26 UTC+1, Bert Gunter <[hidden email]> wrote:

Re: MSE Cross-validation with factor interactions terms MARS regression

Peter Dalgaard-2
The two lines did the same thing, so little wonder...

More likely, the culprit is that a is assigned in the global environment, and then used in a prediction on a subset.

Also,

- you are defining Training, but as far as I can tell, you're not using it. Not likely to be an issue in itself, but wouldn't you want to fit on the Training set and evaluate on the Testing?

- your model de facto contains both education as a numeric predictor and as.factor(education) as well as the interaction term age:as.factor(education). Does that make sense modelling-wise??

-pd
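
One way to act on the points above is sketched here; it is not necessarily the fix the original poster ended up using. The model is fitted on Training and evaluated on Testing, and education is referenced only through the data frame, so predict() sees rows that match newdata. The sketch also computes the MSE as mean((y - ypred)^2): the original mean(y-ypred)^2 squares the mean error instead of averaging the squared errors. Two things are assumptions of this sketch: the seed, and letting earth search for the age-by-education interaction itself through degree = 2 rather than writing age*a in the formula.

```r
library(ISLR)    # provides the Wage data set
library(earth)   # MARS implementation

set.seed(123)            # assumed seed, only for reproducibility
n_reps <- 200            # number of random train/test splits, as in the original post
p <- 0.667               # training fraction, as in the original post
n <- nrow(Wage)
mse <- numeric(n_reps)   # preallocated numeric vector instead of a list

for (i in seq_len(n_reps)) {
  sam <- sample(seq_len(n), floor(p * n), replace = FALSE)
  Training <- Wage[sam, ]
  Testing  <- Wage[-sam, ]

  # Fit on the training rows only. Every predictor comes from `data`,
  # so no global copy of education is carried into the model.
  # degree = 2 permits two-way interactions between basis functions
  # (an assumption of this sketch, replacing the explicit age*a term).
  fit <- earth(wage ~ age + education + year, data = Training, degree = 2)

  ypred <- as.numeric(predict(fit, newdata = Testing))
  y <- Testing$wage

  # Mean of the squared errors, not the square of the mean error
  mse[i] <- mean((y - ypred)^2)
}

mean(mse)      # average held-out MSE over all splits
summary(fit)   # summary of the model from the last split only
```

Note that summary(fit) describes only the model from the final split, so the selected terms may differ from one repetition to the next.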

> On 29 Oct 2018, at 23:50 , varin sacha via R-help <[hidden email]> wrote:
--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: [hidden email]  Priv: [hidden email]

Re: MSE Cross-validation with factor interactions terms MARS regression

R help mailing list-2
Dear Prof. Dalgaard,

Thank you very much for your comments and responses. It works perfectly!
Many thanks.

On Tuesday, 30 October 2018 at 00:30:11 UTC+1, Peter Dalgaard <[hidden email]> wrote: