Defining Variables from a Matrix for 10-Fold Cross Validation

matthew campbell
Good afternoon,

I am trying to run a 10-fold CV, using a matrix as my data set.
Essentially, I want "y" to be the first column of the matrix, and my "x" to
be all remaining columns (2-257). I've posted some of the code I used
below, and the data set (called "zip.train") is in the "ElemStatLearn"
package. The error message is highlighted in red, and the corresponding
section of code is bolded. (I am not concerned with the warning message,
just the error message).

The issue I am experiencing is the error message below the code: I haven't
come across that specific message before, and am not exactly sure how to
interpret its meaning. What exactly is this error message trying to tell
me?  Any suggestions or insights are appreciated!

Thank you all,

Matthew Campbell


> library (ElemStatLearn)
> library(kknn)
> data(zip.train)
> train=zip.train[which(zip.train[,1] %in% c(2,3)),]
> test=zip.test[which(zip.test[,1] %in% c(2,3)),]
> nfold = 10
> infold = sample(rep(1:10, length.out = (x)))
Warning message:
In rep(1:10, length.out = (x)) :
  first element used of 'length.out' argument
>
*> mydata = data.frame(x = train[ , c(2,257)] , y = train[ , 1])*
>
> K = 20
> errorMatrix = matrix(NA, K, 10)
>
> for (l in nfold)
+ {
+   for (k in 1:20)
+   {
+     knn.fit = kknn(y ~ x, train = mydata[infold != l, ], test = mydata[infold == l, ], k = k)
+     errorMatrix[k, l] = mean((knn.fit$fitted.values - mydata$y[infold == l])^2)
+   }
+ }
Error in model.frame.default(formula, data = train) :
  variable lengths differ (found for 'x')

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: Defining Variables from a Matrix for 10-Fold Cross Validation

David Winsemius

> On Oct 9, 2018, at 3:04 PM, matthew campbell <[hidden email]> wrote:
>> library (ElemStatLearn)
>> library(kknn)
>> data(zip.train)
>> train=zip.train[which(zip.train[,1] %in% c(2,3)),]
>> test=zip.test[which(zip.test[,1] %in% c(2,3)),]
>> nfold = 10
>> infold = sample(rep(1:10, length.out = (x)))

I don't see a definition for x.

> Warning message:
> In rep(1:10, length.out = (x)) :
>  first element used of 'length.out' argument

But apparently it has a length greater than 1, and you are getting a sample whose length is specified by the first element of x.
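
A minimal sketch of what the fold assignment was probably meant to be, assuming the intent is one fold label per row of `train`. The matrix below is synthetic stand-in data (an assumption, so the snippet runs without ElemStatLearn); with the real data, `nrow(train)` would simply be the number of rows kept after filtering for digits 2 and 3.

```r
set.seed(1)  # reproducible fold assignment

# Synthetic stand-in for the filtered zip.train matrix (assumption):
# column 1 is the digit label, the remaining columns are predictors.
n <- 95
train <- cbind(sample(c(2, 3), n, replace = TRUE),
               matrix(runif(n * 5), nrow = n))

nfold <- 10

# length.out must be a single number: the number of rows to label.
# rep() recycles 1:nfold up to that length, and sample() shuffles the labels.
infold <- sample(rep(seq_len(nfold), length.out = nrow(train)))

length(infold) == nrow(train)  # one fold label per row
table(infold)                  # fold sizes differ by at most one
```

With `length.out = nrow(train)` the warning goes away, and `infold` can be used to index the rows of a data frame built from `train`.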


>>
> *> mydata = data.frame(x = train[ , c(2,257)] , y = train[ , 1])*
>>
>> K = 20
>> errorMatrix = matrix(NA, K, 10)
>>
>> for (l in nfold)
> + {
> +   for (k in 1:20)
> +   {
> +     knn.fit = kknn(y ~ x, train = mydata[infold != l, ], test = mydata[infold == l, ], k = k)
> +     errorMatrix[k, l] = mean((knn.fit$fitted.values - mydata$y[infold == l])^2)
> +   }
> + }
> Error in model.frame.default(formula, data = train) :
>  variable lengths differ (found for 'x')

So the warning above is probably a great clue to the source of this error.
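
To make the connection concrete, here is a small base-R sketch of how that error can arise. It assumes (the original session is not shown) that an object named `x` was left over in the workspace; the object `x <- 1:7` and the synthetic matrix are hypothetical stand-ins. When a matrix is passed as `x =` to `data.frame()`, the result has columns `x.1`, `x.2`, ... and no column called `x`, so the formula `y ~ x` falls back to whatever `x` it finds outside the data frame.

```r
set.seed(1)

# Synthetic stand-in for the training matrix (assumption).
n <- 100
train <- cbind(sample(c(2, 3), n, replace = TRUE),
               matrix(runif(n * 5), nrow = n))

# The matrix is split into one column per predictor:
# the names are x.1, ..., x.5 and y, with no column named "x".
mydata <- data.frame(x = train[, 2:ncol(train)], y = train[, 1])
names(mydata)

# Hypothetical leftover object standing in for the undefined `x`.
x <- 1:7

# y comes from mydata (100 values) but x comes from the workspace (7 values),
# so model.frame() stops with "variable lengths differ (found for 'x')".
res <- try(model.frame(y ~ x, data = mydata), silent = TRUE)
inherits(res, "try-error")

# `y ~ .` uses every other column of mydata as a predictor instead.
mf <- model.frame(y ~ ., data = mydata)
dim(mf)
```

The same lookup happens inside `kknn()`, which is why the error is reported from `model.frame.default`.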

Moral of the tale: Always read the warnings, even if your code proceeds.
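
For reference, a hedged sketch of how the whole loop might look once the pieces are consistent. It is not the original poster's final code: the data are synthetic (an assumption), and three details of the posted code are changed. `2:ncol(train)` replaces `c(2,257)`, which selects only columns 2 and 257; `seq_len(nfold)` replaces `for (l in nfold)`, which runs a single iteration with `l = 10`; and the formula is `y ~ .` so that all predictor columns of `mydata` are used. The `requireNamespace()` guard is only there so the snippet does nothing if kknn is not installed.

```r
set.seed(1)

# Synthetic stand-in for the filtered zip.train matrix (assumption).
n <- 100
p <- 5
train <- cbind(sample(c(2, 3), n, replace = TRUE),
               matrix(runif(n * p), nrow = n))

# All predictor columns, not just the first and the last.
mydata <- data.frame(x = train[, 2:ncol(train)], y = train[, 1])

nfold <- 10
K <- 20
infold <- sample(rep(seq_len(nfold), length.out = nrow(mydata)))
errorMatrix <- matrix(NA_real_, nrow = K, ncol = nfold)

if (requireNamespace("kknn", quietly = TRUE)) {
  for (l in seq_len(nfold)) {      # loop over every fold
    for (k in seq_len(K)) {        # loop over every candidate k
      knn.fit <- kknn::kknn(y ~ ., train = mydata[infold != l, ],
                            test = mydata[infold == l, ], k = k)
      # Mean squared error on the held-out fold.
      errorMatrix[k, l] <- mean((knn.fit$fitted.values -
                                   mydata$y[infold == l])^2)
    }
  }
  cvError <- rowMeans(errorMatrix) # average error over folds, one per k
  which.min(cvError)               # k with the lowest cross-validated error
}
```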

David Winsemius
Alameda, CA, USA

"The whole problem with the world is that fools and fanatics are always so certain of themselves, and wiser people so full of doubts." - Bertrand Russell

Re: Defining Variables from a Matrix for 10-Fold Cross Validation

Zach Simpson
In reply to this post by matthew campbell
Hey Matthew,

In addition to what's been mentioned, you may want to look at the
'caret' package, as it provides a nice system for whatever flavor of
cross-validation you're after *and* has a built-in method for `kknn`:

http://topepo.github.io/caret/available-models.html

Hope this helps,
Zach Simpson
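
A rough sketch of what the caret route could look like, under several assumptions: the data are synthetic, the tuning grid values are illustrative, and both caret and kknn need to be installed (the guard skips the fit otherwise). As far as I know, caret's `"kknn"` method tunes `kmax` (the largest k considered), `distance`, and `kernel` rather than a single `k`, so the grid is not a one-to-one replacement for the manual loop over k.

```r
set.seed(1)

# Synthetic stand-in data (assumption): label in column 1, predictors after.
n <- 200
train <- cbind(sample(c(2, 3), n, replace = TRUE),
               matrix(runif(n * 5), nrow = n))
mydata <- data.frame(x = train[, -1], y = train[, 1])

if (requireNamespace("caret", quietly = TRUE) &&
    requireNamespace("kknn", quietly = TRUE)) {
  # 10-fold cross-validation handled by caret.
  ctrl <- caret::trainControl(method = "cv", number = 10)

  # Illustrative grid over the tuning parameters of method = "kknn".
  grid <- expand.grid(kmax = 1:20, distance = 2, kernel = "optimal")

  fit <- caret::train(y ~ ., data = mydata, method = "kknn",
                      trControl = ctrl, tuneGrid = grid)
  fit$bestTune  # parameter combination with the best resampled error
}
```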
