|
Hi,

I understand from the help pages that, in order to use a data set with svm, I have to divide it into two files: one containing the dataset without the class label and the other containing the class label, as in the following code:

    library(e1071)
    x <- read.delim("mydataset_except-class-label.txt")
    y <- read.delim("mydataset_class-labell.txt")
    model <- svm(x, y, cross=5)
    summary(model)

But I couldn't understand how to add the "formula" parameter to it. Does the formula contain the class label too? And what do I have to do to use a testing set when I don't use the "cross" parameter?

Cheers,
Amy

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code. |
|
Hi,
On Tue, Jan 5, 2010 at 7:01 PM, Amy Hessen <[hidden email]> wrote:
> I understand from help pages that in order to use a data set with svm, I
> have to divide it into two files: one for the dataset without the class
> label and the other file contains the class label as the following code:-

This isn't exactly correct ... look at the examples in the ?svm documentation a bit closer.

> library(e1071)
> x<- read.delim("mydataset_except-class-label.txt")
> y<- read.delim("mydataset_class-labell.txt")
> model <- svm(x, y, cross=5)
> summary(model)
>
> but I couldn't understand how I add "formula" parameter to it? Does
> formula contain the class label too??

Using the first example in ?svm:

    attach(iris)
    model <- svm(Species ~ ., data = iris)

The first argument in the function call is the formula. The "Species" column is the class label.

`iris` is a data.frame ... you can see that it has the label *in it*; look at the output of head(iris).

> and what I have to do to use testing set when I don't use "cross" parameter.

Just follow the example in ?svm some more: you'll see it train a model and then test it on data. The example happens to test on the same data the model was trained on. To use new data, you'll just need a data matrix/data.frame with as many columns as your original data, and as many rows as you have observations.

The first step separates the labels from the data (you can do the same in your data -- you don't have to have separate test and train files that are different -- just remove the labels from it in R):

    attach(iris)
    x <- subset(iris, select = -Species)
    y <- Species
    model <- svm(x, y)

    # test with train data
    pred <- predict(model, x)

Hope that helps,
-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact |
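The two calling conventions described in this reply can be put side by side. The following is a minimal sketch on the built-in iris data, not code from the thread; the object names (model_f, model_xy, pred_f, pred_xy) are invented for the illustration, and because it predicts on the training rows the resulting accuracy is optimistic.

```r
library(e1071)

# Formula interface: the label column stays inside the data frame.
model_f <- svm(Species ~ ., data = iris)

# x/y interface: split the label off in R instead of keeping separate files.
x <- subset(iris, select = -Species)
y <- iris$Species
model_xy <- svm(x, y)

# Predict on the training data (for illustration only).
pred_f  <- predict(model_f, iris)
pred_xy <- predict(model_xy, x)

# Confusion table: predicted class against the true label.
print(table(predicted = pred_xy, actual = y))
```

Using iris$Species rather than attach(iris) keeps the example free of side effects on the search path; that is a stylistic choice of this sketch, not something the thread prescribes.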
|
Hi Steve,

Thank you very much for your reply.

I'm trying to do something systematic/general in the program so that I can try different datasets without changing much in the program (without knowing the name of the class label, which differs from one dataset to another).

Could you please tell me your opinion about this code:

    library(e1071)
    mydata <- read.delim("the_whole_dataset.txt")
    class_label <- names(mydata)[1]   # I'll always put the class label in the first column.
    myformula <- formula(paste(class_label, "~ ."))
    x <- subset(mydata, select = - mydata[, 1])
    mymodel <- (svm(myformula, x, cross=3))
    summary(model)

Do I have to do the same steps with the testing set? I.e., must the testing set also not contain the label, but have the same structure as the training set? Is that correct?

Cheers,
Amy |
|
Hi Amy,
On Wed, Jan 6, 2010 at 4:33 PM, Amy Hessen <[hidden email]> wrote:
> I'm trying to do something systematic/general in the program so that I can
> try different datasets without changing much in the program (without knowing
> the name of the class label that has different name from dataset to
> another...)
>
> Could you please tell me your opinion about this code:-
>
> library(e1071)
> mydata<-read.delim("the_whole_dataset.txt")
> class_label <- names(mydata)[1] # I'll always put the class label in the first column.
> myformula <- formula(paste(class_label,"~ ."))
> x <- subset(mydata, select = - mydata[, 1])
> mymodel<-(svm(myformula, x, cross=3))
> summary(model)

Since you're not doing anything funky with the formula, a preference of mine is to just skip this way of calling SVM and go "straight" to the svm(x,y,...) method:

    ## as.matrix() yields a numeric matrix only if every column,
    ## the label included, is numeric
    R> mydata <- as.matrix(read.delim("the_whole_dataset.txt"))
    R> train.x <- mydata[,-1]
    R> train.y <- mydata[,1]

    R> mymodel <- svm(train.x, train.y, cross=3, type="C-classification")
    ## or
    R> mymodel <- svm(train.x, train.y, cross=3, type="eps-regression")

As an aside, I also like to be explicit about the type="" parameter to tell what I want my SVM to do (regression or classification). If it's not specified, the SVM picks which one to do based on whether your y vector is a factor (does classification) or not (does regression).

> Do I have to the same steps with testingset? i.e. the testing set must not
> contain the label too? But contains the same structure as the training set?
> Is it correct?

I guess you'll want to report your accuracy/MSE/something on your model for your testing set? Just load the data in the same way, then use `predict` to calculate the metric you're after. You'll have to have the labels for your data to do that, though, e.g.:

    testdata <- as.matrix(read.delim('testdata.txt'))
    test.x <- testdata[,-1]
    test.y <- testdata[,1]
    preds <- predict(mymodel, test.x)

Let's assume you're doing classification, so let's report the accuracy:

    acc <- sum(preds == test.y) / length(test.y)

Does that help?
-steve |
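The train/test workflow from this reply can be tried without any files. This is a sketch under stated assumptions: iris (with its label moved to column 1) stands in for the_whole_dataset.txt, the one-third hold-out and the seed are arbitrary choices, and the names test_rows, train and test are invented here.

```r
library(e1071)

set.seed(42)  # arbitrary seed, only to make the random split repeatable

# Stand-in for read.delim("the_whole_dataset.txt"): iris with the label moved
# to column 1, mirroring the "label in the first column" convention.
mydata <- iris[, c(5, 1:4)]

# Hold out one third of the rows for testing (the ratio is an assumption).
test_rows <- sample(nrow(mydata), size = 50)
train <- mydata[-test_rows, ]
test  <- mydata[test_rows, ]

# Column 1 is the label in both sets, so no column name is hard-coded.
mymodel <- svm(train[, -1], train[, 1], type = "C-classification")
preds   <- predict(mymodel, test[, -1])

# Fraction of held-out rows classified correctly.
acc <- mean(preds == test[, 1])
print(acc)
```

Keeping the data as a data.frame instead of calling as.matrix() lets the label remain a factor while the predictors stay numeric.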
|
Hi Steve,

Thank you very much for your reply. Your code is more readable and obvious than mine.

Could you please help me with these questions?

1) "Formula" is an alternative to the "y" parameter in SVM. Is that correct?

2) I forgot to remove the "class label" from the dataset, and I also gave the program the class label in the formula parameter, but the program works! Could you please clarify this point for me?

Cheers,
Amy |
|
Hi,
On Fri, Jan 8, 2010 at 11:57 AM, Amy Hessen <[hidden email]> wrote:
> Thank you very much for your reply. Your code is more readable and obvious than mine...

No problem.

> Could you please help me in these questions?:
>
> 1) "Formula" is an alternative to "y" parameter in SVM. is it correct?

No, that's not correct.

There are two svm functions: one that takes a "formula" object (svm.formula), and one that takes an x matrix and a y vector (svm.default). The svm.formula function is called when the first argument in your "svm(..)" call is a formula object. This function simply parses the formula and manipulates your data object into an x matrix and y vector, then calls the svm.default function with those params ... I usually prefer to just skip the formula and provide the x and y objects directly.

Load the e1071 library and look at the source code:

    R> library(e1071)
    R> e1071:::svm.formula

You'll see what I mean.

> 2) I forgot to remove the "class label" from the dataset besides I gave the
> program the class label in formula parameter but the program works! Could
> you please clarify this point to me?

The author of the e1071 package did you a favor. The predict.svm function checks to see if your svm object was built using the formula interface ... if so, it looks for your label column in the data you are trying to predict on and ignores it.

Look at the function's source code (e.g., type e1071:::predict.svm at the R prompt), and look for the call to the delete.response function ... you can also look at the help in ?delete.response.

-steve |
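The behaviour described in this reply can be checked directly. The sketch below uses iris as stand-in data and invented names (label, form, with_label, without_label); it builds the formula from a column position, shows what delete.response does to a terms object, and confirms that a formula-built model gives the same predictions whether or not the label column is present in the new data.

```r
library(e1071)

# Build the formula from the label's position rather than its name.
label <- names(iris)[5]                      # position 5 is specific to iris
form  <- as.formula(paste(label, "~ ."))

model <- svm(form, data = iris)              # a formula first argument selects svm.formula

# A terms object records which variable is the response ...
tt <- terms(form, data = iris)
print(attr(tt, "response"))                  # index of the response variable

# ... and delete.response() drops it, leaving only the predictors.
print(attr(delete.response(tt), "response"))

# Predictions agree with and without the label column in the new data.
with_label    <- predict(model, iris)
without_label <- predict(model, iris[, -5])
print(identical(as.character(with_label), as.character(without_label)))
```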
|
Hi Steve,

Thank you so much for your reply. I really needed to know how SVM works without my removing the class label when it also receives the label in the formula parameter. It does not work if I remove the class label.

Cheers,
Amy |
|
In reply to this post by Steve Lianoglou-6
Hi,

Could you please tell me whether there are feature selection algorithms in R, such as genetic algorithms? If so, could you please tell me in which package?

Cheers,
Amy |
|
On Sun, 24 Jan 2010, Amy Hessen wrote:
> Hi,
>
> Could you please tell me whether there are feature selection algorithms
> in R or not such as genetic algorithms? If so, could you please tell me
> in which package?

I can!

By following the _posting guide_, I see in the 'Do Your Homework' section that I should try something like:

    RSiteSearch("feature selection")

and

    RSiteSearch("genetic algorithm")

And each seems to produce lots of good candidates!

HTH,

Chuck

p.s. Don't forget to check the Task Views on CRAN

Charles C. Berry                            (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:[hidden email]                     UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901 |
|
In reply to this post by Amy Hessen
You can check
http://cran.r-project.org/web/views/MachineLearning.html

Carlos J. Gil Bellosta
http://www.datanalytics.com |
|
In reply to this post by Steve Lianoglou-6
Hi Steve,

Could you please help me with this point?

I use the SVM in R and I'm trying some datasets from UCI, but when I compare the results of my program (which does nothing more than call SVM) with the RMSE of SVM reported in other papers, I find a big gap between them.

For example, this is the RMSE of SVM from my program for the bodyfat dataset: 2.64561

And this is the RMSE from a paper: 0.0204.

Could you please tell me how I can reduce this gap in the performance of SVM?

Cheers,
Amy |
|
Hi Amy,

On Wed, Feb 3, 2010 at 1:56 AM, Amy Hessen <[hidden email]> wrote:
> I use SVM of R and I'm trying some datasets from UCI but when I compare the
> results of my program( that does not do anything more than calling SVM) with
> the RMSE of SVM in any other paper, I found a big gap between them.
>
> For example, this is the rmse of svm of my program for the dataset bodyfat:
> 2.64561
>
> And this is the RMSE of a paper 0.0204.
>
> Could you please tell me how I can reduce this gap in the performance of
> SVM?

Sorry, it's hard to say w/o investing any real time to investigate (and I unfortunately don't have the time to do so).

There are different parameters you can play with in nu-regression vs. eps-regression, and different kernel functions that might be a better fit for the type of data you are trying to learn against.

Before running the SVM (or any other "learning" algorithm), there are also ways to normalize your data, too ...

Lots of things to look at ...

-steve |
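Two of the knobs mentioned in this reply, the kernel parameters and the scale of the data, can be explored with a short experiment. The sketch below is only an illustration: mtcars stands in for the bodyfat data, the cost/gamma grid and the 5-fold setting are arbitrary, and the names rmse, fit_default, tuned, rmse_cv and rmse_default_01 are invented here. Whether the published figure was computed on a rescaled response is not known from the thread; the last step merely shows that RMSE depends on the units of the response.

```r
library(e1071)

set.seed(1)  # cross-validation folds are drawn at random

rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

x <- as.matrix(mtcars[, -1])   # predictors
y <- mtcars$mpg                # numeric response

# Baseline: default radial kernel with default cost, gamma and epsilon.
fit_default  <- svm(x, y, type = "eps-regression")
rmse_default <- rmse(y, predict(fit_default, x))

# Small grid search over cost and gamma with 5-fold cross-validation.
tuned <- tune(svm, train.x = x, train.y = y, type = "eps-regression",
              ranges = list(cost = 2^(0:4), gamma = 2^(-6:-2)),
              tunecontrol = tune.control(cross = 5))
rmse_cv <- sqrt(tuned$best.performance)  # tune() reports MSE for numeric responses

# The training RMSE expressed on a response rescaled to [0, 1].
rmse_default_01 <- rmse_default / diff(range(y))

print(tuned$best.parameters)
print(c(train = rmse_default, cv_tuned = rmse_cv, train_01 = rmse_default_01))
```

Note that rmse_default is a training error while rmse_cv is cross-validated, so the two are not directly comparable; a fair comparison with a paper would also need the same data split and response scaling as the paper used.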
|
Hi Steve,

Thank you very much for your reply.

Could you please point me to a helpful reference on the other non-linear regression algorithms available in R, and on how to use them?

Cheers,
Amy

> Date: Wed, 3 Feb 2010 10:59:27 -0500
> Subject: Re: [R] svm |
|
On Thu, 4 Feb 2010, Amy Hessen wrote:
> Hi Steve,
>
> Thank you very much for your reply.
>
> Could you please guide me to any helpful reference to learn about the
> other non-linear regression algorithms available in R language and about
> how I use any of them?

There are a few papers in the Journal of Statistical Software that might be interesting for you. The paper about the "caret" package gives a good overview, many further pointers, and an easy-to-use interface (see http://www.jstatsoft.org/v28/i05/). There is also a comparison of Support Vector Machines in R (in http://www.jstatsoft.org/v15/i09/). Further interesting issues might be kernlab (http://www.jstatsoft.org/v11/i09/) or glmnet (http://www.jstatsoft.org/v33/i01/), among others.

See also the Machine Learning task view

  http://CRAN.R-project.org/view=MachineLearning

for other approaches and their implementations.

hth,
Z |
|
Hi Achim,

Thank you so much for your reply. Could you please tell me how to call PLS? I'm sorry, I tried but could not.

Many thanks,
Amy

> Date: Thu, 4 Feb 2010 02:31:36 +0100
> Subject: Re: [R] svm |
|
In reply to this post by Achim Zeileis
Hi,

Every time I run an SVM regression program, I get a different RMSE value. Could you please tell me the reason for that?

Cheers,
Amy |
|
Hi,

On Fri, Feb 12, 2010 at 3:00 PM, Amy Hessen <[hidden email]> wrote:
> Hi,
> Every time I run a svm regression program, I got different RMSE value.
> Could you please tell me what the reason for that?

Sorry, your question is a bit vague.

Can you provide an example/code that shows this behavior? Does the RMSE differ across folds of cross-validation? On the same data? With the same parameters? Is the RMSE significantly different?

Providing an example that shows this behavior would help.

Thanks,
-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact |
|
Hi,

Thank you very much for your reply.

I meant that I get a different RMSE on different runs of the same program, without any change to the dataset or the SVM parameters.

This is my code:

  library(e1071)
  readingmydata <- as.matrix(read.delim("mydataset.txt"))
  train.x <- readingmydata[,-1]
  train.y <- readingmydata[,1]

  mymodel <- svm(train.x, train.y, cross=10)
  summary(mymodel)

Can you please tell me how I can fix that error?

Cheers,
Amy

> Date: Tue, 16 Feb 2010 10:33:19 -0500
> Subject: Re: [R] svm and RMSE |
|
On 19.02.2010 01:29, Amy Hessen wrote:
> Hi
> Thank you very much for your reply.
>
> I meant I got different RMSE with different runs to the same program without any change for the dataset or the parameters of SVM.
>
> This is my code:
>
> library(e1071)
> readingmydata <- as.matrix(read.delim("mydataset.txt"))
> train.x <- readingmydata[,-1]
> train.y <- readingmydata[,1]
>
> mymodel <- svm(train.x, train.y, cross=10)
> summary(mymodel)
> can you please tell me how I can fix that error?

No error at all, it is expected: you get different partitions of the data for cross-validation, since they are sampled at random. If you want to get exactly the same results, set a seed for the random number generator, such as:

  set.seed(123)
  mymodel <- svm(train.x, train.y, cross=10)
  summary(mymodel)

Uwe Ligges |
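The seeding advice above can be checked with a small sketch. It assumes the e1071 package is installed and uses the built-in iris data as a stand-in for mydataset.txt (an assumption, since that file is not available here); run_cv is a hypothetical helper, not part of e1071. Two runs that start from the same seed should report the same cross-validated error.

```r
library(e1071)

# Stand-in data: predict Sepal.Length from the other numeric columns
# (a regression task, so svm() reports a mean squared error).
x <- as.matrix(iris[, 2:4])
y <- iris$Sepal.Length

# Hypothetical helper: seed the RNG, run 10-fold cross-validation,
# and return the overall RMSE reported by svm().
run_cv <- function(seed) {
  set.seed(seed)
  fit <- svm(x, y, cross = 10)
  sqrt(fit$tot.MSE)
}

rmse_a <- run_cv(123)
rmse_b <- run_cv(123)

# Same seed, same data, same parameters: the two values should match
identical(rmse_a, rmse_b)
```

The seed value itself (123 here) is arbitrary; what matters is calling set.seed() with the same value immediately before each svm() call that should be comparable.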
|
In reply to this post by Steve Lianoglou-6
Hi,

Could you please help me with this question?

After trying this code:

  library(e1071)
  mydata <- as.matrix(read.delim("iris.txt"))
  train.x <- mydata[,-1]
  train.y <- mydata[,1]
  mymodel <- svm(train.x, train.y, cross=3, type="C-classification")

I receive this error:

  Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric

I put the class label in the first column.

Cheers,
Amy

> Date: Wed, 3 Feb 2010 10:59:27 -0500
> Subject: Re: [R] svm |
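A likely cause of the 'x' must be numeric error, assuming iris.txt stores the species name as text in its first column: as.matrix() on a data frame with mixed column types returns a character matrix, so the predictors are no longer numeric when svm() tries to scale them. The sketch below assumes the e1071 package is installed and uses the built-in iris data, reordered so the label comes first, as a stand-in for the file. It keeps the predictors numeric and passes the label as a factor.

```r
library(e1071)

# Stand-in for read.delim("iris.txt") with the class label in column 1
mydata <- iris[, c("Species", "Sepal.Length", "Sepal.Width",
                   "Petal.Length", "Petal.Width")]

# Converting the whole data frame turns every column into text
is.character(as.matrix(mydata))

# Convert only the predictor columns, and keep the label as a factor
train.x <- as.matrix(mydata[, -1])
train.y <- factor(mydata[, 1])

mymodel <- svm(train.x, train.y, cross = 3, type = "C-classification")
mymodel$tot.accuracy   # overall cross-validated accuracy, in percent
```

An alternative with the same effect is the formula interface on the data frame itself, e.g. svm(Species ~ ., data = mydata, cross = 3), which handles the factor label without any manual conversion.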
