
Suggestion about tuning of SVM


Guido Leoni
Dear list,
I have a general question about how to tune an SVM.
I'm trying to classify some population data from a case-control study with
the caret package. Each column of my matrix holds the SNP genotypes and
each row is an individual.
I split my total dataset into training (132 individuals) and test
(50 individuals) sets, preserving the overall observed genotype frequencies
and the percentage of cases and controls.
After training (with a radial basis function (RBF) kernel), the best model
has an accuracy of 76%, but when I apply the model to my test dataset the
accuracy drops to 52%.
Obviously I expected a decrease, but this one seems quite big to me.
I manually checked the predictions for my test dataset, and some cases that
have no risk allele are misclassified. Similar cases in my training
dataset are classified correctly.
Could you please suggest which parameters to modify in order to improve
the classification of the test dataset? Or, better, what could be causing
this big discrepancy?
I know that my question is very general, but I'm very new to this kind of
analysis, so any suggestion is welcome.
Thank you very much,
Guido

--
Guido Leoni
National Research Institute on Food and Nutrition
(I.N.R.A.N.)
via Ardeatina 546
00178 Rome
Italy

tel     + 39 06 51 49 41 (operator)
        + 39 06 51 49 4498 (direct)


______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Suggestion about tuning of SVM

mark leeds
Hi: I don't know anything about genotypes, but it sounds like you overfitted
the training set, so you should try using regularization. In standard
SVM classification algorithms, that can be done by decreasing the parameter
C, which decreases the penalty in the objective function for misclassifying
(it allows the margin to increase by allowing the algorithm to misclassify
more often). But you're using caret rather than one of the SVM packages
directly, so the parameter might be called something other than C.
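
A minimal sketch of the advice above in caret, under stated assumptions rather than the original poster's actual code. To my knowledge caret's "svmRadial" method (backed by kernlab) exposes the cost under the same name, C, together with the kernel width sigma, so both can be set through tuneGrid. The data below are simulated (20 hypothetical SNPs coded 0/1/2, random case/control labels), and the sigma value and the C grid are arbitrary illustrations, not recommendations. Both caret and kernlab need to be installed.

```r
# Sketch only: simulated data stands in for the real SNP matrix.
library(caret)   # train(), trainControl(), createDataPartition()
                 # method = "svmRadial" also needs kernlab installed

set.seed(1)

# Hypothetical data: 182 individuals x 20 SNPs coded 0/1/2, random labels
n <- 182
p <- 20
snps <- matrix(sample(0:2, n * p, replace = TRUE), nrow = n,
               dimnames = list(NULL, paste0("snp", seq_len(p))))
status <- factor(sample(c("case", "control"), n, replace = TRUE))

# Stratified split that keeps the case/control ratio (about 132 vs 50)
in_train <- createDataPartition(status, p = 132 / 182, list = FALSE)
x_train <- snps[in_train, ]
y_train <- status[in_train]
x_test  <- snps[-in_train, ]
y_test  <- status[-in_train]

# Repeated 10-fold cross-validation: a less optimistic estimate than
# accuracy measured on the same data the model was fitted to
ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 3)

# Grid reaching down to small C (stronger regularization); sigma is fixed
# at an arbitrary value here and could be added to the grid as well
grid <- expand.grid(sigma = 0.01, C = 2^(-5:3))

fit <- train(x = x_train, y = y_train, method = "svmRadial",
             tuneGrid = grid, trControl = ctrl)

print(fit$results[, c("sigma", "C", "Accuracy")])  # resampled accuracy per C
print(fit$bestTune)                                # selected sigma and C

# Accuracy of the selected model on the held-out individuals
test_acc <- mean(predict(fit, newdata = x_test) == y_test)
print(test_acc)
```

Because the labels here are random, every accuracy should hover around 50%. With real data the point is to compare the cross-validated accuracy across the C values and then check the selected model once on the held-out test set. With only 50 test individuals, that test accuracy is itself a noisy estimate.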

There are many books on support vector machines, but a nice intro from an
R perspective is "Support Vector Machines in R" in the Journal of
Statistical Software (it's free at www.jstatsoft.com).

On Fri, Jun 15, 2012 at 8:19 AM, Guido Leoni <[hidden email]> wrote:

> [...]
