Looking for packages to do Feature Selection and Classification

FD-4
Hi All,

Sorry if this is a repost (a quick browse didn't give me the answer).

I wonder if there are packages that can do feature selection and
classification at the same time. For instance, I am using an SVM to classify my
samples, but it overfits easily when I use all of the features. So it seems
necessary to select "good" features in order to build an optimal hyperplane
(?). Here is a simple example: suppose I have 100 "useful" features and 100
"useless" (noise) features. I want the SVM to give me the
same results when 1) using only the 100 useful features or 2) using all 200
features.

Any suggestions, or could you point me to a reference?

Thanks in advance!

Frank


______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: Looking for packages to do Feature Selection and Classification

Ramon Diaz-Uriarte
Dear Frank,
I expect you'll get many different answers, since a wide variety of approaches have been suggested. So I'll stick to self-advertisement: I've written an R package, varSelRF (available from CRAN), that uses random forest together with a simple variable selection approach, and also provides bootstrap estimates of the error rate of the procedure. Andy Liaw and collaborators previously developed and published a somewhat similar procedure. You probably also want to take a look at several packages available from BioConductor.

Best,

R.


--
Ramón Díaz-Uriarte
Bioinformatics Unit
Centro Nacional de Investigaciones Oncológicas (CNIO)
(Spanish National Cancer Center)
Melchor Fernández Almagro, 3
28029 Madrid (Spain)
Fax: +-34-91-224-6972
Phone: +-34-91-224-6900

http://ligarto.org/rdiaz
PGP KeyID: 0xE89B3462
(http://ligarto.org/rdiaz/0xE89B3462.asc)



Re: Looking for packages to do Feature Selection and Classification

Weiwei Shi
FYI:

Check the following paper on SVM (using LIBSVM) as well as random
forest in the context of feature selection.

http://www.csie.ntu.edu.tw/~cjlin/papers/features.pdf

HTH

--
Weiwei Shi, Ph.D

"Did you always know?"
"No, I did not. But I believed..."
---Matrix III

Re: Looking for packages to do Feature Selection and Classification

Ramon Diaz-Uriarte
In reply to this post by FD-4

Thanks for the reference, it looks very interesting.

Best,

R.

Re: Looking for packages to do Feature Selection and Classification

FD-4
In reply to this post by Weiwei Shi
Thanks. It's indeed an interesting paper. Besides RF (using Ramon's varSelRF
package), I am also testing Guyon et al.'s (2002) Recursive Feature
Elimination for the feature-selection part.
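
For readers who have not met Recursive Feature Elimination, here is a rough sketch of the idea in Python (standard library only), not the code anyone in this thread actually ran. It repeatedly fits a linear SVM, ranks the features by squared weight, and drops the weakest one. Everything in it is an assumption made for illustration: the function names (`make_data`, `train_linear_svm`, `rfe`), the Pegasos-style subgradient training loop used in place of a real SVM library, the missing bias term (the toy data are symmetric about the origin), and the scaled-down data set with 5 "useful" and 5 noise features instead of 100 and 100.

```python
import random


def make_data(n_samples=200, n_useful=5, n_noise=5, shift=1.5, seed=1):
    """Toy data: useful features are shifted by the class label, noise features are not."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_samples):
        label = rng.choice([-1, 1])
        useful = [label * shift + rng.gauss(0.0, 1.0) for _ in range(n_useful)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
        X.append(useful + noise)
        y.append(label)
    return X, y


def train_linear_svm(X, y, epochs=20, lam=0.1, seed=0):
    """Linear SVM without a bias term, fitted by Pegasos-style subgradient steps."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            # Shrink the weights (regularisation step) ...
            w = [(1.0 - eta * lam) * wj for wj in w]
            # ... and move towards the sample if it violated the margin.
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def rfe(X, y, n_keep, drop_per_round=1):
    """Recursive feature elimination: refit, rank by squared weight, drop the weakest."""
    active = list(range(len(X[0])))  # original column indices still in play
    while len(active) > n_keep:
        X_sub = [[row[j] for j in active] for row in X]
        w = train_linear_svm(X_sub, y)
        ranked = sorted(range(len(active)), key=lambda k: w[k] ** 2)
        n_drop = min(drop_per_round, len(active) - n_keep)
        weakest = set(ranked[:n_drop])
        active = [f for k, f in enumerate(active) if k not in weakest]
    return active


X, y = make_data()
selected = rfe(X, y, n_keep=5)
print("features kept:", sorted(selected))
```

Columns 0 to 4 are the useful ones in this toy setup, so one would hope to see mostly those survive. Note that selecting features on the same samples later used to estimate the error rate gives optimistic results, which is why the bootstrap error estimates mentioned earlier in the thread are useful.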
