Random Forest Classifiers

Random Forest Classifiers

statsmenace
I am learning Random Forest and have a basic training question. For my problem, I "derived" various classifiers (var0, var1, ..., var9). They are independent, but the intrinsic values from which they are derived overlap. I get the following data for my RF tree. My question is: should I eliminate the classifiers that haven't shown enough importance? (For example, I could scale %IncMSE relatively and maybe just pick the top 3 or 4.)

-------------------------------
       %IncMSE  IncNodePurity
Var0  10.84632       7.232559
var1  24.53021       7.976509
var2   26.5005       4.653162
var3  60.18863      21.882258
var4  11.97568        7.25413
var5  49.63468      16.968472
var6  19.55981      10.009517
var7  10.36669      13.136694
var8  14.16585       7.818673
var9   9.75812       7.178831
-------------------------------

_______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-finance
-- Subscriber-posting only. If you want to post, subscribe first.
-- Also note that this is not the r-help list where general R questions should go.

Re: Random Forest Classifiers

statsmenace

Apologies, the previous mail was sent before it was complete. Here is the full text:

I am learning Random Forest and have a basic training question. For my problem, I "derived" various classifiers (var0, var1, ..., var9). They are independent, but the intrinsic values from which they are derived overlap. I get the following data for my RF tree. My question is: should I eliminate the classifiers that haven't shown enough importance? (For example, I could scale %IncMSE relatively and maybe just pick the top 3 or 4.)

-------------------------------
       %IncMSE  IncNodePurity
Var0  10.84632       7.232559
var1  24.53021       7.976509
var2   26.5005       4.653162
var3  60.18863      21.882258
var4  11.97568        7.25413
var5  49.63468      16.968472
var6  19.55981      10.009517
var7  10.36669      13.136694
var8  14.16585       7.818673
var9   9.75812       7.178831
-------------------------------

Essentially, what I was attempting to do was to choose the best derived classifiers by eliminating those from the above list that don't show a noticeable relative impact on MSE. Any guidance or pointers would be much appreciated. Thanks!
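
The "scale %IncMSE relatively and pick the top few" idea can be sketched in a few lines. This is only an illustration in Python using the numbers from the table above; the helper name rank_by_relative_importance is hypothetical and not part of any R package, and a ranking by itself says nothing about whether dropping the remaining variables actually improves the model.

```python
# %IncMSE values copied from the table in the post above.
inc_mse = {
    "Var0": 10.84632,
    "var1": 24.53021,
    "var2": 26.5005,
    "var3": 60.18863,
    "var4": 11.97568,
    "var5": 49.63468,
    "var6": 19.55981,
    "var7": 10.36669,
    "var8": 14.16585,
    "var9": 9.75812,
}


def rank_by_relative_importance(scores, top_k):
    """Scale scores so the largest is 1.0 and return the top_k (name, scaled) pairs."""
    largest = max(scores.values())
    scaled = {name: value / largest for name, value in scores.items()}
    ranked = sorted(scaled.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


top4 = rank_by_relative_importance(inc_mse, top_k=4)
for name, share in top4:
    print(f"{name}: {share:.2f}")
# var3: 1.00
# var5: 0.82
# var2: 0.44
# var1: 0.41
```

Note that the two columns of the table do not agree on the ordering: by IncNodePurity, var7 ranks third, while by %IncMSE it ranks ninth. Which column is used for the cut therefore changes which variables survive.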


________________________________

To: "[hidden email]" <[hidden email]>
Sent: Saturday, November 26, 2011 5:45 PM
Subject: [R-SIG-Finance] Random Forest Classifiers


Re: Random Forest Classifiers

Zachary Mayer
Whatever you do, be sure to cross-validate it!
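
A minimal sketch of what cross-validating the choice could look like, assuming plain k-fold splitting. The point it tries to show is that any variable selection has to happen inside `fit`, using only the training rows of each fold, so that the held-out error is not flattered by a selection that already saw the held-out rows. All names here (`k_fold_indices`, `cross_validated_mse`, `fit_training_mean`) are hypothetical, and the mean predictor is only a stand-in for a real model such as a random forest.

```python
import random
from statistics import mean


def k_fold_indices(n_rows, k, seed=0):
    """Shuffle row indices and split them into k nearly equal folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validated_mse(rows, targets, fit, k=5, seed=0):
    """Average held-out MSE; `fit` only ever sees the training rows of a fold."""
    fold_errors = []
    for held_out in k_fold_indices(len(rows), k, seed):
        held_out_set = set(held_out)
        train = [i for i in range(len(rows)) if i not in held_out_set]
        # Variable selection, if any, belongs inside `fit`, on training rows only.
        predict = fit([rows[i] for i in train], [targets[i] for i in train])
        errors = [(predict(rows[i]) - targets[i]) ** 2 for i in held_out]
        fold_errors.append(mean(errors))
    return mean(fold_errors)


def fit_training_mean(train_rows, train_targets):
    """Stand-in model (an assumption for illustration): always predict the training mean."""
    center = mean(train_targets)
    return lambda row: center


rows = [[float(i)] for i in range(20)]
targets = [2.0 * i for i in range(20)]
print(cross_validated_mse(rows, targets, fit_training_mean, k=5))
```

To compare "all ten variables" against "top 3 or 4", one would pass two different `fit` functions and compare the two held-out errors, rather than comparing errors measured on the data used to pick the variables.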

Re: Random Forest Classifiers

toro
Momop, I think that would undermine the robustness of RF. As I understand it, RF
averages the predictions of its trees, and each tree's prediction is itself the
average of a leaf. Pruning variables like you're talking about would risk
overfitting to your particular dataset rather than the data-generating process.
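
The "average of averages" structure can be illustrated with a toy sketch. This is not the randomForest implementation: it assumes a single predictor and one-split trees (stumps), and it omits the random choice of candidate variables at each split that a real random forest uses. The names `fit_stump` and `fit_forest` are hypothetical.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """One-split regression tree; each leaf predicts the mean of its training targets."""
    best = None
    for threshold in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < threshold]
        right = [y for x, y in zip(xs, ys) if x >= threshold]
        left_mean, right_mean = mean(left), mean(right)
        sse = sum((y - left_mean) ** 2 for y in left)
        sse += sum((y - right_mean) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        # All x values identical: the tree is a single leaf.
        overall = mean(ys)
        return lambda x: overall
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x < threshold else right_mean


def fit_forest(xs, ys, n_trees=50, seed=0):
    """Fit each stump on a bootstrap sample; the forest averages the tree predictions."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        sample = [rng.randrange(n) for _ in range(n)]  # draw n rows with replacement
        trees.append(fit_stump([xs[i] for i in sample], [ys[i] for i in sample]))
    return lambda x: mean(tree(x) for tree in trees)


xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [0.0, 0.0, 0.0, 0.0, 10.0, 10.0, 10.0, 10.0]
forest = fit_forest(xs, ys)
print(forest(0.0), forest(7.0))
```

Each leaf value is a mean of training targets, and the forest prediction is a mean over trees fitted to different bootstrap samples, which is where much of the stability comes from.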

Re: Random Forest Classifiers

Jeffrey Ryan-2
This isn't related to finance. Part of the reason for separate lists
is to keep the noise to a minimum, as well as to direct questions to
where answers may best be found.

Thanks,
Jeff

On Sat, Nov 26, 2011 at 8:42 PM, Chris Waggoner <[hidden email]> wrote:

> Momop, I think that would undermine the robustness of RF. As I understand it, RF
> averages the predictions of its trees, and each tree's prediction is itself the
> average of a leaf. Pruning variables like you're talking about would risk
> overfitting to your particular dataset rather than the data-generating process.

--
Jeffrey Ryan
[hidden email]

www.lemnica.com
www.esotericR.com
