How to change the number of bins?

How to change the number of bins?

bowiew
I wish to calculate the weight of evidence of a variable x, which is
positively skewed: over 6000 of the observations are 999, while only about
200 range from 1 to 27. I used the code,

IV <- create_infotables(data = Test[, -1], y = "class", bins = 10)

However, no matter what number I use for the bins parameter, I only get 2
bins, [1,27] and [999,999]. Is there any way to look at the [1,27] range
more closely, since it carries a lot of information? The output from R is
shown below,

Table$pdays
      pdays    N    Percent        WOE        IV
1    [1,27]  243 0.03807584  2.6743166 0.5267751
2 [999,999] 6139 0.96192416 -0.2230081 0.5707022

Thank you very much!!
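
One plausible reason the bin count never changes, offered as a sketch rather
than a confirmed description of create_infotables(): if the cut points come
from quantiles of the variable (an assumption here, not something stated in
this thread), then with about 96% of the values equal to 999 nearly every
quantile is 999 and the duplicated cut points collapse. The toy vector below
only mimics the counts in the table above; it is not the poster's data.

```r
# Toy stand-in for pdays: 243 values in 1..27 and 6139 values of 999.
# The counts come from the table above; the values themselves are made up.
pdays <- c(rep_len(1:27, 243), rep(999, 6139))

# Decile cut points over the whole variable (assumed binning scheme).
breaks_all <- unique(quantile(pdays, probs = seq(0, 1, by = 0.1)))
print(breaks_all)      # only two distinct cut points survive

# Decile cut points over the non-999 values only.
contacted <- pdays[pdays != 999]
breaks_contacted <- unique(quantile(contacted, probs = seq(0, 1, by = 0.1)))
bins <- cut(contacted, breaks = breaks_contacted, include.lowest = TRUE)
print(table(bins))     # several bins inside [1,27]
```

Under that assumption, raising bins cannot help while the 999 values stay in
the variable, because the extra cut points all land on 999 as well.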

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: How to change the number of bins?

David Winsemius
Seems rather likely that 999 is not really a measured value but rather
is a missing value indicator.


--

David.

On 3/10/19 1:54 PM, wong bowie wrote:

> [...]

Re: How to change the number of bins?

bowiew
You are right. This variable actually represents the number of days that
passed after a client was contacted; 999 means the client has never been
contacted.

But I am not supposed to change the value, am I?

On Sunday, March 10, 2019 at 10:48 PM, David Winsemius <[hidden email]> wrote:

> [...]

Re: How to change the number of bins?

David Winsemius

On 3/10/19 5:29 PM, wong bowie wrote:
> You are right. Actually this variable represents the number of days
> passed after contacting a client; 999 means the client has never been
> contacted.
>
> But I am not supposed to change the value, am I?


I certainly would. SAS allows one to specify a value such as 999 as
missing, but R needs to have it changed to NA:

is.na(Test$pdays) <- Test$pdays == 999   # Test is the raw data frame, not the WOE table
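
A minimal sketch of that recoding on a made-up data frame, assuming the raw
data frame is called Test and the column is called pdays as in the original
call and output. The rows are invented for illustration only.

```r
# Hypothetical miniature of the data: pdays uses 999 as a sentinel value.
Test <- data.frame(
  pdays = c(3, 999, 6, 999, 999, 12),
  class = c(1, 0, 1, 0, 0, 1)
)

# Mark every 999 as missing; all other values are left untouched.
is.na(Test$pdays) <- Test$pdays == 999

n_missing <- sum(is.na(Test$pdays))            # how many sentinels were recoded
mean_days <- mean(Test$pdays, na.rm = TRUE)    # summaries now ignore the sentinel
print(Test)
```

An equivalent spelling is Test$pdays[Test$pdays == 999] <- NA. How a given
modelling function then treats the NA values depends on that function and is
worth checking in its documentation.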


--

David


Re: How to change the number of bins?

Jim Lemon-4
Hi Bowie,
As David suggested, you can substitute the R missing value (NA) for
999 (probably an SPSS missing value). If you don't want to change it,
you could probably just subset your data like this:

V <- create_infotables(data = Test[Test[[n]] != 999, -1], y = "class", bins = 10)

where "n" is the column number in Test of the variable of interest.
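
The row filter on its own can be sketched as follows, using a made-up data
frame and selecting the column by name instead of by number. The id column
and all values are hypothetical; it only stands in for whatever the first
column of Test is in the real data.

```r
# Hypothetical miniature of Test; the first column is dropped by -1 as in
# the original call.
Test <- data.frame(
  id    = 1:8,
  pdays = c(3, 999, 6, 999, 999, 12, 999, 27),
  class = c(1, 0, 1, 0, 0, 1, 0, 0)
)

# Keep only rows where pdays is a real day count. which() also drops rows
# where the comparison is NA, which a bare logical index would keep as
# all-NA rows.
contacted <- Test[which(Test$pdays != 999), -1]
print(contacted)
```

The result could then be passed as the data argument in place of
Test[, -1]. Note that this discards the never-contacted rows for every
variable in the table, not only for pdays.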

Jim

Re: How to change the number of bins?

Bert Gunter-2
You are asking the wrong question. The right question is, "why are so many
values missing?" Is it because they were censored, not reported for some
reason, due to instrument failure, ...? Until you answer that question, any
analysis you do is garbage.

I strongly recommend you consult a competent data analyst.

Bert

On Sun, Mar 10, 2019, 9:21 PM Jim Lemon <[hidden email]> wrote:

> [...]