

I wish to calculate the weight of evidence (WOE) of a variable x, which is
positively skewed: 6139 of the observations are 999, but only 243 range
from 1 to 27. I used the code

IV <- create_infotables(data=Test[,1], y="class", bins=10)

However, no matter what number I use for the bins parameter, I only get 2
bins, [1,27] and [999,999]. Is there any way I can look into the [1,27] bin
more closely, since those observations seem to matter a lot? The output from
R is shown below:
Table$pdays
      pdays    N    Percent        WOE        IV
1    [1,27]  243 0.03807584  2.6743166 0.5267751
2 [999,999] 6139 0.96192416 -0.2230081 0.5707022
Thank you very much!!
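A plausible explanation for why the bins argument seems to have no effect, sketched under an assumption: if numeric variables are cut at sample quantiles (equal-frequency binning; this is not checked against the Information package's source), then with roughly 96% of the values tied at 999, every decile from the 10th percentile upward is 999. Duplicate breakpoints collapse, so only two distinct cut points survive however many bins are requested. The data below are simulated to match the counts in the table (6139 values of 999, 243 values from 1 to 27); they are not the real pdays column.

```r
# Simulated stand-in for pdays: 6139 sentinel values of 999 plus 243 small
# day counts cycling through 1..27 (made-up values, only the counts match).
pdays <- c(rep(999, 6139), rep(1:27, length.out = 243))

# Decile breakpoints, as an equal-frequency binning scheme would compute them.
breaks <- quantile(pdays, probs = seq(0, 1, by = 0.1), names = FALSE)
print(breaks)

# Repeated breakpoints cannot separate anything, so keep the distinct ones.
unique_breaks <- unique(breaks)
print(unique_breaks)  # only the minimum and 999 remain
```

Asking for more bins only adds more copies of 999 to the breakpoints, which is consistent with the two-bin output above. Binning the values below 999 on their own avoids the problem.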
______________________________________________
[hidden email] mailing list
To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Seems rather likely that 999 is not really a measured value but rather
is a missing value indicator.

David.
On 3/10/19 1:54 PM, wong bowie wrote:
> I wish to calculate the weight of evidence (WOE) of a variable x, which is
> positively skewed: 6139 of the observations are 999, but only 243 range
> from 1 to 27.


You are right. This variable is the number of days that have passed since
a client was contacted; 999 means the client has never been contacted.
But I am not supposed to change the value, am I?
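On the worry that changing the value loses information: one common way to keep it is to move the "never contacted" fact into its own indicator column and only then mark 999 as missing in the day count. A minimal sketch on an invented data frame; the column name `contacted` is hypothetical.

```r
# Invented toy data: pdays uses 999 to mean "never contacted".
clients <- data.frame(pdays = c(3, 999, 12, 999, 6),
                      class = c(1, 0, 1, 0, 0))

# First keep the fact carried by the code in its own 0/1 column
# (hypothetical name), then mark the code itself as missing.
clients$contacted <- as.integer(clients$pdays != 999)
is.na(clients$pdays) <- clients$pdays == 999

print(clients)
```

The day count can then be binned on its own, and the indicator can be analysed as a separate two-level variable.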
On Sun, Mar 10, 2019 at 10:48 PM, David Winsemius <[hidden email]> wrote:
> Seems rather likely that 999 is not really a measured value but rather
> is a missing value indicator.


On 3/10/19 5:29 PM, wong bowie wrote:
> You are right. This variable is the number of days that have passed
> since a client was contacted; 999 means the client has never been
> contacted.
>
> But I am not supposed to change the value, am I?
I certainly would. SAS allows one to declare a value such as 999 as
missing, but in R it needs to be changed to NA:

is.na(Test$pdays) <- Test$pdays == 999

David
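SAS and SPSS let you declare a code such as 999 as missing once for a variable; in R the usual route is to recode it to NA, as above. When several columns share the same code, a small helper keeps that rule in one place. The function `sentinel_to_na()` below is hypothetical, not part of any package; the example data are invented and also show how strongly the code distorts a simple mean.

```r
# Hypothetical helper: treat `code` as missing in the given columns, roughly
# what declaring a user-missing value does in SAS or SPSS.
sentinel_to_na <- function(df, cols, code = 999) {
  for (col in cols) {
    # which() ignores existing NAs, so only exact matches to `code` change
    is.na(df[[col]]) <- which(df[[col]] == code)
  }
  df
}

# Invented example: `other` also contains a 999 but is not listed, so it is left alone.
test_df <- data.frame(pdays = c(5, 999, 20, 999, 999),
                      other = c(999, 1, 2, 3, 4))
before <- mean(test_df$pdays)               # dominated by the 999 codes
clean  <- sentinel_to_na(test_df, "pdays")
after  <- mean(clean$pdays, na.rm = TRUE)   # mean of the real day counts only
print(c(before = before, after = after))
```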


Hi Bowie,
As David suggested, you can substitute the R missing value (NA) for
999 (probably an SPSS missing value). If you don't want to change it,
you could probably just subset your data like this:
V <- create_infotables(data=Test[Test[[n]] != 999, 1], y="class", bins=10)
where "n" is the column number in Test of the variable of interest.
Jim
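To look inside the [1,27] range after dropping the 999 rows, the WOE and IV can also be computed by hand. The sketch below is not the Information package's implementation; it assumes the convention WOE = log(share of class 1 in the bin / share of class 0 in the bin) and reports IV as a running total, which appears to be how the IV column in the original output behaves. The function name `woe_iv()`, the data and the cut points are all invented for illustration.

```r
# Minimal WOE / IV table for an already binned variable (a sketch, not the
# Information package's code). Assumed convention, per bin:
#   WOE = log(share of class 1 / share of class 0)
woe_iv <- function(bin, y) {
  tab <- table(bin, y)                  # rows = bins, columns = classes "0", "1"
  p0  <- tab[, "0"] / sum(tab[, "0"])   # each bin's share of all class-0 rows
  p1  <- tab[, "1"] / sum(tab[, "1"])   # each bin's share of all class-1 rows
  woe <- log(p1 / p0)
  data.frame(bin = rownames(tab),
             N   = as.vector(rowSums(tab)),
             WOE = as.vector(woe),
             IV  = as.vector(cumsum((p1 - p0) * woe)),  # running total of IV
             row.names = NULL)
}

# Invented data: 20 rows with the 999 code, 12 contacted clients.
pdays <- c(rep(999, 20), 1:12)
y     <- c(rep(0, 16), rep(1, 4),      # outcome for the 999 rows
           rep(c(1, 1, 0), 4))         # outcome for pdays = 1..12

keep <- pdays != 999                   # drop the 999 rows, as Jim suggests
bins <- cut(pdays[keep], breaks = c(0, 4, 8, 12))  # (0,4], (4,8], (8,12]
res  <- woe_iv(bins, y[keep])
print(res)
```

Because the class shares are computed within the rows passed in, WOE values from the subset describe contacted clients only and are not directly comparable with values computed on the full data. The sketch also does not guard against a bin with no observations in one class, which would give an infinite WOE.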
On Mon, Mar 11, 2019 at 9:45 AM wong bowie < [hidden email]> wrote:
> However, no matter what number I use for the bins parameter, I only get 2
> bins, [1,27] and [999,999]. Is there any way I can look into the [1,27] bin
> more closely, since those observations seem to matter a lot?


You are asking the wrong question. The right question is, "Why are so many
values missing?" Is it because they were censored, not reported for some
reason, due to instrument failure, ...? Until you answer that question, any
analysis you do is garbage.
I strongly recommend you consult a competent data analyst.
Bert
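One modest first step toward that question is to check whether the 999 rows differ in outcome from the rest; a large difference suggests the missingness is itself informative and is probably worth keeping rather than discarding. The two bins in the original output already have very different WOE values, which points the same way. The sketch uses invented data and only measures the association; it cannot tell you why the values are missing.

```r
# Invented data: 8 rows carry the 999 code, 4 have real day counts.
pdays <- c(rep(999, 8), 2, 5, 9, 14)
y     <- c(rep(0, 7), 1, 1, 1, 0, 1)

never_contacted <- pdays == 999
tab <- table(never_contacted, y)
print(tab)

# Share of each outcome within each group (each row sums to 1).
shares <- prop.table(tab, margin = 1)
print(shares)
```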
On Sun, Mar 10, 2019, 9:21 PM Jim Lemon < [hidden email]> wrote:
> As David suggested, you can substitute the R missing value (NA) for
> 999 (probably an SPSS missing value).

