Fwd: high p values

khan123
Hi

This is my function:

wilcox.test(A,B, data = data, paired = FALSE)

It gives me a high p-value, even though the median of column A is 6900 and
that of column B is 3500.

Why is the p-value high if there is a difference in the medians?

Regards

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Fwd: high p values

Patrick (Malone Quantitative)
We've had this conversation.

A) This is off-topic for R-Help. Your question is about the statistical test, not about the R coding.

B) A difference in sample statistics, whether or not it "looks" large, is not sufficient for statistical significance.
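Point B can be made concrete with a minimal R sketch. The vectors `a` and `b` below are hypothetical and are not Javed's data. Both samples have the same medians as in the question (6900 and 3500). The test result still changes with the number of observations, so the gap between the medians does not decide significance on its own.

```r
# Hypothetical data: medians 6900 and 3500, with overlapping values
a <- c(1000, 3000, 6900, 8000, 9000)
b <- c(500, 2500, 3500, 5000, 7500)
median(a)  # 6900
median(b)  # 3500

# Five observations per group, no ties, so R uses the exact test
p_small <- wilcox.test(a, b, paired = FALSE)$p.value  # 106/252, about 0.42

# Same values, each repeated ten times: medians unchanged, n = 50 per group
a_big <- rep(a, times = 10)
b_big <- rep(b, times = 10)
median(a_big)  # still 6900
median(b_big)  # still 3500
p_big <- wilcox.test(a_big, b_big, paired = FALSE)$p.value  # well below 0.05

c(small = p_small, big = p_big)
```

The repetition is purely illustrative. Real data would not come as exact copies, but the effect of sample size runs in the same direction.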

On 3/19/19, 12:48 PM, "R-help on behalf of javed khan" <[hidden email] on behalf of [hidden email]> wrote:

    Hi
   
    This is my function:
   
    wilcox.test(A,B, data = data, paired = FALSE)
   
    It gives me high p value, though the median of A column is 6900 and B
    column is 3500.
   
    Why it gives p value high if there is a difference in the median?
   
    Regards

Re: Fwd: high p values

S Ellison-2
In reply to this post by khan123
> This is my function:
>
> wilcox.test(A,B, data = data, paired = FALSE)
>
> It gives me high p value, though the median of A column is 6900 and B
> column is 3500.
>
> Why it gives p value high if there is a difference in the median?

Perhaps because a) you are testing the wrong data, or b) there isn't a significant difference.

a) You are probably not using the data you think you are. Check ?wilcox.test: the 'data' argument belongs to the formula method, which needs a formula as its first argument, not a numeric vector. What you've done is call the default method, so 'data' has been silently ignored. That means A and B are whatever was lying around in your current environment, not the columns in 'data'. ('data' is a terrible name for a data frame, by the way, as data() is already an R function.)
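The minimal sketch below shows the calls that do use the data frame. It assumes a wide data frame with numeric columns A and B. The names `scores` and `scores_long` and their values are hypothetical, and they were chosen so they do not mask data(). The default method takes two vectors. The formula method takes a `value ~ group` formula plus `data`, so the wide table first has to be stacked into long format.

```r
# Hypothetical wide data frame: one column per group
scores <- data.frame(
  A = c(6900, 7200, 2100, 7050, 2400),
  B = c(3500, 3300, 7100, 3450, 7300)
)

# Default method: pass the two columns explicitly ('data' would be ignored here)
res_default <- wilcox.test(scores$A, scores$B, paired = FALSE)

# Same comparison, evaluating the column names inside the data frame
res_with <- with(scores, wilcox.test(A, B, paired = FALSE))

# Formula method: needs one value column and one two-level grouping factor
scores_long <- data.frame(
  value = c(scores$A, scores$B),
  group = factor(rep(c("A", "B"), each = nrow(scores)))
)
res_formula <- wilcox.test(value ~ group, data = scores_long)

# All three calls compare the same two samples, so W and the p-value agree
c(res_default$p.value, res_with$p.value, res_formula$p.value)
```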

After that:
- How many data points do you have in each group?
- How much do the two groups overlap?

If the answers are 'not many' or 'lots' (in that order), and especially if both apply, you can't expect a significant test result.
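The overlap question can be illustrated with a minimal R sketch on hypothetical numbers. Both pairs of samples below have medians of 6900 and 3500. They differ only in how much the two groups overlap, and the p-values end up at opposite extremes. The helper `prop_A_larger` exists only for this illustration.

```r
# Hypothetical samples: both pairs have medians 6900 and 3500
tight_A <- c(6800, 6850, 6900, 6950, 7000)   # no overlap with tight_B
tight_B <- c(3400, 3450, 3500, 3550, 3600)
wide_A  <- c(100, 3000, 6900, 9000, 12000)   # heavy overlap with wide_B
wide_B  <- c(200, 2000, 3500, 8000, 11000)

# Share of (A, B) pairs in which the A value is larger; 1 means no overlap
prop_A_larger <- function(x, y) mean(outer(x, y, ">"))
prop_A_larger(tight_A, tight_B)  # 1
prop_A_larger(wide_A, wide_B)    # 14/25 = 0.56

# Exact tests (no ties, n < 50)
p_tight <- wilcox.test(tight_A, tight_B)$p.value  # 2/252, about 0.008
p_wide  <- wilcox.test(wide_A, wide_B)$p.value    # 212/252, about 0.84
```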

S Ellison

Re: Fwd: high p values

R help mailing list-2
In reply to this post by khan123
Any reasonable test of whether two samples differ should be scale and
location invariant.  E.g., if you measure temperature it should not matter
if your units are degrees Fahrenheit or micro-Kelvins.  Thus saying the
medians are 3500 and 6900 is equivalent to saying they are 100.035 and
100.069: it does not tell us how different the samples are.  You need to
consider how much overlap there is.
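The invariance point can be checked directly with a minimal R sketch using hypothetical temperature readings. Converting degrees Fahrenheit to micro-Kelvins is an increasing linear transformation. It therefore leaves the ranks, W and the p-value unchanged, even though the medians change enormously. The helper `f_to_microkelvin` is written only for this illustration.

```r
# Hypothetical temperature readings in degrees Fahrenheit
site1_F <- c(61.2, 64.8, 70.1, 72.5, 75.0, 79.3)
site2_F <- c(58.4, 60.9, 63.3, 66.7, 68.2, 71.6)

# Fahrenheit -> Kelvin -> micro-Kelvin: a positive affine map, so order is kept
f_to_microkelvin <- function(f) ((f - 32) * 5 / 9 + 273.15) * 1e6

res_F  <- wilcox.test(site1_F, site2_F)
res_uK <- wilcox.test(f_to_microkelvin(site1_F), f_to_microkelvin(site2_F))

# The medians differ by orders of magnitude; the test result does not change
c(median(site1_F), median(f_to_microkelvin(site1_F)))
c(res_F$p.value, res_uK$p.value)
```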

Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Tue, Mar 19, 2019 at 9:48 AM javed khan <[hidden email]> wrote:

> Hi
>
> This is my function:
>
> wilcox.test(A,B, data = data, paired = FALSE)
>
> It gives me high p value, though the median of A column is 6900 and B
> column is 3500.
>
> Why it gives p value high if there is a difference in the median?
>
> Regards

Re: Fwd: high p values

Jim Lemon-4
In reply to this post by khan123
Hi Javed,
Easy.

A<-c(2000,2100,2300,2400,6900,7000,7040,7050,7060)
median(A)
[1] 6900
B<-c(3300,3350,3400,3450,3500,7000,7100,7200,7300)
median(B)
[1] 3500
wilcox.test(A,B,paired=FALSE)

       Wilcoxon rank sum test with continuity correction

data:  A and B
W = 26.5, p-value = 0.233
alternative hypothesis: true location shift is not equal to 0

Jim

On Wed, Mar 20, 2019 at 3:48 AM javed khan <[hidden email]> wrote:

>
> Hi
>
> This is my function:
>
> wilcox.test(A,B, data = data, paired = FALSE)
>
> It gives me high p value, though the median of A column is 6900 and B
> column is 3500.
>
> Why it gives p value high if there is a difference in the median?
>
> Regards

Re: high p values

R help mailing list-2
Hi,

Since folks are taking the time to point out some subtle issues here, consider this example from the UCLA Stats web site:

https://stats.idre.ucla.edu/other/mult-pkg/faq/general/faq-why-is-the-mann-whitney-significant-when-the-medians-are-equal/

Grp1 <- rep(c(-2, 0, 5), each = 20)
Grp2 <- rep(c(-1, 0, 10), each = 20)

> Grp1
 [1] -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2  0  0
[23]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  5  5  5  5
[45]  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5
> Grp2
 [1] -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1  0  0
[23]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 10 10 10 10
[45] 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10

> median(Grp1)
[1] 0
> median(Grp2)
[1] 0

> wilcox.test(Grp1, Grp2)

        Wilcoxon rank sum test with continuity correction

data:  Grp1 and Grp2
W = 1400, p-value = 0.03096
alternative hypothesis: true location shift is not equal to 0


So, in contrast to the original problem, here is an example where you have equal medians, but a significant test result.

The key concept is that the Wilcoxon rank sum test is not strictly a test of differences in medians. That is, the null hypothesis you are either rejecting or failing to reject is not that the medians are equal.
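The minimal R sketch below uses Marc's Grp1 and Grp2 to show what the rank sum statistic does measure. W counts the (Grp1, Grp2) pairs in which the Grp1 value is larger, with each tie counted as one half. Dividing W by the number of pairs estimates P(X > Y) + 0.5 * P(X = Y). That quantity is 0.5 when both samples come from the same distribution. Here it is about 0.39, even though both medians are 0.

```r
Grp1 <- rep(c(-2, 0, 5), each = 20)
Grp2 <- rep(c(-1, 0, 10), each = 20)

# Score every (Grp1, Grp2) pair: 1 if Grp1 is larger, 0.5 for a tie, 0 otherwise
pair_scores <- outer(Grp1, Grp2, function(x, y) (x > y) + 0.5 * (x == y))
W_manual <- sum(pair_scores)

res <- wilcox.test(Grp1, Grp2)
c(manual = W_manual, wilcox = unname(res$statistic))  # both 1400

# Estimated P(X > Y) + 0.5 * P(X == Y); 0.5 would mean no tendency either way
W_manual / (length(Grp1) * length(Grp2))  # 1400 / 3600, about 0.389
```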

Javed, I would suggest spending some time with a good tutorial on non-parametric statistics.

Regards,

Marc Schwartz


> On Mar 19, 2019, at 6:25 PM, Jim Lemon <[hidden email]> wrote:
>
> Hi Javed,
> Easy.
>
> A<-c(2000,2100,2300,2400,6900,7000,7040,7050,7060)
> median(A)
> [1] 6900
> B<-c(3300,3350,3400,3450,3500,7000,7100,7200,7300)
> median(B)
> [1] 3500
> wilcox.test(A,B,paired=FALSE)
>
>       Wilcoxon rank sum test with continuity correction
>
> data:  A and B
> W = 26.5, p-value = 0.233
> alternative hypothesis: true location shift is not equal to 0
>
> Jim
>
> On Wed, Mar 20, 2019 at 3:48 AM javed khan <[hidden email]> wrote:
>>
>> Hi
>>
>> This is my function:
>>
>> wilcox.test(A,B, data = data, paired = FALSE)
>>
>> It gives me high p value, though the median of A column is 6900 and B
>> column is 3500.
>>
>> Why it gives p value high if there is a difference in the median?
>>
>> Regards
