Multiple sets of proportion tests


Multiple sets of proportion tests

Allaisone 1

Hi all,

I have a data frame of 200 columns and 2 rows. The first row in each column contains the frequency of cases in group I, and the second row contains the frequency of cases in group II. The number of trials is a fixed value for group I (e.g. 200) and another fixed value for group II (e.g. 100). The dataset looks like this:


> Mydata

                         variable I   variable II   Variable III   ......... 200
Freq. of cases (gp I)          6493          9375           5524
Freq. of cases (gp II)          509           462             54



The result I need for the first column can be obtained with this code:

MyResultsI <- prop.test(Mydata$`variable I`, c(200, 100))

and for the second column:

MyResultsII <- prop.test(Mydata$`variable II`, c(200, 100))

and so on.


I need to run the analysis for all columns, keep only the columns with a significant p-value, and write those p-values in a third row under each column, so the final output would look something like this:


                         variable I   Variable III   .........
Freq. of cases (gp I)          6493           5524
Freq. of cases (gp II)          509             54
p-values                       0.02          0.010

Note, for example, that the 2nd column has been removed because it gave a non-significant p-value, while columns 1 and 3 were kept since their p-values are less than 0.05.

I'm not sure how to get only the p-values without the other details. For the analysis itself, I believe it can be done with the apply() function, but it's not clear to me how to specify the 2nd argument (n = sample sizes) in prop.test().

 MyResults <- apply(Mydata, 2, function(x)prop.test(Mydata,c(200,100))

How can I modify the "n" argument to resolve the length mismatch between "x" and "n"? How can I modify this further to return only the significant p-values? Any help would be much appreciated.

Regards

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Multiple sets of proportion tests

Thierry Onkelinx
Hi anonymous,

?prop.test states that it returns a list, and one of the elements is
'p.value'. str() on the output of prop.test() reveals that too. So
prop.test()$p.value or prop.test()["p.value"] should work.
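
As a minimal sketch of the point above: the counts below (45 of 200 and 12 of 100) are made up for illustration and are not the poster's data. It shows where the p-value sits in the object returned by prop.test() and how the two extraction forms differ.

```r
# Hypothetical counts: 45 successes out of 200 trials in group I,
# 12 successes out of 100 trials in group II.
x <- c(45, 12)
n <- c(200, 100)

res <- prop.test(x, n)
str(res, max.level = 1)   # lists the components, including p.value

p1 <- res$p.value         # the p-value as a plain number
p2 <- res[["p.value"]]    # same number, extracted by name
p3 <- res["p.value"]      # a one-element list holding the p-value
```

Note that single brackets keep the list wrapper, so `$` or `[[` is usually the more convenient form when the number itself is needed.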

Best regards,

ir. Thierry Onkelinx
Statisticus / Statistician

Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE
AND FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
[hidden email]
Kliniekstraat 25, B-1070 Brussel
www.inbo.be

///////////////////////////////////////////////////////////////////////////////////////////
To call in the statistician after the experiment is done may be no
more than asking him to perform a post-mortem examination: he may be
able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does
not ensure that a reasonable answer can be extracted from a given body
of data. ~ John Tukey
///////////////////////////////////////////////////////////////////////////////////////////


From 14 to 19 December 2017 we are moving from our office in
Brussels to the Herman Teirlinck building on the Thurn & Taxis site.
From then on you are welcome at the new address: Havenlaan 88 bus 73, 1000 Brussel.

///////////////////////////////////////////////////////////////////////////////////////////




Re: Multiple sets of proportion tests

Allaisone 1
Thank you for clarifying this point, but my main question was how to modify my code so that the analysis runs correctly. The code I mentioned:

MyResults <- apply(Mydata, 2, function(x)prop.test(Mydata,c(200,100))

results in this error: 'x' and 'n' must have the same length in prop.test(x, n).

How can I modify the "x" or "n" arguments so that the analysis gives me the desired output shown in my previous post?


Re: Multiple sets of proportion tests

David Winsemius

> On Nov 24, 2017, at 3:35 PM, Allaisone 1 <[hidden email]> wrote:
>
> Thank you for clarifying this point but my main question was about how to modify my code to do the analysis correctly.

You need to first clarify what your proposed statistical hypothesis might be. If you are doing prop.test on 200 columns, you have a serious multiple comparisons issue in your analysis plan that you have not recognized. Removing the columns that "fail" a test set at a nominal level of 0.05 is statistical malpractice.
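
One common way to account for many simultaneous tests is to adjust the p-values before comparing them with a threshold. The sketch below uses base R's p.adjust(); the p-values are invented for illustration, and the choice between the Holm and Benjamini-Hochberg methods is an assumption that depends on the actual hypothesis, which the thread does not state.

```r
# Hypothetical raw p-values, one per column/test; not taken from the thread.
p_raw <- c(0.001, 0.02, 0.03, 0.04, 0.20, 0.65)

p_holm <- p.adjust(p_raw, method = "holm")  # controls the family-wise error rate
p_bh   <- p.adjust(p_raw, method = "BH")    # controls the false discovery rate

# Side-by-side view of raw and adjusted values.
adjusted <- data.frame(p_raw, p_holm, p_bh)
print(adjusted)
```

With 200 tests the adjustment is much stronger than in this six-value example, which is the practical consequence of the multiple comparisons issue raised above.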


> The code I mentioned :-
>
> MyResults <- apply(Mydata, 2, function(x)prop.test(Mydata,c(200,100))


The code as written has the obvious error of passing `Mydata` as an argument inside the prop.test call. It should almost certainly be `x` instead. (I suspect the length of the 'x' argument to prop.test will be on the order of 200, while the length of n is 2, hence the error.)
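
A minimal sketch of that correction follows. The data frame and its column names (var1 to var3) are hypothetical, with counts chosen so that successes never exceed the trial counts of 200 and 100 mentioned in the thread. It only illustrates the mechanics of passing `x` and pulling out the p-value, not a recommended analysis plan.

```r
# Hypothetical data: 2 rows (groups) x 3 columns (variables).
# Counts are chosen so that successes never exceed the trial counts.
Mydata <- data.frame(
  var1 = c(120, 40),
  var2 = c(100, 52),
  var3 = c(30, 45)
)
n_trials <- c(200, 100)  # trials in group I and group II

# Pass the column `x`, not the whole data frame, to prop.test(),
# and keep only the p-value from each result.
p_values <- apply(Mydata, 2, function(x) prop.test(x, n_trials)$p.value)
p_values
```

Here each `x` is a length-2 vector, matching the length of `n_trials`, so the length error no longer occurs.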

It would also be ideal if you could post the output of dput(Mydata[,1:3] ).
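
For readers unfamiliar with dput(), here is a small sketch of what is being asked for. The data frame is hypothetical (the real `Mydata` was never posted); the point is that dput() prints R code that rebuilds the object, so others can paste it and reproduce the data.

```r
# Hypothetical stand-in for the poster's data frame.
Mydata <- data.frame(
  var1 = c(120, 40),
  var2 = c(100, 52),
  var3 = c(30, 45),
  var4 = c(10, 5)
)

# Capture the text that dput() would print for the first three columns.
txt <- capture.output(dput(Mydata[, 1:3]))
cat(txt, sep = "\n")

# Anyone can rebuild the same object from that text.
rebuilt <- eval(parse(text = txt))
```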


> Results in this error : 'x' and 'n' must have the same length in the prop.test(x,n).
>
>
> How can I modify "x' or "n" arguments so the analysis gives me the desired output

You desperately need to read the help page for the function you are using. This was pointed out to you, but it appears to me that you have ignored Thierry's advice. (Going back to your original example: the x variable is supposed to be the number of successes and the n variable the number of trials, so in all instances n MUST be greater than or equal to x. Your data example is going to fail that requirement even after you correct the semantic error noted above.)
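
The requirement can be checked before any test is run. The sketch below uses the three columns of counts shown in the original post together with the stated trial counts of 200 and 100; the object names (`x`, `n`, `excess`, `invalid_cols`) are made up for illustration.

```r
# Counts as shown in the original post (first three columns).
x <- rbind(gp_I  = c(6493, 9375, 5524),
           gp_II = c(509, 462, 54))
n <- c(200, 100)  # trials in group I and group II

# sweep() subtracts n[i] from every entry of row i;
# a positive value means the count exceeds the number of trials.
excess <- sweep(x, 1, n, "-")
invalid_cols <- which(colSums(excess > 0) > 0)
invalid_cols
```

Every column is flagged, because the group I counts all exceed 200, so these numbers cannot be successes out of 200 and 100 trials.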

(And do learn to post with plain text.)
--
David.

>
> shown in my previous post ?
>

David Winsemius
Alameda, CA, USA

'Any technology distinguishable from magic is insufficiently advanced.'   -Gehm's Corollary to Clarke's Third Law
