Retrieve 95% coverage of results from a hypergeometric distribution



jas4710
Suppose I have an urn containing 90 red and 10 black balls.
Now I want to remove 3 balls from the urn. With the following code:

m<-90;n<-10;k<-3;
x<-0:3
dhyper(x,m,n,k)

I can obtain the probability that 0,1,2,3 red balls will be removed.
 0.000742115 0.025046382 0.247680891 0.726530612

So more than 95% of the time, 2 or 3 red balls will be removed, and the resulting composition will be
88:9 or 87:10, so the percentage of red balls changes from the original 90 to somewhere between 89.69 and 90.72.

If I now start with 50:50 and again remove 3 balls, the probabilities are:
0.1212 0.3788 0.3788 0.1212

To get the range of red balls for more than 95% of the time, all four cases have to be considered this time, so the resulting percentage of red balls ranges from 48.45 to 51.55.

So my question is: is there a convenient built-in function that helps extract this 95% confidence-interval-like range?
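
The percentages quoted in this post can be checked with a short sketch. It assumes x counts the red balls removed, so m - x red balls remain out of m + n - k, and that k is no larger than m. The helper name pct_red_left is made up for illustration and is not a built-in.

```r
# Hypothetical helper (not part of R): for each possible number of red balls
# removed, give its probability and the percentage of red balls left behind.
pct_red_left <- function(m, n, k) {
  x <- 0:k                                  # red balls removed
  data.frame(
    removed_red = x,
    prob        = dhyper(x, m, n, k),       # Prob(X = x)
    pct_red     = 100 * (m - x) / (m + n - k)
  )
}

urn_90_10 <- pct_red_left(90, 10, 3)
urn_50_50 <- pct_red_left(50, 50, 3)
print(urn_90_10, digits = 4)
print(urn_50_50, digits = 4)
```

Reading the rows with the largest probabilities off these tables gives the ranges described in the post; picking those rows automatically is a separate step.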



Re: Retrieve hypergeometric results in large scale

Jeff Newmiller
Perhaps you should read

?dhyper

and if you have a hard time parsing that, then read

?Distributions

and then go back to

?dhyper
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<[hidden email]>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.


______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Retrieve hypergeometric results in large scale

Bert Gunter
Homework? There's a no homework policy on this list.

-- Bert




--

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm


Re: Retrieve hypergeometric results in large scale

jas4710
In reply to this post by Jeff Newmiller
Thanks Jeff~~~

In fact I do not know how to combine and extract vectors in R.

ans <- sort(dhyper(x, m, n, k), decreasing=TRUE)
rbind(ans, cumsum(ans))

will show the first point at which the cumulative sum exceeds the 95% threshold. The problem is that information is lost:
I can no longer tell where the first few elements came from. For example, with 10 numbers they may come from points 4, 5, 6, 7; with 100 numbers, from points 45 to 68.

How can I attach IDs to the data for later retrieval? rbind appears to do the job, but not exactly...
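
One way to keep track of where each value came from, sketched here with the urn example from the first post, is to name the vector before sorting: sort() and cumsum() both carry names along, and rbind() uses them as column labels. The variable names tab, n_keep and covered are illustrative only.

```r
m <- 90; n <- 10; k <- 3
x <- 0:k
d <- dhyper(x, m, n, k)
names(d) <- x                       # label each probability with its x value

ans <- sort(d, decreasing = TRUE)   # names travel with the values
tab <- rbind(prob = ans, cumprob = cumsum(ans))
print(tab)

# x values needed to reach 95% coverage, recovered from the names
n_keep <- which(cumsum(ans) >= 0.95)[1]
covered <- sort(as.integer(names(ans)[seq_len(n_keep)]))
print(covered)
```

The column labels of tab then show which x each sorted probability belongs to.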

Re: Retrieve hypergeometric results in large scale

Bert Gunter
In reply to this post by Bert Gunter
If you have not already done so, stop what you are doing and work
through the Introduction to R tutorial that ships with R (or other R
tutorial on the web that you may prefer).

The tutorials are written to help you climb the R learning curve much
more efficiently than the fooling around that you appear to be doing
now.

-- Bert

On Mon, Oct 1, 2012 at 8:31 AM, jas4710 <[hidden email]> wrote:

> Hi Bert. This is not a homework. If I can do some basic programming in R like
> Perl, then I'll have a better chance to accomplish this task but the matrix
> concept is not quickly comprehensible...

Re: Retrieve hypergeometric results in large scale

Rui Barradas
In reply to this post by Jeff Newmiller
Hello,

See the differences.


k <- 3
p <- 0.95

m <- 90; n <- 10
dhyper(0:k, m, n, k) # Prob(X = x), with x = 0:k
phyper(0:k, m, n, k) # Prob(X <= x)
# quantiles, what you want
qhyper(p, m, n, k)   # inverse of phyper


m <- 50; n <- 50
dhyper(0:k, m, n, k)
phyper(0:k, m, n, k)
qhyper(p, m, n, k)

In your original post you gave the output of dhyper. I find phyper
more useful here: it shows that 0.95 lies between the last two values of
phyper. Apparently you want its inverse, qhyper, but you are assuming that
quantiles can take non-integer values. The hypergeometric is a discrete
distribution, so, from the help page,

"The quantile is defined as the smallest value x such that F(x) ≥ p,
where F is the distribution function."
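
The definition quoted from the help page can be checked directly against phyper. The sketch below uses the 90:10 urn and also builds an equal-tailed range from two quantiles; the 2.5%/97.5% split is an assumption made for illustration, not something this thread settles on.

```r
m <- 90; n <- 10; k <- 3
p <- 0.95
x <- 0:k

cdf <- phyper(x, m, n, k)            # F(x) = Prob(X <= x)
by_hand <- x[which(cdf >= p)[1]]     # smallest x with F(x) >= p
q <- qhyper(p, m, n, k)
print(c(by_hand = by_hand, qhyper = q))

# Equal-tailed range from the 2.5% and 97.5% quantiles (an assumed choice)
central <- qhyper(c(0.025, 0.975), m, n, k)
print(central)
```

Because the distribution is discrete, such a range covers more than 95% and can be wider than the set of most probable outcomes described in the first post.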

And, R is surely more competent at distribution functions than Perl.
Stick to it and read "An Introduction to R", file R-intro.pdf in the doc
directory of your R installation, Chapter 8. All computer languages have
some learning time and as far as statistics is concerned, learning R pays.

Hope this helps,

Rui Barradas
On 01-10-2012 16:28, jas4710 wrote:

> Thanks Jeff
> The documentation pages, if I haven't missed any crucial points, illustrate
> how to get probability and cumulative probability values.
>
> I can first retrieve the data structures and use Perl (I don't know how to
> use R...) to sort the derived ratios and sum the probability values until
> the cumulative probability exceeds 95%. Well, I just don't know whether such
> seemingly routine procedures have already been implemented...
>
> Thanks again!

Re: Retrieve hypergeometric results in large scale

William Dunlap
In reply to this post by jas4710
order() is usually a lot more useful than sort(), since, as you noticed,
sort() drops information about where each element in its output came
from.

Your example was incomplete so I made up one which I
think is similar.
  > n <- 10 ; p <- 0.7 ; k <- 0:n ; d <- dbinom(k, n, p)
  > plot(k, d) # density of binomial over its domain
If you want the indices of the largest density values whose
cumulative sum is less than 0.95 you can use
  > ord <- order(d, decreasing=TRUE) # indices such that d[ord] is in decreasing order
  > big <- ord[cumsum(d[ord]) < 0.95]
  > data.frame(big, d=d[big], cumsum=cumsum(d[big]))
    big         d    cumsum
  1   8 0.2668279 0.2668279
  2   9 0.2334744 0.5003024
  3   7 0.2001209 0.7004233
  4  10 0.1210608 0.8214841
  5   6 0.1029193 0.9244035
 > points(cex=2, k[big], d[big])

If you want to include the index of the density value that puts
you just over 0.95, first find the complement of the desired indices
(the smallest density values whose cumulative sum stays below 0.05)
and then use setdiff to recover the desired set.  E.g.,
  > ord <- order(d)
  > little <- ord[cumsum(d[ord]) < 0.05]
  > big <- setdiff(seq_along(d), little) # difference of two sets of numbers
  > big
  [1]  5  6  7  8  9 10
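
Applied to the urn example from the first post, the order() idea might look like the sketch below. top_coverage is a hypothetical helper, not an R built-in. It keeps the largest probabilities up to and including the one that first pushes the cumulative sum to the chosen level. It returns positions in d, which are one more than x because x starts at 0, so the x values are recovered by indexing.

```r
# Hypothetical helper: positions of the largest probabilities, taken in
# decreasing order, up to and including the one that first reaches `level`.
top_coverage <- function(d, level = 0.95) {
  ord <- order(d, decreasing = TRUE)           # positions, largest d first
  n_keep <- which(cumsum(d[ord]) >= level)[1]  # first point reaching level
  sort(ord[seq_len(n_keep)])
}

k <- 3
x <- 0:k
d90 <- dhyper(x, 90, 10, k)   # 90 red, 10 black
d50 <- dhyper(x, 50, 50, k)   # 50 red, 50 black

print(x[top_coverage(d90)])   # red balls removed: 2 3
print(x[top_coverage(d50)])   # red balls removed: 0 1 2 3
```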

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com