Suppose I have an urn containing 90 red and 10 black balls.
Now I want to remove 3 balls from the urn. With the following code:

m<-90;n<-10;k<-3;
x<-0:3
dhyper(x,m,n,k)

I can obtain the probabilities that 0, 1, 2 or 3 red balls will be removed:

0.000742115 0.025046382 0.247680891 0.726530612

So more than 95% of the time, 2 or 3 red balls will be removed and the resulting composition will be 88:9 or 87:10, i.e. the percentage of red balls will change from 90 to somewhere between 89.69 and 90.72.

If I now start with 50:50 and again remove 3 balls, I obtain the probabilities:

0.1212 0.3788 0.3788 0.1212

To get the resulting range of red balls for more than 95% of the time, this time all four cases have to be considered, so the resulting percentage of red balls will range from 48.45 to 51.55.

So my problem is: is there any convenient built-in function that helps extract this 95% confidence-interval-like data?
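The percentages quoted above can be reproduced with a few lines of R. This is only a sketch of the arithmetic in the question; `pct_red_left` is a hypothetical helper name introduced here for illustration, not a built-in function.

```r
# Hypothetical helper (not part of base R): percentage of red balls left
# in the urn after x of the k removed balls were red.
pct_red_left <- function(x, m, n, k) 100 * (m - x) / (m + n - k)

m <- 90; n <- 10; k <- 3
x <- 0:k

# One row per possible outcome: how many red balls were removed,
# how likely that is, and the resulting share of red balls.
res <- data.frame(red_removed = x,
                  prob        = dhyper(x, m, n, k),
                  pct         = pct_red_left(x, m, n, k))
print(res)
```

Rerunning the same lines with m <- 50; n <- 50 gives the table for the 50:50 urn.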
Perhaps you should read

?dhyper

and if you have a hard time parsing that, then read

?Distributions

and then go back to

?dhyper
---------------------------------------------------------------------------
Jeff Newmiller
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.

jas4710 <[hidden email]> wrote:
>
>I'm going to use
>
>dhyper(x, m, n, k)
>
>to get a 95% coverage. Let me use an example to explain my problem:
>
>[...]
>
>--
>View this message in context:
>http://r.789695.n4.nabble.com/Retrieve-hypergeometric-results-in-large-scale-tp4644683.html
>Sent from the R help mailing list archive at Nabble.com.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Homework? There's a no homework policy on this list.
-- Bert

On Mon, Oct 1, 2012 at 8:10 AM, Jeff Newmiller <[hidden email]> wrote:
> Perhaps you should read
>
> ?dhyper
>
> [...]

--
Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
In reply to this post by Jeff Newmiller
Thanks Jeff~~~
In fact I do not know how to combine and extract vectors in R.

ans<-sort(dhyper(x, m, n, k),decreasing=TRUE)
rbind(ans,cumsum(ans))

will show the first point that exceeds the 95% threshold. The problem is: information is lost. I can no longer identify where the first few elements came from, e.g. for 10 numbers, maybe they are from points 4,5,6,7; for 100 numbers, from points 45 to 68.

How can I append IDs to the data for later retrieval? rbind appears to do the job, but not exactly...
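One possible way to keep the IDs, sketched here with the 90:10 urn from the first post: give the probability vector names before sorting. In base R, sort() and cumsum() both preserve the names of a named vector, so each sorted value still carries the x it came from. The variable names `d` and `ids` are illustrative choices, not anything from the thread.

```r
m <- 90; n <- 10; k <- 3
x <- 0:k

d <- dhyper(x, m, n, k)
names(d) <- x                        # label each probability with its x value

ans <- sort(d, decreasing = TRUE)    # names travel with the sorted values
tab <- rbind(prob = ans, cum = cumsum(ans))
print(tab)                           # column headers are the original x values

# Recover the x values, in order of decreasing probability
ids <- as.integer(names(ans))
print(ids)
```

The column headers of the printed matrix then show which outcome each probability belongs to, which is the piece of information the plain sort() call appeared to lose.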
In reply to this post by Bert Gunter
If you have not already done so, stop what you are doing and work through the Introduction to R tutorial that ships with R (or another R tutorial on the web that you may prefer). The tutorials are written to help you climb the R learning curve much more efficiently than the fooling around that you appear to be doing now.

-- Bert

On Mon, Oct 1, 2012 at 8:31 AM, jas4710 <[hidden email]> wrote:
> Hi Bert. This is not a homework. If I can do some basic programming in R like
> Perl, then I'll have a better chance to accomplish this task but the matrix
> concept is not quickly comprehensible...
>
> --
> View this message in context: http://r.789695.n4.nabble.com/Retrieve-95-coverage-of-results-from-a-hypergeometric-distribution-tp4644683p4644703.html

--
Bert Gunter
Genentech Nonclinical Biostatistics
In reply to this post by Jeff Newmiller
Hello,
See the differences.

k <- 3
p <- 0.95

m <- 90; n <- 10
dhyper(0:k, m, n, k)  # Prob(X = x), with x = 0:k
phyper(0:k, m, n, k)  # Prob(X <= x)
# quantiles, what you want
qhyper(p, m, n, k)    # inverse of phyper

m <- 50; n <- 50
dhyper(0:k, m, n, k)
phyper(0:k, m, n, k)
qhyper(p, m, n, k)

In your original post (op) you gave the output of dhyper. I would find phyper much better; it shows that 0.95 is between the last two values of phyper. Apparently you want its inverse, qhyper, but are assuming there are such things as decimals. The hypergeometric is a discrete distribution, so, from the help page,

"The quantile is defined as the smallest value x such that F(x) ≥ p, where F is the distribution function."

And R is surely more competent at distribution functions than Perl. Stick to it and read "An Introduction to R", file R-intro.pdf in the doc directory of your R installation, Chapter 8. All computer languages have some learning time and, as far as statistics is concerned, learning R pays.

Hope this helps,

Rui Barradas

On 01-10-2012 16:28, jas4710 wrote:
> Thanks Jeff
> The documentation pages, if I haven't missed any crucial points, illustrate
> how to get probability and cumulative probability values.
>
> I can first retrieve the data structures and use Perl (I don't know how to
> use R...) to sort the derived ratios and sum the probability values until
> the cumulative probability exceeds 95%. Well, I just don't know whether such
> seemingly routine procedures have already been implemented...
>
> Thanks again!
>
> --
> View this message in context: http://r.789695.n4.nabble.com/Retrieve-95-coverage-of-results-from-a-hypergeometric-distribution-tp4644683p4644701.html
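Since qhyper accepts a vector of probabilities, one way to get an interval-like answer is to ask for two quantiles at once. The sketch below assumes an equal-tailed interval (2.5% cut off in each tail), which is an interpretation and not necessarily what the original poster meant by "95% coverage"; `lims` and `coverage` are illustrative names.

```r
m <- 90; n <- 10; k <- 3

# Lower and upper limits of an equal-tailed 95% interval for the
# number of red balls removed.
lims <- qhyper(c(0.025, 0.975), m, n, k)
print(lims)

# Actual probability of landing inside [lims[1], lims[2]]. Because the
# distribution is discrete this is usually more than 0.95.
# phyper() returns 0 for values below the support, so lims[1] - 1 is safe.
coverage <- phyper(lims[2], m, n, k) - phyper(lims[1] - 1, m, n, k)
print(coverage)
```

For the 90:10 urn this gives the limits 1 and 3, which is wider than the set {2, 3} found in the original post by picking the most probable outcomes first. The two notions of "95%" differ, so it is worth deciding which one is wanted.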
In reply to this post by jas4710
order() is usually a lot more useful than sort(), since, as you noticed,
sort() drops information about where each element in its output came from.

Your example was incomplete so I made up one which I think is similar.

  > n <- 10 ; p <- 0.7 ; k <- 0:n ; d <- dbinom(k, n, p)
  > plot(k, d) # density of binomial over its domain

If you want the indices of the largest density values whose cumulative sum is less than 0.95 you can use

  > ord <- order(d, decreasing=TRUE) # indices such that d[ord] is in decreasing order
  > big <- ord[cumsum(d[ord]) < 0.95]
  > data.frame(big, d=d[big], cumsum=cumsum(d[big]))
    big         d    cumsum
  1   8 0.2668279 0.2668279
  2   9 0.2334744 0.5003024
  3   7 0.2001209 0.7004233
  4  10 0.1210608 0.8214841
  5   6 0.1029193 0.9244035
  > points(cex=2, k[big], d[big])

If you want to include the index of the density value that puts you just over 0.95, first find the complement of the desired indices and then use setdiff to compute its complement. E.g.,

  > ord <- order(d)
  > little <- ord[cumsum(d[ord]) < 0.05]
  > big <- setdiff(seq_along(d), little) # difference of two sets of numbers
  > big
  [1]  5  6  7  8  9 10

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: [hidden email] [mailto:[hidden email]] On Behalf Of jas4710
> Sent: Monday, October 01, 2012 9:59 AM
> To: [hidden email]
> Subject: Re: [R] Retrieve hypergeometric results in large scale
>
> Thanks Jeff~~~
>
> [...]
>
> --
> View this message in context: http://r.789695.n4.nabble.com/Retrieve-95-coverage-of-results-from-a-hypergeometric-distribution-tp4644683p4644715.html
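The order() idea can be carried back to the urn example from the first post. The function below is a rough sketch, not a built-in: `coverage_set` and its column names are hypothetical, and it assumes the goal is the smallest set of most probable outcomes whose total probability reaches the requested level, including the outcome that crosses the threshold.

```r
# Hypothetical helper: most probable outcomes of a hypergeometric draw
# that together reach at least `level` probability.
# m = red balls, n = black balls, k = balls removed.
coverage_set <- function(m, n, k, level = 0.95) {
  x   <- 0:k
  d   <- dhyper(x, m, n, k)
  ord <- order(d, decreasing = TRUE)      # positions, most probable first

  # Number of outcomes needed until the cumulative probability first
  # reaches `level` (assumes level is comfortably below 1).
  n_keep <- which(cumsum(d[ord]) >= level)[1]
  keep   <- sort(ord[seq_len(n_keep)])    # back to increasing x order

  data.frame(red_removed  = x[keep],
             prob         = d[keep],
             pct_red_left = 100 * (m - x[keep]) / (m + n - k))
}

a <- coverage_set(90, 10, 3)
b <- coverage_set(50, 50, 3)
print(a)
print(b)
```

With the 90:10 urn only the outcomes 2 and 3 are kept, while the 50:50 urn needs all four outcomes, so the remaining share of red balls ranges from roughly 48.45% to 51.55%. Working with x[keep] rather than the raw positions avoids the off-by-one between 1-based indices and outcomes that start at 0.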