# quantile() names

## quantile() names

All,

Consider the code below:

    options(digits=2)
    x <- 1:1000
    quantile(x, .975)

The value returned is 975 (the 97.5th percentile), but the name has been shortened to "98%" due to the digits option. Is this intended? I would have expected the name to also be "97.5%" here. Alternatively, the returned value might be 980 in order to match the name of "98%".

Best,
Ed

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

## Re: quantile() names

Hi Edgar,

I certainly don't think quantile(x, .975) should return 980, as that is a completely wrong answer.

I do agree that the name is a bit off-putting. I'm not sure how deep in the machinery you'd have to go to get digits to have no effect on the names (I don't have time to dig in right this second).

On the other hand, though, if we're going to make the names not respect digits at all, what do we do when someone does quantile(x, 1/3)? That'd be a bad time had by all without digits coming to the rescue, I think.

Best,
~G
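
To make the 1/3 concern concrete, here is a minimal sketch comparing a label built with no digits limit against one limited to 7 significant digits. The variable names are illustrative, and the `formatC()` call is only one possible way to cap the digits; it is not claimed to be what quantile() does internally.

```r
# Illustrative only: how long a percent label gets without a digits limit.
p <- 1/3

# No limit: paste0() converts the number using up to 15 significant digits.
unlimited <- paste0(100 * p, "%")

# Limited to 7 significant digits via formatC(); the explicit digits
# argument means the result does not depend on options(digits).
limited <- paste0(formatC(100 * p, format = "fg", width = 1, digits = 7), "%")

print(unlimited)
print(limited)
```

Some cap on the digits is clearly needed for probabilities like 1/3; the open question in the thread is only where that cap should come from.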

## Re: quantile() names

The "value" is *not* 975. It's 975.025. The results that you're observing are merely a byproduct of formatting.

Maybe you should try:

    quantile(x, .975, type=4)

which, using default options, perhaps produces the result you're expecting?
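
Both numbers can be checked by hand. The sketch below assumes the usual definitions of the two estimators (type 7 interpolates at position 1 + (n - 1) * p, type 4 at position n * p); `q_by_position` is a hypothetical helper written for this illustration, not part of R.

```r
# Hypothetical helper: linear interpolation of the sorted data at a
# fractional (1-based) position h. Assumes 1 <= h <= length(x).
q_by_position <- function(x, h) {
  x <- sort(x)
  lo <- floor(h)
  hi <- min(lo + 1, length(x))
  x[lo] + (h - lo) * (x[hi] - x[lo])
}

x <- 1:1000
p <- 0.975
n <- length(x)

q7 <- q_by_position(x, 1 + (n - 1) * p)  # type 7 position is 975.025
q4 <- q_by_position(x, n * p)            # type 4 position is 975

# Compare with quantile(); names = FALSE drops the label.
print(c(manual = q7, builtin = quantile(x, p, type = 7, names = FALSE)))
print(c(manual = q4, builtin = quantile(x, p, type = 4, names = FALSE)))
```

With the default type the value really is 975.025; it only prints as 975 when few digits are shown.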

## Re: quantile() names

Question: is the part that Ed Merkle is asking about the change in the expected NAME associated with the output?

He changed a sort of global parameter affecting how many digits he wants any compliant function to display. So when he asked for a named vector, the chosen name was based on his request and limited, when possible, to two digits.

    x <- 1:1000
    temp <- quantile(x, .975)

If you examine temp, you will see it is a vector containing (as it happens) a single numeric item (as it happens, a double) with the value of 975. But the associated name is a character string with a "%" appended, as shown below:

    str(temp)
            Named num 975
            - attr(*, "names")= chr "98%"

If you do not want a name attached to the vector, add an argument:

    quantile(x, .975, names=FALSE)

If you want the name to be longer or different, you can change it afterwards.

    names(temp)
            [1] "98%"

So change it yourself:

    temp
            98%
            975
    names(temp) <- paste(round(temp, 3), "%", sep="")
    temp
            975.025%
            975

The above is for illustration, with tabs inserted to show what is in the output. You probably do not need a name for your purposes, and if you ask for multiple quantiles you might need to adjust the above.

Of course, if you wanted another non-default "type" of calculation, what Abby offered may also apply.
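
A variation on the relabelling idea is to build the labels from the probabilities rather than from the returned value. This is only a sketch: the choice of 7 significant digits and the use of `formatC()` are assumptions made for illustration.

```r
x <- 1:1000
probs <- c(0.95, 0.975, 0.99)

# Compute unnamed quantiles, then attach labels built from the probabilities.
q <- quantile(x, probs, names = FALSE)

# formatC() with an explicit digits argument ignores options(digits).
names(q) <- paste0(formatC(100 * probs, format = "fg", width = 1, digits = 7), "%")

print(names(q))
```

Because the labels are built explicitly, they stay the same whatever the user has set globally.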

## Re: quantile() names

Avi,

On Mon, 2020-12-14 at 18:00 -0500, Avi Gross wrote:

> Question: is the part that Ed Merkle is asking about the change in the expected NAME associated with the output?

You are right: the question is about the name changing to "98%" when the returned object is the 97.5th percentile.

It is indeed easy to set names=FALSE here. But there can still be a problem when the user sets options(digits=2), then a package calls quantile(x, .975) and expects an object that has a name of "97.5%".

I think the easiest solution is to tell the user not to set options(digits=2), but it also seems like the "98%" name is not the best result. But Gabriel is correct that we would still need to consider how to handle something like quantile(x, 1/3). Maybe it is not a big enough issue to warrant changing anything.

Ed
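
One way a package could sidestep the problem described here is to avoid the generated names altogether. The following is a sketch under that assumption; `central_interval` is a hypothetical helper, not taken from any package mentioned in the thread.

```r
# Hypothetical package helper: returns a central interval without relying on
# the names that quantile() generates.
central_interval <- function(x, level = 0.95) {
  alpha <- (1 - level) / 2
  q <- quantile(x, probs = c(alpha, 1 - alpha), names = FALSE)
  c(lower = q[1], upper = q[2])
}

old <- options(digits = 2)   # simulate a user who changed the global option
res <- central_interval(1:1000)
options(old)                 # restore the previous setting

print(names(res))
print(unname(res))
```

The result is addressed as `res[["lower"]]` and `res[["upper"]]`, so the user's digits setting cannot change which element the package picks up.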

## Re: quantile() names

Thank you for explaining, Ed. It makes looking at the issue raised much easier.

As I understand it, you are not really asking about something fully in your control. You are asking how any function like quantile() should behave when a user has altered something global, or at least global within a package, such as this:

    > quantile(x, c(.95, .975, .99000))
        95%   97.5%     99%
    950.050 975.025 990.010
    > dig.it <- options(digits=2)
    > dig.it
    $digits
    [1] 7

I did it that way so I could re-set it!

I looked to see if quantile() is written in base R and it seems to be a generic that I would have to hunt down, so I stopped for now.

Here is what I get BEFORE changing the option for digits:

    > x <- 1:1000
    > quantile(x, probs=c(.95, .975, .99000))
        95%   97.5%     99%
    950.050 975.025 990.010

Note I used the fuller version asking for multiple thresholds so I could see what happened if I used more zeroes. Note that trailing zeroes are not shown in the name of the third element of the vector. So I can suggest the program is not getting the unevaluated text to use but is using the value of the vector. Now I set the number of digits to 2, globally, and repeat:

    > quantile(x, probs=c(.95, .975, .99000))
    95% 98% 99%
    950 975 990

I notice several things, as others have pointed out. There seems to be a truncation in the values shown, so nothing is now shown past the decimal point. But maybe not, as adding an argument of 1/3 gives 334 rather than 333.

    > quantile(x, probs=c(.95, .975, .99000, 1/3))
    95% 98% 99% 33%
    950 975 990 334

Now the names are apparently rounded as discussed, with the percent symbol appended.

So what would you propose? Within the function there seem to be two parts dealing with displaying the result, and it looks as if the original number loses precision, since handing the above to round(., 7) shows no change. So are you asking it to format the name differently than the value even though there is a global variable set specifying the digits they want?

If it really mattered, I suggest one solution may be to allow one or two additional arguments to a function like quantile, like:

    quantile(x, ., digits=5, names=c("95%", "97.5%", .) )

So if a user really wanted to live in their own world of fewer digits they could specify what labels they wanted and could ask for "high", "Higher" and "HIGHEST" or whatever makes them happy. But, as noted, any user wanting that level of control can change the labels afterward.

But you are correct that a package using quantile() and picking out the results individually by name will not be able to use that technique consistently and reliably. But can they use it now? I tried variations on $`95%`, such as quantile(x, c(.95, .975, .99000))$`95%`, and they fail, and the same for using [] notation. These identifiers were not chosen to be used this way. You can get them positionally:

    > quantile(x, c(.95, .975, .99000))[1]
    95%
    950
    > quantile(x, c(.95, .975, .99000))[2]
    98%
    975

If you convert the darn output from a vector to a list, though, it works, using grave accents:

    > as.list(quantile(x, c(.95, .975, .99000)))$`98%`
    [1] 975

So, I doubt many would play games like me to find some way to select by name. Odds are they might use position or get one at a time. The name is more for humans to read, I would think.

Just my two cents. When an instruction impacts multiple places, it can be ambiguous, and changing global variables is, well, global.

Which raises another question: why did the people making choices choose silly names that are all numeric, with maybe a decimal point, and ending in a character like % that has other uses? A cousin of quantile is fivenum(), which returns Tukey's five number summary as used in making boxplots:

    > fivenum(x)
    [1]    1  250  500  750 1000

This returned a vector with no names. You can only index it by number, albeit the elements are always in a fixed order and you know what to expect in each. Another cousin returns a more complex structure:

    > boxplot.stats(x)
    $stats
    [1]    1  250  500  750 1000

    $n
    [1] 1000

    $conf
    [1] 476 525

    $out
    integer(0)

    > boxplot.stats(x)$stats
    [1]    1  250  500  750 1000

That is a list of items, but the first item is a vector with no names that is the same as for fivenum().

Would it make more sense if the names of the output looked more like:

    > temp <- quantile(x, c(.95, .975, .99000))
    > names(temp) <- c("perc95", "perc98", "perc99")
    > temp
    perc95 perc98 perc99
       950    975    990

So you could do this to a vector:

    > temp["perc98"]
    perc98
       975

Or do even more to a list:

    > as.list(temp)$perc98
    [1] 975

My feeling is some things are not really bugs but more like FEATURES you normally live with and, if it matters, work around. I had trouble a while ago with a lavaan() case I ran where very rarely the program simply broke. When in a big loop running hundreds of thousands of times, that messed things up, as the program as a whole just stopped. So I wrapped it and other parts in variations of try() to bulletproof it and lived with it. Sure, it slowed down a bit, but it ran for hours or days, so why fight to find a subtle bug in something I could not change? Your question is valid, but my guess is few use it in a way that will get much notice.
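
For what it is worth, name-based subsetting of a named atomic vector with `[` and `[[` is standard R behaviour; it is only `$` that is restricted to lists and similar objects. A minimal sketch, using probabilities whose labels have at most two significant digits so that the digits setting should not matter:

```r
x <- 1:1000
q <- quantile(x, c(0.25, 0.5))   # names "25%" and "50%"

by_bracket <- q["50%"]           # named vector of length 1
by_double  <- q[["50%"]]         # bare number, name dropped

# `$` is not defined for atomic vectors, so this signals an error.
dollar_fails <- inherits(try(q$`50%`, silent = TRUE), "try-error")

# After conversion to a list, `$` works with backticks.
by_dollar <- as.list(q)$`50%`

print(by_bracket)
print(by_double)
print(dollar_fails)
print(by_dollar)
```

Selecting by name therefore does work on the vector itself, which is exactly why a label that changes with a global option can break code that relies on it.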

## Re: quantile() names

>>>>> Gabriel Becker
>>>>>     on Mon, 14 Dec 2020 13:23:00 -0800 writes:

    > Hi Edgar, I certainly don't think quantile(x, .975) should
    > return 980, as that is a completely wrong answer.

    > I do agree that it seems like the name is a bit
    > offputting. I'm not sure how deep in the machinery you'd
    > have to go to get digits to no effect on the names (I
    > don't have time to dig in right this second).

    > On the other hand, though, if we're going to make the
    > names not respect digits entirely, what do we do when
    > someone does quantile(x, 1/3)? That'd be a bad time had by
    > all without digits coming to the rescue, i think.

    > Best, ~G

and now we read more replies on this topic without anyone looking at the pure R source code, which is pretty simple and easy. Instead, people do experiments and take time to muse about their findings..

Honestly, I'm disappointed: I've always thought that if you *write* on R-devel, you should be able to figure out a few things yourself before that..

It's not rocket science to see/know that you need to quickly look at the quantile.default() method function and then to note that it's format_perc(.) which is used to create the names.

Almost surely, I've been a bit involved in creating parts of this and probably am responsible for the current default behavior.

    ....(sounds of digging) ...

--> Yes:

    ------------------------------------------------------------------------
    r837 | maechler | 1998-03-05 12:20:37 +0100 (Thu, 05 Mar 1998) | 2 lines
    Changed paths:
       M /trunk/src/library/base/R/quantile
       M /trunk/src/library/base/man/quantile.Rd

    fixed names(.) construction
    ------------------------------------------------------------------------

With this diff (my 'svn-diffB -c837 quantile'):

    Index: quantile
    ===================================================================
    21c21,23
    < names(qs) <- paste(round(100 * probs), "%", sep = "")
    ---
    > names(qs) <- paste(formatC(100 * probs, format= "fg", wid=1,
    >   dig= max(2,.Options$digits)),
    >   "%", sep = "")

so this was before this was modularized into the format_perc() utility and quite a while before R 1.0.0 ....

Now, 22.8 years later, I do think that indeed it was not necessarily the best idea to make the names() construction depend on the 'digits' option entirely and just protect it by using at least 2 digits.

What I think is better is to

1) provide an optional argument 'digits = 7', back compatible w/ default getOption("digits")
2) when used, check that it is at least '1'

But then some scripts / examples of some people *will* change ..., e.g., because they preferred to have a global setting of digits=5, so I'm guessing it may make more people unhappy than other people happy if we change this now, after close to 23 years .. ??

Martin

--
Martin Maechler
ETH Zurich  and  R Core team
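
The two sides of the diff can be tried in isolation. In the sketch below, `names_before` and `names_after` are hypothetical wrappers around the two expressions from r837, with the digits value passed as an argument (an assumption made here for easy comparison) instead of being read from `.Options`.

```r
# Hypothetical helpers mirroring the two sides of the r837 diff.
names_before <- function(probs) {
  paste(round(100 * probs), "%", sep = "")
}
names_after <- function(probs, digits = getOption("digits")) {
  paste(formatC(100 * probs, format = "fg", width = 1,
                digits = max(2, digits)),
        "%", sep = "")
}

probs <- c(0.975, 1/3)
print(names_before(probs))             # always rounds to whole percents
print(names_after(probs, digits = 7))  # 7 significant digits
print(names_after(probs, digits = 2))  # what options(digits = 2) leads to
```

With 7 digits the labels keep "97.5%", while with 2 digits they collapse to the same whole percents that the pre-1998 code produced.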

## Re: quantile() names

CITED TEXT CONTAINS EXCERPTS ONLY

> and now we read more replies on this topic without anyone looking at
> the pure R source code which is pretty simple and easy.
> Instead, people do experiments and take time to muse about their findings..
> Honestly, I'm disappointed: I've always thought that if you
> *write* on R-devel, you should be able to figure out a few
> things yourself before that..

That's a bit unfair. Some of us have written packages containing functions for computing quantile names:

    probhat::ntile.names (,100)

> 1) provide an optional argument   'digits = 7'
>    back compatible w/ default getOption("digits")

I'm not sure I've got this right. Are you suggesting that by default, names should have 7 digits?

> so I'm guessing it may make more people unhappy than other
> people happy if we change this now, after close to 23 years  .. ??

I would probably be in the less enthusiastic group. I take the view that quantile naming is mainly a convenience, for summary-style output.

And on that basis, I would say the current behaviour is about right. Anyone looking for high precision should probably compute their own quantile names.

Also, expanding on an earlier point: the value was 975.025, so a label of "97.5%" could still cause problems. Increasing the precision doesn't necessarily fix this sort of problem. Rather, it increases the complexity of the output beyond what "97.5%" of users would ever want...

B.

## Re: quantile() names

Sorry, I need to change my last post.

I looked at this a bit more, and realized that increasing the (max) number of (name) digits is only relevant in some cases. For people computing quartiles and deciles, this shouldn't make any difference. Therefore, it should still be convenient for the purposes of summary-style output.
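
The point about quartiles and deciles can be illustrated with a small sketch. `pct_label` is a hypothetical helper that formats percent labels with a given number of significant digits; it is an assumption about the general approach, not a copy of R's internal code.

```r
# Hypothetical helper: percent labels with a given number of significant digits.
pct_label <- function(probs, digits) {
  paste0(formatC(100 * probs, format = "fg", width = 1, digits = digits), "%")
}

quartiles <- c(0.25, 0.5, 0.75)
deciles   <- seq(0.1, 0.9, by = 0.1)

# Labels with at most two significant digits look the same either way.
same_quartiles <- identical(pct_label(quartiles, 2), pct_label(quartiles, 7))
same_deciles   <- identical(pct_label(deciles, 2), pct_label(deciles, 7))

# A probability needing three significant digits is where the two differ.
differs_975    <- !identical(pct_label(0.975, 2), pct_label(0.975, 7))

print(c(same_quartiles, same_deciles, differs_975))
```

Only probabilities whose percentage needs more than two significant digits are affected by the choice of limit.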

## Re: quantile() names

Getting back to this after 3 months:

>>>>> Martin Maechler
>>>>>     on Wed, 16 Dec 2020 11:13:32 +0100 writes:

    > Now, 22.8 years later, I do think that indeed it was not
    > necessarily the best idea to make the names() construction depend on the
    > 'digits' option entirely and just protect it by using at least 2 digits.

    > What I think is better is to

    > 1) provide an optional argument   'digits = 7'
    >    back compatible w/ default getOption("digits")
    > 2) when used, check that it is at least '1'

    > But then some scripts / examples of some people *will* change
    > ..., e.g., because they preferred to have a global setting of digits=5
    > so I'm guessing it may make more people unhappy than other
    > people happy if we change this now, after close to 23 years  .. ??

    > Martin

I had more thoughts about this, and noticed that not one example or test in base R plus Recommended packages was changed, so I've now committed the above change.

NEWS entry

    • The names of quantile()'s result no longer depend on the global
      getOption("digits"), but quantile() gets a new optional argument
      digits = 7 instead.

Martin

--
Martin Maechler
ETH Zurich  and  R Core team
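
A sketch of how the committed change could be exercised. It assumes an R version that already contains the change (R 4.1.0 or later, as far as I can tell), so the code first checks whether the default method has a `digits` argument and does nothing otherwise.

```r
x <- 1:1000

# Does this R version's default method already have the new argument?
has_digits <- "digits" %in% names(formals(getS3method("quantile", "default")))

if (has_digits) {
  old <- options(digits = 2)                  # the setting from the original report
  q_default <- quantile(x, 0.975)             # name built with the default digits = 7
  q_short   <- quantile(x, 0.975, digits = 2) # explicitly ask for the short name
  options(old)                                # restore the previous setting

  print(names(q_default))
  print(names(q_short))
} else {
  message("This R version predates the 'digits' argument of quantile().")
}
```

Users who preferred the short labels can still get them per call, without the label depending on a global option.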