Hmisc summary.formula formats for binary and continuous variables

Hmisc summary.formula formats for binary and continuous variables

Kwok, Heemun

Hello,
I am using Hmisc summary.formula, latex and Sweave to produce tables for publication.  Is it possible to change the formats for binary and continuous variables?  I would prefer to show 35 (10%) and 1.5 (1.2-1.8) rather than 10% (35) and 1.2 / 1.5 / 1.8. Here is a simple example:

sex <- factor(sample(c("m","f"), 500, rep=TRUE))
age <- rnorm(500, 50, 5)
treatment <- factor(sample(c("Drug","Placebo"), 500, rep=TRUE))

s1 <- summary(~sex + age)
s2 <- summary(treatment ~ sex + age, method="reverse")
print(s1); print(s2)

Descriptive Statistics  (N=500)

+-------+-----------------+
|       |                 |
+-------+-----------------+
|sex : m|    46% (232)    |
+-------+-----------------+
|age    |47.22/50.31/53.37|
+-------+-----------------+



Descriptive Statistics by treatment

+-------+-----------------+-----------------+
|       |Drug             |Placebo          |
|       |(N=257)          |(N=243)          |
+-------+-----------------+-----------------+
|sex : m|    47% (122)    |    45% (110)    |
+-------+-----------------+-----------------+
|age    |47.35/50.00/52.68|46.78/50.92/53.97|
+-------+-----------------+-----------------+

Thanks,
Heemun


-------------------------------------------------
Heemun Kwok, M.D.
Research Fellow
Harbor-UCLA Department of Emergency Medicine
1000 West Carson Street, Box 21
Torrance, CA 90509-2910
office 310-222-3501, fax 310-212-6101

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Hmisc summary.formula formats for binary and continuous variables

Joshua Wiley-2
I played around with this for a while and did not get very far.  I did
not see any arguments in summary.formula or its print methods to
reorder (happy to be corrected).  Another approach I toyed with was to
create a custom function to pass to summary.formula() that would
itself create (something like) the desired output.

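foo <- function(x) {
  n <- length(x)
  # Percentage of the full sample: the total N is 500 in this example,
  # so n/5 is shorthand for 100 * n / 500
  pct <- n/5
  c(FOO = paste(n, "(", round(pct, digits = 0), "%)",
    sep = ''))
}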
> summary(treatment ~ sex + age, fun = foo, method = "response")
treatment    N=500

+-------+-----------+---+---------+
|       |           |N  |FOO      |
+-------+-----------+---+---------+
|sex    |f          |273|273(55%) |
|       |m          |227|227(45%) |
+-------+-----------+---+---------+
|age    |[36.8,46.7)|125|125(25%) |
|       |[46.7,50.0)|125|125(25%) |
|       |[50.0,53.3)|125|125(25%) |
|       |[53.3,67.5]|125|125(25%) |
+-------+-----------+---+---------+
|Overall|           |500|500(100%)|
+-------+-----------+---+---------+
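
The n/5 in foo() only works because N happens to be 500 here. A minimal sketch of one way around that, assuming a hypothetical factory function make_count_pct() (not part of Hmisc) that captures the total in a closure:

```r
# Hypothetical helper, not part of Hmisc: returns a summary function
# that remembers the overall sample size instead of hard-coding 500.
make_count_pct <- function(total) {
  function(x) {
    n <- length(x)                      # size of the current stratum
    pct <- round(100 * n / total)       # share of the full sample
    c(FOO = paste0(n, " (", pct, "%)")) # count first, percent in parentheses
  }
}

foo2 <- make_count_pct(500)
foo2(seq_len(273))  # a stratum of 273 observations out of 500
```

Something like fun = make_count_pct(length(treatment)) should then be usable in the same way as fun = foo above, though I have only checked the helper on its own.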

However, it does not work with method = "reverse".  Also, this
approach would seem to require either defining a very flexible
function or multiple ones for each different situation you come
across.  Looking at print.summary.formula.reverse, the magic seems to
happen on lines 47-50:

            cs <- formatCats(stats[[i]], nam, tr, type[i],
                if (length(x$group.freq)) x$group.freq else x$n[i],
                npct, pctdig, exclude1, long, prtest,
                pdig = pdig, eps = eps)

which led me to explore formatCats().  A small tweak in the order of
the paste() call on lines 25-33 (and creating a copy of the altered
version plus print.summary.formula.reverse in the global environment),
got me:

print.summary.formula.reverse(summary(treatment ~ sex + age, method="reverse"))


Descriptive Statistics by treatment

+-------+--------------+--------------+
|       |Drug          |Placebo       |
|       |(N=262)       |(N=238)       |
+-------+--------------+--------------+
|sex : m|   (118) 45%  |   (114) 48%  |
+-------+--------------+--------------+
|age    |46.5/50.0/53.8|46.6/49.5/52.6|
+-------+--------------+--------------+

which has the percentage info on the right side, though I did not take
the time to get the parentheses moved over.  Still, it seems like
adding an argument that just flipped the order might not take that
much work/code.
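
To make the target format concrete, here is a rough sketch of the string a flipped paste() order would need to produce, with the parentheses moved onto the percentage. The function name fmt_count_pct() and its arguments are made up for illustration; this is not the code inside formatCats().

```r
# Hypothetical helper, not part of Hmisc: formats "count (percent%)".
fmt_count_pct <- function(count, denom, digits = 0) {
  pct <- round(100 * count / denom, digits)
  paste0(count, " (", pct, "%)")
}

# Counts and group sizes taken from the table above:
# 118 of 262 in Drug, 114 of 238 in Placebo.
fmt_count_pct(c(118, 114), c(262, 238))
```

The function is vectorised, so one call covers all treatment columns of a row.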

Cheers,


Josh

(Though I cannot help but wonder if in response to "I want to cross
the street" I just said "we could start building a two-lane,
underground tunnel with...." and someone is probably going to come
along and point out the crosswalk 10 feet down the street)

On Sat, Mar 26, 2011 at 11:09 PM, Kwok, Heemun <[hidden email]> wrote:

> Hello,
> I am using Hmisc summary.formula, latex and Sweave to produce tables for publication.  Is it possible to change the formats for binary and continuous variables?  I would prefer to show 35 (10%) and 1.5 (1.2-1.8) rather than 10% (35) and 1.2 / 1.5 / 1.8.
> [...]


--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/


Re: Hmisc summary.formula formats for binary and continuous variables

Frank Harrell
In reply to this post by Kwok, Heemun
If by 35 (10%) you mean that 35 is the numerator, this is not such a good idea.  That's because it emphasizes something that is not a scientific quantity.  A scientific quantity is something that has a meaning outside the current sample.  The numerator is dependent on the denominator.

Regarding the other formatting issue, summary.formula with method='reverse' is not flexible enough to allow that.
Frank

Frank Harrell
Department of Biostatistics, Vanderbilt University
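
Since method='reverse' does not expose these formats, one possible fallback is to build the cell strings by hand in base R and skip summary.formula for that table. The following is only a sketch under that assumption: fmt_binary() and fmt_median_iqr() are hypothetical helpers, not Hmisc functions, the seed is arbitrary, and the data are simulated as in the original post.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible
sex <- factor(sample(c("m", "f"), 500, replace = TRUE))
age <- rnorm(500, 50, 5)
treatment <- factor(sample(c("Drug", "Placebo"), 500, replace = TRUE))

# Hypothetical helper: "count (percent%)" for one level of a factor.
fmt_binary <- function(x, level) {
  n <- sum(x == level)
  paste0(n, " (", round(100 * n / length(x)), "%)")
}

# Hypothetical helper: "median (Q1-Q3)" for a numeric vector.
fmt_median_iqr <- function(x, digits = 1) {
  q <- formatC(quantile(x, c(0.25, 0.5, 0.75), names = FALSE),
               format = "f", digits = digits)
  paste0(q[2], " (", q[1], "-", q[3], ")")
}

# One row per variable, one column per treatment group.
tab <- rbind(
  "sex : m" = sapply(split(sex, treatment), fmt_binary, level = "m"),
  age       = sapply(split(age, treatment), fmt_median_iqr)
)
print(tab, quote = FALSE)
```

The result is a plain character matrix, so it loses the tests, labels and latex() support that summary.formula provides; any LaTeX output would have to be generated separately.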