by function with sum does not give what is expected from by function with print

by function with sum does not give what is expected from by function with print

Sorkin, John
Colleagues,
 
The by function in the R program below is not giving me the sums
I expect to see, viz.,
382+170=552
4730+170=4900
5+6=11
199+25=224
###################################################
#full R program:
mydata <- data.frame(covid=c(0,0,0,0,1,1,1,1),
sex=(rep(c(1,1,0,0),2)),
status=rep(c(1,0),2),
values=c(382,4730,5,199,170,497,6,25))
mydata
by(mydata,list(mydata$sex,mydata$status),sum)
by(mydata,list(mydata$sex,mydata$status),print)
###################################################

More complete explanation of my question
 
I have created a simple dataframe having three factors:
 mydata <- data.frame(covid=c(0,0,0,0,1,1,1,1),
 sex=(rep(c(1,1,0,0),2)),
 status=rep(c(1,0),2),
 values=c(382,4730,5,199,170,497,6,25))
 
 > mydata
  covid sex status values
1     0   1      1    382
2     0   1      0   4730
3     0   0      1      5
4     0   0      0    199
5     1   1      1    170
6     1   1      0    497
7     1   0      1      6
8     1   0      0     25
 
When I use the by function with sum as an argument, I don’t
get the sums I would expect, based either on the listing of
the data frame above or on using by with print as an argument:
 
> by(mydata,list(mydata$sex,mydata$status),sum)
: 0
: 0
[1] 225
-------------------------------------------------------------------------------
: 1
: 0
[1] 5230
-------------------------------------------------------------------------------
: 0
: 1
[1] 14
-------------------------------------------------------------------------------
: 1
: 1
[1] 557
 
I expected to see the following sums:
382+170=552
4730+170=4900
5+6=11
199+25=224
Which as can be seen by the output above, I am not getting.
 
Using print as an argument to the by function, I get the values
grouped as I would expect, but for some reason I get a double
printing of the values!
 
> by(mydata,list(mydata$sex,mydata$status),print)
  covid sex status values
4     0   0      0    199
8     1   0      0     25
  covid sex status values
2     0   1      0   4730
6     1   1      0    497
  covid sex status values
3     0   0      1      5
7     1   0      1      6
  covid sex status values
1     0   1      1    382
5     1   1      1    170
: 0
: 0
  covid sex status values
4     0   0      0    199
8     1   0      0     25
-------------------------------------------------------------------------------
: 1
: 0
  covid sex status values
2     0   1      0   4730
6     1   1      0    497
-------------------------------------------------------------------------------
: 0
: 1
  covid sex status values
3     0   0      1      5
7     1   0      1      6
-------------------------------------------------------------------------------
: 1
: 1
  covid sex status values
1     0   1      1    382
5     1   1      1    170
 
What am I doing wrong, or what don’t I understand
about the by function?
 
Thank you
John

John David Sorkin M.D., Ph.D.
Professor of Medicine
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology and Geriatric Medicine
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)



______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: by function with sum does not give what is expected from by function with print

Bert Gunter-2
by() chooses **data frame** subsets -- sum() is acting on these frames,
adding up everything in them.
Try this instead:

> by(mydata,list(mydata$sex,mydata$status),function(x)sum(x$values))
: 0
: 0
[1] 224
-----------------------------------------------------------
: 1
: 0
[1] 5227
-----------------------------------------------------------
: 0
: 1
[1] 11
-----------------------------------------------------------
: 1
: 1
[1] 552
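
To see where the original 225 comes from, here is a minimal sketch. The name `grp` is introduced only for illustration and is not part of the thread. sum() applied to a data frame adds every numeric cell, so the covid, sex and status columns are counted along with values.

```r
mydata <- data.frame(covid  = c(0, 0, 0, 0, 1, 1, 1, 1),
                     sex    = rep(c(1, 1, 0, 0), 2),
                     status = rep(c(1, 0), 2),
                     values = c(382, 4730, 5, 199, 170, 497, 6, 25))

# The subset that by() hands to FUN for sex == 0, status == 0 (rows 4 and 8)
grp <- mydata[mydata$sex == 0 & mydata$status == 0, ]

sum(grp)         # all cells: covid (0 + 1) + sex (0) + status (0) + values (199 + 25) = 225
sum(grp$values)  # values column only: 199 + 25 = 224
colSums(grp)     # per-column totals show which columns contribute to the 225
```

The extra 1 in this group is the covid value of row 8; the other groups pick up similar contributions from sex and status.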

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Thu, Jul 23, 2020 at 3:25 PM Sorkin, John <[hidden email]>
wrote:

> Colleagues,
>
> The by function in the R program below is not giving me the sums
> I expect to see, viz.,
> 382+170=552
> 4730+170=4900
> 5+6=11
> 199+25=224
> ###################################################
> #full R program:
> mydata <- data.frame(covid=c(0,0,0,0,1,1,1,1),
> sex=(rep(c(1,1,0,0),2)),
> status=rep(c(1,0),2),
> values=c(382,4730,5,199,170,497,6,25))
> mydata
> by(mydata,list(mydata$sex,mydata$status),sum)
> by(mydata,list(mydata$sex,mydata$status),print)
> ###################################################

Re: by function with sum does not give what is expected from by function with print

Duncan Murdoch-2
In reply to this post by Sorkin, John
On 23/07/2020 6:15 p.m., Sorkin, John wrote:

> Colleagues,
>  
> The by function in the R program below is not giving me the sums
> I expect to see, viz.,
> 382+170=552
> 4730+170=4900
> 5+6=11
> 199+25=224
> ###################################################
> #full R program:
> mydata <- data.frame(covid=c(0,0,0,0,1,1,1,1),
> sex=(rep(c(1,1,0,0),2)),
> status=rep(c(1,0),2),
> values=c(382,4730,5,199,170,497,6,25))
> mydata
> by(mydata,list(mydata$sex,mydata$status),sum)
> by(mydata,list(mydata$sex,mydata$status),print)
> ###################################################

The problem is that you are summing the mydata values, not the
mydata$values values.  That will include covid, sex and status in the
sums.  I think you'll get what you should (though it doesn't match what
you say you expected, which looks wrong to me) with this code:

by(mydata$values,list(mydata$sex,mydata$status),sum)

for 0,0, the sum is 224 = 199+25
for 0,1, the sum is  11 = 5+6
for 1,0, the sum is 5227 = 4730 + 497 (not 4730 + 170)
for 1,1, the sum is 552 = 382 + 170
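
As a small variation on the call above (a sketch, not something proposed in the thread), naming the elements of the index list makes the printed output label each group. The names `sex` and `status` in the list and the variable `res` are chosen here for illustration.

```r
mydata <- data.frame(covid  = c(0, 0, 0, 0, 1, 1, 1, 1),
                     sex    = rep(c(1, 1, 0, 0), 2),
                     status = rep(c(1, 0), 2),
                     values = c(382, 4730, 5, 199, 170, 497, 6, 25))

# Pass only the values column, and name the grouping variables
res <- by(mydata$values,
          list(sex = mydata$sex, status = mydata$status),
          sum)

res                     # groups are now labelled "sex: 0", "status: 0", and so on
unclass(res)["1", "0"]  # sex == 1, status == 0: 4730 + 497 = 5227
```

Dropping the "by" class with unclass() leaves a plain 2 x 2 matrix, which can be indexed by the group labels.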

Duncan Murdoch



Re: by function with sum does not give what is expected from by function with print

Rasmus Liland-3
On 2020-07-23 18:54 -0400, Duncan Murdoch wrote:

> The problem is that you are summing the mydata values, not the mydata$values
> values.  That will include covid, sex and status in the sums.  I think
> you'll get what you should (though it doesn't match what you say you
> expected, which looks wrong to me) with this code:
>
> by(mydata$values,list(mydata$sex,mydata$status),sum)
>
> for 0,0, the sum is 224 = 199+25
> for 0,1, the sum is  11 = 5+6
> for 1,0, the sum is 5227 = 4730 + 497 (not 4730 + 170)
> for 1,1, the sum is 552 = 382 + 170

Dear John,

aggregate() also does this, but it returns
sex and status as columns of a data.frame
rather than as attributes of the double.

        aggregate(x=list("values"=mydata$values),
                  by=list("sex"=mydata$sex,
                          "status"=mydata$status),
                  FUN=sum)

yields

          sex status values
        1   0      0    224
        2   1      0   5227
        3   0      1     11
        4   1      1    552
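
The practical difference between the two return types can be sketched as follows. The variable names `agg` and `res` are illustrative only, and the element names are written without quotes, which is equivalent.

```r
mydata <- data.frame(covid  = c(0, 0, 0, 0, 1, 1, 1, 1),
                     sex    = rep(c(1, 1, 0, 0), 2),
                     status = rep(c(1, 0), 2),
                     values = c(382, 4730, 5, 199, 170, 497, 6, 25))

agg <- aggregate(x  = list(values = mydata$values),
                 by = list(sex = mydata$sex, status = mydata$status),
                 FUN = sum)
res <- by(mydata$values, list(mydata$sex, mydata$status), sum)

class(agg)  # "data.frame": the grouping variables are ordinary columns
class(res)  # "by": the grouping levels live in the dimnames attribute

# A data frame can be filtered or sorted like any other
agg[agg$sex == 1 & agg$status == 0, "values"]  # 5227
agg[order(agg$sex, agg$status), ]
```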

Best,
Rasmus


Re: by function with sum does not give what is expected from by function with print

Rasmus Liland-3
On 2020-07-24 01:48 +0200, Rasmus Liland wrote:

> aggregate(x=list("values"=mydata$values),
>          by=list("sex"=mydata$sex,
>                  "status"=mydata$status),
>          FUN=sum)
>
> yields
>
>  sex status values
> 1   0      0    224
> 2   1      0   5227
> 3   0      1     11
> 4   1      1    552

After reading more in ?aggregate, I
realized this does the same thing ...

        aggregate(formula=formula("values~sex+status"),
                  FUN=sum,
                  data=mydata)


Re: by function with sum does not give what is expected from by function with print

Rui Barradas
In reply to this post by Sorkin, John
Hello,

These two give the same results:


aggregate(values ~ sex + status, mydata, sum)
#  sex status values
#1   0      0    224
#2   1      0   5227
#3   0      1     11
#4   1      1    552


by(mydata$values, list(mydata$sex, mydata$status), sum)
#: 0
#: 0
#[1] 224
#------------------------------------------------------------
#: 1
#: 0
#[1] 5227
#------------------------------------------------------------
#: 0
#: 1
#[1] 11
#------------------------------------------------------------
#: 1
#: 1
#[1] 552


So Duncan is right: the 2nd sum in your expected output is wrong. The right
sum is

mydata rows 2 and 6: 4730 + 497 == 5227
----------------------------------------^

Another option, returning a matrix,

  tapply(mydata$values, list(mydata$sex, mydata$status), sum)
#     0   1
#0  224  11
#1 5227 552
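
A short sketch of working with that matrix. The names `m` and `xt`, and the use of with() and xtabs(), are additions for illustration rather than part of the reply above. Rows are sex and columns are status, and naming the list elements records that in the dimnames.

```r
mydata <- data.frame(covid  = c(0, 0, 0, 0, 1, 1, 1, 1),
                     sex    = rep(c(1, 1, 0, 0), 2),
                     status = rep(c(1, 0), 2),
                     values = c(382, 4730, 5, 199, 170, 497, 6, 25))

# Named list elements label the two dimensions of the result
m <- with(mydata, tapply(values, list(sex = sex, status = status), sum))
m
m["1", "0"]  # sex == 1, status == 0: 4730 + 497 = 5227

# xtabs() with a left-hand side sums that variable within each cell
xt <- xtabs(values ~ sex + status, data = mydata)
xt
```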


Hope this helps,

Rui Barradas
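
On the double printing with print asked about in the original post: print() returns its argument invisibly, so by() first prints each subset as a side effect of calling FUN, then collects the returned data frames into a "by" object, which R auto-prints when the call is typed at the prompt. A minimal sketch, with the variable name `out` chosen for illustration:

```r
mydata <- data.frame(covid  = c(0, 0, 0, 0, 1, 1, 1, 1),
                     sex    = rep(c(1, 1, 0, 0), 2),
                     status = rep(c(1, 0), 2),
                     values = c(382, 4730, 5, 199, 170, 497, 6, 25))

# Assigning the result suppresses auto-printing, so each group appears once
out <- by(mydata, list(mydata$sex, mydata$status), print)

class(out)  # "by": each cell holds the data frame that print() returned
# Typing `out` at the prompt would print the groups again, with ": 0" headers

# Alternatively, keep the side-effect printing and discard the result
invisible(by(mydata, list(mydata$sex, mydata$status), print))
```

To simply inspect the groups, using identity as FUN, or split(mydata, list(mydata$sex, mydata$status)), avoids the side-effect printing altogether.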


At 23:15 on 23/07/2020, Sorkin, John wrote:

> Colleagues,
>  
> The by function in the R program below is not giving me the sums
> I expect to see, viz.,
> 382+170=552
> 4730+170=4900
> 5+6=11
> 199+25=224
> ###################################################
> #full R program:
> mydata <- data.frame(covid=c(0,0,0,0,1,1,1,1),
> sex=(rep(c(1,1,0,0),2)),
> status=rep(c(1,0),2),
> values=c(382,4730,5,199,170,497,6,25))
> mydata
> by(mydata,list(mydata$sex,mydata$status),sum)
> by(mydata,list(mydata$sex,mydata$status),print)
> ###################################################