Taking the sum of only some columns of a data frame


Bruce Ratner PhD
Hi R'ers:
I have a data.frame of five columns and ten rows.
I would like to take the sum of, say, the first and third columns only.
For the remaining columns, I do not want any calculations, thus rendering their "values" on the "total" row blank. The sum/total row is to be combined with the original data.frame, yielding a data.frame with five columns and eleven rows.

Thanks, in advance.
Bruce


______________
Bruce Ratner PhD
The Significant Statistician™





______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Taking the sum of only some columns of a data frame

Doran, Harold
I do not believe this can be done in one step.

dat <- data.frame(matrix(rnorm(50), 5))

pos <- c(1, 3)
res <- apply(dat[, pos], 2, sum)

x <- numeric(5)
x[pos] <- res

rbind(dat, x)


Re: Taking the sum of only some columns of a data frame

Doran, Harold
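Apologies, my code below has an error: matrix(rnorm(50), 5) builds 5 rows and 10 columns rather than 10 rows and 5 columns, so the length-5 vector x is recycled by rbind(). Hopefully, the concept is clear.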

-----Original Message-----
From: R-help [mailto:[hidden email]] On Behalf Of Doran, Harold
Sent: Friday, March 31, 2017 12:34 PM
To: 'Bruce Ratner PhD' <[hidden email]>; [hidden email]
Subject: Re: [R] Taking the sum of only some columns of a data frame

I do not believe this can be done in one step.

dat <- data.frame(matrix(rnorm(50), 5))

pos <- c(1, 3)
res <- apply(dat[, pos], 2, sum)

x <- numeric(5)
x[pos] <- res

rbind(dat, x)
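
A possible correction of the code above, offered only as a sketch. It assumes the intended shape is 10 rows by 5 columns (hence ncol = 5), sizes x from ncol(dat) so that nothing is recycled, and uses NA rather than 0 for the columns that are not summed. The set.seed() call and the name with_total are additions for illustration and are not part of the original post.

```r
# Sketch of a corrected version: 10 rows x 5 columns, NA in unsummed columns
set.seed(1)                                    # assumption: only for reproducibility
dat <- data.frame(matrix(rnorm(50), ncol = 5))

pos <- c(1, 3)                                 # columns to total
x <- rep(NA_real_, ncol(dat))                  # one slot per column, so no recycling
x[pos] <- colSums(dat[, pos])

with_total <- rbind(dat, x)                    # 11 rows, 5 columns
with_total
```

The original dat is left unchanged; the total row exists only in with_total.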


Re: Taking the sum of only some columns of a data frame

Doran, Harold
In reply to this post by Doran, Harold
Let's keep the R-help list on the email, per typical protocol. apply() is a function in base R, so you don't need to install it.
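
A quick way to check this for yourself, as a minimal sketch: the small matrix m and the name col_totals are made up for illustration.

```r
# apply() ships with base R, so no install.packages() call is needed
environmentName(environment(apply))   # "base"

m <- matrix(1:6, nrow = 2)            # columns are (1,2), (3,4), (5,6)
col_totals <- apply(m, 2, sum)        # MARGIN = 2 means "over columns"
col_totals                            # 3 7 11
```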

-----Original Message-----
From: Bruce Ratner PhD [mailto:[hidden email]]
Sent: Friday, March 31, 2017 1:06 PM
To: Doran, Harold <[hidden email]>
Subject: Re: [R] Taking the sum of only some columns of a data frame

Hey Harold:
Thanks for the quick reply.
But I can't install "apply."

Is there anything you can suggest to get apply installed on R 3.3.3, or a workaround for your original answer?

Thanks, so much.
Bruce

> On Mar 31, 2017, at 12:33 PM, Doran, Harold <[hidden email]> wrote:
>
> apply


Re: Taking the sum of only some columns of a data frame

Bill Dunlap via R-help
In reply to this post by Bruce Ratner PhD
> dat <- data.frame(Group=LETTERS[1:5], X=1:5, Y=11:15)
> pos <- c(2,3)
> rbind(dat, Sum=lapply(seq_len(ncol(dat)), function(i) if (i %in% pos) sum(dat[,i]) else NA_real_))
    Group  X  Y
1       A  1 11
2       B  2 12
3       C  3 13
4       D  4 14
5       E  5 15
Sum  <NA> 15 65
> str(.Last.value)
'data.frame':   6 obs. of  3 variables:
 $ Group: Factor w/ 5 levels "A","B","C","D",..: 1 2 3 4 5 NA
 $ X    : int  1 2 3 4 5 15
 $ Y    : int  11 12 13 14 15 65
Bill Dunlap
TIBCO Software
wdunlap tibco.com



Re: Taking the sum of only some columns of a data frame

William Michels via R-help
In reply to this post by Bruce Ratner PhD
I'm sure there are more efficient ways, but this works:

> test1 <- matrix(runif(50), nrow=10, ncol=5)
> ## test1 <- as.data.frame(test1)
> test1 <- rbind(test1, NA)
> test1[11, c(1,3)] <- colSums(test1[1:10,c(1,3)])
> test1


HTH,

Bill.

William Michels, Ph.D.




Re: Taking the sum of only some columns of a data frame

William Michels via R-help
Again, you should always copy the R-help list on replies to your OP.

The short answer is you **shouldn't** replace NAs with blanks in your
matrix or dataframe.  NA is the proper designation for those cell
positions. Replacing NA with a "blank" in a dataframe will convert
that column to a "character" mode, precluding further numeric
manipulation of those columns.
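
The type change described above can be seen in a small sketch. The data frame and its column names are made up for illustration.

```r
# Hypothetical two-column frame whose last row holds a partial "total"
totals <- data.frame(a = c(1, 2, 3), b = c(4, 5, NA))

blanked <- totals
blanked$b[is.na(blanked$b)] <- ""   # assigning "" coerces the whole column

class(totals$b)    # "numeric"
class(blanked$b)   # "character"
# sum(blanked$b) would now fail, because the column is no longer numeric
```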

Consider your workflow:  are you trying to export a table? If so, take
a look at installing pander (see 'missing' argument on webpage below):

https://cran.r-project.org/web/packages/pander/README.html
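
If the goal is only a blank cell in exported output, base R can also do this at write time without altering the data. The sketch below uses the na argument of write.csv() rather than pander; the data frame df and the name csv_lines are made up for illustration.

```r
# Hypothetical frame: the last row is a total row with NA in the unsummed column
df <- data.frame(X1 = c(1.5, 2.5, 4), X2 = c(10, 20, NA))

# na = "" writes missing values as empty fields; df itself stays numeric
csv_lines <- capture.output(write.csv(df, row.names = FALSE, na = ""))
csv_lines

is.numeric(df$X2)   # TRUE: the working data are untouched
```

To write to disk instead, pass a file name to write.csv() and drop capture.output().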

Finally, please review the Introductory PDF, available here:

https://cran.r-project.org/doc/manuals/R-intro.pdf

HTH, Bill.

William Michels, Ph.D.



On Fri, Mar 31, 2017 at 11:21 AM, BR_email <[hidden email]> wrote:

> William:
> How can I replace the "NAs" with blanks?
> Bruce

Re: Taking the sum of only some columns of a data frame

Jeff Newmiller
You can also look at the knitr-RMarkdown workflow, or the knitr-LaTeX workflow. In both of these it is reasonable to convert your data frame to a temporary character-only form purely for output purposes. However, one can usually use an existing function to push your results out without damaging your working data.

It is important to separate your data from your output because mixing results (totals) with data makes further use of the data extremely difficult. Mixing them is one of the major flaws of the spreadsheet model of computation, and it causes problems there as well as in R.
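
One way to read the advice above, as a sketch: keep the numeric data frame untouched and build a character-only copy just for display. The object names (df, out) are illustrative, and format() is only one of several base R options for this.

```r
# Working data: numeric columns, with NA where no total was computed
df <- data.frame(item = c("a", "b", "Total"),
                 X1 = c(1, 2, 3),
                 X2 = c(10, 20, NA),
                 stringsAsFactors = FALSE)

# Display copy: every column becomes character, then NA cells are blanked
out <- format(df)
out[is.na(df)] <- ""

print(out, row.names = FALSE)

is.numeric(df$X2)   # TRUE: df can still be used for further calculation
```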
--
Sent from my phone. Please excuse my brevity.


Re: Taking the sum of only some columns of a data frame

Mathew Guilfoyle
In reply to this post by Bruce Ratner PhD
This does the summation you want in one line:

#create example data and column selection
d = as.data.frame(matrix(rnorm(50),ncol=5))
cols = c(1,3)

#sum selected columns and put results in new row
d[nrow(d)+1,cols] = colSums(d[,cols])

However, I would agree with the sentiments that this is a bad idea; it is far better to store the sums in a new object, leaving the original data table untainted.
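
A minimal sketch of that alternative, keeping the totals in their own object. The name col_totals and the set.seed() call are additions for illustration.

```r
# Same example data as above; the seed is only for reproducibility
set.seed(42)
d <- as.data.frame(matrix(rnorm(50), ncol = 5))
cols <- c(1, 3)

# Totals live in a separate named vector; d keeps its 10 rows
col_totals <- colSums(d[, cols])
col_totals          # named by column: V1 and V3
nrow(d)             # still 10
```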



Re: Taking the sum of only some columns of a data frame

Bert Gunter-2
In reply to this post by Jeff Newmiller
All:

1. I agree wholeheartedly with prior responses.

2. But let's suppose that for some reason, you *did* want to carry
around some "calculated values" with the data frame. Then one way to
do it is to add them as attributes to the data frame. This way they
cannot "pollute" the data in the way Jeff warned against; e.g.

attr(your_frame,"colsums") <- colSums(your_frame)

This calculates them all, but you can of course attach just
some, e.g. colSums(your_frame[,c(1,3)]).

3. This, of course, has the disadvantage of requiring recalculation of
the attribute if the data changes, which is an invitation to problems.
A better approach might be to attach the *function* that does the
calculation as an attribute, which when invoked always uses the
current data:

attr(your_frame,"colsums") <- function(x)colSums(x)

For example:

df <- data.frame(x=1:5,y=21:25)
attr(df,"colsums")<- function(x)colSums(x)

## then:
> attr(df,"colsums")(df)
  x   y
 15 115

## add a row
> df[6,] <- rep(100,2)
> attr(df,"colsums")(df)
  x   y
115 215


This survives changing the name of df:

> dat <- df
> attr(dat,"colsums")(dat)
  x   y
115 215

As it stands, the call: attr(df,"colsums")(df)  is a bit clumsy; one
could easily write a function that does this sort of thing more
cleanly, as, for example, is done via the "selfStart" functionality
for nonlinear models.
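
A sketch of such a wrapper, under the assumption that the function is stored in an attribute as in the example above. The helper name stored_sums and its argument names are hypothetical and do not come from any package.

```r
df <- data.frame(x = 1:5, y = 21:25)
attr(df, "colsums") <- function(x) colSums(x)

# Hypothetical helper: look up the function stored in an attribute and
# call it on the data frame that carries it
stored_sums <- function(data, which = "colsums") {
  f <- attr(data, which, exact = TRUE)
  if (!is.function(f)) stop("no function stored in attribute '", which, "'")
  f(data)
}

totals <- stored_sums(df)   # same result as attr(df, "colsums")(df)
totals
```

The call site then reads stored_sums(df) rather than attr(df,"colsums")(df), and a missing attribute produces an explicit error.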

But all this presupposes that the OP is familiar with R programming
paradigms, especially the use of functions as first class objects, and
the language in general. While I may have missed this, his posts do
not seem to me to indicate such familiarity, so as others have
suggested, perhaps the best answer is to first spend some time with an
R tutorial or two and *not* try to mimic bad spreadsheet practices in
R.

Cheers,
Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )



Re: Taking the sum of only some columns of a data frame

William Michels via R-help
Thank you Jeff for pointing out bad spreadsheet practices in R,
seconded by Mathew and Bert.

I should have considered creating a second dataframe ("test1_summary")
to distinguish raw from processed data. Those who want to address
memory issues caused by unnecessary duplication, feel free to chime
in.

Finally, thank you Bert for your most informative post on adding
attributes to dataframes. I really learned a lot!

Best Regards,

Bill.

William Michels, Ph.D.


