Re : Adding column sum to new row in data frame

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
9 messages Options
Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re : Adding column sum to new row in data frame

mohan L
Dear All,

I have data some thing like this:

    State Jan Feb Mar Apr May Jun  AAA 1 1 0 2 2 0  BBB 1298 1195 1212 1244
1158 845  CCC 0 0 0 1 2 1  DDD 5 11 17 15 10 9  EEE 18 28 27 23 23 16  FFF
68 152 184 135 111 86


I want to sum all the column(Jan, Feb, Mar ...) and have to merge the total
at last row. like this:

StateJanFebMarAprMayJunAAA110220BBB12981195121212441158845CCC 000121DDD51117
15109EEE182827232316FFF6815218413511186Total    1390 1387 1440 1420 1306
957


I am doing some thing like this, but I don't know how to merge "Total" to
"data"  Or I don't know there may be a alternative way.

> data <- read.csv(file='ipsample.csv',sep=',' , header=TRUE)
> data
  State  Jan  Feb  Mar  Apr  May Jun
1   AAA    1    1    0    2    2   0
2   BBB 1298 1195 1212 1244 1158 845
3  CCC     0    0    0    1    2   1
4   DDD    5   11   17   15   10   9
5   EEE   18   28   27   23   23  16
6   FFF   68  152  184  135  111  86

> attributes(data)
$names
[1] "State" "Jan"   "Feb"   "Mar"   "Apr"   "May"   "Jun"

$class
[1] "data.frame"

$row.names
[1] 1 2 3 4 5 6


> x <- data[,2:ncol(data)]

> x

   Jan  Feb  Mar  Apr  May Jun
1    1    1    0    2    2   0
2 1298 1195 1212 1244 1158 845
3    0    0    0    1    2   1
4    5   11   17   15   10   9
5   18   28   27   23   23  16
6   68  152  184  135  111  86

> Total <- sapply(x,sum,na.rm=T)

> Total
 Jan  Feb  Mar  Apr  May  Jun
1390 1387 1440 1420 1306  957

I hope there may be alternative way.  Any help will be appreciated.

Thanks & Rg
Mohan L

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Re : Adding column sum to new row in data frame

Joshua Wiley-2
Dear Mohan,

Is this what you want?

rbind(data, c("Total",apply(data[,-1], 2, sum, na.rm=TRUE)))

If your State column is a factor, it will return a warning that NAs
were introduced (but the totals will still be at the bottom).  If
State is class character, then that row will have a name "Total" and
then the numbers.

Best regards,

Josh

On Wed, May 19, 2010 at 10:52 AM, Mohan L <[hidden email]> wrote:

> Dear All,
>
> I have data some thing like this:
>
>    State Jan Feb Mar Apr May Jun  AAA 1 1 0 2 2 0  BBB 1298 1195 1212 1244
> 1158 845  CCC 0 0 0 1 2 1  DDD 5 11 17 15 10 9  EEE 18 28 27 23 23 16  FFF
> 68 152 184 135 111 86
>
>
> I want to sum all the column(Jan, Feb, Mar ...) and have to merge the total
> at last row. like this:
>
> StateJanFebMarAprMayJunAAA110220BBB12981195121212441158845CCC 000121DDD51117
> 15109EEE182827232316FFF6815218413511186Total    1390 1387 1440 1420 1306
> 957
>
>
> I am doing some thing like this, but I don't know how to merge "Total" to
> "data"  Or I don't know there may be a alternative way.
>
>> data <- read.csv(file='ipsample.csv',sep=',' , header=TRUE)
>> data
>  State  Jan  Feb  Mar  Apr  May Jun
> 1   AAA    1    1    0    2    2   0
> 2   BBB 1298 1195 1212 1244 1158 845
> 3  CCC     0    0    0    1    2   1
> 4   DDD    5   11   17   15   10   9
> 5   EEE   18   28   27   23   23  16
> 6   FFF   68  152  184  135  111  86
>
>> attributes(data)
> $names
> [1] "State" "Jan"   "Feb"   "Mar"   "Apr"   "May"   "Jun"
>
> $class
> [1] "data.frame"
>
> $row.names
> [1] 1 2 3 4 5 6
>
>
>> x <- data[,2:ncol(data)]
>
>> x
>
>   Jan  Feb  Mar  Apr  May Jun
> 1    1    1    0    2    2   0
> 2 1298 1195 1212 1244 1158 845
> 3    0    0    0    1    2   1
> 4    5   11   17   15   10   9
> 5   18   28   27   23   23  16
> 6   68  152  184  135  111  86
>
>> Total <- sapply(x,sum,na.rm=T)
>
>> Total
>  Jan  Feb  Mar  Apr  May  Jun
> 1390 1387 1440 1420 1306  957
>
> I hope there may be alternative way.  Any help will be appreciated.
>
> Thanks & Rg
> Mohan L
>
>        [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



--
Joshua Wiley
Senior in Psychology
University of California, Riverside
http://www.joshuawiley.com/

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Re : Adding column sum to new row in data frame

mohan L
Hi Joshua,
>
>
>
> rbind(data, c("Total",apply(data[,-1], 2, sum, na.rm=TRUE)))
>

Yes. This what exactly I want. Thanks for your time.


>
> If your State column is a factor, it will return a warning that NAs
> were introduced (but the totals will still be at the bottom).
>

Yes.

> is.factor(data$State)
[1] TRUE

 State column is a factor, it will return warning message like this .
Warning message:
In `[<-.factor`(`*tmp*`, ri, value = "Total") :
  invalid factor level, NAs generated

I know that this is common factor vs character problem in R. I think there
may be a safe way to handle this:

> data <- read.csv(file='ipsample.csv',sep=',' , header=TRUE)

> a <- rbind(data, c("Total",apply(data[,-1], 2, sum, na.rm=TRUE)))
Warning message:
In `[<-.factor`(`*tmp*`, ri, value = "Total") :
  invalid factor level, NAs generated

> a
  State  Jan  Feb  Mar  Apr  May Jun
1   AAA    1    1    0    2    2   0
2   BBB 1298 1195 1212 1244 1158 845
3  CCC     0    0    0    1    2   1
4   DDD    5   11   17   15   10   9
5   EEE   18   28   27   23   23  16
6   FFF   68  152  184  135  111  86
7  <NA> 1390 1387 1440 1420 1306 957

How to safely avoid this warning massage?
Now I have <NA> instead of "Total" in last row State column. How to I
replace it as "Total"?

Thanks & Rg
Mohan L

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Re : Adding column sum to new row in data frame

Wu Gong
Add Total instead of NA:

rbind(data,cbind(State="Total", t(apply(data[,-1],2,sum))))


Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Re : Adding column sum to new row in data frame

mohan L
In reply to this post by mohan L
>
>
> How to safely avoid this warning massage?
> Now I have <NA> instead of "Total" in last row State column. How to I
> replace it as "Total"?
>
> Dear All,

The below link provides a very good explanation of  "Creating factor
variables" and way to avoid the warning message
http://www.ats.ucla.edu/stat/R/modules/factor_variables.htm

Now it works for me without warning message ( invalid factor level, NAs
generated ) according to the document above , I did  below to avoid the
warning:

> data <- read.csv(file='ipsample.csv',sep=',' , header=TRUE)
> data
  State  Jan  Feb  Mar  Apr  May Jun
1   AAA    1    1    0    2    2   0
2   BBB 1298 1195 1212 1244 1158 845
3  CCC     0    0    0    1    2   1
4   DDD    5   11   17   15   10   9
5   EEE   18   28   27   23   23  16
6   FFF   68  152  184  135  111  86


> a <- rbind(data, c("Total",apply(data[,-1], 2, sum, na.rm=TRUE)))
Warning message:
In `[<-.factor`(`*tmp*`, ri, value = "Total") :
  invalid factor level, NAs generated

> a
  State  Jan  Feb  Mar  Apr  May Jun
1   AAA    1    1    0    2    2   0
2   BBB 1298 1195 1212 1244 1158 845
3  CCC     0    0    0    1    2   1
4   DDD    5   11   17   15   10   9
5   EEE   18   28   27   23   23  16
6   FFF   68  152  184  135  111  86
7  <NA> 1390 1387 1440 1420 1306 957

We can see that instead of "Total", the label was <NA>. To do this
correctly, I have added the new level, "Total", to the factor column
data$State using the factor function with the levels argument. Then I can
finally add an element to the factor variable from the new level. here is
the steps

> levels(data$State)
[1] "AAA"  "BBB"  "CCC " "DDD"  "EEE"  "FFF"

> data$State <- factor(data$State,levels=c(levels(data$State),"Total"))

> data$State
[1] AAA  BBB  CCC  DDD  EEE  FFF
Levels: AAA BBB CCC  DDD EEE FFF Total

> levels(data$State)
[1] "AAA"   "BBB"   "CCC "  "DDD"   "EEE"   "FFF"   "Total"

> x <- rbind(data, c("Total",apply(data[,-1], 2, sum, na.rm=TRUE)))

Now the above works without warning.

> x
  State  Jan  Feb  Mar  Apr  May Jun
1   AAA    1    1    0    2    2   0
2   BBB 1298 1195 1212 1244 1158 845
3  CCC     0    0    0    1    2   1
4   DDD    5   11   17   15   10   9
5   EEE   18   28   27   23   23  16
6   FFF   68  152  184  135  111  86
7 Total 1390 1387 1440 1420 1306 957

I think I am doing right. If  I miss understood anything. Please guide me  I
am beginer to R.

Thanks & Rg
Mohan L

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Re : Adding column sum to new row in data frame

David Winsemius

On May 19, 2010, at 11:19 PM, Mohan L wrote:

>>
>>
>> How to safely avoid this warning massage?
>> Now I have <NA> instead of "Total" in last row State column. How to I
>> replace it as "Total"?
>>
>> Dear All,
>
> The below link provides a very good explanation of  "Creating factor
> variables" and way to avoid the warning message
> http://www.ats.ucla.edu/stat/R/modules/factor_variables.htm
>
> Now it works for me without warning message ( invalid factor level,  
> NAs
> generated ) according to the document above , I did  below to avoid  
> the
> warning:
>
>> data <- read.csv(file='ipsample.csv',sep=',' , header=TRUE)
>> data
>  State  Jan  Feb  Mar  Apr  May Jun
> 1   AAA    1    1    0    2    2   0
> 2   BBB 1298 1195 1212 1244 1158 845
> 3  CCC     0    0    0    1    2   1
> 4   DDD    5   11   17   15   10   9
> 5   EEE   18   28   27   23   23  16
> 6   FFF   68  152  184  135  111  86
>
>
>> a <- rbind(data, c("Total",apply(data[,-1], 2, sum, na.rm=TRUE)))
> Warning message:
> In `[<-.factor`(`*tmp*`, ri, value = "Total") :
>  invalid factor level, NAs generated
>
>> a
>  State  Jan  Feb  Mar  Apr  May Jun
> 1   AAA    1    1    0    2    2   0
> 2   BBB 1298 1195 1212 1244 1158 845
> 3  CCC     0    0    0    1    2   1
> 4   DDD    5   11   17   15   10   9
> 5   EEE   18   28   27   23   23  16
> 6   FFF   68  152  184  135  111  86
> 7  <NA> 1390 1387 1440 1420 1306 957
>
> We can see that instead of "Total", the label was <NA>. To do this
> correctly, I have added the new level, "Total", to the factor column
> data$State using the factor function with the levels argument. Then  
> I can
> finally add an element to the factor variable from the new level.  
> here is
> the steps
>
>> levels(data$State)
> [1] "AAA"  "BBB"  "CCC " "DDD"  "EEE"  "FFF"
>
>> data$State <- factor(data$State,levels=c(levels(data$State),"Total"))
>
>> data$State
> [1] AAA  BBB  CCC  DDD  EEE  FFF
> Levels: AAA BBB CCC  DDD EEE FFF Total
>
>> levels(data$State)
> [1] "AAA"   "BBB"   "CCC "  "DDD"   "EEE"   "FFF"   "Total"
>
>> x <- rbind(data, c("Total",apply(data[,-1], 2, sum, na.rm=TRUE)))
>
> Now the above works without warning.
>
>> x
>  State  Jan  Feb  Mar  Apr  May Jun
> 1   AAA    1    1    0    2    2   0
> 2   BBB 1298 1195 1212 1244 1158 845
> 3  CCC     0    0    0    1    2   1
> 4   DDD    5   11   17   15   10   9
> 5   EEE   18   28   27   23   23  16
> 6   FFF   68  152  184  135  111  86
> 7 Total 1390 1387 1440 1420 1306 957
>
> I think I am doing right. If  I miss understood anything. Please  
> guide me  I
> am beginer to R.

If you had instead used stringsAsFactors=FALSE with the read.table  
function, the "State" column would have been character rather than  
factor, and you would have avoided all those difficulties.

--
David Winsemius, MD
West Hartford, CT

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Re : Adding column sum to new row in data frame

Joshua Wiley-2
In reply to this post by mohan L
Dear Mohan,

First, I would like to modify my code slightly to:

data <- rbind(data,data.frame(State="Total",t(apply(data[,-1], 2, sum,
na.rm=TRUE))))

This actually will add a 7th level to your factor automatically.  The
reason I wanted to change from using c() to data.frame() is that if
one uses c(), all the columns are converted to character (this has to
do with different methods for rbind, see ?rbind particularly the
Details and Value section which describe the different methods for
rbind and what its behavior will be if it is using the default
method).  This may not be an issue, but it would hamper any subsequent
calculations you may wish to perform on your data.

Aside from the way I just showed, you have many options.

(1) Read the data in and do not convert the state names to factor in
the first place (David's suggestion). This is most likely the easiest.
 The only reason I can see not wanting to use this is if there were
several other columns you wanted treated as factors, in which case
look at option #3.

See ?read.csv for details.  Note that: "'read.csv' and 'read.csv2' are
identical to 'read.table' except for the defaults." so it is perfectly
legal to use the stringsAsFactors argument shown for read.table.

data <- read.csv(file='ipsample.csv', header=TRUE,
stringsAsFactors=FALSE) #what your read in code might look like

Even if you do this, I would still recommend against my original code
because it converts everything else to character too.

(2) Add a level to the factor for total (as you showed).  I do not see
any real problem with this, but it is a bit of a complicated solution.
 My updated code actually leads to similar results.

(3) Supposing the data is already read in as a factor and you would
like to change it, you can convert just that column to character.
Once the column is character, you can name that cell Total, or
whatever else you like.

data$State <- as.character(data$State)

(4) Wu Gong suggested this option.  The difficulty with this option is
that because cbind is like rbind, the default without any dataframe
arguments will wind up with everything being converted to character,
just like my original code.

data <- rbind(data,cbind(State="Total", t(apply(data[,-1],2,sum))))

Depending exactly what you are working with, different solutions may
be better or wrose and these are certainly not the only ways you could
handle it.

Best regards,

Josh

On Wed, May 19, 2010 at 8:19 PM, Mohan L <[hidden email]> wrote:

>
>>
>>
>> How to safely avoid this warning massage?
>> Now I have <NA> instead of "Total" in last row State column. How to I
>> replace it as "Total"?
>>
> Dear All,
>
> The below link provides a very good explanation of  "Creating factor
> variables" and way to avoid the warning message
> http://www.ats.ucla.edu/stat/R/modules/factor_variables.htm
>
> Now it works for me without warning message ( invalid factor level, NAs
> generated ) according to the document above , I did  below to avoid the
> warning:
>
>> data <- read.csv(file='ipsample.csv',sep=',' , header=TRUE)
>> data
>   State  Jan  Feb  Mar  Apr  May Jun
> 1   AAA    1    1    0    2    2   0
> 2   BBB 1298 1195 1212 1244 1158 845
> 3  CCC     0    0    0    1    2   1
> 4   DDD    5   11   17   15   10   9
> 5   EEE   18   28   27   23   23  16
> 6   FFF   68  152  184  135  111  86
>
>
>> a <- rbind(data, c("Total",apply(data[,-1], 2, sum, na.rm=TRUE)))
> Warning message:
> In `[<-.factor`(`*tmp*`, ri, value = "Total") :
>   invalid factor level, NAs generated
>
>> a
>   State  Jan  Feb  Mar  Apr  May Jun
> 1   AAA    1    1    0    2    2   0
> 2   BBB 1298 1195 1212 1244 1158 845
> 3  CCC     0    0    0    1    2   1
> 4   DDD    5   11   17   15   10   9
> 5   EEE   18   28   27   23   23  16
> 6   FFF   68  152  184  135  111  86
> 7  <NA> 1390 1387 1440 1420 1306 957
>
> We can see that instead of "Total", the label was <NA>. To do this
> correctly, I have added the new level, "Total", to the factor column
> data$State using the factor function with the levels argument. Then I can
> finally add an element to the factor variable from the new level. here is
> the steps
>
>> levels(data$State)
> [1] "AAA"  "BBB"  "CCC " "DDD"  "EEE"  "FFF"
>
>> data$State <- factor(data$State,levels=c(levels(data$State),"Total"))
>
>> data$State
> [1] AAA  BBB  CCC  DDD  EEE  FFF
> Levels: AAA BBB CCC  DDD EEE FFF Total
>
>> levels(data$State)
> [1] "AAA"   "BBB"   "CCC "  "DDD"   "EEE"   "FFF"   "Total"
>
>> x <- rbind(data, c("Total",apply(data[,-1], 2, sum, na.rm=TRUE)))
>
> Now the above works without warning.
>
>> x
>   State  Jan  Feb  Mar  Apr  May Jun
> 1   AAA    1    1    0    2    2   0
> 2   BBB 1298 1195 1212 1244 1158 845
> 3  CCC     0    0    0    1    2   1
> 4   DDD    5   11   17   15   10   9
> 5   EEE   18   28   27   23   23  16
> 6   FFF   68  152  184  135  111  86
> 7 Total 1390 1387 1440 1420 1306 957
>
> I think I am doing right. If  I miss understood anything. Please guide me  I
> am beginer to R.
>
> Thanks & Rg
> Mohan L
>
>
>
>



--
Joshua Wiley
Senior in Psychology
University of California, Riverside
http://www.joshuawiley.com/

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Re : Adding column sum to new row in data frame

mohan L
On Thu, May 20, 2010 at 10:31 AM, Joshua Wiley <[hidden email]>wrote:

> Dear Mohan,
>
> First, I would like to modify my code slightly to:
>
> data <- rbind(data,data.frame(State="Total",t(apply(data[,-1], 2, sum,
> na.rm=TRUE))))
>
> This actually will add a 7th level to your factor automatically.  The
> reason I wanted to change from using c() to data.frame() is that if
> one uses c(), all the columns are converted to character (this has to
> do with different methods for rbind, see ?rbind particularly the
> Details and Value section which describe the different methods for
> rbind and what its behavior will be if it is using the default
> method).  This may not be an issue, but it would hamper any subsequent
> calculations you may wish to perform on your data.
>

Thanks for your great help.


Mohan L

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Re : Adding column sum to new row in data frame

David Winsemius
In reply to this post by Joshua Wiley-2

On May 20, 2010, at 1:01 AM, Joshua Wiley wrote:

> Dear Mohan,
>
> First, I would like to modify my code slightly to:
>
> data <- rbind(data,data.frame(State="Total",t(apply(data[,-1], 2, sum,
> na.rm=TRUE))))
>
> This actually will add a 7th level to your factor automatically.  The
> reason I wanted to change from using c() to data.frame() is that if
> one uses c(), all the columns are converted to character (this has to
> do with different methods for rbind, see ?rbind particularly the
> Details and Value section which describe the different methods for
> rbind and what its behavior will be if it is using the default
> method).  This may not be an issue, but it would hamper any subsequent
> calculations you may wish to perform on your data.


Actually the coercion to a single element type occurs as soon as you  
use c()

 > c("a", 1:10)
  [1] "a"  "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10"

Whether rbind might also coerce would depend on the class of arguments  
it is supplied. If you have created an argument with c() that is all  
character because that is the "least common denominator", and then try  
to rbind it to a dataframe, the numeric columns will character-ized.
--

David Winsemius, MD
West Hartford, CT

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Loading...