Dividing rows in groups

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Dividing rows in groups

R help mailing list-2
Hi


I have two data frames as shown below (second one is obtained by aggregating rows of similar IDs in df1.). They both have similar number of columns but rows of df2 are lesser than rows of df1.


df1:
ID       A             B
1         1             2
1         0             3
2        5             NA
2         1             3
3         1             4
4         NA           NA
4         0             1
4         3             0
5         2             5
5         7           NA


df2:
ID       A          B
1         1          5
2         6          3
3         1          4
4         3          1
5        9          5

Now, to obtain weight of each value of df1, I want to divide each row of df1 by the row of df2 having similar ID. What I want is as below:

ID    A    B
1    1    0.4
1    0    0.6
2    0.83  NA
2    0.17  1
3    1     4
4    NA    NA
4    0     1
4    1     0
5    0.22  1
5    0.78  NA


Kindly guide me in this regard.

Thanks
Saba

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Dividing rows in groups

David Winsemius

> On Apr 23, 2016, at 9:46 PM, Saba Sehrish via R-help <[hidden email]> wrote:
>
> Hi
>
>
> I have two data frames as shown below (second one is obtained by aggregating rows of similar IDs in df1.). They both have similar number of columns but rows of df2 are lesser than rows of df1.
>
>
> df1:
> ID       A             B
> 1         1             2
> 1         0             3
> 2        5             NA
> 2         1             3
> 3         1             4
> 4         NA           NA
> 4         0             1
> 4         3             0
> 5         2             5
> 5         7           NA
>
>
> df2:
> ID       A          B
> 1         1          5
> 2         6          3
> 3         1          4
> 4         3          1
> 5        9          5
>
> Now, to obtain weight of each value of df1, I want to divide each row of df1 by the row of df2 having similar ID. What I want is as below:
>
> ID    A    B
> 1    1    0.4
> 1    0    0.6
> 2    0.83  NA
> 2    0.17  1
> 3    1     4
> 4    NA    NA
> 4    0     1
> 4    1     0
> 5    0.22  1
> 5    0.78  NA
>
>
> Kindly guide me in this regard.

Do a merge on "ID"  and then calculate.


--

David Winsemius
Alameda, CA, USA

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Dividing rows in groups

jholtman
In reply to this post by R help mailing list-2
You can use 'dplyr':

> library(dplyr)
> df1 <- read.table(text = "ID       A             B
+ 1         1             2
+ 1         0             3
+ 2        5             NA
+ 2         1             3
+ 3         1             4
+ 4         NA           NA
+ 4         0             1
+ 4         3             0
+ 5         2             5
+ 5         7           NA", header = TRUE)
> df2 <- df1 %>%
+         group_by(ID) %>%
+         mutate(new_A = A / sum(A, na.rm = TRUE)
+             , new_B = B / sum(B, na.rm = TRUE)
+             )
>
> df2
Source: local data frame [10 x 5]
Groups: ID [5]

      ID     A     B     new_A new_B
   (int) (int) (int)     (dbl) (dbl)
1      1     1     2 1.0000000   0.4
2      1     0     3 0.0000000   0.6
3      2     5    NA 0.8333333    NA
4      2     1     3 0.1666667   1.0
5      3     1     4 1.0000000   1.0
6      4    NA    NA        NA    NA
7      4     0     1 0.0000000   1.0
8      4     3     0 1.0000000   0.0
9      5     2     5 0.2222222   1.0
10     5     7    NA 0.7777778    NA



Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Sun, Apr 24, 2016 at 12:46 AM, Saba Sehrish via R-help <
[hidden email]> wrote:

> Hi
>
>
> I have two data frames as shown below (second one is obtained by
> aggregating rows of similar IDs in df1.). They both have similar number of
> columns but rows of df2 are lesser than rows of df1.
>
>
> df1:
> ID       A             B
> 1         1             2
> 1         0             3
> 2        5             NA
> 2         1             3
> 3         1             4
> 4         NA           NA
> 4         0             1
> 4         3             0
> 5         2             5
> 5         7           NA
>
>
> df2:
> ID       A          B
> 1         1          5
> 2         6          3
> 3         1          4
> 4         3          1
> 5        9          5
>
> Now, to obtain weight of each value of df1, I want to divide each row of
> df1 by the row of df2 having similar ID. What I want is as below:
>
> ID    A    B
> 1    1    0.4
> 1    0    0.6
> 2    0.83  NA
> 2    0.17  1
> 3    1     4
> 4    NA    NA
> 4    0     1
> 4    1     0
> 5    0.22  1
> 5    0.78  NA
>
>
> Kindly guide me in this regard.
>
> Thanks
> Saba
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Dividing rows in groups

jholtman
This will handle all the columns; it assumes the ones you want to start
with are in column 2 through the end:


> library(dplyr)
> df1 <- read.table(text = "ID       A             B
+ 1         1             2
+ 1         0             3
+ 2        5             NA
+ 2         1             3
+ 3         1             4
+ 4         NA           NA
+ 4         0             1
+ 4         3             0
+ 5         2             5
+ 5         7           NA", header = TRUE)
> df2 <- df1 %>%
+         group_by(ID) %>%
+         do({  # now process and indeterinant number of columns
+             result <- .  # get original input
+             for (i in 2:ncol(result)){
+                 result[[i]] <- result[[i]] / sum(result[[i]], na.rm =
TRUE)
+             }
+             result  # return value
+         })
>
> df2
Source: local data frame [10 x 3]
Groups: ID [5]

      ID         A     B
   (int)     (dbl) (dbl)
1      1 1.0000000   0.4
2      1 0.0000000   0.6
3      2 0.8333333    NA
4      2 0.1666667   1.0
5      3 1.0000000   1.0
6      4        NA    NA
7      4 0.0000000   1.0
8      4 1.0000000   0.0
9      5 0.2222222   1.0
10     5 0.7777778    NA



Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Sun, Apr 24, 2016 at 12:02 PM, Saba Sehrish <[hidden email]>
wrote:

> Hi Jim
>
> Thanks a lot. Its really helpful but actually I have around 10,000 columns
> & 8000 rows. I simply want to get the weight of each observation in a group
> (group is identified by ID). Therefore, I am looking for an easy way to
> divide each observation in a row with the sum of all values in its group.
>
> How can I apply this on such a huge data set?
>
> Regards
> Saba
>
>
>
> Sent from Yahoo Mail on Android
> <https://overview.mail.yahoo.com/mobile/?.src=Android>
>
> On Mon, 25 Apr, 2016 at 3:38 AM, jim holtman
> <[hidden email]> wrote:
> You can use 'dplyr':
>
> > library(dplyr)
> > df1 <- read.table(text = "ID       A             B
> + 1         1             2
> + 1         0             3
> + 2        5             NA
> + 2         1             3
> + 3         1             4
> + 4         NA           NA
> + 4         0             1
> + 4         3             0
> + 5         2             5
> + 5         7           NA", header = TRUE)
> > df2 <- df1 %>%
> +         group_by(ID) %>%
> +         mutate(new_A = A / sum(A, na.rm = TRUE)
> +             , new_B = B / sum(B, na.rm = TRUE)
> +             )
> >
> > df2
> Source: local data frame [10 x 5]
> Groups: ID [5]
>
>       ID     A     B     new_A new_B
>    (int) (int) (int)     (dbl) (dbl)
> 1      1     1     2 1.0000000   0.4
> 2      1     0     3 0.0000000   0.6
> 3      2     5    NA 0.8333333    NA
> 4      2     1     3 0.1666667   1.0
> 5      3     1     4 1.0000000   1.0
> 6      4    NA    NA        NA    NA
> 7      4     0     1 0.0000000   1.0
> 8      4     3     0 1.0000000   0.0
> 9      5     2     5 0.2222222   1.0
> 10     5     7    NA 0.7777778    NA
>
>
>
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.
>
> On Sun, Apr 24, 2016 at 12:46 AM, Saba Sehrish via R-help <
> [hidden email]> wrote:
>
>> Hi
>>
>>
>> I have two data frames as shown below (second one is obtained by
>> aggregating rows of similar IDs in df1.). They both have similar number of
>> columns but rows of df2 are lesser than rows of df1.
>>
>>
>> df1:
>> ID       A             B
>> 1         1             2
>> 1         0             3
>> 2        5             NA
>> 2         1             3
>> 3         1             4
>> 4         NA           NA
>> 4         0             1
>> 4         3             0
>> 5         2             5
>> 5         7           NA
>>
>>
>> df2:
>> ID       A          B
>> 1         1          5
>> 2         6          3
>> 3         1          4
>> 4         3          1
>> 5        9          5
>>
>> Now, to obtain weight of each value of df1, I want to divide each row of
>> df1 by the row of df2 having similar ID. What I want is as below:
>>
>> ID    A    B
>> 1    1    0.4
>> 1    0    0.6
>> 2    0.83  NA
>> 2    0.17  1
>> 3    1     4
>> 4    NA    NA
>> 4    0     1
>> 4    1     0
>> 5    0.22  1
>> 5    0.78  NA
>>
>>
>> Kindly guide me in this regard.
>>
>> Thanks
>> Saba
>>
>> ______________________________________________
>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.