Conditional sum

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Conditional sum

mdvaan
Hi,

I have a DF like this:

DF = data.frame(read.table(textConnection("    A  B  C
1 b1  1999  0.25
2 c1  1999  0.25
3 d1  1999  0.25
4 a2  1999  0.25
5 c2  1999  0.25
6 d2  1999  0.25
7 a3  1999  0.25
8 b3  1999  0.25
9 d3  1999  0.25
10 a4  1999  0.25
11 b4  1999  0.25
12 c4  1999  0.25
13 b1  2001  0.5
14 a2  2001  0.5
15 b1  2004  0.33
16 c1  2004  0.33
17 a2  2004  0.33
18 c2  2004  0.33
19 a3  2004  0.33
20 b3  2004  0.33
21 d2  1980  0.4
22 a3  1980  0.4
23 b4  1981  0.4
24 c1  1981  0.4"),head=TRUE,stringsAsFactors=FALSE))

For each factor in A I want to sum the values of C for all years(Bn) prior to the current year(Bi):

1 b1  1999  0.25  0
2 c1  1999  0.25  0.4
3 d1  1999  0.25  0
4 a2  1999  0.25  0
5 c2  1999  0.25  0
6 d2  1999  0.25  0.4
7 a3  1999  0.25  0.4
8 b3  1999  0.25  0
9 d3  1999  0.25  0
10 a4 1999  0.25  0
11 b4 1999  0.25  0.4
12 c4 1999  0.25  0
13 b1 2001  0.50  0.25
14 a2 2001  0.50  0.25
15 b1 2004  0.33  0.75
16 c1 2004  0.33  0.65
17 a2 2004  0.33  0.75
18 c2 2004  0.33  0.25
19 a3 2004  0.33  0.65
20 b3 2004  0.33  0.25
21 d2 1980  0.40  0
22 a3 1980  0.40  0
23 b4 1981  0.40  0
24 c1 1981  0.40  0

If I do this - DF$D<-ave(DF$C,DF$A,FUN = function(x) sum(x)) - it returns for each factor the sum of C for all years. How do I build the (B) condition in this function? Thanks very much!
Reply | Threaded
Open this post in threaded view
|

Re: Conditional sum

Dieter Menne
mathijsdevaan wrote
I have a DF like this:

DF = data.frame(read.table(textConnection("    A  B  C
1 b1  1999  0.25
2 c1  1999  0.25
..
For each factor in A I want to sum the values of C for all years(Bn) prior to the current year(Bi):

1 b1  1999  0.25  0
2 c1  1999  0.25  0.4
3 d1  1999  0.25  0
In steps following the "thinking order". You could shorten this considerably. I slightly changed you column names to more speakable ones.

Dieter


DF = data.frame(read.table(textConnection("    group  year  C
1 b1  1999  0.25
2 c1  1999  0.25
3 d1  1999  0.25
4 a2  1999  0.25
5 c2  1999  0.25
6 d2  1999  0.25
7 a3  1999  0.25
8 b3  1999  0.25
9 d3  1999  0.25
10 a4  1999  0.25
11 b4  1999  0.25
12 c4  1999  0.25
13 b1  2001  0.5
14 a2  2001  0.5
15 b1  2004  0.33
16 c1  2004  0.33
17 a2  2004  0.33
18 c2  2004  0.33
19 a3  2004  0.33
20 b3  2004  0.33
21 d2  1980  0.4
22 a3  1980  0.4
23 b4  1981  0.4
24 c1  1981  0.4"),head=TRUE))

by(DF,DF$group, FUN = function(x){
  print(str(x))
})
# Looks like we should order...
# Other solutions are possible, but ordering all first might (not tested)
# be the most efficient way for large sets
DF = DF[order(DF$group,DF$year),]
# Let's try cumsum on each group
by(DF,DF$group, FUN = function(x){
  cumsum(x$C)
})
# That's not exactly your defininition of "prior"
# correct for first value
by(DF,DF$group, FUN = function(x){
  cumsum(x$C)-x$C
})
# Now the data are  in right order, make vector of result
DF$D = unlist(by(DF,DF$group, FUN = function(x){
  cumsum(x$C)
}))
# You could sort by row names now to restore the old order
Reply | Threaded
Open this post in threaded view
|

Re: Conditional sum

mdvaan
Dieter Menne wrote
In steps following the "thinking order". You could shorten this considerably. I slightly changed you column names to more speakable ones.

Dieter


DF = data.frame(read.table(textConnection("    group  year  C
1 b1  1999  0.25
2 c1  1999  0.25
3 d1  1999  0.25
4 a2  1999  0.25
5 c2  1999  0.25
6 d2  1999  0.25
7 a3  1999  0.25
8 b3  1999  0.25
9 d3  1999  0.25
10 a4  1999  0.25
11 b4  1999  0.25
12 c4  1999  0.25
13 b1  2001  0.5
14 a2  2001  0.5
15 b1  2004  0.33
16 c1  2004  0.33
17 a2  2004  0.33
18 c2  2004  0.33
19 a3  2004  0.33
20 b3  2004  0.33
21 d2  1980  0.4
22 a3  1980  0.4
23 b4  1981  0.4
24 c1  1981  0.4"),head=TRUE))

by(DF,DF$group, FUN = function(x){
  print(str(x))
})
# Looks like we should order...
# Other solutions are possible, but ordering all first might (not tested)
# be the most efficient way for large sets
DF = DF[order(DF$group,DF$year),]
# Let's try cumsum on each group
by(DF,DF$group, FUN = function(x){
  cumsum(x$C)
})
# That's not exactly your defininition of "prior"
# correct for first value
by(DF,DF$group, FUN = function(x){
  cumsum(x$C)-x$C
})
# Now the data are  in right order, make vector of result
DF$D = unlist(by(DF,DF$group, FUN = function(x){
  cumsum(x$C)
}))
# You could sort by row names now to restore the old order
Thanks for the quick response, but it doesn't do the trick. There are two problems:
1. The ith value of the newly created variable DF$D also includes the ith value of DF$C (this problem is easily solved by DF$D = DF$D-DF$C.)
2. If group i in DF$group appears more than once in year t, the value of the second observation of that group exceeds (includes) the value of the first observation. Example (group b1 and a2 in 2001 are duplicated):

DF = data.frame(read.table(textConnection("    group  year  C
1 b1  1999  0.25
2 c1  1999  0.25
3 d1  1999  0.25
4 a2  1999  0.25
5 c2  1999  0.25
6 d2  1999  0.25
7 a3  1999  0.25
8 b3  1999  0.25
9 d3  1999  0.25
10 a4  1999  0.25
11 b4  1999  0.25
12 c4  1999  0.25
13 b1  2001  0.5
14 a2  2001  0.5
15 b1  2004  0.33
16 c1  2004  0.33
17 a2  2004  0.33
18 c2  2004  0.33
19 a3  2004  0.33
20 b3  2004  0.33
21 d2  1980  0.4
22 a3  1980  0.4
23 b4  1981  0.4
24 c1  1981  0.4
25 b1  2001  0.5
26 a2  2001  0.5"),head=TRUE))

by(DF,DF$group, FUN = function(x){print(str(x))})

DF = DF[order(DF$group,DF$year),]

by(DF,DF$group, FUN = function(x){cumsum(x$C)})

by(DF,DF$group, FUN = function(x){cumsum(x$C)-x$C})

DF$D = unlist(by(DF,DF$group, FUN = function(x){cumsum(x$C)}))

DF$D = DF$D-DF$C
Reply | Threaded
Open this post in threaded view
|

Re: Conditional sum

mdvaan
I am still struggling (I'm an R novice). Basically I just want to sum the values per group if the year condition is met. I have the feeling that using a loop would work, but I am not really familiar with loops. Something like this?

for(i in 1:length(DF$C))
        {
        DF<-which(DF$year<DF[i,"year"])
        DF$D<-ave(DF$C,DF$group,FUN = function(x) sum(x))
        }

It doesn't work and probably looks awful, so can someone point me in the right direction? Thanks!

M

<quote author="mathijsdevaan">
Dieter Menne wrote
In steps following the "thinking order". You could shorten this considerably. I slightly changed you column names to more speakable ones.

Dieter


DF = data.frame(read.table(textConnection("    group  year  C
1 b1  1999  0.25
2 c1  1999  0.25
3 d1  1999  0.25
4 a2  1999  0.25
5 c2  1999  0.25
6 d2  1999  0.25
7 a3  1999  0.25
8 b3  1999  0.25
9 d3  1999  0.25
10 a4  1999  0.25
11 b4  1999  0.25
12 c4  1999  0.25
13 b1  2001  0.5
14 a2  2001  0.5
15 b1  2004  0.33
16 c1  2004  0.33
17 a2  2004  0.33
18 c2  2004  0.33
19 a3  2004  0.33
20 b3  2004  0.33
21 d2  1980  0.4
22 a3  1980  0.4
23 b4  1981  0.4
24 c1  1981  0.4"),head=TRUE))

by(DF,DF$group, FUN = function(x){
  print(str(x))
})
# Looks like we should order...
# Other solutions are possible, but ordering all first might (not tested)
# be the most efficient way for large sets
DF = DF[order(DF$group,DF$year),]
# Let's try cumsum on each group
by(DF,DF$group, FUN = function(x){
  cumsum(x$C)
})
# That's not exactly your defininition of "prior"
# correct for first value
by(DF,DF$group, FUN = function(x){
  cumsum(x$C)-x$C
})
# Now the data are  in right order, make vector of result
DF$D = unlist(by(DF,DF$group, FUN = function(x){
  cumsum(x$C)
}))
# You could sort by row names now to restore the old order
Thanks for the quick response, but it doesn't do the trick. There are two problems:
1. The ith value of the newly created variable DF$D also includes the ith value of DF$C (this problem is easily solved by DF$D = DF$D-DF$C.)
2. If group i in DF$group appears more than once in year t, the value of the second observation of that group exceeds (includes) the value of the first observation. Example (group b1 and a2 in 2001 are duplicated):

DF = data.frame(read.table(textConnection("    group  year  C
1 b1  1999  0.25
2 c1  1999  0.25
3 d1  1999  0.25
4 a2  1999  0.25
5 c2  1999  0.25
6 d2  1999  0.25
7 a3  1999  0.25
8 b3  1999  0.25
9 d3  1999  0.25
10 a4  1999  0.25
11 b4  1999  0.25
12 c4  1999  0.25
13 b1  2001  0.5
14 a2  2001  0.5
15 b1  2004  0.33
16 c1  2004  0.33
17 a2  2004  0.33
18 c2  2004  0.33
19 a3  2004  0.33
20 b3  2004  0.33
21 d2  1980  0.4
22 a3  1980  0.4
23 b4  1981  0.4
24 c1  1981  0.4
25 b1  2001  0.5
26 a2  2001  0.5"),head=TRUE))

by(DF,DF$group, FUN = function(x){print(str(x))})

DF = DF[order(DF$group,DF$year),]

by(DF,DF$group, FUN = function(x){cumsum(x$C)})

by(DF,DF$group, FUN = function(x){cumsum(x$C)-x$C})

DF$D = unlist(by(DF,DF$group, FUN = function(x){cumsum(x$C)}))

DF$D = DF$D-DF$C