# Conditional sum

4 messages
Open this post in threaded view
|

## Conditional sum

 Hi, I have a DF like this: DF = data.frame(read.table(textConnection("    A  B  C 1 b1  1999  0.25 2 c1  1999  0.25 3 d1  1999  0.25 4 a2  1999  0.25 5 c2  1999  0.25 6 d2  1999  0.25 7 a3  1999  0.25 8 b3  1999  0.25 9 d3  1999  0.25 10 a4  1999  0.25 11 b4  1999  0.25 12 c4  1999  0.25 13 b1  2001  0.5 14 a2  2001  0.5 15 b1  2004  0.33 16 c1  2004  0.33 17 a2  2004  0.33 18 c2  2004  0.33 19 a3  2004  0.33 20 b3  2004  0.33 21 d2  1980  0.4 22 a3  1980  0.4 23 b4  1981  0.4 24 c1  1981  0.4"),head=TRUE,stringsAsFactors=FALSE)) For each factor in A I want to sum the values of C for all years(Bn) prior to the current year(Bi): 1 b1  1999  0.25  0 2 c1  1999  0.25  0.4 3 d1  1999  0.25  0 4 a2  1999  0.25  0 5 c2  1999  0.25  0 6 d2  1999  0.25  0.4 7 a3  1999  0.25  0.4 8 b3  1999  0.25  0 9 d3  1999  0.25  0 10 a4 1999  0.25  0 11 b4 1999  0.25  0.4 12 c4 1999  0.25  0 13 b1 2001  0.50  0.25 14 a2 2001  0.50  0.25 15 b1 2004  0.33  0.75 16 c1 2004  0.33  0.65 17 a2 2004  0.33  0.75 18 c2 2004  0.33  0.25 19 a3 2004  0.33  0.65 20 b3 2004  0.33  0.25 21 d2 1980  0.40  0 22 a3 1980  0.40  0 23 b4 1981  0.40  0 24 c1 1981  0.40  0 If I do this - DF\$D<-ave(DF\$C,DF\$A,FUN = function(x) sum(x)) - it returns for each factor the sum of C for all years. How do I build the (B) condition in this function? Thanks very much!
Open this post in threaded view
|

## Re: Conditional sum

 mathijsdevaan wrote I have a DF like this: DF = data.frame(read.table(textConnection("    A  B  C 1 b1  1999  0.25 2 c1  1999  0.25 .. For each factor in A I want to sum the values of C for all years(Bn) prior to the current year(Bi): 1 b1  1999  0.25  0 2 c1  1999  0.25  0.4 3 d1  1999  0.25  0 In steps following the "thinking order". You could shorten this considerably. I slightly changed you column names to more speakable ones. Dieter DF = data.frame(read.table(textConnection("    group  year  C 1 b1  1999  0.25 2 c1  1999  0.25 3 d1  1999  0.25 4 a2  1999  0.25 5 c2  1999  0.25 6 d2  1999  0.25 7 a3  1999  0.25 8 b3  1999  0.25 9 d3  1999  0.25 10 a4  1999  0.25 11 b4  1999  0.25 12 c4  1999  0.25 13 b1  2001  0.5 14 a2  2001  0.5 15 b1  2004  0.33 16 c1  2004  0.33 17 a2  2004  0.33 18 c2  2004  0.33 19 a3  2004  0.33 20 b3  2004  0.33 21 d2  1980  0.4 22 a3  1980  0.4 23 b4  1981  0.4 24 c1  1981  0.4"),head=TRUE)) by(DF,DF\$group, FUN = function(x){   print(str(x)) }) # Looks like we should order... # Other solutions are possible, but ordering all first might (not tested) # be the most efficient way for large sets DF = DF[order(DF\$group,DF\$year),] # Let's try cumsum on each group by(DF,DF\$group, FUN = function(x){   cumsum(x\$C) }) # That's not exactly your defininition of "prior" # correct for first value by(DF,DF\$group, FUN = function(x){   cumsum(x\$C)-x\$C }) # Now the data are  in right order, make vector of result DF\$D = unlist(by(DF,DF\$group, FUN = function(x){   cumsum(x\$C) })) # You could sort by row names now to restore the old order