Computing growth rate

 Hi,

I am trying to calculate growth rates (say, of sales, though they are to be computed for many variables) in a panel data set. The problem is that I have missing data for many firms for many years. To put it simply, I have created this short data frame (the original df is much bigger):

    df1 <- data.frame(co_code1 = rep(c(1100, 1200, 1300), each = 7),
                      fyear1   = rep(1990:1996, 3),
                      sales1   = rep(seq(1000, 1600, by = 100), 3))

    # this gives me
       co_code1 fyear1 sales1
    1      1100   1990   1000
    2      1100   1991   1100
    3      1100   1992   1200
    4      1100   1993   1300
    5      1100   1994   1400
    6      1100   1995   1500
    7      1100   1996   1600
    8      1200   1990   1000
    9      1200   1991   1100
    10     1200   1992   1200
    11     1200   1993   1300
    12     1200   1994   1400
    13     1200   1995   1500
    14     1200   1996   1600
    15     1300   1990   1000
    16     1300   1991   1100
    17     1300   1992   1200
    18     1300   1993   1300
    19     1300   1994   1400
    20     1300   1995   1500
    21     1300   1996   1600

    # I am now removing a couple of rows
    df1 <- df1[-c(5, 8), ]

    # the result is
       co_code1 fyear1 sales1
    1      1100   1990   1000
    2      1100   1991   1100
    3      1100   1992   1200
    4      1100   1993   1300
    6      1100   1995   1500
    7      1100   1996   1600
    9      1200   1991   1100
    10     1200   1992   1200
    11     1200   1993   1300
    12     1200   1994   1400
    13     1200   1995   1500
    14     1200   1996   1600
    15     1300   1990   1000
    16     1300   1991   1100
    17     1300   1992   1200
    18     1300   1993   1300
    19     1300   1994   1400
    20     1300   1995   1500
    21     1300   1996   1600

So 1994 for co_code1 1100 and 1990 for co_code1 1200 have been removed. If I try

    library(plyr)  # ddply() comes from the plyr package
    d <- ddply(df1, "co_code1", transform,
               growth = c(NA, exp(diff(log(sales1))) - 1) * 100)

the result for co_code1 1100 in 1995 is wrong (see below): the growth rate is computed against 1993, the previous available row, as if that were a one-year increment.

       co_code1 fyear1 sales1    growth
    1      1100   1990   1000        NA
    2      1100   1991   1100 10.000000
    3      1100   1992   1200  9.090909
    4      1100   1993   1300  8.333333
    5      1100   1995   1500 15.384615
    6      1100   1996   1600  6.666667
    7      1200   1991   1100        NA
    8      1200   1992   1200  9.090909
    9      1200   1993   1300  8.333333
    10     1200   1994   1400  7.692308
    11     1200   1995   1500  7.142857
    12     1200   1996   1600  6.666667
    13     1300   1990   1000        NA
    14     1300   1991   1100 10.000000
    15     1300   1992   1200  9.090909
    16     1300   1993   1300  8.333333
    17     1300   1994   1400  7.692308
    18     1300   1995   1500  7.142857
    19     1300   1996   1600  6.666667

I thought of applying the formula only when the increment of fyear1 within a co_code1 is exactly 1, like this:

    d <- ddply(df1,
               "co_code1",
               transform,
               if (diff(fyear1) == 1) {
                 growth = (exp(diff(log(df1$sales1))) - 1) * 100
               } else {
                 growth = NA
               })

But this doesn't work. I am getting the following warning (repeated a few times):

    In if (diff(fyear1) == 1) { :
      the condition has length > 1 and only the first element will be used

I have searched for a solution, but somehow couldn't find one. I hope that some kind soul will guide me here.

Regards,

Brijesh K Mishra
Indian Institute of Management, Indore
India
Re: Computing growth rate

 Hello,

That is a very common mistake. if() accepts only a single TRUE/FALSE; for a vectorized version you need ifelse() (see ?ifelse). Something like the following (untested):

    growth <- ifelse(diff(fyear1) == 1, (exp(diff(log(df1$sales1))) - 1) * 100, NA)

Hope this helps,

Rui Barradas
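A minimal sketch of the difference between if() and ifelse(), using only base R. The vector `fyear` and the names `gap`, `all_consecutive` and `flag` are made up for illustration and are not part of the original data. Note that in recent versions of R (4.2.0 and later) a condition of length greater than one in if() is an error rather than a warning.

```r
# Years for one company with 1994 missing (hypothetical illustration data)
fyear <- c(1990, 1991, 1992, 1993, 1995, 1996)
gap   <- diff(fyear)                 # one element fewer than fyear

# if() wants exactly one TRUE/FALSE, so a vector must be reduced first,
# for example with all() or any()
all_consecutive <- all(gap == 1)

# ifelse() tests every element and returns a vector of the same length
flag <- ifelse(gap == 1, "ok", NA)

print(gap)
print(all_consecutive)
print(flag)
```

ifelse() is the right tool here because a separate decision is needed for every pair of consecutive rows, not one decision for the whole company.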
Re: Computing growth rate

 In your case use ifelse() as explained by Rui. But it can be done more easily, since fyear1 and co_code1 are synchronized. Add a new column to df1 like this

    df1$growth <- c(NA,
                    ifelse(diff(df1$fyear1) == 1,
                           (exp(diff(log(df1$sales1))) - 1) * 100,
                           NA
                           )
                    )

and display df1. From your request I cannot determine if this is what you want.

regards,

Berend Hasselman
Re: Computing growth rate

 Dear Mr. Barradas,

Thanks a lot for pointing that out. I tried it in a few steps:

1. When I evaluated

    d <- ddply(df1, "co_code1", transform,
               growth <- ifelse(diff(fyear1) == 1, (exp(diff(log(df1$sales1))) - 1) * 100, NA))

I got the following, i.e., I was not getting the growth column automatically.

       co_code1 fyear1 sales1
    1      1100   1990   1000
    2      1100   1991   1100
    3      1100   1992   1200
    4      1100   1993   1300
    5      1100   1995   1500
    6      1100   1996   1600
    7      1200   1991   1100
    8      1200   1992   1200
    9      1200   1993   1300
    10     1200   1994   1400
    11     1200   1995   1500
    12     1200   1996   1600
    13     1300   1990   1000
    14     1300   1992   1200
    15     1300   1993   1300
    16     1300   1994   1400
    17     1300   1995   1500
    18     1300   1996   1600

2. When, just for the heck of it, I changed the assignment operator (<-) to '=' as done previously,

    d <- ddply(df1, "co_code1", transform,
               growth = ifelse(diff(fyear1) == 1, (exp(diff(log(df1$sales1))) - 1) * 100, NA))

it was no longer evaluated. The error was

    Error in data.frame(list(co_code1 = c(1100, 1100, 1100, 1100, 1100, 1100 :
      arguments imply differing number of rows: 6, 5

3. The following gives the desired result:

    df1$growth <- c(NA, ifelse(diff(df1$fyear1) == 1, (exp(diff(log(df1$sales1))) - 1) * 100, NA))

But now I am no longer restricting each iteration to 'co_code1': hypothetically, if the last row of one co_code1 is followed by the first row of another with a 'fyear1' difference of 1, growth will be computed across the two companies. Is there a better and more elegant way of doing it?

Thanks and regards,

Brijesh
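The "differing number of rows: 6, 5" error in step 2 can be reproduced without ddply(). This is a minimal base R sketch; the vectors `sales` and `fyear` are illustration data for a single company, not the poster's full data frame. diff() returns one element fewer than its input, so the result has to be padded with a leading NA before it fits as a column. In step 1, the likely reason no column appears is that `growth <- ...` is passed to transform() as an unnamed argument, so there is no column name to attach the values to.

```r
# One company with 1994 missing (hypothetical illustration data)
sales <- c(1000, 1100, 1200, 1300, 1500, 1600)
fyear <- c(1990, 1991, 1992, 1993, 1995, 1996)

n_rows <- length(sales)         # number of rows in the group
n_diff <- length(diff(sales))   # diff() drops one element

# Pad with NA for the first year, and blank out rates that span a gap
rate   <- (exp(diff(log(sales))) - 1) * 100
growth <- c(NA, ifelse(diff(fyear) == 1, rate, NA))

print(c(rows = n_rows, diffs = n_diff, padded = length(growth)))
print(round(growth, 3))
```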
Re: Computing growth rate

 Dear Mr Hasselman,

I missed your mail while I was typing my own reply to Mr. Barradas' suggestion. In fact, I implemented your suggestion even before reading it. But I have a concern, which I have noted there (though it is only hypothetical; such a scenario is very unlikely to occur). Is there a way to restrict such calculations co_code1-wise?

Many thanks,

Brijesh
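One way to address the hypothetical concern above while keeping the whole-data-frame approach is to require, in addition to a one-year step, that both rows belong to the same company. This is a sketch in base R that assumes the rows are sorted by co_code1 and then fyear1; the helper names `same_firm` and `next_year` are introduced here for illustration.

```r
# Rebuild the example data from the first post
df1 <- data.frame(co_code1 = rep(c(1100, 1200, 1300), each = 7),
                  fyear1   = rep(1990:1996, 3),
                  sales1   = rep(seq(1000, 1600, by = 100), 3))
df1 <- df1[-c(5, 8), ]

n <- nrow(df1)

# TRUE when a row and the row before it belong to the same company
same_firm <- df1$co_code1[-1] == df1$co_code1[-n]
# TRUE when a row is exactly one fiscal year after the row before it
next_year <- diff(df1$fyear1) == 1

# Growth is kept only when both conditions hold; otherwise NA
df1$growth <- c(NA, ifelse(same_firm & next_year,
                           (exp(diff(log(df1$sales1))) - 1) * 100,
                           NA))
print(df1)
```

With this guard, a company whose last year is 1995 followed by a company whose first year is 1996 would still get NA for that first row.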
Re: Computing growth rate

 This was ensured while using ddply()...
Re: Computing growth rate

 Like this?

    df2 <- ddply(df1, "co_code1", transform,
                 growth = c(NA, ifelse(diff(fyear1) == 1, (exp(diff(log(sales1))) - 1) * 100, NA))
                 )

But do also look at Petr Pikal's solution. Which of the two solutions you prefer depends on what you want in your special case.

Berend
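For readers without plyr installed, the same per-company calculation can be sketched in base R with split(), lapply() and rbind(). This is an assumed equivalent, not something posted in the thread; the function name `growth_by_firm` is hypothetical.

```r
# Rebuild the example data from the first post
df1 <- data.frame(co_code1 = rep(c(1100, 1200, 1300), each = 7),
                  fyear1   = rep(1990:1996, 3),
                  sales1   = rep(seq(1000, 1600, by = 100), 3))
df1 <- df1[-c(5, 8), ]

# Growth within one company's rows; NA for the first year and after a gap
growth_by_firm <- function(d) {
  d    <- d[order(d$fyear1), ]                      # make sure years are ascending
  ok   <- c(FALSE, diff(d$fyear1) == 1)             # previous row is the previous year
  rate <- c(NA, (exp(diff(log(d$sales1))) - 1) * 100)
  d$growth <- ifelse(ok, rate, NA)
  d
}

# Apply the function to each company and stack the pieces again
df2 <- do.call(rbind, lapply(split(df1, df1$co_code1), growth_by_firm))
rownames(df2) <- NULL
print(df2)
```

Because each company is processed separately, the first row of every company is NA by construction, and no comparison is ever made across companies.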
Re: Computing growth rate

 Berend -

Unless you need the change in sales year by year, you might consider looking at each company's sales over the years and using regression or another type of trend analysis to get an overall trend... Or, if not, simply divide diff(sales1) by diff(fyear1) for each company, so that at least you get the average over the missing years.

David

--
David K Stevens, P.E., Ph.D.
Professor and Head, Environmental Engineering
Civil and Environmental Engineering
Utah Water Research Laboratory
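Both suggestions above can be sketched in base R for a single company. The data frame `firm` is illustration data (company 1100 from the example, with 1994 missing). The log-linear fit at the end is an added assumption about how a trend might be expressed as a growth rate; it is not something specified in the thread.

```r
# One company with 1994 missing (illustration data)
firm <- data.frame(fyear1 = c(1990, 1991, 1992, 1993, 1995, 1996),
                   sales1 = c(1000, 1100, 1200, 1300, 1500, 1600))

# Average change in sales per year over each interval, gaps included
avg_change <- c(NA, diff(firm$sales1) / diff(firm$fyear1))

# Overall linear trend: slope is the fitted change in sales per year
fit   <- lm(sales1 ~ fyear1, data = firm)
slope <- unname(coef(fit)["fyear1"])

# Assumption: a log-linear fit gives an average percentage growth per year
fit_log    <- lm(log(sales1) ~ fyear1, data = firm)
trend_rate <- (exp(unname(coef(fit_log)["fyear1"])) - 1) * 100

print(avg_change)
print(slope)
print(trend_rate)
```

For the full panel, the same calculation would be applied to each co_code1 separately, for example by splitting the data frame by company first.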