MA process in panels

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

MA process in panels

Philipp Grueber
Dear R users,

I have an unbalanced panel with an average of I=100 individuals and a total of T=1370 time intervals, i.e. T>>I. So far, I have been using the plm package.

I wish to estimate a FE model like:

res<-plm(x~c+v, data=pdata_frame, effect="twoways", model="within", na.action=na.omit)

…where c varies over i and t, and v represents an exogenous impact on x varying over time but not over i. I discover significant time effects comparing the above model with a plm(…,effect="individual", …)-model (using pftest).

MY PROBLEM:
I discover high levels of serial correlation in the errors. Including lags of x, coefficients are significant at least put to 30 lags. If I set my dataset to weekly observations (approx. 5 days = 1 week), the coefficients of the lags are significant at least up to the 15th lag (I didn't test a larger number of lags). The more lags I include, the less sections can be included in my sample (the panel is unbalanced, i.e. data is not available for the whole period for all individuals -- in fact, full data is available for only few individuals).  

Checking the acf() and pacf() of x, I find that for the large majority of individuals, x is an MA() process. That's plausible because it would explain the high levels of autocorrelation. However, I do not know a lot about MA models for panel data.  



Most books I have found so far only touch on MA processes in panels but do not discuss the estimation problems and implementation in further details. Therefore, I have the following questions:

1) Are there any issues specific to panel models with an MA component?

2) Is there an implementation for panel MA models in R?

3) If not, I have thought about the following solution. Does this approach provide correct/ reliable results?

_________________________________________

#Unfortunately, I was unable to create an appropriate panel dataset with an MA process in the residuals. Maybe someone has an idea where to find such data? Nevertheless you should be able to follow my subsequent thoughts:

# I should be able to get my (time- and sectionally) demeaned series as follows:

res1<-plm(x~c+v,data=pdata_frame, effect="twoways", model="within", na.action=na.omit))
dem_yt<-pmodel.response(res)
demXt<-model.matrix(res)

# Given the demeaned series, I need to set the first observation(s) in each cross-section to NA in order to avoid inter-sectional links in the lagged residuals (i.e. in the MA component).
#Note: Delete the first n observations per section for a MA(n) regression. For me, an MA(1) process should be fine (I hope):

n<-1
for ( i in unique(pdata_frame$i)){
        dem_yt[na.omit(pdata_frame$i)==i][1:n]<-rep(NA,n)
        demXt$c[na.omit(pdata_frame$i)==i][1:n]<-rep(NA,n)
        demXt$v[na.omit(pdata_frame$i)==i][1:n]<-rep(NA,n)
}

#I think I should now be able to use standard ARIMA methods such as

res2<-arima(x=dem_yt,xreg=demXt,order=c(0,0,1))

#Alternatively, I tried to obtain res2 using maxLik() from the maxLik package, but I am not sure about how to specify the log-likelihood function:

tslag <- function(x, d=l)
{
  x <- as.vector(x)
  n <- length(x)
  c(rep(NA,d),x)[1:n]
}

log_Lik<-function(param) {
b1<-param[1]
b2<-param[2]
b3<-param[3]
sigma<-param[4]
ll<- -0.5*N*log(2*pi) - N*log(sigma) - sum(0.5*(dem_yt-(b1*demXt[,1]+b2*demXt[,2]) + b3*tslag(dem_yt-(b1*demXt[,1]+b2*demXt[,2]),d=1))^2/sigma^2)
ll
}

res2<-maxLik(logLik=log_Lik,start=c(coef(res1),1,1),method="nr")
_______________________________________


Am I on the right track? Is there an easier way to do this? Did I miss something important?

Any help is appreciated, thanks a lot in advance!

Best,
Philipp


______________________
Philipp Grueber
EBS Universitaet fuer Wirtschaft und Recht
Wiesbaden, Germany



____________________________________
EBS Universitaet fuer Wirtschaft und Recht
FARE Department
Wiesbaden/ Germany
http://www.ebs.edu/index.php?id=finacc&L=0
Reply | Threaded
Open this post in threaded view
|

Re: MA process in panels

Philipp Grueber
Dear R users,

after some time I have picked up working on this dataset again. I have found a way which produces reasonable results but I am not sure whether it is truly the correct way to go.  

I am still looking for a way to estimate a panel ARMA(1,1) with cross-sectional fixed effects (later I wish to include time fixed effects as well -- but for the sake of simplicity, I drop these for now). Since I have a large panel (see above), I wish to regress demeaned time series. After trying out a lot of different approaches (mainly using the nlme-package and the plm-package, but also trying to specify a maxLik function) without figuring out a way to estimate the ARMA, I began thinking about the arima function -- and how it could help me to solve my issue.

Please find attached my thoughts on this with an example based on the Grunfeld dataset (unfortunately I did not find a larger, more appropriate panel dataset). If my approach is incorrect, I hope these thoughts nevertheless help some more advanced R users to find a proper way to estimate panel ARMAs.

Any comment is highly appreciated!

Thank you very much again,
Philipp Grueber


# Data Import
library(plm)
library(lmtest)
data(Grunfeld)

tslag <- function(x, d=l)
{
  x <- as.vector(x)
  n <- length(x)
  c(rep(NA,d),x)[1:n]
}

#In a final dataset, lagging should be done for each cross-section. For the sake of comparability of arima(...,xreg=Grunfeld$inv1,order=c(0,0,0)) with arima(...,xreg=0,order=c(1,0,0)), let's do this:
Grunfeld$inv1<-tslag(Grunfeld$inv,d=1)

# In order to avoid differing results due to heterogenous handling of NAs:
Grunfeld<-Grunfeld[-1,]

# Create dummy variables for comparison:
mat_i<-as.data.frame.matrix(table(cbind.data.frame(N=1:nrow(Grunfeld),T=Grunfeld$firm)))[,-1]

#Now, these two should be the same, but there seems to be a convergence problem (not enough data?).
arima_0<-arima(x=Grunfeld$inv,xreg=cbind(Grunfeld$value,Grunfeld$capital,mat_i),transform.pars=F,include.mean=T,order=c(1,0,0))
coef(arima_0)[1:4]

arima_1<-arima(x=Grunfeld$inv,xreg=cbind(Grunfeld$inv1,Grunfeld$value,Grunfeld$capital,mat_i),transform.pars=F,include.mean=T,order=c(0,0,0))
coef(arima_1)[1:4]

#I can show that they are not the same but closer using better data:
a<-arima.sim(n = 10001,model=list(ar=0.5,order=c(1,0,0)))
b<-a[-1]
c<-tslag(a,d=1)[-1]
coef(lm(b~c))
coef(arima(x=b,include.mean=T,order=c(1,0,0)))
coef(arima(x=b,xreg=c,include.mean=T,order=c(0,0,0)))

coef(lm(b~c-1))
coef(arima(x=b,include.mean=F,order=c(1,0,0)))
coef(arima(x=b,include.mean=F,xreg=c,order=c(0,0,0)))


#Probably, this does not matter as for an ARMA(1,0) model, I would prefer to use OLS and thus, the lm function anyway!

#Why do I want to know? Because only if these two (arima_0 and arima_1) are the same, I would be able to use the cross-sectionally lagged series of inv, inv_1 as the lagged endogenous variable, right?
Grunfeld$inv_1<-c()
for (i in unique(Grunfeld$firm)){
        Grunfeld$inv_1[Grunfeld$firm==i]<-tslag(Grunfeld$inv[Grunfeld$firm==i],d=1)
}
#note: for the sake of comparability, I will not do so in the following.

# Create demeaned series
y_i<-Grunfeld$inv-ave(Grunfeld$inv,index=Grunfeld$firm)
y1_i<-Grunfeld$inv1-ave(Grunfeld$inv1,index=Grunfeld$firm)
x1_i<-Grunfeld$value-ave(Grunfeld$value,index=Grunfeld$firm)
x2_i<-Grunfeld$capital-ave(Grunfeld$capital,index=Grunfeld$firm)

y_it1<-y_i-ave(y_i,index=Grunfeld$year)
y1_it1<-y1_i-ave(y1_i,index=Grunfeld$year)
x1_it1<-x1_i-ave(x1_i,index=Grunfeld$year)
x2_it1<-x2_i-ave(x2_i,index=Grunfeld$year)


#In order to simplify my calculation, I now wish to use demeaned data. I am able to show that these two are the same:
arima_a<-arima(x=Grunfeld$inv,xreg=cbind(Grunfeld$value,Grunfeld$capital,Grunfeld$inv1,mat_i),transform.pars=F,include.mean=T,order=c(0,0,0))
coef(arima_a)[1:4]
arima_b<-arima(x=y_i,xreg=cbind(y1_i,x1_i,x2_i),transform.pars=F,include.mean=F,order=c(0,0,0))
coef(arima_b)[1:3]

#Even though I believe I do not need a constant term (due to demeaning), here's the test:
arima_c<-arima(x=y_i,xreg=cbind(y1_i,x1_i,x2_i),transform.pars=F,include.mean=T,order=c(0,0,0))
coef(arima_c)[1:4]


#Related with the above question is the fact, that these coefficients are different from the following ones:
arima_d<-arima(x=Grunfeld$inv,xreg=cbind(Grunfeld$value,Grunfeld$capital,mat_i),transform.pars=F,include.mean=T,order=c(1,0,0))
coef(arima_d)[1:4]
arima_e<-arima(x=y_i,xreg=cbind(x1_i,x2_i),transform.pars=F,include.mean=F,order=c(1,0,0))
coef(arima_e)[1:3]
arima_f<-arima(x=y_i,xreg=cbind(x1_i,x2_i),transform.pars=F,include.mean=T,order=c(1,0,0))
coef(arima_f)[1:4]


#Finally, I wish to complete my ARMA by introducing the MA process.
arima_g<-arima(x=Grunfeld$inv,xreg=cbind(Grunfeld$value,Grunfeld$capital,Grunfeld$inv1,mat_i),transform.pars=F,include.mean=T,order=c(0,0,1))
coef(arima_g)[1:5]
arima_h<-arima(x=y_i,xreg=cbind(y1_i,x1_i,x2_i),transform.pars=F,include.mean=F,order=c(0,0,1))
coef(arima_h)[1:4]
arima_i<-arima(x=y_i,xreg=cbind(y1_i,x1_i,x2_i),transform.pars=F,include.mean=T,order=c(0,0,1))
coef(arima_i)[1:5]
#
arima_j<-arima(x=Grunfeld$inv,xreg=cbind(Grunfeld$value,Grunfeld$capital,mat_i),transform.pars=F,include.mean=T,order=c(1,0,1))
coef(arima_j)[1:5]
arima_k<-arima(x=y_i,xreg=cbind(x1_i,x2_i),transform.pars=F,include.mean=F,order=c(1,0,1))
coef(arima_k)[1:4]
arima_l<-arima(x=y_i,xreg=cbind(x1_i,x2_i),transform.pars=F,include.mean=T,order=c(1,0,1))
coef(arima_l)[1:5]


# The coefficients of models "g" through "l" are reasonably close. Am I right to assume that model "i" using arima(...,xreg=y1_i,order=c(0,0,1)) is preferred, and that even though the intercept should be 0, including a constant is safer? (note: plus I need to replace Grunfeld$inv1 by Grunfeld$inv_1...)  

 





____________________________________
EBS Universitaet fuer Wirtschaft und Recht
FARE Department
Wiesbaden/ Germany
http://www.ebs.edu/index.php?id=finacc&L=0