I fit a simple linear model y = bX to a data set today, and it produced 24 residuals (I have 24 data points, one for each year from 1984 to 2007). I would like to test the residuals of my model for time independence, and my supervisor recommended the Ljung-Box test. The Box.test function in R takes 4 arguments:
  x      a numeric vector or univariate time series.
  lag    the statistic will be based on lag autocorrelation coefficients.
  type   test to be performed: partial matching is used.
  fitdf  number of degrees of freedom to be subtracted if x is a series of residuals.

Unfortunately, I never took a statistics class where I learned the Ljung-Box test, and information about it online is hard to find. What does "lag" mean, and what value would you guys recommend I use for the test? Also, what does "fitdf" represent, and what would the value for that parameter be in my case? Finally, the value of x is a vector of my 24 residuals, correct?

Thank you all so much. I apologize for the basic nature of the question.

Steven

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
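For reference, here is what those arguments feed into, written with the standard textbook definitions (R's implementation may differ in details such as NA handling). With n observations, h = lag, and r_k the sample autocorrelation at lag k, the default type = "Box-Pierce" computes Q_BP and type = "Ljung-Box" computes Q_LB. Under the null hypothesis of no serial correlation, either Q is referred to an approximate chi-square distribution with lag - fitdf degrees of freedom:

```latex
\hat r_k = \frac{\sum_{t=1}^{n-k} (x_t - \bar x)(x_{t+k} - \bar x)}{\sum_{t=1}^{n} (x_t - \bar x)^2},
\qquad
Q_{\mathrm{BP}} = n \sum_{k=1}^{h} \hat r_k^{2},
\qquad
Q_{\mathrm{LB}} = n(n+2) \sum_{k=1}^{h} \frac{\hat r_k^{2}}{n-k},
\qquad
Q \;\overset{\text{approx.}}{\sim}\; \chi^2_{\,h - \mathrm{fitdf}}
```

Because (n+2)/(n-k) > 1 for every k, Q_LB is always at least as large as Q_BP; the difference matters most for short series such as 24 yearly values.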
Hello,
That's a statistics question, but it's also about using an R function.

The Ljung-Box test isn't supposed to be used in such a context, to test the residuals of an OLS fit y = bX + e. It is used to test the time independence of the original series or of the residuals of an ARMA(p, q) fit. In both cases you are right, 'x' is a series.

'lag' can be explained as follows: you have a time series and want to know whether the value observed today depends on what was observed in the past. A linear regression of "today" on "yesterday" could be

    X[t] = b[1]*X[t-1] + e[t],   e ~ Normal(0, sigma^2)

A linear regression on the two previous time units would be

    X[t] = b[1]*X[t-1] + b[2]*X[t-2] + e[t],   e ~ Normal(0, sigma^2)

and so on. This is a regression of the series on itself lagged by a certain number of time units: the present is regressed on the past. The function ar() fits this kind of model to a time series. In the first case the order is p = 1, in the second p = 2.

Now, in the first case, is there second-order serial correlation? Test the residuals with lag = 2 and fitdf = 1, the value of p. Third order? lag = 3, fitdf = p = 1, etc.

You are NOT fitting this type of model, so the Ljung-Box test is misused. Test the original series with the default parameters, lag = 1. If there is serial correlation, fit an AR (autoregressive) model with ar(); see the help page ?ar. And see a statistician with experience in time series. It's a world of its own; I haven't even mentioned seasonality, or almost anything else about time series.

Do ask someone near you.

Hope this helps,

Rui Barradas

On 26-06-2012 19:01, Steven Winter wrote:
> I fit a simple linear model y = bX to a data set today, and that produced 24 residuals (I have 24 data points, one for each year from 1984-2007). [...]
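Rui's recipe (fit X[t] = b[1]*X[t-1] + e[t], then test the residuals with lag = 2, fitdf = 1) can be sketched in plain Python, standard library only, so the arithmetic is visible. This is an illustration under stated assumptions, not R code: the series is simulated AR(1) data rather than anything from the thread, and the helper names acf, chi2_sf and ljung_box are hypothetical, written only for this example. ljung_box is intended to reproduce the statistic and p-value that Box.test(x, lag, type = "Ljung-Box", fitdf) reports.

```python
import math
import random


def acf(x, max_lag):
    """Sample autocorrelations r_1..r_max_lag (mean-centred, denominator n)."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    c0 = sum(v * v for v in d)
    return [sum(d[t] * d[t + k] for t in range(n - k)) / c0
            for k in range(1, max_lag + 1)]


def chi2_sf(q, df):
    """Upper-tail chi-square probability P(X > q), using the series for the
    regularized lower incomplete gamma function P(df/2, q/2)."""
    if q <= 0:
        return 1.0
    a, x = df / 2.0, q / 2.0
    term = total = 1.0 / a
    n = 0
    while term > total * 1e-15 and n < 10000:
        n += 1
        term *= x / (a + n)
        total += term
    lower = total * math.exp(-x + a * math.log(x) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def ljung_box(x, lag=1, fitdf=0):
    """Ljung-Box Q over `lag` autocorrelations, compared with chi-square(lag - fitdf).
    Raising an error when lag <= fitdf is a choice of this sketch."""
    df = lag - fitdf
    if df < 1:
        raise ValueError("lag must be greater than fitdf")
    n = len(x)
    r = acf(x, lag)
    q = n * (n + 2) * sum(rk * rk / (n - k) for k, rk in enumerate(r, start=1))
    return q, df, chi2_sf(q, df)


# Simulated (not real) AR(1) series: X[t] = 0.6 * X[t-1] + e[t]
rng = random.Random(1)
x = [0.0]
for _ in range(199):
    x.append(0.6 * x[-1] + rng.gauss(0.0, 1.0))

# The raw series: lag = 1, fitdf = 0 (nothing has been fitted yet)
q_raw, df_raw, p_raw = ljung_box(x, lag=1)

# Fit X[t] = b[1] * X[t-1] + e[t] by least squares (order p = 1, no intercept)
b1 = sum(x[t] * x[t - 1] for t in range(1, len(x))) / sum(v * v for v in x[:-1])
resid = [x[t] - b1 * x[t - 1] for t in range(1, len(x))]

# Any second-order serial correlation left? lag = 2, fitdf = p = 1
q_res, df_res, p_res = ljung_box(resid, lag=2, fitdf=1)

print(f"raw series: Q = {q_raw:.2f}, df = {df_raw}, p = {p_raw:.3g}")
print(f"AR(1) residuals: b1 = {b1:.2f}, Q = {q_res:.2f}, df = {df_res}, p = {p_res:.3g}")
```

In R itself the rough equivalent would be fit <- arima(x, order = c(1, 0, 0)) followed by Box.test(residuals(fit), lag = 2, type = "Ljung-Box", fitdf = 1). Note that arima() also estimates a mean by default, unlike the no-intercept fit in the sketch.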
Hello,
No, the Ljung-Box test isn't inappropriate in that case. First you detrend the series and then test the residuals for serial independence; it's even common practice to do so. I would use the default values for lag and fitdf, but use type = "Ljung": the Box-Pierce test is nowadays seldom used in practice, if at all. It does have great historical and pedagogical interest; the Ljung-Box statistic builds on it and corrects the bias in its variance estimation.

The parameter fitdf is relevant only if you test the residuals of a fitted ARMA(p, q) model, which isn't the case here, so keep it equal to zero. In general lag must be chosen such that lag > fitdf; with fitdf = 0, lag = 1 will do.

Oh, and please, it's Rui, not Mr.

Rui Barradas

On 27-06-2012 16:55, Steven Winter wrote:
> Dear Mr. Barradas,
>
> Thank you for your help. Let's say I have the yearly standard deviation
> of temperatures over New York City for the past 24 years. So, there are
> 24 data points. I would like to put a linear/quadratic/some kind of
> model on top of the data to show that there might be a trend in the data
> over time. But to do so, I have to test the time independence of the
> residuals. Would you say the Ljung-Box test is inappropriate in this
> case? If so, what would be my values for "lag" and "fitdf" that I plug
> into the Box.test function in R?
>
> Thank you,
> Steven
>
> ------------------------------------------------------------------------
> From: Rui Barradas <[hidden email]>
> To: Steven Winter <[hidden email]>
> Cc: [hidden email]
> Sent: Tuesday, June 26, 2012 3:13 PM
> Subject: Re: [R] Ljung-Box test (Box.test)
>
> [...]
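The detrend-then-test approach for Steven's setting can be sketched the same way, again in plain Python with hypothetical helper names (acf, chi2_sf, box_test). Since the real data weren't posted, simulated numbers stand in for the 24 yearly values (1984 to 2007), and the trend is assumed to be a straight line with intercept fitted by ordinary least squares. Both statistics are computed with lag = 1 and fitdf = 0, as suggested above.

```python
import math
import random


def acf(x, max_lag):
    """Sample autocorrelations r_1..r_max_lag (mean-centred, denominator n)."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    c0 = sum(v * v for v in d)
    return [sum(d[t] * d[t + k] for t in range(n - k)) / c0
            for k in range(1, max_lag + 1)]


def chi2_sf(q, df):
    """Upper-tail chi-square probability via the regularized incomplete gamma series."""
    if q <= 0:
        return 1.0
    a, x = df / 2.0, q / 2.0
    term = total = 1.0 / a
    n = 0
    while term > total * 1e-15 and n < 10000:
        n += 1
        term *= x / (a + n)
        total += term
    lower = total * math.exp(-x + a * math.log(x) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def box_test(x, lag=1, fitdf=0, kind="Ljung-Box"):
    """Return (Q, df, p) for kind "Ljung-Box" or "Box-Pierce"."""
    df = lag - fitdf
    if df < 1:
        raise ValueError("lag must be greater than fitdf")
    n = len(x)
    r = acf(x, lag)
    if kind == "Box-Pierce":
        q = n * sum(rk * rk for rk in r)
    else:
        q = n * (n + 2) * sum(rk * rk / (n - k) for k, rk in enumerate(r, start=1))
    return q, df, chi2_sf(q, df)


# Simulated stand-in for 24 yearly values, 1984-2007 (NOT real temperature data)
rng = random.Random(2012)
years = list(range(1984, 2008))
y = [3.0 + 0.05 * (yr - 1984) + rng.gauss(0.0, 0.4) for yr in years]

# Detrend: ordinary least squares fit of y = a + b * year
n = len(y)
yr_bar, y_bar = sum(years) / n, sum(y) / n
b = (sum((yr - yr_bar) * (yi - y_bar) for yr, yi in zip(years, y))
     / sum((yr - yr_bar) ** 2 for yr in years))
a = y_bar - b * yr_bar
resid = [yi - (a + b * yr) for yr, yi in zip(years, y)]

# Test the residuals: lag = 1, fitdf = 0 (no ARMA terms were fitted)
q_lb, df, p_lb = box_test(resid, lag=1, fitdf=0, kind="Ljung-Box")
q_bp, _, p_bp = box_test(resid, lag=1, fitdf=0, kind="Box-Pierce")

print(f"trend slope b = {b:.3f} per year")
print(f"Ljung-Box:  Q = {q_lb:.3f}, df = {df}, p = {p_lb:.3f}")
print(f"Box-Pierce: Q = {q_bp:.3f}, df = {df}, p = {p_bp:.3f}")
```

In R this would be something along the lines of Box.test(residuals(lm(y ~ year)), lag = 1, type = "Ljung"), where "Ljung" is partially matched to "Ljung-Box". With lag = 1 the two statistics differ only by the factor (n + 2)/(n - 1) = 26/23 for n = 24, so the Ljung-Box p-value is always the smaller of the two.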
