offset in gam and spatial scale of variables

Lucia Rueda
Hi,

We are analyzing the relationship between the abundance of groupers in line transects and some variables. We are using the quasipoisson distribution. Do we need to include the length of the transects as an offset if they all have the same length?

Also, can we include in the gam models variables that are measured at different spatial scales? We have done an analysis to see which variables work better for different sizes of buffers around the transect lines, and some variables work better at different scales. Can we run the gam model with several explanatory variables if they are measured at different spatial scales?

Thanks,

Lucia

Re: offset in gam and spatial scale of variables

Joris FA Meys
In reply to this post by Lucia Rueda
Could you specify the package you use? If it is mgcv, this one centers your
variables before applying the smooths. That's something to take into account
when comparing different models.

In any case, if the scales are too different, I try rescaling by either:
- expressing things in different units (meter versus kilometer, gram versus
kilogram)
- dividing by the standard deviation to get all variables approximately on
the same order of magnitude. This does change the interpretation of your
model though.
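
The second option can be sketched in a few lines. This is a minimal illustration in plain Python (not mgcv code); the function name scale_by_sd and the depth values are made up for the example.

```python
import statistics


def scale_by_sd(values):
    """Divide each value by the sample standard deviation of the variable."""
    sd = statistics.stdev(values)
    return [v / sd for v in values]


# Hypothetical covariate: depth in metres at six transects.
depth_m = [5.0, 8.0, 12.0, 15.0, 20.0, 24.0]
depth_scaled = scale_by_sd(depth_m)
```

After scaling, a one-unit change in the covariate means one standard deviation rather than one metre, which is the change in interpretation mentioned above.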

But somehow I have the feeling you're not talking about that kind of
difference in scales. Could you please explain a bit more in detail what it
is exactly you're trying to do? I also suspect some autocorrelation problem,
which would direct you towards a gamm method.

Cheers
Joris

--
Joris Meys
Statistical Consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

Coupure Links 653
B-9000 Gent

tel : +32 9 264 59 87
[hidden email]
-------------------------------
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: offset in gam and spatial scale of variables

Lucia Rueda
Hi Joris,

We're using mgcv.

We have data on abundance of groupers on line transects that have the same length. My coworker has selected a bunch of variables and he has calculated them in terms of total area in different sizes of buffers around the centroid of the transect. He has run gam models (quasipoisson, mgcv) for each explanatory variable at each size of buffer. Then he has selected the significant variables. Some variables explain a higher percentage of deviance at different sizes of buffers. And now he wants to build a gam model trying the different explanatory variables but using the values that correspond to the size of the buffer where they explain a higher deviance, so one variable might have the values of a smaller scale whereas another might correspond to a larger buffer size (I don't know if I made myself clear). I am wondering if this is correct.

Also I don't know if he should include an offset even though all the transects have the same length.

I'm in charge of looking at the spatial correlation once he builds the model. I don't know much about it but I was thinking of doing a Moran test, correlogram and variogram, and then, if there's spatial autocorrelation, doing gamm, sar or gee.

Thanks,

Lucia

Re: offset in gam and spatial scale of variables

Simon Wood-4
In reply to this post by Joris FA Meys
On Wednesday 19 May 2010 15:29, Joris Meys wrote:
> Could you specify the package you use? If it is mgcv, this one centers your
> variables before applying the smooths. That's something to take into
> account when comparing different models.
--- er, actually it only centres variables in this way for some smoothing
bases, for numerical stability purposes: but this is done in a way that is
user transparent and makes absolutely no difference to model interpretation
or comparison. Of course the smooths themselves are subject to `centering
constraints'  (but that's very different to centering the variables) ---  
these are just identifiability constraints --- all gam fitting packages have
to put some identifiability constraints on the smooths, and the centering
constraints used by `mgcv' and `gam' have the benefit of minimizing the
standard errors on the constrained smooths.
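
For readers unfamiliar with the term, the identifiability issue can be written out for a model with two smooths. This is a standard textbook formulation, not something taken from this thread; the symbols g, \beta_0, f_1, f_2 and c are notation assumed here.

```latex
\begin{align*}
  g\{E(y_i)\} &= \beta_0 + f_1(x_{1i}) + f_2(x_{2i}) \\
  % Shifting a smooth by any constant c and the intercept by -c gives the same fit:
  \beta_0 + f_1(x_{1i}) &= (\beta_0 - c) + \{f_1(x_{1i}) + c\} \\
  % A sum-to-zero (centering) constraint on each smooth removes this ambiguity:
  \sum_{i=1}^{n} f_j(x_{ji}) &= 0, \qquad j = 1, 2
\end{align*}
```

The constraint acts on the fitted function values, so the covariates themselves are left on their original scale.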

best,
Simon

--
> Simon Wood, Mathematical Sciences, University of Bath, Bath, BA2 7AY UK
> +44 1225 386603  www.maths.bath.ac.uk/~sw283

Re: offset in gam and spatial scale of variables

Simon Wood-4
In reply to this post by Lucia Rueda

> We are analizing the relationship between the abundance of groupers in line
> transects and some variables. We are using the quasipoisson distribution.
> Do we need to include the length of the transects as an offset if they all
> have the same length??
--- no, not just for fitting, I suppose; although I guess you may need some
care in interpreting the units of the fitted model predictions if you leave
it out.
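
The reason is that with a log link the model is log(mu_i) = log(L) + b0 + f(x_i), so a constant log(L) can only move the intercept. The sketch below checks this numerically in plain Python (it is not mgcv code): a Poisson log-linear model with a single linear term standing in for the smooth, fitted by Newton-Raphson. The data, the transect length of 50 and the function name fit_poisson are all made up for illustration. Quasi-Poisson gives the same coefficient estimates as Poisson, so the conclusion carries over.

```python
import math


def fit_poisson(x, y, offset, n_iter=50):
    """Fit log(mu_i) = offset_i + b0 + b1 * x_i by Newton-Raphson."""
    n = len(y)
    b0 = math.log(sum(y) / n) - sum(offset) / n  # start at the mean count
    b1 = 0.0
    for _ in range(n_iter):
        mu = [math.exp(o + b0 + b1 * xi) for o, xi in zip(offset, x)]
        # Score vector (gradient of the Poisson log-likelihood).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))
        # 2x2 Fisher information matrix.
        h00 = sum(mu)
        h01 = sum(mi * xi for mi, xi in zip(mu, x))
        h11 = sum(mi * xi * xi for mi, xi in zip(mu, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system by hand.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical data: one covariate and grouper counts on eight transects.
x = [0.1, 0.5, 0.9, 1.3, 1.7, 2.1, 2.5, 2.9]
y = [1, 2, 2, 4, 3, 6, 8, 9]
length = 50.0  # assumed common transect length

no_offset = fit_poisson(x, y, [0.0] * len(y))
with_offset = fit_poisson(x, y, [math.log(length)] * len(y))

print("without offset:", no_offset)
print("with offset:   ", with_offset)
```

The slope is the same in both fits and the two intercepts differ by log(50). Without the offset the fitted values are counts per transect; with it, exp(b0 + b1 * x) is a count per unit of transect length, which is the care in interpreting units mentioned above.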

> Also, can we include in the gam models variables that are measured at
> different spatial scales? We have done an analysis to see what variables
> are better for different sizes of buffers around the transect lines and
> some variables are better at different scales. Can we run the gam model
> with several explanatory variables if they are measured at different
> spatial scales?
--- Do you mean, for example, that sea surface temperature was measured in
10 km grid squares by satellite, whereas salinity was measured every quarter
nautical mile directly?

--- If so, I think that you can use such data, but you need a clear method
for converting what is measured about the covariate to a covariate value
associated with each response measurement. As an example, you might have
salinity measures that are widely scattered and do not coincide with the
locations of response measurements. One option is to smooth or interpolate
the salinity values, and use the resulting predicted salinities at each
response datum location as covariates. Of course, if you do this sort of
thing it's important that only such predicted salinities are used for
predicting from the model (i.e. not to switch to direct measurements of
salinity for prediction).
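
One way to picture the interpolation step is the sketch below, in plain Python. It uses inverse-distance weighting, which is only one simple interpolator chosen here for brevity and is not a method named in this thread; the station coordinates, salinity values and the function name idw_predict are hypothetical.

```python
import math


def idw_predict(obs_xy, obs_values, target_xy, power=2.0):
    """Inverse-distance-weighted prediction of a covariate at target points."""
    predictions = []
    for tx, ty in target_xy:
        weighted_sum = 0.0
        weight_total = 0.0
        exact = None
        for (ox, oy), value in zip(obs_xy, obs_values):
            d = math.hypot(tx - ox, ty - oy)
            if d == 0.0:
                exact = value  # target coincides with a measurement
                break
            w = 1.0 / d ** power
            weighted_sum += w * value
            weight_total += w
        predictions.append(exact if exact is not None else weighted_sum / weight_total)
    return predictions


# Hypothetical salinity stations (x, y in km) and measured values.
stations = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0), (5.0, 5.0)]
salinity = [37.2, 37.8, 36.9, 38.1]

# Hypothetical transect centroids where the response was recorded.
transects = [(1.0, 1.0), (3.0, 2.0), (4.5, 4.0)]
salinity_at_transects = idw_predict(stations, salinity, transects)
```

Whatever interpolator is used, the same function would then be applied to any new locations at prediction time, so that the model always sees interpolated rather than directly measured salinity.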

best,
Simon

Re: offset in gam and spatial scale of variables

Joris FA Meys
In reply to this post by Simon Wood-4
Thank you for the correction. I was thinking about the difference of using a
variable with a smoother, and comparing that to a model with that variable
without smoother. I should specify that I mostly use thin plate regression
splines.

If the spline itself does not deviate from linearity, you get a nice
straight line going through the point (mean(Data),0) if you look at the
marginal plots. Use the same variable unchanged but as a simple linear
effect in the model, and the same line will run through (0,0). At least,
that's what I noticed, and it is also the reason why I center my variables
first. The models are essentially the same; the shift is mainly in the
intercept. But the centering has become a bit of a reflex.
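
The point about the intercept can be checked with an ordinary straight-line fit. This is a minimal sketch in plain Python with made-up numbers, using least squares rather than a gam; the function name ols_line is hypothetical.

```python
def ols_line(x, y):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical covariate and response.
x = [2.0, 4.0, 5.0, 7.0, 9.0, 12.0]
y = [3.1, 4.2, 4.8, 6.5, 7.9, 10.2]

mean_x = sum(x) / len(x)
x_centered = [xi - mean_x for xi in x]

raw_intercept, raw_slope = ols_line(x, y)
centered_intercept, centered_slope = ols_line(x_centered, y)
```

Both fits have the same slope and the same fitted values. Only the intercept changes: after centering it equals the mean of the response, i.e. the fitted value at the mean of the covariate.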

Cheers
Joris

Re: offset in gam and spatial scale of variables

Joris FA Meys
In reply to this post by Lucia Rueda
On Wed, May 19, 2010 at 4:51 PM, Lucia Rueda <[hidden email]> wrote:

>
> Hi Joris,
>
> We're using mgcv.
>
> We have data on abundance of groupers on line transects that have the same
> length.

I only now realized groupers are actually fish :-). I should work on my
English skills...


> My coworker has selected a bunch of variables and he has calculated
> them in terms of total area in different sizes of buffers around the
> centroid of the transect. He has run gam models (quasipoisson, mgcv) for
> each explanatory variable at each size of buffer.


Here you lost me a bit. How should I picture those buffers? Is it, as Simon
said, some area? Would that mean you measure e.g. salinity along the
transect, and average the numbers using a window of a specific size? Or am I
seeing it wrong?

> Then he has selected the significant variables. Some variables explain a
> higher percentage of deviance at different sizes of buffers. And now he
> wants to build a gam model trying the different explanatory variables but
> using the values that correspond to the size of the buffer where they
> explain a higher deviance, so one variable might have the values of a
> smaller scale whereas other might correspond to a higher buffer size (I
> don't know if I made myself clear). I am wondering if this is correct.
>

It does not seem correct to me. Model building in these frameworks,
especially when using inference, should be driven by hypotheses, not by any
correlation in the data. Especially with smooths one has to be very careful.

Another issue is the correlation between environmental variables. They often
covary along transects, meaning that you can have confounding and even
aliasing in your dataset. This has to be checked and taken into account
_before_ building the models. I have the impression that his approach does
not take care of this.
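
A simple first check of this kind is a table of pairwise correlations between the candidate covariates. The sketch below is plain Python with made-up values; the variable names, the function names and the 0.7 cut-off are assumptions for illustration, not a rule from this thread.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    sab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    saa = sum((x - mean_a) ** 2 for x in a)
    sbb = sum((y - mean_b) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def flag_correlated(covariates, threshold=0.7):
    """Return (name1, name2, r) for every pair with |r| >= threshold."""
    names = sorted(covariates)
    flagged = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            r = pearson(covariates[first], covariates[second])
            if abs(r) >= threshold:
                flagged.append((first, second, r))
    return flagged


# Hypothetical covariates measured at six transects.
covariates = {
    "depth": [5, 8, 12, 15, 20, 24],
    "rock": [40, 35, 30, 22, 15, 10],
    "sand": [12, 30, 8, 25, 14, 20],
}
for first, second, r in flag_correlated(covariates):
    print(f"{first} vs {second}: r = {r:.2f}")
```

A flagged pair does not say which variable to drop; it only signals that their effects will be hard to separate in the model.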

Next, I believe that data should be used as raw as possible, so as not to
jeopardize the interpretation. If you use different buffer sizes, you can't
just say that variables X and Y contribute significantly to the explanation
of the variation, but only that variables X and Y contribute significantly
depending on the scale at which they are measured.

It also depends on whether your goal is purely predictive, or whether you
want to do inference. In case you want to conclude something about the
significance of the parameters, his approach seems invalid to me. How to
explain that the significance of a variable depends on the scale of
measurement? One assumes a continuous relation -unless working with
factors- so the scale shouldn't make much of a difference anyway. If you can
predict the number of groupers by the number of bald men in Hong Kong, by
all means, do so. But I wouldn't formulate a scientific conclusion based on
the significance of that model, if you get my drift.

> Also I don't know if he should include an offset in spite all the
> transects have the same length.
>
Do you mean an intercept? In that case I'd always include one, except in
very specific cases.

>
> I'm in charge of looking at the spatial correlation once he builds the
> model. I don't know much about it but I was thinking of doing a Moran test,
> correlogram and variogram and then if there's spatial autocorrelation doing
> gamm, sar or gee.
>
Gamm is a very powerful tool, but -if I understood Simon's book correctly-
you cannot trust the ANOVAs on the gam component of the gamm object when
using link functions. LR tests can give some information, but there is not
yet a solid statistical framework for formal hypothesis testing of those
models.

I also wonder why you would build a model without the correct
variance-covariance structure first, and then do the same with it.
Personally, I'd do it the other way around. Not that it will change much
about the predictions, but it definitely will change the inference.

In any case, all of these are my personal opinions on a problem I do not
understand fully. These are some general considerations; feel free to think
differently.

Re: offset in gam and spatial scale of variables

Lucia Rueda
Hi,

Thanks for the inputs. I talked to my coworker, who has been the one doing the analysis. Perhaps I wasn't making myself clear about the “differences in spatial scales”. Here is what he says:

"The truth is that the measuring scales (i.e. all area-related variables are measured in m2) and the spatial definition of the initial cartography are homogeneous among the extracted variables. But all variables (e.g. the sum of the total rocky bottom in the surrounding area) are computed for each of the different integration areas (buffers) (e.g. in an area of 40 square meters around the sample, in an area of 80 m2, …).
The question is then whether we can build a model that includes variables measured at different buffers (for example a model that includes 3 variables: 1. the amount of rocky bottom in an area of 80 m2; 2. the amount of sandy bottom in an area of 200 m2; and 3. the mean depth calculated in a surrounding area of 50 m2), considering that each variable may be expressing different ecological processes. I believe that if there is no ecological constraint on the interpretation of the variables (and their ecological effect on the species), including them in a model is correct, unless there is a mathematical constraint."
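
To make the buffer idea concrete, here is a minimal sketch in plain Python of how such a variable could be computed. Everything in it is assumed for illustration: the seabed map is a hypothetical list of classified grid cells, the buffer is taken to be a circle of the stated area around the transect centroid, and a cell counts if its centre falls inside the circle. The actual cartography and GIS procedure are not described in this thread.

```python
import math


def area_within_buffer(centroid, cells, cell_area, habitat, buffer_area):
    """Total area of one habitat class inside a circular buffer.

    cells: list of (x, y, habitat_class) tuples giving cell centres.
    buffer_area: area of the circular buffer, in the same units as cell_area.
    """
    radius = math.sqrt(buffer_area / math.pi)
    cx, cy = centroid
    total = 0.0
    for x, y, kind in cells:
        if kind == habitat and math.hypot(x - cx, y - cy) <= radius:
            total += cell_area
    return total


# Hypothetical 1 m x 1 m grid: rock where x < 0, sand elsewhere.
cells = [
    (x + 0.5, y + 0.5, "rock" if x < 0 else "sand")
    for x in range(-10, 10)
    for y in range(-10, 10)
]

centroid = (0.0, 0.0)
rock_80 = area_within_buffer(centroid, cells, 1.0, "rock", 80.0)
sand_200 = area_within_buffer(centroid, cells, 1.0, "sand", 200.0)
```

Each transect then gets one value per variable and buffer size, and the question above is whether values from different buffer sizes can be combined as covariates in one model.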

Also, about the spatial correlation: from what I've read so far, I thought that I had to build the model and then check whether there was spatial correlation in the residuals, since they are supposed to be i.i.d. If it turns out that they are spatially correlated, then I have to do something about it, like gamm, gee, sar, car, etc.
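
As an illustration of the residual check, the sketch below computes Moran's I in plain Python. It assumes binary neighbour weights (1 if two transects are within max_dist of each other, 0 otherwise), which is only one of several possible weighting schemes; the coordinates, residual values and the function name morans_i are made up. It returns the statistic only and does not carry out a significance test.

```python
import math


def morans_i(coords, values, max_dist):
    """Moran's I with binary weights: w_ij = 1 if 0 < dist(i, j) <= max_dist."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross = 0.0       # sum over i, j of w_ij * dev_i * dev_j
    weight_sum = 0.0  # sum over i, j of w_ij
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = math.hypot(coords[i][0] - coords[j][0], coords[i][1] - coords[j][1])
            if d <= max_dist:
                cross += dev[i] * dev[j]
                weight_sum += 1.0
    return (n / weight_sum) * (cross / sum(d * d for d in dev))


# Hypothetical transect centroids along a coastline and model residuals.
coords = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0)]
residuals = [-1.2, -0.8, -0.1, 0.3, 0.9, 1.1]

print(morans_i(coords, residuals, max_dist=1.0))
```

Values well above the no-autocorrelation expectation of -1/(n-1) suggest that neighbouring transects have similar residuals; values well below it suggest that neighbours tend to differ.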

Cheers,

Lucia

Re: offset in gam and spatial scale of variables

Joris FA Meys
On Thu, May 20, 2010 at 3:20 PM, Lucia Rueda <[hidden email]> wrote:

>
> Hi,
>
> Thanks for the inputs. I talked to my coworker, who has been the one doing
> the analysis. Perhaps I wasn't making myself clear about the “differences
> in spatial scales”. Here is what he says:
>
> "The truth is that the measuring scales (i.e. all area-related variables
> are measured in m2) and the spatial definition of the initial cartography
> are homogeneous among the extracted variables. But all variables (e.g. the
> sum of the total rocky bottom in the surrounding area) are computed for
> each of the different integration areas (buffers) (e.g. in an area of 40
> square meters around the sample, in an area of 80 m2, …).
> The question is then whether we can build a model that includes variables
> measured at different buffers (for example a model that includes 3
> variables: 1. the amount of rocky bottom in an area of 80 m2; 2. the
> amount of sandy bottom in an area of 200 m2; and 3. the mean depth
> calculated in a surrounding area of 50 m2), considering that each variable
> may be expressing different ecological processes. I believe that if there
> is no ecological constraint on the interpretation of the variables (and
> their ecological effect on the species), including them in a model is
> correct, unless there is a mathematical constraint."
>
If you look at it that way, you might indeed consider using them in
different buffers, but as you said, you should be able to interpret them in
an ecological way. I'd be surprised if depth and bottom have a different
effect scale, as they both are related to the territory of the animal.
Plus, you cannot conclude anything from the difference in deviance
explained. You can't say anything about the home range or so based on the
observation that more deviance is explained when looking at a scale of
200 m2, for example. So if you have good ecological reasons to include them,
you can, but if it's merely because on one scale they explain more of the
deviance, I still believe it is a very dangerous approach...


> Also, about the spatial correlation I thought from what I've read so far
> that I had to build the model and then check if there was spatial
> correlation in the residuals since they are supposed to be i.i.d. And if it
> turns out that they are then I have to do something about it like gamm,
> gee,
> sar, car, etc.
>
That's an approach that is often used, and in essence it is right.
Correlation between the raw data can be due to a shared correlation with
some other factor in space or time. But a pre-analysis of correlations and
autocorrelations can already tell you quite a lot. In any case, you always
have to check the residuals after the model building. My main point was that
accounting for the correlation will definitely influence the significance of
the parameters.

Anyway, good luck with it. I learnt pretty fast that as long as you can
explain what you're doing and why you're doing it, there's a big grey zone
between right and wrong. Otherwise it wouldn't be statistics, would it? ;-)

Cheers
Joris
