glm: offset

John Sorkin
R 2.6.0
Windows XP

A question about running a generalized linear model.

I am running a glm with a Poisson distribution and a log link:
   family=poisson(link = "log")
and an offset.
I would like to know if I should express the offset as the log of the offset value, i.e.
offset=log(NumUniqPt)
or as:
offset=NumUniqPt

I suspect I need to use the log, but I can't find any discussion of this in MASS 1994 or on the man page for glm.
Thanks
John


John Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: glm: offset

Simon Blomberg-4
Yes, use the log. I've had the same problem in the past, too. Try it on
a toy example to confirm it for yourself.
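
One way such a toy example might look is sketched below. It is only an illustration under stated assumptions: the data are simulated, the true coefficients in `beta` are invented, and `NumUniqPt` is drawn at random as a stand-in for the real exposure variable. Counts are generated with mean `NumUniqPt * exp(linear predictor)`, and the model is then fitted with the offset on the log scale and on the raw scale.

```r
# Simulated toy data (all values hypothetical): counts whose mean is
# exposure * rate, with the rate depending on one covariate x.
set.seed(1)
n <- 2000
d <- data.frame(x = rnorm(n),
                NumUniqPt = sample(1:10, n, replace = TRUE))
beta <- c(-1, 0.5)   # assumed true intercept and slope on the log-rate scale
d$y <- rpois(n, lambda = d$NumUniqPt * exp(beta[1] + beta[2] * d$x))

# Same model, offset supplied on the log scale and on the raw scale
fit_log <- glm(y ~ x, family = poisson(link = "log"),
               offset = log(NumUniqPt), data = d)
fit_raw <- glm(y ~ x, family = poisson(link = "log"),
               offset = NumUniqPt, data = d)

# Compare estimated coefficients with the values used in the simulation
rbind(truth = beta, log_offset = coef(fit_log), raw_offset = coef(fit_raw))

# Compare the fits; a lower AIC indicates the better-fitting specification
c(AIC_log = AIC(fit_log), AIC_raw = AIC(fit_raw))
```

If the counts really are proportional to the exposure, the log-offset fit should recover the simulated coefficients, while the raw-offset fit should not.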

Cheers,

Simon.

On Sun, 2008-03-02 at 22:01 -0500, John Sorkin wrote:

> R 2.6.0
> Windows XP
>
> A question about running a generalized linear model.
>
> I am running a glm with a Poisson distribution and a log link:
>    family=poisson(link = "log")
> and an offset.
> I would like to know if I should express the offset as the log of the offset value, i.e.
> offset=log(NumUniqPt)
> or as:
> offset=NumUniqPt
>
> I suspect I need to use the log, but I can't find any discussion of this in MASS 1994 or on the man page for glm.
> Thanks
> John
--
Simon Blomberg, BSc (Hons), PhD, MAppStat.
Lecturer and Consultant Statistician
Faculty of Biological and Chemical Sciences
The University of Queensland
St. Lucia Queensland 4072
Australia
Room 320 Goddard Building (8)
T: +61 7 3365 2506
http://www.uq.edu.au/~uqsblomb
email: S.Blomberg1_at_uq.edu.au

Policies:
1.  I will NOT analyse your data for you.
2.  Your deadline is your problem.

The combination of some data and an aching desire for
an answer does not ensure that a reasonable answer can
be extracted from a given body of data. - John Tukey.


Re: glm: offset

Wensui Liu
In reply to this post by John Sorkin
Hi John,
My understanding is that you should use log(...) rather than the
original scale. Below is the logic in the case of Poisson regression.
log(y / offset) = x'b
=> log(y) - log(offset) = x'b
=> log(y) = x'b + log(offset)


--
===============================
WenSui Liu
ChoicePoint Precision Marketing
Phone: 678-893-9457
Email : [hidden email]
Blog   : statcompute.spaces.live.com


Re: glm: offset

Ted.Harding-2
On 03-Mar-08 03:19:01, Wensui Liu wrote:
> HI, John,
> my understanding is that you should use log(...) instead of its
> original scale. Below is the logic in the case of poisson reg.
> log(y / offset) = x'b
> => log(y) - log(offset) = x'b
> => log(y) = x'b + log(offset)

Well, this is where it gets interesting!
The above statement of the "logic" begs the question (i.e. assumes
the answer).

I would go according to the general interpretation of "offset"
in LM and GLM modelling -- an "offset" is

  "a quantitative variable whose regression coefficient
   is known to be 1"
  [McCullagh and Nelder (1983) "Generalized Linear Models",
    page 138]
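
A small sketch of what "coefficient known to be 1" means in practice in R. The data are simulated and the variable `z` is a hypothetical quantity already on the scale of the linear predictor; nothing here comes from the original data. The check is that no coefficient is estimated for `z`, yet `z` is added unscaled to the fitted linear predictor.

```r
# Simulated data (hypothetical): z enters the linear predictor with coefficient 1
set.seed(2)
d <- data.frame(x = rnorm(200), z = runif(200))
d$y <- rpois(200, exp(0.3 + 0.7 * d$x + d$z))

fit <- glm(y ~ x + offset(z), family = poisson, data = d)

# Only the intercept and x have estimated coefficients; z has none
names(coef(fit))

# Rebuild the linear predictor by hand: intercept + slope * x + 1 * z
eta_manual <- coef(fit)[1] + coef(fit)[2] * d$x + d$z
all.equal(unname(eta_manual), unname(predict(fit, type = "link")))
```

Whether the quantity passed to `offset()` should be a raw variable or its log then depends on which of the two belongs in the linear predictor with coefficient 1.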

Since the GLM for a Poisson regression with log link models

  L = log(mu) = a + b1*X1 + b2*X2 + ...

where mu is the Poisson mean and X1, X2, ... are the raw
(untransformed, unless you have other reasons for transforming
them prior to bringing them into the regression) explanatory
variables, if X1 is the variable you wish to use as "offset"
in the above sense then it should be used un-transformed.
On this basis, the answer to John Sorkin's question should be:
don't use log(NumUniqPt), use NumUniqPt.

There's a potential confusion here in that presumably
"NumUniqPt" may be a positive variable whose distribution
in the data may be skew, i.e. the sort of variable that
you may feel urged to take the log of before using it.

But that would be an "other reason" in the sense of my
comment above.

After all, suppose "NumUniqPt" denoted a variable in the
data that could take negative values. Would you be happy
to use log(NumUniqPt) in that case?

Best wishes to all,
Ted.



--------------------------------------------------------------------
E-Mail: (Ted Harding) <[hidden email]>
Fax-to-email: +44 (0)870 094 0861
Date: 03-Mar-08                                       Time: 07:51:32
------------------------------ XFMail ------------------------------


Re: glm: offset

Prof Brian Ripley
On Mon, 3 Mar 2008, [hidden email] wrote:

> On 03-Mar-08 03:19:01, Wensui Liu wrote:
>> HI, John,
>> my understanding is that you should use log(...) instead of its
>> original scale. Below is the logic in the case of poisson reg.
>> log(y / offset) = x'b
>> => log(y) - log(offset) = x'b
>> => log(y) = x'b + log(offset)
>
> Well, this is where it gets interesting!
> The above statement of the "logic" begs the question (i.e. assumes
> the answer).
>
> I would go according to the general interpretation of "offset"
> in LM and GLM modelling -- an "offset" is
>
>  "a quantitative variable whose regression coefficient
>   is known to be 1"
>  [McCullagh and Nelder (1983) "Generalized Linear Models",
>    page 138]

Yes, and that is how it is defined in R too -- see ?offset.

The issue is more what you want to do with the offset.  In a Poisson
regression, the offset is most often used to include exposure time, the
Poisson model being for log rate.  Thus

mu = lambda*T, log(lambda) = Xb

means

log(mu) = Xb + log(T)

is the model for Poisson counts of occurrences in time intervals and hence
the offset is log(T).
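
The rate model above can be sketched in R as follows. This is an illustration on simulated data, not the original analysis: the column `expo_time` plays the role of T, and the coefficients used to generate the counts are invented. It shows the two usual ways of supplying the offset to `glm` (an `offset()` term in the formula, or the `offset` argument) and, as an informal check, what happens when the coefficient of log(T) is estimated instead of being fixed at 1.

```r
# Simulated counts with mu = lambda * T and log(lambda) = Xb (values hypothetical)
set.seed(3)
d <- data.frame(x = rnorm(500), expo_time = runif(500, 1, 20))
d$y <- rpois(500, d$expo_time * exp(-0.5 + 0.4 * d$x))

# Offset as a term in the formula
fit_term <- glm(y ~ x + offset(log(expo_time)), family = poisson, data = d)

# Offset via the 'offset' argument; same fitted coefficients
fit_arg <- glm(y ~ x, offset = log(expo_time), family = poisson, data = d)
all.equal(coef(fit_term), coef(fit_arg))

# Informal check: let the coefficient of log(T) be estimated freely.
# If the rate model holds, the estimate should be close to 1.
fit_free <- glm(y ~ x + log(expo_time), family = poisson, data = d)
coef(fit_free)["log(expo_time)"]
```

With the offset in place, `exp(coef(fit_term))` is read on the rate scale, that is, per unit of exposure time.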

As ?offset hints, there are examples under ?glm (taken from MASS) and for
dataset Insurance in package MASS.  One with non-logged offset and one
with ....
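
The examples referred to appear to be the `anorexia` fit shown under ?glm and the `Insurance` fit shown under ?Insurance; that identification is an assumption on the editor's part. Both datasets ship with package MASS, and the calls below follow the documented examples as best recalled, so treat them as a sketch and compare with the help pages.

```r
library(MASS)   # recommended package distributed with R

# Offset on the raw scale: Gaussian family with identity link, so the
# pre-treatment weight is already on the scale of the linear predictor.
anorex.1 <- glm(Postwt ~ Prewt + Treat + offset(Prewt),
                family = gaussian, data = anorexia)

# Offset on the log scale: Poisson family with log link, where Holders
# (number of policyholders) acts as the exposure for the claim counts.
ins.1 <- glm(Claims ~ District + Group + Age + offset(log(Holders)),
             family = poisson, data = Insurance)

coef(anorex.1)
coef(ins.1)
```

In both fits the offset is whatever quantity belongs in the linear predictor with coefficient 1; only the link function decides whether that quantity is the raw variable or its log.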




--
Brian D. Ripley,                  [hidden email]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
