Behavior of ordered factors in glm

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

Behavior of ordered factors in glm

David Winsemius
I have a variable which is roughly age categories in decades. In the
original data, it came in coded:
> str(xxx)
'data.frame':   58271 obs. of  29 variables:
 $ issuecat   : Factor w/ 5 levels "0 - 39","40 - 49",..: 1 1  1 1...
snip

I then defined issuecat as ordered:
> xxx$issuecat<-as.ordered(xxx$issuecat)

When I include issuecat in a glm model, the result makes me think I
have asked R for a linear+quadratic+cubic+quartic polynomial fit. The
results are not terribly surprising under that interpretation, but I
was hoping for only a linear term (which I was taught to called a "test
of trend"), at least as a starting point.

> age.mdl<-glm(actual~issuecat,data=xxx,family="poisson")
> summary(age.mdl)

Call:
glm(formula = actual ~ issuecat, family = "poisson", data = xxx)

Deviance Residuals:
    Min       1Q   Median       3Q      Max  
-0.3190  -0.2262  -0.1649  -0.1221   5.4776  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -4.31321    0.04865 -88.665   <2e-16 ***
issuecat.L   2.12717    0.13328  15.960   <2e-16 ***
issuecat.Q  -0.06568    0.11842  -0.555    0.579    
issuecat.C   0.08838    0.09737   0.908    0.364    
issuecat^4  -0.02701    0.07786  -0.347    0.729

This also means my advice to a another poster this morning may have
been misleading. I have tried puzzling out what I don't understand by
looking at indices or searching in MASSv2, the Blue Book, Thompson's
application of R to Agresti's text, and the FAQ, so far without
success. What I would like to achieve is having the lowest age category
be a reference category (with the intercept being the log-rate) and
each succeeding age category  be incremented by 1. The linear estimate
would be the log(risk-ratio) for increasing ages. I don't want the
higher order polynomial estimates. Am I hoping for too much?

--
David Winsemius
using R 2.6.1 in WinXP

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Behavior of ordered factors in glm

David Winsemius
David Winsemius <[hidden email]> wrote in
news:Xns9A1CC05755274dNOTwinscomcast@80.91.229.13:

> I have a variable which is roughly age categories in decades. In the
> original data, it came in coded:
>> str(xxx)
> 'data.frame':   58271 obs. of  29 variables:
>  $ issuecat   : Factor w/ 5 levels "0 - 39","40 - 49",..: 1 1  1
>  1...
> snip
>
> I then defined issuecat as ordered:
>> xxx$issuecat<-as.ordered(xxx$issuecat)
>
> When I include issuecat in a glm model, the result makes me think I
> have asked R for a linear+quadratic+cubic+quartic polynomial fit.
> The results are not terribly surprising under that interpretation,
> but I was hoping for only a linear term (which I was taught to
> call a "test of trend"), at least as a starting point.
>
>> age.mdl<-glm(actual~issuecat,data=xxx,family="poisson")
>> summary(age.mdl)
>
> Call:
> glm(formula = actual ~ issuecat, family = "poisson", data = xxx)
>
> Deviance Residuals:
>     Min       1Q   Median       3Q      Max  
> -0.3190  -0.2262  -0.1649  -0.1221   5.4776  
>
> Coefficients:
>             Estimate Std. Error z value Pr(>|z|)    
> (Intercept) -4.31321    0.04865 -88.665   <2e-16 ***
> issuecat.L   2.12717    0.13328  15.960   <2e-16 ***
> issuecat.Q  -0.06568    0.11842  -0.555    0.579    
> issuecat.C   0.08838    0.09737   0.908    0.364    
> issuecat^4  -0.02701    0.07786  -0.347    0.729
>
> This also means my advice to a another poster this morning may have
> been misleading. I have tried puzzling out what I don't understand
> by looking at indices or searching in MASSv2, the Blue Book,
> Thompson's application of R to Agresti's text, and the FAQ, so far
> without success. What I would like to achieve is having the lowest
> age category be a reference category (with the intercept being the
> log-rate) and each succeeding age category  be incremented by 1. The
> linear estimate would be the log(risk-ratio) for increasing ages. I
> don't want the higher order polynomial estimates. Am I hoping for
> too much?
>

I acheived what I needed by:

> xxx$agecat<-as.numeric(xxx$issuecat)
> xxx$agecat<-xxx$agecat-1

The results look quite sensible:
> exp.mdl<-glm(actual~gendercat+agecat+smokecat, data=xxx,
family="poisson", offset=expected)
> summary(exp.mdl)

Call:
glm(formula = actual ~ gendercat + agecat + smokecat, family =
"poisson",
    data = xxx, offset = expected)

Deviance Residuals:
    Min       1Q   Median       3Q      Max  
-0.5596  -0.2327  -0.1671  -0.1199   5.2386  

Coefficients:
                Estimate Std. Error z value Pr(>|z|)    
(Intercept)     -5.89410    0.11009 -53.539  < 2e-16 ***
gendercatMale    0.29660    0.06426   4.615 3.92e-06 ***
agecat           0.66143    0.02958  22.360  < 2e-16 ***
smokecatSmoker   0.22178    0.07870   2.818  0.00483 **
smokecatUnknown  0.02378    0.08607   0.276  0.78233

I remain curious about how to correctly control ordered factors, or I
should just simply avoid them.

--
David Winsemius

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Behavior of ordered factors in glm

Marc Schwartz
In reply to this post by David Winsemius
David Winsemius wrote:

> I have a variable which is roughly age categories in decades. In the
> original data, it came in coded:
>> str(xxx)
> 'data.frame':   58271 obs. of  29 variables:
>  $ issuecat   : Factor w/ 5 levels "0 - 39","40 - 49",..: 1 1  1 1...
> snip
>
> I then defined issuecat as ordered:
>> xxx$issuecat<-as.ordered(xxx$issuecat)
>
> When I include issuecat in a glm model, the result makes me think I
> have asked R for a linear+quadratic+cubic+quartic polynomial fit. The
> results are not terribly surprising under that interpretation, but I
> was hoping for only a linear term (which I was taught to called a "test
> of trend"), at least as a starting point.
>
>> age.mdl<-glm(actual~issuecat,data=xxx,family="poisson")
>> summary(age.mdl)
>
> Call:
> glm(formula = actual ~ issuecat, family = "poisson", data = xxx)
>
> Deviance Residuals:
>     Min       1Q   Median       3Q      Max  
> -0.3190  -0.2262  -0.1649  -0.1221   5.4776  
>
> Coefficients:
>             Estimate Std. Error z value Pr(>|z|)    
> (Intercept) -4.31321    0.04865 -88.665   <2e-16 ***
> issuecat.L   2.12717    0.13328  15.960   <2e-16 ***
> issuecat.Q  -0.06568    0.11842  -0.555    0.579    
> issuecat.C   0.08838    0.09737   0.908    0.364    
> issuecat^4  -0.02701    0.07786  -0.347    0.729
>
> This also means my advice to a another poster this morning may have
> been misleading. I have tried puzzling out what I don't understand by
> looking at indices or searching in MASSv2, the Blue Book, Thompson's
> application of R to Agresti's text, and the FAQ, so far without
> success. What I would like to achieve is having the lowest age category
> be a reference category (with the intercept being the log-rate) and
> each succeeding age category  be incremented by 1. The linear estimate
> would be the log(risk-ratio) for increasing ages. I don't want the
> higher order polynomial estimates. Am I hoping for too much?



David,

What you are seeing is the impact of using ordered factors versus
unordered factors.

Reading ?options, you will note:

contrasts:
the default contrasts used in model fitting such as with aov or lm. A
character vector of length two, the first giving the function to be used
with unordered factors and the second the function to be used with
ordered factors. By default the elements are named c("unordered",
"ordered"), but the names are unused.


The default in R (which is not the same as S-PLUS) is:

 > options("contrasts")
$contrasts
         unordered           ordered
"contr.treatment"      "contr.poly"


Thus, note that when using ordered factors, the default handling of
factors is contr.poly. Reading ?contrast, you will note:

   contr.poly returns contrasts based on orthogonal polynomials.


To show a quick and dirty example from ?glm:


counts <- c(18,17,15,20,10,20,25,13,12)
outcome <- gl(3,1,9)


# First, the default with outcome as an unordered factor:
 > summary(glm(counts ~ outcome, family=poisson()))

Call:
glm(formula = counts ~ outcome, family = poisson())

Deviance Residuals:
     Min       1Q   Median       3Q      Max
-0.9666  -0.6712  -0.1696   0.8472   1.0494

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)   3.0445     0.1260  24.165   <2e-16 ***
outcome2     -0.4543     0.2022  -2.247   0.0246 *
outcome3     -0.2930     0.1927  -1.520   0.1285
...



# Now using outcome as an ordered factor:
 > summary(glm(counts ~ as.ordered(outcome), family=poisson()))

Call:
glm(formula = counts ~ as.ordered(outcome), family = poisson())

Deviance Residuals:
     Min       1Q   Median       3Q      Max
-0.9666  -0.6712  -0.1696   0.8472   1.0494

Coefficients:
                       Estimate Std. Error z value Pr(>|z|)
(Intercept)             2.7954     0.0831  33.640   <2e-16 ***
as.ordered(outcome).L  -0.2072     0.1363  -1.520   0.1285
as.ordered(outcome).Q   0.2513     0.1512   1.662   0.0965 .
...


Unfortunately, MASSv2 is the only one of the four editions that I do not
have for some reason. In MASSv4, this is covered starting on page 146.
This is also covered in an Intro to R, in section 11.1.1 on contrasts.

For typical clinical applications, the default treatment contrasts are
sufficient, whereby the first level of the factor is considered the
reference level and all others are compared against it. Thus, using
unordered factors is more common, at least in my experience and likely
the etiology of the difference between S-PLUS and R in this regard.

HTH,

Marc Schwartz

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Behavior of ordered factors in glm

Duncan Murdoch
In reply to this post by David Winsemius
On 05/01/2008 7:16 PM, David Winsemius wrote:

> David Winsemius <[hidden email]> wrote in
> news:Xns9A1CC05755274dNOTwinscomcast@80.91.229.13:
>
>> I have a variable which is roughly age categories in decades. In the
>> original data, it came in coded:
>>> str(xxx)
>> 'data.frame':   58271 obs. of  29 variables:
>>  $ issuecat   : Factor w/ 5 levels "0 - 39","40 - 49",..: 1 1  1
>>  1...
>> snip
>>
>> I then defined issuecat as ordered:
>>> xxx$issuecat<-as.ordered(xxx$issuecat)
>> When I include issuecat in a glm model, the result makes me think I
>> have asked R for a linear+quadratic+cubic+quartic polynomial fit.
>> The results are not terribly surprising under that interpretation,
>> but I was hoping for only a linear term (which I was taught to
>> call a "test of trend"), at least as a starting point.
>>
>>> age.mdl<-glm(actual~issuecat,data=xxx,family="poisson")
>>> summary(age.mdl)
>> Call:
>> glm(formula = actual ~ issuecat, family = "poisson", data = xxx)
>>
>> Deviance Residuals:
>>     Min       1Q   Median       3Q      Max  
>> -0.3190  -0.2262  -0.1649  -0.1221   5.4776  
>>
>> Coefficients:
>>             Estimate Std. Error z value Pr(>|z|)    
>> (Intercept) -4.31321    0.04865 -88.665   <2e-16 ***
>> issuecat.L   2.12717    0.13328  15.960   <2e-16 ***
>> issuecat.Q  -0.06568    0.11842  -0.555    0.579    
>> issuecat.C   0.08838    0.09737   0.908    0.364    
>> issuecat^4  -0.02701    0.07786  -0.347    0.729
>>
>> This also means my advice to a another poster this morning may have
>> been misleading. I have tried puzzling out what I don't understand
>> by looking at indices or searching in MASSv2, the Blue Book,
>> Thompson's application of R to Agresti's text, and the FAQ, so far
>> without success. What I would like to achieve is having the lowest
>> age category be a reference category (with the intercept being the
>> log-rate) and each succeeding age category  be incremented by 1. The
>> linear estimate would be the log(risk-ratio) for increasing ages. I
>> don't want the higher order polynomial estimates. Am I hoping for
>> too much?
>>
>
> I acheived what I needed by:
>
>> xxx$agecat<-as.numeric(xxx$issuecat)
>> xxx$agecat<-xxx$agecat-1
>
> The results look quite sensible:
>> exp.mdl<-glm(actual~gendercat+agecat+smokecat, data=xxx,
> family="poisson", offset=expected)
>> summary(exp.mdl)
>
> Call:
> glm(formula = actual ~ gendercat + agecat + smokecat, family =
> "poisson",
>     data = xxx, offset = expected)
>
> Deviance Residuals:
>     Min       1Q   Median       3Q      Max  
> -0.5596  -0.2327  -0.1671  -0.1199   5.2386  
>
> Coefficients:
>                 Estimate Std. Error z value Pr(>|z|)    
> (Intercept)     -5.89410    0.11009 -53.539  < 2e-16 ***
> gendercatMale    0.29660    0.06426   4.615 3.92e-06 ***
> agecat           0.66143    0.02958  22.360  < 2e-16 ***
> smokecatSmoker   0.22178    0.07870   2.818  0.00483 **
> smokecatUnknown  0.02378    0.08607   0.276  0.78233
>
> I remain curious about how to correctly control ordered factors, or I
> should just simply avoid them.

If you're using a factor, R generally assumes you mean each level is a
different category, so you get levels-1 parameters.  If you don't want
this, you shouldn't use a factor:  convert to a numeric scale, just as
you did.

Duncan Murdoch

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Behavior of ordered factors in glm

Prof Brian Ripley
Further to Duncan's comments, you can control factor codings via
options(contrasts=), by setting contrasts() on the factor and via C().
This does enable you to code an ordered factor as a linear term, for
example.

The only place I know that this is discussed in any detail is in Bill
Venables' account in MASS chapter 6.

On Sat, 5 Jan 2008, Duncan Murdoch wrote:

> On 05/01/2008 7:16 PM, David Winsemius wrote:
>> David Winsemius <[hidden email]> wrote in
>> news:Xns9A1CC05755274dNOTwinscomcast@80.91.229.13:
>>
>>> I have a variable which is roughly age categories in decades. In the
>>> original data, it came in coded:
>>>> str(xxx)
>>> 'data.frame':   58271 obs. of  29 variables:
>>>  $ issuecat   : Factor w/ 5 levels "0 - 39","40 - 49",..: 1 1  1
>>>  1...
>>> snip
>>>
>>> I then defined issuecat as ordered:
>>>> xxx$issuecat<-as.ordered(xxx$issuecat)
>>> When I include issuecat in a glm model, the result makes me think I
>>> have asked R for a linear+quadratic+cubic+quartic polynomial fit.
>>> The results are not terribly surprising under that interpretation,
>>> but I was hoping for only a linear term (which I was taught to
>>> call a "test of trend"), at least as a starting point.
>>>
>>>> age.mdl<-glm(actual~issuecat,data=xxx,family="poisson")
>>>> summary(age.mdl)
>>> Call:
>>> glm(formula = actual ~ issuecat, family = "poisson", data = xxx)
>>>
>>> Deviance Residuals:
>>>     Min       1Q   Median       3Q      Max
>>> -0.3190  -0.2262  -0.1649  -0.1221   5.4776
>>>
>>> Coefficients:
>>>             Estimate Std. Error z value Pr(>|z|)
>>> (Intercept) -4.31321    0.04865 -88.665   <2e-16 ***
>>> issuecat.L   2.12717    0.13328  15.960   <2e-16 ***
>>> issuecat.Q  -0.06568    0.11842  -0.555    0.579
>>> issuecat.C   0.08838    0.09737   0.908    0.364
>>> issuecat^4  -0.02701    0.07786  -0.347    0.729
>>>
>>> This also means my advice to a another poster this morning may have
>>> been misleading. I have tried puzzling out what I don't understand
>>> by looking at indices or searching in MASSv2, the Blue Book,
>>> Thompson's application of R to Agresti's text, and the FAQ, so far
>>> without success. What I would like to achieve is having the lowest
>>> age category be a reference category (with the intercept being the
>>> log-rate) and each succeeding age category  be incremented by 1. The
>>> linear estimate would be the log(risk-ratio) for increasing ages. I
>>> don't want the higher order polynomial estimates. Am I hoping for
>>> too much?
>>>
>>
>> I acheived what I needed by:
>>
>>> xxx$agecat<-as.numeric(xxx$issuecat)
>>> xxx$agecat<-xxx$agecat-1
>>
>> The results look quite sensible:
>>> exp.mdl<-glm(actual~gendercat+agecat+smokecat, data=xxx,
>> family="poisson", offset=expected)
>>> summary(exp.mdl)
>>
>> Call:
>> glm(formula = actual ~ gendercat + agecat + smokecat, family =
>> "poisson",
>>     data = xxx, offset = expected)
>>
>> Deviance Residuals:
>>     Min       1Q   Median       3Q      Max
>> -0.5596  -0.2327  -0.1671  -0.1199   5.2386
>>
>> Coefficients:
>>                 Estimate Std. Error z value Pr(>|z|)
>> (Intercept)     -5.89410    0.11009 -53.539  < 2e-16 ***
>> gendercatMale    0.29660    0.06426   4.615 3.92e-06 ***
>> agecat           0.66143    0.02958  22.360  < 2e-16 ***
>> smokecatSmoker   0.22178    0.07870   2.818  0.00483 **
>> smokecatUnknown  0.02378    0.08607   0.276  0.78233
>>
>> I remain curious about how to correctly control ordered factors, or I
>> should just simply avoid them.
>
> If you're using a factor, R generally assumes you mean each level is a
> different category, so you get levels-1 parameters.  If you don't want
> this, you shouldn't use a factor:  convert to a numeric scale, just as
> you did.
>
> Duncan Murdoch
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

--
Brian D. Ripley,                  [hidden email]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Behavior of ordered factors in glm

David Winsemius

Thank you, Dr Ripley. After some false starts and consulting MASS2,
Chambers&Hastie and the help files, this worked acceptably.

> xxx$issuecat2<-C(xxx$issuecat2,poly,1)
> attr(xxx$issuecat2,"contrasts")
                 .L
0-39  -6.324555e-01
40-49 -3.162278e-01
50-59 -3.287978e-17
60-69  3.162278e-01
70+    6.324555e-01

> exp.mdl<-glm(actual~gendercat+issuecat2+smokecat,  
                data=xxx,family="poisson",offset=expected)
> summary(exp.mdl)

Deviance Residuals:
    Min       1Q   Median       3Q      Max  
-0.5596  -0.2327  -0.1671  -0.1199   5.2386  

Coefficients:
                Estimate Std. Error z value Pr(>|z|)    
(Intercept)     -4.57125    0.06650 -68.743  < 2e-16 ***
gendercatMale    0.29660    0.06426   4.615 3.92e-06 ***
issuecat2.L      2.09161    0.09354  22.360  < 2e-16 ***
smokecatSmoker   0.22178    0.07870   2.818  0.00483 **
smokecatUnknown  0.02378    0.08607   0.276  0.78233

The reference category is different, but the effect of a one category
increase in age-decade on the log(rate) is(2.09*0.316) = 0.6604 which
seems acceptable agreement with my earlier as.numeric(factor) estimate
of 0.6614.

--
David Winsemius

Prof Brian Ripley <[hidden email]> wrote in
news:[hidden email]:

> Further to Duncan's comments, you can control factor codings via
> options(contrasts=), by setting contrasts() on the factor and via
> C(). This does enable you to code an ordered factor as a linear
> term, for example.
>
> The only place I know that this is discussed in any detail is in
> Bill Venables' account in MASS chapter 6.
>
> On Sat, 5 Jan 2008, Duncan Murdoch wrote:
>
>> On 05/01/2008 7:16 PM, David Winsemius wrote:
>>> David Winsemius <[hidden email]> wrote in
>>> news:Xns9A1CC05755274dNOTwinscomcast@80.91.229.13:
>>>
>>>> I have a variable which is roughly age categories in decades. In
>>>> the original data, it came in coded:
>>>>> str(xxx)
>>>> 'data.frame':   58271 obs. of  29 variables:
>>>>  $ issuecat   : Factor w/ 5 levels "0 - 39","40 - 49",..: 1 1  1
>>>>  1...
>>>> snip
>>>>
>>>> I then defined issuecat as ordered:
>>>>> xxx$issuecat<-as.ordered(xxx$issuecat)
>>>> When I include issuecat in a glm model, the result makes me think
>>>> I have asked R for a linear+quadratic+cubic+quartic polynomial
>>>> fit. The results are not terribly surprising under that
>>>> interpretation, but I was hoping for only a linear term (which I
>>>> was taught to call a "test of trend"), at least as a starting
>>>> point.
>>>>
>>>>> age.mdl<-glm(actual~issuecat,data=xxx,family="poisson")
>>>>> summary(age.mdl)
>>>> Call:
>>>> glm(formula = actual ~ issuecat, family = "poisson", data = xxx)
>>>>
>>>> Deviance Residuals:
>>>>     Min       1Q   Median       3Q      Max
>>>> -0.3190  -0.2262  -0.1649  -0.1221   5.4776
>>>>
>>>> Coefficients:
>>>>             Estimate Std. Error z value Pr(>|z|)
>>>> (Intercept) -4.31321    0.04865 -88.665   <2e-16 ***
>>>> issuecat.L   2.12717    0.13328  15.960   <2e-16 ***
>>>> issuecat.Q  -0.06568    0.11842  -0.555    0.579
>>>> issuecat.C   0.08838    0.09737   0.908    0.364
>>>> issuecat^4  -0.02701    0.07786  -0.347    0.729
>>>>
>>>> This also means my advice to a another poster this morning may
>>>> have been misleading. I have tried puzzling out what I don't
>>>> understand by looking at indices or searching in MASSv2, the Blue
>>>> Book, Thompson's application of R to Agresti's text, and the FAQ,
>>>> so far without success. What I would like to achieve is having
>>>> the lowest age category be a reference category (with the
>>>> intercept being the log-rate) and each succeeding age category
>>>> be incremented by 1. The linear estimate would be the
>>>> log(risk-ratio) for increasing ages. I don't want the higher
>>>> order polynomial estimates. Am I hoping for too much?
>>>>
>>>
>>> I acheived what I needed by:
>>>
>>>> xxx$agecat<-as.numeric(xxx$issuecat)
>>>> xxx$agecat<-xxx$agecat-1
>>>
>>> The results look quite sensible:
>>>> exp.mdl<-glm(actual~gendercat+agecat+smokecat, data=xxx,
>>> family="poisson", offset=expected)
>>>> summary(exp.mdl)
>>>
>>> Call:
>>> glm(formula = actual ~ gendercat + agecat + smokecat, family =
>>> "poisson",
>>>     data = xxx, offset = expected)
>>>
>>> Deviance Residuals:
>>>     Min       1Q   Median       3Q      Max
>>> -0.5596  -0.2327  -0.1671  -0.1199   5.2386
>>>
>>> Coefficients:
>>>                 Estimate Std. Error z value Pr(>|z|)
>>> (Intercept)     -5.89410    0.11009 -53.539  < 2e-16 ***
>>> gendercatMale    0.29660    0.06426   4.615 3.92e-06 ***
>>> agecat           0.66143    0.02958  22.360  < 2e-16 ***
>>> smokecatSmoker   0.22178    0.07870   2.818  0.00483 **
>>> smokecatUnknown  0.02378    0.08607   0.276  0.78233
>>>
>>> I remain curious about how to correctly control ordered factors,
>>> or I should just simply avoid them.
>>
>> If you're using a factor, R generally assumes you mean each level
>> is a different category, so you get levels-1 parameters.  If you
>> don't want this, you shouldn't use a factor:  convert to a numeric
>> scale, just as you did.
>>
>> Duncan Murdoch
>>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.