Meta-analysis of prevalence at the country level with mgcv/gamm4

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

Meta-analysis of prevalence at the country level with mgcv/gamm4

Julien Riou
Dear R community,

I'm working on a meta-analysis of prevalence data. The aim is to get
estimates of prevalence at the country level. The main issue is that the
disease is highly correlated with age, and the sample ages of included
studies are highly heterogeneous. Only median age is available for most
studies, so I can't use SMR-like tricks. I figured I could use
meta-regression to solve this, including age as a fixed-effect and
introducing study-level and country-level random-effects.

The idea (that I took from Fowkes et
al<http://www.thelancet.com/journals/lancet/article/PIIS0140-6736%2813%2961249-0/abstract>)
was to use this model to make country-specific predictions of prevalence
for each 5-year age group from 15 to 60 (using the median age of the
group), and to apply these predictions to the actual population size of
each of those groups in the selected country, in order to obtain total
infected population and to calculate age-adjusted prevalence in the 15-60
population from that.

I tried several ways to do this using R with packages meta and mgcv. I got
some satisfying results, but I'm not that confident with my results and
would appreciate some feedback.

First is some simulated data, then the description of my different
approaches:

data<-data.frame(id_study=c("UK1","UK2","UK3","FRA1","FRA2","BEL1","GER1","GER2","GER3"),

country=c("UK","UK","UK","FRANCE","FRANCE","BELGIUM","GERMANY","GERMANY","GERMANY"),
                 n_events=c(91,49,18,10,50,6,9,10,22),
                 n_total=c(3041,580,252,480,887,256,400,206,300),
                 study_median_age=c(25,50,58,30,42,26,27,28,36))

*Standard random-effect meta-analysis* with package meta.

I used metaprop() to get a first estimate of the prevalence in each country
without taking age into account, and to obtain weights. As expected,
heterogeneity was very high, so I used weights from the random-effects
model.

 meta <- metaprop(event=n_events,n=n_total,byvar=country,sm="PLOGIT",method.tau="REML",data=data)
 summary(meta)
 data$weight<-meta$w.random

I used meta to get a first estimate of the prevalence without taking age
into account, and to obtain weights. As expected, heterogeneity was very
high, so I used weights from the random-effects model.

*Generalized additive model* to include age with package mgcv.

The gam() model parameters (k and sp) were chosen using BIC and GCV number
(not shown here).

 model <- gam( cbind(n_events,n_total-n_events) ~
s(study_median_age,bs="cr",k=4,sp=2) + s(country,bs="re"),
weights=weight, data=data, family="binomial"(link=logit),
method="REML")
 plot(model,pages=1,residuals=T, all.terms=T, shade=T)

Predictions for each age group were obtained from this model as explained
earlier. CI were obtained directly using predict.gam(), that uses the
Bayesian posterior covariance matrix of the parameters. For exemple
considering UK:

 newdat<-data.frame(country="UK",study_median_age=seq(17,57,5))
 link<-predict(model,newdat,type="link",se.fit=T)$fit
 linkse<-predict(model,newdat,type="link",se.fit=T)$se
 newdat$prev<-model$family$linkinv(link)
 newdat$CIinf<-model$family$linkinv(link-1.96*linkse)
 newdat$CIsup<-model$family$linkinv(link+1.96*linkse)
 plot(newdat$prev~newdat$study_median_age, type="l",ylim=c(0,.12))
 lines(newdat$CIinf~newdat$study_median_age, lty=2)
 lines(newdat$CIsup~newdat$study_median_age, lty=2)

The results were satisfying, representing the augmentation of the
prevalence with advanced age, with coherent confidence intervals. I
obtained a total prevalence for the country using the country population
structure (not shown, I hope it is clear enough).

However, I figured I needed to include study-level random-effects since
there was a high heterogeneity (even though I did not calculate
heterogeneity after the meta-regression).

*Introducing study-level random-effect* with package gamm4.

Since mgcv models can't handle that much random-effect parameters, I had to
switch to gamm4.

 model2 <- gamm4(cbind(n_events,n_total-n_events) ~
s(study_median_age,bs="cr",k=4) + s(country,bs="re"),
random=~(1|id_study), data=data, weights=weight,
family="binomial"(link=logit))
 plot(model2$gam,pages=1,residuals=T, all.terms=T, shade=T)

 link<-predict(model2$gam,newdat,type="link",se.fit=T)$fit
 linkse<-predict(model2$gam,newdat,type="link",se.fit=T)$se
 newdat$prev2<-model$family$linkinv(link)
 newdat$CIinf2<-model$family$linkinv(link-1.96*linkse)
 newdat$CIsup2<-model$family$linkinv(link+1.96*linkse)
 plot(newdat$prev2~newdat$study_median_age, type="l",col="red",ylim=c(0,0.11))
 lines(newdat$CIinf2~newdat$study_median_age, lty=2,col="red")
 lines(newdat$CIsup2~newdat$study_median_age, lty=2,col="red")
 lines(newdat$prev~newdat$study_median_age, type="l",ylim=c(0,.12))
 lines(newdat$CIinf~newdat$study_median_age, lty=2)
 lines(newdat$CIsup~newdat$study_median_age, lty=2)

Since the study-level random effect was in the mer part of the fit, I
didn't have to handle it.

As you can see, I obtain rather different results, with a much smoother
relation between age and prevalence, and quite different confidence
intervals. It is even more different in the full-data analysis, where the
CI are much wider in the model including study-level RE, to the point it is
sometimes almost uninformative (prevalence between 0 and 15%, but if it is
the way it is...). Moreover, the study-level RE model seems to be more
stable when outliers are excluded.

*So, my questions are:*

   - Did I properly extract the weights from the metaprop() function and
   used them further?
   - Did I properly built my gam() and gamm4() models? I read a lot about
   this, but I'm not used to this king of models.
   - Which of these models should I use?

I would really appreciate some help, since neither my teachers nor my
colleagues could. It was a really harsh to conduct the systematic review,
and very frustrating to struggle with the analysis... Thank you in advance!

Julien

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Meta-analysis of prevalence at the country level with mgcv/gamm4

Michael Dewey
At 09:31 08/04/2014, Julien Riou wrote:
>Dear R community,

Comments in line below

>I'm working on a meta-analysis of prevalence data. The aim is to get
>estimates of prevalence at the country level. The main issue is that the
>disease is highly correlated with age, and the sample ages of included
>studies are highly heterogeneous. Only median age is available for most
>studies, so I can't use SMR-like tricks. I figured I could use
>meta-regression to solve this, including age as a fixed-effect and
>introducing study-level and country-level random-effects.
>
>The idea (that I took from Fowkes et
>al<http://www.thelancet.com/journals/lancet/article/PIIS0140-6736%2813%2961249-0/abstract>)
>was to use this model to make country-specific predictions of prevalence
>for each 5-year age group from 15 to 60 (using the median age of the
>group), and to apply these predictions to the actual population size of
>each of those groups in the selected country, in order to obtain total
>infected population and to calculate age-adjusted prevalence in the 15-60
>population from that.
>
>I tried several ways to do this using R with packages meta and mgcv. I got
>some satisfying results, but I'm not that confident with my results and
>would appreciate some feedback.
>
>First is some simulated data, then the description of my different
>approaches:
>
>data<-data.frame(id_study=c("UK1","UK2","UK3","FRA1","FRA2","BEL1","GER1","GER2","GER3"),
>
>country=c("UK","UK","UK","FRANCE","FRANCE","BELGIUM","GERMANY","GERMANY","GERMANY"),
>                  n_events=c(91,49,18,10,50,6,9,10,22),
>                  n_total=c(3041,580,252,480,887,256,400,206,300),
>                  study_median_age=c(25,50,58,30,42,26,27,28,36))

So the first UK study with a median age of 25 is going to be used to
estimate prevalence over a range of ages? You are going to have to
make some very strong assumptions here which I personally would not
want to make.

Is there any possibility that in the real dataset you can fit your
model to those studies which do provide age-specific prevalences and
then use that to impute?

You do not say when these studies were published but I would ask the
authors of the primary studies if they can make the information
available to you. You may have already done that of course. I referee
quite a few papers on systematic reviews and my impression is that
some authors are amenable to doing the work for you. You mileage may
vary of course.

>*Standard random-effect meta-analysis* with package meta.
>
>I used metaprop() to get a first estimate of the prevalence in each country
>without taking age into account, and to obtain weights. As expected,
>heterogeneity was very high, so I used weights from the random-effects
>model.

Which will be nearly equal and so hardly worth using in my opinion
but again your mileage may vary.

>  meta <-
> metaprop(event=n_events,n=n_total,byvar=country,sm="PLOGIT",method.tau="REML",data=data)
>  summary(meta)
>  data$weight<-meta$w.random
>
>I used meta to get a first estimate of the prevalence without taking age
>into account, and to obtain weights. As expected, heterogeneity was very
>high, so I used weights from the random-effects model.
>
>*Generalized additive model* to include age with package mgcv.
>
>The gam() model parameters (k and sp) were chosen using BIC and GCV number
>(not shown here).
>
>  model <- gam( cbind(n_events,n_total-n_events) ~
>s(study_median_age,bs="cr",k=4,sp=2) + s(country,bs="re"),
>weights=weight, data=data, family="binomial"(link=logit),
>method="REML")
>  plot(model,pages=1,residuals=T, all.terms=T, shade=T)
>
>Predictions for each age group were obtained from this model as explained
>earlier. CI were obtained directly using predict.gam(), that uses the
>Bayesian posterior covariance matrix of the parameters. For exemple
>considering UK:
>
>  newdat<-data.frame(country="UK",study_median_age=seq(17,57,5))
>  link<-predict(model,newdat,type="link",se.fit=T)$fit
>  linkse<-predict(model,newdat,type="link",se.fit=T)$se
>  newdat$prev<-model$family$linkinv(link)
>  newdat$CIinf<-model$family$linkinv(link-1.96*linkse)
>  newdat$CIsup<-model$family$linkinv(link+1.96*linkse)
>  plot(newdat$prev~newdat$study_median_age, type="l",ylim=c(0,.12))
>  lines(newdat$CIinf~newdat$study_median_age, lty=2)
>  lines(newdat$CIsup~newdat$study_median_age, lty=2)
>
>The results were satisfying, representing the augmentation of the
>prevalence with advanced age, with coherent confidence intervals. I
>obtained a total prevalence for the country using the country population
>structure (not shown, I hope it is clear enough).
>
>However, I figured I needed to include study-level random-effects since
>there was a high heterogeneity (even though I did not calculate
>heterogeneity after the meta-regression).
>
>*Introducing study-level random-effect* with package gamm4.
>
>Since mgcv models can't handle that much random-effect parameters, I had to
>switch to gamm4.
>
>  model2 <- gamm4(cbind(n_events,n_total-n_events) ~
>s(study_median_age,bs="cr",k=4) + s(country,bs="re"),
>random=~(1|id_study), data=data, weights=weight,
>family="binomial"(link=logit))
>  plot(model2$gam,pages=1,residuals=T, all.terms=T, shade=T)
>
>  link<-predict(model2$gam,newdat,type="link",se.fit=T)$fit
>  linkse<-predict(model2$gam,newdat,type="link",se.fit=T)$se
>  newdat$prev2<-model$family$linkinv(link)
>  newdat$CIinf2<-model$family$linkinv(link-1.96*linkse)
>  newdat$CIsup2<-model$family$linkinv(link+1.96*linkse)
>  plot(newdat$prev2~newdat$study_median_age,
> type="l",col="red",ylim=c(0,0.11))
>  lines(newdat$CIinf2~newdat$study_median_age, lty=2,col="red")
>  lines(newdat$CIsup2~newdat$study_median_age, lty=2,col="red")
>  lines(newdat$prev~newdat$study_median_age, type="l",ylim=c(0,.12))
>  lines(newdat$CIinf~newdat$study_median_age, lty=2)
>  lines(newdat$CIsup~newdat$study_median_age, lty=2)
>
>Since the study-level random effect was in the mer part of the fit, I
>didn't have to handle it.
>
>As you can see, I obtain rather different results, with a much smoother
>relation between age and prevalence, and quite different confidence
>intervals. It is even more different in the full-data analysis, where the
>CI are much wider in the model including study-level RE, to the point it is
>sometimes almost uninformative (prevalence between 0 and 15%, but if it is
>the way it is...). Moreover, the study-level RE model seems to be more
>stable when outliers are excluded.
>
>*So, my questions are:*
>
>    - Did I properly extract the weights from the metaprop() function and
>    used them further?
>    - Did I properly built my gam() and gamm4() models? I read a lot about
>    this, but I'm not used to this king of models.
>    - Which of these models should I use?
>
>I would really appreciate some help, since neither my teachers nor my
>colleagues could. It was a really harsh to conduct the systematic review,
>and very frustrating to struggle with the analysis... Thank you in advance!

I am afraid that is the way with systematic reviews, you can only
synthesise what you find, not what you would like to have found.
Anyone who has done a review will sympathise with you, not that that
is any consolation.


>Julien
>
>         [[alternative HTML version deleted]]

Michael Dewey
[hidden email]
http://www.aghmed.fsnet.co.uk/home.html

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Meta-analysis of prevalence at the country level with mgcv/gamm4

Julien Riou
In reply to this post by Julien Riou
>
> Message: 11
> Date: Wed, 09 Apr 2014 18:39:30 +0100
> From: Michael Dewey <[hidden email]>
> To: Julien Riou <[hidden email]>, [hidden email]
> Subject: Re: [R] Meta-analysis of prevalence at the country level with
>         mgcv/gamm4
> Message-ID: <[hidden email]>
> Content-Type: text/plain; charset="us-ascii"; format=flowed
>


Hi Michael,

Thank you for taking the time to help me.

 So the first UK study with a median age of 25 is going to be used to
> estimate prevalence over a range of ages? You are going to have to
> make some very strong assumptions here which I personally would not
> want to make.
>

I'm a little confused by this. In my understanding, the mixed-effects model
does not do that. The slope of the relation between age and prevalence will
be estimated from the full pool of studies, and the country-level random
intercept will be estimated from all studies in the country. So the
assumption here is that the relation between age and incidence is the same
in every country, which is quite reasonable. Of course, there will be more
uncertainty with the estimation of the random intercept if there is few
studies in a country, or if there is a strong inter-study variance in a
country. This will influence the confidence interval of the random
intercept, and so the CI of the predicted prevalence for this country.


Is there any possibility that in the real dataset you can fit your

> model to those studies which do provide age-specific prevalences and
> then use that to impute?
>
> You do not say when these studies were published but I would ask the
> authors of the primary studies if they can make the information
> available to you. You may have already done that of course. I referee
> quite a few papers on systematic reviews and my impression is that
> some authors are amenable to doing the work for you. You mileage may
> vary of course.
>

Yes, it would be easier to have prevalence for age subgroups of studies,
but we did not have access to that information for most studies even after
contacting the authors.


> >*Standard random-effect meta-analysis* with package meta.
> >
> >I used metaprop() to get a first estimate of the prevalence in each
> country
> >without taking age into account, and to obtain weights. As expected,
> >heterogeneity was very high, so I used weights from the random-effects
> >model.
>
> Which will be nearly equal and so hardly worth using in my opinion
> but again your mileage may vary.
>

The weights from the random-effects method were actually far from equals,
as sample size was highly variable between studies. With the RE method,
small studies have much more impact.


> I am afraid that is the way with systematic reviews, you can only
> synthesise what you find, not what you would like to have found.
> Anyone who has done a review will sympathise with you, not that that
> is any consolation.
>

I'm not sure I'm following your point. My objective is to synthesise the
included studies, while taking the age factor into account, since it is
strongly linked to prevalence and very heterogeneous. The alternative is to
only include studies with low median age, but I would lose a lot of
information.

Thank you again,
Julien

>
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Meta-analysis of prevalence at the country level with mgcv/gamm4

Michael Dewey
At 13:17 10/04/2014, Julien Riou wrote:

> >
> > Message: 11
> > Date: Wed, 09 Apr 2014 18:39:30 +0100
> > From: Michael Dewey <[hidden email]>
> > To: Julien Riou <[hidden email]>, [hidden email]
> > Subject: Re: [R] Meta-analysis of prevalence at the country level with
> >         mgcv/gamm4
> > Message-ID: <[hidden email]>
> > Content-Type: text/plain; charset="us-ascii"; format=flowed
> >
>
>
>Hi Michael,
>
>Thank you for taking the time to help me.
>
>  So the first UK study with a median age of 25 is going to be used to
> > estimate prevalence over a range of ages? You are going to have to
> > make some very strong assumptions here which I personally would not
> > want to make.
> >
>
>I'm a little confused by this. In my understanding, the mixed-effects model
>does not do that. The slope of the relation between age and prevalence will
>be estimated from the full pool of studies, and the country-level random
>intercept will be estimated from all studies in the country. So the
>assumption here is that the relation between age and incidence is the same
>in every country, which is quite reasonable. Of course, there will be more
>uncertainty with the estimation of the random intercept if there is few
>studies in a country, or if there is a strong inter-study variance in a
>country. This will influence the confidence interval of the random
>intercept, and so the CI of the predicted prevalence for this country.

Your studies are ecological. You are estimating the relationship
between prevalence and being in a study of median age X which is not
necessarily the same as the relationship between prevalence and being
a person of age X.



>Is there any possibility that in the real dataset you can fit your
> > model to those studies which do provide age-specific prevalences and
> > then use that to impute?
> >
> > You do not say when these studies were published but I would ask the
> > authors of the primary studies if they can make the information
> > available to you. You may have already done that of course. I referee
> > quite a few papers on systematic reviews and my impression is that
> > some authors are amenable to doing the work for you. You mileage may
> > vary of course.
> >
>
>Yes, it would be easier to have prevalence for age subgroups of studies,
>but we did not have access to that information for most studies even after
>contacting the authors.
>
>
> > >*Standard random-effect meta-analysis* with package meta.
> > >
> > >I used metaprop() to get a first estimate of the prevalence in each
> > country
> > >without taking age into account, and to obtain weights. As expected,
> > >heterogeneity was very high, so I used weights from the random-effects
> > >model.
> >
> > Which will be nearly equal and so hardly worth using in my opinion
> > but again your mileage may vary.
> >
>
>The weights from the random-effects method were actually far from equals,
>as sample size was highly variable between studies. With the RE method,
>small studies have much more impact.
>
>
> > I am afraid that is the way with systematic reviews, you can only
> > synthesise what you find, not what you would like to have found.
> > Anyone who has done a review will sympathise with you, not that that
> > is any consolation.
> >
>
>I'm not sure I'm following your point. My objective is to synthesise the
>included studies, while taking the age factor into account, since it is
>strongly linked to prevalence and very heterogeneous. The alternative is to
>only include studies with low median age, but I would lose a lot of
>information.
>
>Thank you again,
>Julien
>
> >
> >
>
>         [[alternative HTML version deleted]]

Michael Dewey
[hidden email]
http://www.aghmed.fsnet.co.uk/home.html

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Meta-analysis of prevalence at the country level with mgcv/gamm4

Julien Riou
Hi Michael,

Thank you for your input. Actually the idea of using the model to estimate
a number of sick people by age group came later in the project, and I
missed the issue with the ecological nature of the analysis.

Initially, I intended to use the model to predict a single prevalence for
each country at the median age of the country population. Should I go back
to this approach? If I'm not mistaken, this could be interpreted as
age-adjusted pooled prevalence at the country level. In this case, should I
use the model with the study-level random effect?

Tkanks again for your time,
Julien





On 11 April 2014 13:15, Michael Dewey <[hidden email]> wrote:

> At 13:17 10/04/2014, Julien Riou wrote:
>
>> >
>> > Message: 11
>> > Date: Wed, 09 Apr 2014 18:39:30 +0100
>> > From: Michael Dewey <[hidden email]>
>> > To: Julien Riou <[hidden email]>, [hidden email]
>> > Subject: Re: [R] Meta-analysis of prevalence at the country level with
>> >         mgcv/gamm4
>> > Message-ID: <[hidden email]>
>> > Content-Type: text/plain; charset="us-ascii"; format=flowed
>> >
>>
>>
>> Hi Michael,
>>
>> Thank you for taking the time to help me.
>>
>>  So the first UK study with a median age of 25 is going to be used to
>> > estimate prevalence over a range of ages? You are going to have to
>> > make some very strong assumptions here which I personally would not
>> > want to make.
>> >
>>
>> I'm a little confused by this. In my understanding, the mixed-effects
>> model
>> does not do that. The slope of the relation between age and prevalence
>> will
>> be estimated from the full pool of studies, and the country-level random
>> intercept will be estimated from all studies in the country. So the
>> assumption here is that the relation between age and incidence is the same
>> in every country, which is quite reasonable. Of course, there will be more
>> uncertainty with the estimation of the random intercept if there is few
>> studies in a country, or if there is a strong inter-study variance in a
>> country. This will influence the confidence interval of the random
>> intercept, and so the CI of the predicted prevalence for this country.
>>
>
> Your studies are ecological. You are estimating the relationship between
> prevalence and being in a study of median age X which is not necessarily
> the same as the relationship between prevalence and being a person of age X.
>
>
>
>  Is there any possibility that in the real dataset you can fit your
>> > model to those studies which do provide age-specific prevalences and
>> > then use that to impute?
>> >
>> > You do not say when these studies were published but I would ask the
>> > authors of the primary studies if they can make the information
>> > available to you. You may have already done that of course. I referee
>> > quite a few papers on systematic reviews and my impression is that
>> > some authors are amenable to doing the work for you. You mileage may
>> > vary of course.
>> >
>>
>> Yes, it would be easier to have prevalence for age subgroups of studies,
>> but we did not have access to that information for most studies even after
>> contacting the authors.
>>
>>
>> > >*Standard random-effect meta-analysis* with package meta.
>> > >
>> > >I used metaprop() to get a first estimate of the prevalence in each
>> > country
>> > >without taking age into account, and to obtain weights. As expected,
>> > >heterogeneity was very high, so I used weights from the random-effects
>> > >model.
>> >
>> > Which will be nearly equal and so hardly worth using in my opinion
>> > but again your mileage may vary.
>> >
>>
>> The weights from the random-effects method were actually far from equals,
>> as sample size was highly variable between studies. With the RE method,
>> small studies have much more impact.
>>
>>
>> > I am afraid that is the way with systematic reviews, you can only
>> > synthesise what you find, not what you would like to have found.
>> > Anyone who has done a review will sympathise with you, not that that
>> > is any consolation.
>> >
>>
>> I'm not sure I'm following your point. My objective is to synthesise the
>> included studies, while taking the age factor into account, since it is
>> strongly linked to prevalence and very heterogeneous. The alternative is
>> to
>> only include studies with low median age, but I would lose a lot of
>> information.
>>
>> Thank you again,
>> Julien
>>
>> >
>> >
>>
>>         [[alternative HTML version deleted]]
>>
>
> Michael Dewey
> [hidden email]
> http://www.aghmed.fsnet.co.uk/home.html
>
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.