Dear R community,

I'm working on a meta-analysis of prevalence data. The aim is to get

estimates of prevalence at the country level. The main issue is that the

disease is highly correlated with age, and the sample ages of included

studies are highly heterogeneous. Only median age is available for most

studies, so I can't use SMR-like tricks. I figured I could use

meta-regression to solve this, including age as a fixed-effect and

introducing study-level and country-level random-effects.

The idea (that I took from Fowkes et

al<

http://www.thelancet.com/journals/lancet/article/PIIS0140-6736%2813%2961249-0/abstract>)

was to use this model to make country-specific predictions of prevalence

for each 5-year age group from 15 to 60 (using the median age of the

group), and to apply these predictions to the actual population size of

each of those groups in the selected country, in order to obtain total

infected population and to calculate age-adjusted prevalence in the 15-60

population from that.

I tried several ways to do this using R with packages meta and mgcv. I got

some satisfying results, but I'm not that confident with my results and

would appreciate some feedback.

First is some simulated data, then the description of my different

approaches:

data<-data.frame(id_study=c("UK1","UK2","UK3","FRA1","FRA2","BEL1","GER1","GER2","GER3"),

country=c("UK","UK","UK","FRANCE","FRANCE","BELGIUM","GERMANY","GERMANY","GERMANY"),

n_events=c(91,49,18,10,50,6,9,10,22),

n_total=c(3041,580,252,480,887,256,400,206,300),

study_median_age=c(25,50,58,30,42,26,27,28,36))

*Standard random-effect meta-analysis* with package meta.

I used metaprop() to get a first estimate of the prevalence in each country

without taking age into account, and to obtain weights. As expected,

heterogeneity was very high, so I used weights from the random-effects

model.

meta <- metaprop(event=n_events,n=n_total,byvar=country,sm="PLOGIT",method.tau="REML",data=data)

summary(meta)

data$weight<-meta$w.random

I used meta to get a first estimate of the prevalence without taking age

into account, and to obtain weights. As expected, heterogeneity was very

high, so I used weights from the random-effects model.

*Generalized additive model* to include age with package mgcv.

The gam() model parameters (k and sp) were chosen using BIC and GCV number

(not shown here).

model <- gam( cbind(n_events,n_total-n_events) ~

s(study_median_age,bs="cr",k=4,sp=2) + s(country,bs="re"),

weights=weight, data=data, family="binomial"(link=logit),

method="REML")

plot(model,pages=1,residuals=T, all.terms=T, shade=T)

Predictions for each age group were obtained from this model as explained

earlier. CI were obtained directly using predict.gam(), that uses the

Bayesian posterior covariance matrix of the parameters. For exemple

considering UK:

newdat<-data.frame(country="UK",study_median_age=seq(17,57,5))

link<-predict(model,newdat,type="link",se.fit=T)$fit

linkse<-predict(model,newdat,type="link",se.fit=T)$se

newdat$prev<-model$family$linkinv(link)

newdat$CIinf<-model$family$linkinv(link-1.96*linkse)

newdat$CIsup<-model$family$linkinv(link+1.96*linkse)

plot(newdat$prev~newdat$study_median_age, type="l",ylim=c(0,.12))

lines(newdat$CIinf~newdat$study_median_age, lty=2)

lines(newdat$CIsup~newdat$study_median_age, lty=2)

The results were satisfying, representing the augmentation of the

prevalence with advanced age, with coherent confidence intervals. I

obtained a total prevalence for the country using the country population

structure (not shown, I hope it is clear enough).

However, I figured I needed to include study-level random-effects since

there was a high heterogeneity (even though I did not calculate

heterogeneity after the meta-regression).

*Introducing study-level random-effect* with package gamm4.

Since mgcv models can't handle that much random-effect parameters, I had to

switch to gamm4.

model2 <- gamm4(cbind(n_events,n_total-n_events) ~

s(study_median_age,bs="cr",k=4) + s(country,bs="re"),

random=~(1|id_study), data=data, weights=weight,

family="binomial"(link=logit))

plot(model2$gam,pages=1,residuals=T, all.terms=T, shade=T)

link<-predict(model2$gam,newdat,type="link",se.fit=T)$fit

linkse<-predict(model2$gam,newdat,type="link",se.fit=T)$se

newdat$prev2<-model$family$linkinv(link)

newdat$CIinf2<-model$family$linkinv(link-1.96*linkse)

newdat$CIsup2<-model$family$linkinv(link+1.96*linkse)

plot(newdat$prev2~newdat$study_median_age, type="l",col="red",ylim=c(0,0.11))

lines(newdat$CIinf2~newdat$study_median_age, lty=2,col="red")

lines(newdat$CIsup2~newdat$study_median_age, lty=2,col="red")

lines(newdat$prev~newdat$study_median_age, type="l",ylim=c(0,.12))

lines(newdat$CIinf~newdat$study_median_age, lty=2)

lines(newdat$CIsup~newdat$study_median_age, lty=2)

Since the study-level random effect was in the mer part of the fit, I

didn't have to handle it.

As you can see, I obtain rather different results, with a much smoother

relation between age and prevalence, and quite different confidence

intervals. It is even more different in the full-data analysis, where the

CI are much wider in the model including study-level RE, to the point it is

sometimes almost uninformative (prevalence between 0 and 15%, but if it is

the way it is...). Moreover, the study-level RE model seems to be more

stable when outliers are excluded.

*So, my questions are:*

- Did I properly extract the weights from the metaprop() function and

used them further?

- Did I properly built my gam() and gamm4() models? I read a lot about

this, but I'm not used to this king of models.

- Which of these models should I use?

I would really appreciate some help, since neither my teachers nor my

colleagues could. It was a really harsh to conduct the systematic review,

and very frustrating to struggle with the analysis... Thank you in advance!

Julien

[[alternative HTML version deleted]]

______________________________________________

[hidden email] mailing list

https://stat.ethz.ch/mailman/listinfo/r-helpPLEASE do read the posting guide

http://www.R-project.org/posting-guide.htmland provide commented, minimal, self-contained, reproducible code.