Multiple logistic regression

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Multiple logistic regression

Stephanie Delalieux
Dear R-users,

Is there a function in R that classifies data in more than 2 groups using
logistic regression/classification? I want to compare the c-indices of
earlier research (lrm, binary response variables) with new c-indices
obtained from 'multiple' (more response variables) logistic regression.

Best regards,
Stephanie Delalieux

Department Biosystems
M³-BIORES
Group of Geomatics Engineering
[hidden email]


Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Reply | Threaded
Open this post in threaded view
|

Re: Multiple logistic regression

Wincent
Do you mean multinomial logistic model?
If it is,the  multinom function in nnet package and multinomial
function in VGAM(http://www.stat.auckland.ac.nz/~yee) package can do
it.

But I have no idea about the c-index.

2006/3/8, Stephanie Delalieux <[hidden email]>:

> Dear R-users,
>
> Is there a function in R that classifies data in more than 2 groups using
> logistic regression/classification? I want to compare the c-indices of
> earlier research (lrm, binary response variables) with new c-indices
> obtained from 'multiple' (more response variables) logistic regression.
>
> Best regards,
> Stephanie Delalieux
>
> Department Biosystems
> M³-BIORES
> Group of Geomatics Engineering
> [hidden email]
>
>
> Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>


--
黄荣贵
Deparment of Sociology
Fudan University

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Reply | Threaded
Open this post in threaded view
|

Re: Multiple logistic regression

Frank Harrell
In reply to this post by Stephanie Delalieux
Stephanie Delalieux wrote:

> Dear R-users,
>
> Is there a function in R that classifies data in more than 2 groups using
> logistic regression/classification? I want to compare the c-indices of
> earlier research (lrm, binary response variables) with new c-indices
> obtained from 'multiple' (more response variables) logistic regression.
>
> Best regards,
> Stephanie Delalieux
>
> Department Biosystems
> M³-BIORES
> Group of Geomatics Engineering
> [hidden email]

There are many functions available but the answer depends on whether the
response variable has a natural ordering.  Whether it does or not,
though, C-indexes may decrease from the binary case because predicting
more categories is a more difficult task.

Frank Harrell

>
>
> Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>


--
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Frank Harrell
Department of Biostatistics, Vanderbilt University
Reply | Threaded
Open this post in threaded view
|

Re: Multiple logistic regression

Antti Arppe
In reply to this post by Stephanie Delalieux

In addition to the multinom(nnet) function mentioned below there is
some literature on how one can divide such polytomous problems into an
set of dichotomous classifications and then aggregate the results,
e.g.:

1) one-vs-all
2) pairwise comparisons (aka [double] round-robin) (Führnkranz)
3) nested dichotomies
3) ensembles of nested dichotomies (aka ENDs) (Frank & Kramer)

The article by Eibe Frank & Stefan Kramer,

    Ensembles of nested dichotomies for multi-class problems
    http://wwwkramer.in.tum.de/kramer/frankkramer_icml04.pdf

firstly gives an concise overview of the various above strategies and
compares their performance, arguing for the use of the method they
have themselves devised, i.e. ENDs, and secondly provides references
for articles describing the other methods in detail (e.g. Führnkranz).
The strategies mentioned above have the advantage that they do not
have a default class, in contrast to the multinom function.

Another question is whether any of these strategies have been
implemented in a publicly avaiblable library? At least my recent
cursory searches in the R-help archives and with help.search("...")
have not produced any tangible results. I've managed to concoct a set
of R-functions which crudely implement the strategies 1) one-vs-all
and 2) pairwise comparisons, which I attach below. They are probably
too much geared to my own research question and cut a few too many
corners to be used more generally without substantial modification,
and they could most probably be implemented in a more elegant manner,
but they might nevertheless be of some inspiration.

Having hacked these solutions on my own it would be all too typical
that some of the above multilevel classification strategies have in
fact already been in implemented in an available library. So, is
anyone on this list aware of such functions/libraries?

Regards,

  -Antti Arppe

======================================================================
Antti Arppe - Master of Science (Engineering)
Researcher & doctoral student (Linguistics)
E-mail: [hidden email]
WWW: http://www.ling.helsinki.fi/~aarppe

> >  13. Multiple logistic regression (Stephanie Delalieux)
> > Date: Wed, 8 Mar 2006 14:15:58 +0100
> > From: "Stephanie Delalieux" <[hidden email]>
> > Subject: [R] Multiple logistic regression
> > To: <[hidden email]>
> >
> >        Is there a function in R that classifies data in more than
> > 2 groups using logistic regression/classification? I want to
> > compare the c-indices of earlier research (lrm, binary response
> > variables) with new c-indices obtained from 'multiple' (more
> > response variables) logistic regression.
> Message: 23
> Date: Wed, 8 Mar 2006 22:26:24 +0800
> From: ronggui <[hidden email]>
> Subject: Re: [R] Multiple logistic regression
> To: "Stephanie Delalieux" <[hidden email]>
> Cc: [hidden email]
>
> Do you mean multinomial logistic model?
> If it is,the  multinom function in nnet package and multinomial
> function in VGAM(http://www.stat.auckland.ac.nz/~yee) package can do
> it.
8-----

1) dat: data (with the first column containing the multiclass variable
which is being predicted)

2) fn: predictor variables as a string, e.g. fn <- "A + B + C". In
this implementation, the predictor variables are assumed to be logical
(and thus binary); therefore, the GLM model family=binomial, and
should be changed if the data is of another sort.

3) lex: list with multiple classes being predicted, e.g.
lex <- c("a", "b", "c", "d")

4) freq: a Nx1 vector mapping frequency order of predicted classes to
their actual order in (3) lex, needed for the double-round method for
determining ties (-> alternative with the highest frequency selected)

5) teach.test.ratio: a list of length(2) indicating the proportions of
the data to be used for teaching the models and testing,
e.g. c(1,1) -> 50% teach vs. 50% testing, c(2,1) -> 66.6% vs. 33.3%

6) iter: number of iteration rounds in evaluating the accuracy of
classication performance

7) classifier: either 'double.round.robin' or 'one.vs.all'

repeated.tests <- function(dat,fn,lex,freq,teach.test.ratio=c(1,1),iter=1,hold.out=FALSE,classifier="double.round.robin", ...)
{ n.tot = nrow(dat);
   if(length(teach.test.ratio)==2)
     n.teach=round(teach.test.ratio[1]*n.tot/sum(teach.test.ratio));
   n.test = n.tot - n.teach; nlex <- length(lex);
   success <- matrix(c(n.teach,round(n.teach*100/n.tot,2),n.test,round(n.test*100/n.tot,2),0,0,0),iter,7,byrow=TRUE);
   colnames(success) <- c("Teach","%","Test","%","Success","%","tau (Kendall)");
   test.lx <- matrix(0,iter,nlex);
   colnames(test.lx) <- lex;
   success.lx <- guess.lx <- test.lx;
   for(i in 1:iter)
      { selected <- sample(seq(1:n.tot),n.teach,replace=hold.out);
        teach <- dat[selected,];
        test <- dat[-selected,];
        result <- switch(classifier,
          "double.round.robin" = double.round.robin(teach,test,fn,lex,freq),
          "one.vs.all" = one.vs.all(teach,test,fn,lex));
        for(j in 1:n.test)
           { test.lx[i,pos(result[j,1],lex)] <- test.lx[i,pos(result[j,1],lex)]+1;
             guess.lx[i,pos(result[j,2],lex)] <- guess.lx[i,pos(result[j,2],lex)]+1;
             if(result[j,1]==result[j,2])
               { success[i,5]=success[i,5]+1;
                 success.lx[i,pos(result[j,1],lex)] <- success.lx[i, pos(result[j,1],lex)]+1;
               };
           };
        success[i,6]=round(success[i,5]*100/n.test,2);
        success[i,7] <- cor(result[,1],result[,2],method="kendall");
      };
   stats <- matrix(0,3,2);
   colnames(stats) <- c("Recall.Total", "Recall.Total.%");
   rownames(stats) <- c("Mean", "Std.Dev", "tau (Kendall)");
   stats[1,1] <- round(mean(success[,5]),1);
   stats[1,2] <- round(mean(success[,6]),2);
   stats[2,1] <- round(sd(success[,5]),1);
   stats[2,2] <- round(sd(success[,6]),2);
   stats[3,1] <- mean(success[,7]);
   stats[3,2] <- sd(success[,7]);
   stats.lx <- matrix(0,nlex,8);
   rownames(stats.lx) <- lex;
   colnames(stats.lx) <- c("Test.Mean", "Test/All.%", "Recall.Mean", "Recall.%", "Recall.Std.Dev", "Recall.Std.Dev.%", "Precision.Mean", "Precision.%");
   for(i in 1:nlex)
      { stats.lx[i,1] <- round(mean(test.lx[,i]),1);
        stats.lx[i,2] <- round(mean(test.lx[,i])*100/n.test,2);
        stats.lx[i,3] <- round(mean(success.lx[,i]),1);
        stats.lx[i,4] <- round(mean(success.lx[,i]/test.lx[,i])*100,2);
        stats.lx[i,5] <- round(sd(success.lx[,i]),1);
        stats.lx[i,6] <- round(sd(success.lx[,i]/test.lx[,i])*100,2);
        stats.lx[i,7] <- round(mean(guess.lx[,i]),1);
        stats.lx[i,8] <- round(mean(success.lx[,i]/guess.lx[,i]*100),2);
      }
   return(stats, success, stats.lx, test.lx, guess.lx, success.lx);
}

double.round.robin <- function(teach, test, fn, lex, freq, ...)
{ nlex=length(lex);
   preds <- prediction.matrix.pairwise(teach, test, fn, lex);
   npreds <- nrow(preds);
   comps <- cbind(test[,1],test[,1]);
   for(k in 1:npreds)
      { votes <- matrix(0,nlex);
        wins <- matrix(FALSE,nlex);
        nwins=0;
        for(i in 1:nlex)
           for(j in 1:nlex)
              if(i!=j)
                { if(j>=i) d=j-1 else d=j;
                  if(preds[k,(i-1)*(nlex-1)+d]>.5)
                    votes[i]=votes[i]+1
                  else
                    votes[j]=votes[j]+1;
                };
        for(i in 1:nlex)
           if(votes[i]==max(votes))
             { wins[i]=TRUE; nwins=nwins+1; };
        comps[k,1]<-lex[test[k,1]]; hit=FALSE;
        for(i in 1:nlex)
           if(wins[freq[i]]==TRUE && hit==FALSE)
             { comps[k,2]<-lex[freq[i]]; hit=TRUE; };
      };
return(comps);
}

one.vs.all <- function(teach, test, fn, lex, ...)
{ nlex=length(lex);
   preds <- prediction.matrix.one.vs.all(teach, test, fn, lex);
   npreds <- nrow(preds);
   comps <- matrix("",nrow(test),2);
   for(k in 1:npreds)
      { comps[k,1] <- lex[test[k,1]];
        comps[k,2] <- lex[which.max(preds[k,])];
      }
   return(comps);
}

prediction.matrix.pairwise <- function(teach, test, fn, lex, ...)
{ nlex <- length(lex);
   pred <- matrix(,dim(test)[1],0);
   for(i in 1:nlex)
      for(j in 1:nlex)
         if(lex[i]!=lex[j])
           { teach.glm <- glm.pairwise(teach, fn, lex[i], lex[j]);
             test.predict <- matrix(predict(teach.glm, newdata=test, type="response"),,1);
             colnames(test.predict) <- paste(c(lex[i], lex[j]), collapse="_");
             pred <- cbind(pred,test.predict);
           };
   return(pred);
}

prediction.matrix.one.vs.all <- function(teach, test, fn, lex, ...)
{ nlex <- length(lex);
   pred <- matrix(,nrow(test),0);
   for(i in 1:nlex)
      { teach.glm <- glm.one.vs.all(teach, fn, lex[i]);
        test.predict <- matrix(predict(teach.glm, newdata=test, type="response"),,1);
        colnames(test.predict) <- lex[i];
        pred <- cbind(pred,test.predict);
      };
   return(pred);
}

glm.pairwise <- function(dat,fn,lex1,lex2,...)
{ attach(dat);
   f <- as.formula(paste(c(lex1,fn),collapse=" ~ "));
   glm(formula = f,
   subset = (dat[lex1]==TRUE | dat[lex2]==TRUE),
   family=binomial)
}

glm.one.vs.all <- function(dat,fn,lex1,...)
{ attach(dat);
   f <- as.formula(paste(c(lex1,fn),collapse=" ~ "));
   glm(formula = f, family=binomial)
}

pos <- function (w,lex)
{ for(i in 1:length(lex))
      if(lex[i]==w) p=i;
   return(p);
}
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html