Multinomial Logit Model with lots of Dummy Variables

ghpow1
Hi All,

I am attempting to build a Multinomial Logit model with dummy variables of the following form:

Dependent Variable : 0-8 Discrete Choices

Dummy Variable 1: 965 dummy vars
Dummy Variable 2: 805 dummy vars

The data set I am using has the dummy columns pre-created, so it's a table of 72,381 rows and 1770 columns.

The first 965 columns represent the dummy columns for Variable 1
The next 805 columns represent the dummy columns for Variable 2

My code to build the mlogit model is below. Is there a better way of doing this without writing out these huge formulas? (I probably also need a more powerful PC to do all of this.)

I'll also want to perform a joint test of significance on the first 805 coefficients...

Is this possible?

Thanks

GP

[code]

#load the mlogit package
library(mlogit)

#load mydata
mydata<-read.csv(file="G:\\data.csv",header=TRUE)

num.rows=nrow(mydata)
num.cols=965+805+1

#one row per observation: 965 + 805 dummy columns, then the dependent variable
my_data=matrix(0,nrow=num.rows,ncol=num.cols)

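for(i in 1:num.rows) {

        #columns 2 and 3 of mydata are used as dummy column indices
        nb=mydata[i,2]
        np=mydata[i,3]

        my_data[i,nb]=1
        my_data[i,965+np]=1
        #dependent variable goes in the last column
        my_data[i,1+1770]=mydata[i,1]

}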

#convert matrix to data.frame
my_data_frame<-as.data.frame(my_data)

#check data frame headers
head(my_data_frame)

#load dataframe into mldata with choice variable
mldata<-mlogit.data(my_data_frame, varying=NULL, choice="V1771", shape="wide")

#V1771 = dependent var
#V1-V965 = variable 1 dummies
#V966-V1770 = variable 2 dummies

#regress V1771 against all 1770 variables...
mlogit.model<-mlogit(V1771~0|V1+V2+V3...+V1770,data=mldata, reflevel="0")


[/code]
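
One way to avoid typing out a formula with 1,770 terms is to assemble it as a string and convert it with as.formula(). The sketch below uses only base R and assumes the column names V1 to V1770 and V1771 produced by as.data.frame() in the code above. The names n.dummies, rhs and big.formula are made up for this illustration, and the commented mlogit() call simply mirrors the original; it was not run here.

```r
# Build "V1771 ~ 0 | V1 + V2 + ... + V1770" without typing it out.
# Assumption: predictors are named V1..V1770 and the choice is V1771.
n.dummies <- 965 + 805
rhs <- paste(paste0("V", seq_len(n.dummies)), collapse = " + ")
big.formula <- as.formula(paste("V1771 ~ 0 |", rhs))

# Sanity check: 1770 predictors plus the response
length(all.vars(big.formula))

# The formula object could then be passed on in place of the typed-out one:
# mlogit.model <- mlogit(big.formula, data = mldata, reflevel = "0")
```

This only shortens the code. The model still has the same number of coefficients, so it does nothing about memory or run time.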

Re: Multinomial Logit Model with lots of Dummy Variables

jthetzel
If you are just looking to collapse the dummy variables into two factor
variables, the following will work.

## Generate some example data
set.seed(1234)
n <- 100
# Generate outcome
outcome <- rbinom(n, 3, 0.5)

# Generate dummy variables for A and B (exactly one 1 in each row)
A <- t(apply(matrix(nrow = n, ncol = 5), 1, function(x)
{
  sample(c(1, 0, 0, 0, 0))
}))
B <- t(apply(matrix(nrow = n, ncol = 5), 1, function(x)
{
  sample(c(1, 0, 0, 0, 0))
}))

# Combine into data frame
dat <- data.frame(outcome, A, B)
names(dat) <- c("outcome", paste("A", 1:5, sep = ""), paste("B", 1:5, sep = ""))
head(dat)


## Collapse dummies to factor variable
# apply() passes each row as a named vector; keep the name of the column equal to 1
A <- apply(dat, 1, function(x)
{
  A <- x[2:6]
  A.names <- names(x[2:6])
  A.value <- A.names[A == 1]
  return(A.value)
})

B <- apply(dat, 1, function(x)
{
  B <- x[7:11]
  B.names <- names(x[7:11])
  B.value <- B.names[B == 1]
  return(B.value)
})

# Combine into new data frame, with A and B stored as factors
dat.new <- data.frame(outcome = dat$outcome, A = factor(A), B = factor(B))

head(dat.new)
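
The row-wise apply() calls above may be slow on a table with tens of thousands of rows. A vectorised alternative is sketched below using base R's max.col(). The helper names onehot and collapse are made up for this illustration, and the toy data is regenerated so that the snippet runs on its own.

```r
# Self-contained toy data: one outcome and two sets of 5 one-hot dummies
set.seed(1234)
n <- 100
onehot <- function(n, k) {
  # pick one of k levels per row and return the matching rows of an identity matrix
  diag(k)[sample.int(k, n, replace = TRUE), , drop = FALSE]
}
dat <- data.frame(outcome = rbinom(n, 3, 0.5), onehot(n, 5), onehot(n, 5))
names(dat) <- c("outcome", paste0("A", 1:5), paste0("B", 1:5))

# max.col() returns, for each row, the index of the column holding the maximum.
# For a one-hot row that is the position of the single 1.
collapse <- function(d) {
  idx <- max.col(as.matrix(d), ties.method = "first")
  factor(names(d)[idx], levels = names(d))
}

dat.new <- data.frame(outcome = dat$outcome,
                      A = collapse(dat[, 2:6]),
                      B = collapse(dat[, 7:11]))
head(dat.new)
```

This relies on every row having exactly one 1 within each block of dummies. A row of all zeros would be silently assigned to the first level, so it is worth checking rowSums() on each block first.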



Jeremy


Jeremy T. Hetzel
Boston University

Re: Multinomial Logit Model with lots of Dummy Variables

ghpow1
Hi

Thanks to Jeremy for his response...

Using his code, I have been able to generate the factors and then build the mlogit data:

mldata<-mlogit.data(mydata, varying=NULL, choice="pitch_type_1", shape="wide")

My mlogit data looks like:

"dependent_var","A variable","B Var","chid","alt"
FALSE,"110","19",1,"0"
FALSE,"110","19",1,"1"
FALSE,"110","19",1,"2"
FALSE,"110","19",1,"3"
FALSE,"110","19",1,"4"
TRUE,"110","19",1,"5"
FALSE,"110","19",1,"6"
FALSE,"110","19",1,"7"
FALSE,"110","19",1,"8"
FALSE,"110","19",2,"0"
FALSE,"110","19",2,"1"
FALSE,"110","19",2,"2"
FALSE,"110","19",2,"3"
FALSE,"110","19",2,"4"
FALSE,"110","19",2,"5"
TRUE,"110","19",2,"6"
FALSE,"110","19",2,"7"
FALSE,"110","19",2,"8"
TRUE,"110","561",3,"0"
FALSE,"110","561",3,"1"
FALSE,"110","561",3,"2"
FALSE,"110","561",3,"3"
FALSE,"110","561",3,"4"
FALSE,"110","561",3,"5"
FALSE,"110","561",3,"6"
FALSE,"110","561",3,"7"
FALSE,"110","561",3,"8"
FALSE,"110","149",4,"0"
FALSE,"110","149",4,"1"
TRUE,"110","149",4,"2"

...

mldata contains 651431 rows.

If I try to fit the model on this full data set, I get the following error:


> mlogit.model<- mlogit(dependent_var~0|A+B, data = mldata, reflevel="0")
Error in model.matrix.default(formula, data) :
  allocMatrix: too many elements specified
Calls: mlogit ... model.matrix.mFormula -> model.matrix -> model.matrix.default
Execution halted

With smaller datasets (595 mldata rows), mlogit works fine and generates regression output.

Is there a problem with mlogit and huge datasets?  

I suppose this is perhaps not the best way to assess this kind of data, but I am trying to replicate a previous analysis that was completed on a similar amount of similar data.
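
A rough size check suggests why the full data set fails while the small one works. The sketch below assumes that mlogit expands each individual-specific term into one column per non-reference alternative, that A and B have 965 and 805 levels as described in the first post, and that this build of R caps a single matrix at .Machine$integer.max elements, which is what the allocMatrix message points to. None of this was checked against the mlogit source, and the variable names are made up for the illustration.

```r
n.rows <- 651431                        # rows of mldata reported above
n.alts <- 9                             # alternatives 0-8

# Assumed level counts: 965 for A and 805 for B, treatment-coded
n.terms <- 1 + (965 - 1) + (805 - 1)    # intercept + dummies for A and B
n.cols  <- n.terms * (n.alts - 1)       # one set per non-reference alternative

n.elements <- n.rows * n.cols           # double arithmetic, so no integer overflow
n.elements
n.elements > .Machine$integer.max       # TRUE: more cells than one matrix may hold
n.elements * 8 / 2^30                   # approximate GiB as a dense double matrix
```

Under these assumptions the limit is hit because of the number of columns rather than the number of rows, so a faster PC would not help on its own. Reducing the number of factor levels, or the number of rows, would shrink the design matrix.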