unable to get bigglm working, ATTN: Thomas Lumley

stephenb
I am using an example posted in this help forum to work with a file. The head of the file looks like:
988887 2007-03-05 2007-06-01 90 3 5.450 205500.00 999.00 999.000 0.000 0 0
988887 2007-03-06 2007-06-01 90 3 5.450 205500.00 999.00 999.000 0.000 1 0
988887 2007-03-07 2007-06-01 90 3 5.450 205500.00 999.00 999.000 -0.100 2 0
988887 2007-03-08 2007-06-01 90 3 5.450 205500.00 999.00 999.000 -0.100 3 0
988887 2007-03-09 2007-06-01 90 3 5.450 205500.00 999.00 999.000 -0.100 4 0
988887 2007-03-12 2007-06-01 90 3 5.450 205500.00 999.00 999.000 -0.100 7 0
988887 2007-03-13 2007-06-01 90 3 5.450 205500.00 999.00 999.000 -0.100 8 0
988887 2007-03-14 2007-06-01 90 3 5.450 205500.00 999.00 999.000 -0.100 9 0
988887 2007-03-15 2007-06-01 90 3 5.450 205500.00 999.00 999.000 -0.100 10 0
988887 2007-03-16 2007-06-01 90 3 5.450 205500.00 999.00 999.000 -0.100 11 0

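The code is:
make.data <- function (filename, chunksize, ...) {
  conn <- NULL   # connection shared across calls through the closure
  function (reset=FALSE) {
    if (reset) {
      # start of a pass: close any open connection and reopen the file
      if (!is.null(conn)) {
        close(conn)
      }
      conn <<- file (description=filename, open="r")
    } else {
      # read the next chunk; colClasses and col.names arrive through ...
      rval <- read.table (conn, nrows=chunksize, sep=' ',
        skip=0, header=FALSE, ...)
      if (nrow(rval)==0) {
        # end of file: clean up and signal completion with NULL
        close(conn)
        conn <<- NULL
        rval <- NULL
      } else {
        rval$relage <- rval$loctime/rval$term
      }
      return(rval)
    }
  }
}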

a <- make.data ( filename = "G:/sqldata/newf4.csv", chunksize = 100000,
  colClasses = list ("NULL", "Date","Date", "integer", "factor",rep("numeric",5),rep("integer",2)),
  col.names = c("id","dt", "promdt","term", "termfac", "commintr","commbal","issuebal","intr","ri","loctime","resp")
)
library(biglm);

bigglm (formula = resp ~ poly(relage,2,raw=TRUE)+termfac+ri ,
  data = a, family = binomial(link='logit'));
###   output:
> bigglm (formula = resp ~ poly(relage,2,raw=TRUE)+termfac+ri ,
+   data = a, family = binomial(link='logit'));
Error in is(object, Class) :
  trying to get slot "className" from an object of a basic class ("list") with no slots
>

### the following can create a df, so the problem is not loading the data (maybe :-)
a <- read.table ( "G:/sqldata/newf4.csv", nrows= 500000, sep=' ',head=F,
  colClasses = c("NULL", "Date","Date","integer","factor",rep("numeric",5),rep("integer",2)),
  col.names = c("id","dt", "promdt","term", "termfac", "commintr","commbal","issuebal","intr","ri","loctime","resp")
)

Thanks everybody.

Re: unable to get bigglm working, ATTN: Thomas Lumley

Thomas Lumley


Actually, I think the problem *is* reading in the data.

If I try reading in your supplied lines of data with the read.table() arguments in your make.data() function, I get your error message.
> data<-read.table(tmp<-textConnection(
+ "988887 2007-03-05 2007-06-01 90 3 5.450 205500.00 999.00 999.000 0.000 0 0
+ 988887 2007-03-06 2007-06-01 90 3 5.450 205500.00 999.00 999.000 0.000 1 0
+ 988887 2007-03-07 2007-06-01 90 3 5.450 205500.00 999.00 999.000 -0.100 2 0
+ 988887 2007-03-08 2007-06-01 90 3 5.450 205500.00 999.00 999.000 -0.100 3 0
+ 988887 2007-03-09 2007-06-01 90 3 5.450 205500.00 999.00 999.000 -0.100 4 0
+ 988887 2007-03-12 2007-06-01 90 3 5.450 205500.00 999.00 999.000 -0.100 7 0
+ 988887 2007-03-13 2007-06-01 90 3 5.450 205500.00 999.00 999.000 -0.100 8 0
+ 988887 2007-03-14 2007-06-01 90 3 5.450 205500.00 999.00 999.000 -0.100 9 0
+ 988887 2007-03-15 2007-06-01 90 3 5.450 205500.00 999.00 999.000 -0.100 10 0
+ 988887 2007-03-16 2007-06-01 90 3 5.450 205500.00 999.00 999.000 -0.100 11 0
+ "), colClasses = list ("NULL", "Date","Date", "integer",
+ "factor",rep("numeric",5),rep("integer",2)),
+   col.names = c("id","dt", "promdt","term", "termfac",
+ "commintr","commbal","issuebal","intr","ri","loctime","resp")
+ )
Error in is(object, Class) :
   trying to get slot "className" from an object of a basic class ("list") with no slots

My guess is that the problem is that you used list() instead of c() to construct your colClasses argument.  In the code for reading the file that you didn't have a problem with, you used c().

    -thomas
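
A small illustration of the difference, offered as a sketch rather than anything taken from the original posts: list() keeps each argument as a separate element, so the rep() results stay nested, while c() flattens everything into the single character vector that read.table() expects for colClasses. The names cc_list, cc_vec, and dat are made up for this example.

```r
# colClasses built two ways from the same pieces
cc_list <- list("NULL", "Date", "Date", "integer", "factor",
                rep("numeric", 5), rep("integer", 2))
cc_vec  <- c("NULL", "Date", "Date", "integer", "factor",
             rep("numeric", 5), rep("integer", 2))

length(cc_list)        # 7 elements, two of which are themselves vectors
length(cc_vec)         # 12 strings, one per column in the file
is.character(cc_vec)   # TRUE

# Two of the posted rows, read with the character-vector form
tmp <- textConnection(c(
  "988887 2007-03-05 2007-06-01 90 3 5.450 205500.00 999.00 999.000 0.000 0 0",
  "988887 2007-03-06 2007-06-01 90 3 5.450 205500.00 999.00 999.000 0.000 1 0"))
dat <- read.table(tmp, sep = " ", header = FALSE, colClasses = cc_vec,
  col.names = c("id", "dt", "promdt", "term", "termfac", "commintr",
                "commbal", "issuebal", "intr", "ri", "loctime", "resp"))
close(tmp)   # the text connection was opened by us, so we close it

str(dat)     # the "NULL" column (id) is dropped; the other 11 remain
```

Checking length() and is.character() on the colClasses object before handing it to read.table() is a cheap way to catch this kind of slip.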



Thomas Lumley
Professor of Biostatistics
University of Washington, Seattle

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

RE: unable to get bigglm working, ATTN: Thomas Lumley

stephenb

Sincere apologies and many thanks. The code I pasted from the help forum contains the error, and I did not see it. When I wrote my own import I used “c”, as I should.

See:

http://r.789695.n4.nabble.com/R-Example-function-for-bigglm-biglm-data-input-from-file-td816496.html#a816496

 

Stephen Bond

| Senior Analyst | Treasury Analytics | 416-956-3092




Re: unable to get bigglm working, ATTN: Thomas Lumley

stephenb
In reply to this post by stephenb
After correcting the error spotted by Thomas, I tried again. It now runs, but there is no result after 1 hour.
Is there anything I can do to debug?
The dataset has 12 million rows, and SAS on a server produces results in 15 secs. I am running this on a PC with XP (hence a memory limit of 2.5 GB), fresh after a restart and with no other applications open.
Thank you.
Stephen
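
One possible way to see what a long run is doing, sketched here under assumptions and not suggested anywhere in the thread: since bigglm() only talks to the data through the reset/next-chunk function, that function can be wrapped so every call prints a timestamped progress line. The names trace.data and toy.data are hypothetical; toy.data is an in-memory stand-in for make.data() so the example runs without the original file.

```r
# Wrap a bigglm-style data function so each call reports progress.
trace.data <- function(datafun) {
  pass <- 0L; chunk <- 0L; rows <- 0
  function(reset = FALSE) {
    if (reset) {
      # each reset marks the start of another pass over the data
      pass <<- pass + 1L; chunk <<- 0L; rows <<- 0
      message(format(Sys.time(), "%H:%M:%S"), " starting pass ", pass)
      return(invisible(datafun(reset = TRUE)))
    }
    rval <- datafun(reset = FALSE)
    if (!is.null(rval)) {
      chunk <<- chunk + 1L; rows <<- rows + nrow(rval)
      message(format(Sys.time(), "%H:%M:%S"), " pass ", pass,
              ", chunk ", chunk, ", ", rows, " rows so far")
    }
    rval
  }
}

# Toy in-memory data function following the same reset/next-chunk protocol
toy.data <- function(df, chunksize) {
  pos <- 0
  function(reset = FALSE) {
    if (reset) { pos <<- 0; return(invisible(NULL)) }
    if (pos >= nrow(df)) return(NULL)
    idx <- seq(pos + 1, min(pos + chunksize, nrow(df)))
    pos <<- max(idx)
    df[idx, , drop = FALSE]
  }
}

# Drive the wrapped function by hand, the way a fitting routine would
b <- trace.data(toy.data(data.frame(x = 1:10), chunksize = 4))
b(reset = TRUE)
n.chunks <- 0
while (!is.null(piece <- b())) n.chunks <- n.chunks + 1
n.chunks
```

With the objects from this thread, the equivalent would be passing data = trace.data(a) to bigglm(). Driving a() by hand in the same loop, without any model, would also show how long a single pass over the 12 million rows takes on its own.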

Re: unable to get bigglm working, ATTN: Thomas Lumley

stephenb
In reply to this post by Thomas Lumley
The model fails to converge after more than 3 hours (I went home, so I don't know how long it took).

> bigglm (formula = resp ~ relage+I(relage^2)+termfac+ri ,
+   data = a, family = binomial(link='logit'));
Large data regression model: bigglm(formula = resp ~ relage + I(relage^2) + termfac + ri,
    data = a, family = binomial(link = "logit"))
Sample size =  12758187
failed to converge after 8  iterations
Warning message:
In bigglm.function(formula = resp ~ relage + I(relage^2) + termfac +  :
  ran out of iterations and failed to converge

SAS converges
NOTE: PROC LOGISTIC is modeling the probability that resp='1'.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: There were 12758187 observations read from the data set SRVRUSER.COMMIT.
NOTE: The data set WORK.OUT3 has 11 observations and 15 variables.
NOTE: PROCEDURE LOGISTIC used (Total process time):
      real time           2:25.42
      cpu time            1:16.79

I did not see a trace argument in bigglm. Is there another way to see what is happening?
Thank you.
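
The "8 iterations" in the output above matches the default maxit = 8 of bigglm(), which also accepts a start argument for initial coefficients. A minimal sketch of using both, assuming a pilot glm() fit on a small subsample gives reasonable starting values: the data here are simulated, termfac is left out for brevity, and the names d, pilot, and start.vals are invented for the example. Whether this helps on the real data is not something the thread establishes.

```r
set.seed(1)

# Simulated stand-in for the real file (an assumption, not the poster's data)
n <- 2000
d <- data.frame(relage = runif(n), ri = rnorm(n, sd = 0.1))
d$resp <- rbinom(n, 1, plogis(-1 + 2 * d$relage - d$relage^2 + d$ri))

# Pilot fit on a subsample that fits comfortably in memory
pilot <- glm(resp ~ relage + I(relage^2) + ri, data = d[1:500, ],
             family = binomial(link = "logit"))
start.vals <- coef(pilot)

# Hand the pilot estimates to bigglm() and allow more than 8 iterations.
# The coefficient order must match the model matrix bigglm() builds.
if (requireNamespace("biglm", quietly = TRUE)) {
  fit <- biglm::bigglm(resp ~ relage + I(relage^2) + ri, data = d,
                       family = binomial(link = "logit"),
                       maxit = 25, start = start.vals)
  print(coef(fit))
}
```

Each iteration is a full pass over the data, so starting closer to the solution also reduces the number of times the 12 million rows have to be reread.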

Re: unable to get bigglm working, ATTN: Thomas Lumley

stephenb
I decided to give it one more variable, which is strongly significant, to help the optimization, and it throws:

> bigglm (formula = resp ~ relage+relage2+termfac+ri+sn ,
+   data = a, family = binomial(link='logit'));
Error in bigglm.function(formula = resp ~ relage + relage2 + termfac +  :
  model matrices incompatible
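
The thread ends without a diagnosis. One frequent cause of chunk-to-chunk mismatches, offered here only as an assumption, is a factor column read separately for each chunk: a chunk that happens to lack a level produces a model matrix with fewer columns. The sketch below shows the effect with base R and one way to pin the levels; the level set in all.levels and the helper fix.levels are hypothetical.

```r
# Two chunks of the same file; the second happens to lack the level "12"
chunk1 <- data.frame(termfac = factor(c("3", "6", "12")), ri = c(0, -0.1, 0.2))
chunk2 <- data.frame(termfac = factor(c("3", "3", "6")),  ri = c(0.1, 0, -0.1))

cols.before <- c(ncol(model.matrix(~ termfac + ri, chunk1)),
                 ncol(model.matrix(~ termfac + ri, chunk2)))
cols.before   # the two counts differ

# Force the same levels in every chunk (hypothetical full level set)
all.levels <- c("3", "6", "12")
fix.levels <- function(chunk) {
  chunk$termfac <- factor(as.character(chunk$termfac), levels = all.levels)
  chunk
}

cols.after <- c(ncol(model.matrix(~ termfac + ri, fix.levels(chunk1))),
                ncol(model.matrix(~ termfac + ri, fix.levels(chunk2))))
cols.after    # identical once the levels are fixed
```

In make.data() the same relabelling could sit next to the line that computes rval$relage, applied to every factor column that enters the formula.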