|
|
I am using an example posted in this help forum to work with a file. The head of the file looks like:
988887 2007-03-05 2007-06-01 90 3 5.450 205500.00 999.00 999.000 0.000 0 0
988887 2007-03-06 2007-06-01 90 3 5.450 205500.00 999.00 999.000 0.000 1 0
988887 2007-03-07 2007-06-01 90 3 5.450 205500.00 999.00 999.000 -0.100 2 0
988887 2007-03-08 2007-06-01 90 3 5.450 205500.00 999.00 999.000 -0.100 3 0
988887 2007-03-09 2007-06-01 90 3 5.450 205500.00 999.00 999.000 -0.100 4 0
988887 2007-03-12 2007-06-01 90 3 5.450 205500.00 999.00 999.000 -0.100 7 0
988887 2007-03-13 2007-06-01 90 3 5.450 205500.00 999.00 999.000 -0.100 8 0
988887 2007-03-14 2007-06-01 90 3 5.450 205500.00 999.00 999.000 -0.100 9 0
988887 2007-03-15 2007-06-01 90 3 5.450 205500.00 999.00 999.000 -0.100 10 0
988887 2007-03-16 2007-06-01 90 3 5.450 205500.00 999.00 999.000 -0.100 11 0
The code is:
make.data <- function (filename, chunksize, ...) {
  conn <- NULL
  function (reset = FALSE) {
    if (reset) {
      if (!is.null(conn)) {
        close(conn)
      }
      conn <<- file(description = filename, open = "r")
    } else {
      rval <- read.table(conn, nrows = chunksize, sep = ' ',
                         skip = 0, header = FALSE, ...)
      if (nrow(rval) == 0) {
        close(conn)
        conn <<- NULL
        rval <- NULL
      } else {
        rval$relage <- rval$loctime / rval$term
      }
      return(rval)
    }
  }
}
a <- make.data(filename = "G:/sqldata/newf4.csv", chunksize = 100000,
  colClasses = list("NULL", "Date", "Date", "integer", "factor", rep("numeric", 5), rep("integer", 2)),
  col.names = c("id", "dt", "promdt", "term", "termfac", "commintr", "commbal", "issuebal", "intr", "ri", "loctime", "resp")
)
library(biglm)
bigglm(formula = resp ~ poly(relage, 2, raw = TRUE) + termfac + ri,
       data = a, family = binomial(link = 'logit'))
### output:
> bigglm (formula = resp ~ poly(relage,2,raw=TRUE)+termfac+ri ,
+ data = a, family = binomial(link='logit'));
Error in is(object, Class) :
trying to get slot "className" from an object of a basic class ("list") with no slots
>
### the following can create a df, so the problem is not loading the data (maybe :-)
a <- read.table("G:/sqldata/newf4.csv", nrows = 500000, sep = ' ', header = FALSE,
  colClasses = c("NULL", "Date", "Date", "integer", "factor", rep("numeric", 5), rep("integer", 2)),
  col.names = c("id", "dt", "promdt", "term", "termfac", "commintr", "commbal", "issuebal", "intr", "ri", "loctime", "resp")
)
Thanks everybody.
|
|
Actually, I think the problem *is* reading in the data.
If I try reading in your supplied lines of data with the read.table() arguments from your make.data() call, I get your error message:
> data<-read.table(tmp<-textConnection(
+ "988887 2007-03-05 2007-06-01 90 3 5.450 205500.00 999.00 999.000 0.000 0 0
+ 988887 2007-03-06 2007-06-01 90 3 5.450 205500.00 999.00 999.000 0.000 1 0
+ 988887 2007-03-07 2007-06-01 90 3 5.450 205500.00 999.00 999.000 -0.100 2 0
+ 988887 2007-03-08 2007-06-01 90 3 5.450 205500.00 999.00 999.000 -0.100 3 0
+ 988887 2007-03-09 2007-06-01 90 3 5.450 205500.00 999.00 999.000 -0.100 4 0
+ 988887 2007-03-12 2007-06-01 90 3 5.450 205500.00 999.00 999.000 -0.100 7 0
+ 988887 2007-03-13 2007-06-01 90 3 5.450 205500.00 999.00 999.000 -0.100 8 0
+ 988887 2007-03-14 2007-06-01 90 3 5.450 205500.00 999.00 999.000 -0.100 9 0
+ 988887 2007-03-15 2007-06-01 90 3 5.450 205500.00 999.00 999.000 -0.100 10 0
+ 988887 2007-03-16 2007-06-01 90 3 5.450 205500.00 999.00 999.000 -0.100 11 0
+ "), colClasses = list ("NULL", "Date","Date", "integer",
+ "factor",rep("numeric",5),rep("integer",2)),
+ col.names = c("id","dt", "promdt","term", "termfac",
+ "commintr","commbal","issuebal","intr","ri","loctime","resp")
+ )
Error in is(object, Class) :
trying to get slot "className" from an object of a basic class ("list") with no slots
My guess is that the problem is that you use list() instead of c() for constructing your colClasses argument. In the code for reading the file that you didn't have a problem with, you used c().
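To make the difference concrete, here is a minimal sketch (the names cc_list and cc_vec are hypothetical, chosen only for illustration). list() keeps each rep() result as a single nested element, so the specification has 7 elements for 12 columns, whereas c() flattens everything into one character vector with one class name per column:

```r
# Hypothetical names: cc_list and cc_vec are for illustration only.
cc_list <- list("NULL", "Date", "Date", "integer", "factor",
                rep("numeric", 5), rep("integer", 2))
cc_vec  <- c("NULL", "Date", "Date", "integer", "factor",
             rep("numeric", 5), rep("integer", 2))

length(cc_list)       # 7: the rep() results stay nested inside the list
length(cc_vec)        # 12: one class name per column
is.character(cc_vec)  # TRUE

# Reading two of the sample lines with the character vector works.
txt <- "988887 2007-03-05 2007-06-01 90 3 5.450 205500.00 999.00 999.000 0.000 0 0
988887 2007-03-06 2007-06-01 90 3 5.450 205500.00 999.00 999.000 0.000 1 0"

dat <- read.table(text = txt, sep = " ", header = FALSE,
                  colClasses = cc_vec,
                  col.names = c("id", "dt", "promdt", "term", "termfac",
                                "commintr", "commbal", "issuebal", "intr",
                                "ri", "loctime", "resp"))
str(dat)  # the "id" column is dropped because its class is "NULL"
```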
-thomas
On Fri, 2 Jul 2010, stephenb wrote:
> --
> View this message in context: http://r.789695.n4.nabble.com/unable-to-get-bigglm-working-ATTN-Thomas-Lumley-tp2276524p2276524.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
Thomas Lumley
Professor of Biostatistics
University of Washington, Seattle
|
|
Sincere apologies and many thanks. The code I pasted from the help forum contains the error, and I did not see it. When I wrote my own import I used c(), as I should have.
See:
http://r.789695.n4.nabble.com/R-Example-function-for-bigglm-biglm-data-input-from-file-td816496.html#a816496
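For reference, a corrected chunked reader might look like the sketch below. It keeps the same reset protocol as the code above and passes colClasses as a character vector built with c(). One deliberate substitution, which is my assumption and not part of the original example: lines are pulled with readLines() first so that the end of the file is detected explicitly, instead of calling read.table() on the connection directly. The temporary file and the chunk size of 2 are only there to make the demo self-contained.

```r
# Sketch of a chunked reader; make.data follows the reset protocol above.
# Assumption: readLines() + read.table(text = ...) replaces the original
# read.table(conn, ...) call so that end of file is detected explicitly.
make.data <- function(filename, chunksize, ...) {
  conn <- NULL
  function(reset = FALSE) {
    if (reset) {
      if (!is.null(conn)) close(conn)
      conn <<- file(description = filename, open = "r")
      return(invisible(NULL))
    }
    lines <- readLines(conn, n = chunksize)
    if (length(lines) == 0) {   # nothing left: signal the end with NULL
      close(conn)
      conn <<- NULL
      return(NULL)
    }
    rval <- read.table(text = lines, sep = " ", header = FALSE, ...)
    rval$relage <- rval$loctime / rval$term
    rval
  }
}

# Demo on a temporary file holding three of the sample lines.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "988887 2007-03-05 2007-06-01 90 3 5.450 205500.00 999.00 999.000 0.000 0 0",
  "988887 2007-03-06 2007-06-01 90 3 5.450 205500.00 999.00 999.000 0.000 1 0",
  "988887 2007-03-07 2007-06-01 90 3 5.450 205500.00 999.00 999.000 -0.100 2 0"
), tmp)

a <- make.data(tmp, chunksize = 2,
               colClasses = c("NULL", "Date", "Date", "integer", "factor",
                              rep("numeric", 5), rep("integer", 2)),
               col.names = c("id", "dt", "promdt", "term", "termfac",
                             "commintr", "commbal", "issuebal", "intr",
                             "ri", "loctime", "resp"))

a(reset = TRUE)  # open the file
chunk1 <- a()    # first two rows
chunk2 <- a()    # remaining row
chunk3 <- a()    # NULL: the data are exhausted
unlink(tmp)
```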
Stephen Bond
| Senior Analyst | Treasury Analytics |
416-956-3092
From: Thomas Lumley [via R] [mailto:[hidden email]]
Sent: Friday, July 02, 2010 12:38 PM
To: Bond, Stephen
Subject: Re: unable to get bigglm working, ATTN: Thomas Lumley
|
|
After correcting the error spotted by Thomas, I tried again. It starts running, but there is no result after 1 hour.
Is there anything I can do to debug?
The dataset has 12 million rows, and SAS on a server produces results in 15 seconds. I am running this on a PC with XP (hence the memory limit of 2.5 GB), fresh after a restart and with no other applications running.
Thank you.
Stephen
|
|
The model fails to converge after more than 3 hours (I went home, so I don't know exactly how long it took):
> bigglm (formula = resp ~ relage+I(relage^2)+termfac+ri ,
+ data = a, family = binomial(link='logit'));
Large data regression model: bigglm(formula = resp ~ relage + I(relage^2) + termfac + ri,
data = a, family = binomial(link = "logit"))
Sample size = 12758187
failed to converge after 8 iterations
Warning message:
In bigglm.function(formula = resp ~ relage + I(relage^2) + termfac + :
ran out of iterations and failed to converge
SAS converges:
NOTE: PROC LOGISTIC is modeling the probability that resp='1'.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: There were 12758187 observations read from the data set SRVRUSER.COMMIT.
NOTE: The data set WORK.OUT3 has 11 observations and 15 variables.
NOTE: PROCEDURE LOGISTIC used (Total process time):
real time 2:25.42
cpu time 1:16.79
I did not see a trace argument in bigglm. Is there another way to see what is happening?
Thank you.
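One way to get some visibility, sketched here under assumptions, is to wrap the data function so that every reset and every chunk read is reported. The helper trace.data below is hypothetical (it is not part of biglm); it relies only on the f(reset = FALSE) protocol used by make.data above. The toy data function exists only to make the example runnable on its own.

```r
# Hypothetical helper (not part of biglm): wraps a data function of the
# form f(reset = FALSE) and reports each pass and each chunk as it is read.
trace.data <- function(datafun) {
  pass <- 0L
  chunk <- 0L
  function(reset = FALSE) {
    if (reset) {
      pass <<- pass + 1L
      chunk <<- 0L
      message("pass ", pass, ": reset")
      return(datafun(reset = TRUE))
    }
    start <- proc.time()[["elapsed"]]
    rval <- datafun(reset = FALSE)
    chunk <<- chunk + 1L
    message(sprintf("pass %d, chunk %d: %s rows, %.2f s", pass, chunk,
                    if (is.null(rval)) "no" else nrow(rval),
                    proc.time()[["elapsed"]] - start))
    rval
  }
}

# Toy data function serving a 5-row data frame in chunks of 2 rows.
toy <- local({
  d <- data.frame(x = 1:5)
  pos <- 0L
  function(reset = FALSE) {
    if (reset) {
      pos <<- 0L
      return(invisible(NULL))
    }
    if (pos >= nrow(d)) return(NULL)
    idx <- (pos + 1L):min(pos + 2L, nrow(d))
    pos <<- max(idx)
    d[idx, , drop = FALSE]
  }
})

traced <- trace.data(toy)
traced(reset = TRUE)
n <- 0L
while (!is.null(ch <- traced())) n <- n + nrow(ch)
n  # total number of rows seen in this pass
```

With the real data, passing data = trace.data(a) to bigglm() should show how far each pass over the file has progressed and how long each chunk takes to read. Separately, the bigglm() help page lists a maxit argument with a default of 8, which matches the "8 iterations" in the message above, so raising it may be worth trying.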
|
|
I decided to give it one more variable, which is strongly significant, to help the optimization, and it throws:
> bigglm (formula = resp ~ relage+relage2+termfac+ri+sn ,
+ data = a, family = binomial(link='logit'));
Error in bigglm.function(formula = resp ~ relage + relage2 + termfac + :
model matrices incompatible
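The thread does not establish what caused this error. The biglm documentation does state that factor levels must be the same across all data chunks, so one possible cause (an assumption here, not a confirmed diagnosis) is that read.table() builds a factor from only the values present in each chunk. The sketch below uses made-up termfac values to show how the design-matrix columns can differ between chunks, and how fixing the levels aligns them:

```r
# Made-up chunks: the termfac values "6" and "12" are hypothetical.
chunk1 <- data.frame(termfac = factor(c("3", "6")),  ri = c(0.0, -0.1))
chunk2 <- data.frame(termfac = factor(c("3", "12")), ri = c(-0.1, -0.1))

# Each chunk derives its own levels, so the design matrices disagree.
cols1 <- colnames(model.matrix(~ termfac + ri, chunk1))
cols2 <- colnames(model.matrix(~ termfac + ri, chunk2))
identical(cols1, cols2)    # FALSE

# Declaring the full set of levels in every chunk aligns the columns.
all_levels <- c("3", "6", "12")
chunk1$termfac <- factor(chunk1$termfac, levels = all_levels)
chunk2$termfac <- factor(chunk2$termfac, levels = all_levels)

fixed1 <- colnames(model.matrix(~ termfac + ri, chunk1))
fixed2 <- colnames(model.matrix(~ termfac + ri, chunk2))
identical(fixed1, fixed2)  # TRUE
```

In a chunked reader this would mean reading the column as character or integer and converting it with factor(x, levels = ...) inside the data function, using a level set known in advance.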
|
|