dancing with alloc.col

Kaupas, George

I’m running into this “truelength is greater than 1000 items over-allocated” warning/error as I use := to add columns to a data.table, e.g.:

    tl (1346) is greater than 1000 items over-allocated (ncol = 308). If you didn't set the datatable.alloccol option very large, please report this to datatable-help including the result of sessionInfo().

The long preamble to this is a Stack Overflow thread (http://stackoverflow.com/questions/10015544) in which I needed to update the contents of one data.table with the contents of another.

The solution required the columns of both data.tables to match, hence my pre-processing loop that adds columns to each data.table to satisfy the identical(names(dt1),names(dt2)) criterion. I may have to re-architect this depending on what is going on with this allocation business.
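A minimal sketch of that pre-processing step. It is written for a current data.table release, not the 1.8.2 listed in the sessionInfo(), and older versions may need different syntax for `(cols) :=`. The tables and column names are hypothetical stand-ins for dt1 and dt2. The idea is to add all missing columns in a single := call per table instead of one call per column, then align the order with setcolorder():

```r
library(data.table)

# Hypothetical stand-ins for dt1 and dt2 with partly overlapping columns
dt1 <- data.table(id = 1:3, a = c(1, 2, 3))
dt2 <- data.table(id = 2:4, b = c("x", "y", "z"), a = c(9, 8, 7))

all_cols <- union(names(dt1), names(dt2))   # "id" "a" "b"

# Add every missing column in one := call per table; the single NA is
# recycled into each new column
missing1 <- setdiff(all_cols, names(dt1))
if (length(missing1) > 0) dt1[, (missing1) := NA]
missing2 <- setdiff(all_cols, names(dt2))
if (length(missing2) > 0) dt2[, (missing2) := NA]

# Same names in the same order, so identical(names(dt1), names(dt2)) holds
setcolorder(dt1, all_cols)
setcolorder(dt2, all_cols)

identical(names(dt1), names(dt2))
```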

If, for example, dt1 has 200 columns and dt2 has 2000, and together they have 2100 unique columns, I’m going to add 1900 columns to dt1. If I set alloc.col to 2100 before my column-adding loop, I’ll get slapped, because 2100 is more than 1000 greater than the 200 columns present in dt1.
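The figures in the message and in this example can be checked directly. Below is a base-R sketch. It assumes, from the wording of the message only and not from the data.table source, that the check compares truelength minus the number of columns in use against a fixed limit of 1000. The helper and the column names are hypothetical:

```r
# Hypothetical helper mirroring the wording of the message:
# "tl (...) is greater than 1000 items over-allocated (ncol = ...)"
over_allocated_by <- function(tl, ncol) tl - ncol

# Figures from the message above
over_allocated_by(1346, 308)        # 1038, i.e. more than 1000

# The dt1/dt2 scenario with made-up column names:
# dt1 has c1..c200, dt2 has c101..c2100, so 100 columns overlap
names1 <- paste0("c", 1:200)
names2 <- paste0("c", 101:2100)
length(union(names1, names2))       # 2100 unique columns
to_add <- setdiff(names2, names1)
length(to_add)                      # 1900 columns to add to dt1

# Reserving all 2100 slots up front leaves dt1 over-allocated by 1900
over_allocated_by(2100, length(names1)) > 1000   # TRUE
```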

So do I need to spoon-feed alloc.col, setting it to length(dt1)+1 on every iteration of the loop before adding a column? That seems rather brutal. Alternatively, checking the gap between truelength and length, seeing how close it is to the magic 1000, and only then adjusting the setting seems fragile.

I did try to make sense of the help for alloc.col. As for the part about “if two or more variables are bound to the same data.table”: the column addition happens inside a function, and only one variable references the data.table, at least within that function’s scope. The calling function also has a variable for the data.table, so I don’t know whether that counts. The help then mentions using copy (I’m not sure how that helps; BTW, the hyperlink for copy goes to the setkey page, which does mention copy but says “See ?copy”, which just brings up the setkey page again), setting alloc.col, or changing datatable.alloccol (which doesn’t seem to help).
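The help-page point is about two variables bound to the same data.table, and it explains why copy() comes up. Below is a minimal sketch, assuming a current data.table release; the table and variable names are hypothetical. Plain assignment binds a second name to the same table, so a column added by reference through one name is visible through the other. copy() gives an independent table. The sketch only covers the ordinary case where the table has spare column slots:

```r
library(data.table)

dt <- data.table(a = 1:3)
alias <- dt          # a second name bound to the same table; no copy is made
indep <- copy(dt)    # an independent deep copy

dt[, b := a * 2L]    # add a column by reference through `dt`

"b" %in% names(alias)   # TRUE: `alias` is the same object as `dt`
"b" %in% names(indep)   # FALSE: the copy is unaffected
```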

The warning asked for sessionInfo; FWIW, here it is:

R version 2.15.0 (2012-03-30)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=C                 LC_NAME=C
[9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods
[7] base

other attached packages:
[1] data.table_1.8.2

Thanks
George

_______________________________________________
datatable-help mailing list
[hidden email]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help

Re: dancing with alloc.col

Matthew Dowle

:)

When the column allocation is full, there's a formula to decide how much
to grow the allocation by. The check is there (iirc) to make sure that's
not growing the table too much. If you have 1 million columns, you
probably don't want to double that to 2 million, just to add 1 column. But
if you do, then use alloc.col first. That was the thinking. But that
thinking is biting in your case.
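truelength() shows the allocation described here. It reports how many column slots a data.table has allocated, while length() reports the number of columns in use. Below is a minimal sketch for a current data.table release; it is not checked against 1.8.2, and the column names are hypothetical. It reserves slots once with alloc.col() and then adds columns in a loop, as in the pre-processing step from the original post:

```r
library(data.table)

DT <- data.table(a = 1:3, b = c("x", "y", "z"))
length(DT)        # columns in use
truelength(DT)    # column slots allocated; over-allocation makes this larger

# Reserve room once, before the loop, rather than letting := grow the
# allocation as it goes
DT <- alloc.col(DT, 200)

new_cols <- paste0("v", 1:50)      # hypothetical column names
for (nm in new_cols) {
  DT[, (nm) := NA]                 # each call adds one column by reference
}

length(DT)        # 52
truelength(DT) >= length(DT)       # TRUE
```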

The simplest fix, then, might be to downgrade the warning to a message that is
shown only when verbosity is on.

In the meantime, does wrapping with suppressWarnings() work around it for
now, since in your case you know that over-allocating by more than 1000 is
appropriate?

    suppressWarnings(DT[, newcol := NA])
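suppressWarnings() hides every warning raised inside it, including ones you might want to see. A narrower alternative is sketched below in base R. It muffles only warnings whose message matches a pattern and lets all others through. The helper name is hypothetical, and the pattern assumes the message text quoted in this thread:

```r
# Hypothetical helper: evaluate `expr`, muffling only warnings whose message
# matches `pattern`; any other warning continues to the next handler as usual.
muffle_matching <- function(expr, pattern) {
  withCallingHandlers(
    expr,
    warning = function(w) {
      if (grepl(pattern, conditionMessage(w))) invokeRestart("muffleWarning")
    }
  )
}

# Stand-in for a call that raises the over-allocation warning plus another one
noisy <- function() {
  warning("tl (1346) is greater than 1000 items over-allocated (ncol = 308).")
  warning("an unrelated warning")
  "done"
}

res <- muffle_matching(noisy(), "items over-allocated")
# Only "an unrelated warning" is reported.
# With data.table (DT and newcol as in Matthew's example):
# muffle_matching(DT[, newcol := NA], "items over-allocated")
```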

Thanks for reporting. Interesting use case.

Matthew


Re: dancing with alloc.col

Matthew Dowle

Oh and since you're looping the := or set(), then options(warn=0) before
the loop is probably faster than repeated calls to suppressWarnings().


Re: dancing with alloc.col

Kaupas, George
Thanks for the quick and patient response, as it was indeed my own fault.

In the interest of debugging I had set options(warn=3) as well as options(datatable.verbose=TRUE); setting warn=0 does indeed allow my code to run to satisfactory completion.

I have to stop ignoring the subtle (to me) messages R throws; in this case, "Error in ... (converted from warning)".

So a follow-up general R question: if I use options(warn=0), my .Rout contains a line like "There were 46 warnings (use warnings() to see them)". If I instead wrap the := statements with suppressWarnings(), I don't get that. Is there a way to suppress the "There were n warnings" message?

-----Original Message-----
From: Matthew Dowle [mailto:[hidden email]]
Sent: Wednesday, August 08, 2012 5:45 AM
To: Kaupas, George
Cc: [hidden email]
Subject: Re: [datatable-help] dancing with alloc.col


Oh and since you're looping the := or set(), then options(warn=0) before the loop is probably faster than repeated calls to suppressWarnings().


Re: dancing with alloc.col

Matthew Dowle

> Thanks for the quick and patient response, as it was indeed my own fault.

Hardly your fault. The 1000 over-allocation check hasn't come up before
and it isn't documented.

> In the interest of debugging I had set options(warn=3)

Ah, that makes sense. Although 3 is the same as 2 afaik; i.e., 2 or larger
means warnings are turned into errors.
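Below is a quick base-R check of that behaviour, and of the "(converted from warning)" text George mentions. With warn set to 2 or 3, a warning surfaces as an error, which tryCatch() can catch. The helper name and the warning text are made up:

```r
# Hypothetical helper: raise a warning under `warn_level` and return either the
# resulting error message or "no error"
converted_message <- function(warn_level) {
  old <- options(warn = warn_level)
  on.exit(options(old))              # restore the caller's setting
  tryCatch({
    warning("over-allocated")
    "no error"
  }, error = function(e) conditionMessage(e))
}

converted_message(2)   # message starts with "(converted from warning)"
converted_message(3)   # same behaviour as 2
converted_message(0)   # "no error" (the warning itself is still reported)
```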

> as well as
> options(datatable.verbose=TRUE); setting warn=0 does indeed allow my code
> to run to satisfactory completion.

Great, glad the workaround works. I'll still look at downgrading or
removing that warning.

>
> I have to stop ignoring the subtle (to me) messages R throws; in this
> case, "Error in ... (converted from warning)".
>
> So a followup general R question is, if I use options(warn=0), my .Rout
> contains a line like "There were 46 warnings (use warnings() to see
> them)". If instead I wrap the := statements with suppressWarnings(), I
> don't get that. Is there a way to suppress the "There were n warnings"
> message?

Oops, I meant oldwarn=options(warn=-1) (or any negative value, according to
?options) before the loop, to ignore the warnings. Then, after the loop, set
it back to the old value with options(oldwarn). options() returns the old
setting as a list, so options(warn=oldwarn) would be rejected.
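Below is Matthew's corrected suggestion as a base-R sketch. It sets warn to a negative value once around the whole loop. It uses on.exit() so the previous setting comes back even if the loop stops with an error. The helper name and the stand-in loop are hypothetical:

```r
# Hypothetical helper: evaluate `expr` with warnings ignored (warn = -1),
# then restore whatever warn setting was in force before.
with_warnings_ignored <- function(expr) {
  oldwarn <- options(warn = -1)   # options() returns the previous value(s) as a list
  on.exit(options(oldwarn), add = TRUE)
  invisible(expr)                 # lazily evaluated here, while warn = -1
}

n_added <- 0
with_warnings_ignored(
  for (i in 1:46) {
    warning("column ", i, ": over-allocation warning")  # stand-in for the := call
    n_added <- n_added + 1
  }
)
```

Because `expr` is only evaluated inside the helper, the loop runs with warnings ignored and no "There were 46 warnings" summary is printed afterwards.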
