"too large for hashing"


Adam D. I. Kramer-3
Hello,

  I'm doing some analysis on a rather large data set. In this case,
some simple commands are failing. For example, this one:

> x$eventtype <- factor(x$eventtype)
Error in unique.default(x) : length 1093574297 is too large for hashing

...I think this is a bug, because "hashing" should not be required for the
"factor" function. Am I right? The whole column does not need to be hashed,
only the unique keys. Sure, there is the potential to overflow the key
register, but this error should be thrown only if that occurs, no?

Cordially,
Adam D. I. Kramer, Ph.D.
Data Scientist, Facebook, Inc.
[hidden email]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: "too large for hashing"

Duncan Murdoch-2
On 05/04/2012 2:03 PM, Adam D. I. Kramer wrote:

> Hello,
>
>   I'm doing some analysis on a rather large data set. In this case,
> some simple commands are failing. For example, this one:
>
> > x$eventtype <- factor(x$eventtype)
> Error in unique.default(x) : length 1093574297 is too large for hashing
>
> ...I think this is a bug, because "hashing" should not be required for the
> "factor" function. Am I right? The whole column does not need to be hashed,
> only the unique keys. Sure, there is the potential to overflow the key
> register, but this error should be thrown only if that occurs, no?

It looks as though the error is coming when unique() tries to determine
the unique levels in the argument, but really there's no way to answer
your question without more information.  What type of object is
x$eventtype?  Is it really 1093574297 elements long?  How many unique
values does it have?

Duncan Murdoch
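
For readers following the thread, a simplified sketch of what factor() does
when no levels are supplied may help show where unique() comes in. This is
an approximation of the default code path, not the actual base R source:
sketch_factor is a hypothetical name, the example values are invented, and
the handling of NA, exclude, labels and ordered is left out.

```r
# Hypothetical, simplified stand-in for factor(x) with default arguments.
sketch_factor <- function(x) {
  x <- as.character(x)
  lev <- sort(unique(x))   # unique() is applied to the whole vector
  codes <- match(x, lev)   # integer position of each value within lev
  structure(codes, levels = lev, class = "factor")
}

# Invented example data; the real column is not shown in the thread.
x <- c("click", "view", "click", "share")
f <- sketch_factor(x)
```

The point of the sketch is only that the levels are discovered by calling
unique() on the full input, so the length of the input matters even when
very few distinct values are present.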


Re: "too large for hashing"

Adam D. I. Kramer-3
Thanks for your response, Duncan.

x$eventtype is a "character" vector (because the same hashing error
occurred when I first tried to read the file with read.table(), specifying
colClasses = c(..., "factor", ...)).

x really is that long:

> dim(x)
[1] 1093574297         12

...the x$eventtype field has three unique values.

(My current workaround is to build a numeric column with a chain of
ifelse() calls, then set its class to "factor" and assign the level labels
manually.)

--Adam
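
A workaround along the lines described above could be sketched as follows.
This is only an illustration under stated assumptions: factor_known_levels
is a hypothetical helper, the three level names are invented (the thread
does not say what the actual values are), and the approach assumes the
levels are known in advance so that unique() never has to be called.

```r
# Hypothetical helper: build a factor from known levels using plain
# comparisons, without calling unique() or match().
factor_known_levels <- function(x, lev) {
  codes <- rep(NA_integer_, length(x))   # values not in lev stay NA
  for (i in seq_along(lev)) {
    codes[which(x == lev[i])] <- i       # one vectorised pass per level
  }
  structure(codes, levels = lev, class = "factor")
}

# Invented level names and data, for illustration only.
lev <- c("click", "share", "view")
x <- c("view", "click", "share", "click")
f <- factor_known_levels(x, lev)
```

Each pass allocates a logical vector as long as x, so on a very large
column this trades the hashing step for a few full scans; with only three
levels that may be an acceptable cost.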
