Word boundaries and gregexpr in R 2.2.1 (PR#8547)

stgries
Full_Name: Stefan Th. Gries
Version: 2.2.1
OS: Windows XP (Home and Professional)
Submission from: (NULL) (68.6.34.104)


The problem is this: I have a vector of two character strings.

> text<-c("This is a first example sentence.", "And this is a second example sentence.")

If I now look for word boundaries with regexpr, this is what I get:
> regexpr("\\b", text, perl=TRUE)
[1] 1 1
attr(,"match.length")
[1] 0 0

So far, so good. But with gregexpr I get:

> gregexpr("\\b", text, perl=TRUE)
Error: cannot allocate vector of size 524288 Kb
In addition: Warning messages:
1: Reached total allocation of 1015Mb: see help(memory.size)
2: Reached total allocation of 1015Mb: see help(memory.size)

Why don't I get the locations and extensions of all word boundaries?

I am using R 2.2.1 on a machine running Windows XP:
> R.version
        _
platform i386-pc-mingw32
arch     i386
os       mingw32
system   i386, mingw32
status
major    2
minor    2.1
year     2005
month    12
day      20
svn rev  36812
language R

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: Word boundaries and gregexpr in R 2.2.1 (PR#8547)

Robert Gentleman
This should be patched in R-devel; the fix will be available shortly.
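
Until a patched build is available, one possible way to get the same information is to avoid the zero-length pattern altogether. The following is only a sketch, not the fix applied in R-devel: it assumes that matching whole words with \\w+ and deriving the boundary positions from each match's start and length is acceptable. The names words and boundaries are illustrative and do not come from the report.

```r
# Workaround sketch (an assumption, not the R-devel patch):
# locate whole words, then derive the word-boundary positions from them.
text <- c("This is a first example sentence.",
          "And this is a second example sentence.")

# Match runs of word characters. Every match has length >= 1,
# so the search always advances through the string.
words <- gregexpr("\\w+", text, perl = TRUE)

# A boundary sits at the first character of each word and at the
# position just after its last character (start + match.length).
boundaries <- lapply(words, function(m) {
  start <- as.integer(m)                 # drops the match attributes
  if (start[1] == -1L) return(integer(0))  # no words in this string
  end <- start + attr(m, "match.length")
  sort(unique(c(start, end)))
})

# boundaries[[1]] is 1 5 6 8 9 10 11 16 17 24 25 33
print(boundaries)
```

Each element of boundaries lists the 1-based positions at which a word boundary occurs in the corresponding string, so the match lengths are all zero by construction, as in the regexpr output above.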

--
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
[hidden email]