strsplit, keeping delimiters

hadley wickham
Hi all,

Does anyone have a version of strsplit that keeps the delimiters it
splits on?  E.g., from
x <- "A: 123 B: 456 C: 678"

I'd like to get

c("A:", "123 ", "B: ", "456 ", "C: ", 678)

but
strsplit(x, "[A-Z]+:")

gives me
c("", " 123 ", " 456 ", " 678")

Any ideas?

Thanks,

Hadley

--
http://had.co.nz/

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: strsplit, keeping delimiters

Gabor Grothendieck
Try this:

> library(gsubfn)
> x <- "A: 123 B: 456 C: 678"
> strapply(x, "[^ :]+[ :]|[^ :]+$")
[[1]]
[1] "A:"   "123 " "B:"   "456 " "C:"   "678"

and check out the gsubfn home page at:

http://gsubfn.googlecode.com
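
For readers without gsubfn installed, roughly the same tokens can be
extracted in base R with gregexpr() and regmatches(). This is only a
sketch that reuses the pattern above; it is an assumption that this is
an acceptable substitute, and it is not how strapply is implemented.

```r
# Base R sketch: extract every match of the pattern instead of splitting.
x <- "A: 123 B: 456 C: 678"

# Same pattern as above: a run of non-space, non-colon characters followed
# by one space or colon, or such a run at the end of the string.
pattern <- "[^ :]+[ :]|[^ :]+$"

# gregexpr() locates all matches; regmatches() pulls out the matched text.
tokens <- regmatches(x, gregexpr(pattern, x))[[1]]
print(tokens)
```

Because the pattern describes the pieces to keep rather than the places
to cut, the delimiters survive as part of each token.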


On Sat, Jun 14, 2008 at 1:35 AM, hadley wickham <[hidden email]> wrote:

> Hi all,
>
> Does anyone have a version of strsplit that keeps the string that is
> split by.  e.g. from
> x <- "A: 123 B: 456 C: 678"
>
> I'd like to get
>
> c("A:", "123 ", "B: ", "456 ", "C: ", 678)
>
> but
> strsplit(x, "[A-Z]+:")
>
> gives me
> c("", " 123 ", " 456 ", " 678")
>
> Any ideas?
>
> Thanks,
>
> Hadley


Re: strsplit, keeping delimiters

hadley wickham
On Sat, Jun 14, 2008 at 12:55 AM, Gabor Grothendieck
<[hidden email]> wrote:

> Try this:
>
>> library(gsubfn)
>> x <- "A: 123 B: 456 C: 678"
>> strapply(x, "[^ :]+[ :]|[^ :]+$")
> [[1]]
> [1] "A:"   "123 " "B:"   "456 " "C:"   "678"
>
> and check out the gsubfn home page at:
>
> http://gsubfn.googlecode.com

Thanks Gabor!
Hadley

--
http://had.co.nz/


Re: strsplit, keeping delimiters

Martin Morgan
"hadley wickham" <[hidden email]> writes:
> On Sat, Jun 14, 2008 at 12:55 AM, Gabor Grothendieck
> <[hidden email]> wrote:
>> Try this:
>>
>>> library(gsubfn)
>>> x <- "A: 123 B: 456 C: 678"
>>> strapply(x, "[^ :]+[ :]|[^ :]+$")
>> [[1]]
>> [1] "A:"   "123 " "B:"   "456 " "C:"   "678"

Also

> strsplit(x, "(?<=[0-9:] )", perl=TRUE)
[[1]]
[1] "A: "  "123 " "B: "  "456 " "C: "  "678"

which uses Perl's zero-width lookbehind to match the empty string preceded
by a digit or : and then a space. This is not quite what you asked for

> I'd like to get

> c("A:", "123 ", "B: ", "456 ", "C: ", 678)

(no space after A:) or what Gabor offered (no spaces after :) but maybe
what you intended?

Martin

>>
>> and check out the gsubfn home page at:
>>
>> http://gsubfn.googlecode.com
>
> Thanks Gabor!
> Hadley

--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793


Re: strsplit, keeping delimiters

hadley wickham
On Sat, Jun 14, 2008 at 10:20 AM, Martin Morgan <[hidden email]> wrote:

> "hadley wickham" <[hidden email]> writes:
>> On Sat, Jun 14, 2008 at 12:55 AM, Gabor Grothendieck
>> <[hidden email]> wrote:
>>> Try this:
>>>
>>>> library(gsubfn)
>>>> x <- "A: 123 B: 456 C: 678"
>>>> strapply(x, "[^ :]+[ :]|[^ :]+$")
>>> [[1]]
>>> [1] "A:"   "123 " "B:"   "456 " "C:"   "678"
>
> Also
>
>> strsplit(x, "(?<=[0-9:] )", perl=TRUE)
> [[1]]
> [1] "A: "  "123 " "B: "  "456 " "C: "  "678"
>
> which uses Perl's zero-width lookbehind to match the empty string preceded
> by a digit or : and then a space. This is not quite what you asked for

My real example is actually a little more complicated

x <- "AC: 123 BDEF: 456 CADSDFSDFSF: 6sdf:78"

so the look-ahead approach doesn't work (and neither does a
look-behind because it has to be fixed length).

>> I'd like to get
>
>> c("A:", "123 ", "B: ", "456 ", "C: ", 678)
>
> (no space after A:) or what Gabor offered (no spaces after :) but maybe
> what you intended?

Either way is fine, since I'll be stripping off the spaces later anyway.
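
One way around the fixed-length lookbehind restriction is to skip
lookarounds entirely: locate the variable-length delimiters with
gregexpr() and cut the string at their start and end positions. The
helper name split_keep and the delimiter pattern "[A-Z]+:" are
assumptions made for illustration, and the sketch handles a single
string with non-empty matches only.

```r
# Hypothetical helper: split one string on a regex, keeping the delimiters
# as separate pieces. Not part of base R or gsubfn.
split_keep <- function(x, pattern) {
  m <- gregexpr(pattern, x, perl = TRUE)[[1]]
  if (m[1] == -1L) return(x)                      # no delimiter found
  starts <- as.integer(m)
  ends <- starts + attr(m, "match.length") - 1L
  # A piece begins at the string start, at each delimiter start,
  # and just after each delimiter end.
  begin <- sort(unique(c(1L, starts, ends + 1L)))
  begin <- begin[begin <= nchar(x)]
  finish <- c(begin[-1] - 1L, nchar(x))
  substring(x, begin, finish)
}

x <- "AC: 123 BDEF: 456 CADSDFSDFSF: 6sdf:78"
pieces <- split_keep(x, "[A-Z]+:")
print(pieces)

# Surrounding spaces can be stripped afterwards.
print(trimws(pieces))
```

With this delimiter pattern the lowercase "sdf:" inside the last value is
not treated as a delimiter, so "6sdf:78" stays in one piece.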

Hadley

--
http://had.co.nz/


Re: strsplit, keeping delimiters

Gabor Grothendieck
On Sat, Jun 14, 2008 at 11:46 AM, hadley wickham <[hidden email]> wrote:

> On Sat, Jun 14, 2008 at 10:20 AM, Martin Morgan <[hidden email]> wrote:
>> "hadley wickham" <[hidden email]> writes:
>>> On Sat, Jun 14, 2008 at 12:55 AM, Gabor Grothendieck
>>> <[hidden email]> wrote:
>>>> Try this:
>>>>
>>>>> library(gsubfn)
>>>>> x <- "A: 123 B: 456 C: 678"
>>>>> strapply(x, "[^ :]+[ :]|[^ :]+$")
>>>> [[1]]
>>>> [1] "A:"   "123 " "B:"   "456 " "C:"   "678"
>>
>
> Either way is fine, since I'll be stripping off the spaces later anyway.
>

Note that if you intend to strip off the delimiters anyway but still
want to examine them, you might want to make use of the
other arguments of strapply too:

> x <- "AC: 123 BDEF: 456 CADSDFSDFSF: 6sdf:78"

> strapply(x, "([^ :]+)([ :]|$)", ~ c(...), b= -2)
[[1]]
 [1] "AC"          ":"           "123"         " "           "BDEF"
 [6] ":"           "456"         " "           "CADSDFSDFSF" ":"
[11] "6sdf"        ":"           "78"          ""

That returns the match followed by the delimiter as separate
strings which can be reshaped into an n x 2 matrix.

Or, all in one strapply:

> strapply(x, "([^ :]+)([ :]|$)", FUN = ~ c(...), b= -2, simplify = ~ matrix(x, nc = 2, byrow = TRUE))
     [,1]          [,2]
[1,] "AC"          ":"
[2,] "123"         " "
[3,] "BDEF"        ":"
[4,] "456"         " "
[5,] "CADSDFSDFSF" ":"
[6,] "6sdf"        ":"
[7,] "78"          ""

Here b is short for backref, and b = -2 says to pass only the 2
backreferences to FUN (the minus sign means the backreferences only,
without the full match).  strapply then applies the function
whose body is given by the formula FUN and simplifies
the result using the function whose body is given by the formula
simplify.  The free variables in the two formulas (... in the
first case and x in the second) are used to construct the formal
arguments of these functions.
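
For comparison, a similar value/delimiter matrix can be put together in
base R. This is a rough sketch and not what strapply does internally:
gregexpr() finds each token, then regexec() is applied to every token to
recover the two capture groups. The variable names are chosen here for
illustration.

```r
x <- "AC: 123 BDEF: 456 CADSDFSDFSF: 6sdf:78"
pattern <- "([^ :]+)([ :]|$)"

# Step 1: all matches of the whole pattern (value plus trailing delimiter).
tokens <- regmatches(x, gregexpr(pattern, x))[[1]]

# Step 2: for each token, regexec() returns the full match followed by
# the capture groups; drop the full match and keep the two groups.
parts <- regmatches(tokens, regexec(pattern, tokens))
m <- unname(do.call(rbind, lapply(parts, function(p) p[-1])))

# Column 1 holds the values, column 2 the delimiter that followed each one
# (an empty string for the value at the end of the input).
print(m)
```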
