A weird behaviour of strsplit?

A weird behaviour of strsplit?

IAGO GINÉ VÁZQUEZ
Hi all,

In the help of strsplit one can read

split   character vector (or object which can be coerced to such) containing regular expression(s) (unless fixed = TRUE) to use for splitting. If empty matches occur, in particular if split has length 0, x is split into single characters. If split has length greater than 1, it is re-cycled along x.

Taking into account that split is said to be a vector (not a length 1 vector) and the last claim above, I would expect that the output of


strsplit("3:4", split = c(",",":"), fixed = TRUE)

would be the same as the output of

strsplit("3:4", split = c(":"), fixed = TRUE)

that is, splitting by "," (without effect in this example) and also by ":"

[[1]]
[1] "3" "4"

But, instead, I get
[[1]]
[1] "3:4"

Am I misunderstanding the help? Is this the expected output?
I tried with R 3.6.1 for Windows (10).

Thank you!
Iago



______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: A weird behaviour of strsplit?

Duncan Murdoch
On 18/12/2019 9:42 a.m., IAGO GINÉ VÁZQUEZ wrote:

> Hi all,
>
> In the help of strsplit one can read
>
> split   character vector (or object which can be coerced to such) containing regular expression(s) (unless fixed = TRUE) to use for splitting. If empty matches occur, in particular if split has length 0, x is split into single characters. If split has length greater than 1, it is re-cycled along x.
>
> Taking into account that split is said to be a vector (not a length 1 vector) and the last claim above, I would expect that the output of
>
>
> strsplit("3:4", split = c(",",":"), fixed = TRUE)
>
> would be the same as the output of
>
> strsplit("3:4", split = c(":"), fixed = TRUE)
>
> that is, splitting by "," (without effect in this example) and also by ":"
>
> [[1]]
> [1] "3" "4"
>
> But, instead, I get
> [[1]]
> [1] "3:4"
>
> Am I misunderstanding the help? Is this the expected output?
> I tried with R 3.6.1 for Windows (10).

Yes, you are misunderstanding the help.  Your input x has length 1, so
only the first element of split will be used.  If you wanted to use
both, you would need a longer x.  For example,

 > strsplit(c("1:2", "3:4"), split=c(",", ":"), fixed=TRUE)
[[1]]
[1] "1:2"

[[2]]
[1] "3" "4"

The first element is split using "," -- since there are none, there's no
splitting done.  The second element is split using ":".

Duncan Murdoch
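
A minimal sketch of the recycling described in this reply, written as an illustration rather than as how strsplit is implemented internally. It pairs each element of x with one element of split, then compares the result with the built-in call. The variable names (paired, manual, builtin, either) are made up for this example. The last line shows one possible way to get what the original question was after, splitting on either separator, by using a regular-expression character class instead of fixed = TRUE.

```r
x <- c("1:2", "3:4")
split <- c(",", ":")

# Recycle 'split' along 'x': element i of x is paired with element i of split
paired <- rep_len(split, length(x))

# Split each element with its own separator (illustrative, not the internal code)
manual <- unname(Map(function(s, p) strsplit(s, p, fixed = TRUE)[[1]], x, paired))

# The built-in call with a vector 'split'
builtin <- strsplit(x, split = split, fixed = TRUE)

print(identical(manual, builtin))

# To split on "," or ":" within a single string, use a regular expression
# (a character class), which requires the default fixed = FALSE
either <- strsplit("3:4", split = "[,:]")
print(either)
```

With a length-1 x such as "3:4", only the first element of split is ever paired with it, which is why the "," separator alone was applied in the original example.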

Re: A weird behaviour of strsplit?

Pages, Herve
The fact that strsplit() doesn't say anything about 'split' being longer
than 'x' adds to the confusion:

   > strsplit(c("xAy", "xxByB", "xCyCCz"), split=c("A", "B", "C", "D"))
   [[1]]
   [1] "x" "y"

   [[2]]
   [1] "xx" "y"

   [[3]]
   [1] "x" "y" ""  "z"

A warning (or error) would go a long way in helping the user realize
they're doing something wrong.

No warning either when 'split' is shorter than 'x' but the length of the
latter is not a multiple of the length of the former:

   > strsplit(c("xAy", "xxByB", "xCyCCz"), split=c("A", "B"))
   [[1]]
   [1] "x" "y"

   [[2]]
   [1] "xx" "y"

   [[3]]
   [1] "xCyCCz"

Which is also unexpected given that most binary operations do issue a
warning in this case (e.g. 11:13 * 1:2).

H.
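
As a sketch of the kind of check suggested in this message, a user-level wrapper could issue the warnings itself. The name strsplit_checked and the warning texts are hypothetical and are not part of base R; the wrapper simply inspects the lengths and then delegates to strsplit.

```r
# Hypothetical wrapper (not part of base R): warn on length mismatches
# between 'x' and 'split', then delegate to strsplit()
strsplit_checked <- function(x, split, ...) {
  nx <- length(x)
  ns <- length(split)
  if (ns > 1L) {
    if (ns > nx) {
      warning("'split' is longer than 'x'; extra elements are ignored")
    } else if (nx %% ns != 0L) {
      warning("length of 'x' is not a multiple of length of 'split'")
    }
  }
  strsplit(x, split, ...)
}

# Lengths 2 and 2: no warning, same result as strsplit()
print(strsplit_checked(c("1:2", "3:4"), c(",", ":"), fixed = TRUE))

# Lengths 3 and 2: the result is unchanged, but a warning is issued
print(strsplit_checked(c("xAy", "xxByB", "xCyCCz"), c("A", "B")))
```

The wrapper leaves the results untouched, so existing code keeps working while the mismatch becomes visible.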



--
Hervé Pagès
