extracting characters from string

extracting characters from string

yan
Dear R gurus,

If I have a character vector with strings like "abcd_efgh_12ab3_dfsfd",
how can I extract "12ab3", i.e. the characters after the second
underscore and before the third underscore?

Tons of thanks

yan


______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: extracting characters from string

Henrique Dallazuanna
Try this:

gsub(".*_.*_(.*)_.*", "\\1", "abcd_efgh_12ab3_dfsfd")

On Thu, Feb 10, 2011 at 9:42 AM, Yan Jiao <[hidden email]> wrote:

> Dear R gurus,
>
>
>
> If I got a vector with string characters like "abcd_efgh_12ab3_dfsfd",
> how could I extract "12ab3", which is the characters after second
> underscore and before the third underscore?


--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O


Re: extracting characters from string

Soumendra
Hi Henrique,

I believe your solution is wrong as it is fitted to find 12ab3,
whereas Yan seems to be asking for the characters after the second
underscore and before the third underscore.

For example, gsub(".*_.*_(.*)_.*", "\\1",
"abcd_efgh_XXXXX_12ab3_dfsfd") would still yield 12ab3 even though, as
I understand it, it should have output XXXXX.

I think a straightforward solution would do the job:

strsplit("abcd_efgh_12ab3_dfsfd", "_")[[1]][3]

strsplit("abcd_efgh_XXXXX_12ab3_dfsfd", "_")[[1]][3] has the output
XXXXX, for example.

Of course, I would be wrong if Yan specifically wanted to find the
string 12ab3. But in that case, he would have been asking for matching
(and locating) that substring instead of extracting it.
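Since the original question mentions a vector of strings, the strsplit() idea can be applied to each element in turn. This is a minimal sketch, assuming every element has at least three underscore-separated fields; the vector `x` is made up for illustration and only reuses the two strings from this thread.

```r
# Hypothetical input vector built from the two example strings in the thread.
x <- c("abcd_efgh_12ab3_dfsfd", "abcd_efgh_XXXXX_12ab3_dfsfd")

# strsplit() returns a list with one character vector of fields per element.
# fixed = TRUE treats "_" as a literal separator rather than a regex.
fields <- strsplit(x, "_", fixed = TRUE)

# Pick the third field of each element; vapply() enforces a character result.
third <- vapply(fields, function(f) f[3], character(1))
print(third)
```

An element with fewer than three fields would yield NA here rather than an error.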

Regards,

Soumendra


--
Soumendra Prasad Dhanee
Quantitative Analyst, Neural Technologies and Software Pvt. Ltd.

On 10 February 2011 11:52, Henrique Dallazuanna <[hidden email]> wrote:

> Try this:
>
> gsub(".*_.*_(.*)_.*", "\\1", "abcd_efgh_12ab3_dfsfd")
>

Re: extracting characters from string

Henrique Dallazuanna
So, a way could be:

gsub("(.*)_(.*)_(.*)_.*", "\\3",  "abcd_efgh_XXXXX_12ab3_dfsfd")
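A quick way to check this pattern, offered as a sketch: because `.*` is greedy, the first group can itself absorb an underscore, so on a five-field string the third group appears to land on the second-to-last field rather than the third. The variable names below are illustrative only.

```r
x <- "abcd_efgh_XXXXX_12ab3_dfsfd"

# Greedy groups: the first (.*) may swallow "abcd_efgh",
# which shifts the later groups one field to the right.
greedy <- gsub("(.*)_(.*)_(.*)_.*", "\\3", x)

# Splitting on the underscore counts fields from the left instead.
by_split <- strsplit(x, "_", fixed = TRUE)[[1]][3]

print(c(greedy = greedy, by_split = by_split))
```

Comparing the two results on strings with different numbers of fields is a cheap test before trusting either approach.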

On Thu, Feb 10, 2011 at 3:47 PM, Soumendra <[hidden email]> wrote:

> Hi Henrique,
>
> I believe your solution is wrong as it is fitted to find 12ab3,
> whereas Yan seems to be asking for the characters after the second
> underscore and before the third underscore.
>

Re: extracting characters from string

jholtman
A safer way to make sure you don't match the underscore:

> gsub("[^_]*_[^_]*_([^_]*).*", "\\1",  "abcd_efgh_XXXXX_12ab3_dfsfd")
[1] "XXXXX"
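The character class `[^_]` matches any character except an underscore, so no group can run past a field boundary. Below is a sketch of the same idea applied to a whole vector, using sub() with anchors. The vector `x` is hypothetical, and the NA handling is an added assumption: sub() returns non-matching strings unchanged, which is easy to overlook.

```r
# Hypothetical input; the last element has no underscores at all.
x <- c("abcd_efgh_12ab3_dfsfd", "abcd_efgh_XXXXX_12ab3_dfsfd", "no-underscores")

# Two "field plus underscore" prefixes, then capture the third field.
pattern <- "^[^_]*_[^_]*_([^_]*).*$"

third <- sub(pattern, "\\1", x)

# sub() leaves non-matching strings untouched, so mark them as missing.
third[!grepl(pattern, x)] <- NA_character_
print(third)
```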


On Thu, Feb 10, 2011 at 2:06 PM, Henrique Dallazuanna <[hidden email]> wrote:

> So, a way could be:
>
> gsub("(.*)_(.*)_(.*)_.*", "\\3",  "abcd_efgh_XXXXX_12ab3_dfsfd")
>



--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?


Re: extracting characters from string

Soumendra
Well, I believe, given the original statement of the problem, that it
is philosophically wrong to use the gsub approach. What if there are
50 underscores instead of 5, and you want to extract the characters
after the 23rd underscore? By using gsub, you are trying to fight
against the pattern of underscores. By using strsplit, we are using
that pattern to our advantage. Kind of. :)

Besides, breaking the string up with strsplit also gives us the option to
iterate over the pieces, though that is not relevant here.
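The generalisation to an arbitrary field can be sketched as a small helper. The function name `nth_field` and its arguments are hypothetical, not part of base R, and the sketch assumes a literal (non-regex) separator.

```r
# Hypothetical helper: return the n-th separator-delimited field of each string.
nth_field <- function(x, n, sep = "_") {
  fields <- strsplit(x, sep, fixed = TRUE)
  # f[n] is NA when a string has fewer than n fields.
  vapply(fields, function(f) f[n], character(1))
}

x <- c("abcd_efgh_12ab3_dfsfd", "abcd_efgh_XXXXX_12ab3_dfsfd")
print(nth_field(x, 3))

# The characters after the 23rd underscore are the 24th field.
long <- paste(sprintf("f%02d", 1:50), collapse = "_")
print(nth_field(long, 24))
```

Only the field index changes between the two calls, which is the point made above about using the underscore pattern rather than fighting it.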







On 11 February 2011 05:55, jim holtman <[hidden email]> wrote:

> A safer way to make sure you don't match the underscore:
>
>> gsub("[^_]*_[^_]*_([^_]*).*", "\\1",  "abcd_efgh_XXXXX_12ab3_dfsfd")
> [1] "XXXXX"
>
>