Help with regular expressions.

Help with regular expressions.

Rolf Turner

I want to deal with strings of the form "a.b.c" and to change (using
sub() or whatever is appropriate) the second "." to a "-", i.e. to
change "a.b.c" to "a.b-c".  I want to leave the first "." as-is.

I guess I could do a gsub(), changing all "."s to "-"s, and then do
a sub() changing the first "-" back to a ".".  But this seems very
kludgy.  There must be a sexier way.  Mustn't there?  Is there regular
expression syntax for picking out the second occurrence of a particular
string?
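
For reference, the two-step approach described above might look like the
following sketch. The vector x is made-up example data, and fixed = TRUE is
an assumption added here so that "." and "-" are treated as literal
characters rather than regex syntax.

```r
# Made-up example strings of the form "a.b.c".
x <- c("a.b.c", "aa.bcv.cdg")

# Step 1: turn every literal "." into "-".
all_dashes <- gsub(".", "-", x, fixed = TRUE)

# Step 2: turn the first "-" back into ".".
two_step <- sub("-", ".", all_dashes, fixed = TRUE)
two_step
```

This only gives the intended result when the input has exactly two periods
and no "-" ahead of the first period, which is part of why it feels kludgy.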

cheers,

Rolf Turner

--
Honorary Research Fellow
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Help with regular expressions.

Bert Gunter-2
> gsub("(.*\\.[^.]*)\\.(.*)","\\1-\\2", "aa.bcv.cdg")
[1] "aa.bcv-cdg"

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Mon, Feb 8, 2021 at 6:29 PM Rolf Turner <[hidden email]> wrote:

>
> I want to deal with strings of the form "a.b.c" and to change (using
> sub() or whatever is appropriate) the second "." to a "-", i.e. to
> change "a.b.c" to "a.b-c".  I want to leave the first "." as-is.
>
> I guess I could do a gsub(), changing all "."s to "-"s, and then do
> a sub() changing the first "-" back to a ".".  But this seems very
> kludgy.  There must be a sexier way.  Mustn't there?  Is there regular
> expression syntax for picking out the second occurrence of a particular
> string?

Re: Help with regular expressions.

R help mailing list-2
In reply to this post by Rolf Turner
There are many ways, Rolf. You need to look into the syntax of regular
expressions. It depends on how sure you are that the formats are exactly as
needed. Escaping the period with one or more backslashes is one way. Using
string functions is another.

Suggestion: see if you can write a greedy regular expression that matches
everything up to the last period (the second one, in "a.b.c"), then that
period, then the rest; keep the first and third parts and replace the middle
with a minus sign. Or match five things: everything up to the first period,
that period, everything in between, the second period, and the rest, and
keep the needed parts as above.

Periods and dashes must be used carefully, though. An unescaped period
matches almost any single character, so a good way to match a literal period
is [.], a character class containing only a period. Put parens around that,
"([.])", and you have a capturable item. In your case you may want the
parens around everything before and after instead: ([^.]*[.][^.]*) then [.]
then ([^.]*), written as one long pattern, and replace the match with
\\1-\\2 (the backslashes are doubled inside an R string).
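
As a rough sketch of that last suggestion, the three pieces can be joined
into one pattern and passed to sub(). The example vector x is made up for
illustration and is not from the original question.

```r
# Made-up inputs: two periods, three periods, and only one period.
x <- c("a.b.c", "aa.bcv.cdg", "a.b.c.d", "a.b")

# Group 1: first field, first period, second field.
# Then the second period (not captured).
# Group 2: the third field, up to the next period or the end.
bracket_result <- sub("([^.]*[.][^.]*)[.]([^.]*)", "\\1-\\2", x)
bracket_result
```

Because neither group can contain more than the one period written into
the pattern, only the second period is replaced even when more follow, and
strings with fewer than two periods are left unchanged.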

There are many other variations on this theme, and some are simpler if the
exact format is consistent, such as 'a' being a single character or the
string being a fixed length. If you are sure the second period in "a.b.c" is
always the fourth character, no RE is needed. Use string methods. Even if
not, you can use string methods to search for a period from the end
backwards, or search forward once to find the first period and a second time
starting just past it. Then replace. Fairly straightforward and very
possibly much faster.
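
One possible reading of the string-method idea, using only base R: locate
the literal periods with gregexpr(..., fixed = TRUE) and overwrite the
second one with the substr() replacement form. The helper name
replace_second_dot and the example vector are hypothetical, and no timing
claim is made for this sketch.

```r
# Hypothetical helper: replace the second literal "." in one string.
replace_second_dot <- function(s) {
  # Positions of every literal period; a single -1 means there are none.
  pos <- gregexpr(".", s, fixed = TRUE)[[1]]
  if (length(pos) >= 2) {
    # Overwrite exactly one character at the second period's position.
    substr(s, pos[2], pos[2]) <- "-"
  }
  s
}

# Made-up inputs, applied element by element.
x <- c("a.b.c", "a.b.c.d", "a.b", "abc")
string_result <- vapply(x, replace_second_dot, character(1), USE.NAMES = FALSE)
string_result
```

Strings with fewer than two periods pass through untouched because the
length check fails for them.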

Re: Help with regular expressions.

Bert Gunter-2
In reply to this post by Bert Gunter-2
Simpler, but would fail if there are more "."s beyond the second (it
changes the last one to a "-"):

> sub("(.*)\\.([^.]*)", "\\1-\\2", "aa.bcv.cdg")
[1] "aa.bcv-cdg"
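
To make the caveat concrete, here is a small sketch comparing that call on
a three-period string with an anchored variant. The anchored pattern and
the input "a.b.c.d" are assumptions added for illustration, not something
posted in the thread.

```r
# Made-up inputs: one with two periods, one with three.
x <- c("aa.bcv.cdg", "a.b.c.d")

# Greedy (.*) runs to the last period, so the last "." is the one replaced.
last_dot <- sub("(.*)\\.([^.]*)", "\\1-\\2", x)

# Anchoring at the start and keeping periods out of each field
# pins the match to the second ".".
second_dot <- sub("^([^.]*\\.[^.]*)\\.", "\\1-", x)

last_dot
second_dot
```

The two agree whenever there are exactly two periods and differ only when
more follow.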



On Mon, Feb 8, 2021 at 6:49 PM Bert Gunter <[hidden email]> wrote:

> > gsub("(.*\\.[^.]*)\\.(.*)","\\1-\\2", "aa.bcv.cdg")
> [1] "aa.bcv-cdg"

Re: Help with regular expressions.

Rolf Turner
In reply to this post by Bert Gunter-2

David Wolfskill's post solved my problem perfectly.  Thanks.

Thanks also to Bert Gunter and Avi Gross.

cheers,

Rolf


Re: Help with regular expressions.

Rolf Turner

On Tue, 9 Feb 2021 17:34:00 +1300
Rolf Turner <[hidden email]> wrote:

>
> David Wolfskill's post solved my problem perfectly.  Thanks.
>
> Thanks also to Bert Gunter and Avi Gross.

Whoops.  David Wolfskill's message came to me off-list.
Sorry for the confusion.

cheers,

Rolf
