regex for "[2440810] / www.tinyurl.com/hgaco4fha3"


regex for "[2440810] / www.tinyurl.com/hgaco4fha3"

OmarGon
Hi, I need help cleaning this string:

"[2440810] / www.tinyurl.com/hgaco4fha3"

My desired output is:

"[2440810] / tinyurl".

My attempts:

stringa <- "[2440810] / www.tinyurl.com/hgaco4fha3"

b <- sub('^www.', '', stringa) # wanted to get rid of the "www." part, up to the first dot.

b <- sub('[.].*', '', b) #clean from ".com" until the end.

b # returns "[2440810] / www"

Thank you.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: regex for "[2440810] / www.tinyurl.com/hgaco4fha3"

Ulrik Stervbo-2
Hi Omar,

you are almost there, but your first substitution looks for 'www' at the
start of the string, followed by any single character. The string starts
with '[', so nothing matches and the call does nothing. Your second
substitution then removes everything from the first '.' it finds (which is
the one right after 'www').
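
To see why the first call leaves the string untouched, a quick check along these lines may help. This is only a sketch; the names after_first and first_dot are illustrative and not part of the original post.

```r
stringa <- "[2440810] / www.tinyurl.com/hgaco4fha3"

# '^' anchors the match at the start of the string, which begins with '[',
# so the pattern never matches and sub() returns its input unchanged.
after_first <- sub('^www.', '', stringa)
identical(after_first, stringa)

# Position of the first literal dot: it is the one right after "www".
first_dot <- as.integer(regexpr('.', stringa, fixed = TRUE))
substr(stringa, 1, first_dot - 1)
```

The last line reproduces the unwanted result, because everything from that first dot onward is what the second substitution deletes.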

What you want to do is:
x <- "[2440810] / www.tinyurl.com/hgaco4fha3"

y <- sub('www\\.', '', x) # Note the escape of '.'
y <- sub('\\..*', '', y)
y
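
The escape matters because an unescaped '.' matches any character. A small comparison, as a sketch (the variable names unescaped, escaped and bracketed are made up for illustration):

```r
y <- "[2440810] / tinyurl.com/hgaco4fha3"

# Unescaped: '.' matches any character, so '..*' swallows the whole string.
unescaped <- sub('..*', '', y)

# Escaped with a doubled backslash, or wrapped in a character class:
# both forms match a literal dot only.
escaped   <- sub('\\..*', '', y)
bracketed <- sub('[.].*', '', y)

unescaped
escaped
identical(escaped, bracketed)
```

So the '[.]' form used in the original attempt was already a valid way to match a literal dot; the problem there was only the '^' anchor in the first call.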

Alternatively, all in one step (assuming all addresses end in .com):
gsub("(www\\.|\\.com.*)", "", x)

And the same using stringr
library(stringr)
x %>% str_replace_all("(www\\.|\\.com.*)", "")
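
If some addresses do not end in .com, one possible variant is to capture the label between "www." and the next dot instead of naming the suffix. This is a sketch under that assumption; the second test string and the name cleaned are invented for illustration.

```r
x <- c("[2440810] / www.tinyurl.com/hgaco4fha3",
       "[1234567] / www.example.org/abc")  # second element is a made-up example

# Match "www.", capture everything up to the next dot, then drop the rest.
# "\\1" puts the captured label back in place of the whole match.
cleaned <- sub("www\\.([^.]+)\\..*", "\\1", x)
cleaned
```

Strings that contain no "www." are returned unchanged, so this would need adapting for addresses written without that prefix.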

HTH
Ulrik


(In reply to Omar André Gonzáles Díaz, Wed, 21 Feb 2018 at 06:20.)

Re: regex for "[2440810] / www.tinyurl.com/hgaco4fha3"

Bert Gunter-2
In reply to this post by OmarGon
These are always kind of fun, not least because of the variety of different
replies that "work" at least somewhat. Here's mine:

> stringa <- "[2440810] / www.tinyurl.com/hgaco4fha3"

> sub("^(.+)www\\.(.+)\\.com.+","\\1\\2",stringa)
[1] "[2440810] / tinyurl"

Note the use of doubled backslashes to escape the regex metacharacters. See
?regexp for details.
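
For readers who want to see what the two groups capture, a sketch along these lines may help. It uses base R's regexec() and regmatches(); the names pattern and groups are only illustrative.

```r
stringa <- "[2440810] / www.tinyurl.com/hgaco4fha3"
pattern <- "^(.+)www\\.(.+)\\.com.+"

# In R source, "\\." is two characters: a backslash and a dot.
nchar("\\.")

# cat() prints the pattern as the regex engine receives it.
cat(pattern, "\n")

# regexec() returns the full match followed by one entry per capture group.
groups <- regmatches(stringa, regexec(pattern, stringa))[[1]]
groups[2]  # what \\1 refers to in the replacement
groups[3]  # what \\2 refers to in the replacement
```

Pasting the two groups together gives the same string that sub() returns with "\\1\\2" as the replacement.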

Cheers,
Bert




