Help with gsub function


Help with gsub function

Bill Poling
Good afternoon.

sessionInfo()
#R version 3.5.3 (2019-03-11)
#Platform: x86_64-w64-mingw32/x64 (64-bit)
#Running under: Windows >= 8 x64 (build 9200)

I am using the gsub function to remove the hyphen from a column of 9-digit values so that I can convert it to integer.

It works fine except where the second segment has a leading 0, in which case the 0 is removed as well.

Example: "73-0700090" becomes "73700090"
         "77-0633896" becomes "77633896"

Is there a remedy for this?

tb2a$TID2 <- gsub(tb2a$TID, pattern="-[0-0]{0,7}", replacement = "")

head(tb2a$TID,n=10)
 [1] "11-1352310" "45-2711804" "35-6001540" "77-0633896" "62-1762545" "61-1029768" "73-0700090" "47-0376604" "47-0486026" "38-3833117"
> head(tb2a$TID2,n=10)
 [1] "111352310" "452711804" "356001540" "77633896"  "621762545" "611029768" "73700090"  "47376604"  "47486026"  "383833117"

I have googled the problem and have not found a solution.

http://www.endmemo.com/program/R/gsub.php
http://r.789695.n4.nabble.com/extracting-characters-from-string-td3298971.html


Thank you.

WHP

Confidentiality Notice This message is sent from Zelis. ...{{dropped:13}}

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Help with gsub function

Ivan Krylov
On Fri, 15 Mar 2019 19:45:27 +0000
Bill Poling <[hidden email]> wrote:

Hello Bill,

> tb2a$TID2 <- gsub(tb2a$TID, pattern="-[0-0]{0,7}", replacement = "")

Is the pattern supposed to mean something besides the "-" you want to
remove? For the problem you describe, pattern="-" should be enough. It
should locate all hyphens in the string and replace them with empty
strings, i.e. remove them.
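
A minimal sketch of that suggestion, using three of the values from the original post. The vector `tid` is a stand-in for `tb2a$TID`, and `fixed = TRUE` is an optional addition that makes gsub treat "-" as a literal string rather than a regular expression:

```r
# Stand-in for tb2a$TID (values copied from the original post)
tid <- c("11-1352310", "77-0633896", "73-0700090")

# Remove every hyphen; fixed = TRUE matches "-" literally, no regex involved
tid2 <- gsub("-", "", tid, fixed = TRUE)
print(tid2)
# [1] "111352310" "770633896" "730700090"
```

The leading zero of the second segment survives because the pattern no longer mentions any digits.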

--
Best regards,
Ivan


Re: Help with gsub function

plangfelder
In reply to this post by Bill Poling
If you want to remove just the hyphen, why not do

sub("-", "", tb2a$TID)

sub("-", "", "73-017323")
[1] "73017323"

Am I missing something?
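
For values with a single hyphen, sub() and gsub() should give the same result; they differ only when a string contains more than one match. A small sketch, where the second value is a made-up two-hyphen string and not part of the posted data:

```r
x <- c("73-0700090", "12-34-56")   # second value is hypothetical

first_only  <- sub("-", "", x)     # replaces only the first hyphen in each string
all_hyphens <- gsub("-", "", x)    # replaces every hyphen in each string

print(first_only)
# [1] "730700090" "1234-56"
print(all_hyphens)
# [1] "730700090" "123456"
```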

Peter


Re: Help with gsub function

Jeff Newmiller
In reply to this post by Bill Poling
Your pattern seems ... way overboard? Why not

gsub("-", "", tb2a$TID)
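
Since the stated goal was an integer column, here is a hedged sketch of the follow-on conversion. The vector `tid` is a stand-in for `tb2a$TID`, and its last value is hypothetical, chosen to show that a zero at the very start of a value is lost once the text becomes a number:

```r
# Stand-in values; the last one is hypothetical and starts with a zero
tid <- c("11-1352310", "73-0700090", "01-2345678")

digits <- gsub("-", "", tid)   # character vector, all zeros preserved
as_int <- as.integer(digits)   # integer vector, leading zero of the last value is dropped

# Any nine-digit value is below .Machine$integer.max (2147483647), so nothing overflows.
# Re-pad to nine characters if the original text form is needed again.
padded <- sprintf("%09d", as_int)
```

If the values are identifiers rather than quantities, keeping them as character after removing the hyphen may be the safer choice.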


--
Sent from my phone. Please excuse my brevity.


Re: Help with gsub function

Bill Poling
In reply to this post by plangfelder
Good morning Peter, yes, that works fine. My attempt was based on a Google search result that looked promising but was obviously more complicated than it needed to be.

Thank you.

WHP


Re: Help with gsub function

Bill Poling
In reply to this post by Jeff Newmiller
Yep, thank you Jeff. That's the consequence of taking the first URL I landed on when asking how to do it and rushing off.

All set now.

Appreciate your help.

WHP


Re: Help with gsub function

S Ellison-2
In reply to this post by Bill Poling
> tb2a$TID2 <- gsub(tb2a$TID, pattern="-[0-0]{0,7}", replacement = "")

Just to add something on why this didn't work ...

It looks like you were trying to match a hyphen followed by a number of up to seven digits. By mistake(?) you gave the digit range as [0-0], so it would match a hyphen followed by between zero and seven zeroes. When it met "-0" it matched that.
And because the replacement was the empty string, gsub removed everything it matched, the zero included.
If you'd given it the right digit range ([0-9]), it would have removed the whole of the second segment along with the hyphen.

If you _really_ wanted to do that kind of thing (control the following pattern), you'd have needed something like (untested)
gsub("-([0-9]{0,7})", "\\1", tb2a$TID)

#The () means "remember this bit"; the "\\1" means "put the first thing you remember here". And it needs to be "\\1" in the R source because R's string parser turns that into "\1" for the regex engine.
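
The three cases described above can be sketched as follows. The vector `tid` is a stand-in for `tb2a$TID`, and the use of `[0-9]` in the last two calls is an assumption about the digit range that was intended:

```r
tid <- c("11-1352310", "73-0700090")

# Original pattern: a hyphen followed by zero to seven literal zeros,
# so a zero directly after the hyphen is swallowed along with it
broken <- gsub("-[0-0]{0,7}", "", tid)
print(broken)
# [1] "111352310" "73700090"

# A real digit range without a capture group removes the hyphen
# and the seven digits after it
too_much <- gsub("-[0-9]{0,7}", "", tid)
print(too_much)
# [1] "11" "73"

# Capturing the digits and writing them back with \\1 drops only the hyphen
kept <- gsub("-([0-9]{0,7})", "\\1", tid)
print(kept)
# [1] "111352310" "730700090"
```

For simply deleting the hyphen, the plain "-" pattern suggested earlier in the thread does the same job with less to go wrong; the capture group is only worth it when the surrounding pattern has to be checked as well.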


Steve E




Re: Help with gsub function

Bill Poling
Good morning Steve. Terrific, so kind of you to follow up.

I will add that to my ever-growing R bag of tips and tricks.

Cheers.

WHP

William H. Poling, Ph.D., MPH | Manager, Revenue Development
Data Intelligence & Analytics
Zelis Healthcare

