remove a row

Ashta
Hi all, I want to remove rows from a data frame based on a condition on
one of its variables.
When split, this string should follow a 3-2-5 format (up to 3 numeric
digits, up to 2 alphabetic characters, and up to 5 numeric digits), i.e.
area code-region-numeric. The area code can be 1, 2, or 3 digits long but
not more than three; the region is at most 2 characters long, followed by
at most 5 numeric digits. So the maximum length of this variable is 10
(3 + 2 + 5). Anything outside of this pattern should be excluded.
As an example

dat <- read.table(text = "rown varx
1 9F209
2 FL250
3 2F250
4 102250
5 102FL
6 102
7 1212FL250
8 121FL50", header = TRUE, stringsAsFactors = FALSE)

1  9F209       # keep
2  FL250       # remove, no area code
3  2F250       # keep
4  102250      # remove, no region code
5  102FL       # remove, no numeric part after the region code
6  102         # remove, no region code and no numeric part
7  1212FL250   # remove, area code has more than three digits
8  121FL50     # keep

The desired output should be
1  9F209
3  2F250
8  121FL50

How do I do this in an efficient way?

Thank you in advance

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: remove a row

Bert Gunter-2
Use regular expressions.

See ?regexp  and ?grep

Using your example:

> grep("^[[:digit:]]{1,3}[[:alpha:]]{1,2}[[:digit:]]{1,5}$", dat$varx, value = TRUE)
[1] "9F209"   "2F250"   "121FL50"
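Note that grep(value = TRUE) returns only the matching strings, not the rows of dat. A minimal sketch, assuming you want to keep whole rows of the data frame as in the desired output: grepl() with the same pattern returns one TRUE/FALSE per row, which can index dat directly. The names pat, keep and dat_clean are illustrative, not from the thread.

```r
# Example data from the original post
dat <- read.table(text = "rown varx
1 9F209
2 FL250
3 2F250
4 102250
5 102FL
6 102
7 1212FL250
8 121FL50", header = TRUE, stringsAsFactors = FALSE)

# Anchored pattern: 1-3 digits, then 1-2 letters, then 1-5 digits, nothing else
pat <- "^[[:digit:]]{1,3}[[:alpha:]]{1,2}[[:digit:]]{1,5}$"

# grepl() returns one logical per element of varx, i.e. one per row of dat
keep <- grepl(pat, dat$varx)

# Logical row indexing keeps the matching rows and their original row names
dat_clean <- dat[keep, ]
print(dat_clean)
```

Because the pattern is anchored with ^ and $, the 1-3 / 1-2 / 1-5 length limits also cap the total length at 10 characters, so a separate nchar() check should not be needed.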

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)


On Thu, Nov 28, 2019 at 3:17 PM Ashta <[hidden email]> wrote:


Re: remove a row

Ashta
Thank you so much, Bert.

Is it possible to split varx into three separate variables (area code,
region, and the numeric part)?

On Thu, Nov 28, 2019 at 7:31 PM Bert Gunter <[hidden email]> wrote:


Re: remove a row

Bert Gunter-2
Of course! Use regexec() and regmatches():

> regmatches(dat$varx, regexec("(^[[:digit:]]{1,3})([[:alpha:]]{1,2})([[:digit:]]{1,5}$)", dat$varx))
[[1]]
[1] "9F209" "9"     "F"     "209"

[[2]]
character(0)

[[3]]
[1] "2F250" "2"     "F"     "250"

[[4]]
character(0)

[[5]]
character(0)

[[6]]
character(0)

[[7]]
character(0)

[[8]]
[1] "121FL50" "121"     "FL"      "50"

Each list component is character(0) when there is no match; otherwise it
is a character vector with the whole text entry first, followed by the
substrings matching the 1st, 2nd, and 3rd parenthesized subexpressions of
the pattern. These correspond to the area code, the region code, and the
trailing numeric part. I leave it to you to extract what you want from
this list, e.g. via lapply().
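One possible way to do that extraction, sketched under the assumption that you want the three parts as new columns of dat and only the matching rows kept: replace each character(0) with three NAs so the rows stay aligned, stack the results with do.call(rbind, ...), and drop the NA rows. The column names area, region and num are hypothetical.

```r
# Example data from the original post
dat <- read.table(text = "rown varx
1 9F209
2 FL250
3 2F250
4 102250
5 102FL
6 102
7 1212FL250
8 121FL50", header = TRUE, stringsAsFactors = FALSE)

# Three parenthesized groups: area code, region code, numeric part
pat <- "^([[:digit:]]{1,3})([[:alpha:]]{1,2})([[:digit:]]{1,5})$"

# One list element per row: c(whole, area, region, num) or character(0)
m <- regmatches(dat$varx, regexec(pat, dat$varx))

# Keep the three captured groups; pad non-matches with NA so rows stay aligned
parts <- lapply(m, function(x) {
  if (length(x) == 4L) x[2:4] else rep(NA_character_, 3L)
})
parts <- do.call(rbind, parts)  # 8 x 3 character matrix
colnames(parts) <- c("area", "region", "num")

# Bind the new columns to dat, then drop the rows that did not match
dat_split <- data.frame(dat, parts, stringsAsFactors = FALSE)
dat_split <- dat_split[!is.na(dat_split$area), ]
print(dat_split)
```

The parts are deliberately left as character; convert with as.integer() afterwards if numeric columns are wanted, bearing in mind that this would drop any leading zeros.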

For details, see the Help pages for the two functions.
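As an alternative not mentioned in the thread, utils::strcapture() in recent R versions wraps the same regexec()/regmatches() step and returns a data frame directly, with one row per input string and a row of NAs for each entry that does not match. A sketch, assuming the same hypothetical column names; the column classes of the proto argument determine how each captured group is converted.

```r
# Example data from the original post
dat <- read.table(text = "rown varx
1 9F209
2 FL250
3 2F250
4 102250
5 102FL
6 102
7 1212FL250
8 121FL50", header = TRUE, stringsAsFactors = FALSE)

pat <- "^([[:digit:]]{1,3})([[:alpha:]]{1,2})([[:digit:]]{1,5})$"

# Prototype: one column per capture group; character keeps leading zeros
proto <- data.frame(area = character(), region = character(),
                    num = character(), stringsAsFactors = FALSE)

# One row per element of varx; non-matching elements become rows of NA
parts <- utils::strcapture(pat, dat$varx, proto)

# Attach the parts and keep only the rows that matched
dat_split2 <- data.frame(dat, parts, stringsAsFactors = FALSE)[!is.na(parts$area), ]
print(dat_split2)
```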

-- Bert
