strcapture enhancement

R devel mailing list
The new strcapture function in R-devel is handy, capturing
the matches to the parenthesized subpatterns in a regular
expression in the columns of a data.frame, whose column
names and classes are given by the 'proto' argument.  E.g.,

> p1 <- data.frame(Name="", Number=0)
> str(strcapture("([[:alpha:]]*) +([[:digit:]]*)", c("Three 3", "Twenty 20"), proto=p1))
'data.frame':   2 obs. of  2 variables:
 $ Name  : Factor w/ 2 levels "Three","Twenty": 1 2
 $ Number: num  3 20

I think it would be even nicer if it constructed its data.frame
using the check.names=FALSE and stringsAsFactors=FALSE
arguments.  Then the names and types specified in the proto
argument would be respected instead of being changed, as
in the following example:

> p2 <- data.frame("The Name"="", "The Number"=0, stringsAsFactors=FALSE, check.names=FALSE)
> str(strcapture("([[:alpha:]]*) +([[:digit:]]*)", c("Three 3", "Twenty 20"), proto=p2))
'data.frame':   2 obs. of  2 variables:
 $ The.Name  : Factor w/ 2 levels "Three","Twenty": 1 2
 $ The.Number: num  3 20
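
The renaming and the factor conversion seen above are what data.frame() itself does when check.names=TRUE and stringsAsFactors=TRUE are in effect. The following is a minimal sketch in base R only, not the strcapture internals; the object names (name_col, number_col, mangled, kept) are made up for illustration, and both arguments are set explicitly because the default for stringsAsFactors differs between R versions (it was TRUE before R 4.0.0).

```r
# Illustrative values only; all object names here are hypothetical.
name_col   <- c("Three", "Twenty")
number_col <- c(3, 20)

# Name checking on, strings converted to factors: column names are
# passed through make.names() and the character column becomes a factor.
mangled <- data.frame("The Name" = name_col, "The Number" = number_col,
                      check.names = TRUE, stringsAsFactors = TRUE)

# Name checking off, strings left alone: names and types are kept as given.
kept <- data.frame("The Name" = name_col, "The Number" = number_col,
                   check.names = FALSE, stringsAsFactors = FALSE)

str(mangled)
str(kept)
```

Comparing the two str() displays shows the difference the proposal is about: the first has syntactic names and a factor column, the second keeps "The Name" and "The Number" with a character column.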


Bill Dunlap
TIBCO Software
wdunlap tibco.com


______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: strcapture enhancement

Gabor Grothendieck
Note that read.pattern in gsubfn does accept stringsAsFactors = FALSE,
e.g. using your input lines and pattern:

library(gsubfn)
Lines <- c("Three 3", "Twenty 20")
pat <- "([[:alpha:]]*) +([[:digit:]]*)"

s2 <- read.pattern(text = Lines, pattern = pat, stringsAsFactors = FALSE,
 col.names = c("Name", "Number"))

giving:

> str(s2)
'data.frame':   2 obs. of  2 variables:
 $ Name  : chr  "Three" "Twenty"
 $ Number: int  3 20


On Wed, Sep 21, 2016 at 2:06 PM, William Dunlap via R-devel
<[hidden email]> wrote:

> The new strcapture function in R-devel is handy, capturing
> the matches to the parenthesized subpatterns in a regular
> expression in the columns of a data.frame, whose column
> names and classes are given by the 'proto' argument.  [...]



--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


Re: strcapture enhancement

Michael Lawrence-3
In reply to this post by R devel mailing list
Thanks for the suggestion. Checked in that change.
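
For readers who want to check the behaviour on their own installation, here is a small sketch that repeats the second example from the original post. It assumes an R version whose utils package provides strcapture and includes this change; the names pat, x and res are made up for illustration.

```r
# Assumes utils::strcapture is available and includes the change above.
pat <- "([[:alpha:]]*) +([[:digit:]]*)"
x   <- c("Three 3", "Twenty 20")

# Prototype with non-syntactic names and a character first column.
p2 <- data.frame("The Name" = "", "The Number" = 0,
                 stringsAsFactors = FALSE, check.names = FALSE)

res <- strcapture(pat, x, proto = p2)

# If the prototype is respected, the names match p2 and the first
# column is character rather than factor.
str(res)
identical(names(res), names(p2))
is.character(res[[1]])
```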

Michael

On Wed, Sep 21, 2016 at 11:06 AM, William Dunlap via R-devel
<[hidden email]> wrote:

> The new strcapture function in R-devel is handy, capturing
> the matches to the parenthesized subpatterns in a regular
> expression in the columns of a data.frame, whose column
> names and classes are given by the 'proto' argument.  [...]