how to extract strings in any column and in any row that start with


how to extract strings in any column and in any row that start with

anikaM
Hello,

I have a data frame:

> dim(tot)
[1] 502536   1093

How would I extract from it all strings that start with E10?

I know how to extract all rows that contain the exact value E10:
df0<-tot %>% filter_all(any_vars(. %in% c('E10')))
> dim(df0)
[1] 5105 1093

but I just need a vector of strings that start with E10...
it would look something like this:

[1] "E102" "E109" "E108" "E103" "E104" "E105" "E101" "E106" "E107"

Thanks
Ana

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: how to extract strings in any column and in any row that start with

Jeff Newmiller
Read about regular expressions... they are extremely useful.

df1 <- tot %>% filter_all(any_vars(grepl( '^E10', .)))

It is bad form not to put spaces around the <- assignment.
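
To make the difference between the two filters concrete, here is a minimal sketch on a made-up character vector (the values in `x` are hypothetical, not taken from `tot`). It contrasts exact matching with `%in%`, an unanchored regular expression, and the anchored `^E10` pattern suggested above:

```r
# Hypothetical sample values, for illustration only
x <- c("E10", "E101", "xE102", "E109")

exact    <- x %in% "E10"       # TRUE only where the whole string equals "E10"
anywhere <- grepl("E10", x)    # TRUE where "E10" occurs anywhere in the string
at_start <- grepl("^E10", x)   # TRUE only where the string begins with "E10"

x[at_start]
```

The `^` anchor is what restricts the match to the start of the string; without it, a value such as "xE102" would also be selected.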



Re: how to extract strings in any column and in any row that start with

anikaM
Hello,

this command was running for more than 2 hours
grep("E10",tot,value=T)
and no output

and this command
df1 <- tot %>% filter_all(any_vars(grepl( '^E10', .)))

gave me a subset (a data frame) of tot containing the rows where some
value matches ^E10

what I need is just a vector of all values in tot which start with E10.

Thanks
Ana


Re: how to extract strings in any column and in any row that start with

Jeff Newmiller
If you want to treat your data frame as if it were a vector, then convert it to a vector before you give it to grep.

unlist(tot)
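
A minimal sketch of that suggestion, assuming a small stand-in for `tot` with character columns (the data below is made up). When grep() is handed a data frame it coerces it with as.character(), which yields one long deparsed string per column rather than one element per cell, so flattening with unlist() first is what gives a cell-by-cell search:

```r
# Hypothetical stand-in for `tot`
tot <- data.frame(a = c("E101", "X1"),
                  b = c("E102", "E101"),
                  stringsAsFactors = FALSE)

# One element per column, not per cell
n_coerced <- length(as.character(tot))

# Flatten to a plain character vector, then search it
flat <- unlist(tot, use.names = FALSE)
hits <- grep("^E10", flat, value = TRUE)
unique(hits)
```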


Re: how to extract strings in any column and in any row that start with

cpolwart-2
This is almost certainly not the most efficient way:

# small example data frame; LETTERS[1:5] is recycled across each column
tot <- data.frame(v1 = paste0(LETTERS[1:5], 1:10),
                  v2 = paste0(LETTERS[1:5], 101:110),
                  v3 = paste0(LETTERS[1:5], 111:120),
                  v4 = paste0(LETTERS[1:5], 121:130),
                  v5 = paste0(LETTERS[1:5], 131:140),
                  v6 = paste0(LETTERS[1:5], 101:110))

# set a variable to hold the result
myResult <- NULL

# iterate through each column
for (v in seq_along(tot)) {
   # keep the values in column v that start with E10
   thisResult <- as.character(tot[grepl('^E10', tot[, v]), v])
   myResult <- c(myResult, thisResult)
}

# drop duplicates found in more than one column
myResult <- unique(myResult)


===

Indeed as I wrote this Jeff has popped along with unlist!

Using my example above:

unique(as.character(unlist(tot))[grepl('^E10', as.character(unlist(tot)))])

does what you wanted (you may not need the as.character calls if you are
on R 4.0, or if your df has chars rather than factors).
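
As a hedged illustration of the factor case, the sketch below builds a made-up data frame with stringsAsFactors = TRUE (the default before R 4.0) and flattens it once, storing the result in a variable named `flat` (a name introduced here) so that unlist() is not evaluated twice:

```r
# Hypothetical data frame whose columns are factors
tot_f <- data.frame(v1 = c("E101", "A1"),
                    v2 = c("E102", "E101"),
                    stringsAsFactors = TRUE)

# unlist() combines the factor columns; as.character() recovers the labels
flat <- as.character(unlist(tot_f))

result <- unique(flat[grepl("^E10", flat)])
result
```

With character columns the as.character() call is harmless, so the same line should work in either case.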


Re: how to extract strings in any column and in any row that start with

Abby Spurdle
In reply to this post by anikaM
> How would I extract from it all strings that start with E10?

Hi Ana,

Here's a simple solution:

    x <- c ("P24601", "E101", "E102", "3.141593",
        "E101", "xE101", "e103", " E104 ")

    x [substring (x, 1, 3) == "E10"]

You will need to replace x with another *character vector*.
(As touched on earlier, a data.frame may cause some problems).

Here's some variations:

    unique (x [substring (x, 1, 3) == "E10"])

    y <- toupper (x)
    y [substring (y, 1, 3) == "E10"]

    y <- trimws (x)
    y [substring (y, 1, 3) == "E10"]
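
The variations above can also be combined. The following is only a sketch: it swaps in base R's startsWith() (not used elsewhere in this thread) for the substring() comparison, and applies trimws() and toupper() together to the same sample vector:

```r
x <- c("P24601", "E101", "E102", "3.141593",
       "E101", "xE101", "e103", " E104 ")

# Strict prefix test on the raw values
strict <- x[startsWith(x, "E10")]

# Normalise whitespace and case first, then test the prefix
y <- toupper(trimws(x))
lenient <- unique(y[startsWith(y, "E10")])
lenient
```

Note that startsWith() expects a character vector, so a data frame would still need to be flattened first.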


Re: how to extract strings in any column and in any row that start with

Rui Barradas
In reply to this post by anikaM
Hello,

I have tried several options and with large dataframes this one was the
fastest (in my tests, of the ones I have tried).


s1 <- sapply(tot, function(x) grep('^E10', x, value = TRUE))


Then unlist(s1).
A close second (15% slower) was


s2 <- tot[sapply(tot, function(x) grepl('^E10', x))]


grep/unlist was 3.7 times slower:


grep("^E10", unlist(tot), value = TRUE)


Hope this helps,

Rui Barradas


Re: how to extract strings in any column and in any row that start with

anikaM
Hi Rui,

Thank you so much, that is exactly what I needed!

Cheers,
Ana
