how to select all columns that contain in any of their rows a partial match for a string?

anikaM
Hello,

I have a data frame, tot, with many columns and many rows.

I am trying to find all columns that have, in any of their rows, a value
that STARTS WITH "E94".

For example, there are columns like this:

> unique(tot$diagnoses_icd9_f41271_0_44)
[1] NA      "E9420"

I tried:
s=select(tot,starts_with("E94"))

but this didn't return anything. The data type in those columns is character.

Thanks
Ana
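Why the select() attempt returns nothing: in dplyr, starts_with() is a selection helper that matches column NAMES, not the values stored in the columns. A minimal base-R sketch of the difference follows, on a hypothetical three-row stand-in for tot (the eid and other_code columns and all values are made up; only the diagnoses column name comes from the post):

```r
# Hypothetical toy version of tot: only the diagnoses column name is from the post
tot <- data.frame(eid = c(1000017, 1000025, 1000038),
                  diagnoses_icd9_f41271_0_44 = c(NA, "E9420", NA),
                  other_code = c("A01", NA, "B02"),
                  stringsAsFactors = FALSE)

# starts_with("E94") tests column NAMES, and no name starts with "E94"
names(tot)[startsWith(names(tot), "E94")]

# The question is about column VALUES: does any cell start with "E94"?
# startsWith() needs character input and returns NA for missing cells,
# so convert with as.character() and drop NAs with na.rm = TRUE
has_e94 <- vapply(tot,
                  function(x) any(startsWith(as.character(x), "E94"), na.rm = TRUE),
                  logical(1))
names(tot)[has_e94]
```

In dplyr 1.0 and later (released after this thread), a value-based predicate can be passed to select() through where(), e.g. select(tot, where(function(x) any(grepl("^E94", x)))); check the installed version before relying on it.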

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: how to select all columns that contain in any of their rows a partial match for a string?

Rui Barradas
Hello,

Try the following

cols <- sapply(tot, function(x) any(grepl("^E94", x)))

To get the column numbers:

which(cols)
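A small self-contained check of this approach, using a hypothetical tot with made-up diagnosis columns diag_1 to diag_3. It also shows that missing values are harmless here, because grepl() returns FALSE (not NA) for NA input, and that the logical vector cols can index the matching columns directly:

```r
# Hypothetical data: column names and values are made up for illustration
tot <- data.frame(eid = c(1000017, 1000025, 1000038),
                  diag_1 = c(NA, "E9420", "E110"),
                  diag_2 = c("F123", NA, "E9401"),
                  diag_3 = c("G45", "H12", NA),
                  stringsAsFactors = FALSE)

# One TRUE/FALSE per column: does any cell start with "E94"?
# grepl() returns FALSE for NA cells, so no na.rm is needed
cols <- sapply(tot, function(x) any(grepl("^E94", x)))

which(cols)                            # named column positions
names(tot)[cols]                       # just the column names
e94_cols <- tot[, cols, drop = FALSE]  # the matching columns themselves
```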


Hope this helps,

Rui Barradas

In reply to Ana Marija's message of 05/10/19 at 19:50 (the original post above).

Re: how to select all columns that contain in any of their rows a partial match for a string?

Rui Barradas
Hello,

Please CC the list.

The following code does what you want, shown here on a small example data frame.

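tot <- data.frame(a = c("E10123", "F123", "G4567"),
                  b = c("a123", "E112345", "b456"))

# For each row: does any column start with "E10"? (likewise for "E11")
e10 <- sapply(tot, function(x) grepl("^E10", x))
e10 <- rowSums(e10) > 0
e11 <- sapply(tot, function(x) grepl("^E11", x))
e11 <- rowSums(e11) > 0

# Default -9, then 1 for E10 rows, then 2 for E11 rows. E11 is assigned
# last, so a row matching both ends up as 2. (newcol plays the role of "TD".)
tot$newcol <- -9
tot$newcol[e10] <- 1
tot$newcol[e11] <- 2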


In both cases, the two sapply/rowSums lines can be combined into one:

rowSums(sapply(...)) > 0
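A sketch putting the one-line form together with the coding asked for in the question (a column named TD: 1 for E10, 2 for E11, -9 otherwise), on the toy data above plus a made-up eid column. It swaps the three assignments for nested ifelse(); as in the code above, a row matching both prefixes gets 2. The helper row_has_prefix() is hypothetical, shown as a variant that does not rely on sapply() returning a matrix, which it does not do when tot has only one row:

```r
# Toy data from the reply, plus a made-up eid column (hypothetical values)
tot <- data.frame(eid = c(1000017, 1000025, 1000038),
                  a = c("E10123", "F123", "G4567"),
                  b = c("a123", "E112345", "b456"),
                  stringsAsFactors = FALSE)

# One-line form: TRUE for rows where any column starts with the prefix
e10 <- rowSums(sapply(tot, function(x) grepl("^E10", x))) > 0
e11 <- rowSums(sapply(tot, function(x) grepl("^E11", x))) > 0

# Column named TD as in the question: 2 for E11, else 1 for E10, else -9.
# E11 is tested first, so a row matching both prefixes gets 2.
tot$TD <- ifelse(e11, 2, ifelse(e10, 1, -9))

# Hypothetical helper: row-wise OR across columns with Reduce(). It returns
# one value per row even for a one-row data frame, where sapply() would give
# a plain vector and rowSums() would stop with an error.
# Note: prefix is pasted into a regular expression, so avoid regex
# metacharacters in it (fine for codes like "E10").
row_has_prefix <- function(df, prefix) {
  Reduce(`|`, lapply(df, function(x) grepl(paste0("^", prefix), x)))
}

row_has_prefix(tot[2, ], "E11")  # a one-row data frame still works
```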


Hope this helps,

Rui Barradas

At 20:52 on 05/10/19, Ana Marija wrote:

> Hi Rui,
>
> Thank you so much for getting back to me.
>
> I did what you told me:
> cols <- sapply(tot, function(x) any(grepl("^E10", x)))
> a=which(cols)
>
> So this gives me the names of 49 columns that contain that particular string.
>
> But how do I create a new column in my tot data frame (called "TD")
> that is 1 in the rows where the subject (identified by the "eid"
> column) has a string starting with "E10", 2 if it starts with "E11",
> and -9 otherwise?
>
>> head(tot)[1:3, 1:3]
>       eid sex_f31_0_0 year_of_birth_f34_0_0
> 1 1000017      Female                  1938
> 2 1000025      Female                  1951
> 3 1000038        Male                  1961
>
> Thank you so much!

Re: how to select all columns that contain in any of their rows a partial match for a string?

anikaM
Thank you so much, this worked wonderfully!

In reply to Rui Barradas, Sat, Oct 5, 2019 at 4:05 PM.
