partial matches across rows not columns


RCulloch
Hi R users,

I am trying to omit rows of data based on partial matches. An example of my data (seal_dist) is below:

A quick breakdown of my coding and why I need to answer this: I am dealing with a colony of seals where, for example, A1 is a female with a pup and A1.1 is that female's pup. The important part of the data here is DIST, which gives the distance between one seal (ID) and another (TO_ID). What I want to do is take a mean of these data for a nearest-neighbour analysis, but I want to omit any rows that give the distance between a female and her own pup, i.e. in the example above, omit rows where one of ID/TO_ID is A1 and the other is A1.1.

I have looked at grep and pmatch, but these appear to work across columns and don't seem to do what I'm looking for.

If anyone can point me in the right direction, I'd be most grateful.

Best wishes,

Ross


    FROM TO     DIST    ID HR DD MM YY ANIMAL DAY TO_ID TO_ANIMAL
2      1  2  4.81803    A1  1 30  9  9      1   1 MALE1        12
3      1  3  2.53468    A1  1 30  9  9      1   1    A2         3
4      1  4  7.57332    A1  1 30  9  9      1   1  A1.1         7
5      1  1  7.57332  A1.1  1 30  9  9      7   1    A1         1
6      1  2  7.89665  A1.1  1 30  9  9      7   1 MALE1        12
7      1  3  6.47847  A1.1  1 30  9  9      7   1    A2         3
9      1  1  2.53468    A2  1 30  9  9      3   1    A1         1
10     1  2  2.59051    A2  1 30  9  9      3   1 MALE1        12
12     1  4  6.47847    A2  1 30  9  9      3   1  A1.1         7
13     1  1  4.81803 MALE1  1 30  9  9     12   1    A1         1
15     1  3  2.59051 MALE1  1 30  9  9     12   1    A2         3
16     1  4  7.89665 MALE1  1 30  9  9     12   1  A1.1         7
17     1  1  3.85359    A1  2 30  9  9      1   1 MALE1        12
19     1  3  4.88826    A1  2 30  9  9      1   1    A2         3
20     1  4  7.25773    A1  2 30  9  9      1   1  A1.1         7
21     1  1  9.96431  A1.1  2 30  9  9      7   1 MALE1        12
22     1  2  7.25773  A1.1  2 30  9  9      7   1    A1         1
23     1  3  5.71725  A1.1  2 30  9  9      7   1    A2         3
25     1  1  8.73759    A2  2 30  9  9      3   1 MALE1        12
26     1  2  4.88826    A2  2 30  9  9      3   1    A1         1
28     1  4  5.71725    A2  2 30  9  9      3   1  A1.1         7
30     1  2  3.85359 MALE1  2 30  9  9     12   1    A1         1
31     1  3  8.73759 MALE1  2 30  9  9     12   1    A2         3
32     1  4  9.96431 MALE1  2 30  9  9     12   1  A1.1         7
33     1  1  7.95399    A1  3 30  9  9      1   1 MALE1        12
35     1  3  0.60443    A1  3 30  9  9      1   1  A1.1         7
36     1  4  1.91136    A1  3 30  9  9      1   1    A2         3
37     1  1  8.29967  A1.1  3 30  9  9      7   1 MALE1        12
38     1  2  0.60443  A1.1  3 30  9  9      7   1    A1         1
40     1  4  1.43201  A1.1  3 30  9  9      7   1    A2         3
41     1  1  9.71659    A2  3 30  9  9      3   1 MALE1        12
42     1  2  1.91136    A2  3 30  9  9      3   1    A1         1
43     1  3  1.43201    A2  3 30  9  9      3   1  A1.1         7
46     1  2  7.95399 MALE1  3 30  9  9     12   1    A1         1
47     1  3  8.29967 MALE1  3 30  9  9     12   1  A1.1         7
48     1  4  9.71659 MALE1  3 30  9  9     12   1    A2         3
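
A minimal, self-contained sketch of the whole task on the hour-1 rows (HR == 1) copied from the table above, keeping only ID, TO_ID and DIST. It assumes every pup is named `<mother>.<n>` (as with A1 / A1.1); `seal_hr1` and `is_mother_pup()` are hypothetical names. The point it illustrates is that `startsWith()` (R >= 3.3.0) compares each row's ID with that same row's TO_ID, whereas `grep()` applies a single pattern to every element.

```r
# Hour-1 rows (HR == 1) of seal_dist, copied from the table above;
# only the columns needed for this sketch are kept.
seal_hr1 <- data.frame(
  ID    = c("A1", "A1", "A1", "A1.1", "A1.1", "A1.1",
            "A2", "A2", "A2", "MALE1", "MALE1", "MALE1"),
  TO_ID = c("MALE1", "A2", "A1.1", "A1", "MALE1", "A2",
            "A1", "MALE1", "A1.1", "A1", "A2", "A1.1"),
  DIST  = c(4.81803, 2.53468, 7.57332, 7.57332, 7.89665, 6.47847,
            2.53468, 2.59051, 6.47847, 4.81803, 2.59051, 7.89665),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE for rows where one animal is the other's pup,
# assuming pups are always named "<mother>.<n>" (A1 -> A1.1).
# startsWith() pairs element i of each vector, i.e. it works row by row;
# it needs character (not factor) input.
is_mother_pup <- function(id, to_id) {
  startsWith(to_id, paste0(id, ".")) | startsWith(id, paste0(to_id, "."))
}

keep      <- !is_mother_pup(seal_hr1$ID, seal_hr1$TO_ID)
mean_all  <- mean(seal_hr1$DIST)          # includes the A1/A1.1 rows
mean_kept <- mean(seal_hr1$DIST[keep])    # mother/pup rows omitted
```

If ID and TO_ID were read in as factors (the default for `data.frame()` and `read.table()` before R 4.0.0), wrap them in `as.character()` before calling `startsWith()`.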

Re: partial matches across rows not columns

jholtman
Is this what you are looking for?

> # assume females start with "A"
> # extract the female part of the ID (A1.1 -> A1)
> x.id <- sub("(A[[:digit:]]+).*", "\\1", x$ID)
> # now see if this pattern matches first part of TO_ID
> x.match <- x.id == substring(x$TO_ID, 1, nchar(x.id))
> # here are the ones that would be eliminated
> x[x.match,]
   FROM TO    DIST   ID HR DD MM YY ANIMAL DAY TO_ID TO_ANIMAL
4     1  4 7.57332   A1  1 30  9  9      1   1  A1.1         7
5     1  1 7.57332 A1.1  1 30  9  9      7   1    A1         1
20    1  4 7.25773   A1  2 30  9  9      1   1  A1.1         7
22    1  2 7.25773 A1.1  2 30  9  9      7   1    A1         1
35    1  3 0.60443   A1  3 30  9  9      1   1  A1.1         7
38    1  2 0.60443 A1.1  3 30  9  9      7   1    A1         1
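
Jim's test compares a prefix of TO_ID whose length is set by the female ID, so with ten or more females a pair such as A1 and A10.1 (hypothetical; no such IDs appear in the posted data) would also count as a match. A sketch of one way to tighten it, and to drop the matching rows instead of listing them; the toy data frame and the names `next.char`, `x.match.strict` and `x.kept` are illustrative only:

```r
# Hypothetical IDs: "A10.1" would be the pup of a female A10, not of A1.
x <- data.frame(ID    = c("A1",   "A1.1", "A1",    "A2"),
                TO_ID = c("A1.1", "A1",   "A10.1", "A1.1"),
                stringsAsFactors = FALSE)

# Jim's prefix test, as posted
x.id    <- sub("(A[[:digit:]]+).*", "\\1", x$ID)
x.match <- x.id == substring(x$TO_ID, 1, nchar(x.id))

# Stricter: the character right after the prefix must be "." or nothing
# (substring() returns "" past the end), so "A1" no longer matches "A10.1"
next.char      <- substring(x$TO_ID, nchar(x.id) + 1, nchar(x.id) + 1)
x.match.strict <- x.match & next.char %in% c("", ".")

# Drop (rather than show) the mother/pup rows
x.kept <- x[!x.match.strict, ]
```

Here `x.match` flags the A1/A10.1 row as well, while `x.match.strict` flags only the true A1/A1.1 rows.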


On Tue, Jun 8, 2010 at 1:43 PM, RCulloch <[hidden email]> wrote:




--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: partial matches across rows not columns

jannis-2
In reply to this post by RCulloch
I did not go too deep into your zoology problem ;-) but as far as I
understood you, you want to omit all rows where
ID and TO_ID are A1 and A1.1 (or the same for A2, and so on), correct?

If the data you sent us is all the data, and no other situations occur,
the following should be sufficient:

Convert the vectors ID and TO_ID to values without the "." and the number
following it (e.g. A1.1 -> A1):

ID.clean <- sub("\\.[[:digit:]]+$", "", data$ID)
TO_ID.clean <- sub("\\.[[:digit:]]+$", "", data$TO_ID)

And then use logical indexing to keep only the rows where the cleaned IDs differ:
data.clean <- data[ID.clean != TO_ID.clean, ]
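
A quick check of the cleaning step on a few IDs from the table, assuming a pup ID is always its mother's ID followed by "." and digits. `strip_pup()` is a hypothetical helper name, and "A10.1" is a hypothetical ID added to show that a longer female number is not confused with A1:

```r
# A few IDs from the table, plus "A10.1" as a hypothetical pup of a female A10
ids <- c("A1", "A1.1", "A2", "MALE1", "A10.1")

# Strip a trailing "." plus digits: A1.1 -> A1; IDs without a suffix are unchanged
strip_pup <- function(id) sub("\\.[[:digit:]]+$", "", id)
cleaned <- strip_pup(ids)

# Row-wise test on paired ID/TO_ID vectors: TRUE marks a mother/pup row
ID    <- c("A1",   "A1.1", "A2",   "A1")
TO_ID <- c("A1.1", "A1",   "A1.1", "A10.1")
omit  <- strip_pup(ID) == strip_pup(TO_ID)
```

Rows where `omit` is TRUE are the mother/pup rows; indexing with `!omit` keeps the rest.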


HTH
Jannis




Re: partial matches across rows not columns

RCulloch
In reply to this post by RCulloch
Hi Jim and Hi Jannis,

Thanks very much to both of you for your help! Both methods work perfectly! Always good to know that there is more than one way to skin a cat when it comes to R! I will just need to get a grip on the regular expressions, it would seem.

Many thanks again for your help,

much appreciated,

Ross