help matching rows of a data frame

help matching rows of a data frame

Therneau, Terry M., Ph.D.
This question likely has a 1 line answer, I'm just not seeing it.  (2, 3, or 10 lines is
fine too.)

For a vector I can do group <- match(x, unique(x)) to get a vector that labels each
element of x.
What is an equivalent if x is a data frame?

The result does not have to be fast: the data set will have < 100 elements.  Since this is
inside the survival package, and that package is on  the 'recommended' list, I can't
depend on any package outside the recommended list.

Terry T.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: help matching rows of a data frame

Eric Berger
Hi Terry,
I take your question to mean how to label distinct rows of a data frame. If
that is not your question please clarify.
I found the row.match() function in the package prodlim that can be used to
solve this.
However since your request requires no additional dependencies I borrowed
the relevant code from the row.match function.
Here is some obfuscated code to provide your answer in one line, per your
request (less obfuscated code is just below it).

Assuming your data frame is called 'df':

df[,ncol(df)+1] <- match( do.call("paste", c(df[, , drop = FALSE], sep = "\\r")),
                          do.call("paste", c(unique(df)[, , drop = FALSE], sep = "\\r")) )

The last column of df now contains the 'label', i.e. the index within
unique(df) of the row that is the same as the given row.

Somewhat less obfuscated

getLabels <- function(df) {
    match( do.call("paste", c(df[, , drop = FALSE], sep = "\\r")),
           do.call("paste", c(unique(df)[, , drop = FALSE], sep = "\\r")) )
}

myDataFrame$label <- getLabels(myDataFrame)
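
As a rough illustration of the paste-and-match idea above, here is a minimal sketch on a small made-up data frame. The names get_labels and x are hypothetical, and the "\r" separator is an assumption: it only needs to be a string that does not occur in the data.

```r
# Sketch: label each row by the position of its first identical row
# among the distinct rows. get_labels() is a hypothetical helper name.
get_labels <- function(df) {
  # Collapse every row into a single string key; "\r" is an assumed
  # separator that should not appear in any column value.
  key <- do.call(paste, c(df, sep = "\r"))
  match(key, unique(key))
}

x <- data.frame(X0 = c("A", "B", "C", "C", "D", "A"),
                X1 = c(1, 2, 1, 1, 3, 1))
labels <- get_labels(x)
print(labels)
# [1] 1 2 3 3 4 1
```

Rows 3 and 4 share a label, as do rows 1 and 6, which mirrors what match(x, unique(x)) does for a plain vector.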


HTH,

Eric


Re: help matching rows of a data frame

Jeff Newmiller
"Label" is not a clear term for data frames,  but most data frames have rownames. If dta is a data frame, not a tibble,

rownames( dta )[ !duplicated( dta ) ]

Or could use row indexes directly

which( !duplicated( dta ) )
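
To make the behaviour of duplicated() concrete, here is a small sketch on a made-up data frame (the name x and its contents are assumptions for illustration). It shows that this approach identifies the first occurrence of each distinct row rather than assigning a label to every row.

```r
# Sketch: duplicated() flags rows that repeat an earlier row.
x <- data.frame(X0 = c("A", "B", "C", "C", "D", "A"),
                X1 = c(1, 2, 1, 1, 3, 1))

dup_flags  <- duplicated(x)          # TRUE for rows 4 and 6
first_rows <- which(!dup_flags)      # indices of the distinct rows
print(dup_flags)
# [1] FALSE FALSE FALSE  TRUE FALSE  TRUE
print(first_rows)
# [1] 1 2 3 5
```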
--
Sent from my phone. Please excuse my brevity.

Re: help matching rows of a data frame

K. Elo-2
In reply to this post by Therneau, Terry M., Ph.D.
Hi!
2017-09-18 07:13 -0500, Therneau, Terry M., Ph.D. wrote:
> This question likely has a 1 line answer, I'm just not seeing
> it.  (2, 3, or 10 lines is fine too.)
>
> For a vector I can do group <- match(x, unique(x)) to get a vector
> that labels each element of x.

Actually, you get a vector of indices matching 'unique(x)', not a
labelled vector.

> x<-c("A","B","C","A","C","D")
> group<-match(x, unique(x))
> group
[1] 1 2 3 1 3 4

> What is an equivalent if x is a data frame?

So you will generate an index where duplicated rows have the row index
of the first occurrence, right? This could work:

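> x <- data.frame("X0"=c("A","B","C","C","D","A"), "X1"=c(1,2,1,1,3,1))
> group <- rownames(x)
> # seq_len() gives an empty loop for a one-row data frame, unlike 1:(nrow(x)-1)
> for (i in seq_len(nrow(x) - 1)) {
+    for (j in (i + 1):nrow(x)) {
+       # rows i and j are duplicates when every column agrees
+       if (all(x[i, ] == x[j, ])) group[j] <- group[i]
+    }
+ }
> group
[1] "1" "2" "3" "3" "5" "1"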
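
If consecutive integer labels are wanted instead of row names, the result of the loop can be passed through the same match()/unique() idiom from the original question. A small sketch, with the loop output typed in by hand so that it stands alone (the name ids is an assumption):

```r
# Sketch: convert first-occurrence row names into consecutive group ids.
group <- c("1", "2", "3", "3", "5", "1")   # output of the loop above
ids <- match(group, unique(group))
print(ids)
# [1] 1 2 3 3 4 1
```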

HTH,
Kimmo


Re: help matching rows of a data frame

R help mailing list-2
In reply to this post by Therneau, Terry M., Ph.D.
You could use merge() with an ID column pasted onto the table of names, as
in

> tbl <- data.frame(FirstName=c("Abe","Abe","Bob","Chuck","Chuck"),
Surname=c("Xavier","Yates","Yates","Yates","Zapf"), Id=paste0("P",101:105))
> tbl
  FirstName Surname   Id
1       Abe  Xavier P101
2       Abe   Yates P102
3       Bob   Yates P103
4     Chuck   Yates P104
5     Chuck    Zapf P105
> merge(data.frame(FirstName=c("Abe","Chuck","Dave"),
Surname=rep("Yates",3)), tbl, all.x=TRUE)
  FirstName Surname   Id
1       Abe   Yates P102
2     Chuck   Yates P104
3      Dave   Yates <NA>
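
Applied to the original question, the merge() idea might look like the sketch below. The column names group and row_id are hypothetical helpers, and the example data frame is made up. Because merge() reorders rows, an explicit row index is carried along and used to restore the original order.

```r
# Sketch: label rows by merging against the table of distinct rows.
x <- data.frame(X0 = c("A", "B", "C", "C", "D", "A"),
                X1 = c(1, 2, 1, 1, 3, 1))

u <- unique(x)                    # one row per distinct combination
u$group <- seq_len(nrow(u))       # hypothetical id column

x$row_id <- seq_len(nrow(x))      # remember the original row order
m <- merge(x, u)                  # joins on the shared columns X0 and X1
m <- m[order(m$row_id), ]         # merge() sorts, so undo that
group <- m$group
print(group)
# [1] 1 2 3 3 4 1
```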


Bill Dunlap
TIBCO Software
wdunlap tibco.com


Re: help matching rows of a data frame

David Winsemius
In reply to this post by Therneau, Terry M., Ph.D.

> On Sep 18, 2017, at 5:13 AM, Therneau, Terry M., Ph.D. <[hidden email]> wrote:
>
> This question likely has a 1 line answer, I'm just not seeing it.  (2, 3, or 10 lines is fine too.)
>
> For a vector I can do group <- match(x, unique(x)) to get a vector that labels each element of x.
> What is an equivalent if x is a data frame?
>

In the past I've used apply with paste to generate "group" identifiers:


x<-data.frame("X0"=c("A","B","C","C","D","A"), "X1"=c(1,2,1,1,3,1))

apply(x, 1, paste, collapse=".")
[1] "A.1" "B.2" "C.1" "C.1" "D.3" "A.1"
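
The pasted identifiers can be turned into integer group labels with the vector idiom from the original question. A minimal sketch using the same example data (the names key and group are assumptions):

```r
# Sketch: build one string key per row, then match against the unique keys.
x <- data.frame("X0" = c("A", "B", "C", "C", "D", "A"),
                "X1" = c(1, 2, 1, 1, 3, 1))

key <- apply(x, 1, paste, collapse = ".")
group <- match(key, unique(key))
print(group)
# [1] 1 2 3 3 4 1
```

Note that apply() first converts the data frame to a character matrix, so numeric columns are formatted as text before pasting.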


> The result does not have to be fast: the data set will have < 100 elements.  Since this is inside the survival package, and that package is on  the 'recommended' list, I can't depend on any package outside the recommended list.

David Winsemius
Alameda, CA, USA

'Any technology distinguishable from magic is insufficiently advanced.'   -Gehm's Corollary to Clarke's Third Law


Re: help matching rows of a data frame

Bert Gunter-2
Yes. My understanding is that you want the identifier to have the same
number of rows as the data frame. A slight variant of David's solution
would then be:

do.call(paste0,x)
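
One caveat worth checking with any paste-based key: without a separator, different rows can collapse to the same string. The sketch below uses a made-up data frame d to show the collision and one possible way around it; the "\r" separator is an assumption and only needs to be a string absent from the data.

```r
# Sketch: paste0() keys can collide; a separator avoids it.
d <- data.frame(a = c("A1", "A"), b = c("1", "11"),
                stringsAsFactors = FALSE)

key_nosep <- do.call(paste0, d)                  # both rows give "A11"
key_sep   <- do.call(paste, c(d, sep = "\r"))    # rows stay distinct

print(match(key_nosep, unique(key_nosep)))
# [1] 1 1
print(match(key_sep, unique(key_sep)))
# [1] 1 2
```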


-- Bert


