

This question likely has a 1 line answer, I'm just not seeing it. (2, 3, or 10 lines is
fine too.)
For a vector I can do group <- match(x, unique(x)) to get a vector that labels each
element of x.
What is an equivalent if x is a data frame?
The result does not have to be fast: the data set will have < 100 elements. Since this is
inside the survival package, and that package is on the 'recommended' list, I can't
depend on any package outside the recommended list.
Terry T.
______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Hi Terry,
I take your question to mean how to label distinct rows of a data frame. If
that is not your question please clarify.
I found the row.match() function in the package prodlim that can be used to
solve this.
However since your request requires no additional dependencies I borrowed
the relevant code from the row.match function.
Here is some obfuscated code to provide your answer in one line, per your
request (less obfuscated code follows just below it).
Assuming your data frame is called 'df':
df[, ncol(df) + 1] <- match(do.call("paste", c(df[, , drop = FALSE], sep = "\\r")),
                            do.call("paste", c(unique(df)[, , drop = FALSE], sep = "\\r")))
The last column of df now contains the 'label', i.e. the index of the
matching row in unique(df), so identical rows share the same label.
Somewhat less obfuscated
getLabels <- function(df) {
  match(do.call("paste", c(df[, , drop = FALSE], sep = "\\r")),
        do.call("paste", c(unique(df)[, , drop = FALSE], sep = "\\r")))
}
myDataFrame$label <- getLabels(myDataFrame)
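To make the behaviour concrete, here is a minimal sketch of the same paste-and-match idea on a made-up data frame. The data frame `x`, the helper name `row_group`, and the "\r" separator are assumptions for illustration only; the separator just needs to be a string that does not occur in the data.

```r
# Hypothetical example data: rows 3 and 4 are identical, as are rows 1 and 6.
x <- data.frame(X0 = c("A", "B", "C", "C", "D", "A"),
                X1 = c(1, 2, 1, 1, 3, 1))

# Collapse each row into a single string, then label rows in order of
# first appearance. unname() keeps a column called "sep" or "collapse"
# from being mistaken for an argument of paste().
row_group <- function(df, sep = "\r") {
  key <- do.call(paste, c(unname(as.list(df)), sep = sep))
  match(key, unique(key))
}

group <- row_group(x)
print(group)
```

The labels are indices into the set of distinct rows, so they run from 1 to the number of distinct rows with no gaps.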
HTH,
Eric


"Label" is not a clear term for data frames, but most data frames have rownames. If dta is a data frame, not a tibble,
rownames( dta )[ !duplicated( dta ) ]
Or you could use the row indexes directly:
which( !duplicated( dta ) )
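Note that which(!duplicated(dta)) returns only the first row of each distinct group, not one value per row. If a per-row identifier is wanted, one possible sketch is to map every row back to the first row it matches. The example data and the names `first_rows`, `key`, and `first_of_group` are assumptions for illustration.

```r
# Hypothetical example data with two pairs of duplicated rows.
dta <- data.frame(X0 = c("A", "B", "C", "C", "D", "A"),
                  X1 = c(1, 2, 1, 1, 3, 1))

# Row numbers of the first occurrence of each distinct row.
first_rows <- which(!duplicated(dta))

# One string per row; "\r" is assumed not to occur in the data.
key <- do.call(paste, c(unname(as.list(dta)), sep = "\r"))

# For every row, the row number of the first identical row.
first_of_group <- first_rows[match(key, key[first_rows])]
print(first_of_group)
```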



In reply to this post by Therneau, Terry M., Ph.D.
Hi!
2017-09-18 07:13 -0500, Therneau, Terry M., Ph.D. wrote:
> This question likely has a 1 line answer, I'm just not seeing it.
> (2, 3, or 10 lines is fine too.)
>
> For a vector I can do group <- match(x, unique(x)) to get a vector
> that labels each element of x.
Actually, you get a vector of indices matching 'unique(x)', not a
labelled vector.
> x <- c("A","B","C","A","C","D")
> group <- match(x, unique(x))
> group
[1] 1 2 3 1 3 4
> What is an equivalent if x is a data frame?
So you will generate an index where duplicated rows have the row index
of the first occurrence, right? This could work:
> x <- data.frame("X0"=c("A","B","C","C","D","A"), "X1"=c(1,2,1,1,3,1))
> group <- rownames(x)
> for (i in 1:(nrow(x)-1)) {
    for (j in (i+1):nrow(x)) {
      if (sum(as.numeric(x[i,]==x[j,]))==ncol(x)) {
        group[j] <- group[i] }
    }
  }
> group
[1] "1" "2" "3" "3" "5" "1"
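One caveat with the loop above: 1:(nrow(x)-1) counts backwards (1, 0) when x has a single row. A possible variant, sketched here under the assumption that integer labels are acceptable, uses seq_len() so that short inputs are handled; the function name `first_match_labels` is made up for this example.

```r
# Label each row with the row number of the first identical row.
first_match_labels <- function(x) {
  n <- nrow(x)
  group <- seq_len(n)
  for (i in seq_len(n)) {
    if (group[i] != i) next            # row i already belongs to an earlier group
    for (j in seq_len(n)[-seq_len(i)]) {   # rows after i; empty when i == n
      # isTRUE() treats comparisons involving NA as "not equal"
      if (isTRUE(all(x[i, ] == x[j, ]))) group[j] <- group[i]
    }
  }
  group
}

x <- data.frame(X0 = c("A", "B", "C", "C", "D", "A"),
                X1 = c(1, 2, 1, 1, 3, 1))
labels <- first_match_labels(x)
print(labels)
```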
HTH,
Kimmo


In reply to this post by Therneau, Terry M., Ph.D.
You could use merge() with an ID column pasted onto the table of names, as
in
> tbl <- data.frame(FirstName=c("Abe","Abe","Bob","Chuck","Chuck"),
    Surname=c("Xavier","Yates","Yates","Yates","Zapf"), Id=paste0("P",101:105))
> tbl
  FirstName Surname   Id
1       Abe  Xavier P101
2       Abe   Yates P102
3       Bob   Yates P103
4     Chuck   Yates P104
5     Chuck    Zapf P105
> merge(data.frame(FirstName=c("Abe","Chuck","Dave"),
    Surname=rep("Yates",3)), tbl, all.x=TRUE)
  FirstName Surname   Id
1       Abe   Yates P102
2     Chuck   Yates P104
3      Dave   Yates <NA>
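Applied to the original question, the merge() idea might look like the sketch below: number the distinct rows, then merge that lookup table back onto the data. Because merge() sorts its result by the key columns, a row counter is carried along to restore the original order. The example data and the names `lookup`, `keyed`, and `row_id` are assumptions for illustration.

```r
# Hypothetical example data with two pairs of duplicated rows.
x <- data.frame(X0 = c("A", "B", "C", "C", "D", "A"),
                X1 = c(1, 2, 1, 1, 3, 1))

# One row per distinct combination, with a group number attached.
lookup <- unique(x)
lookup$group <- seq_len(nrow(lookup))

# Remember the original row order, since merge() reorders rows.
keyed <- cbind(x, row_id = seq_len(nrow(x)))
merged <- merge(keyed, lookup, by = c("X0", "X1"))

# Group number for each original row, back in the original order.
group <- merged$group[order(merged$row_id)]
print(group)
```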
Bill Dunlap
TIBCO Software
wdunlap tibco.com


In reply to this post by Therneau, Terry M., Ph.D.
> On Sep 18, 2017, at 5:13 AM, Therneau, Terry M., Ph.D. < [hidden email]> wrote:
>
> This question likely has a 1 line answer, I'm just not seeing it. (2, 3, or 10 lines is fine too.)
>
> For a vector I can do group <- match(x, unique(x)) to get a vector that labels each element of x.
> What is an equivalent if x is a data frame?
>
In the past I've used apply with paste to generate "group" identifiers:
x <- data.frame("X0"=c("A","B","C","C","D","A"), "X1"=c(1,2,1,1,3,1))
apply(x, 1, paste, collapse=".")
[1] "A.1" "B.2" "C.1" "C.1" "D.3" "A.1"
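If integer labels like those from match(x, unique(x)) are preferred over the pasted strings, the identifiers can be fed straight back into match(). This is only a sketch of that extra step; the variable names `key` and `group` are assumptions.

```r
x <- data.frame(X0 = c("A", "B", "C", "C", "D", "A"),
                X1 = c(1, 2, 1, 1, 3, 1))

# String identifier per row, as above.
key <- apply(x, 1, paste, collapse = ".")

# Integer label per row, numbered in order of first appearance.
group <- match(key, unique(key))
print(group)
```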
> The result does not have to be fast: the data set will have < 100 elements. Since this is inside the survival package, and that package is on the 'recommended' list, I can't depend on any package outside the recommended list.
David Winsemius
Alameda, CA, USA
'Any technology distinguishable from magic is insufficiently advanced.' Gehm's Corollary to Clarke's Third Law


Yes. My understanding is that you want the identifier to have the same
number of rows as the data frame. A slight variant of David's solution
would then be:
do.call(paste0,x)
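One thing to watch with paste0 is that, without a separator, different rows can collapse to the same string. The sketch below uses a made-up data frame `y` to show the collision and one possible way around it, namely a separator that is assumed not to occur in the data.

```r
# Hypothetical data: the two rows differ, but concatenate to the same text.
y <- data.frame(a = c("a", "ab"), b = c("bc", "c"),
                stringsAsFactors = FALSE)

no_sep <- do.call(paste0, y)                    # both rows become "abc"
with_sep <- do.call(paste, c(y, sep = "\r"))    # rows stay distinct

group_no_sep <- match(no_sep, unique(no_sep))
group_with_sep <- match(with_sep, unique(with_sep))
print(group_no_sep)
print(group_with_sep)
```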
 Bert

