Trying to understand how to sort a DF on two columns

Sorkin, John
I want to sort a DF, temp, on two columns, patid and time. I have searched the internet and found code that I was able to modify to get my data sorted. Unfortunately I don't understand how the code works. I would appreciate it if someone could explain to me how the code works. Among other questions, despite reading, I don't understand how with() works, nor what it does in the current setting.

code:
data4xsort<-temp[
  with( temp, order(temp[,"patid"], temp[,"time"])),
]

Thank you,
John





John David Sorkin M.D., Ph.D.
Professor of Medicine
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology and Geriatric Medicine
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)


        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: Trying to understand how to sort a DF on two columns

Bert Gunter-2
https://stackoverflow.com/questions/2315601/understanding-the-order-function

Do a web search on "How does order() work R" or similar for more.

I can't explain with() any better than the docs: saying that it evaluates
the expression argument in the data argument environment -- a data frame
for the data frame method -- probably won't help you.

-- Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


Re: Trying to understand how to sort a DF on two columns

David Winsemius
In reply to this post by Sorkin, John
On 8/12/19 7:20 PM, Sorkin, John wrote:
> I want to sort a DF, temp, on two columns, patid and time. I have searched the internet and found code that I was able to modify to get my data sorted. Unfortunately I don't understand how the code works. I would appreciate it if someone could explain to me how the code works. Among other questions, despite reading, I don't understand how with() works, nor what it does in the current setting.
>
> code:
> data4xsort<-temp[
>    with( temp, order(temp[,"patid"], temp[,"time"])),
> ]
>
> Thank you,
> John


The `order` function returns an integer vector which is the length of its
inputs (the inputs must all be the same length; there is no recycling,
and `order` gives an error if the lengths differ). The numbers are the
positions of the items in the input, listed in the order that would sort
them smallest to largest. Later arguments are used to break ties in the
earlier ones. So when used in an indexing application as seen here, it
results in the dataframe returned with its rows in ascending order based
primarily on its first argument, patid, and in case of ties on the
second argument, time. If you put a minus sign in front of a numeric
argument, the ordering on that argument is largest to smallest.
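
A small sketch of the behaviour described above. The names `temp`, `patid` and `time` follow the question; the values are invented for illustration and are not the poster's data.

```r
# Invented data: two patients with unsorted visit times
temp <- data.frame(patid = c(2, 1, 2, 1),
                   time  = c(5, 9, 1, 3))

# order() returns row positions, not the sorted values themselves
idx <- order(temp$patid, temp$time)
idx                        # 4 2 3 1

# Using those positions as a row index gives the sorted data frame
sorted <- temp[idx, ]

# A minus sign on a numeric key reverses the ordering on that key only
idx_desc <- order(temp$patid, -temp$time)
idx_desc                   # 2 4 1 3
```

Within each patid the ties are broken by time, ascending in the first call and descending in the second.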


If that is code you are getting from elsewhere, you should realize that
it is somewhat redundant and you should question the level of R skills
of its author.  In this code the `with(...)` call is doing absolutely
nothing. It is not needed because the arguments already have an
unambiguous place to get the columns.  A more compact expression if you
were going to use `with` would be:

data4xsort<-temp[ with( temp, order(patid, time)), ]

But using `with` carries risks because there is sometimes confusion about which environment it will find the named objects or columns in.

It would be safer not to use it in this situation.
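
One way the risk can show up, as a hedged sketch: the data frame `df` and the global vector `score` below are invented for illustration. If a name is not a column of the data frame, `with` falls back to the calling environment without any warning, whereas explicit column indexing fails loudly.

```r
df <- data.frame(id = c(3, 1, 2))
score <- c(1, 2, 3)    # a global vector that is NOT a column of df

# 'score' is not found in df, so with() quietly uses the global vector
silent <- with(df, order(score))
silent                 # 1 2 3

# Explicit column indexing reports the missing column instead
explicit <- tryCatch(order(df[, "score"]),
                     error = function(e) "error: no such column")
explicit               # "error: no such column"
```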

Your headers suggest you are using Outlook. Surely there must be a way to specify a plain text format for outgoing emails. This is a plain text mailing list.

David.

Re: Trying to understand how to sort a DF on two columns

Sorkin, John
In reply to this post by Bert Gunter-2
Bert,

Thank you for your reply (and the many other questions to the list that you answer).

I understand how order works when ordering based on a single column. What I don’t understand is how the code I included with my email works. I believe my problem is a lack of understanding of what with does. I have read about the with function, but I must be missing something.

Thank you,
John

Re: Trying to understand how to sort a DF on two columns

Jeff Newmiller
With this as an example, no wonder you don't understand it. This is horrible.

`with` is very much like the `subset` function... it alleviates the need to re-type the containing object name repeatedly.

data4xsort <- temp[ with( temp, order( patid, time ) ), ]

is the same as

data4xsort <- temp[ order( temp$patid, temp$time ), ]

The example you gave makes no use of the `with` function.
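
A quick check of that equivalence, as a sketch on invented data (only the names `temp`, `patid` and `time` come from the thread):

```r
temp <- data.frame(patid = c(2, 1, 2, 1),
                   time  = c(5, 9, 1, 3))

# with() and bare column names
a <- temp[with(temp, order(patid, time)), ]

# no with(), explicit temp$ prefix
b <- temp[order(temp$patid, temp$time), ]

# the code from the original question: with() wrapped around explicit indexing
c0 <- temp[with(temp, order(temp[, "patid"], temp[, "time"])), ]

# all three give the same sorted data frame
same <- identical(a, b) && identical(b, c0)
same
```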

--
Sent from my phone. Please excuse my brevity.

Re: Trying to understand how to sort a DF on two columns

Rui Barradas
In reply to this post by Sorkin, John
Hello,

Though good answers were already given, I would like to say something.

1.
If you are lazy (about typing), use with; if you prefer to play safe, don't.
I am lazy many times, but in interactive mode only.

2.
I find it better in the long run *not* to take advantage of R's
one-liners, they tend to be less readable. Instead of putting everything
in the same instruction why not

i <- order( temp$patid, temp$time )
data4xsort <- temp[ i, ]


This has the disadvantage of creating an extra variable but are you
really having memory problems? If not, use the clearer code. Besides, if
this goes into a function all temporary variables will be gone and the
memory released, in which case there will be no problem.

(The with equivalent is i <- with(temp, order(patid, time)), btw.)
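
A sketch of the "put it in a function" idea. The helper name `sort_rows` and the data values are hypothetical, not something from the thread; the index `i` exists only while the function runs.

```r
# Hypothetical helper: sorts a data frame on its patid and time columns
sort_rows <- function(df) {
  i <- order(df$patid, df$time)   # temporary index, local to the function
  df[i, ]
}

# Invented example data
temp <- data.frame(patid = c(2, 1, 2, 1),
                   time  = c(5, 9, 1, 3))

data4xsort <- sort_rows(temp)
data4xsort
```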

Hope this helps,

Rui Barradas

Re: Trying to understand how to sort a DF on two columns

S Ellison-2
In reply to this post by Sorkin, John

> I want to sort a DF, temp, on two columns, patid and time. I have searched
> the internet and found code that I was able to modify to get my data sorted.
> Unfortunately I don't understand how the code works. I would appreciate it
> if someone could explain to me how the code works. Among other
> questions, despite reading, I don't understand how with() works, nor what it
> does in the current setting.
>
> code:
> data4xsort<-temp[
>   with( temp, order(temp[,"patid"], temp[,"time"])),
> ]

With apologies for brevity-induced brusqueness:

1) You don't need 'with' in the code. You could say
data4xsort<- temp[order(temp[,"patid"], temp[,"time"]), ]
or
data4xsort<- temp[order(temp$patid, temp$time), ]

2) If you _did_ use 'with', you could say
data4xsort<- temp[with(temp, order(patid,time)), ]

Basically, 'with(x, ...)' says: look in x first for anything named in the '...' part.

3. order. order is a bit of a mindbender. It gives you the numeric indices you need to convert an unsorted object into a sorted object.
If we said
a <- c(2,3,1)  
order(a)
by default, we get back
# [1] 3 1 2

These are indexes into a that put the elements of a in ascending order. a[3] is 1, a[1] is 2 and so on.
So if we say
oo <- order(a)
a[oo]

we get
[1] 1 2 3
... which is a, in ascending order. And to do that, we used oo as indexes in a.

4. For a data frame, you generally want to sort rows into a particular order. So let's say we have a data frame like
d <- data.frame(a=c(2,3,1,3,1,2), b=c(1,2,2,1,1,2))
d
  a b
1 2 1
2 3 2
3 1 2
4 3 1
5 1 1
6 2 2

We can say
oo.d <- with(d, order(a, b)) # which says: look in 'd' to find 'a' and 'b'
        # We could also have said oo.d <- order(d$a, d$b)

This gives us the row numbers of d, arranged to give us the row ordering we asked 'order' to generate.
Now, if we say
d[oo.d, ]     #where we need the empty second index so that the first is treated as a row index
# we get d, with rows sorted by a first and then b:
  a b
5 1 1
3 1 2
1 2 1
6 2 2
4 3 1
2 3 2

#You might notice that the default row numbers from d - the left hand column above - are now identical to oo.d;
# this is particular to default row numbers, though.
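
To see why that is particular to default row numbers, here is a sketch that reuses d from above but gives it invented row labels ("r1" to "r6" are made up for illustration). The sorted data frame then carries the labels, while oo.d still holds positions.

```r
d <- data.frame(a = c(2, 3, 1, 3, 1, 2), b = c(1, 2, 2, 1, 1, 2))
rownames(d) <- c("r1", "r2", "r3", "r4", "r5", "r6")   # invented labels

oo.d <- order(d$a, d$b)
oo.d                     # 5 3 1 6 4 2  (positions)

sorted_names <- rownames(d[oo.d, ])
sorted_names             # "r5" "r3" "r1" "r6" "r4" "r2"  (labels)
```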

5. If you want to pack that into one line without assigning the ordering to oo.d, it goes (for example)
d[ with(d, order(a, b)), ]

... which is pretty much what your code is doing.

The only thing I've missed is that when you wrap something like
order(temp[,"patid"], temp[,"time"])
in 'with', 'with' is not doing anything useful for you.
temp[,"patid"] has already told R where to look for patid,
so R doesn’t need to look anywhere else.


Does that help?

Steve Ellison

