Why is merge sorting even when sort = F?

Why is merge sorting even when sort = F?

Dimitri Liakhovitski-2
Hello!
I have a data frame 'grades2' and a data frame 'info':

grades2 <- data.frame(grade = c(1,2,2,3,1))
info <- data.frame(
  grade = 3:1,
  desc = c("Excellent", "Good", "Poor"),
  fail = c(F, F, T)
)

I want to get the info for every grade I have in grades2:

This solution re-sorts everything in the order of column 'grade':
merge(grades2, info, by = "grade", all.x = T, all.y = F)

Could you please explain why this solution also re-sorts, despite sort = FALSE?
merge(grades2, info, by = "grade", all.x = T, all.y = F, sort = FALSE)

Thanks a lot!
--
Dimitri Liakhovitski

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Why is merge sorting even when sort = F?

Jeff Newmiller
Merging is not necessarily an order-preserving operation, but sorting can make the operation more efficient. The sort=TRUE argument forces the result to be sorted, but sort=FALSE is not a promise that order will be preserved. (I think the imperfect sorting occurs when there are multiple keys, but I am not sure.) You can add columns to the input data that let you restore some semblance of the original ordering afterward, or you can roll your own possibly-less-efficient merge using match and indexing:

info[ match( grades2$grade, info$grade ), ]
--
Sent from my phone. Please excuse my brevity.
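
The match-and-index lookup suggested above can be extended into a full left join that keeps the rows of grades2 in place. This is only a minimal sketch, assuming that 'grade' is unique in info (match() returns the first hit only); the name looked_up is made up for illustration.

```r
grades2 <- data.frame(grade = c(1, 2, 2, 3, 1))
info <- data.frame(
  grade = 3:1,
  desc  = c("Excellent", "Good", "Poor"),
  fail  = c(FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# For each row of grades2, find the row of info with the same grade.
# match() returns NA where a grade has no entry in info, and indexing a
# data frame with NA yields a row of NAs, which mimics all.x = TRUE.
idx <- match(grades2$grade, info$grade)

# Keep grades2 as it is and attach the looked-up columns (minus the key).
looked_up <- cbind(grades2, info[idx, c("desc", "fail"), drop = FALSE])
rownames(looked_up) <- NULL

print(looked_up)
```

Because nothing is merged or sorted, the result has exactly one row per row of grades2, in the same order.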

Re: Why is merge sorting even when sort = F?

Dimitri Liakhovitski-2
Thank you. I was just curious why sort=FALSE had no impact.
Wondering what it is there for then...

--
Dimitri Liakhovitski

Re: Why is merge sorting even when sort = F?

Jeff Newmiller
If you are still wondering, try re-reading my answer. FALSE is more efficient, TRUE is sorted. Lack of sorting has nothing to do with preserving order.
--
Sent from my phone. Please excuse my brevity.
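
One way to check the distinction between "sorted" and "order preserved" is to merge an input whose keys are neither sorted nor grouped. The sketch below is illustrative only: the data frame x is made up, and the row order returned with sort = FALSE is treated as unspecified, so the code prints what it finds rather than assuming a particular answer.

```r
info <- data.frame(
  grade = 3:1,
  desc  = c("Excellent", "Good", "Poor"),
  stringsAsFactors = FALSE
)

# Hypothetical input whose keys are neither sorted nor grouped.
x <- data.frame(grade = c(2, 1, 2, 3, 1))

sorted_res   <- merge(x, info, by = "grade", all.x = TRUE, sort = TRUE)
unsorted_res <- merge(x, info, by = "grade", all.x = TRUE, sort = FALSE)

# Documented behaviour: the sort = TRUE result is ordered by the key column.
print(!is.unsorted(sorted_res$grade))

# Not promised: whether the sort = FALSE rows line up with the rows of x.
print(identical(unsorted_res$grade, x$grade))

# Both calls return the same rows; only their order may differ.
print(identical(sort(unsorted_res$grade), sort(sorted_res$grade)))
```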

Re: Why is merge sorting even when sort = F?

Rui Barradas
In reply to this post by Dimitri Liakhovitski-2
Hello,

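If you need to preserve the order, you can do it like this. The index trick
assumes the merged rows come back sorted by 'grade', so request that
explicitly with sort = TRUE.

inx <- order(grades2$grade)
result <- merge(grades2, info, by = "grade", all.x = TRUE, all.y = FALSE, sort = TRUE)
result[order(inx), ]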

Hope this helps,

Rui Barradas
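
A variant that does not depend on the row order merge returns is to tag every row of the left-hand data frame with its position before merging and to order by that tag afterward. The following is a rough sketch, not a tested utility: the helper merge_keep_order and the column name .row_id are hypothetical, and the code assumes the input has no column called .row_id already.

```r
# Hypothetical helper (not part of base R): left-join y onto x and return
# the rows in the original order of x.
merge_keep_order <- function(x, y, by) {
  x$.row_id <- seq_len(nrow(x))                    # remember each row's position
  res <- merge(x, y, by = by, all.x = TRUE, sort = FALSE)
  res <- res[order(res$.row_id), , drop = FALSE]   # restore that order
  res$.row_id <- NULL                              # drop the helper column
  rownames(res) <- NULL
  res
}

grades2 <- data.frame(grade = c(1, 2, 2, 3, 1))
info <- data.frame(
  grade = 3:1,
  desc  = c("Excellent", "Good", "Poor"),
  fail  = c(FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

out <- merge_keep_order(grades2, info, by = "grade")
print(out)
```

Note that merge still moves the 'by' column to the front, so only the row order of x is restored, not necessarily its column order.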

Re: Why is merge sorting even when sort = F?

Dimitri Liakhovitski-2
In reply to this post by Jeff Newmiller
I understood your answer.
The point is that sort = FALSE that still appears to sort is plain confusing.
Instead, the option should have been named something like efficient = TRUE
or FALSE. At least then no one would stupidly expect sort = TRUE to
sort and sort = FALSE to NOT sort.

--
Dimitri Liakhovitski

Re: Why is merge sorting even when sort = F?

DIGHE, NILESH [AG/2362]
Using the "join" function from the plyr package preserves the order of the first data frame:
library(plyr)
join(grades2, info, by="grade", type="left", match="all")

Nilesh