sorting a data.frame (df) by a vector (which is not contained in the df) - unexpected behaviour of match and factor

drflxms
Dear R colleagues,

Consider my data.frame named "df" with 3 columns (level,
prevalence and sensitivity) and 7 rows of data (see the dump below).

df <- structure(list(
    level = structure(1:7, .Label = c("0", "1", "10", "100", "1010", "11", "110"),
                      class = "factor"),
    prevalence = structure(c(4L, 2L, 3L, 5L, 6L, 1L, 7L),
                           .Label = c("0.488", "0.5", "0.754", "0.788", "0.803", "0.887", "0.905"),
                           class = "factor"),
    sensitivity = structure(c(6L, 1L, 5L, 4L, 3L, 2L, 1L),
                            .Label = c("0", "0.05", "0.091", "0.123", "0.327", "0.933"),
                            class = "factor")),
  .Names = c("level", "prevalence", "sensitivity"),
  class = "data.frame", row.names = c(NA, -7L))
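If the dput() output is hard to read, the same data frame can be sketched with data.frame() and factor(). This is an illustrative reconstruction, assuming the level order in each .Label vector above is the intended one; all three columns stay factors, as in the dump, and the integer codes from the dump are reused to pick the values.

```r
# Reconstruction of df from the dump. Levels are given explicitly so the
# result does not depend on the locale's sort order.
level_lv <- c("0", "1", "10", "100", "1010", "11", "110")
prev_lv  <- c("0.488", "0.5", "0.754", "0.788", "0.803", "0.887", "0.905")
sens_lv  <- c("0", "0.05", "0.091", "0.123", "0.327", "0.933")

df <- data.frame(
  level       = factor(level_lv, levels = level_lv),
  prevalence  = factor(prev_lv[c(4, 2, 3, 5, 6, 1, 7)], levels = prev_lv),
  sensitivity = factor(sens_lv[c(6, 1, 5, 4, 3, 2, 1)], levels = sens_lv)
)

str(df)
# Row i holds level i, so the rows start out in the alphabetical level order
as.integer(df$level)
```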

I'd like to order df by a vector which is NOT contained in the
data.frame. Let's call this vector desiredOrder (see dump below).

desiredOrder <- c("0", "1", "10", "100", "11", "110", "1010")

So after sorting, the level column (df$level) should follow the order of
desiredOrder, and the associated data in the other columns should move
along with it.
I know that this is not easy to achieve with order(...) alone, as
desiredOrder is neither an alphabetical nor a numeric order. But I would
expect both of the following to work:

## using match
df[match(df$level,desiredOrder),]

## using factor
df[factor(df$level,levels=desiredOrder),]

Unfortunately the result isn't what I expected: I get a data.frame with
the level column in the order 0,1,10,100,110,1010,11 instead of the
order in desiredOrder (0,1,10,100,11,110,1010).
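Printing the index vectors that the two attempts pass to [ may make the reported order easier to follow. In this sketch df is cut down to its level column (an assumption for brevity). match(df$level, desiredOrder) returns, for each row, the position of its level in desiredOrder, not the number of the row that should come next; and, per ?Extract, indexing with a factor uses its integer codes, which here are those same positions.

```r
level_lv <- c("0", "1", "10", "100", "1010", "11", "110")
df <- data.frame(level = factor(level_lv, levels = level_lv))
desiredOrder <- c("0", "1", "10", "100", "11", "110", "1010")

# Attempt 1: for each row of df, where its level sits in desiredOrder
idx_match <- match(df$level, desiredOrder)
# Attempt 2: a factor used as an index contributes its integer codes,
# i.e. the same positions in desiredOrder
idx_factor <- as.integer(factor(df$level, levels = desiredOrder))

print(idx_match)                  # 1 2 3 4 7 5 6
identical(idx_match, idx_factor)  # TRUE

# Using these positions as row numbers picks rows 1, 2, 3, 4, 7, 5, 6,
# which reproduces the unexpected order
as.character(df$level[idx_match])   # "0" "1" "10" "100" "110" "1010" "11"
```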

Does anybody see what I am doing wrong?
I'd very much appreciate any help!
Best regards, Felix

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: sorting a data.frame (df) by a vector (which is not contained in the df) - unexpected behaviour of match and factor

Jeff Newmiller
Your desiredOrder vector is a vector of strings. Convert it to numeric and it should work.
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<[hidden email]>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.
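One way to check whether the type of desiredOrder is the culprit is to compare the index vectors for the character and the numeric version. In this sketch (df cut down to its level column, as an assumption), match() converts the factor to character and coerces both arguments to a common type, so both calls appear to return the same positions; the order of the arguments, discussed later in the thread, seems to matter more than the type. The last two lines show a related pitfall when converting the factor itself.

```r
level_lv <- c("0", "1", "10", "100", "1010", "11", "110")
df <- data.frame(level = factor(level_lv, levels = level_lv))
desiredOrder <- c("0", "1", "10", "100", "11", "110", "1010")

idx_chr <- match(df$level, desiredOrder)              # character table
idx_num <- match(df$level, as.numeric(desiredOrder))  # numeric table

# match() turns the factor into character and coerces both sides to a common
# type (character here), so the numeric table yields the same positions
identical(idx_chr, idx_num)   # TRUE

# Converting the factor column directly returns its codes, not its labels
as.numeric(df$level)                 # 1 2 3 4 5 6 7
as.numeric(as.character(df$level))   # 0 1 10 100 1010 11 110
```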

Re: sorting a data.frame (df) by a vector (which is not contained in the df) - unexpected behaviour of match and factor

drflxms
Jeff,

thanks a lot for your quick reply and the hint!

Meanwhile I found a solution that works - at least for my case ;)
The code to get the job done is

df[order(match(df$level,desiredOrder)),]

So it seems one more order() call is needed. I found this solution by
doing it stepwise:

## reorder the levels of the level column so that they follow desiredOrder
df$level <- factor(df$level,levels=desiredOrder)
## sort the data frame by the releveled level column
df[order(df$level),]

Maybe this solution is of help to someone else as well.
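The two variants can be checked against each other. The sketch below assumes a two-column cut-down df (level plus prevalence, stored here as numbers rather than factors for brevity) and also shows a variant that orders by a temporary factor instead of overwriting df$level.

```r
level_lv <- c("0", "1", "10", "100", "1010", "11", "110")
df <- data.frame(
  level      = factor(level_lv, levels = level_lv),
  prevalence = c(0.788, 0.5, 0.754, 0.803, 0.887, 0.488, 0.905)  # numeric here, a factor in the dump
)
desiredOrder <- c("0", "1", "10", "100", "11", "110", "1010")

# Variant 1: rank each row by its position in desiredOrder, then order by that rank
res1 <- df[order(match(df$level, desiredOrder)), ]

# Variant 2: relevel a copy of df so its level order follows desiredOrder, then order it
df2 <- df
df2$level <- factor(df2$level, levels = desiredOrder)
res2 <- df2[order(df2$level), ]

# Variant 3 (df left unchanged): order by a temporary factor
res3 <- df[order(factor(df$level, levels = desiredOrder)), ]

as.character(res1$level)                   # "0" "1" "10" "100" "11" "110" "1010"
identical(rownames(res1), rownames(res2))  # TRUE: same rows in the same order
identical(res1, res3)                      # TRUE
```

Variant 2 changes the level order of the column itself, so res2 is not identical() to res1 as an object even though it picks the same rows.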

But honestly I still do not exactly understand why
df[match(df$level,desiredOrder),] doesn't work...

Cheers, Felix

On 29.12.11 10:58, Jeff Newmiller wrote:

> Your desiredOrder vector is a vector of strings. Convert it to numeric and it should work.

Re: sorting a data.frame (df) by a vector (which is not contained in the df) - unexpected behaviour of match and factor

Berend Hasselman
In reply to this post by drflxms
drflxms wrote:
> ## using match
> df[match(df$level,desiredOrder),]
> [...]
> Does anybody see what I am doing wrong?
Try this:

df[match(desiredOrder,df$level),]

Berend
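This version asks, for each entry of desiredOrder, which row holds that level, so it returns exactly one row per entry and relies on every level occurring once in df. The sketch below uses the same cut-down df as an assumption (level plus a numeric prevalence) and a hypothetical df2 in which one row is duplicated, to show how it differs from order(match(df$level, desiredOrder)), which keeps every row.

```r
level_lv <- c("0", "1", "10", "100", "1010", "11", "110")
df <- data.frame(
  level      = factor(level_lv, levels = level_lv),
  prevalence = c(0.788, 0.5, 0.754, 0.803, 0.887, 0.488, 0.905)  # numeric here, a factor in the dump
)
desiredOrder <- c("0", "1", "10", "100", "11", "110", "1010")

# One row per entry of desiredOrder: the (first) row holding that level
res_table <- df[match(desiredOrder, df$level), ]
as.character(res_table$level)   # "0" "1" "10" "100" "11" "110" "1010"

# Hypothetical df2: row 6 (level "11") appears twice
df2 <- df[c(1:7, 6), ]
nrow(df2[match(desiredOrder, df2$level), ])          # 7: the second "11" row is lost
nrow(df2[order(match(df2$level, desiredOrder)), ])   # 8: every row is kept
```

Conversely, if desiredOrder listed a level absent from df, match(desiredOrder, df$level) would return NA for it and the result would contain a row of NAs.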

Re: sorting a data.frame (df) by a vector (which is not contained in the df) - unexpected behaviour of match and factor

Berend Hasselman
In reply to this post by drflxms
drflxms wrote:
> But honestly I still do not exactly understand why
> df[match(df$level,desiredOrder),] doesn't work...
Read carefully

?match

and then do

match(df$level,desiredOrder)
match(desiredOrder,df$level)

and look carefully at the results. Then it should be clear.

Berend
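Here are the two match() calls side by side, as a sketch in which a plain factor stands in for df$level (an assumption for brevity). The first vector gives, for each row, the position of its level in desiredOrder (a rank); the second gives, for each desired level, the row that holds it (the row numbers to pick). When every level occurs exactly once, both are permutations of 1:7 and each is the inverse of the other, which is why wrapping the first in order() yields the second.

```r
lv <- c("0", "1", "10", "100", "1010", "11", "110")
level <- factor(lv, levels = lv)   # stands in for df$level
desiredOrder <- c("0", "1", "10", "100", "11", "110", "1010")

p <- match(level, desiredOrder)   # 1 2 3 4 7 5 6: position of each row's level in desiredOrder
q <- match(desiredOrder, level)   # 1 2 3 4 6 7 5: row holding each desired level

identical(order(p), q)   # TRUE: order() inverts the permutation p
identical(p[q], 1:7)     # TRUE: composing the two gives the identity
```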