Trying to fix code that will find highest 5 column names and their associated values for each row in a data frame in R


Al Koholic

I have a data frame with 10 integer variables describing various
attributes of each row. For each row, I need to find the 5 variables
with the highest values and output them to a new data frame. In addition
to the names of the 5 highest variables, I also need the corresponding
5 values for each row.

  A simple code example to generate a sample data frame for this is:

  set.seed(1)
  DF <- matrix(sample(1:9,9),ncol=10,nrow=9)
  DF <- as.data.frame.matrix(DF)


This would result in an example data frame like this:

  #   V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
  # 1  3  2  5  6  5  2  6  8  1   3
  # 2  1  4  7  8  7  7  3  4  2   9
  # 3  2  3  4  7  5  8  9  1  3   5
  # 4  3  8  3  4  5  6  7  4  6   5
  # 5  6  2  3  7  2  1  8  3  2   4
  # 6  8  2  4  8  3  2  9  7  6   5
  # 7  1  5  3  6  8  3  8  9  1   3
  # 8  9  3  5  8  4  9  7  8  1   2
  # 9  1  2  4  8  3  2  1  2  5   6


  My ideal output would be something like this:


  #      V1   V2   V3   V4   V5
  # 1  V2:9 V7:8 V8:7 V4:6 V3:5
  # 2  V9:9 V3:8 V5:7 V7:6 V4:5
  # 3  V5:9 V3:8 V2:7 V9:6 V7:5
  # 4  V8:9 V4:8 V2:7 V5:6 V9:5
  # 5  V9:9 V1:8 V6:7 V3:6 V5:5
  # 6  V8:9 V1:8 V5:7 V9:6 V4:5
  # 7  V2:9 V8:8 V7:7 V5:6 V9:5
  # 8  V4:9 V7:8 V9:7 V2:6 V8:5
  # 9  V3:9 V7:8 V8:7 V4:6 V5:5
  # 10 V6:9 V8:8 V1:7 V9:6 V4:5


  I tried the following code, but it doesn't seem to work:

  out <- t(apply(DF, 1, function(x){
    o <- head(order(-x), 5)
    paste0(names(x[o]), ':', x[o])
  }))
  as.data.frame(out)



  Thanks everyone!

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Trying to fix code that will find highest 5 column names and their associated values for each row in a data frame in R

David Carlson
There are some problems with your example. Your code does not produce anything like your example data frame because you draw only 9 values without replacement. Your code produces 10 columns, each with the same permutation of the values 1:9.

Then your desired output does not make sense in terms of your example data. The first entry is V2:9 but 9 does not appear in row 1.
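
The recycling can be checked directly. This is only a quick sketch (the names `m` and `m2` are chosen here for illustration): because just 9 values are drawn, matrix() reuses them for every column, whereas drawing 90 values with replacement fills each cell with its own draw.

```r
set.seed(1)
# Only 9 values are drawn, so matrix() recycles them to fill all 90 cells
m <- matrix(sample(1:9, 9), nrow = 9, ncol = 10)
all(m == m[, 1])   # TRUE: every column equals the first column

# Drawing 90 values with replacement fills the matrix without recycling
m2 <- matrix(sample(1:9, 90, replace = TRUE), nrow = 9, ncol = 10)
dim(m2)
```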

Using your posted example:
DF <- structure(list(V1 = c(3L, 1L, 2L, 3L, 6L, 8L, 1L, 9L, 1L),
V2 = c(2L, 4L, 3L, 8L, 2L, 2L, 5L, 3L, 2L), V3 = c(5L, 7L, 4L, 3L,
3L, 4L, 3L, 5L, 4L), V4 = c(6L, 8L, 7L, 4L, 7L, 8L, 6L, 8L, 8L),
V5 = c(5L, 7L, 5L, 5L, 2L, 3L, 8L, 4L, 3L), V6 = c(2L, 7L, 8L, 6L,
1L, 2L, 3L, 9L, 2L), V7 = c(6L, 3L, 9L, 7L, 8L, 9L, 8L, 7L, 1L),
V8 = c(8L, 4L, 1L, 4L, 3L, 7L, 9L, 8L, 2L), V9 = c(1L, 2L, 3L, 6L,
2L, 6L, 1L, 1L, 5L), V10 = c(3L, 9L, 5L, 5L, 4L, 5L, 3L, 2L, 6L)),
class = "data.frame", row.names = c(NA, -9L))

Your code produces:

     V1    V2   V3    V4    V5
1  V8:8  V4:6 V7:6  V3:5  V5:5
2 V10:9  V4:8 V3:7  V5:7  V6:7
3  V7:9  V6:8 V4:7  V5:5 V10:5
4  V2:8  V7:7 V6:6  V9:6  V5:5
5  V7:8  V4:7 V1:6 V10:4  V3:3
6  V7:9  V1:8 V4:8  V8:7  V9:6
7  V8:9  V5:8 V7:8  V4:6  V2:5
8  V1:9  V6:9 V4:8  V8:8  V7:7
9  V4:8 V10:6 V9:5  V3:4  V5:3

Which seems to be what you wanted.
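
Working through row 1 by hand may help show why the code behaves this way. As a minimal sketch (the vector `x` is just row 1 of the posted data, named like the argument of the anonymous function): order(-x) gives the column positions from largest to smallest value, ties stay in their original column order, and head(..., 5) keeps the first five.

```r
# Row 1 of the posted data frame as a named integer vector
x <- c(V1 = 3L, V2 = 2L, V3 = 5L, V4 = 6L, V5 = 5L,
       V6 = 2L, V7 = 6L, V8 = 8L, V9 = 1L, V10 = 3L)

o <- head(order(-x), 5)   # positions of the five largest values
o                         # 8 4 7 3 5

# Pair each selected column name with its value
top5 <- paste0(names(x)[o], ":", x[o])
top5                      # "V8:8" "V4:6" "V7:6" "V3:5" "V5:5"
```

Note that the tied values (V4 and V7, V3 and V5) appear in column order, so which tied column makes the cut at position 5 depends on where it sits in the data frame.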

---------------------------------------------
David L. Carlson
Department of Anthropology
Texas A&M University


-----Original Message-----
From: R-help [mailto:[hidden email]] On Behalf Of Tom Woolman
Sent: Monday, December 17, 2018 11:34 AM
To: [hidden email]
Subject: [R] Trying to fix code that will find highest 5 column names and their associated values for each row in a data frame in R



Re: Trying to fix code that will find highest 5 column names and their associated values for each row in a data frame in R

PIKAL Petr
In reply to this post by Al Koholic
Hi

The generated DF is not what you expect it to be:

>   set.seed(1)
>   DF <- matrix(sample(1:9,9),ncol=10,nrow=9)
> DF
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    3    3    3    3    3    3    3    3    3     3
 [2,]    9    9    9    9    9    9    9    9    9     9
 [3,]    5    5    5    5    5    5    5    5    5     5
 [4,]    6    6    6    6    6    6    6    6    6     6
 [5,]    2    2    2    2    2    2    2    2    2     2
 [6,]    4    4    4    4    4    4    4    4    4     4
 [7,]    8    8    8    8    8    8    8    8    8     8
 [8,]    7    7    7    7    7    7    7    7    7     7
 [9,]    1    1    1    1    1    1    1    1    1     1
>

With a slight modification of the input:

> set.seed(1)
> DF <- matrix(sample(1:9, 90, replace=TRUE), ncol=10, nrow=9)
> DF <- as.data.frame.matrix(DF)
>

> out <- t(apply(DF, 1, function(x){
+     o <- head(order(-x), 5)
+     paste0(names(x[o]), ':', x[o])
+   }))
>   as.data.frame(out)
    V1   V2    V3   V4    V5
1 V5:8 V6:8 V10:7 V3:4  V4:4
2 V4:8 V3:7  V8:6 V1:4  V9:4
3 V3:9 V5:7  V1:6 V6:5  V9:5
4 V1:9 V9:9  V2:7 V6:7 V10:7
5 V5:8 V9:8  V6:7 V8:7  V3:6
6 V1:9 V2:7 V10:7 V5:6  V4:5
7 V1:9 V7:9  V5:8 V6:8  V8:8
8 V9:9 V4:8  V2:7 V1:6  V5:5
9 V2:9 V8:8  V4:7 V1:6  V5:5

Your code seems to work.
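
If the names and the values are wanted in separate data frames rather than pasted together, something along these lines might do. It is only a sketch: `top_n_per_row` and its argument `n` are hypothetical names made up for this example, and the two-row `DF` is just the first two rows of the posted data.

```r
# Hypothetical helper: top-n column names and values per row, kept separate
top_n_per_row <- function(df, n = 5) {
  m <- as.matrix(df)                 # apply() would coerce to a matrix anyway
  n <- min(n, ncol(m))               # cannot ask for more columns than exist
  # n x nrow(m) matrix of column positions, largest value first
  idx <- matrix(apply(m, 1, function(x) head(order(-x), n)), nrow = n)
  idx <- t(idx)                      # now one row per input row
  rows <- rep(seq_len(nrow(m)), times = n)
  vals <- matrix(m[cbind(rows, as.vector(idx))], nrow = nrow(m))
  list(
    names  = as.data.frame(matrix(colnames(m)[idx], nrow = nrow(m)),
                           stringsAsFactors = FALSE),
    values = as.data.frame(vals)
  )
}

# First two rows of the posted example data
DF <- data.frame(V1 = c(3L, 1L), V2 = c(2L, 4L), V3 = c(5L, 7L),
                 V4 = c(6L, 8L), V5 = c(5L, 7L), V6 = c(2L, 7L),
                 V7 = c(6L, 3L), V8 = c(8L, 4L), V9 = c(1L, 2L),
                 V10 = c(3L, 9L))
res <- top_n_per_row(DF)
res$names
res$values
```

Keeping the values numeric avoids having to split strings such as "V8:8" again later.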
Cheers
Petr
