How to rank matrix data by deciles?


vincent.deluard

Hi R users,

I have a matrix of data similar to:

> y=matrix(rnorm(55),ncol=5)


I would like to know to which decile each number belongs compared to the numbers in its column.

Say y[1,1] is in the third decile among y[1:11,1] and y[2,1] is in the second decile.
I would like to get a matrix that returns their decile ranks, i.e.,

y[1,1] -> 3
y[2,1] -> 2

Your help is much appreciated!
Re: How to rank matrix data by deciles?

Phil Spector
Vincent -
    I think

  apply(y,2,function(x)
            cut(x,quantile(x,(0:10)/10),label=FALSE,include.lowest=TRUE))

will give you what you want (although you didn't use set.seed so I
can't verify it against your example.)
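
For anyone who wants to reproduce this, here is a minimal sketch of the same call with a fixed seed. The seed value 42 is an arbitrary assumption and is not from the original post, and `labels` is spelled out in full rather than relying on partial argument matching.

```r
# Reproducible version of the example; the seed value is arbitrary.
set.seed(42)
y <- matrix(rnorm(55), ncol = 5)

# For each column, cut the values at that column's own decile boundaries.
# labels = FALSE returns integer bin codes instead of factor labels, and
# include.lowest = TRUE keeps the column minimum inside the first bin.
deciles <- apply(y, 2, function(x)
  cut(x, quantile(x, (0:10) / 10), labels = FALSE, include.lowest = TRUE))

dim(deciles)    # same shape as y
range(deciles)  # smallest and largest decile code
```

With continuous random data like this there are no ties, so the eleven quantile breaks are all distinct and every entry gets a code between 1 and 10.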

  - Phil Spector
  Statistical Computing Facility
  Department of Statistics
  UC Berkeley
  [hidden email]


On Thu, 6 May 2010, vincent.deluard wrote:

> I would like to know to which decile each number belongs compared to the
> numbers in its column.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: How to rank matrix data by deciles?

vincent.deluard
Dear Phil,

You helped me with a request to rank matrix columns by deciles two weeks
ago.

This really unblocked me on this project but I found a little bug.

As before, my data is in a matrix:

> madebt[1:16,1:2]
       X4.19.2010  X4.16.2010
 [1,] 26.61197531 26.58950617
 [2,]  5.72765432  5.73074074
 [3,]  5.95839506  5.96222222
 [4,]  5.64333333  5.64777778
 [5,] 20.93814815 20.95728395
 [6,]  0.00000000  0.00000000
 [7,]  0.07000000  0.07000000
 [8,] 12.87802469 12.86888889
 [9,]  3.64407407  3.64543210
[10,]  0.05037037  0.05049383
[11,] 25.59024691 25.60888889
[12,]  3.47987654  3.53246914
[13,]  0.00000000  0.00000000
[14,] 31.39037037 31.39049383
[15,]  3.78296296  3.77641975
[16,] 13.17876543 13.19617284

The apply function will work for this sample of my data:

debtdeciles = apply(madebt[1:16,1:2],2,function(x)
            cut(x,quantile(x,(0:10)/10,na.rm=TRUE),label=FALSE,include.lowest=TRUE))

debtdeciles

     X4.19.2010 X4.16.2010
 [1,]         10         10
 [2,]          6          6
 [3,]          6          6
 [4,]          5          5
 [5,]          8          8
 [6,]          1          1
 [7,]          2          2
 [8,]          7          7
 [9,]          4          4
[10,]          2          2
[11,]          9          9
[12,]          3          3
[13,]          1          1
[14,]         10         10
[15,]          4          4
[16,]          8          8

However, it will fail for

> madebt[1:17,1:2]
       X4.19.2010  X4.16.2010
 [1,] 26.61197531 26.58950617
 [2,]  5.72765432  5.73074074
 [3,]  5.95839506  5.96222222
 [4,]  5.64333333  5.64777778
 [5,] 20.93814815 20.95728395
 [6,]  0.00000000  0.00000000
 [7,]  0.07000000  0.07000000
 [8,] 12.87802469 12.86888889
 [9,]  3.64407407  3.64543210
[10,]  0.05037037  0.05049383
[11,] 25.59024691 25.60888889
[12,]  3.47987654  3.53246914
[13,]  0.00000000  0.00000000
[14,] 31.39037037 31.39049383
[15,]  3.78296296  3.77641975
[16,] 13.17876543 13.19617284
[17,]  0.00000000  0.00000000


> debtdeciles = apply(madebt[1:17,1:2],2,function(x)
+             cut(x,quantile(x,(0:10)/10,na.rm=TRUE),label=FALSE,include.lowest=TRUE))
Error in cut.default(x, quantile(x, (0:10)/10, na.rm = TRUE), label = FALSE,  :
  'breaks' are not unique

My guess is that we now have 3 "zeros" in each column. For each decile, we
cannot have more than 2 elements (total of 17 numbers in each column) and I
believe R cannot determine where to put the third "zero". Do you have any
solution for this problem?
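
One way to check this guess is to look at the breaks that quantile() hands to cut(). The sketch below uses the second column of the 17-row sample shown above; the variable names `x` and `brks` are only illustrative.

```r
# Second column of madebt[1:17, ], copied from the listing above.
x <- c(26.58950617, 5.73074074, 5.96222222, 5.64777778, 20.95728395,
       0, 0.07, 12.86888889, 3.64543210, 0.05049383, 25.60888889,
       3.53246914, 0, 31.39049383, 3.77641975, 13.19617284, 0)

brks <- quantile(x, (0:10) / 10)

brks[1:3]            # the 0% and 10% quantiles are both exactly 0
anyDuplicated(brks)  # index of the first repeated break (0 would mean none)
```

With three zeros among 17 values, the 10% quantile is interpolated between two of the zeros, so it coincides with the minimum. The error therefore seems to come from the repeated break value rather than from a limit on how many elements a decile can hold.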

Many thanks,


--------------------------------------------
Vincent Deluard
[hidden email]
Global Equity Strategist, CFA Charter Award Pending
TrimTabs Investment Research
40 Wall Street, 28th Floor
New York, NY 10005
Phone: (+1) 646-512-5616

-----Original Message-----
From: Phil Spector [mailto:[hidden email]]
Sent: Thursday, May 06, 2010 7:46 PM
To: vincent.deluard
Cc: [hidden email]
Subject: Re: [R] How to rank matrix data by deciles?

Vincent -
    I think

  apply(y,2,function(x)
            cut(x,quantile(x,(0:10)/10),label=FALSE,include.lowest=TRUE))

will give you what you want (although you didn't use set.seed so I
can't verify it against your example.)

Re: How to rank matrix data by deciles?

Phil Spector
Vincent -
    I'm afraid there's no solution other than artificially modifying
the zeroes:

> vec
  [1] 26.58950617  5.73074074  5.96222222  5.64777778 20.95728395  0.00000000  0.07000000 12.86888889
  [9]  3.64543210  0.05049383 25.60888889  3.53246914  0.00000000 31.39049383  3.77641975 13.19617284
[17]  0.00000000
> cut(vec,quantile(vec,(0:10)/10),include.lowest=TRUE,label=FALSE)
Error in cut.default(vec, quantile(vec, (0:10)/10), include.lowest = TRUE,  :
   'breaks' are not unique
> vec[vec==0] = jitter(vec[vec==0])
> cut(vec,quantile(vec,(0:10)/10),include.lowest=TRUE,label=FALSE)
  [1] 10  6  7  5  9  1  3  7  4  2  9  4  2 10  5  8  1

It gives an answer, but it may not make sense for all data.
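
To see what that caveat can mean in practice, here is a small sketch of the same jitter approach with a fixed seed. The seed value and the names `ranked` and `dec` are assumptions for illustration; the jitter is applied to a copy so that `vec` keeps its exact zeros.

```r
set.seed(1)  # arbitrary seed, only so the jittered values are repeatable

vec <- c(26.58950617, 5.73074074, 5.96222222, 5.64777778, 20.95728395,
         0, 0.07, 12.86888889, 3.64543210, 0.05049383, 25.60888889,
         3.53246914, 0, 31.39049383, 3.77641975, 13.19617284, 0)

# Jitter a copy of the data; only the tied zeros are perturbed.
ranked <- vec
ranked[ranked == 0] <- jitter(ranked[ranked == 0])

dec <- cut(ranked, quantile(ranked, (0:10) / 10),
           include.lowest = TRUE, labels = FALSE)

# Decile codes assigned to the three observations that were originally 0.
table(dec[vec == 0])
```

Because the three zeros receive different random offsets, two of them land in decile 1 and one in decile 2, even though the underlying values are identical. Which zero ends up in decile 2 depends on the random draw.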

                                                           - Phil

On Thu, 13 May 2010, vincent.deluard wrote:

> My guess is that we now have 3 "zeros" in each column. For each decile, we
> cannot have more than 2 elements (total of 17 numbers in each column) and I
> believe R cannot determine where to put the third "zero". Do you have any
> solution for this problem?

Re: How to rank matrix data by deciles?

Peter Ehlers
On 2010-05-13 17:50, Phil Spector wrote:

> Vincent -
> I'm afraid there's no solution other than artificially modifying
> the zeroes:
>
>> vec
> [1] 26.58950617 5.73074074 5.96222222 5.64777778 20.95728395 0.00000000 0.07000000 12.86888889
> [9] 3.64543210 0.05049383 25.60888889 3.53246914 0.00000000 31.39049383 3.77641975 13.19617284
> [17] 0.00000000
>> cut(vec,quantile(vec,(0:10)/10),include.lowest=TRUE,label=FALSE)
> Error in cut.default(vec, quantile(vec, (0:10)/10), include.lowest = TRUE, :
> 'breaks' are not unique
>> vec[vec==0] = jitter(vec[vec==0])
>> cut(vec,quantile(vec,(0:10)/10),include.lowest=TRUE,label=FALSE)
> [1] 10 6 7 5 9 1 3 7 4 2 9 4 2 10 5 8 1
>
> It gives an answer, but it may not make sense for all data.
>
> - Phil
>

The problem is that quantile() returns duplicate values
for the breaks used in cut(), and cut() requires unique breaks.
Phil's suggestion modifies the data. It might be preferable to
modify the breaks instead:

   eps <- .Machine$double.eps  #or use something like 1e-10
   brks <- quantile(vec, (0:10)/10) + eps*(0:10)
   cut(vec, brks, include.lowest=TRUE, labels=FALSE)
   #[1] 10  6  7  5  9  1  3  7  4  2  9  4  1 10  5  8  1
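
A sketch of how this break adjustment might be wrapped up and applied to every column of the matrix from earlier in the thread. The helper name `decile_rank` is hypothetical, and the default `eps = 1e-10` is the alternative mentioned in the comment above; it assumes the data are small enough in magnitude that adding `eps` actually changes a break.

```r
# Hypothetical helper: decile codes for one column, with each break nudged
# upward by a growing multiple of eps so that tied quantiles become distinct.
# Caveat: for very large values (say around 1e8) adding 1e-10 is lost to
# rounding, so eps would need to be scaled to the data.
decile_rank <- function(x, eps = 1e-10) {
  brks <- quantile(x, (0:10) / 10, na.rm = TRUE) + eps * (0:10)
  cut(x, brks, labels = FALSE, include.lowest = TRUE)
}

# The 17-row, 2-column sample of madebt shown earlier in the thread.
madebt17 <- cbind(
  X4.19.2010 = c(26.61197531, 5.72765432, 5.95839506, 5.64333333,
                 20.93814815, 0, 0.07, 12.87802469, 3.64407407,
                 0.05037037, 25.59024691, 3.47987654, 0, 31.39037037,
                 3.78296296, 13.17876543, 0),
  X4.16.2010 = c(26.58950617, 5.73074074, 5.96222222, 5.64777778,
                 20.95728395, 0, 0.07, 12.86888889, 3.64543210,
                 0.05049383, 25.60888889, 3.53246914, 0, 31.39049383,
                 3.77641975, 13.19617284, 0)
)

debtdeciles <- apply(madebt17, 2, decile_rank)
debtdeciles
```

Unlike the jitter approach, this leaves the data untouched and involves no randomness, so all three zeros in a column get the same decile code. On the second column it should give the same codes as the result shown above.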

  -Peter Ehlers
