(no subject)

(no subject)

stefano iacus
Suppose X is a data.frame with n obs and k vars, all variables are  
factors.

tab <- table(X)

contains a k-dimensional array.

I would like to get a list from tab such that each element contains
the indexes of the observations that fall in the same cell of this
k-dim array (only for non-empty cells, of course).

E.g.

 > set.seed(123)
 > X <- as.data.frame(matrix(rnorm(500), 100, 5))
 > X[] <- lapply(X, cut, breaks = 5)   # cut each of V1..V5 into 5 intervals
 > tab <- table(X)
 > cells <- which(tab > 0)
 > length(cells)
[1] 94

Thus 94 of the 5^5 = 3125 cells are non-empty.
I would like a smart way (without reimplementing table/tabulate) to
get the list of length 94 that contains the indexes of the obs in
each cell, or, vice versa, a vector of length n that tells,
observation by observation, which cell (out of the 3125) the
observation is in.
stefano
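One possible sketch of the "vice versa" direction (not taken from the thread) computes, for each row, the column-major linear index of its cell in `table(X)`, which is the same numbering that `which(tab > 0)` returns. It assumes every column of `X` is a factor and there are no NAs; `cell_index` is a hypothetical helper name. Splitting the row numbers by that index then gives the list of non-empty cells.

```r
## Hypothetical helper: for each row of X, the column-major (linear) index
## of its cell in table(X), i.e. the numbering used by which(tab > 0).
## Assumes every column of X is a factor and there are no NAs.
cell_index <- function(X) {
  codes  <- sapply(X, as.integer)               # n x k matrix of level codes
  dims   <- sapply(X, nlevels)                  # levels per factor = dim(table(X))
  stride <- cumprod(c(1, dims[-length(dims)]))  # step size of each dimension
  as.integer(1 + (codes - 1) %*% stride)
}

## Rebuild the example data from the question
set.seed(123)
X <- as.data.frame(matrix(rnorm(500), 100, 5))
X[] <- lapply(X, cut, breaks = 5)
tab <- table(X)

cell <- cell_index(X)                  # observation -> cell (1..3125)
idx  <- split(seq_along(cell), cell)   # cell -> observations, non-empty cells only
length(idx)
```

Because the stride of each dimension is the product of the sizes of the earlier ones, this mixed-radix encoding has no limit on the number of levels per factor.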

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: Cutting up a k-D space (no subject)

Brian Ripley
Stefano,

Try this

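XX <- as.numeric(X[[1]])
## Encode each row's cell as a decimal number, one digit per factor
## (this relies on every factor having at most 9 levels)
for (i in 2:length(X)) XX <- 10*XX + as.numeric(X[[i]])
split(seq_along(XX), XX)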

You can read off the cell from the decimal expansion of the label.
And XX goes from observations to cells.
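Reading the cell off a label (the names of the list returned by `split()` above) amounts to splitting it into its decimal digits. A minimal sketch, assuming the labels come from the base-10 encoding (so every factor has at most 9 levels and each digit is one level code); `decode_labels` is a hypothetical helper name:

```r
## Hypothetical helper: turn labels such as "31452" produced by the
## base-10 encoding into a matrix of level codes, one column per factor.
## Assumes every factor has at most 9 levels, so each digit is one code.
decode_labels <- function(labels) {
  digits <- strsplit(as.character(labels), "", fixed = TRUE)
  do.call(rbind, lapply(digits, as.integer))
}

decode_labels("31452")   # level 3 of V1, 1 of V2, 4 of V3, 5 of V4, 2 of V5
```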

The hard work is done by unique() under the skin (split makes XX into a
factor).
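For comparison, base R can build the same grouping without the digit encoding: `split()` accepts a list of factors and groups by their `interaction()`, and `drop = TRUE` keeps only the level combinations that actually occur. This is a sketch of that alternative, not the approach given in the reply; it rebuilds the question's example data.

```r
## Rebuild the example data from the question
set.seed(123)
X <- as.data.frame(matrix(rnorm(500), 100, 5))
X[] <- lapply(X, cut, breaks = 5)

## Observation -> cell: a factor whose levels are the occupied cells only
cell <- interaction(X, drop = TRUE)

## Cell -> observations: split() forms the interaction of a list of
## factors internally; drop = TRUE discards the empty cells
idx <- split(seq_len(nrow(X)), X, drop = TRUE)
length(idx)
```

This works for factors with any number of levels; with the default `lex.order = FALSE` the first factor varies fastest in the combined levels, as in the layout of `table()`.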

Brian

On Wed, 1 Feb 2006, stefano iacus wrote:

--
Brian D. Ripley,                  [hidden email]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Re: (no subject)

stefano iacus
In reply to this post by stefano iacus
Apologies, I forgot the subject.
Btw, I found it.
stefano

On 01/Feb/06, at 18:25, stefano iacus wrote:

Re: Cutting up a k-D space (no subject)

stefano iacus
In reply to this post by Brian Ripley
Thanks Brian,
stefano
On 01/Feb/06, at 19:00, Prof Brian Ripley wrote:
