yet another vectorization question

Adrian Dusa-2
Dear R-helpers,

I'm trying to develop a function that lists all possible expressions that
can be formed from a given number of variables. For example, with three
variables A, B and C we can have
- presence/absence of each of A, B and C on its own
- presence/absence of combinations of two of them
- presence/absence of all three

    A   B   C
1   0
2   1
3       0
4       1
5           0
6           1
7   0   0
8   0   1
9   1   0
10  1   1
11  0       0
12  0       1
13  1       0
14  1       1
15      0   0
16      0   1
17      1   0
18      1   1
19  0   0   0
20  0   0   1
21  0   1   0
22  0   1   1
23  1   0   0
24  1   0   1
25  1   1   0
26  1   1   1

My function (pasted below) produces the desired result but still needs
some more vectorizing; in particular, I can't figure out how one could modify
the elements of a matrix using apply on a different matrix...
To produce the above outcome, I use:
> all.expr(LETTERS[1:3])

"all.expr" <-
function(column.names) {
    ## combn() comes from the combinat package
    ncolumns <- length(column.names)
    ## each variable is 0, 1 or absent; the all-absent case is excluded
    return.matrix <- matrix(NA, nrow=(3^ncolumns - 1), ncol=ncolumns)
    colnames(return.matrix) <- column.names
    rownames(return.matrix) <- 1:nrow(return.matrix)
    start.row <- 1
    ## all.combn[[j]]: one column per combination of j variables
    all.combn <- sapply(1:ncolumns, function(idx) {
                                        as.matrix(combn(ncolumns, idx))
                                    }, simplify=FALSE)
    for (j in 1:length(all.combn)) {
        idk <- all.combn[[j]]
        ## tt: all 0/1 patterns for j variables, first column varying slowest
        tt <- matrix(NA, ncol=nrow(idk), nrow=2^nrow(idk))
        for (i in 1:nrow(idk)) {
            tt[,i] <- c(rep(0, 2^(nrow(idk) - i)), rep(1, 2^(nrow(idk) - i)))
        }

        ## This is _slow_ part, where I don't know how to vectorize:
        for (k in 1:ncol(idk)) {
            end.row <- start.row + nrow(tt) - 1
            return.matrix[start.row:end.row, idk[ , k]] <- tt
            start.row <- end.row + 1
        }
        ## How can one modify "return.matrix" using apply on "idk"?
    }
    ## blank out the variables that do not take part in an expression
    return.matrix[is.na(return.matrix)] <- ""
    return.matrix
}

Thank you in advance,
Adrian
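
As a side note on the inner loop that fills "tt" (not the slow part asked
about above): the same block of 0/1 patterns can probably be built without a
loop. The sketch below is only an illustration; "binary.rows" and "loop.rows"
are hypothetical names that do not appear in the original function.

```r
## Hypothetical helper: all 0/1 patterns for n variables,
## with the first column varying slowest (as in "tt").
binary.rows <- function(n) {
    grid <- expand.grid(rep(list(0:1), n))        # first column varies fastest
    unname(as.matrix(grid)[, n:1, drop = FALSE])  # reverse the column order
}

## The loop from the post, wrapped in a function for comparison.
loop.rows <- function(n) {
    tt <- matrix(NA, ncol = n, nrow = 2^n)
    for (i in 1:n) {
        tt[, i] <- c(rep(0, 2^(n - i)), rep(1, 2^(n - i)))
    }
    tt
}

## Both constructions give the same patterns for three variables.
all(binary.rows(3) == loop.rows(3))
```

The drop = FALSE keeps the result a matrix when n is 1, which matters because
the outer loop also handles single variables.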

--
Adrian DUSA
Romanian Social Data Archive
1, Schitu Magureanu Bd
050025 Bucharest sector 5
Romania
Tel./Fax: +40 21 3126618 \
          +40 21 3120210 / int.101

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: yet another vectorization question

dusadrian
Adrian DUSA <adi <at> roda.ro> writes:
>
> I'm trying to develop a function [...snip...]

Sorry for the traffic, I forgot to say that I'm using
library(combinat)
for the "combn" function...

Thank you,
Adrian


Re: yet another vectorization question

Jacques VESLOT
In reply to this post by Adrian Dusa-2
this looks similar:
do.call(expand.grid,split(t(replicate(3,c(0,1,NA))),1:3))



Re: yet another vectorization question

Philippe Grosjean
Hello,

Not exactly the same. By the way, why do you use do.call()? Couldn't you
do simply:

expand.grid(split(t(replicate(3, c(0, 1, NA))), 1:3))

Best,

Philippe Grosjean


Jacques VESLOT wrote:

> this looks similar:
> do.call(expand.grid,split(t(replicate(3,c(0,1,NA))),1:3))

Re: yet another vectorization question

Adrian Dusa-2
On Monday 30 January 2006 14:40, Philippe Grosjean wrote:

> Hello,
> Not exactly the same. By the way, why do you use do.call()? Couldn't you
> do simply:
> expand.grid(split(t(replicate(3, c(0, 1, NA))), 1:3))
> Best,
> Philippe Grosjean
>
> Jacques VESLOT wrote:
> > this looks similar:
> > do.call(expand.grid,split(t(replicate(3,c(0,1,NA))),1:3))

Sigh, what a pity. It is indeed not the same...
So close to a one-liner though.
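
For the record, a sketch of where the expand.grid() result differs from the
table in the first post, and one possible way to bring it into the same
order. The sort keys below are my own reading of that table (number of
variables present, then which variables, then their values); "grid", "full"
and "keys" are names chosen here, not part of any posted function.

```r
vars <- LETTERS[1:3]
full <- expand.grid(rep(list(c(0, 1, NA)), length(vars)))
names(full) <- vars
nrow(full)                                   # 3^3 rows, including the all-NA row

## Drop the row where no variable is present.
grid <- full[rowSums(!is.na(full)) > 0, ]

## Sort keys: how many variables are present, which ones (A before B
## before C), and finally the 0/1 values with NA treated as 0.
present <- !is.na(grid)
vals <- as.matrix(grid)
vals[is.na(vals)] <- 0
keys <- c(list(rowSums(present)),
          lapply(seq_along(vars), function(j) -present[, j]),
          lapply(seq_along(vars), function(j) vals[, j]))
grid <- grid[do.call(order, keys), ]
rownames(grid) <- NULL
grid
```

So the content is the same once the empty row is removed; only the row order
needs the extra sorting step.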

I come back to my original question: is it possible to modify the content of a
matrix, using apply on a different matrix?
In my original function, the slow part is:
## ...
for (k in 1:ncol(idk)) {
   end.row <- start.row + nrow(tt) - 1
   return.matrix[start.row:end.row, idk[ , k]] <- tt
   start.row <- end.row + 1
}
## ...

I'd like to use apply on the "idk" matrix (to get rid of the for loop) and
write the contents of "tt" into "return.matrix"...

Best,
Adrian


Re: yet another vectorization question

Adrian Dusa-2
In reply to this post by Philippe Grosjean
On Monday 30 January 2006 14:40, Philippe Grosjean wrote:
> Hello,
> Not exactly the same. By the way, why do you use do.call()? Couldn't you
> do simply:
> expand.grid(split(t(replicate(3, c(0, 1, NA))), 1:3))

Just for the sake of it, the above can be made even simpler with:

expand.grid(lapply(1:3, function(x) c(0, 1, NA)))

Best,
Adrian


Re: yet another vectorization question

Patrick Burns
I tried to let this pass, but failed:

lapply(1:3, function(x) c(0, 1, NA))

might more clearly be written as

rep(list(c(0, 1, NA)), 3)



Patrick Burns
[hidden email]
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of S Poetry and "A Guide for the Unwilling S User")


Re: yet another vectorization question

Adrian Dusa-2
On Monday 30 January 2006 21:44, Patrick Burns wrote:
> I tried to let this pass, but failed:
>
> lapply(1:3, function(x) c(0, 1, NA))
>
> might more clearly be written as
>
> rep(list(c(0, 1, NA)), 3)

Indeed! Excellent, thanks :)

Hmm, I was just thinking perhaps my first example was too cluttered to spot an
immediate solution.
If I may, here is a simpler example (I hope I don't upset
anybody by being too persistent):

set.seed(5)
aa <- matrix(sample(10, 15, replace=T), ncol=5)
bb <- matrix(NA, ncol=10, nrow=5)
for (i in 1:ncol(aa)) bb[i, aa[, i]] <- c(0, 1, 0)

Is there any possibility to vectorize this "for" loop?
(sometimes I have hundreds of columns in the "aa" matrix)

Many big thanks in advance,
Adrian


Re: yet another vectorization question

Patricia Hawkins
>>>>> "AD" == Adrian Dusa <[hidden email]> writes:

AD> set.seed(5)
AD> aa <- matrix(sample(10, 15, replace=T), ncol=5)
AD> bb <- matrix(NA, ncol=10, nrow=5)
AD> for (i in 1:ncol(aa)) bb[i, aa[, i]] <- c(0, 1, 0)

AD> Is there any possibility to vectorize this "for" loop?
AD> (sometimes I have hundreds of columns in the "aa" matrix)

Well, coming from ignorance of R, I came up with the below.  However,
it means creating another vector that's the size of aa, so it's not
clear that it's a win:

#Problem:  Indexing bb correctly when vectorized
#Solution:  Add the following matrix to aa:
#     [,1] [,2] [,3] [,4] [,5]
#[1,]    0   10   20   30   40
#[2,]    0   10   20   30   40
#[3,]    0   10   20   30   40
#
# or its vector equivalent:
#
# rep(0:(ncol(aa)-1)*ncol(bb), each=nrow(aa))
# > [1]  0  0  0 10 10 10 20 20 20 30 30 30 40 40 40

bb <- matrix(1:50, ncol=10, nrow=5, byrow=TRUE)
bv <- as.vector(t(bb))   # t() so that bv runs along the rows of bb
ai <- as.vector(aa) + rep(0:4*10, each=3)
bv[ai] <- c(0,1,0)
bb <- matrix(bv, ncol=10, nrow=5, byrow=TRUE)
bb

#which generalizes to:

bb <- matrix(1:50, ncol=10, nrow=5, byrow=TRUE)
bv <- as.vector(t(bb))
ai <- as.vector(aa) + rep((1:ncol(aa)-1)*ncol(bb), each=nrow(aa))
bv[ai] <- c(0,1,0)
bb <- matrix(bv, ncol=ncol(bb), nrow=nrow(bb), byrow=TRUE)
bb
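
A variation on the same offset idea, sketched under the assumption that one
indexes bb directly in R's own column-major order, so that no transposing or
rebuilding of the matrix is needed: element [r, c] of bb sits at position
(c - 1) * nrow(bb) + r. The names "bb.loop", "bb.lin" and "lin" are made up
for this illustration, and aa is drawn without repeats inside a column so
that the comparison does not depend on how repeated indices are resolved.

```r
set.seed(5)
## 3 x 5 matrix of column numbers, no repeats inside a column (assumption)
aa <- sapply(1:5, function(i) sample(10, 3))

## Reference result: the "for" loop from the question
bb.loop <- matrix(NA, ncol = 10, nrow = 5)
for (i in 1:ncol(aa)) bb.loop[i, aa[, i]] <- c(0, 1, 0)

## Linear index into bb: target row is the column number in aa,
## target column is the value stored in aa.
bb.lin <- matrix(NA, ncol = 10, nrow = 5)
lin <- (c(aa) - 1) * nrow(bb.lin) + c(col(aa))
bb.lin[lin] <- c(0, 1, 0)

identical(bb.loop, bb.lin)
```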



--
Patricia J. Hawkins
Hawkins Internet Applications
www.hawkinsia.com


Re: yet another vectorization question

Gabor Grothendieck
On 1/30/06, Patricia J. Hawkins <[hidden email]> wrote:

> [...snip...]

Try this:

bb <- matrix(NA, ncol=10, nrow=5)
bb[cbind(c(col(aa)), c(aa))] <- c(0,1,0)
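
To spell out why this works: a two-column matrix used as an index is read as
one (row, column) pair per line, and col(aa) supplies the row of bb that each
element of aa belongs to. A small check against the loop, using a hand-made
aa (hypothetical values, chosen here only for illustration):

```r
## Hand-made 3 x 5 example; each column lists the bb columns to fill
aa <- matrix(c(3, 7, 10,
               1, 2, 6,
               5, 9, 4,
               8, 1, 3,
               2, 10, 7), nrow = 3)

## Reference: the original loop
bb.loop <- matrix(NA, ncol = 10, nrow = 5)
for (i in 1:ncol(aa)) bb.loop[i, aa[, i]] <- c(0, 1, 0)

## Matrix indexing: first column = row of bb, second column = column of bb
idx <- cbind(c(col(aa)), c(aa))
head(idx, 3)     # pairs (1, 3), (1, 7), (1, 10)

bb.idx <- matrix(NA, ncol = 10, nrow = 5)
bb.idx[idx] <- c(0, 1, 0)   # recycled once per column of aa

identical(bb.loop, bb.idx)
```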


Re: yet another vectorization question

Adrian Dusa-2
On Tuesday 31 January 2006 06:55, Gabor Grothendieck wrote:

> On 1/30/06, Patricia J. Hawkins <[hidden email]> wrote:
> > [...snip...]
>
> Try this:
>
> bb <- matrix(NA, ncol=10, nrow=5)
> bb[cbind(c(col(aa)), c(aa))] <- c(0,1,0)

Thank you both very much, I had a very good time exercising your solutions.
The "col" function especially is useful (and insightful).
I wrote a working solution based on this type of matrix indexing, which is...
unfortunately... _slower_ than the "for" loop, especially in large loops.
It seems that creating the necessary row and column indexes to cbind is much
slower than copying chunks of data at certain columns:

> library(combinat) # for the combn function
> system.time(all.expr(LETTERS[1:12]))
[1] 6.12 0.39 6.54 0.00 0.00
> system.time(all.expr2(LETTERS[1:12]))
[1] 8.62 0.27 8.91 0.00 0.00

If anyone is interested, I uploaded both functions here:
http://www.roda.ro/all.expr.R
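
For anyone who wants to repeat this kind of comparison without the uploaded
file, here is a rough timing harness on the simplified example from earlier
in the thread. It is only a sketch: "fill.loop", "fill.index" and the sizes
are made up here, it does not reproduce all.expr or all.expr2, and the
timings will depend on the machine and the R version.

```r
set.seed(1)
n.rows <- 2000   # rows of bb, i.e. columns of aa (hypothetical size)
n.cols <- 50     # columns of bb (hypothetical size)
## 3 x n.rows matrix of column numbers, no repeats inside a column
aa <- sapply(seq_len(n.rows), function(i) sample(n.cols, 3))

## Row-by-row filling, as in the original "for" loop
fill.loop <- function(aa, n.cols) {
    bb <- matrix(NA, nrow = ncol(aa), ncol = n.cols)
    for (i in seq_len(ncol(aa))) bb[i, aa[, i]] <- c(0, 1, 0)
    bb
}

## Single assignment through a two-column index matrix
fill.index <- function(aa, n.cols) {
    bb <- matrix(NA, nrow = ncol(aa), ncol = n.cols)
    bb[cbind(c(col(aa)), c(aa))] <- c(0, 1, 0)
    bb
}

time.loop  <- system.time(res.loop  <- fill.loop(aa, n.cols))
time.index <- system.time(res.index <- fill.index(aa, n.cols))
print(c(loop = time.loop[["elapsed"]], index = time.index[["elapsed"]]))

## Both approaches should fill bb identically
identical(res.loop, res.index)
```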

Thank you,
Adrian
