Dear R-devs,
Below is a small example of what I am trying to achieve, that is trivial in R and I would like to learn how to do in C, for very large matrices: > (mymat <- matrix(c(1,0,0,2,2,1), nrow = 2)) [,1] [,2] [,3] [1,] 1 0 2 [2,] 0 2 1 And I would like to produce: [1] "a*C" "B*c" Which can be trivially done in R via something like: foo <- function(mymat, colnms, tilde = FALSE) { apply(mymat, 1, function(x) { if (tilde) { colnms[x == 1] <- paste0("~", colnms[x == 1]) } else { colnms[x == 1] <- tolower(colnms[x == 1]) } paste(colnms[x > 0], collapse = "*") }) } > foo(mymat, LETTERS[1:3]) [1] "a*C" "B*c" > foo(mymat, LETTERS[1:3], tilde = TRUE) [1] "~A*C" "B*~C" I know that strings in C are far from trivial (encodings being one important issue), and this is the sort of thing much easier to do in R. On the other hand I found that, for a large matrix of say 1 million rows and 25 columns, setting the rownames of colnames in R copies the matrix and costs a lot of memory and time in the process. Having all necessary headers in C, the solution I came up with involves calling the function foo() from within C: SEXP test(SEXP mymat, SEXP colnms, SEXP tilde) { SEXP call = PROTECT(LCONS(install("foo"), LCONS(mymat, LCONS(colnms, LCONS(tilde, R_NilValue))))); SEXP out = PROTECT(eval(call, R_GlobalEnv)); UNPROTECT(2); return(out); } After compilation, say in a file called test.c, back in R I get: > dyn.load("test.so") > .Call("test", mymat, LETTERS[1:3], FALSE) [1] "a*C" "B*c" > .Call("test", mymat, LETTERS[1:3], TRUE) [1] "~A*C" "B*~C" In my real situation, the matrix I am working on is produced in the C code (and it's much larger). I don't know for sure, when calling the R function foo(), if the matrix is copied: if not, this might be the best solution for me. Otherwise I know there is a function do_paste() in C, and wondered whether I could use that directly instead of calling R from C. I hope this explains what I would like to do, many thanks in advance for any hint, Adrian -- Adrian Dusa University of Bucharest Romanian Social Data Archive Soseaua Panduri nr. 90-92 050663 Bucharest sector 5 Romania [[alternative HTML version deleted]] ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel |
To do this in C, it would probably be easier and faster to just do the
string manipulation directly. Luckily, there are already packages that have done this for you. See an example below using the S4Vectors package. foo2 <- function(mymat, colnms, tilde=FALSE) { chars <- colnms[col(mymat)] lowerChars <- if (tilde) paste0("~", chars) else tolower(chars) chars <- ifelse(mymat==1L, lowerChars, chars) keep <- mymat > 0L charList <- split(chars[keep], row(chars)[keep]) S4Vectors::unstrsplit(charList, "*") } Or perhaps more efficient using the compressed lists of IRanges: foo3 <- function(mymat, colnms, tilde=FALSE) { mymat <- t(mymat) chars <- colnms[row(mymat)] lowerChars <- if (tilde) paste0("~", chars) else tolower(chars) chars <- ifelse(mymat==1L, lowerChars, chars) keep <- mymat > 0L charList <- IRanges::splitAsList(chars[keep], col(chars)[keep]) S4Vectors::unstrsplit(charList, "*") } Michael On Tue, Jun 27, 2017 at 5:29 AM, Adrian Dușa <[hidden email]> wrote: > Dear R-devs, > > Below is a small example of what I am trying to achieve, that is trivial in > R and I would like to learn how to do in C, for very large matrices: > >> (mymat <- matrix(c(1,0,0,2,2,1), nrow = 2)) > [,1] [,2] [,3] > [1,] 1 0 2 > [2,] 0 2 1 > > And I would like to produce: > [1] "a*C" "B*c" > > > Which can be trivially done in R via something like: > > foo <- function(mymat, colnms, tilde = FALSE) { > apply(mymat, 1, function(x) { > if (tilde) { > colnms[x == 1] <- paste0("~", colnms[x == 1]) > } else { > colnms[x == 1] <- tolower(colnms[x == 1]) > } > paste(colnms[x > 0], collapse = "*") > }) > } > >> foo(mymat, LETTERS[1:3]) > [1] "a*C" "B*c" > >> foo(mymat, LETTERS[1:3], tilde = TRUE) > [1] "~A*C" "B*~C" > > > I know that strings in C are far from trivial (encodings being one > important issue), and this is the sort of thing much easier to do in R. On > the other hand I found that, for a large matrix of say 1 million rows and > 25 columns, setting the rownames of colnames in R copies the matrix and > costs a lot of memory and time in the process. > > Having all necessary headers in C, the solution I came up with involves > calling the function foo() from within C: > > SEXP test(SEXP mymat, SEXP colnms, SEXP tilde) { > > SEXP call = PROTECT(LCONS(install("foo"), > LCONS(mymat, > LCONS(colnms, > LCONS(tilde, R_NilValue))))); > > SEXP out = PROTECT(eval(call, R_GlobalEnv)); > > UNPROTECT(2); > return(out); > } > > > After compilation, say in a file called test.c, back in R I get: > >> dyn.load("test.so") > >> .Call("test", mymat, LETTERS[1:3], FALSE) > [1] "a*C" "B*c" > >> .Call("test", mymat, LETTERS[1:3], TRUE) > [1] "~A*C" "B*~C" > > > In my real situation, the matrix I am working on is produced in the C code > (and it's much larger). > I don't know for sure, when calling the R function foo(), if the matrix is > copied: if not, this might be the best solution for me. > > Otherwise I know there is a function do_paste() in C, and wondered whether > I could use that directly instead of calling R from C. > > I hope this explains what I would like to do, many thanks in advance for > any hint, > Adrian > > -- > Adrian Dusa > University of Bucharest > Romanian Social Data Archive > Soseaua Panduri nr. 90-92 > 050663 Bucharest sector 5 > Romania > > [[alternative HTML version deleted]] > > ______________________________________________ > [hidden email] mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel |
Hi Michael,
On Tue, Jun 27, 2017 at 5:31 PM, Michael Lawrence <[hidden email]> wrote: > > To do this in C, it would probably be easier and faster to just do the > string manipulation directly. Luckily, there are already packages that > have done this for you. See an example below using the S4Vectors > package. Thank you for your reply. The goal is to obtain those strings in C, not in R... The result is returned to R just for printing purposes, but I need the strings in C. And if at all possible, using C functions directly rather than calling R functions from C. My first instinct is to try using the do_paste() C function in the src/main/ directory from the R sources, but I haven't been successful so far. Adrian -- Adrian Dusa University of Bucharest Romanian Social Data Archive Soseaua Panduri nr. 90-92 050663 Bucharest sector 5 Romania [[alternative HTML version deleted]] ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel |
Free forum by Nabble | Edit this page |