paste strings in C

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
3 messages Options
Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

paste strings in C

Adrian Dușa
Dear R-devs,

Below is a small example of what I am trying to achieve, that is trivial in
R and I would like to learn how to do in C, for very large matrices:

> (mymat <- matrix(c(1,0,0,2,2,1), nrow = 2))
     [,1] [,2] [,3]
[1,]    1    0    2
[2,]    0    2    1

And I would like to produce:
[1] "a*C" "B*c"


Which can be trivially done in R via something like:

foo <- function(mymat, colnms, tilde = FALSE) {
    apply(mymat, 1, function(x) {
        if (tilde) {
            colnms[x == 1] <- paste0("~", colnms[x == 1])
        } else {
            colnms[x == 1] <- tolower(colnms[x == 1])
        }
        paste(colnms[x > 0], collapse = "*")
    })
}

> foo(mymat, LETTERS[1:3])
[1] "a*C" "B*c"

> foo(mymat, LETTERS[1:3], tilde = TRUE)
[1] "~A*C" "B*~C"


I know that strings in C are far from trivial (encodings being one
important issue), and this is the sort of thing much easier to do in R. On
the other hand I found that, for a large matrix of say 1 million rows and
25 columns, setting the rownames of colnames in R copies the matrix and
costs a lot of memory and time in the process.

Having all necessary headers in C, the solution I came up with involves
calling the function foo() from within C:

SEXP test(SEXP mymat, SEXP colnms, SEXP tilde) {

    SEXP call = PROTECT(LCONS(install("foo"),
                        LCONS(mymat,
                        LCONS(colnms,
                        LCONS(tilde, R_NilValue)))));

    SEXP out = PROTECT(eval(call, R_GlobalEnv));

    UNPROTECT(2);
    return(out);
}


After compilation, say in a file called test.c, back in R I get:

> dyn.load("test.so")

> .Call("test", mymat, LETTERS[1:3], FALSE)
[1] "a*C" "B*c"

> .Call("test", mymat, LETTERS[1:3], TRUE)
[1] "~A*C" "B*~C"


In my real situation, the matrix I am working on is produced in the C code
(and it's much larger).
I don't know for sure, when calling the R function foo(), if the matrix is
copied: if not, this might be the best solution for me.

Otherwise I know there is a function do_paste() in C, and wondered whether
I could use that directly instead of calling R from C.

I hope this explains what I would like to do, many thanks in advance for
any hint,
Adrian

--
Adrian Dusa
University of Bucharest
Romanian Social Data Archive
Soseaua Panduri nr. 90-92
050663 Bucharest sector 5
Romania

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: paste strings in C

Michael Lawrence-3
To do this in C, it would probably be easier and faster to just do the
string manipulation directly. Luckily, there are already packages that
have done this for you. See an example below using the S4Vectors
package.

foo2 <- function(mymat, colnms, tilde=FALSE) {
    chars <- colnms[col(mymat)]
    lowerChars <- if (tilde) paste0("~", chars) else tolower(chars)
    chars <- ifelse(mymat==1L, lowerChars, chars)
    keep <- mymat > 0L
    charList <- split(chars[keep], row(chars)[keep])
    S4Vectors::unstrsplit(charList, "*")
}

Or perhaps more efficient using the compressed lists of IRanges:

foo3 <- function(mymat, colnms, tilde=FALSE) {
    mymat <- t(mymat)
    chars <- colnms[row(mymat)]
    lowerChars <- if (tilde) paste0("~", chars) else tolower(chars)
    chars <- ifelse(mymat==1L, lowerChars, chars)
    keep <- mymat > 0L
    charList <- IRanges::splitAsList(chars[keep], col(chars)[keep])
    S4Vectors::unstrsplit(charList, "*")
}

Michael

On Tue, Jun 27, 2017 at 5:29 AM, Adrian Dușa <[hidden email]> wrote:

> Dear R-devs,
>
> Below is a small example of what I am trying to achieve, that is trivial in
> R and I would like to learn how to do in C, for very large matrices:
>
>> (mymat <- matrix(c(1,0,0,2,2,1), nrow = 2))
>      [,1] [,2] [,3]
> [1,]    1    0    2
> [2,]    0    2    1
>
> And I would like to produce:
> [1] "a*C" "B*c"
>
>
> Which can be trivially done in R via something like:
>
> foo <- function(mymat, colnms, tilde = FALSE) {
>     apply(mymat, 1, function(x) {
>         if (tilde) {
>             colnms[x == 1] <- paste0("~", colnms[x == 1])
>         } else {
>             colnms[x == 1] <- tolower(colnms[x == 1])
>         }
>         paste(colnms[x > 0], collapse = "*")
>     })
> }
>
>> foo(mymat, LETTERS[1:3])
> [1] "a*C" "B*c"
>
>> foo(mymat, LETTERS[1:3], tilde = TRUE)
> [1] "~A*C" "B*~C"
>
>
> I know that strings in C are far from trivial (encodings being one
> important issue), and this is the sort of thing much easier to do in R. On
> the other hand I found that, for a large matrix of say 1 million rows and
> 25 columns, setting the rownames of colnames in R copies the matrix and
> costs a lot of memory and time in the process.
>
> Having all necessary headers in C, the solution I came up with involves
> calling the function foo() from within C:
>
> SEXP test(SEXP mymat, SEXP colnms, SEXP tilde) {
>
>     SEXP call = PROTECT(LCONS(install("foo"),
>                         LCONS(mymat,
>                         LCONS(colnms,
>                         LCONS(tilde, R_NilValue)))));
>
>     SEXP out = PROTECT(eval(call, R_GlobalEnv));
>
>     UNPROTECT(2);
>     return(out);
> }
>
>
> After compilation, say in a file called test.c, back in R I get:
>
>> dyn.load("test.so")
>
>> .Call("test", mymat, LETTERS[1:3], FALSE)
> [1] "a*C" "B*c"
>
>> .Call("test", mymat, LETTERS[1:3], TRUE)
> [1] "~A*C" "B*~C"
>
>
> In my real situation, the matrix I am working on is produced in the C code
> (and it's much larger).
> I don't know for sure, when calling the R function foo(), if the matrix is
> copied: if not, this might be the best solution for me.
>
> Otherwise I know there is a function do_paste() in C, and wondered whether
> I could use that directly instead of calling R from C.
>
> I hope this explains what I would like to do, many thanks in advance for
> any hint,
> Adrian
>
> --
> Adrian Dusa
> University of Bucharest
> Romanian Social Data Archive
> Soseaua Panduri nr. 90-92
> 050663 Bucharest sector 5
> Romania
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: paste strings in C

Adrian Dușa
Hi Michael,

On Tue, Jun 27, 2017 at 5:31 PM, Michael Lawrence <[hidden email]>
wrote:
>
> To do this in C, it would probably be easier and faster to just do the
> string manipulation directly. Luckily, there are already packages that
> have done this for you. See an example below using the S4Vectors
> package.

Thank you for your reply.
The goal is to obtain those strings in C, not in R...

The result is returned to R just for printing purposes, but I need the
strings in C.
And if at all possible, using C functions directly rather than calling R
functions from C.

My first instinct is to try using the do_paste() C function in the
src/main/ directory from the R sources, but I haven't been successful so
far.
Adrian

--
Adrian Dusa
University of Bucharest
Romanian Social Data Archive
Soseaua Panduri nr. 90-92
050663 Bucharest sector 5
Romania

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Loading...