subset data.frame at C level

Morgan Morgan
Hi,

Hope you are well.

I was wondering if there is a function at C level that is equivalent to
mtcars$carb or .subset2(mtcars, "carb").

If I have the (0-based) index of the column then the answer would be
VECTOR_ELT(df, asInteger(idx)), but I was wondering if there is a way to do
it directly from the name of the column, without having to loop over the
column names to find the index?

Thank you
Best regards
Morgan

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: subset data.frame at C level

Jim Hester
It looks to me like internally .subset2 uses `get1index()`, but this
function is declared in Defn.h, which AFAIK is not part of the exported R
API.

Looking at the code for `get1index()`, it just loops over the (translated)
names, so I guess I would just do that [0].

[0]:
https://github.com/r-devel/r-svn/blob/1ff1d4197495a6ee1e1d88348a03ff841fd27608/src/main/subscript.c#L226-L235
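
For illustration, the "loop over the names" approach could be sketched
roughly as below. This is not code from R's sources or from either poster.
The names `name_index`, `get_column` and the `WITH_R_API` guard macro are
hypothetical. The sketch assumes exact matching only (no partial matching as
`$` does on lists) and returns R_NilValue when the column is missing. The
plain C scan is kept separate so it can be compiled and checked without the
R headers.

```c
#include <string.h>

/* Linear scan over n C strings. Returns the 0-based position of the first
   exact match, or -1 if `target` is not present. */
static int name_index(const char *const *names, int n, const char *target)
{
    for (int i = 0; i < n; i++) {
        if (strcmp(names[i], target) == 0)
            return i;
    }
    return -1;
}

#ifdef WITH_R_API /* hypothetical guard: define it when building against R */
#include <R.h>
#include <Rinternals.h>

/* Hypothetical helper: return the column of `df` called `name` (UTF-8),
   or R_NilValue if there is no such column. `df` must be protected by
   the caller. */
SEXP get_column(SEXP df, const char *name)
{
    SEXP names = getAttrib(df, R_NamesSymbol);
    if (TYPEOF(df) != VECSXP || TYPEOF(names) != STRSXP)
        return R_NilValue;

    int n = LENGTH(names);
    /* R_alloc memory is reclaimed by R at the end of the .Call. */
    const char **cnames = (const char **) R_alloc(n, sizeof(const char *));
    for (int i = 0; i < n; i++)
        cnames[i] = translateCharUTF8(STRING_ELT(names, i));

    int j = name_index(cnames, n, name);
    return j < 0 ? R_NilValue : VECTOR_ELT(df, j); /* VECTOR_ELT is 0-based */
}
#endif
```

In a real package the scan would more likely be written directly over
STRING_ELT(names, i) without the temporary array. Either way the lookup is
linear in the number of columns.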

Re: subset data.frame at C level

Morgan Morgan
Thank you Jim for the feedback.

I actually implemented it the way I described in my first email, and it
seems fast enough for me.

Just to give a bit of context, I will need it at some point in the kit
package. I also implemented subsetting by row, which I actually need more,
as I am working on faster versions of the unique and duplicated functions.
The unique function is particularly slow for data.frames. So far I have got
a 100x speedup.

Best regards
Morgan
