What is the best way to loop over an ALTREP vector?

classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

What is the best way to loop over an ALTREP vector?

Wang Jiefei
Hi devel team,

I'm working on C/C++ level ALTREP compatibility for a package. The package
previously used pointers to access the data of a SEXP, so it would not work
for some ALTREP objects which do not have a pointer. I plan to rewrite the
code and use functions like get_elt, get_region, and get_subset to access
the values of a vector, so I have a few questions for ALTREP:

1. Since an ALTREP do not have to define all of the above
functions(element, region, subset), is there any way to check which
function has been defined for an ALTREP class? I did a search on
RInternal.h and altrep.c but did not find a solution for it. If not, will
it be added in the future?

2. Given the diversity of ALTREP classes, what is the best way to loop over
an ALTREP object? I hope there can be an all-in-one function which can get
the values from a vector as long as at least one of the above functions has
been defined, so package developers would not be bothered by tons of
`if-else` statement if they want their package to work with ALTREP. Since
it seems like there is no such function exist, what could be the best way
to do the loop under the current R version?

Best,
Jiefei

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: What is the best way to loop over an ALTREP vector?

Gabriel Becker-2
Jiefei,

I've been meaning to write up something about this so hopefully this will
be an impetus for me to actually do that, but until then, responses inline.


On Tue, Aug 27, 2019, 7:22 PM Wang Jiefei <[hidden email]> wrote:

> Hi devel team,
>
> I'm working on C/C++ level ALTREP compatibility for a package. The package
> previously used pointers to access the data of a SEXP, so it would not work
> for some ALTREP objects which do not have a pointer. I plan to rewrite the
> code and use functions like get_elt, get_region, and get_subset to access
> the values of a vector, so I have a few questions for ALTREP:
>
> 1. Since an ALTREP do not have to define all of the above
> functions(element, region, subset), is there any way to check which
> function has been defined for an ALTREP class? I did a search on
> RInternal.h and altrep.c but did not find a solution for it. If not, will
> it be added in the future?
>

Element and region are guaranteed to always be defined and work (for altrep
and non-altrep INTSXP, REALSXP, LGLSXPs, etc, we currently don't have
region for STRSXP or VECSXP, I believe). If the altrep class does not
provide them then default methods will be used, which may be inefficient in
some cases but will work. Subset is currently a forward looking stub, but
once implimented, that will also be guaranteed to work for all valid ALTREP
classes.


>
> 2. Given the diversity of ALTREP classes, what is the best way to loop over
> an ALTREP object? I hope there can be an all-in-one function which can get
> the values from a vector as long as at least one of the above functions has
> been defined, so package developers would not be bothered by tons of
> `if-else` statement if they want their package to work with ALTREP. Since
> it seems like there is no such function exist, what could be the best way
> to do the loop under the current R version?
>

The best way to loop over all SEXPs, which supports both altrep and
nonaltrep objects is, with the ITERATE_BY_REGION (which has been in R for a
number of released versions, at least since 3.5.0 I think) and the much
newer (devel only) ITERATE_BY_REGION_PARTIAL macros defined in
R_exts/Itermacros.h

The meaning of the arguments is as follows for ITERATE_BY_REGION_PARTIAL
are as follows (ITERATE_BY_REGION is the same except no strt, and nfull).


   - sx - C level variable name of the SEXP to iterate over
   - px - variable name to use for the pointer populated with data from a
   region of sx
   - idx - variable name to use for the "outer", batch counter in the for
   loop. This will contain the 0-indexed start position of the batch you're
   currently processing
   - nb - variable name to use for the current batch size. This will always
   either be GET_REGION_BUFFSIZE (512), or the number of elements remaining in
   the vector, whichever is smaller
   - etype - element (C) type, e.g., int, double, of the data
   - vtype - vector (access API) type, e.g, INTEGER, REAL
   - strt - the 0-indexed position in the vector to start iterating
   - nfull - the total number oif elements to iterate over from the vector
   - expr - the code to process a single batch (Which will do things to px,
   typically)


So code to perform badly implemented not good idea summing of REALSXP data
might look like

double sillysum(SEXP x) {

    double s = 0.0;

    ITERATE_BY_REGION(x, ptr, ind, nbatch, double, REAL,
        {

            for(int i = 0; i < nbatch; i++) { s = s + ptr[i];}
        })

     return s;
}

For meatier examples of ITERATE_BY_REGION's use in practice you can grep
the R sources. I know it is used in the implementations of the various
C-level summaries (summary.c), print and formatting functions, and anyNA.

Some things to remember

   - If you have an inner loop like the one above, your total position in
   the original vector is ind + i
   - ITERATE_BY_REGION always processes the whole vector, if you need to
   only do part of it yo'll either need custom breaking for both inner and
   outer loopsl, or in R-devel you can use ITERATE_BY_REGION_PARTIAL
   - Don't use the variants ending in 0, all they do is skip over things
   that are a good idea in the case of non-altreps (and some very specific
   altreps).

Hope that helps.

Best,
~G












> Best,
> Jiefei
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: What is the best way to loop over an ALTREP vector?

Wang Jiefei
Hi Gabriel,

I have tried the macro and found a small issue, it seems like the macro is
written in C and does an implicit type conversion(const void * to const int
*), see below. While it is allowed in C, C++ seems not happy with it. Is it
possible to add an explicit type casting so that it can be compatible with
both language?


#define ITERATE_BY_REGION_PARTIAL(sx, px, idx, nb, etype, vtype,     \

                             strt, nfull, expr) do {         \

       *const etype *px = (const** etype *)DATAPTR_OR_NULL(sx);  *
\

       if (px != NULL) {                                      \

           R_xlen_t __ibr_n__ = strt + nfull;                        \

           R_xlen_t nb = __ibr_n__;                                  \

           for (R_xlen_t idx = strt; idx < __ibr_n__; idx += nb) {   \

              expr                                            \

            }                                                 \

       }                                                      \

       else ITERATE_BY_REGION_PARTIAL0(sx, px, idx, nb, etype, vtype,
\

                                   strt, nfull, expr);        \

    } while (0)


  Also, I notice that the element type(etype) and vector type(vtype) has to
be specified in the macro. Since the SEXP is the first argument in the
macro, it seems redundant to define etype and vtype for they have to match
the type of the SEXP. I'm wondering if this is intentional? Will there be a
type-free macro in R in the future? Here is a simple type-free macro I'm
using.

#define type_free_iter(sx, ptr, ind, nbatch,expr)\

switch(TYPEOF(sx)){\

case INTSXP:\

       ITERATE_BY_REGION(sx, ptr, ind, nbatch, int, INTEGER, expr);\

       break; \

case REALSXP:\

       ITERATE_BY_REGION(sx, ptr, ind, nbatch, double, REAL, expr);\

       break; \

case LGLSXP:\

       ITERATE_BY_REGION(sx, ptr, ind, nbatch, int, LOGICAL, expr);\

       break; \

default:\

       Rf_error("Unknow data type\n"); \

       break; \

}



// [[Rcpp::export]]

double sillysum(SEXP x) {

       double s = 0.0;

       type_free_iter(x, ptr, ind, nbatch,

              {

                     for (int i = 0; i < nbatch; i++) { s = s + ptr[i]; }

              });

              return s;

}




Best,

Jiefei

On Wed, Aug 28, 2019 at 2:32 PM Wang Jiefei <[hidden email]> wrote:

> Thank you, Gabriel. The loop macro is very helpful. It is also exciting to
> see that there are lots of changes in ALTREP in R devel version. I really
> appreciate your help!
>
> Best,
> Jiefei
>
> On Wed, Aug 28, 2019 at 7:37 AM Gabriel Becker <[hidden email]>
> wrote:
>
>> Jiefei,
>>
>> I've been meaning to write up something about this so hopefully this will
>> be an impetus for me to actually do that, but until then, responses inline.
>>
>>
>> On Tue, Aug 27, 2019, 7:22 PM Wang Jiefei <[hidden email]> wrote:
>>
>>> Hi devel team,
>>>
>>> I'm working on C/C++ level ALTREP compatibility for a package. The
>>> package
>>> previously used pointers to access the data of a SEXP, so it would not
>>> work
>>> for some ALTREP objects which do not have a pointer. I plan to rewrite
>>> the
>>> code and use functions like get_elt, get_region, and get_subset to access
>>> the values of a vector, so I have a few questions for ALTREP:
>>>
>>> 1. Since an ALTREP do not have to define all of the above
>>> functions(element, region, subset), is there any way to check which
>>> function has been defined for an ALTREP class? I did a search on
>>> RInternal.h and altrep.c but did not find a solution for it. If not, will
>>> it be added in the future?
>>>
>>
>> Element and region are guaranteed to always be defined and work (for
>> altrep and non-altrep INTSXP, REALSXP, LGLSXPs, etc, we currently don't
>> have region for STRSXP or VECSXP, I believe). If the altrep class does not
>> provide them then default methods will be used, which may be inefficient in
>> some cases but will work. Subset is currently a forward looking stub, but
>> once implimented, that will also be guaranteed to work for all valid ALTREP
>> classes.
>>
>>
>>>
>>> 2. Given the diversity of ALTREP classes, what is the best way to loop
>>> over
>>> an ALTREP object? I hope there can be an all-in-one function which can
>>> get
>>> the values from a vector as long as at least one of the above functions
>>> has
>>> been defined, so package developers would not be bothered by tons of
>>> `if-else` statement if they want their package to work with ALTREP. Since
>>> it seems like there is no such function exist, what could be the best way
>>> to do the loop under the current R version?
>>>
>>
>> The best way to loop over all SEXPs, which supports both altrep and
>> nonaltrep objects is, with the ITERATE_BY_REGION (which has been in R for a
>> number of released versions, at least since 3.5.0 I think) and the much
>> newer (devel only) ITERATE_BY_REGION_PARTIAL macros defined in
>> R_exts/Itermacros.h
>>
>> The meaning of the arguments is as follows for ITERATE_BY_REGION_PARTIAL
>> are as follows (ITERATE_BY_REGION is the same except no strt, and nfull).
>>
>>
>>    - sx - C level variable name of the SEXP to iterate over
>>    - px - variable name to use for the pointer populated with data from
>>    a region of sx
>>    - idx - variable name to use for the "outer", batch counter in the
>>    for loop. This will contain the 0-indexed start position of the batch
>>    you're currently processing
>>    - nb - variable name to use for the current batch size. This will
>>    always either be GET_REGION_BUFFSIZE (512), or the number of elements
>>    remaining in the vector, whichever is smaller
>>    - etype - element (C) type, e.g., int, double, of the data
>>    - vtype - vector (access API) type, e.g, INTEGER, REAL
>>    - strt - the 0-indexed position in the vector to start iterating
>>    - nfull - the total number oif elements to iterate over from the
>>    vector
>>    - expr - the code to process a single batch (Which will do things to
>>    px, typically)
>>
>>
>> So code to perform badly implemented not good idea summing of REALSXP
>> data might look like
>>
>> double sillysum(SEXP x) {
>>
>>     double s = 0.0;
>>
>>     ITERATE_BY_REGION(x, ptr, ind, nbatch, double, REAL,
>>         {
>>
>>             for(int i = 0; i < nbatch; i++) { s = s + ptr[i];}
>>         })
>>
>>      return s;
>> }
>>
>> For meatier examples of ITERATE_BY_REGION's use in practice you can grep
>> the R sources. I know it is used in the implementations of the various
>> C-level summaries (summary.c), print and formatting functions, and anyNA.
>>
>> Some things to remember
>>
>>    - If you have an inner loop like the one above, your total position
>>    in the original vector is ind + i
>>    - ITERATE_BY_REGION always processes the whole vector, if you need to
>>    only do part of it yo'll either need custom breaking for both inner and
>>    outer loopsl, or in R-devel you can use ITERATE_BY_REGION_PARTIAL
>>    - Don't use the variants ending in 0, all they do is skip over things
>>    that are a good idea in the case of non-altreps (and some very specific
>>    altreps).
>>
>> Hope that helps.
>>
>> Best,
>> ~G
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>> Best,
>>> Jiefei
>>>
>>>         [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> [hidden email] mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>
>>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: What is the best way to loop over an ALTREP vector?

Wang Jiefei
Sorry for post a lot of things, for the first part of code, I copied my C++
iter macro by mistake(and you can see an explicit type casting). Here is
the macro definition from R_exts/Itermacros.h

#define ITERATE_BY_REGION_PARTIAL(sx, px, idx, nb, etype, vtype,     \

                             strt, nfull, expr) do {         \

*       const** etype *px = DATAPTR_OR_NULL(sx);           *             \

       if (px != NULL) {                                      \

           R_xlen_t __ibr_n__ = strt + nfull;                        \

           R_xlen_t nb = __ibr_n__;                                  \

           for (R_xlen_t idx = strt; idx < __ibr_n__; idx += nb) {   \

              expr                                            \

            }                                                 \

       }                                                      \

       else ITERATE_BY_REGION_PARTIAL0(sx, px, idx, nb, etype, vtype,
\

                                  strt, nfull, expr);        \

    } while (0)


Best,

Jiefei

On Mon, Sep 23, 2019 at 3:12 PM Wang Jiefei <[hidden email]> wrote:

> Hi Gabriel,
>
> I have tried the macro and found a small issue, it seems like the macro is
> written in C and does an implicit type conversion(const void * to const int
> *), see below. While it is allowed in C, C++ seems not happy with it. Is it
> possible to add an explicit type casting so that it can be compatible with
> both language?
>
>
> #define ITERATE_BY_REGION_PARTIAL(sx, px, idx, nb, etype, vtype,     \
>
>                              strt, nfull, expr) do {         \
>
>        *const etype *px = (const** etype *)DATAPTR_OR_NULL(sx);  *
> \
>
>        if (px != NULL) {                                      \
>
>            R_xlen_t __ibr_n__ = strt + nfull;                        \
>
>            R_xlen_t nb = __ibr_n__;                                  \
>
>            for (R_xlen_t idx = strt; idx < __ibr_n__; idx += nb) {   \
>
>               expr                                            \
>
>             }                                                 \
>
>        }                                                      \
>
>        else ITERATE_BY_REGION_PARTIAL0(sx, px, idx, nb, etype,
> vtype,       \
>
>                                    strt, nfull, expr);        \
>
>     } while (0)
>
>
>   Also, I notice that the element type(etype) and vector type(vtype) has
> to be specified in the macro. Since the SEXP is the first argument in the
> macro, it seems redundant to define etype and vtype for they have to match
> the type of the SEXP. I'm wondering if this is intentional? Will there be a
> type-free macro in R in the future? Here is a simple type-free macro I'm
> using.
>
> #define type_free_iter(sx, ptr, ind, nbatch,expr)\
>
> switch(TYPEOF(sx)){\
>
> case INTSXP:\
>
>        ITERATE_BY_REGION(sx, ptr, ind, nbatch, int, INTEGER, expr);\
>
>        break; \
>
> case REALSXP:\
>
>        ITERATE_BY_REGION(sx, ptr, ind, nbatch, double, REAL, expr);\
>
>        break; \
>
> case LGLSXP:\
>
>        ITERATE_BY_REGION(sx, ptr, ind, nbatch, int, LOGICAL, expr);\
>
>        break; \
>
> default:\
>
>        Rf_error("Unknow data type\n"); \
>
>        break; \
>
> }
>
>
>
> // [[Rcpp::export]]
>
> double sillysum(SEXP x) {
>
>        double s = 0.0;
>
>        type_free_iter(x, ptr, ind, nbatch,
>
>               {
>
>                      for (int i = 0; i < nbatch; i++) { s = s + ptr[i]; }
>
>               });
>
>               return s;
>
> }
>
>
>
>
> Best,
>
> Jiefei
>
> On Wed, Aug 28, 2019 at 2:32 PM Wang Jiefei <[hidden email]> wrote:
>
>> Thank you, Gabriel. The loop macro is very helpful. It is also exciting
>> to see that there are lots of changes in ALTREP in R devel version. I
>> really appreciate your help!
>>
>> Best,
>> Jiefei
>>
>> On Wed, Aug 28, 2019 at 7:37 AM Gabriel Becker <[hidden email]>
>> wrote:
>>
>>> Jiefei,
>>>
>>> I've been meaning to write up something about this so hopefully this
>>> will be an impetus for me to actually do that, but until then, responses
>>> inline.
>>>
>>>
>>> On Tue, Aug 27, 2019, 7:22 PM Wang Jiefei <[hidden email]> wrote:
>>>
>>>> Hi devel team,
>>>>
>>>> I'm working on C/C++ level ALTREP compatibility for a package. The
>>>> package
>>>> previously used pointers to access the data of a SEXP, so it would not
>>>> work
>>>> for some ALTREP objects which do not have a pointer. I plan to rewrite
>>>> the
>>>> code and use functions like get_elt, get_region, and get_subset to
>>>> access
>>>> the values of a vector, so I have a few questions for ALTREP:
>>>>
>>>> 1. Since an ALTREP do not have to define all of the above
>>>> functions(element, region, subset), is there any way to check which
>>>> function has been defined for an ALTREP class? I did a search on
>>>> RInternal.h and altrep.c but did not find a solution for it. If not,
>>>> will
>>>> it be added in the future?
>>>>
>>>
>>> Element and region are guaranteed to always be defined and work (for
>>> altrep and non-altrep INTSXP, REALSXP, LGLSXPs, etc, we currently don't
>>> have region for STRSXP or VECSXP, I believe). If the altrep class does not
>>> provide them then default methods will be used, which may be inefficient in
>>> some cases but will work. Subset is currently a forward looking stub, but
>>> once implimented, that will also be guaranteed to work for all valid ALTREP
>>> classes.
>>>
>>>
>>>>
>>>> 2. Given the diversity of ALTREP classes, what is the best way to loop
>>>> over
>>>> an ALTREP object? I hope there can be an all-in-one function which can
>>>> get
>>>> the values from a vector as long as at least one of the above functions
>>>> has
>>>> been defined, so package developers would not be bothered by tons of
>>>> `if-else` statement if they want their package to work with ALTREP.
>>>> Since
>>>> it seems like there is no such function exist, what could be the best
>>>> way
>>>> to do the loop under the current R version?
>>>>
>>>
>>> The best way to loop over all SEXPs, which supports both altrep and
>>> nonaltrep objects is, with the ITERATE_BY_REGION (which has been in R for a
>>> number of released versions, at least since 3.5.0 I think) and the much
>>> newer (devel only) ITERATE_BY_REGION_PARTIAL macros defined in
>>> R_exts/Itermacros.h
>>>
>>> The meaning of the arguments is as follows for ITERATE_BY_REGION_PARTIAL
>>> are as follows (ITERATE_BY_REGION is the same except no strt, and nfull).
>>>
>>>
>>>    - sx - C level variable name of the SEXP to iterate over
>>>    - px - variable name to use for the pointer populated with data from
>>>    a region of sx
>>>    - idx - variable name to use for the "outer", batch counter in the
>>>    for loop. This will contain the 0-indexed start position of the batch
>>>    you're currently processing
>>>    - nb - variable name to use for the current batch size. This will
>>>    always either be GET_REGION_BUFFSIZE (512), or the number of elements
>>>    remaining in the vector, whichever is smaller
>>>    - etype - element (C) type, e.g., int, double, of the data
>>>    - vtype - vector (access API) type, e.g, INTEGER, REAL
>>>    - strt - the 0-indexed position in the vector to start iterating
>>>    - nfull - the total number oif elements to iterate over from the
>>>    vector
>>>    - expr - the code to process a single batch (Which will do things to
>>>    px, typically)
>>>
>>>
>>> So code to perform badly implemented not good idea summing of REALSXP
>>> data might look like
>>>
>>> double sillysum(SEXP x) {
>>>
>>>     double s = 0.0;
>>>
>>>     ITERATE_BY_REGION(x, ptr, ind, nbatch, double, REAL,
>>>         {
>>>
>>>             for(int i = 0; i < nbatch; i++) { s = s + ptr[i];}
>>>         })
>>>
>>>      return s;
>>> }
>>>
>>> For meatier examples of ITERATE_BY_REGION's use in practice you can grep
>>> the R sources. I know it is used in the implementations of the various
>>> C-level summaries (summary.c), print and formatting functions, and anyNA.
>>>
>>> Some things to remember
>>>
>>>    - If you have an inner loop like the one above, your total position
>>>    in the original vector is ind + i
>>>    - ITERATE_BY_REGION always processes the whole vector, if you need
>>>    to only do part of it yo'll either need custom breaking for both inner and
>>>    outer loopsl, or in R-devel you can use ITERATE_BY_REGION_PARTIAL
>>>    - Don't use the variants ending in 0, all they do is skip over
>>>    things that are a good idea in the case of non-altreps (and some very
>>>    specific altreps).
>>>
>>> Hope that helps.
>>>
>>> Best,
>>> ~G
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>> Best,
>>>> Jiefei
>>>>
>>>>         [[alternative HTML version deleted]]
>>>>
>>>> ______________________________________________
>>>> [hidden email] mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>>
>>>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: What is the best way to loop over an ALTREP vector?

Bob Rudis
Not sure if you're using just C++ or Rcpp for C++ access but https://purrple.cat/blog/2018/10/14/altrep-and-cpp/ has some tips on using C++ w/ALTREP.

> On Sep 23, 2019, at 3:17 PM, Wang Jiefei <[hidden email]> wrote:
>
> Sorry for post a lot of things, for the first part of code, I copied my C++
> iter macro by mistake(and you can see an explicit type casting). Here is
> the macro definition from R_exts/Itermacros.h
>
> #define ITERATE_BY_REGION_PARTIAL(sx, px, idx, nb, etype, vtype,     \
>
>                             strt, nfull, expr) do {         \
>
> *       const** etype *px = DATAPTR_OR_NULL(sx);           *             \
>
>       if (px != NULL) {                                      \
>
>           R_xlen_t __ibr_n__ = strt + nfull;                        \
>
>           R_xlen_t nb = __ibr_n__;                                  \
>
>           for (R_xlen_t idx = strt; idx < __ibr_n__; idx += nb) {   \
>
>              expr                                            \
>
>            }                                                 \
>
>       }                                                      \
>
>       else ITERATE_BY_REGION_PARTIAL0(sx, px, idx, nb, etype, vtype,
> \
>
>                                  strt, nfull, expr);        \
>
>    } while (0)
>
>
> Best,
>
> Jiefei
>
> On Mon, Sep 23, 2019 at 3:12 PM Wang Jiefei <[hidden email]> wrote:
>
>> Hi Gabriel,
>>
>> I have tried the macro and found a small issue, it seems like the macro is
>> written in C and does an implicit type conversion(const void * to const int
>> *), see below. While it is allowed in C, C++ seems not happy with it. Is it
>> possible to add an explicit type casting so that it can be compatible with
>> both language?
>>
>>
>> #define ITERATE_BY_REGION_PARTIAL(sx, px, idx, nb, etype, vtype,     \
>>
>>                             strt, nfull, expr) do {         \
>>
>>       *const etype *px = (const** etype *)DATAPTR_OR_NULL(sx);  *
>> \
>>
>>       if (px != NULL) {                                      \
>>
>>           R_xlen_t __ibr_n__ = strt + nfull;                        \
>>
>>           R_xlen_t nb = __ibr_n__;                                  \
>>
>>           for (R_xlen_t idx = strt; idx < __ibr_n__; idx += nb) {   \
>>
>>              expr                                            \
>>
>>            }                                                 \
>>
>>       }                                                      \
>>
>>       else ITERATE_BY_REGION_PARTIAL0(sx, px, idx, nb, etype,
>> vtype,       \
>>
>>                                   strt, nfull, expr);        \
>>
>>    } while (0)
>>
>>
>>  Also, I notice that the element type(etype) and vector type(vtype) has
>> to be specified in the macro. Since the SEXP is the first argument in the
>> macro, it seems redundant to define etype and vtype for they have to match
>> the type of the SEXP. I'm wondering if this is intentional? Will there be a
>> type-free macro in R in the future? Here is a simple type-free macro I'm
>> using.
>>
>> #define type_free_iter(sx, ptr, ind, nbatch,expr)\
>>
>> switch(TYPEOF(sx)){\
>>
>> case INTSXP:\
>>
>>       ITERATE_BY_REGION(sx, ptr, ind, nbatch, int, INTEGER, expr);\
>>
>>       break; \
>>
>> case REALSXP:\
>>
>>       ITERATE_BY_REGION(sx, ptr, ind, nbatch, double, REAL, expr);\
>>
>>       break; \
>>
>> case LGLSXP:\
>>
>>       ITERATE_BY_REGION(sx, ptr, ind, nbatch, int, LOGICAL, expr);\
>>
>>       break; \
>>
>> default:\
>>
>>       Rf_error("Unknow data type\n"); \
>>
>>       break; \
>>
>> }
>>
>>
>>
>> // [[Rcpp::export]]
>>
>> double sillysum(SEXP x) {
>>
>>       double s = 0.0;
>>
>>       type_free_iter(x, ptr, ind, nbatch,
>>
>>              {
>>
>>                     for (int i = 0; i < nbatch; i++) { s = s + ptr[i]; }
>>
>>              });
>>
>>              return s;
>>
>> }
>>
>>
>>
>>
>> Best,
>>
>> Jiefei
>>
>> On Wed, Aug 28, 2019 at 2:32 PM Wang Jiefei <[hidden email]> wrote:
>>
>>> Thank you, Gabriel. The loop macro is very helpful. It is also exciting
>>> to see that there are lots of changes in ALTREP in R devel version. I
>>> really appreciate your help!
>>>
>>> Best,
>>> Jiefei
>>>
>>> On Wed, Aug 28, 2019 at 7:37 AM Gabriel Becker <[hidden email]>
>>> wrote:
>>>
>>>> Jiefei,
>>>>
>>>> I've been meaning to write up something about this so hopefully this
>>>> will be an impetus for me to actually do that, but until then, responses
>>>> inline.
>>>>
>>>>
>>>> On Tue, Aug 27, 2019, 7:22 PM Wang Jiefei <[hidden email]> wrote:
>>>>
>>>>> Hi devel team,
>>>>>
>>>>> I'm working on C/C++ level ALTREP compatibility for a package. The
>>>>> package
>>>>> previously used pointers to access the data of a SEXP, so it would not
>>>>> work
>>>>> for some ALTREP objects which do not have a pointer. I plan to rewrite
>>>>> the
>>>>> code and use functions like get_elt, get_region, and get_subset to
>>>>> access
>>>>> the values of a vector, so I have a few questions for ALTREP:
>>>>>
>>>>> 1. Since an ALTREP do not have to define all of the above
>>>>> functions(element, region, subset), is there any way to check which
>>>>> function has been defined for an ALTREP class? I did a search on
>>>>> RInternal.h and altrep.c but did not find a solution for it. If not,
>>>>> will
>>>>> it be added in the future?
>>>>>
>>>>
>>>> Element and region are guaranteed to always be defined and work (for
>>>> altrep and non-altrep INTSXP, REALSXP, LGLSXPs, etc, we currently don't
>>>> have region for STRSXP or VECSXP, I believe). If the altrep class does not
>>>> provide them then default methods will be used, which may be inefficient in
>>>> some cases but will work. Subset is currently a forward looking stub, but
>>>> once implimented, that will also be guaranteed to work for all valid ALTREP
>>>> classes.
>>>>
>>>>
>>>>>
>>>>> 2. Given the diversity of ALTREP classes, what is the best way to loop
>>>>> over
>>>>> an ALTREP object? I hope there can be an all-in-one function which can
>>>>> get
>>>>> the values from a vector as long as at least one of the above functions
>>>>> has
>>>>> been defined, so package developers would not be bothered by tons of
>>>>> `if-else` statement if they want their package to work with ALTREP.
>>>>> Since
>>>>> it seems like there is no such function exist, what could be the best
>>>>> way
>>>>> to do the loop under the current R version?
>>>>>
>>>>
>>>> The best way to loop over all SEXPs, which supports both altrep and
>>>> nonaltrep objects is, with the ITERATE_BY_REGION (which has been in R for a
>>>> number of released versions, at least since 3.5.0 I think) and the much
>>>> newer (devel only) ITERATE_BY_REGION_PARTIAL macros defined in
>>>> R_exts/Itermacros.h
>>>>
>>>> The meaning of the arguments is as follows for ITERATE_BY_REGION_PARTIAL
>>>> are as follows (ITERATE_BY_REGION is the same except no strt, and nfull).
>>>>
>>>>
>>>>   - sx - C level variable name of the SEXP to iterate over
>>>>   - px - variable name to use for the pointer populated with data from
>>>>   a region of sx
>>>>   - idx - variable name to use for the "outer", batch counter in the
>>>>   for loop. This will contain the 0-indexed start position of the batch
>>>>   you're currently processing
>>>>   - nb - variable name to use for the current batch size. This will
>>>>   always either be GET_REGION_BUFFSIZE (512), or the number of elements
>>>>   remaining in the vector, whichever is smaller
>>>>   - etype - element (C) type, e.g., int, double, of the data
>>>>   - vtype - vector (access API) type, e.g, INTEGER, REAL
>>>>   - strt - the 0-indexed position in the vector to start iterating
>>>>   - nfull - the total number oif elements to iterate over from the
>>>>   vector
>>>>   - expr - the code to process a single batch (Which will do things to
>>>>   px, typically)
>>>>
>>>>
>>>> So code to perform badly implemented not good idea summing of REALSXP
>>>> data might look like
>>>>
>>>> double sillysum(SEXP x) {
>>>>
>>>>    double s = 0.0;
>>>>
>>>>    ITERATE_BY_REGION(x, ptr, ind, nbatch, double, REAL,
>>>>        {
>>>>
>>>>            for(int i = 0; i < nbatch; i++) { s = s + ptr[i];}
>>>>        })
>>>>
>>>>     return s;
>>>> }
>>>>
>>>> For meatier examples of ITERATE_BY_REGION's use in practice you can grep
>>>> the R sources. I know it is used in the implementations of the various
>>>> C-level summaries (summary.c), print and formatting functions, and anyNA.
>>>>
>>>> Some things to remember
>>>>
>>>>   - If you have an inner loop like the one above, your total position
>>>>   in the original vector is ind + i
>>>>   - ITERATE_BY_REGION always processes the whole vector, if you need
>>>>   to only do part of it yo'll either need custom breaking for both inner and
>>>>   outer loopsl, or in R-devel you can use ITERATE_BY_REGION_PARTIAL
>>>>   - Don't use the variants ending in 0, all they do is skip over
>>>>   things that are a good idea in the case of non-altreps (and some very
>>>>   specific altreps).
>>>>
>>>> Hope that helps.
>>>>
>>>> Best,
>>>> ~G
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>> Best,
>>>>> Jiefei
>>>>>
>>>>>        [[alternative HTML version deleted]]
>>>>>
>>>>> ______________________________________________
>>>>> [hidden email] mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>>>
>>>>
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: What is the best way to loop over an ALTREP vector?

Gabriel Becker-2
Hi Bob,

Thanks for sending around the link to that. It looks mostly right and looks
like a useful onramp. There are a few things to watch out for though (I've
cc'ed Romain so he's aware of these comments). @romain I hope you taake the
following comments as they are intended, as help rather than attacks.

The largest issue I see is that the contract for Get_region is that it
*populates the
provided buffer with a copy of the data. *That buffer is expected to be
safe to destructively modify, shuffle, etc though I don't know if we are
actually doing that anywhere. As such, if I understand his C++ correctly,
that Get_region method  is not safe and shouldn't be used.

The other point is that Dataptr_or_null is not actually *guaranteed *not to
allocate. The default method returns NULL, but we have no way of preventing
an allocation in a user-defined method, and probably (?) no easy way of
detecting that it is occurring before it causes a bug. That said, Romain is
correct that when you are writing Dataptr_or_null methods you should write
them so that they don't allocate, generally. Basically your methods for
Dataptr_or_null shouldn't allocate, but you also should not write code that
relies on hard assumptions that no one's ever will.

Also, a small nitpick, R's internal mean function doesn't hit Dataptr, it
hits either INTEGER_ELT (which really should probably be a
ITERATE_BY_REGION) or ITERATE_BY_REGION.

Anyway, I hope that helps.
~G






On Mon, Sep 23, 2019 at 6:12 PM Bob Rudis <[hidden email]> wrote:

> Not sure if you're using just C++ or Rcpp for C++ access but
> https://purrple.cat/blog/2018/10/14/altrep-and-cpp/ has some tips on
> using C++ w/ALTREP.
>
> > On Sep 23, 2019, at 3:17 PM, Wang Jiefei <[hidden email]> wrote:
> >
> > Sorry for post a lot of things, for the first part of code, I copied my
> C++
> > iter macro by mistake(and you can see an explicit type casting). Here is
> > the macro definition from R_exts/Itermacros.h
> >
> > #define ITERATE_BY_REGION_PARTIAL(sx, px, idx, nb, etype, vtype,     \
> >
> >                             strt, nfull, expr) do {         \
> >
> > *       const** etype *px = DATAPTR_OR_NULL(sx);           *
>  \
> >
> >       if (px != NULL) {                                      \
> >
> >           R_xlen_t __ibr_n__ = strt + nfull;                        \
> >
> >           R_xlen_t nb = __ibr_n__;                                  \
> >
> >           for (R_xlen_t idx = strt; idx < __ibr_n__; idx += nb) {   \
> >
> >              expr                                            \
> >
> >            }                                                 \
> >
> >       }                                                      \
> >
> >       else ITERATE_BY_REGION_PARTIAL0(sx, px, idx, nb, etype, vtype,
> > \
> >
> >                                  strt, nfull, expr);        \
> >
> >    } while (0)
> >
> >
> > Best,
> >
> > Jiefei
> >
> > On Mon, Sep 23, 2019 at 3:12 PM Wang Jiefei <[hidden email]> wrote:
> >
> >> Hi Gabriel,
> >>
> >> I have tried the macro and found a small issue, it seems like the macro
> is
> >> written in C and does an implicit type conversion(const void * to const
> int
> >> *), see below. While it is allowed in C, C++ seems not happy with it.
> Is it
> >> possible to add an explicit type casting so that it can be compatible
> with
> >> both language?
> >>
> >>
> >> #define ITERATE_BY_REGION_PARTIAL(sx, px, idx, nb, etype, vtype,     \
> >>
> >>                             strt, nfull, expr) do {         \
> >>
> >>       *const etype *px = (const** etype *)DATAPTR_OR_NULL(sx);  *
> >> \
> >>
> >>       if (px != NULL) {                                      \
> >>
> >>           R_xlen_t __ibr_n__ = strt + nfull;                        \
> >>
> >>           R_xlen_t nb = __ibr_n__;                                  \
> >>
> >>           for (R_xlen_t idx = strt; idx < __ibr_n__; idx += nb) {   \
> >>
> >>              expr                                            \
> >>
> >>            }                                                 \
> >>
> >>       }                                                      \
> >>
> >>       else ITERATE_BY_REGION_PARTIAL0(sx, px, idx, nb, etype,
> >> vtype,       \
> >>
> >>                                   strt, nfull, expr);        \
> >>
> >>    } while (0)
> >>
> >>
> >>  Also, I notice that the element type(etype) and vector type(vtype) has
> >> to be specified in the macro. Since the SEXP is the first argument in
> the
> >> macro, it seems redundant to define etype and vtype for they have to
> match
> >> the type of the SEXP. I'm wondering if this is intentional? Will there
> be a
> >> type-free macro in R in the future? Here is a simple type-free macro I'm
> >> using.
> >>
> >> #define type_free_iter(sx, ptr, ind, nbatch,expr)\
> >>
> >> switch(TYPEOF(sx)){\
> >>
> >> case INTSXP:\
> >>
> >>       ITERATE_BY_REGION(sx, ptr, ind, nbatch, int, INTEGER, expr);\
> >>
> >>       break; \
> >>
> >> case REALSXP:\
> >>
> >>       ITERATE_BY_REGION(sx, ptr, ind, nbatch, double, REAL, expr);\
> >>
> >>       break; \
> >>
> >> case LGLSXP:\
> >>
> >>       ITERATE_BY_REGION(sx, ptr, ind, nbatch, int, LOGICAL, expr);\
> >>
> >>       break; \
> >>
> >> default:\
> >>
> >>       Rf_error("Unknow data type\n"); \
> >>
> >>       break; \
> >>
> >> }
> >>
> >>
> >>
> >> // [[Rcpp::export]]
> >>
> >> double sillysum(SEXP x) {
> >>
> >>       double s = 0.0;
> >>
> >>       type_free_iter(x, ptr, ind, nbatch,
> >>
> >>              {
> >>
> >>                     for (int i = 0; i < nbatch; i++) { s = s + ptr[i]; }
> >>
> >>              });
> >>
> >>              return s;
> >>
> >> }
> >>
> >>
> >>
> >>
> >> Best,
> >>
> >> Jiefei
> >>
> >> On Wed, Aug 28, 2019 at 2:32 PM Wang Jiefei <[hidden email]> wrote:
> >>
> >>> Thank you, Gabriel. The loop macro is very helpful. It is also exciting
> >>> to see that there are lots of changes in ALTREP in R devel version. I
> >>> really appreciate your help!
> >>>
> >>> Best,
> >>> Jiefei
> >>>
> >>> On Wed, Aug 28, 2019 at 7:37 AM Gabriel Becker <[hidden email]>
> >>> wrote:
> >>>
> >>>> Jiefei,
> >>>>
> >>>> I've been meaning to write up something about this so hopefully this
> >>>> will be an impetus for me to actually do that, but until then,
> responses
> >>>> inline.
> >>>>
> >>>>
> >>>> On Tue, Aug 27, 2019, 7:22 PM Wang Jiefei <[hidden email]> wrote:
> >>>>
> >>>>> Hi devel team,
> >>>>>
> >>>>> I'm working on C/C++ level ALTREP compatibility for a package. The
> >>>>> package
> >>>>> previously used pointers to access the data of a SEXP, so it would
> not
> >>>>> work
> >>>>> for some ALTREP objects which do not have a pointer. I plan to
> rewrite
> >>>>> the
> >>>>> code and use functions like get_elt, get_region, and get_subset to
> >>>>> access
> >>>>> the values of a vector, so I have a few questions for ALTREP:
> >>>>>
> >>>>> 1. Since an ALTREP do not have to define all of the above
> >>>>> functions(element, region, subset), is there any way to check which
> >>>>> function has been defined for an ALTREP class? I did a search on
> >>>>> RInternal.h and altrep.c but did not find a solution for it. If not,
> >>>>> will
> >>>>> it be added in the future?
> >>>>>
> >>>>
> >>>> Element and region are guaranteed to always be defined and work (for
> >>>> altrep and non-altrep INTSXP, REALSXP, LGLSXPs, etc, we currently
> don't
> >>>> have region for STRSXP or VECSXP, I believe). If the altrep class
> does not
> >>>> provide them then default methods will be used, which may be
> inefficient in
> >>>> some cases but will work. Subset is currently a forward looking stub,
> but
> >>>> once implimented, that will also be guaranteed to work for all valid
> ALTREP
> >>>> classes.
> >>>>
> >>>>
> >>>>>
> >>>>> 2. Given the diversity of ALTREP classes, what is the best way to
> loop
> >>>>> over
> >>>>> an ALTREP object? I hope there can be an all-in-one function which
> can
> >>>>> get
> >>>>> the values from a vector as long as at least one of the above
> functions
> >>>>> has
> >>>>> been defined, so package developers would not be bothered by tons of
> >>>>> `if-else` statement if they want their package to work with ALTREP.
> >>>>> Since
> >>>>> it seems like there is no such function exist, what could be the best
> >>>>> way
> >>>>> to do the loop under the current R version?
> >>>>>
> >>>>
> >>>> The best way to loop over all SEXPs, which supports both altrep and
> >>>> nonaltrep objects is, with the ITERATE_BY_REGION (which has been in R
> for a
> >>>> number of released versions, at least since 3.5.0 I think) and the
> much
> >>>> newer (devel only) ITERATE_BY_REGION_PARTIAL macros defined in
> >>>> R_exts/Itermacros.h
> >>>>
> >>>> The meaning of the arguments is as follows for
> ITERATE_BY_REGION_PARTIAL
> >>>> are as follows (ITERATE_BY_REGION is the same except no strt, and
> nfull).
> >>>>
> >>>>
> >>>>   - sx - C level variable name of the SEXP to iterate over
> >>>>   - px - variable name to use for the pointer populated with data from
> >>>>   a region of sx
> >>>>   - idx - variable name to use for the "outer", batch counter in the
> >>>>   for loop. This will contain the 0-indexed start position of the
> batch
> >>>>   you're currently processing
> >>>>   - nb - variable name to use for the current batch size. This will
> >>>>   always either be GET_REGION_BUFFSIZE (512), or the number of
> elements
> >>>>   remaining in the vector, whichever is smaller
> >>>>   - etype - element (C) type, e.g., int, double, of the data
> >>>>   - vtype - vector (access API) type, e.g, INTEGER, REAL
> >>>>   - strt - the 0-indexed position in the vector to start iterating
> >>>>   - nfull - the total number oif elements to iterate over from the
> >>>>   vector
> >>>>   - expr - the code to process a single batch (Which will do things to
> >>>>   px, typically)
> >>>>
> >>>>
> >>>> So code to perform badly implemented not good idea summing of REALSXP
> >>>> data might look like
> >>>>
> >>>> double sillysum(SEXP x) {
> >>>>
> >>>>    double s = 0.0;
> >>>>
> >>>>    ITERATE_BY_REGION(x, ptr, ind, nbatch, double, REAL,
> >>>>        {
> >>>>
> >>>>            for(int i = 0; i < nbatch; i++) { s = s + ptr[i];}
> >>>>        })
> >>>>
> >>>>     return s;
> >>>> }
> >>>>
> >>>> For meatier examples of ITERATE_BY_REGION's use in practice you can
> grep
> >>>> the R sources. I know it is used in the implementations of the various
> >>>> C-level summaries (summary.c), print and formatting functions, and
> anyNA.
> >>>>
> >>>> Some things to remember
> >>>>
> >>>>   - If you have an inner loop like the one above, your total position
> >>>>   in the original vector is ind + i
> >>>>   - ITERATE_BY_REGION always processes the whole vector, if you need
> >>>>   to only do part of it yo'll either need custom breaking for both
> inner and
> >>>>   outer loopsl, or in R-devel you can use ITERATE_BY_REGION_PARTIAL
> >>>>   - Don't use the variants ending in 0, all they do is skip over
> >>>>   things that are a good idea in the case of non-altreps (and some
> very
> >>>>   specific altreps).
> >>>>
> >>>> Hope that helps.
> >>>>
> >>>> Best,
> >>>> ~G
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>> Best,
> >>>>> Jiefei
> >>>>>
> >>>>>        [[alternative HTML version deleted]]
> >>>>>
> >>>>> ______________________________________________
> >>>>> [hidden email] mailing list
> >>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
> >>>>>
> >>>>
> >
> >       [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > [hidden email] mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
>
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: What is the best way to loop over an ALTREP vector?

Serguei Sokol
Le 24/09/2019 à 07:48, Gabriel Becker a écrit :
> Also, a small nitpick, R's internal mean function doesn't hit Dataptr, it
> hits either INTEGER_ELT (which really should probably be a
> ITERATE_BY_REGION) or ITERATE_BY_REGION.
>
Even if it is not the main point of this thread, I was wondering if
mean() could take an advantage of sum() (which handles ALTREP in
efficient way) to be defined as mean(x)=sum(x)/length(x)? Currently,
sum(1:1e14) is almost instantaneous while mean(1:1e14) is very long.

Best Serguei.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: What is the best way to loop over an ALTREP vector?

Romain Francois-3
In reply to this post by Gabriel Becker-2
Thanks for these comments. I should alter the blog post or write some follow up.

This was a weekend blog post that only benefited from a short time of research research. I’m glad people find it useful, but I’m sure a detailed documentation of the features from the authors would be more useful.

Romain

> Le 24 sept. 2019 à 07:48, Gabriel Becker <[hidden email]> a écrit :
>
> Hi Bob,
>
> Thanks for sending around the link to that. It looks mostly right and looks
> like a useful onramp. There are a few things to watch out for though (I've
> cc'ed Romain so he's aware of these comments). @romain I hope you taake the
> following comments as they are intended, as help rather than attacks.
>
> The largest issue I see is that the contract for Get_region is that it
> *populates the
> provided buffer with a copy of the data. *That buffer is expected to be
> safe to destructively modify, shuffle, etc though I don't know if we are
> actually doing that anywhere. As such, if I understand his C++ correctly,
> that Get_region method  is not safe and shouldn't be used.
>
> The other point is that Dataptr_or_null is not actually *guaranteed *not to
> allocate. The default method returns NULL, but we have no way of preventing
> an allocation in a user-defined method, and probably (?) no easy way of
> detecting that it is occurring before it causes a bug. That said, Romain is
> correct that when you are writing Dataptr_or_null methods you should write
> them so that they don't allocate, generally. Basically your methods for
> Dataptr_or_null shouldn't allocate, but you also should not write code that
> relies on hard assumptions that no one's ever will.
>
> Also, a small nitpick, R's internal mean function doesn't hit Dataptr, it
> hits either INTEGER_ELT (which really should probably be a
> ITERATE_BY_REGION) or ITERATE_BY_REGION.
>
> Anyway, I hope that helps.
> ~G
>
>
>
>
>
>
> On Mon, Sep 23, 2019 at 6:12 PM Bob Rudis <[hidden email] <mailto:[hidden email]>> wrote:
>
>> Not sure if you're using just C++ or Rcpp for C++ access but
>> https://purrple.cat/blog/2018/10/14/altrep-and-cpp/ has some tips on
>> using C++ w/ALTREP.
>>
>>> On Sep 23, 2019, at 3:17 PM, Wang Jiefei <[hidden email]> wrote:
>>>
>>> Sorry for post a lot of things, for the first part of code, I copied my
>> C++
>>> iter macro by mistake(and you can see an explicit type casting). Here is
>>> the macro definition from R_exts/Itermacros.h
>>>
>>> #define ITERATE_BY_REGION_PARTIAL(sx, px, idx, nb, etype, vtype,     \
>>>
>>>                            strt, nfull, expr) do {         \
>>>
>>> *       const** etype *px = DATAPTR_OR_NULL(sx);           *
>> \
>>>
>>>      if (px != NULL) {                                      \
>>>
>>>          R_xlen_t __ibr_n__ = strt + nfull;                        \
>>>
>>>          R_xlen_t nb = __ibr_n__;                                  \
>>>
>>>          for (R_xlen_t idx = strt; idx < __ibr_n__; idx += nb) {   \
>>>
>>>             expr                                            \
>>>
>>>           }                                                 \
>>>
>>>      }                                                      \
>>>
>>>      else ITERATE_BY_REGION_PARTIAL0(sx, px, idx, nb, etype, vtype,
>>> \
>>>
>>>                                 strt, nfull, expr);        \
>>>
>>>   } while (0)
>>>
>>>
>>> Best,
>>>
>>> Jiefei
>>>
>>> On Mon, Sep 23, 2019 at 3:12 PM Wang Jiefei <[hidden email]> wrote:
>>>
>>>> Hi Gabriel,
>>>>
>>>> I have tried the macro and found a small issue, it seems like the macro
>> is
>>>> written in C and does an implicit type conversion(const void * to const
>> int
>>>> *), see below. While it is allowed in C, C++ seems not happy with it.
>> Is it
>>>> possible to add an explicit type casting so that it can be compatible
>> with
>>>> both language?
>>>>
>>>>
>>>> #define ITERATE_BY_REGION_PARTIAL(sx, px, idx, nb, etype, vtype,     \
>>>>
>>>>                            strt, nfull, expr) do {         \
>>>>
>>>>      *const etype *px = (const** etype *)DATAPTR_OR_NULL(sx);  *
>>>> \
>>>>
>>>>      if (px != NULL) {                                      \
>>>>
>>>>          R_xlen_t __ibr_n__ = strt + nfull;                        \
>>>>
>>>>          R_xlen_t nb = __ibr_n__;                                  \
>>>>
>>>>          for (R_xlen_t idx = strt; idx < __ibr_n__; idx += nb) {   \
>>>>
>>>>             expr                                            \
>>>>
>>>>           }                                                 \
>>>>
>>>>      }                                                      \
>>>>
>>>>      else ITERATE_BY_REGION_PARTIAL0(sx, px, idx, nb, etype,
>>>> vtype,       \
>>>>
>>>>                                  strt, nfull, expr);        \
>>>>
>>>>   } while (0)
>>>>
>>>>
>>>> Also, I notice that the element type(etype) and vector type(vtype) has
>>>> to be specified in the macro. Since the SEXP is the first argument in
>> the
>>>> macro, it seems redundant to define etype and vtype for they have to
>> match
>>>> the type of the SEXP. I'm wondering if this is intentional? Will there
>> be a
>>>> type-free macro in R in the future? Here is a simple type-free macro I'm
>>>> using.
>>>>
>>>> #define type_free_iter(sx, ptr, ind, nbatch,expr)\
>>>>
>>>> switch(TYPEOF(sx)){\
>>>>
>>>> case INTSXP:\
>>>>
>>>>      ITERATE_BY_REGION(sx, ptr, ind, nbatch, int, INTEGER, expr);\
>>>>
>>>>      break; \
>>>>
>>>> case REALSXP:\
>>>>
>>>>      ITERATE_BY_REGION(sx, ptr, ind, nbatch, double, REAL, expr);\
>>>>
>>>>      break; \
>>>>
>>>> case LGLSXP:\
>>>>
>>>>      ITERATE_BY_REGION(sx, ptr, ind, nbatch, int, LOGICAL, expr);\
>>>>
>>>>      break; \
>>>>
>>>> default:\
>>>>
>>>>      Rf_error("Unknow data type\n"); \
>>>>
>>>>      break; \
>>>>
>>>> }
>>>>
>>>>
>>>>
>>>> // [[Rcpp::export]]
>>>>
>>>> double sillysum(SEXP x) {
>>>>
>>>>      double s = 0.0;
>>>>
>>>>      type_free_iter(x, ptr, ind, nbatch,
>>>>
>>>>             {
>>>>
>>>>                    for (int i = 0; i < nbatch; i++) { s = s + ptr[i]; }
>>>>
>>>>             });
>>>>
>>>>             return s;
>>>>
>>>> }
>>>>
>>>>
>>>>
>>>>
>>>> Best,
>>>>
>>>> Jiefei
>>>>
>>>> On Wed, Aug 28, 2019 at 2:32 PM Wang Jiefei <[hidden email]> wrote:
>>>>
>>>>> Thank you, Gabriel. The loop macro is very helpful. It is also exciting
>>>>> to see that there are lots of changes in ALTREP in R devel version. I
>>>>> really appreciate your help!
>>>>>
>>>>> Best,
>>>>> Jiefei
>>>>>
>>>>> On Wed, Aug 28, 2019 at 7:37 AM Gabriel Becker <[hidden email]>
>>>>> wrote:
>>>>>
>>>>>> Jiefei,
>>>>>>
>>>>>> I've been meaning to write up something about this so hopefully this
>>>>>> will be an impetus for me to actually do that, but until then,
>> responses
>>>>>> inline.
>>>>>>
>>>>>>
>>>>>> On Tue, Aug 27, 2019, 7:22 PM Wang Jiefei <[hidden email]> wrote:
>>>>>>
>>>>>>> Hi devel team,
>>>>>>>
>>>>>>> I'm working on C/C++ level ALTREP compatibility for a package. The
>>>>>>> package
>>>>>>> previously used pointers to access the data of a SEXP, so it would
>> not
>>>>>>> work
>>>>>>> for some ALTREP objects which do not have a pointer. I plan to
>> rewrite
>>>>>>> the
>>>>>>> code and use functions like get_elt, get_region, and get_subset to
>>>>>>> access
>>>>>>> the values of a vector, so I have a few questions for ALTREP:
>>>>>>>
>>>>>>> 1. Since an ALTREP do not have to define all of the above
>>>>>>> functions(element, region, subset), is there any way to check which
>>>>>>> function has been defined for an ALTREP class? I did a search on
>>>>>>> RInternal.h and altrep.c but did not find a solution for it. If not,
>>>>>>> will
>>>>>>> it be added in the future?
>>>>>>>
>>>>>>
>>>>>> Element and region are guaranteed to always be defined and work (for
>>>>>> altrep and non-altrep INTSXP, REALSXP, LGLSXPs, etc, we currently
>> don't
>>>>>> have region for STRSXP or VECSXP, I believe). If the altrep class
>> does not
>>>>>> provide them then default methods will be used, which may be
>> inefficient in
>>>>>> some cases but will work. Subset is currently a forward looking stub,
>> but
>>>>>> once implimented, that will also be guaranteed to work for all valid
>> ALTREP
>>>>>> classes.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> 2. Given the diversity of ALTREP classes, what is the best way to
>> loop
>>>>>>> over
>>>>>>> an ALTREP object? I hope there can be an all-in-one function which
>> can
>>>>>>> get
>>>>>>> the values from a vector as long as at least one of the above
>> functions
>>>>>>> has
>>>>>>> been defined, so package developers would not be bothered by tons of
>>>>>>> `if-else` statement if they want their package to work with ALTREP.
>>>>>>> Since
>>>>>>> it seems like there is no such function exist, what could be the best
>>>>>>> way
>>>>>>> to do the loop under the current R version?
>>>>>>>
>>>>>>
>>>>>> The best way to loop over all SEXPs, which supports both altrep and
>>>>>> nonaltrep objects is, with the ITERATE_BY_REGION (which has been in R
>> for a
>>>>>> number of released versions, at least since 3.5.0 I think) and the
>> much
>>>>>> newer (devel only) ITERATE_BY_REGION_PARTIAL macros defined in
>>>>>> R_exts/Itermacros.h
>>>>>>
>>>>>> The meaning of the arguments is as follows for
>> ITERATE_BY_REGION_PARTIAL
>>>>>> are as follows (ITERATE_BY_REGION is the same except no strt, and
>> nfull).
>>>>>>
>>>>>>
>>>>>>  - sx - C level variable name of the SEXP to iterate over
>>>>>>  - px - variable name to use for the pointer populated with data from
>>>>>>  a region of sx
>>>>>>  - idx - variable name to use for the "outer", batch counter in the
>>>>>>  for loop. This will contain the 0-indexed start position of the
>> batch
>>>>>>  you're currently processing
>>>>>>  - nb - variable name to use for the current batch size. This will
>>>>>>  always either be GET_REGION_BUFFSIZE (512), or the number of
>> elements
>>>>>>  remaining in the vector, whichever is smaller
>>>>>>  - etype - element (C) type, e.g., int, double, of the data
>>>>>>  - vtype - vector (access API) type, e.g, INTEGER, REAL
>>>>>>  - strt - the 0-indexed position in the vector to start iterating
>>>>>>  - nfull - the total number oif elements to iterate over from the
>>>>>>  vector
>>>>>>  - expr - the code to process a single batch (Which will do things to
>>>>>>  px, typically)
>>>>>>
>>>>>>
>>>>>> So code to perform badly implemented not good idea summing of REALSXP
>>>>>> data might look like
>>>>>>
>>>>>> double sillysum(SEXP x) {
>>>>>>
>>>>>>   double s = 0.0;
>>>>>>
>>>>>>   ITERATE_BY_REGION(x, ptr, ind, nbatch, double, REAL,
>>>>>>       {
>>>>>>
>>>>>>           for(int i = 0; i < nbatch; i++) { s = s + ptr[i];}
>>>>>>       })
>>>>>>
>>>>>>    return s;
>>>>>> }
>>>>>>
>>>>>> For meatier examples of ITERATE_BY_REGION's use in practice you can
>> grep
>>>>>> the R sources. I know it is used in the implementations of the various
>>>>>> C-level summaries (summary.c), print and formatting functions, and
>> anyNA.
>>>>>>
>>>>>> Some things to remember
>>>>>>
>>>>>>  - If you have an inner loop like the one above, your total position
>>>>>>  in the original vector is ind + i
>>>>>>  - ITERATE_BY_REGION always processes the whole vector, if you need
>>>>>>  to only do part of it yo'll either need custom breaking for both
>> inner and
>>>>>>  outer loopsl, or in R-devel you can use ITERATE_BY_REGION_PARTIAL
>>>>>>  - Don't use the variants ending in 0, all they do is skip over
>>>>>>  things that are a good idea in the case of non-altreps (and some
>> very
>>>>>>  specific altreps).
>>>>>>
>>>>>> Hope that helps.
>>>>>>
>>>>>> Best,
>>>>>> ~G
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Best,
>>>>>>> Jiefei
>>>>>>>
>>>>>>>       [[alternative HTML version deleted]]
>>>>>>>
>>>>>>> ______________________________________________
>>>>>>> [hidden email] mailing list
>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>>>>>
>>>>>>
>>>
>>>      [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> [hidden email] mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>>
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] <mailto:[hidden email]> mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel <https://stat.ethz.ch/mailman/listinfo/r-devel>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel