Use of C++ in Packages

Use of C++ in Packages

Jim Hester
First, thank you to Tomas for writing his recent post[0] on the R
developer blog. It raised important issues in interfacing R's C API
and C++ code.

However, I do _not_ think the conclusion reached in the post is helpful:
  > don’t use C++ to interface with R

There are now more than 1,600 packages on CRAN using C++; the time is
long past when that type of warning is going to be useful to the R
community.

These same issues will also occur with any newer language (such as
Rust or Julia[1]) which uses RAII to manage resources and tries to
interface with R. It doesn't seem a productive way forward for R to
say it can't interface with these languages without first doing
expensive copies into an intermediate heap.

The advice to avoid C++ is also antithetical to John Chambers' vision
of first S and then R as an interface language (from Extending R [2]):

  > The *interface* principle has always been central to R and to S
  > before. An interface to subroutines was _the_ way to extend the first
  > version of S. Subroutine interfaces have continued to be central to R.

The book also has extensive sections on both C++ (via Rcpp) and Julia,
so clearly John thinks these are legitimate ways to extend R.

So if 'don't use C++' is not realistic and the current R API does not
allow safe use of C++ exceptions, what are the alternatives?

One thing we could do is look at how this is handled in other languages
written in C which also use longjmp for errors.

Lua is one example: it provides an alternative interface,
lua_pcall[3] and lua_cpcall[4], which wraps a normal Lua call and returns
an error code rather than long jumping. These interfaces can then be safely
wrapped by RAII- and exception-based languages.
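
As a rough illustration of the "protected call" idea described above, here is a minimal, self-contained sketch. It does not use Lua or R: toy_error, toy_pcall, toy_fn, current_handler and last_error are hypothetical stand-ins for a C runtime that reports errors with longjmp, and the code only shows how such a jump can be turned into an ordinary return code.

```cpp
#include <cassert>
#include <csetjmp>

// Hypothetical stand-in for a C runtime that signals errors with longjmp.
// None of these names belong to R or Lua.
static std::jmp_buf* current_handler = nullptr;
static int last_error = 0;

// Signal an error: record the code and jump to the innermost protected call.
[[noreturn]] void toy_error(int code) {
    assert(current_handler != nullptr);  // only valid inside toy_pcall
    last_error = code;
    std::longjmp(*current_handler, 1);
}

using toy_fn = void (*)(void* data);

// Protected call: runs fn(data) and returns 0 on success, or the recorded
// error code if fn left via toy_error. The caller never sees a jump.
int toy_pcall(toy_fn fn, void* data) {
    std::jmp_buf here;
    std::jmp_buf* const previous = current_handler;
    current_handler = &here;
    if (setjmp(here) == 0) {
        fn(data);                    // may jump instead of returning
        current_handler = previous;
        return 0;
    }
    current_handler = previous;      // reached via longjmp from toy_error
    return last_error;
}

// Two example callbacks: one succeeds, one always signals error 42.
void store_seven(void* data) { *static_cast<int*>(data) = 7; }
void always_fails(void*) { toy_error(42); }
```

Because toy_pcall always returns normally, a C++ caller can inspect the status and throw an exception from its own frame. The frames between setjmp and longjmp here hold no objects with destructors, which is what keeps the jump well defined in C++.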

This alternative error code interface is not just useful for C++, but
also for resource cleanup in C. It is currently non-trivial to handle
cleanup in all the possible cases where a longjmp can occur (interrupts,
warnings, custom conditions, timeouts, any allocation, etc.), even with R
finalizers.

It is past time for R to consider a non-jumpy C interface, so it can
continue to be used as an effective interface to programming routines
in the years to come.

[0]: https://developer.r-project.org/Blog/public/2019/03/28/use-of-c---in-packages/
[1]: https://github.com/JuliaLang/julia/issues/28606
[2]: https://doi.org/10.1201/9781315381305
[3]: http://www.lua.org/manual/5.1/manual.html#lua_pcall
[4]: http://www.lua.org/manual/5.1/manual.html#lua_cpcall

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: Use of C++ in Packages

Simon Urbanek
Jim,

I think the main point of Tomas' post was to alert R users to the fact that there are very serious issues that you have to understand when interfacing R from C++. Using C++ code from R is fine; in many cases you only want to access R data, use some library or compute in C++ and return results. Such use-cases are completely fine in C++ as they don't need to trigger the issues mentioned, and it should be made clear that this was not what Tomas' blog was about.

I agree with Tomas that it is safer to advise against using C++ to call the R API, since C++ may give a false impression that you don't need to know what you're doing. Note that it is possible to avoid longjmps by using R_ExecWithCleanup() which can catch any longjmps from the called function. So if you know what you're doing you can make things work. I think the issue here is not necessarily lack of tools, it is lack of knowledge - which is why I think Tomas' post is so important.

Cheers,
Simon



Re: Use of C++ in Packages

John Mount
I appreciate the writing on this.

However I definitely think there is a huge difference between "use with care" and "don't use".  They just are not the same statement.


---------------
John Mount
http://www.win-vector.com/
Our book: Practical Data Science with R http://www.manning.com/zumel/





Re: Use of C++ in Packages

Kevin Ushey
In reply to this post by Simon Urbanek
I think it's also worth saying that some of these issues affect C code
as well; e.g. this is not safe:

    FILE* f = fopen(...);
    Rf_eval(...);
    fclose(f);

whereas the C++ equivalent would likely handle closing of the file in
the destructor. In other words, I think many users just may not be
cognizant of the fact that most R APIs can longjmp, and what that
implies for cleanup of allocated resources. R_alloc() may help solve
the issue specifically for memory allocations, but for any library
interface that has an 'open' and 'close' step, the same sort of issue
will arise.
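
To make the destructor-based cleanup mentioned above concrete, here is a minimal sketch of an RAII file wrapper. FileHandle and work_that_throws are illustrative names, not part of any library. The sketch only demonstrates cleanup when a C++ exception unwinds the stack; a longjmp out of an R API call would not run the destructor (jumping over objects with non-trivial destructors is undefined behaviour in C++), which is the problem this thread is about.

```cpp
#include <cassert>
#include <cstdio>
#include <stdexcept>

// Minimal RAII wrapper around std::FILE*. The class name is illustrative.
class FileHandle {
public:
    explicit FileHandle(std::FILE* f) : file_(f) {
        if (file_ == nullptr) throw std::runtime_error("could not open file");
    }
    ~FileHandle() {
        std::fclose(file_);
        ++closed_count;  // bookkeeping for the demonstration only
    }
    FileHandle(const FileHandle&) = delete;
    FileHandle& operator=(const FileHandle&) = delete;

    std::FILE* get() const { return file_; }

    static int closed_count;

private:
    std::FILE* file_;
};

int FileHandle::closed_count = 0;

// Stand-in for work that fails with a C++ exception (not a longjmp).
void work_that_throws() { throw std::runtime_error("something went wrong"); }

// Returns true if the file was closed exactly once during unwinding.
bool file_closed_after_exception() {
    const int before = FileHandle::closed_count;
    try {
        FileHandle f(std::tmpfile());
        assert(f.get() != nullptr);
        work_that_throws();
    } catch (const std::runtime_error&) {
        // Stack unwinding has already run ~FileHandle() at this point.
    }
    return FileHandle::closed_count == before + 1;
}
```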

What I believe we should do, and what Rcpp has made steps towards, is
make it possible to interact with some subset of the R API safely from
C++ contexts. This has always been possible with e.g. R_ToplevelExec()
and R_ExecWithCleanup(), and now things are even better with
R_UnwindProtect(). In theory, as a prototype, an R package could
provide a 'safe' C++ interface to the R API using R_UnwindProtect()
and friends as appropriate, and client packages could import and link
to that package to gain access to the interface. Code generators (as
Rcpp Attributes does) can handle some of the pain in these interfaces,
so that users are mostly insulated from the nitty gritty details.
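
The following is a minimal sketch of what such a 'safe' layer might look like in principle. It deliberately avoids the real R API: the toy runtime (toy_error, toy_half, toy_protected_call) is a hypothetical stand-in for C functions that fail via longjmp, and safe_call, ToyError and Guard are made-up names. The only point is the shape of the technique: catch the jump at a boundary free of C++ objects, then rethrow it as a C++ exception so that destructors in the caller run.

```cpp
#include <cassert>
#include <csetjmp>
#include <stdexcept>

// --- Toy C-style runtime (hypothetical, stands in for a longjmp-based API) ---
static std::jmp_buf* current_handler = nullptr;

[[noreturn]] void toy_error() {
    assert(current_handler != nullptr);
    std::longjmp(*current_handler, 1);
}

// Toy API function: halves even numbers, signals an error for odd ones.
int toy_half(int x) {
    if (x % 2 != 0) toy_error();
    return x / 2;
}

using toy_fn = void (*)(void*);

// Returns true if fn(data) returned normally, false if it jumped.
bool toy_protected_call(toy_fn fn, void* data) {
    std::jmp_buf here;
    std::jmp_buf* const previous = current_handler;
    current_handler = &here;
    if (setjmp(here) == 0) {
        fn(data);
        current_handler = previous;
        return true;
    }
    current_handler = previous;
    return false;
}

// --- C++ layer ---
struct ToyError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Runs body under the protected call and converts a jump into an exception.
// body itself must not hold objects with destructors across a toy API call.
template <typename Body>
void safe_call(Body body) {
    const bool ok = toy_protected_call(
        [](void* p) { (*static_cast<Body*>(p))(); }, &body);
    if (!ok) throw ToyError("toy runtime signalled an error");
}

// Counts destructor runs so the demonstration can check cleanup happened.
struct Guard {
    int& counter;
    ~Guard() { ++counter; }
};

bool cleanup_runs_on_toy_error() {
    int cleanups = 0;
    bool caught = false;
    try {
        Guard g{cleanups};
        int result = 0;
        safe_call([&result] { result = toy_half(3); });  // jumps, then throws
    } catch (const ToyError&) {
        caught = true;
    }
    return caught && cleanups == 1;
}
```

In this sketch the exception is thrown only after the protected call has returned, so the jump never crosses a frame that owns a C++ object.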

I agree that the content of Tomas's post is very helpful, especially
since I expect many R programmers who dip their toes into the C++
world are not aware of the caveats of talking to R from C++. However,
I don't think it's helpful to recommend "don't use C++"; rather, I
believe the question should be, "what can we do to make it possible to
easily and safely interact with R from C++?". Because, as I understand
it, all of the problems raised are solvable: either through a
well-defined C++ interface, or through better education.

I'll add my own opinion: writing correct C code is an incredibly
difficult task. C++, while obviously not perfect, makes things
substantially easier with tools like RAII, the STL, smart pointers,
and so on. And I strongly believe that C++ (with Rcpp) is still a
better choice than C for new users who want to interface with R from
compiled code.

tl;dr: I (and I think most others) just wish the summary had a more
positive outlook for the future of C++ with R.

Best,
Kevin


Re: Use of C++ in Packages

Gabriel Becker-2
In reply to this post by Jim Hester
Hi Jim (et al.),

Comments inline (and assume any offense was unintended, these kinds of
things can be tricky to talk about).

On Fri, Mar 29, 2019 at 8:19 AM Jim Hester <[hidden email]> wrote:

> First, thank you to Tomas for writing his recent post[0] on the R
> developer blog. It raised important issues in interfacing R's C API
> and C++ code.
>
> However I do _not_ think the conclusion reached in the post is helpful
>   > don’t use C++ to interface with R
>

I was a bit surprised at the strength of this too but it's understandable
given the content/motivation of the post.

My personal take away, without putting any words in Tomas' or R-core's
mouths at all, is that the crux here is that using c++ in R packages safely
is a LOT less trivial than people in the wider R community think it is,
these days. Or rather, there are things you can do safely quite easily when
using c++ in an R package, and things you can't, but that distinction a)
isn't really on many people's radar, and b) isn't super trivial to identify
at any given time, and c) depends on internal implementation details so
isn't stable / safe to rely on across time anyway. There are a lot of
reasons for a), and none of them, nor anything else I'm about to say,
constitute criticisms of Rcpp or its developers.

I've always thought that we as tool/software developers in this space
should make things seem as easy and convenient to users as they
can/intrinsically are, *but not easier*. I don't know how popular that
second part I put in there is generally, but personally I think it's true
and pretty important not to leave off. I read Tomas' post as suggesting
that we as a community, without pointing fingers or laying any individual
blame, have unintentionally crossed the "as easy as it actually is/can be to
do right" line when it comes to the impression we give to novice/journeyman
package developers regarding using c++ to interact with the R internals. I
honestly claim little familiarity with c++ but it seems like Tomas is the
relevant expert on both it and hard-core details about how aspects of the R
internals work so if he tells us that that has happened, we should probably
listen.


> There are now more than 1,600 packages on CRAN using C++, the time is
> long past when that type of warning is going to be useful to the R
> community.
>

Here I disagree pretty strongly. I think the warning is very useful -
unless these issues were widely known before the post (my impression is
that they weren't) - and ignoring its contents or encouraging others to do
so as influential members of the R community would be irresponsible.

I mean, the reality of the situation as it exists now is more or less (I'd
assume a great deal 'more' than 'less', personally) what Tomas described,
right? Furthermore, regardless of what changes may come in the future, it
seems very unlikely any of them will be in this coming release (since grand
feature freeze is like, today?) so we're talking a year out, at LEAST.
Given that, this advice, or at least a more nuanced stance that gives the
information from the post proper weight and is different from the
prevailing sentiment now, basically has to be realistic in the short term.

At the very least I think the post tells us that we need to be really
careful as a community with the "you want speed throw some c++ in your
package at it, you can learn how in a day and it's super easy and basically
free" messaging. The reality is more nuanced than that, at best, even if
ultimately in many situations that is a valid/reasonable approach.


>
> These same issues will also occur with any newer language (such as
> Rust or Julia[1]) which uses RAII to manage resources and tries to
> interface with R. It doesn't seem a productive way forward for R to
> say it can't interface with these languages without first doing
> expensive copies into an intermediate heap.
>
> The advice to avoid C++ is also antithetical to John Chambers vision
> of first S and R as a interface language (from Extending R [2])
>
>   > The *interface* principle has always been central to R and to S
> before. An interface to subroutines was _the_ way to extend the first
> version of S. Subroutine interfaces have continued to be central to R.
>
> The book also has extensive sections on both C++ (via Rcpp) and Julia,
> so clearly John thinks these are legitimate ways to extend R.
>
> So if 'don't use C++' is not realistic and the current R API does not
> allow safe use of C++ exceptions what are the alternatives?
>

Again, nothing is going to change about this for a year, *at least* (AFAIK,
not on R-core) so we have to make it at least somewhat realistic; perhaps
not the blanket moratorium that Tomas advocated - though IMHO statements
from R-core about what is safe/supported when operating in R arena should
be granted *a lot* of weight - but certainly not the prevailing sentiment
it was responding to, either. That is true even if we commit to also
looking for ways to improve the situation in the longer term.


>
> One thing we could do is look how this is handled in other languages
> written in C which also use longjmp for errors.
>
> Lua is one example, they provide an alternative interface;
> lua_pcall[3] and lua_cpcall[4] which wrap a normal lua call and return
> an error code rather long jumping. These interfaces can then be safely
> wrapped by RAII - exception based languages.
>

So there's the function that Simon mentioned, which would work at least for
evaluating R code, though it doesn't necessarily help when you want to hit
the C api directly I think. Because of ALTREP, a LOT of things can
allocate, and thus error, now. That was necessary to get what we needed
without an amount of work/refactoring that would have tanked the whole
project (I think), but it is a thing.


>
> This alternative error code interface is not just useful for C++, but
> also for resource cleanup in C, it is currently non-trivial to handle
> cleanup in all the possible cases a longjmp can occur (interrupts,
> warnings, custom conditions, timeouts any allocation etc.) even with R
> finalizers.
>
> It is past time for R to consider a non-jumpy C interface, so it can
> continue to be used as an effective interface to programming routines
> in the years to come.
>

I mean I totally get this desire, and don't even disagree necessarily in
principle, but that's a pretty easy thing to say, right? My impression,
without really knowing the details of what all that would entail is that it
would/will be a seriously non-trivial amount of work for a group of people
who are already very busy maintaining an extremely widely used, extremely
complex piece of software.

Best,
~G

Re: Use of C++ in Packages

Simon Urbanek
In reply to this post by Kevin Ushey
Kevin,


> On Mar 29, 2019, at 17:01, Kevin Ushey <[hidden email]> wrote:
>
> I think it's also worth saying that some of these issues affect C code
> as well; e.g. this is not safe:
>
>    FILE* f = fopen(...);
>    Rf_eval(...);
>    fclose(f);
>

I fully agree, but developers using C are well aware of the necessity of handling the lifespan of objects explicitly, so at least there are no surprises.


> whereas the C++ equivalent would likely handle closing of the file in the destructor. In other words, I think many users just may not be cognizant of the fact that most R APIs can longjmp, and what that implies for cleanup of allocated resources. R_alloc() may help solve the issue specifically for memory allocations, but for any library interface that has a 'open' and 'close' step, the same sort of issue will arise.
>

Well, I hope that anyone writing native code in a package is well aware of that and will use an external pointer with a finalizer to clean up native objects in any 3rd party library that are created during the call.


> What I believe we should do, and what Rcpp has made steps towards, is make it possible to interact with some subset of the R API safely from C++ contexts. This has always been possible with e.g. R_ToplevelExec() and R_ExecWithCleanup(), and now things are even better with R_UnwindProtect(). In theory, as a prototype, an R package could provide a 'safe' C++ interface to the R API using R_UnwindProtect() and friends as appropriate, and client packages could import and link to that package to gain access to the interface. Code generators (as Rcpp Attributes does) can handle some of the pain in these interfaces, so that users are mostly insulated from the nitty gritty details.
>

I agree that we should strive to provide tools that make it safer, but note that it still requires participation of the users - they have to use such facilities or else they hit the same problem. So we can only fix this for the future, but let's start now.


> I agree that the content of Tomas's post is very helpful, especially since I expect many R programmers who dip their toes into the C++ world are not aware of the caveats of talking to R from C++. However, I don't think it's helpful to recommend "don't use C++"; rather, I believe the question should be, "what can we do to make it possible to easily and safely interact with R from C++?". Because, as I understand it, all of the problems raised are solvable: either through a well-defined C++ interface, or through better education.
>

I think the recommendation would be different if such tools existed, but they don't. It was based on the current reality which is not so rosy.  Apparently the post had its effect of mobilizing C++ proponents to do something about it, which is great, because if this leads to some solution, the recommendation in the future may change to "use C++ using tools XYZ".


> I'll add my own opinion: writing correct C code is an incredibly difficult task. C++, while obviously not perfect, makes things substantially easier with tools like RAII, the STL, smart pointers, and so on. And I strongly believe that C++ (with Rcpp) is still a better choice than C for new users who want to interface with R from compiled code.
>

My take is that Rcpp makes the interface *look* easier, but you still have to understand more about the R API than you think. Hence it is much easier to write buggy code. Personally, that's why I don't like it (apart from the code bloat), because things are hidden that will get you into trouble, whereas using the C API is at least very clear - you have to understand what it's doing when you use it. That said, I'm obviously biased since I know a lot about R internals ;) so this doesn't necessarily generalize.


> tl;dr: I (and I think most others) just wish the summary had a more positive outlook for the future of C++ with R.
>

Well, unless someone actually takes the initiative there is no reason to believe in a bright future of C++. As we have seen with the lack of adoption of CXXR (which I thought was an incredible achievement), not enough people seem to really care about C++. If that is not true, then let's come out of hiding, get together and address it (it seems that this thread is a good start).

Cheers,
Simon




Re: Use of C++ in Packages

Romain Francois-3
tl;dr: we need better C++ tools and documentation.

We collectively know more now with the rise of tools like rchk and improved documentation such as Tomas’s post. That’s a start, but it appears that there is still a lot of knowledge that deserves to be promoted to actual documentation of best practices.

I think it is important not to equate C++ as a language with Rcpp.

Also, C++ is not just RAII.

RAII is certainly an important part of how Rcpp was conceived, but it’s not the only thing C++ can bring as a language. Templates, lambdas, and the STL are examples of things that can be used for expressiveness when just accessing data without interfering with R, calling R API functions, ...

It would be nice if the usual "you should do that only if you know what you’re doing" were turned into precise documentation, and maybe became part of some better tool. If precautions have to be taken before calling such and such functions, that’s OK. What are they? Can we embed that in some tool?

It is easy enough to enclose potentially jumpy code in a C++ lambda. This could go together with recommendations such as: the body of the lambda shall only use POD (plain old data) structures.

This is similar to precautions you’d take when writing concurrent code.
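
To illustrate the lambda suggestion, here is a minimal sketch that runs without R. Everything in it is an assumption made for illustration: `jumpy_api` and the global `g_jump_target` stand in for an R API function and R's error jump, and `protected_call`, `Resource` and `doubled` are hypothetical names, not part of R or Rcpp. The long jump is caught directly behind the lambda and rethrown as a C++ exception, so destructors further up the stack still run. The lambda body touches only plain data, which is what keeps the jump well defined in C++.

```cpp
#include <csetjmp>
#include <stdexcept>

// Hypothetical stand-in for R's error mechanism: one global jump target.
// Not reentrant; a simplification for this sketch only.
static std::jmp_buf g_jump_target;

// Hypothetical stand-in for an R API function that reports errors via longjmp.
int jumpy_api(int x) {
    if (x < 0) std::longjmp(g_jump_target, 1);
    return 2 * x;
}

// Runs `body` behind a setjmp barrier and turns a long jump into a C++
// exception. Frames between setjmp and longjmp must hold only trivially
// destructible (POD-like) objects, otherwise the jump is undefined behaviour.
template <typename Body>
void protected_call(Body& body) {
    if (setjmp(g_jump_target) == 0) {
        body();
        return;
    }
    throw std::runtime_error("jumpy_api failed");
}

// RAII object used to observe whether its destructor ran.
struct Resource {
    bool* released;
    explicit Resource(bool* flag) : released(flag) { *released = false; }
    ~Resource() { *released = true; }
};

// Returns true on success; `released` reports whether the destructor ran.
bool doubled(int input, int* out, bool* released) {
    try {
        Resource guard(released);   // lives outside the jumpy region
        int result = 0;             // plain data shared with the lambda
        auto body = [&result, input] { result = jumpy_api(input); };
        protected_call(body);
        *out = result;
        return true;
    } catch (const std::runtime_error&) {
        return false;               // guard was destroyed during unwinding
    }
}
```

Calling `doubled(21, &out, &released)` succeeds and sets `out` to 42, while `doubled(-1, &out, &released)` returns false; in both cases `released` ends up true. In a package, the barrier would presumably be built on R_UnwindProtect() or R_ToplevelExec() rather than a hand-rolled setjmp.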

Romain


Re: Use of C++ in Packages

cstrato
In reply to this post by Simon Urbanek
Hi,

It may or may not be of interest to you, but an example of how to access
R functions from C++ can be found in the ROOT C++ framework, ROOT Version
6, see e.g.:
https://root.cern/doc/master/classROOT_1_1R_1_1TRInterface.html
with an example shown in:
https://root.cern/doc/master/r_2example_8C.html

BTW, I strongly disagree that C++ does not have a bright future!

Best regards,
Christian
_._._._._._._._._._._._._._._._._._
C.h.r.i.s.t.i.a.n   S.t.r.a.t.o.w.a
V.i.e.n.n.a           A.u.s.t.r.i.a
e.m.a.i.l:        cstrato at aon.at
_._._._._._._._._._._._._._._._._._


P.S.: Accessing a complete C++ program (based on ROOT Version 5) from R
is shown in my Bioconductor package 'xps'


Re: Use of C++ in Packages

R devel mailing list
In reply to this post by Romain Francois-3
It's great to see the community mobilize to try to resolve this issue.
Obviously C++ has become a big part of R extensions, so it would be nice
to have clear guidelines and tools to be able to use C++ safely with the
R API.

Unfortunately, doing this will probably require a fair bit of work.  If
R-core were to do this it would take away from other valuable
improvements they could be making on R itself.  Given there is already a
supported and documented extension mechanism with access to the R API
via C, I can see why R-core might be reluctant to divert resources from
R development to add the same level of support for C++.

Obviously it would be impossible to try to provide better documentation
and/or mechanisms for C++ extensions without some R-core involvement,
but it seems like much of the grunt work could be done by others.  I
unfortunately have no C++ experience so cannot help here, but hopefully
there are others that have the experience and the recognition in the
community to offer to help and have their offer accepted.  Perhaps the
R Consortium could even fund this, although given the level of expertise
required here the funding may need to be meaningful.

That seems like the natural step here.  Someone with the qualifications
to do so either volunteers or is funded to do this, and hopefully R-core
agrees to provide input and final stamp of approval.  The documentation
is probably more straightforward, as tools will need more work from
R-core to integrate.  It is possible R-core may decline to do this, but
absent someone actually offering to put in the hard work it's all
theoretical.

Respectfully,

Brodie.

Re: Use of C++ in Packages

Tomas Kalibera
In reply to this post by Romain Francois-3
On 3/30/19 8:59 AM, Romain Francois wrote:
> tl;dr: we need better C++ tools and documentation.
>
> We collectively know more now with the rise of tools like rchk and improved documentation such as Tomas’s post. That’s a start, but it appears that there still is a lot of knowledge that would deserve to be promoted to actual documentation of best practices.

Well, there is quite a bit of knowledge in Writing R Extensions, and many
problems could have been prevented had it been read more thoroughly by
package developers. The problem that C++ runs some functions
automatically (like destructors) should not be too hard to identify
based on what WRE says about the need for protection against garbage
collection.

From my experience, one can learn most about R internals from debugging
and reading source code - when debugging PROTECT errors and other memory
errors/memory corruption, common problems caused by bugs in native C/C++
code - one needs to read and understand source code involved at all
layers, one needs to understand the documentation covering code at
different layers, and one has to think about these things, forming
hypotheses, narrowing down to smaller examples, etc.

My suggestion for package authors who write native code and want to
learn more, and who want to be responsible (these kinds of bugs affect
other packages indirectly and can be woken up by inconsequential and
correct code changes, even in the R runtime): test and debug your code
hard - look at UBSAN/ASAN/valgrind/rchk checks from CRAN and run these
tools yourself if needed. Run with strict barrier checking and with
gctorture. Write more tests to increase the coverage. Specifically, if
you now use C++ code, try to read all of your related code and check
that you do not have the problems I mentioned in my blog. Think of other
related problems and if you find out about them, tell others. Make sure
you only use the API from Writing R Extensions (and the R help system).
If you really can't find anything wrong with your package, but still
want to learn more, try to debug some bugs reported against the R
runtime or against your favorite packages (or their CRAN check reports
from various tools). In addition to learning more about R internals, by
spending much more time on debugging you may also get a different
perspective on some of the things about C++ I pointed to. Finally, it
would help us with the problem we have now - that many R packages in C++
have serious bugs.

Tomas


Re: Use of C++ in Packages

Hugh Marera
Some of us are learning about development in R and use R in data analysis
pipelines at work. What is the best way to identify packages that
currently have these C++ problems? I would like to be able to help fix the
bugs but, more importantly, avoid using these packages in critical work
pipelines. Are there any C++ R package bug squashing events out there?

Regards

Hugh


Re: Use of C++ in Packages

Tomas Kalibera
On 4/24/19 6:41 PM, Hugh Marera wrote:
> Some of us are learning about development in R and use R in our work
> data analysis pipelines. What is the best way to identify packages
> that currently have these C++ problems? I would like to be able to
> help fix the bugs but more importantly not use these packages in
> critical work pipelines. Any C++ R package bug squashing events out there?

I think the best way available now is manual inspection/review of the
source code of the packages you are using for your critical work. Such
review should cover more than just dangerous use of C++ - a lot of
problems also exist in plain C code (using unexported API from R,
violating value semantics of R, other kinds of PROTECT errors, memory
leaks due to long jumps, etc). The review could be limited to the
context of your pipeline: how the package is used there and whether
you have a reliable external process for validating the results.

Of the problems I've mentioned in my blog, the worst for normal use
of packages is probably a PROTECT error on the fast path due to
allocation in a destructor or other function run automatically. Various
memory leaks or correctness problems on error paths (long jumps) may not
be a complete showstopper if you restart R often and if you have a
reliable way of validating results, but such issues would still make it
much harder to diagnose problems.
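
A toy model may help when looking for the destructor problem during a
review. The sketch below does not use R's API: ToyObject, toy_gc,
toy_alloc, toy_protect, toy_unprotect and AllocatingGuard are hypothetical
names made up for this example, and the "collector" runs on every
allocation (roughly what gctorture does) so that the effect is
deterministic.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Toy stand-ins for a garbage-collected heap and a protect stack.
struct ToyObject { bool alive = true; };

static std::deque<ToyObject> heap;             // deque keeps element addresses stable
static std::vector<ToyObject*> protect_stack;  // objects the collector must keep

// Toy collector: every object not on the protect stack is reclaimed.
void toy_gc() {
    for (ToyObject& obj : heap) {
        bool is_protected = false;
        for (ToyObject* p : protect_stack)
            if (p == &obj) is_protected = true;
        if (!is_protected) obj.alive = false;
    }
}

// Every allocation first triggers a collection.
ToyObject* toy_alloc() {
    toy_gc();
    heap.emplace_back();
    return &heap.back();
}

void toy_protect(ToyObject* obj) { protect_stack.push_back(obj); }

void toy_unprotect(std::size_t n) {
    assert(n <= protect_stack.size());
    protect_stack.resize(protect_stack.size() - n);
}

// A helper object whose destructor allocates.
struct AllocatingGuard {
    ~AllocatingGuard() { toy_alloc(); }
};

ToyObject* buggy() {
    AllocatingGuard guard;
    ToyObject* ans = toy_alloc();
    toy_protect(ans);
    toy_unprotect(1);
    return ans;  // guard's destructor allocates here, after the unprotect
}

ToyObject* safer() {
    ToyObject* ans = nullptr;
    {
        AllocatingGuard guard;
        ans = toy_alloc();
        toy_protect(ans);
    }            // guard's destructor runs while ans is still protected
    toy_unprotect(1);
    return ans;
}
```

In this model buggy() returns an object that has already been reclaimed
(alive is false), while safer() returns a live one. The two functions
differ only in where the destructor runs relative to the unprotect, which
is why this kind of bug is easy to miss when reading the code.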

Simple first steps include looking at the CRAN check results for any
errors, warnings, notes, or reports from analyzers (valgrind, ASAN,
UBSAN, rchk). The analyzers _may_ be able to spot a PROTECT error due to
allocation in a destructor if one is lucky (in the case I mentioned in
the blog post, there was an ASAN report), but I think manual inspection
is still needed, and it can also reveal other problems.

Tomas

>
> Regards
>
> Hugh
>
> On Mon, Apr 1, 2019 at 6:23 PM Tomas Kalibera <[hidden email]> wrote:
>
>     On 3/30/19 8:59 AM, Romain Francois wrote:
>     > tl;dr: we need better C++ tools and documentation.
>     >
>     > We collectively know more now with the rise of tools like rchk
>     > and improved documentation such as Tomas’s post. That’s a start,
>     > but it appears that there still is a lot of knowledge that would
>     > deserve to be promoted to actual documentation of best practices.
>
>     Well there is quite a bit of knowledge in Writing R Extensions and
>     many problems could have been prevented had it been read more
>     thoroughly by package developers. The problem that C++ runs some
>     functions automatically (like destructors), should not be too hard
>     to identify based on what WRE says about the need for protection
>     against garbage collection.
>
>     From my experience, one can learn most about R internals from
>     debugging and reading source code - when debugging PROTECT errors
>     and other memory errors/memory corruption, common problems caused
>     by bugs in native C/C++ code - one needs to read and understand
>     source code involved at all layers, one needs to understand the
>     documentation covering code at different layers, and one has to
>     think about these things, forming hypotheses, narrowing down to
>     smaller examples, etc.
>
>     My suggestion for package authors who write native code and want to
>     learn more, and who want to be responsible (these kinds of bugs
>     affect other packages indirectly and can be woken up by
>     inconsequential and correct code changes, even in R runtime): test
>     and debug your code hard - look at UBSAN/ASAN/valgrind/rchk checks
>     from CRAN and run these tools yourself if needed. Run with strict
>     barrier checking and with gctorture. Write more tests to increase
>     the coverage. Specifically now if you use C++ code, try to read all
>     of your related code and check you do not have the problems I
>     mentioned in my blog. Think of other related problems and if you
>     find out about them, tell others. Make sure you only use the API
>     from Writing R Extensions (and R help system). If you really can't
>     find anything wrong about your package, but still want to learn
>     more, try to debug some bugs reported against R runtime or against
>     your favorite packages you use (or their CRAN check reports from
>     various tools). In addition to learning more about R internals, by
>     spending much more time on debugging you may also get a different
>     perspective on some of the things about C++ I pointed to. Finally,
>     it would help us with the problem we have now - that many R
>     packages in C++ have serious bugs.
>
>     Tomas
>

