PATCH: Asserting that 'connection' used has not changed + R_GetConnection2()

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

PATCH: Asserting that 'connection' used has not changed + R_GetConnection2()

Henrik Bengtsson-5
SUMMARY:

I'm proposing that R assert that 'connection' options have not changed
since first created such that R will produce the following error:

> fh <- file("a.txt", open = "w+")
> cat("hello\n", file = fh)
> close(fh)

> fh2 <- file("b.txt", open = "w+")
> cat("world\n", file = fh2)

> cat("hello again\n", file = fh)
Error in cat("hello again\n", file = fh) :
  invalid connection (non-existing 'conn_id')

Note that, currently in R, the latter silently writes to 'b.txt' - not
'a.txt' (for more details, see
https://github.com/HenrikBengtsson/Wishlist-for-R/issues/81).


BACKGROUND:

In R, connections are indexed by their (zero-based) row indices in the
table of available connections.  For example,

> fh <- file("a.txt", open = "w")
> showConnections(all = TRUE)
  description class      mode text   isopen   can read can write
0 "stdin"     "terminal" "r"  "text" "opened" "yes"    "no"
1 "stdout"    "terminal" "w"  "text" "opened" "no"     "yes"
2 "stderr"    "terminal" "w"  "text" "opened" "no"     "yes"
3 "a.txt"     "file"     "w"  "text" "opened" "no"     "yes"
> con <- getConnection(3)
> identical(con, fh)
[1] TRUE


ISSUE:

The problem with the current design/implementation where connections
are referred to by their index (only), is that

(i) the table of connections changes over time and
(ii) connection indices are recycled.

Because a `connection` object holds the connection row index, it means
that *the actual underlying connection that a `connection` object
refers to may change over its lifetime*.


SUGGESTION:

Make use of the 'Rconn' struct field 'id', which is unique, to assert
that the 'connection' object used is referring to the
original/expected connection.  The 'id' field is available via
attribute 'conn_id' part of a 'connection' object.


PATCH:

See attached 'connection.patch' file (or
https://github.com/HenrikBengtsson/Wishlist-for-R/issues/81#issuecomment-434210222).
The patch introduces a new SEXP R_GetConnection2(SEXP sConn) function,
which looks up a connection by its index *and* the 'id' field. This
function is backward compatible with R_GetConnection(), which looks up
a connection by its index (only). In addition, R_GetConnection2() also
accepts 'sConn' of type integer, which the looks up the connection
similar to how the internal getConnection() function does it.

Comment: The patch is just one of many alternatives.  Hopefully, it
helps clarify what I'm suggesting.  It passes 'make check' and I've
tested it on a few packages of mine that make heavy use of different
types of connections.

In addition to "overridden" connections, the patch protects against
invalid 'connection':s that have been serialized, e.g.

> fh2 <- file("b.txt", open = "w+")
> saveRDS(fh2, file = "fh2.rds")
> fh3 <- readRDS("fh2.rds")
> attr(fh2, "conn_id")
<pointer: 0x78>
> attr(fh3, "conn_id")
<pointer: (nil)>  #<== NIL because external pointer was lost when serialized
> isOpen(fh2)
[1] TRUE
> isOpen(fh3)
Error in isOpen(fh3) : invalid connection ('conn_id' is NULL)

This is useful, when for instance 'connection':s are (incorrectly)
passed to background R sessions (e.g. PSOCK cluster nodes).


SEE ALSO:

* More details of the above are scribbled down on
https://github.com/HenrikBengtsson/Wishlist-for-R/issues/81
* R-devel post 'closeAllConnections() can really mess things up',
2016-10-30, https://stat.ethz.ch/pipermail/r-devel/2016-October/073331.html

All the best,

Henrik

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

connections.patch (20K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: PATCH: Asserting that 'connection' used has not changed + R_GetConnection2()

Henrik Bengtsson-5
Bumping this thread in the hope to catch the attention from R core.

As I try to argue in my original post, given the existing internal
structure of connections, I don't think it's too hard to add protection
against the use of corrupted R connection.

Writing to corrupted connection is a mistake that currently may pass
silently while corrupting a non-intended target.

Henrik

On Tue, Oct 30, 2018, 19:51 Henrik Bengtsson <[hidden email]
wrote:

> SUMMARY:
>
> I'm proposing that R assert that 'connection' options have not changed
> since first created such that R will produce the following error:
>
> > fh <- file("a.txt", open = "w+")
> > cat("hello\n", file = fh)
> > close(fh)
>
> > fh2 <- file("b.txt", open = "w+")
> > cat("world\n", file = fh2)
>
> > cat("hello again\n", file = fh)
> Error in cat("hello again\n", file = fh) :
>   invalid connection (non-existing 'conn_id')
>
> Note that, currently in R, the latter silently writes to 'b.txt' - not
> 'a.txt' (for more details, see
> https://github.com/HenrikBengtsson/Wishlist-for-R/issues/81).
>
>
> BACKGROUND:
>
> In R, connections are indexed by their (zero-based) row indices in the
> table of available connections.  For example,
>
> > fh <- file("a.txt", open = "w")
> > showConnections(all = TRUE)
>   description class      mode text   isopen   can read can write
> 0 "stdin"     "terminal" "r"  "text" "opened" "yes"    "no"
> 1 "stdout"    "terminal" "w"  "text" "opened" "no"     "yes"
> 2 "stderr"    "terminal" "w"  "text" "opened" "no"     "yes"
> 3 "a.txt"     "file"     "w"  "text" "opened" "no"     "yes"
> > con <- getConnection(3)
> > identical(con, fh)
> [1] TRUE
>
>
> ISSUE:
>
> The problem with the current design/implementation where connections
> are referred to by their index (only), is that
>
> (i) the table of connections changes over time and
> (ii) connection indices are recycled.
>
> Because a `connection` object holds the connection row index, it means
> that *the actual underlying connection that a `connection` object
> refers to may change over its lifetime*.
>
>
> SUGGESTION:
>
> Make use of the 'Rconn' struct field 'id', which is unique, to assert
> that the 'connection' object used is referring to the
> original/expected connection.  The 'id' field is available via
> attribute 'conn_id' part of a 'connection' object.
>
>
> PATCH:
>
> See attached 'connection.patch' file (or
>
> https://github.com/HenrikBengtsson/Wishlist-for-R/issues/81#issuecomment-434210222
> ).
> The patch introduces a new SEXP R_GetConnection2(SEXP sConn) function,
> which looks up a connection by its index *and* the 'id' field. This
> function is backward compatible with R_GetConnection(), which looks up
> a connection by its index (only). In addition, R_GetConnection2() also
> accepts 'sConn' of type integer, which the looks up the connection
> similar to how the internal getConnection() function does it.
>
> Comment: The patch is just one of many alternatives.  Hopefully, it
> helps clarify what I'm suggesting.  It passes 'make check' and I've
> tested it on a few packages of mine that make heavy use of different
> types of connections.
>
> In addition to "overridden" connections, the patch protects against
> invalid 'connection':s that have been serialized, e.g.
>
> > fh2 <- file("b.txt", open = "w+")
> > saveRDS(fh2, file = "fh2.rds")
> > fh3 <- readRDS("fh2.rds")
> > attr(fh2, "conn_id")
> <pointer: 0x78>
> > attr(fh3, "conn_id")
> <pointer: (nil)>  #<== NIL because external pointer was lost when
> serialized
> > isOpen(fh2)
> [1] TRUE
> > isOpen(fh3)
> Error in isOpen(fh3) : invalid connection ('conn_id' is NULL)
>
> This is useful, when for instance 'connection':s are (incorrectly)
> passed to background R sessions (e.g. PSOCK cluster nodes).
>
>
> SEE ALSO:
>
> * More details of the above are scribbled down on
> https://github.com/HenrikBengtsson/Wishlist-for-R/issues/81
> * R-devel post 'closeAllConnections() can really mess things up',
> 2016-10-30,
> https://stat.ethz.ch/pipermail/r-devel/2016-October/073331.html
>
> All the best,
>
> Henrik
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel