PATCH: Asserting that 'connection' used has not changed + R_GetConnection2()

classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|

PATCH: Asserting that 'connection' used has not changed + R_GetConnection2()

Henrik Bengtsson-5
SUMMARY:

I'm proposing that R assert that 'connection' options have not changed
since first created such that R will produce the following error:

> fh <- file("a.txt", open = "w+")
> cat("hello\n", file = fh)
> close(fh)

> fh2 <- file("b.txt", open = "w+")
> cat("world\n", file = fh2)

> cat("hello again\n", file = fh)
Error in cat("hello again\n", file = fh) :
  invalid connection (non-existing 'conn_id')

Note that, currently in R, the latter silently writes to 'b.txt' - not
'a.txt' (for more details, see
https://github.com/HenrikBengtsson/Wishlist-for-R/issues/81).


BACKGROUND:

In R, connections are indexed by their (zero-based) row indices in the
table of available connections.  For example,

> fh <- file("a.txt", open = "w")
> showConnections(all = TRUE)
  description class      mode text   isopen   can read can write
0 "stdin"     "terminal" "r"  "text" "opened" "yes"    "no"
1 "stdout"    "terminal" "w"  "text" "opened" "no"     "yes"
2 "stderr"    "terminal" "w"  "text" "opened" "no"     "yes"
3 "a.txt"     "file"     "w"  "text" "opened" "no"     "yes"
> con <- getConnection(3)
> identical(con, fh)
[1] TRUE


ISSUE:

The problem with the current design/implementation where connections
are referred to by their index (only), is that

(i) the table of connections changes over time and
(ii) connection indices are recycled.

Because a `connection` object holds the connection row index, it means
that *the actual underlying connection that a `connection` object
refers to may change over its lifetime*.


SUGGESTION:

Make use of the 'Rconn' struct field 'id', which is unique, to assert
that the 'connection' object used is referring to the
original/expected connection.  The 'id' field is available via
attribute 'conn_id' part of a 'connection' object.


PATCH:

See attached 'connection.patch' file (or
https://github.com/HenrikBengtsson/Wishlist-for-R/issues/81#issuecomment-434210222).
The patch introduces a new SEXP R_GetConnection2(SEXP sConn) function,
which looks up a connection by its index *and* the 'id' field. This
function is backward compatible with R_GetConnection(), which looks up
a connection by its index (only). In addition, R_GetConnection2() also
accepts 'sConn' of type integer, which the looks up the connection
similar to how the internal getConnection() function does it.

Comment: The patch is just one of many alternatives.  Hopefully, it
helps clarify what I'm suggesting.  It passes 'make check' and I've
tested it on a few packages of mine that make heavy use of different
types of connections.

In addition to "overridden" connections, the patch protects against
invalid 'connection':s that have been serialized, e.g.

> fh2 <- file("b.txt", open = "w+")
> saveRDS(fh2, file = "fh2.rds")
> fh3 <- readRDS("fh2.rds")
> attr(fh2, "conn_id")
<pointer: 0x78>
> attr(fh3, "conn_id")
<pointer: (nil)>  #<== NIL because external pointer was lost when serialized
> isOpen(fh2)
[1] TRUE
> isOpen(fh3)
Error in isOpen(fh3) : invalid connection ('conn_id' is NULL)

This is useful, when for instance 'connection':s are (incorrectly)
passed to background R sessions (e.g. PSOCK cluster nodes).


SEE ALSO:

* More details of the above are scribbled down on
https://github.com/HenrikBengtsson/Wishlist-for-R/issues/81
* R-devel post 'closeAllConnections() can really mess things up',
2016-10-30, https://stat.ethz.ch/pipermail/r-devel/2016-October/073331.html

All the best,

Henrik

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

connections.patch (20K) Download Attachment