Julia

Re: Julia

Oliver-3
On Tue, Mar 06, 2012 at 12:49:32PM -0500, Dominick Samperi wrote:

> On Tue, Mar 6, 2012 at 11:44 AM, William Dunlap <[hidden email]> wrote:
> > S (and its derivatives and successors) promises that functions
> > will not change their arguments, so in an expression like
> >   val <- func(arg)
> > you know that arg will not be changed.  You can
> > do that by having func copy arg before doing anything,
> > but that uses space and time that you want to conserve.
> > If arg is not a named item in any environment then it
> > should be fine to write over the original because there
> > is no way the caller can detect that shortcut.  E.g., in
> >    cx <- cos(runif(n))
> > the cos function does not need to allocate new space for
> > its output, it can just write over its input because, without
> > a name attached to it, the caller has no way of looking
> > at what runif(n) returned.  If you did
> >    x <- runif(n)
> >    cx <- cos(x)

You have two names here, x and cx, hence
your example does not fit into what you want to explain.

A better example would be:
x <- runif(n)
x <- cos(x)



> > then cos would have to allocate new space for its output
> > because overwriting its input would affect a subsequent
> >    sum(x)
> > I suppose that end-users and function-writers could learn
> > to live with having to decide when to copy, but not having
> > to make that decision makes S more pleasant (and safer) to use.
> > I think that is a major reason that people are able to
> > share S code so easily.
>
> But don't forget the "Holy Grail" that Doug mentioned at the
> start of this thread: finding a flexible language that is also
> fast. Currently many R packages employ C/C++ components
> to compensate for the fact that the R interpreter can be slow,
> and the pass-by-value semantics of S provides no protection
> here.
[...]

The distinction imperative vs. functional has nothing to do
with the distinction interpreted vs. directly executed.




Thinking again on the problem that was mentioned here,
I think it might be circumvented.

Looking again at R's properties, and again into U. Ligges' "Programmieren in
R", I saw it mentioned that in R anything (?!) is an object... so then it's
OOP; but it was also mentioned that R is a functional language. But this does
not mean it's purely functional or has no imperative data structures.

As R relies heavily on vectors, here we have an imperative data structure.

So it rather looks to me as if "<-" works in-place on the vectors, even though
"<-" itself is a function (which does not matter for the problem).

If that's true (I assume here that it is; correct me if it's wrong),
then I think assigning with "<<-" and assign() would also do an imperative
(in-place) change of the contents.

Then the copying-of-big-objects-when-passed-as-args problem can be circumvented
by working either on a variable in the GlobalEnv (using "<<-"), or on a
certain environment for the big data, passing its name (and the variable's)
as a value to the function, which then uses assign() and get() to work on that
data.
Then in-place modification should be possible.
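
A minimal sketch of that idea in R, assuming the big data lives in its own
environment. The names big_env and scale_in_env are made up for illustration.
The sketch only shows that the change becomes visible to the caller without
passing the vector itself; it makes no claim about whether R reuses the
memory internally:

```r
# Hypothetical environment holding the "big" data
big_env <- new.env()
big_env$data <- c(1, 2, 3)

# Hypothetical function: receives the environment and the variable's name,
# not the vector itself
scale_in_env <- function(env, name, mult) {
  # get() reads the vector from the environment,
  # assign() stores the result back under the same name
  assign(name, get(name, envir = env) * mult, envir = env)
  invisible(NULL)
}

scale_in_env(big_env, "data", 10)
big_env$data   # the caller sees the changed contents
```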





>
> In 2008 Ross Ihaka and Duncan Temple Lang published the
> paper "Back to the Future: Lisp as a base for a statistical
> computing system" where they propose Common
> Lisp as a new foundation for R. They suggest that
> this could be done while maintaining the same
> familiar R syntax.
>
> A key requirement of any strategy is to maintain
> easy access to the huge universe of existing
> C/C++/Fortran numerical and graphics libraries,
> as these libraries are not likely to be rewritten.
>
> Thus there will always be a need for a foreign
> function interface, and the problem is to provide
> a flexible and type-safe language that does not
> force developers to use another unfamiliar,
> less flexible, and error-prone language to
> optimize the hot spots.

If I hear "type safe" I would rather think of OCaml
or maybe Ada, but not LISP.

Also, LISP has so many "("s and ")"s
that it drives people crazy ;-)

Ciao,
   Oliver

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: Julia

William Dunlap
No, my examples are what I meant.  My point was that a function, say cos(),
can act as if it does call-by-value but conserve memory when possible, if it can
distinguish between the case
    cx <- cos(x=runif(n)) # no allocation needed, use the input space for the return value
and the case
    x <- runif(n)
    cx <- cos(x=x) # return value cannot reuse the argument's memory, so allocate space for return value
    sum(x)         # otherwise sum(x) would return sum(cx)
To do that, the function needs to know whether a memory block is referred to
by a name in any environment.
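
To illustrate the promise itself (not the memory optimization, which would
happen inside the interpreter and is not visible from R code), here is a small
sketch. zero_first is a hypothetical function used only for this example:

```r
# Hypothetical function that appears to modify its argument
zero_first <- function(v) {
  v[1] <- 0   # changes only the function's own view of v
  v
}

x <- c(1, 2, 3)
y <- zero_first(x)

x   # still 1 2 3: the caller's vector is untouched
y   # 0 2 3
```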

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


Re: Julia

Oliver-3
Hi,

OK, thank you for clarifying what you meant.
You only referred to the reuse of the args,
not of an already existing vector.
So I overgeneralized your example.

But when looking at your example,
and at how I would implement cos(),
I doubt I would copy the args
before calculating the result.

Just allocate a result vector, and then place the cos()
of the input vector into the result vector.

I haven't looked at how it is done in R,
but I would guess it's like that.


  In pseudo-Code something like that:
    cos_val[idx] = cos( input_val[idx] );

But R also handles complex data with cos()
so it will look a bit more laborious.
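
For real-valued input, that pseudo-code could be spelled out in R roughly as
follows. cos_alloc is a hypothetical name, and R's own cos() is implemented in
C, so this only illustrates the allocate-then-fill pattern, not what R
actually does:

```r
# Hypothetical element-wise cosine: allocate a result, never touch the input
cos_alloc <- function(input_val) {
  result_val <- numeric(length(input_val))   # allocate the result vector
  for (idx in seq_along(input_val)) {
    result_val[idx] <- cos(input_val[idx])   # read input, write result
  }
  result_val
}

x  <- c(0, pi)
cx <- cos_alloc(x)
```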

What I have seen so far from implementing C extensions
for R is rather C-ish, so you have control
over many details. Copying the input just to read it
would not make sense here.

I doubt that R does that internally.
Or did you find that in the R code?

The other problem someone mentioned was *changing* the contents
of a matrix... and that this is NOT done in-place when using
a function for it.
But the namespace name / variable name as "references" to the matrix
might solve that problem.


Ciao,
  Oliver





Re: Julia

Oliver-3
Ah, and you mean if it's an anonymous array
it could be reused directly from the args.

OK, now I see why you insist on the anonymous data thing.
I hadn't grasped it even in my last mail.



But that somehow also relates to what I wrote about reusing an already
existing, named vector.

Just the moment of in-place-modification is different.

From
  x  <- runif(n)
  cx <- cos(x)

instead of
> >     cx <- cos(x=runif(n)) # no allocation needed, use the input space for the return value

to something like

  cx  <- runif(n)
  cos( cx, inplace=TRUE)

or

  cos( runif(n), inplace=TRUE)




This way it would be possible to specify the reuse
of the input *explicitly* (without implicit rules
like anonymous vs. named values).



In Pseudo-Code something like that:

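   if (in_place == TRUE)
   {
     /* overwrite the input, element by element */
     for (idx = 0; idx < LENGTH(input_val); idx++)
       input_val[idx] = cos( input_val[idx] );
     return input_val;
   }
   else
   {
     /* leave the input alone and fill a freshly allocated result */
     result_val = alloc_vec( LENGTH(input_val), ... );
     for (idx = 0; idx < LENGTH(input_val); idx++)
       result_val[idx] = cos( input_val[idx] );
     return result_val;
   }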



Does this match what you were looking for?


Ciao,
   Oliver




Re: Julia

William Dunlap
So you propose an inplace=TRUE/FALSE entry for each
argument to each function that may want to avoid
allocating memory?  The major problem is that the function
writer has no idea what the value of inplace should be,
as it depends on how the function gets called.  This makes
writing reusable functions (hence packages) difficult.
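
A sketch of the difficulty, emulating an overwritable vector with an
environment, since a callee cannot change an ordinary R vector for its caller.
make_vec, cos_vec and mean_cos are hypothetical names used only for this
illustration:

```r
# Hypothetical mutable "vector": values stored inside an environment
make_vec <- function(values) {
  e <- new.env()
  e$values <- values
  e
}

# Hypothetical cos() with an explicit inplace flag
cos_vec <- function(v, inplace = FALSE) {
  if (inplace) {
    v$values <- cos(v$values)   # overwrites storage the caller can see
    return(invisible(v))
  }
  make_vec(cos(v$values))       # fresh storage, input untouched
}

# A reusable helper has to pick a value for inplace without knowing
# whether its caller still needs the input afterwards
mean_cos <- function(v) mean(cos_vec(v, inplace = TRUE)$values)

x <- make_vec(c(0, 0))
m <- mean_cos(x)
x$values   # now 1 1: the caller's data was silently changed
```

With inplace = FALSE inside mean_cos the caller's data would be safe, but every
call would allocate, which is exactly the choice the function writer cannot
make in general.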

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: oliver [mailto:[hidden email]]
> Sent: Thursday, March 08, 2012 7:40 AM
> To: William Dunlap
> Cc: R-devel
> Subject: Re: [Rd] Julia
>
> Ah, and you mean if it's an anonymous array it could be reused directly from the
> args.
>
> OK, now I see why you insist on the anonymous data thing.
> I didn't grasped it even in my last mail.
>
>
>
> But that somehow also relates to what I wrote about reusing an already
> existing, named vector.
>
> Just the moment of in-place-modification is different.
>
> From
>   x  <- runif(n)
>   cx <- cos(x)
>
> instead of
> > >     cx <- cos(x=runif(n)) # no allocation needed, use the input
> > > space for the return value
>
> to something like
>
>   cx  <- runif(n)
>   cos( cx, inplace=TRUE)
>
> or
>
>   cos( runif(n), inplace=TRUE)
>
>
>
>
> This way it would be possible to specify the reusage of the input *explicitly*
> (without  implicit rules like anonymous vs. named values).
>
>
>
> In Pseudo-Code something like that:
>
>    if (in_place == TRUE )
>    {
>      input_val[idx] = cos( input_val[idx] );
>      return input_val;
>    }
>    else
>    {
>      result_val = alloc_vec( LENGTH(input_val), ... );
>      result_val[idx] = cos( input_val[idx] );
>      return result_val;
>    }
>
>
>
> Is this matching, what you were looking for?
>
>
> Ciao,
>    Oliver
>
>
> On Thu, Mar 08, 2012 at 02:56:24PM +0100, oliver wrote:
> > Hi,
> >
> > ok, thank you for clarifiying what you meant.
> > You only referred to the reusage of the args, not of an already
> > existing vector.
> > So I overgenerealized your example.
> >
> > But when looking at your example,
> > and how I would implement the cos()
> > I doubt I would use copying the args
> > before calculating the result.
> >
> > Just allocate a result-vector, and then place the cos() of the
> > input-vector into the result vector.
> >
> > I didn't looked at how it is done in R, but I would guess it's like
> > that.
> >
> >
> >   In pseudo-Code something like that:
> >     cos_val[idx] = cos( input_val[idx] );
> >
> > But R also handles complex data with cos() so it will look a bit more
> > laborious.
> >
> > What I have seen so far from implementing C-extensions for R is rather
> > C-ish, and so you have the control on many details. Copying the input
> > just to read it would not make sense here.
> >
> > I doubt that R internally is doing that.
> > Or did you found that in the R-code?
> >
> > The other problem, someone mentioned, was *changing* the contents of a
> > matrix... and that this is NO>T done in-place, when using a function
> > for it.
> > But the namespace-name / variable-name as "references" to the matrix
> > might solve that problem.
> >
> >
> > Ciao,
> >   Oliver
> >
> >
> >
> > On Wed, Mar 07, 2012 at 07:10:43PM +0000, William Dunlap wrote:
> > > No my examples are what I meant.  My point was that a function, say
> > > cos(), can act like it does call-by-value but conserve memory when
> > > it can  if it can distinguish between the case
> > >     cx <- cos(x=runif(n)) # no allocation needed, use the input
> > > space for the return value and and the case
> > >    x <- runif(n)
> > >    cx <- cos(x=x) # return value cannot reuse the argument's memory, so
> allocate space for return value
> > >    sum(x)              # Otherwise sum(x) would return sum(cx)
> > > The function needs to know if a memory block is referred to by a
> > > name in any environment in order to do that.
> > >
> > > Bill Dunlap
> > > Spotfire, TIBCO Software
> > > wdunlap tibco.com
> > >
> > > > -----Original Message-----
> > > > From: oliver [mailto:[hidden email]]
> > > > Sent: Wednesday, March 07, 2012 10:22 AM
> > > > To: Dominick Samperi
> > > > Cc: William Dunlap; R-devel
> > > > Subject: Re: [Rd] Julia
> > > >
> > > > On Tue, Mar 06, 2012 at 12:49:32PM -0500, Dominick Samperi wrote:
> > > > > On Tue, Mar 6, 2012 at 11:44 AM, William Dunlap
> > > > > <[hidden email]>
> > > > wrote:
> > > > > > S (and its derivatives and successors) promises that functions
> > > > > > will not change their arguments, so in an expression like
> > > > > >   val <- func(arg)
> > > > > > you know that arg will not be changed.  You can do that by
> > > > > > having func copy arg before doing anything, but that uses
> > > > > > space and time that you want to conserve.
> > > > > > If arg is not a named item in any environment then it should
> > > > > > be fine to write over the original because there is no way the
> > > > > > caller can detect that shortcut.  E.g., in
> > > > > >    cx <- cos(runif(n))
> > > > > > the cos function does not need to allocate new space for its
> > > > > > output, it can just write over its input because, without a
> > > > > > name attached to it, the caller has no way of looking at what
> > > > > > runif(n) returned.  If you did
> > > > > >    x <- runif(n)
> > > > > >    cx <- cos(x)
> > > >
> > > > You have two names here, x and cx, hence your example does not fit
> > > > into what you want to explain.
> > > >
> > > > A better example would be:
> > > > x <- runif(n)
> > > > x <- cos(x)
> > > >
> > > >
> > > >
> > > > > > then cos would have to allocate new space for its output
> > > > > > because overwriting its input would affect a subsequent
> > > > > >    sum(x)
> > > > > > I suppose that end-users and function-writers could learn to
> > > > > > live with having to decide when to copy, but not having to
> > > > > > make that decision makes S more pleasant (and safer) to use.
> > > > > > I think that is a major reason that people are able to share S
> > > > > > code so easily.
> > > > >
> > > > > But don't forget the "Holy Grail" that Doug mentioned at the
> > > > > start of this thread: finding a flexible language that is also
> > > > > fast. Currently many R packages employ C/C++ components to
> > > > > compensate for the fact that the R interpreter can be slow, and
> > > > > the pass-by-value semantics of S provides no protection here.
> > > > [...]
> > > >
> > > > The distinction imperative vs. functional has nothing to do with
> > > > the distinction interpreted vs. directly executed.
> > > >
> > > >
> > > >
> > > >
> > > > Thinking again on the problem that was mentioned here, I think it
> > > > might be circumvented.
> > > >
> > > > Looking again at R's properties, looking again into U.Ligges
> > > > "Programmieren in R", I saw there was mentioned that in R anything
> > > > (?!) is an object... so then it's OOP; but also it was mentioned,
> > > > R is a functional language. But this does not mean it's purely functional or
> has no imperative data structures.
> > > >
> > > > As R relies heavily on vectors, here we have an imperative datastructure.
> > > >
> > > > So, it rather looks to me that "<-" does work in-place on the vectors, even
> "<-"
> > > > itself is a function (which does not matter for the problem).
> > > >
> > > > If that's true (I assume here that it is; correct me if it's wrong),
> > > > then I think assigning with "<<-" and assign() would also do an
> > > > imperative (in-place) change of the contents.
> > > >
> > > > Then the copying-of-big-objects-when-passed-as-args problem can be
> > > > circumvented either by working on a variable in the GlobalEnv (and
> > > > using "<<-"), or by using a certain environment for the big data and
> > > > passing its name (and the variable) as a value to the function, which
> > > > then uses assign() and get() to work on that data.
> > > > Then in-place modification should be possible.
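
A minimal R sketch of the environment idea quoted above, for illustration only.
The names `big` and `scale_in_env` are made up here. It shows that a change made
through assign()/get() on an environment is visible to the caller without the
vector itself being passed as an argument. It does not show that R avoids
allocating a new vector internally (the multiplication below creates one).

```r
# Hypothetical helper: scales the vector stored under `name` in `env`.
# Environments are passed by reference, so the caller sees the new binding.
scale_in_env <- function(env, name, factor) {
  assign(name, get(name, envir = env) * factor, envir = env)
  invisible(NULL)
}

big <- new.env()                      # environment holding the "big" data
assign("x", c(1, 2, 3), envir = big)

scale_in_env(big, "x", 10)            # no return value is used
result <- get("x", envir = big)       # the caller reads the updated vector
```

The function is called only for its side effect, which is exactly the
non-functional style that the rest of the thread argues about.
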
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > >
> > > > > In 2008 Ross Ihaka and Duncan Temple Lang published the paper
> > > > > "Back to the Future: Lisp as a base for a statistical computing
> > > > > system" where they propose Common Lisp as a new foundation for
> > > > > > R. They suggest that this could be done while maintaining the
> > > > > > same familiar R syntax.
> > > > >
> > > > > A key requirement of any strategy is to maintain easy access to
> > > > > the huge universe of existing C/C++/Fortran numerical and
> > > > > graphics libraries, as these libraries are not likely to be rewritten.
> > > > >
> > > > > Thus there will always be a need for a foreign function
> > > > > interface, and the problem is to provide a flexible and
> > > > > type-safe language that does not force developers to use another
> > > > > > unfamiliar, less flexible, and error-prone language to optimize
> > > > > > the hot spots.
> > > >
> > > > If I hear "type safe" I would rather think of OCaml or maybe
> > > > Ada, but not LISP.
> > > >
> > > > Also, LISP has so many "("'s and ")"'s that it drives people
> > > > crazy ;-)
> > > >
> > > > Ciao,
> > > >    Oliver
> >
> > ______________________________________________
> > [hidden email] mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: Julia

Oliver-3
I don't think that using in-place modification as a general property would make
sense.

In-place modification brings in side-effects and that would mean that
the order of evaluation can change the result.

To get reliable results, the order of evaluation should not be
a reason for different results, and that's why
the functional approach is much better for reliable programs.
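
A small R sketch of that point, under the assumption that the shared state
lives in an environment (which R modifies by reference). The names `state`,
`double_it` and `add_one` are made up for illustration.

```r
# State shared by reference between the two helpers below.
state <- new.env()

# Side effect: doubles the shared value and returns it.
double_it <- function() {
  state$value <- state$value * 2
  state$value
}

# Side effect: adds one to the shared value and returns it.
add_one <- function() {
  state$value <- state$value + 1
  state$value
}

state$value <- 1
double_it()
add_one()
first_order <- state$value    # (1 * 2) + 1

state$value <- 1
add_one()
double_it()
second_order <- state$value   # (1 + 1) * 2
```

The same two calls give different final values depending only on the order
in which they are evaluated.
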

So, in general I would say, this feature is a no-no.
In general I would rather discourage in-place modification.

In certain cases it might help,
but for those cases either such a boolean flag
or programming a separate module in C would make sense.

There could also be a global in-place flag that might be used (via options
maybe), but if such a thing were implemented, the default value should be
FALSE.
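
As a rough sketch of how such a flag could be read in R: the option name
"inplace.demo" and the helper `inplace_enabled` are invented for this example,
base R defines no such option, and nothing here actually reuses memory. It only
shows an options() lookup that defaults to FALSE when the option is unset.

```r
# "inplace.demo" is a made-up option name; base R defines no such option.
inplace_enabled <- function() {
  isTRUE(getOption("inplace.demo", default = FALSE))
}

default_setting <- inplace_enabled()   # option unset, so this is FALSE

old <- options(inplace.demo = TRUE)    # opt in explicitly
enabled_setting <- inplace_enabled()

options(old)                           # restore the previous (unset) state
restored_setting <- inplace_enabled()
```
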



Ciao,
   Oliver


On Thu, Mar 08, 2012 at 04:21:42PM +0000, William Dunlap wrote:

> So you propose an inplace=TRUE/FALSE entry for each
> argument to each function which may want to avoid
> allocating memory?  The major problem is that the function
> writer has no idea what the value of inplace should be,
> as it depends on how the function gets called.  This makes
> writing reusable functions (hence packages) difficult.
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
> > -----Original Message-----
> > From: oliver [mailto:[hidden email]]
> > Sent: Thursday, March 08, 2012 7:40 AM
> > To: William Dunlap
> > Cc: R-devel
> > Subject: Re: [Rd] Julia
> >
> > Ah, and you mean if it's an anonymous array it could be reused directly
> > from the args.
> >
> > OK, now I see why you insist on the anonymous data thing.
> > I didn't grasp it even in my last mail.
> >
> >
> >
> > But that somehow also relates to what I wrote about reusing an already
> > existing, named vector.
> >
> > Just the moment of in-place-modification is different.
> >
> > From
> >   x  <- runif(n)
> >   cx <- cos(x)
> >
> > instead of
> > > >     cx <- cos(x=runif(n)) # no allocation needed, use the input space for the return value
> >
> > to something like
> >
> >   cx  <- runif(n)
> >   cos( cx, inplace=TRUE)
> >
> > or
> >
> >   cos( runif(n), inplace=TRUE)
> >
> >
> >
> >
> > This way it would be possible to specify the reuse of the input *explicitly*
> > (without implicit rules like anonymous vs. named values).
> >
> >
> >
> > In Pseudo-Code something like that:
> >
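> >    if (in_place == TRUE )
> >    {
> >      /* overwrite the input vector, element by element */
> >      for (idx = 0; idx < LENGTH(input_val); idx++)
> >        input_val[idx] = cos( input_val[idx] );
> >      return input_val;
> >    }
> >    else
> >    {
> >      /* leave the input untouched and fill a newly allocated vector */
> >      result_val = alloc_vec( LENGTH(input_val), ... );
> >      for (idx = 0; idx < LENGTH(input_val); idx++)
> >        result_val[idx] = cos( input_val[idx] );
> >      return result_val;
> >    }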
> >
> >
> >
> > Does this match what you were looking for?
> >
> >
> > Ciao,
> >    Oliver
> >
> >
> > On Thu, Mar 08, 2012 at 02:56:24PM +0100, oliver wrote:
> > > Hi,
> > >
> > > ok, thank you for clarifying what you meant.
> > > You only referred to the reuse of the args, not of an already
> > > existing vector.
> > > So I overgeneralized your example.
> > >
> > > But when looking at your example,
> > > and how I would implement the cos()
> > > I doubt I would use copying the args
> > > before calculating the result.
> > >
> > > Just allocate a result-vector, and then place the cos() of the
> > > input-vector into the result vector.
> > >
> > > I didn't look at how it is done in R, but I would guess it's like
> > > that.
> > >
> > >
> > >   In pseudo-Code something like that:
> > >     cos_val[idx] = cos( input_val[idx] );
> > >
> > > But R also handles complex data with cos() so it will look a bit more
> > > laborious.
> > >
> > > What I have seen so far from implementing C-extensions for R is rather
> > > C-ish, and so you have the control on many details. Copying the input
> > > just to read it would not make sense here.
> > >
> > > I doubt that R internally is doing that.
> > > Or did you find that in the R-code?
> > >
> > > The other problem someone mentioned was *changing* the contents of a
> > > matrix... and that this is NOT done in-place when using a function
> > > for it.
> > > But the namespace-name / variable-name as "references" to the matrix
> > > might solve that problem.
> > >
> > >
> > > Ciao,
> > >   Oliver
> > >
> > >
> > >
> > > On Wed, Mar 07, 2012 at 07:10:43PM +0000, William Dunlap wrote:
> > > > No my examples are what I meant.  My point was that a function, say
> > > > cos(), can act like it does call-by-value but conserve memory when
> > > > it can  if it can distinguish between the case
> > > > >     cx <- cos(x=runif(n)) # no allocation needed, use the input space for the return value
> > > > > and the case
> > > > >    x <- runif(n)
> > > > >    cx <- cos(x=x) # return value cannot reuse the argument's memory, so allocate space for return value
> > > > >    sum(x)              # Otherwise sum(x) would return sum(cx)
> > > > The function needs to know if a memory block is referred to by a
> > > > name in any environment in order to do that.
> > > >
> > > > Bill Dunlap
> > > > Spotfire, TIBCO Software
> > > > wdunlap tibco.com
> > > >


Re: Julia

William Dunlap
I guess my point is not getting across.  The user should see
the functional programming style but under the hood the
evaluator should be able to use whatever memory and time
saving tricks it can.  Julia seems to want to be a nonfunctional
language, which I think makes it harder to write the sort of
easily reusable functions that S allows.
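
A minimal R sketch of what "the user should see the functional programming
style" means at the level of a call. The function `zero_first` is made up for
illustration. Whether R copies the vector or reuses memory behind the scenes
is an implementation detail that this example does not depend on.

```r
# Illustrative function that appears to modify its argument.
zero_first <- function(v) {
  v[1] <- 0   # changes only the function's own view of `v`
  v
}

x <- c(1, 2, 3)
y <- zero_first(x)             # named argument: the caller's x must survive
z <- zero_first(c(4, 5, 6))    # unnamed argument: nobody can see the input afterwards
```

After the calls, x is unchanged while y and z hold the modified values, so
the caller never has to decide when to copy.
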

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


> -----Original Message-----
> From: oliver [mailto:[hidden email]]
> Sent: Thursday, March 08, 2012 2:23 PM
> To: William Dunlap
> Cc: R-devel
> Subject: Re: [Rd] Julia
>
> I don't think that using in-place modification as a general property would make
> sense.
>
> In-place modification brings in side-effects and that would mean that the order
> of evaluation can change the result.
>
> To get reliable results, the order of evaluation should not be a reason for
> different results, and that's why the functional approach is much
> better for reliable programs.
>
> So, in general I would say, this feature is a no-no.
> In general I would rather discourage in-place modification.
>
> In certain cases it might help,
> but for those cases either such a boolean flag or programming a separate
> module in C would make sense.
>
> There could also be a global in-place-flag that might be used (via options
> maybe) but if such a thing would be implemented, the default value should be
> FALSE.
>
>
>
> Ciao,
>    Oliver
>
>

Re: Julia

Oliver-3
Aha, ok.

So you are not looking specifically at that one feature (like the anonymous
evaluation tricks), but in general are asking for better internal optimization.

Regarding your example of anonymous (unnamed) values passed to
a function, I would ask: do you want to write whole programs without
using names/variables?
I think this would be much harder than just adding a boolean flag
with inplace=TRUE.
So to your objection that the flag proposal hurts usability,
I have to reply: it's even worse to write code without
variable names and to put everything into anonymous data structures
that are created inside function applications, where each of the arguments
contains further unnamed calculations.
You will end up not only with a mess, but also with slower calculations,
because unnamed values must be calculated more than once if they are used
more than once.
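
A small R sketch of the recalculation point. The function `slow_square` is a
made-up stand-in for an expensive computation, and the counter `calls` exists
only to show how often it runs.

```r
# Counts how often the "expensive" function is actually evaluated.
calls <- 0
slow_square <- function(v) {
  calls <<- calls + 1
  v^2
}

x <- c(1, 2, 3)

# Unnamed intermediate: the expression is written, and evaluated, twice.
unnamed <- sum(slow_square(x)) / max(slow_square(x))
calls_unnamed <- calls

# Named intermediate: evaluated once and then reused.
calls <- 0
sq <- slow_square(x)
named <- sum(sq) / max(sq)
calls_named <- calls
```

Both variants give the same result, but the unnamed one evaluates
slow_square() twice and the named one only once.
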

So I think that you are just asking for more internal optimizations.
Fine.

But I think internal intermediate code (that can be optimized)
would be better than that one "enhancement" of reusing anonymous
data for the output.


Ciao,
   Oliver


On Thu, Mar 08, 2012 at 10:27:22PM +0000, William Dunlap wrote:

> I guess my point is not getting across.  The user should see
> the functional programming style but under the hood the
> evaluator should be able to use whatever memory and time
> saving tricks it can.  Julia seems to want to be a nonfunctional
> language, which I think makes it harder to write the sort of
> easily reusable functions that S allows.
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
>
> > -----Original Message-----
> > From: oliver [mailto:[hidden email]]
> > Sent: Thursday, March 08, 2012 2:23 PM
> > To: William Dunlap
> > Cc: R-devel
> > Subject: Re: [Rd] Julia
> >
> > I don't think that using in-place modification as a general property would make
> > sense.
> >
> > In-place modification brings in side-effects and that would mean that the order
> > of evaluation can change the result.
> >
> > To get reliable results, the order of evaluation should not be the reason for
> > different results, and thats the reason, why the functional approach is much
> > better for reliable programs.
> >
> > So, in general I would say, this feature is a no-no.
> > In general I would rather discourage in-place modification.
> >
> > For some certain cases it might help...
> > but for such certain cases either such a boolean flag or programming a sparate
> > module in C would make sense.
> >
> > There could also be a global in-place-flag that might be used (via options
> > maybe) but if such a thing would be implemented, the default value should be
> > FALSE.
> >
> >
> >
> > Ciao,
> >    Oliver
> >
> >
> > On Thu, Mar 08, 2012 at 04:21:42PM +0000, William Dunlap wrote:
> > > So you propose an inplace=TRUE/FALSE entry for each argument to each
> > > function which may may want to avoid allocating memory?  The major
> > > problem is that the function writer has no idea what the value of
> > > inplace should be, as it depends on how the function gets called.
> > > This makes writing reusable functions (hence packages) difficult.
> > >
> > > Bill Dunlap
> > > Spotfire, TIBCO Software
> > > wdunlap tibco.com
> > >
> > > > -----Original Message-----
> > > > From: oliver [mailto:[hidden email]]
> > > > Sent: Thursday, March 08, 2012 7:40 AM
> > > > To: William Dunlap
> > > > Cc: R-devel
> > > > Subject: Re: [Rd] Julia
> > > >
> > > > Ah, and you mean if it's an anonymous array it could be reused
> > > > directly from the args.
> > > >
> > > > OK, now I see why you insist on the anonymous data thing.
> > > > I didn't grasped it even in my last mail.
> > > >
> > > >
> > > >
> > > > But that somehow also relates to what I wrote about reusing an
> > > > already existing, named vector.
> > > >
> > > > Just the moment of in-place-modification is different.
> > > >
> > > > From
> > > >   x  <- runif(n)
> > > >   cx <- cos(x)
> > > >
> > > > instead of
> > > > > >     cx <- cos(x=runif(n)) # no allocation needed, use the input
> > > > > > space for the return value
> > > >
> > > > to something like
> > > >
> > > >   cx  <- runif(n)
> > > >   cos( cx, inplace=TRUE)
> > > >
> > > > or
> > > >
> > > >   cos( runif(n), inplace=TRUE)
> > > >
> > > >
> > > >
> > > >
> > > > This way it would be possible to specify the reusage of the input
> > > > *explicitly* (without  implicit rules like anonymous vs. named values).
> > > >
> > > >
> > > >
> > > > In Pseudo-Code something like that:
> > > >
> > > >    if (in_place == TRUE )
> > > >    {
> > > >      input_val[idx] = cos( input_val[idx] );
> > > >      return input_val;
> > > >    }
> > > >    else
> > > >    {
> > > >      result_val = alloc_vec( LENGTH(input_val), ... );
> > > >      result_val[idx] = cos( input_val[idx] );
> > > >      return result_val;
> > > >    }
> > > >
> > > >
> > > >
> > > > Is this matching, what you were looking for?
> > > >
> > > >
> > > > Ciao,
> > > >    Oliver
> > > >
> > > >
> > > > On Thu, Mar 08, 2012 at 02:56:24PM +0100, oliver wrote:
> > > > > Hi,
> > > > >
> > > > > ok, thank you for clarifiying what you meant.
> > > > > You only referred to the reusage of the args, not of an already
> > > > > existing vector.
> > > > > So I overgenerealized your example.
> > > > >
> > > > > But when looking at your example,
> > > > > and how I would implement the cos() I doubt I would use copying
> > > > > the args before calculating the result.
> > > > >
> > > > > Just allocate a result-vector, and then place the cos() of the
> > > > > input-vector into the result vector.
> > > > >
> > > > > I didn't looked at how it is done in R, but I would guess it's
> > > > > like that.
> > > > >
> > > > >
> > > > >   In pseudo-Code something like that:
> > > > >     cos_val[idx] = cos( input_val[idx] );
> > > > >
> > > > > But R also handles complex data with cos() so it will look a bit
> > > > > more laborious.
> > > > >
> > > > > What I have seen so far from implementing C-extensions for R is
> > > > > rather C-ish, and so you have the control on many details. Copying
> > > > > the input just to read it would not make sense here.
> > > > >
> > > > > I doubt that R internally is doing that.
> > > > > Or did you found that in the R-code?
> > > > >
> > > > > The other problem, someone mentioned, was *changing* the contents
> > > > > of a matrix... and that this is NO>T done in-place, when using a
> > > > > function for it.
> > > > > But the namespace-name / variable-name as "references" to the
> > > > > matrix might solve that problem.
> > > > >
> > > > >
> > > > > Ciao,
> > > > >   Oliver
> > > > >
> > > > >
> > > > >
> > > > > On Wed, Mar 07, 2012 at 07:10:43PM +0000, William Dunlap wrote:
> > > > > > > No, my examples are what I meant.  My point was that a function,
> > > > > > > say cos(), can act like it does call-by-value but conserve
> > > > > > > memory when it can, if it can distinguish between the case
> > > > > > >    cx <- cos(x=runif(n)) # no allocation needed, use the input
> > > > > > >                          # space for the return value
> > > > > > > and the case
> > > > > > >    x <- runif(n)
> > > > > > >    cx <- cos(x=x) # return value cannot reuse the argument's
> > > > > > >                   # memory, so allocate space for return value
> > > > > > >    sum(x)         # Otherwise sum(x) would return sum(cx)
> > > > > > > The function needs to know if a memory block is referred to by a
> > > > > > > name in any environment in order to do that.
> > > > > >
> > > > > > Bill Dunlap
> > > > > > Spotfire, TIBCO Software
> > > > > > wdunlap tibco.com
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: oliver [mailto:[hidden email]]
> > > > > > > Sent: Wednesday, March 07, 2012 10:22 AM
> > > > > > > To: Dominick Samperi
> > > > > > > Cc: William Dunlap; R-devel
> > > > > > > Subject: Re: [Rd] Julia
> > > > > > >
> > > > > > > On Tue, Mar 06, 2012 at 12:49:32PM -0500, Dominick Samperi wrote:
> > > > > > > > On Tue, Mar 6, 2012 at 11:44 AM, William Dunlap
> > > > > > > > <[hidden email]>
> > > > > > > wrote:
> > > > > > > > > S (and its derivatives and successors) promises that
> > > > > > > > > functions will not change their arguments, so in an
> > > > > > > > > expression like
> > > > > > > > >   val <- func(arg)
> > > > > > > > > you know that arg will not be changed.  You can do that by
> > > > > > > > > having func copy arg before doing anything, but that uses
> > > > > > > > > space and time that you want to conserve.
> > > > > > > > > If arg is not a named item in any environment then it
> > > > > > > > > should be fine to write over the original because there is
> > > > > > > > > no way the caller can detect that shortcut.  E.g., in
> > > > > > > > >    cx <- cos(runif(n))
> > > > > > > > > the cos function does not need to allocate new space for
> > > > > > > > > its output, it can just write over its input because,
> > > > > > > > > without a name attached to it, the caller has no way of
> > > > > > > > > looking at what
> > > > > > > > > runif(n) returned.  If you did
> > > > > > > > >    x <- runif(n)
> > > > > > > > >    cx <- cos(x)
> > > > > > >
> > > > > > > You have two names here, x and cx, hence your example does not
> > > > > > > fit into what you want to explain.
> > > > > > >
> > > > > > > A better example would be:
> > > > > > > x <- runif(n)
> > > > > > > x <- cos(x)
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > > then cos would have to allocate new space for its output
> > > > > > > > > because overwriting its input would affect a subsequent
> > > > > > > > >    sum(x)
> > > > > > > > > I suppose that end-users and function-writers could learn
> > > > > > > > > to live with having to decide when to copy, but not having
> > > > > > > > > to make that decision makes S more pleasant (and safer) to use.
> > > > > > > > > I think that is a major reason that people are able to
> > > > > > > > > share S code so easily.
> > > > > > > >
> > > > > > > > But don't forget the "Holy Grail" that Doug mentioned at the
> > > > > > > > start of this thread: finding a flexible language that is
> > > > > > > > also fast. Currently many R packages employ C/C++ components
> > > > > > > > to compensate for the fact that the R interpreter can be
> > > > > > > > slow, and the pass-by-value semantics of S provides no protection
> > here.
> > > > > > > [...]
> > > > > > >
> > > > > > > The distinction imperative vs. functional has nothing to do
> > > > > > > with the distinction interpreted vs. directly executed.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Thinking again on the problem that was mentioned here, I think
> > > > > > > it might be circumvented.
> > > > > > >
> > > > > > > Looking again at R's properties, looking again into U.Ligges
> > > > > > > "Programmieren in R", I saw there was mentioned that in R
> > > > > > > anything
> > > > > > > (?!) is an object... so then it's OOP; but also it was
> > > > > > > mentioned, R is a functional language. But this does not mean
> > > > > > > it's purely functional or
> > > > has no imperative data structures.
> > > > > > >
> > > > > > > As R relies heavily on vectors, here we have an imperative
> > datastructure.
> > > > > > >
> > > > > > > So, it rather looks to me that "<-" does work in-place on the
> > > > > > > vectors, even
> > > > "<-"
> > > > > > > itself is a function (which does not matter for the problem).
> > > > > > >
> > > > > > > If thats true (I assume here, it is; correct me, if it's
> > > > > > > wrong), then I think, assigning with "<<-" and assign() also
> > > > > > > would do an imperative
> > > > > > > (in-place) change of the contents.
> > > > > > >
> > > > > > > Then the copying-of-big-objects-when-passed-as-args problem
> > > > > > > can be circumvented by working on either a variable in the
> > > > > > > GlobalEnv (and using "<<-", or using a certain environment for
> > > > > > > the big data and passing it's name (and the
> > > > > > > variable) as value to the function which then uses assign()
> > > > > > > and
> > > > > > > get() to work on that data.
> > > > > > > Then in-place modification should be possible.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > In 2008 Ross Ihaka and Duncan Temple Lang published the
> > > > > > > > paper "Back to the Future: Lisp as a base for a statistical
> > > > > > > > computing system" where they propose Common Lisp as a new
> > > > > > > > foundation for R. They suggest that this could be done while
> > > > > > > > maintaining the same
> > > > familiar R syntax.
> > > > > > > >
> > > > > > > > A key requirement of any strategy is to maintain easy access
> > > > > > > > to the huge universe of existing C/C++/Fortran numerical and
> > > > > > > > graphics libraries, as these libraries are not likely to be rewritten.
> > > > > > > >
> > > > > > > > Thus there will always be a need for a foreign function
> > > > > > > > interface, and the problem is to provide a flexible and
> > > > > > > > type-safe language that does not force developers to use
> > > > > > > > another unfamiliar, less flexible, and error-prone language
> > > > > > > > to optimize the hot
> > > > spots.
> > > > > > >
> > > > > > > If I here "type safe" I rather would think about OCaml or
> > > > > > > maybe Ada, but not LISP.
> > > > > > >
> > > > > > > Also, LISP has so many "("'s and ")"'s, that it's making
> > > > > > > people going crazy ;-)
> > > > > > >
> > > > > > > Ciao,
> > > > > > >    Oliver
> > > > >
> > > > > ______________________________________________
> > > > > [hidden email] mailing list
> > > > > https://stat.ethz.ch/mailman/listinfo/r-devel

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

R's copying of arguments (Re: Julia)

Oliver-3
In reply to this post by William Dunlap
Hello,

regarding the copying issue,
I would like to point to the

"Writing R Extensions" documentation.

There it is mentioned that functions of extensions
that use the .C interface normally do get their arguments
pre-copied...


In section 5.2:

  "There can be up to 65 further arguments giving R objects to be
  passed to compiled code. Normally these are copied before being
  passed in, and copied again to an R list object when the compiled
  code returns."

But for the .Call and .External interfaces this is NOT the case.



In section 5.9:
  "The .Call and .External interfaces allow much more control, but
  they also impose much greater responsibilities so need to be used
  with care. Neither .Call nor .External copy their arguments. You
  should treat arguments you receive through these interfaces as
  read-only."


Why is read-only preferred?

Please, see the discussion in section 5.9.10.

It's mentioned there that copying an object in the R language
does not necessarily make a real copy of that object; instead,
just a "reference" to the real data is created (two names
referring to one block of data). That's typical functional
programming: not a variable, but a name (and possibly more than one
name) bound to an object.


Of course, if you change the original named value without
copying it first, then both names
would refer to the changed data.
Of course, that is not what is wanted.
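
The name-binding behaviour described above can be seen from R code alone. The following is a minimal sketch, not taken from the manual; the names x, y, z and set_first are made up for illustration. It shows that assignment only binds a second name, and that the caller's data stays intact once one of the names is modified.

```r
# Two names bound to one vector; R copies only when one of them is modified.
x <- runif(5)   # values lie in [0, 1)
y <- x          # no data is copied here; y is another name for the same vector
y[1] <- -1      # modifying y triggers the copy, so x keeps its original values

stopifnot(x[1] >= 0, y[1] == -1, identical(x[-1], y[-1]))

# A function argument behaves the same way: the caller's object is untouched.
set_first <- function(v) {
  v[1] <- -1    # modifies the function's own copy
  v
}
z <- set_first(x)
stopifnot(x[1] >= 0, z[1] == -1)
```

Compiled code called through .Call bypasses this protection, which is why the manual asks that arguments be treated as read-only.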

But what you can also see in section 5.9.10 is that
there already is a mechanism (reference counting) that allows
one to distinguish between unnamed and named objects.

So, this directly addresses the points you have mentioned in your
examples.

So, at least in principle, R allows in-place modification
of objects with the .Call interface.

You seem to refer to the .C interface, and I had explored the .Call
interface. That may be the reason why you insist on "it's always
copied", and why I wondered what you were talking about, because the
.Call interface allowed me a rather C-like raw style of programming
(and lets its user decide whether copying is done or not).

The mechanism to decide whether copying should be done or not
is also mentioned in section 5.9.10: the NAMED and SET_NAMED macros.
With NAMED you can get the number of references.

But later in that section it is mentioned that, at least for now,
NAMED always returns the value 2.


  "Currently all arguments to a .Call call will have NAMED set to 2,
  and so users must assume that they need to be duplicated before
  alteration."
               (section 5.9.10, last sentence)


So, in-place modification can already be done, with the .Call
interface for example. But deciding whether it is safe or not
is not supported at the moment.

So the situation is somewhere between: "it is possible" and
"R does not support a safe decision whether what is possible can
also be recommended".
At the moment R rather discourages in-place modification by default
(the safe way, and I agree with this default).

But it's not true, that R in general copies arguments.

But this seems to be true for the .C interface.

Maybe a lot of performance-/memory-problems can be solved
by rewriting already existing packages, by providing them
via .Call instead of .C.


Ciao,
   Oliver




On Tue, Mar 06, 2012 at 04:44:49PM +0000, William Dunlap wrote:

> S (and its derivatives and successors) promises that functions
> will not change their arguments, so in an expression like
>    val <- func(arg)
> you know that arg will not be changed.  You can
> do that by having func copy arg before doing anything,
> but that uses space and time that you want to conserve.
> If arg is not a named item in any environment then it
> should be fine to write over the original because there
> is no way the caller can detect that shortcut.  E.g., in
>     cx <- cos(runif(n))
> the cos function does not need to allocate new space for
> its output, it can just write over its input because, without
> a name attached to it, the caller has no way of looking
> at what runif(n) returned.  If you did
>     x <- runif(n)
>     cx <- cos(x)
> then cos would have to allocate new space for its output
> because overwriting its input would affect a subsequent
>     sum(x)
> I suppose that end-users and function-writers could learn
> to live with having to decide when to copy, but not having
> to make that decision makes S more pleasant (and safer) to use.
> I think that is a major reason that people are able to
> share S code so easily.
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
> > -----Original Message-----
> > From: oliver [mailto:[hidden email]]
> > Sent: Tuesday, March 06, 2012 1:12 AM
> > To: William Dunlap
> > Cc: Hervé Pagès; R-devel
> > Subject: Re: [Rd] Julia
> >
> > On Tue, Mar 06, 2012 at 12:35:32AM +0000, William Dunlap wrote:
> > [...]
> > > I find R's (& S+'s & S's) copy-on-write-if-not-copying-would-be-discoverable-
> > > by-the-uer machanism for giving the allusion of pass-by-value a good way
> > > to structure the contract between the function writer and the function user.
> > [...]
> >
> >
> > Can you elaborate more on this,
> > especially on the ...-...-...-if-not-copying-would-be-discoverable-by-the-uer
> > stuff?
> >
> > What do you mean with discoverability of not-copying?
> >
> > Ciao,
> >    Oliver


Re: R's copying of arguments (Re: Julia)

Hervé Pagès
Hi Oliver,

On 03/17/2012 08:35 AM, oliver wrote:

> Maybe a lot of performance-/memory-problems can be solved
> by rewriting already existing packages, by providing them
> via .Call instead of .C.

My understanding is that most packages use the .C interface
because it's simpler to deal with and because they don't need
to pass complicated objects at the C level, just atomic vectors.
My guess is that it's probably rarely the case that the cost
of copying the arguments passed to .C is significant, but,
if that was the case, then they could always call .C() with
DUP=FALSE. However, using DUP=FALSE is dangerous (see Warning
section in the man page).

No need to switch to .Call

Cheers,
H.



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: [hidden email]
Phone:  (206) 667-5791
Fax:    (206) 667-1319


Re: R's copying of arguments (Re: Julia)

Oliver-3
On Tue, Mar 20, 2012 at 12:08:12PM -0700, Hervé Pagès wrote:
[...]

> >So the situation is somewhere between: "it is possible" and
> >"R does not support a safe decision if, what is possible, also
> >can be recommended".
> >At the moment R rather deprecates in-place modification by default
> >(the save way, and I agree with this default).
> >
> >But it's not true, that R in general copies arguments.
> >
> >But this seems to be true for the .C interface.
> >
> >Maybe a lot of performance-/memory-problems can be solved
> >by rewriting already existing packages, by providing them
> >via .Call instead of .C.
>
> My understanding is that most packages use the .C interface
> because it's simpler to deal with and because they don't need
> to pass complicated objects at the C level, just atomic vectors.
> My guess is that it's probably rarely the case that the cost
> of copying the arguments passed to .C is significant, but,
> if that was the case, then they could always call .C() with
> DUP=FALSE. However, using DUP=FALSE is dangerous (see Warning
> section in the man page).
[...]

Yes. I have seen that (DUP=FALSE) in the docs, but while I was
writing the answer like a maniac, I forgot it. ;-)

Thanks for mentioning it.

The manual also mentions that .Call allows more control.
I did not look at .C and used .Call from the beginning.
It did not look very complicated. But maybe .C would be much easier.
Don't know.


>
> No need to switch to .Call

OK, at least not as far as the DUP argument is concerned.
But it seems to me that when the NAMED result is later
set correctly to 0, 1 and 2, such optimisations,
which were asked for, could be done "automagically".
And safely, too.

The .C interface with the DUP argument does not seem to allow this.
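
Whether R duplicates a vector can be watched from R code with tracemem(). What follows is only a sketch under assumptions: it relies on an R build with memory profiling enabled (checked via capabilities("profmem")), the names x and y are illustrative, and whether a copy is reported for the first assignment depends on the R version and front end.

```r
# Sketch: watch for copies with tracemem(), if this R build supports it.
x <- numeric(1e6)
if (capabilities("profmem")) {
  tracemem(x)     # from now on, R reports when this vector is duplicated
  x[1] <- 1       # only one name refers to the data: typically no copy
  y <- x          # a second name is bound to the same data
  x[2] <- 2       # now a copy is needed; tracemem typically prints a message
  untracemem(x)
} else {
  # Same steps without tracing, so the values below hold either way.
  x[1] <- 1
  y <- x
  x[2] <- 2
}

# Regardless of when copies happen, the visible values are always the same.
stopifnot(x[2] == 2, y[2] == 0, y[1] == 1)
```

The final check holds with or without tracing, because the bookkeeping only changes when a copy is made, never what the caller sees.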


Ciao,
   Oliver


Re: R's copying of arguments (Re: Julia)

Simon Urbanek
In reply to this post by Hervé Pagès

On Mar 20, 2012, at 3:08 PM, Hervé Pagès wrote:

> Hi Oliver,
>
> On 03/17/2012 08:35 AM, oliver wrote:
>> Maybe a lot of performance-/memory-problems can be solved
>> by rewriting already existing packages, by providing them
>> via .Call instead of .C.
>
> My understanding is that most packages use the .C interface
> because it's simpler to deal with and because they don't need
> to pass complicated objects at the C level, just atomic vectors.
> My guess is that it's probably rarely the case that the cost
> of copying the arguments passed to .C is significant, but,
> if that was the case, then they could always call .C() with
> DUP=FALSE. However, using DUP=FALSE is dangerous (see Warning
> section in the man page).
>
> No need to switch to .Call
>

I strongly disagree. I'm appalled to see that sentence here. The overhead is significant for any large vector and it is in particular unnecessary since in .C you have to allocate *and copy* space even for results (twice!). Also it is very error-prone, because you have no information about the length of vectors so it's easy to run out of bounds and there is no way to check. IMHO .C should not be used for any code written in this century (the only exception may be if you are passing no data, e.g. if all you do is to pass a flag and expect no result, you can get away with it even if it is more dangerous). It is a legacy interface that dates way back and is essentially just re-named .Fortran interface. Again, I would strongly recommend the use of .Call in any recent code because it is safer and more efficient (if you don't care about either attribute, well, feel free ;)).

Cheers,
Simon







Re: R's copying of arguments (Re: Julia)

Hervé Pagès
On 03/21/2012 06:23 PM, Simon Urbanek wrote:

>
> On Mar 20, 2012, at 3:08 PM, Hervé Pagès wrote:
>
>> Hi Oliver,
>>
>> On 03/17/2012 08:35 AM, oliver wrote:
>>> Hello,
>>>
>>> regarding the copying issue,
>>> I would like to point to the
>>>
>>> "Writing R-Extensions" documentation.
>>>
>>> There it is mentio9ned, that functions of extensions
>>> that use the .C interface normally do get their arguments
>>> pre-copied...
>>>
>>>
>>> In section 5.2:
>>>
>>>    "There can be up to 65 further arguments giving R objects to be
>>>    passed to compiled code. Normally these are copied before being
>>>    passed in, and copied again to an R list object when the compiled
>>>    code returns."
>>>
>>> But for the .Call and .Extension interfaces this is NOT the case.
>>>
>>>
>>>
>>> In section 5.9:
>>>    "The .Call and .External interfaces allow much more control, but
>>>    they also impose much greater responsibilities so need to be used
>>>    with care. Neither .Call nor .External copy their arguments. You
>>>    should treat arguments you receive through these interfaces as
>>>    read-only."
>>>
>>>
>>> Why is read-only preferred?
>>>
>>> Please, see the discussion in section 5.9.10.
>>>
>>> It's mentioned there, that a copy of an object in the R-language
>>> not necessarily doies a real copy of that object, but instead of
>>> this, just a "rerference" to the real data is created (two names
>>> referring to one bulk of data). That's typical functional
>>> programming: not a variable, but a name (and possibly more than one
>>> name) bound to an object.
>>>
>>>
>>> Of course, if yo change the orgiginal named value, when there
>>> would be no copy of it, before changing it, then both names
>>> would refer to the changed data.
>>> of course that is not, what is wanted.
>>>
>>> But what you also can see in section 5.9.10 is, that
>>> there already is a mechanism (reference counting) that allows
>>> to distinguish between unnamed and named object.
>>>
>>> So, this is directly adressing the points you have mentioned in your
>>> examples.
>>>
>>> So, at least in principial, R allows to do in-place modifications
>>> of object with the .Call interface.
>>>
>>> You seem to refer to the .C interface, and I had explored the .Call
>>> interface. That's the reason why you may insist on "it's always
>>> copied" and why I wondered what you were talking about, because the
>>> .Call interface allowed me a rather C-like raw style of programming
>>> (and lets its user decide whether copying will be done or not).
>>>
>>> The mechanism to decide whether copying should be done or not
>>> is also mentioned in section 5.9.10: the NAMED and SET_NAMED macros.
>>> With NAMED you can get the number of references.
>>>
>>> But later in that section it is mentioned that, at least for now,
>>> NAMED always returns the value 2.
>>>
>>>
>>>    "Currently all arguments to a .Call call will have NAMED set to 2,
>>>    and so users must assume that they need to be duplicated before
>>>    alteration."
>>>                 (section 5.9.10, last sentence)
>>>
>>>
>>> So, in-place modification can already be done with the .Call
>>> interface, for example. But deciding whether it is safe or not
>>> is not supported at the moment.
>>>
>>> So the situation is somewhere between "it is possible" and
>>> "R does not support a safe decision about whether what is possible
>>> can also be recommended".
>>> At the moment R rather discourages in-place modification by default
>>> (the safe way, and I agree with this default).
>>>
>>> But it's not true that R in general copies arguments.
>>>
>>> But this seems to be true for the .C interface.
>>>
>>> Maybe a lot of performance/memory problems can be solved
>>> by rewriting already existing packages to provide them
>>> via .Call instead of .C.
>>
>> My understanding is that most packages use the .C interface
>> because it's simpler to deal with and because they don't need
>> to pass complicated objects at the C level, just atomic vectors.
>> My guess is that it's probably rarely the case that the cost
>> of copying the arguments passed to .C is significant, but,
>> if that was the case, then they could always call .C() with
>> DUP=FALSE. However, using DUP=FALSE is dangerous (see Warning
>> section in the man page).
>>
>> No need to switch to .Call
>>
>
> I strongly disagree. I'm appalled to see that sentence here.

Come on!

> The overhead is significant for any large vector and it is in particular unnecessary since in .C you have to allocate *and copy* space even for results (twice!). Also it is very error-prone, because you have no information about the length of vectors so it's easy to run out of bounds and there is no way to check. IMHO .C should not be used for any code written in this century (the only exception may be if you are passing no data, e.g. if all you do is to pass a flag and expect no result, you can get away with it even if it is more dangerous). It is a legacy interface that dates way back and is essentially just re-named .Fortran interface. Again, I would strongly recommend the use of .Call in any recent code because it is safer and more efficient (if you don't care about either attribute, well, feel free ;)).

So aleph will not support the .C interface? ;-)

H.

>
> Cheers,
> Simon
>
>
>
>
>
>
>> Cheers,
>> H.
>>
>>>
>>>
>>> Ciao,
>>>     Oliver
>>>
>>>
>>>
>>>
>>> On Tue, Mar 06, 2012 at 04:44:49PM +0000, William Dunlap wrote:
>>>> S (and its derivatives and successors) promises that functions
>>>> will not change their arguments, so in an expression like
>>>>     val <- func(arg)
>>>> you know that arg will not be changed.  You can
>>>> do that by having func copy arg before doing anything,
>>>> but that uses space and time that you want to conserve.
>>>> If arg is not a named item in any environment then it
>>>> should be fine to write over the original because there
>>>> is no way the caller can detect that shortcut.  E.g., in
>>>>      cx <- cos(runif(n))
>>>> the cos function does not need to allocate new space for
>>>> its output, it can just write over its input because, without
>>>> a name attached to it, the caller has no way of looking
>>>> at what runif(n) returned.  If you did
>>>>      x <- runif(n)
>>>>      cx <- cos(x)
>>>> then cos would have to allocate new space for its output
>>>> because overwriting its input would affect a subsequent
>>>>      sum(x)
>>>> I suppose that end-users and function-writers could learn
>>>> to live with having to decide when to copy, but not having
>>>> to make that decision makes S more pleasant (and safer) to use.
>>>> I think that is a major reason that people are able to
>>>> share S code so easily.
>>>>
>>>> Bill Dunlap
>>>> Spotfire, TIBCO Software
>>>> wdunlap tibco.com
>>>>
>>>>> -----Original Message-----
>>>>> From: oliver [mailto:[hidden email]]
>>>>> Sent: Tuesday, March 06, 2012 1:12 AM
>>>>> To: William Dunlap
>>>>> Cc: Hervé Pagès; R-devel
>>>>> Subject: Re: [Rd] Julia
>>>>>
>>>>> On Tue, Mar 06, 2012 at 12:35:32AM +0000, William Dunlap wrote:
>>>>> [...]
>>>>>> I find R's (& S+'s & S's) copy-on-write-if-not-copying-would-be-discoverable-
>>>>>> by-the-user mechanism for giving the illusion of pass-by-value a good way
>>>>>> to structure the contract between the function writer and the function user.
>>>>> [...]
>>>>>
>>>>>
>>>>> Can you elaborate on this,
>>>>> especially on the ...-...-...-if-not-copying-would-be-discoverable-by-the-user
>>>>> stuff?
>>>>>
>>>>> What do you mean by discoverability of not-copying?
>>>>>
>>>>> Ciao,
>>>>>     Oliver


--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: [hidden email]
Phone:  (206) 667-5791
Fax:    (206) 667-1319

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
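
The "duplicate before alteration" rule discussed in this message can be sketched in plain C. This is only a toy model under stated assumptions, not R's API: toy_vec, toy_alloc, toy_duplicate and toy_scale are hypothetical names, and the `named` field merely imitates the idea behind NAMED (0 meaning "no name is bound to this data", anything else meaning "possibly shared"). It does not use or reproduce R's SEXP internals.

```c
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for an R numeric vector (hypothetical, NOT R's SEXP). */
typedef struct {
    int named;       /* 0: no name bound to it; >0: possibly shared */
    size_t length;   /* number of elements */
    double *data;    /* element storage */
} toy_vec;

/* Allocate a zero-filled vector that starts out unnamed. */
toy_vec *toy_alloc(size_t n)
{
    toy_vec *v = malloc(sizeof *v);
    if (v == NULL)
        return NULL;
    v->data = calloc(n ? n : 1, sizeof *v->data);
    if (v->data == NULL) {
        free(v);
        return NULL;
    }
    v->named = 0;
    v->length = n;
    return v;
}

/* Make a fresh, unnamed copy of src. */
toy_vec *toy_duplicate(const toy_vec *src)
{
    toy_vec *copy = toy_alloc(src->length);
    if (copy == NULL)
        return NULL;
    memcpy(copy->data, src->data, src->length * sizeof *src->data);
    return copy;
}

/* Multiply every element by factor.
 * If the input is unnamed, nobody else can observe it, so its storage
 * is overwritten in place. Otherwise the input is duplicated first and
 * the caller's data stays untouched. */
toy_vec *toy_scale(toy_vec *x, double factor)
{
    toy_vec *out = (x->named == 0) ? x : toy_duplicate(x);
    if (out == NULL)
        return NULL;
    for (size_t i = 0; i < out->length; i++)
        out->data[i] *= factor;
    return out;
}
```

In this model, a vector marked as named comes back as a new object and keeps its old values, while an unnamed temporary is reused. With every argument marked as possibly shared, as the manual passage quoted above describes for .Call, the duplicate branch is the only safe one.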

Re: R's copying of arguments (Re: Julia)

Simon Urbanek

On Mar 21, 2012, at 9:31 PM, Hervé Pagès wrote:

> On 03/21/2012 06:23 PM, Simon Urbanek wrote:
>>
>> On Mar 20, 2012, at 3:08 PM, Hervé Pagès wrote:
>>
>>> Hi Oliver,
>>>
>>> On 03/17/2012 08:35 AM, oliver wrote:
>>>> Hello,
>>>>
>>>> regarding the copying issue,
>>>> I would like to point to the
>>>>
>>>> "Writing R-Extensions" documentation.
>>>>
>>>> There it is mentioned that functions of extensions
>>>> that use the .C interface normally do get their arguments
>>>> pre-copied...
>>>>
>>>>
>>>> In section 5.2:
>>>>
>>>>   "There can be up to 65 further arguments giving R objects to be
>>>>   passed to compiled code. Normally these are copied before being
>>>>   passed in, and copied again to an R list object when the compiled
>>>>   code returns."
>>>>
>>>> [...]
>>>
>>> My understanding is that most packages use the .C interface
>>> because it's simpler to deal with and because they don't need
>>> to pass complicated objects at the C level, just atomic vectors.
>>> My guess is that it's probably rarely the case that the cost
>>> of copying the arguments passed to .C is significant, but,
>>> if that was the case, then they could always call .C() with
>>> DUP=FALSE. However, using DUP=FALSE is dangerous (see Warning
>>> section in the man page).
>>>
>>> No need to switch to .Call
>>>
>>
>> I strongly disagree. I'm appalled to see that sentence here.
>
> Come on!
>
>> The overhead is significant for any large vector and it is in particular unnecessary since in .C you have to allocate *and copy* space even for results (twice!). Also it is very error-prone, because you have no information about the length of vectors so it's easy to run out of bounds and there is no way to check. IMHO .C should not be used for any code written in this century (the only exception may be if you are passing no data, e.g. if all you do is to pass a flag and expect no result, you can get away with it even if it is more dangerous). It is a legacy interface that dates way back and is essentially just re-named .Fortran interface. Again, I would strongly recommend the use of .Call in any recent code because it is safer and more efficient (if you don't care about either attribute, well, feel free ;)).
>
> So aleph will not support the .C interface? ;-)
>

It will look at the timestamp of the source file and delete the package if it is not before 1980 ;). Otherwise it will send a request for punch cards with ".C is deprecated, please upgrade to .Call" stamped out :P At that point I'll be flaming about using the native Aleph interface and not the R compatibility layer ;)

Cheers,
S




______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
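
The point made in this exchange about missing length information can be illustrated with a small self-contained C sketch. It uses nothing from R's headers: dotc_style_sum only imitates the .C calling convention (void return type, every argument a pointer), while sized_vec and sized_sum_first are hypothetical names for a vector that carries its own length, loosely in the spirit of what code called through .Call can query from an R object.

```c
#include <stddef.h>

/* .C-style convention: the callee receives bare pointers and has to
 * trust the separately passed length *n. If *n is larger than the real
 * buffer, the loop reads out of bounds and nothing here can detect it. */
void dotc_style_sum(double *x, int *n, double *result)
{
    double total = 0.0;
    for (int i = 0; i < *n; i++)
        total += x[i];
    *result = total;
}

/* Hypothetical vector that carries its own length (not an R type). */
typedef struct {
    size_t length;
    const double *data;
} sized_vec;

/* Sum the first count elements.
 * Returns 0 on success, or -1 (leaving *result untouched) when count
 * exceeds the vector's own length, so the overrun is caught. */
int sized_sum_first(const sized_vec *v, size_t count, double *result)
{
    if (count > v->length)
        return -1;
    double total = 0.0;
    for (size_t i = 0; i < count; i++)
        total += v->data[i];
    *result = total;
    return 0;
}
```

The second function can refuse a bad request because the length travels with the data; the first cannot, which is the error-proneness described above.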