To improve my understanding of workspaces

To improve my understanding of workspaces

Kevin E. Thorpe
Hello.

I have grown accustomed to the .Data directory in S-Plus and so when
I came to R I continued that behaviour by saving my workspaces at
the end of each R session.  So, I have saved workspaces in various
directories where I have used R just as I would have had various
.Data directories where I had used S-Plus.

I have seen comments on the list, most recently from Prof. Ripley
that they don't routinely save their workspaces in this way.
So my questions are:

   1. What do people do instead to manage projects?
   2. Is there an "official" recommendation?

 From my reading I have learned that you can save data frames
(and other objects?) to disk and then attach them.  Does this
save memory?  If I have read correctly, I understand that
everything in the workspace is in memory, but haven't been able
to determine if objects in the search path are as well.

Kind Regards,

Kevin

--
Kevin E. Thorpe
Biostatistician/Trialist, Knowledge Translation Program
Assistant Professor, Department of Public Health Sciences
Faculty of Medicine, University of Toronto
email: [hidden email]  Tel: 416.946.8081  Fax: 416.946.3297

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: To improve my understanding of workspaces

Adaikalavan Ramasamy
I use emacs and ESS to develop the scripts. The new releases of R have
the script function already built in.

Typically I keep all the data and scripts related to a project in its
own folder, so I have minimal worry about paths.

To save large and associated objects, I use
   save(x, y, z, file="lala.rda", compress=TRUE)
and then to load x, y, z in another session or workspace I use
   load("lala.rda")

To save small dataframes and matrices, I use
   write.table(mat, file="lala.txt", sep="\t")
and to read it back I use
   mat <- read.delim(file="lala.txt", row.names=1)
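
A minimal sketch of the two round trips above, assuming made-up example
objects (x, y, z, mat) and using temporary files in place of lala.rda
and lala.txt:

```r
# Hypothetical example objects; any R objects would do.
x <- 1:10
y <- letters[1:3]
z <- list(a = 1, b = "two")

# Binary round trip: save several named objects into one file.
rda <- file.path(tempdir(), "lala.rda")
save(x, y, z, file = rda, compress = TRUE)

# Remove the originals, then restore them under their original names.
rm(x, y, z)
restored <- load(rda)   # load() returns the names of the objects it created
print(restored)

# Text round trip for a small matrix with row and column names.
mat <- matrix(1:6, nrow = 2,
              dimnames = list(c("r1", "r2"), c("A", "B", "C")))
txt <- file.path(tempdir(), "lala.txt")
write.table(mat, file = txt, sep = "\t")

# read.delim() returns a data frame, so convert back to a matrix.
mat2 <- as.matrix(read.delim(file = txt, row.names = 1))
print(mat2)
```

Note that load() puts the objects back under the names they were saved
with, whereas read.delim() lets you choose the name on assignment.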


The problem with .RData (via quit or save.image) is that it keeps all
intermediate objects, which can leave it unnecessarily bloated and
confusing. Further, you will have difficulty distinguishing one .RData
from another by looking at the filename alone.

Regards, Adai




Re: To improve my understanding of workspaces

Duncan Murdoch
Other than Emacs, I use the same work habits as Adai.  An advantage of
this workflow is that almost everything is stored in text format, so it
is easy to compare different versions to see what has changed, and it
works very well with version control (I use Subversion).

The only thing I'd add to his recommendation is that you be sure to save
the scripts that produced the objects in the binary images (his
"lala.rda"), so that they can be reconstructed if necessary.  As long as
the reconstruction isn't too difficult, this means I don't need to
bother to save them in Subversion.
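
One way to sketch this idea, under stated assumptions: the helper
load_or_rebuild(), the file names, and the toy lm() fit below are all
hypothetical and only illustrate keeping the generating script as the
thing of record while treating the .rda file as a disposable cache.

```r
# Hypothetical helper: if the binary image is missing, re-run the script
# that produces it, then load the objects into the caller's environment.
load_or_rebuild <- function(rda, script, envir = parent.frame()) {
  if (!file.exists(rda)) {
    # The script is expected to finish by calling save(..., file = rda).
    source(script, local = new.env(parent = globalenv()))
  }
  load(rda, envir = envir)
}

rda    <- file.path(tempdir(), "fit.rda")
script <- file.path(tempdir(), "make_fit.R")

# Stand-in for a real analysis script kept under version control.
writeLines(c(
  "fit <- lm(dist ~ speed, data = cars)",
  sprintf("save(fit, file = %s)", deparse(rda))
), script)

unlink(rda)   # simulate a fresh checkout that has no binary image yet
loaded <- load_or_rebuild(rda, script)
print(loaded)
```

Only make_fit.R would need to go into version control in this
arrangement; fit.rda can be regenerated on demand.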

Duncan Murdoch





Re: To improve my understanding of workspaces

Kevin E. Thorpe
In reply to this post by Adaikalavan Ramasamy
Thanks Adai.  A couple questions/comments about this.

Adaikalavan Ramasamy wrote:
> I use emacs and ESS to develop the scripts. The new releases of R has
> the script function already in built.

I use emacs and ESS too (in Linux).  I do not know about the script
function you mention.  It's not in my version (2.1.1) and I couldn't
find it in an RSiteSearch either.

> Typically I keep all the data and scripts related to a project in its
> own folder, so I have minimal worry about paths.

I do the same.

> To save large and associated objects, I use
>    save(x, y, z, file="lala.rda", compress=TRUE)
> and then to load x, y, z in another session or workspace I use
>    load("lala.rda")
>
> To save small dataframes and matrices, I use
>    write.table(mat, file="lala.txt", sep="\t")
> and to read it back I use
>    mat <- read.delim(file="lala.txt", row.names=1)

Am I correct that load() or read.<whatever>() or even data() will
bring the objects into the current workspace while attach() can
attach a save() data frame to the search path?  Is one approach
better than the other in general?

>
> The problem with .RData (via quit or save.image), is that it keeps all
> intermediate objects which can be unnecessarily bloated and confusing.
> Further you will have difficulty distinguishing one .RData from the
> other by looking at the filename alone.

If you don't save the workspace on q(), do you also lose the history for
that session (although when working in emacs, this is rarely a problem)?

> Regards, Adai

Thanks again,

Kevin

--
Kevin E. Thorpe
Biostatistician/Trialist, Knowledge Translation Program
Assistant Professor, Department of Public Health Sciences
Faculty of Medicine, University of Toronto
email: [hidden email]  Tel: 416.946.8081  Fax: 416.946.3297


Re: To improve my understanding of workspaces

Sean Davis
In reply to this post by Duncan Murdoch


On 3/10/06 8:33 AM, "Duncan Murdoch" <[hidden email]> wrote:

> Other than Emacs, I use the same work habits as Adai.  An advantage of
> this workflow is that almost everything is stored in text format, so it
> is easy to compare different versions to see what has changed, and it
> works very well with version control (I use Subversion).
>
> The only thing I'd add to his recommendation is that you be sure to save
> the scripts that produced the objects in the binary images (his
> "lala.rda"), so that they can be reconstructed if necessary.  As long as
> the reconstruction isn't too difficult, this means I don't need to
> bother to save them in Subversion.

I would add a bit of detail here about what I do.  ESS/xemacs allows one to create
a transcript file that you can then step through, executing each command as
it was originally executed.  I make one of these transcript files for each
project and save it with the data and any scripts that I have for the
project.  So, in the end, I have a set of Rda files, one or more transcript
files, and a Src directory that contains any function code (and ESS supports
saving scripts to this directory automatically).

Sean


Re: To improve my understanding of workspaces

Adaikalavan Ramasamy
In reply to this post by Kevin E. Thorpe
Much of programming style is personal choice and as such varies from
individual to individual. See my comments below.

On Fri, 2006-03-10 at 09:01 -0500, Kevin E. Thorpe wrote:
> Thanks Adai.  A couple questions/comments about this.
>
> Adaikalavan Ramasamy wrote:
> > I use emacs and ESS to develop the scripts. The new releases of R has
> > the script function already in built.
>
> I use emacs and ESS too (in Linux).  I do not know about the script
> function you mention.  It's not in my version (2.1.1) and I couldn't
> find it in an RSiteSearch either.

I meant to say that newer releases of R _for Windows only_ have the
script function. Look under File->New script (untested). However, it
does not appear to have the syntax highlighting or auto-indenting that
emacs has.


> > Typically I keep all the data and scripts related to a project in its
> > own folder, so I have minimal worry about paths.
>
> I do the same.
>
> > To save large and associated objects, I use
> >    save(x, y, z, file="lala.rda", compress=TRUE)
> > and then to load x, y, z in another session or workspace I use
> >    load("lala.rda")
> >
> > To save small dataframes and matrices, I use
> >    write.table(mat, file="lala.txt", sep="\t")
> > and to read it back I use
> >    mat <- read.delim(file="lala.txt", row.names=1)
>
> Am I correct that load() or read.<whatever>() or even data() will
> bring the objects into the current workspace while attach() can
> attach a save() data frame to the search path?  Is one approach
> better than the other in general?

I think you are correct.

The attach function appears to have two uses now:
 a) attach("lala.rda") loads objects from lala.rda into the search path
 b) attach(obj) makes the named columns of a dataframe or list available
in the search path. Therefore you only need to type 'aaa' instead of
obj$aaa or obj[ , "aaa"]

The second is the more popular form of usage.

Personally I would rather not use attach() and prefer to type obj$aaa or
use in the context of lm( aaa ~ ., data=obj ).
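
For illustration, a small sketch of form (b) next to the alternatives
that avoid attach(); the data frame obj and its columns aaa and bbb are
made-up values:

```r
# Hypothetical data frame standing in for 'obj'.
obj <- data.frame(aaa = c(1.2, 2.3, 3.1, 4.8), bbb = c(2, 4, 5, 9))

# Explicit references: nothing is added to the search path.
m1  <- mean(obj$aaa)
m2  <- with(obj, mean(aaa))
fit <- lm(aaa ~ ., data = obj)

# attach() places a copy of the columns on the search path, so 'aaa'
# resolves without the obj$ prefix until the data frame is detached.
attach(obj)
m3 <- mean(aaa)
detach(obj)

print(c(m1, m2, m3))
```

Because attach() works on a copy, later changes to obj are not seen
through the attached names, which is one reason the explicit forms are
often preferred.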



> > The problem with .RData (via quit or save.image), is that it keeps all
> > intermediate objects which can be unnecessarily bloated and confusing.
> > Further you will have difficulty distinguishing one .RData from the
> > other by looking at the filename alone.
>
> If you don't save the workspace on q(), do you also lose the history for
> that session (although when working in emacs, this is rarely a problem)?

I would argue that a script file is better than a history file
because I can clean up any test or erroneous code I might have in the
script file.


However if you prefer to save the history, you can use
savehistory(file="history.txt") at any point

Regards, Adai

<SNIP>


Re: To improve my understanding of workspaces

Thomas Lumley
On Fri, 10 Mar 2006, Adaikalavan Ramasamy wrote:
>
> The attach function appears to have two functions now :

Since R 1.1.0, in fact.

> a) attach("lala.rda") loads objects from lala.rda into the search path
> b) attach(obj) makes the named columns of a dataframe or list available
> in the search path. Therefore you only need to type 'aaa' instead of
> obj$aaa or obj[ , "aaa"]
>
> The second is the more popular form of usage.
>
> Personally I would rather not use attach() and prefer to type obj$aaa or
> use in the context of lm( aaa ~ ., data=obj ).

This distinction is relevant only to the second syntax for attach.
Attaching an .rda file is more like loading a package -- it makes the
whole object available, and is very similar to attach() in S-PLUS.
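
A rough sketch of the file form, assuming a made-up data frame named big
and a temporary file in place of lala.rda; the search-path entry name
"lala" is also just an illustrative choice:

```r
# Hypothetical object saved to a temporary .rda file.
big <- data.frame(id = 1:5, value = (1:5)^2)
rda <- file.path(tempdir(), "lala.rda")
save(big, file = rda)
rm(big)

# Attach the file: its objects go into a new search-path entry,
# not into the workspace (the global environment).
attach(rda, name = "lala")

in_workspace <- exists("big", envir = globalenv(), inherits = FALSE)
on_search    <- exists("big")   # found by walking the search path
print(ls("lala"))

detach("lala")
```

The attached objects are still read into memory as far as I know; what
changes is where they live, so they stay out of ls() and out of any
workspace image saved on exit.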

  -thomas


Re: To improve my understanding of workspaces

Kevin E. Thorpe
In reply to this post by Sean Davis
Sean Davis wrote:

>
> On 3/10/06 8:33 AM, "Duncan Murdoch" <[hidden email]> wrote:
>
>
>>Other than Emacs, I use the same work habits as Adai.  An advantage of
>>this workflow is that almost everything is stored in text format, so it
>>is easy to compare different versions to see what has changed, and it
>>works very well with version control (I use Subversion).
>>
>>The only thing I'd add to his recommendation is that you be sure to save
>>the scripts that produced the objects in the binary images (his
>>"lala.rda"), so that they can be reconstructed if necessary.  As long as
>>the reconstruction isn't too difficult, this means I don't need to
>>bother to save them in Subversion.

Version control sounds like a good idea Duncan, but I've always been a
bit intimidated by it.  How cumbersome is Subversion and what are the
advantages of version control?

>
> I would add a bit of detail here that I do.  ESS/xemacs allows one to create
> a transcript file that you can then step through, executing each command as
> it was originally executed.  I make one of these transcript files for each
> project and save it with the data and any scripts that I have for the
> project.  So, in the end, I have a set of Rda files, one or more transcript
> files, and a Src directory that contains any function code (and ESS supports
> saving scripts to this directory automatically).

Do you save your functions in Rda files to be loaded/attached or are
they sourced every time?  How do you tell ESS/emacs to save in ./src or
is that only possible with xemacs (I can use emacs to do what I need to
but don't know lisp so the config files and terminology are a bit
cryptic to me)?

Kevin

--
Kevin E. Thorpe
Biostatistician/Trialist, Knowledge Translation Program
Assistant Professor, Department of Public Health Sciences
Faculty of Medicine, University of Toronto
email: [hidden email]  Tel: 416.946.8081  Fax: 416.946.3297


Re: To improve my understanding of workspaces

Sean Davis



On 3/10/06 1:53 PM, "Kevin E. Thorpe" <[hidden email]> wrote:

> Sean Davis wrote:
>>
>> On 3/10/06 8:33 AM, "Duncan Murdoch" <[hidden email]> wrote:
>>
>>
>>> Other than Emacs, I use the same work habits as Adai.  An advantage of
>>> this workflow is that almost everything is stored in text format, so it
>>> is easy to compare different versions to see what has changed, and it
>>> works very well with version control (I use Subversion).
>>>
>>> The only thing I'd add to his recommendation is that you be sure to save
>>> the scripts that produced the objects in the binary images (his
>>> "lala.rda"), so that they can be reconstructed if necessary.  As long as
>>> the reconstruction isn't too difficult, this means I don't need to
>>> bother to save them in Subversion.
>
> Version control sounds like a good idea Duncan, but I've always been a
> bit intimidated by it.  How cumbersome is Subversion and what are the
> advantages of version control?
>
>>
>> I would add a bit of detail here that I do.  ESS/xemacs allows one to create
>> a transcript file that you can then step through, executing each command as
>> it was originally executed.  I make one of these transcript files for each
>> project and save it with the data and any scripts that I have for the
>> project.  So, in the end, I have a set of Rda files, one or more transcript
>> files, and a Src directory that contains any function code (and ESS supports
>> saving scripts to this directory automatically).
>
> Do you save your functions in Rda files to be loaded/attached or are
> they sourced every time?  How do you tell ESS/emacs to save in ./src or
> is that only possible with xemacs (I can use emacs to do what I need to
> but don't know lisp so the config files and terminology are a bit
> cryptic to me)?

I tend to save as source for easier reading and sharing among projects.  I
should begin to use SVN for my smaller projects, but I haven't yet; only
packages meant for release or future release make it into SVN with me.  SVN
is quite easy to use and there is at least one emacs package that allows SVN
version control from within emacs (although I still do it from the
command line).

As for your second question:

(setq ess-source-directory
      (lambda ()
        (concat ess-directory "Src/")))

is what I use.


Re: To improve my understanding of workspaces

Duncan Murdoch
In reply to this post by Kevin E. Thorpe
On 3/10/2006 1:53 PM, Kevin E. Thorpe wrote:

> Sean Davis wrote:
>>
>> On 3/10/06 8:33 AM, "Duncan Murdoch" <[hidden email]> wrote:
>>
>>
>>>Other than Emacs, I use the same work habits as Adai.  An advantage of
>>>this workflow is that almost everything is stored in text format, so it
>>>is easy to compare different versions to see what has changed, and it
>>>works very well with version control (I use Subversion).
>>>
>>>The only thing I'd add to his recommendation is that you be sure to save
>>>the scripts that produced the objects in the binary images (his
>>>"lala.rda"), so that they can be reconstructed if necessary.  As long as
>>>the reconstruction isn't too difficult, this means I don't need to
>>>bother to save them in Subversion.
>
> Version control sounds like a good idea Duncan, but I've always been a
> bit intimidated by it.  How cumbersome is Subversion and what are the
> advantages of version control?

It needn't be very cumbersome after you've set it up, but the setup
would be a bit daunting if you haven't used it before.  If you can find
someone who has used it before to do the setup for you, you'll find it a
lot less intimidating.  I'd be happy to do this for you if you come to
London for the SSC meeting in May.  (This offer doesn't just apply to
Kevin, but he's more likely to come to that meeting than most of the
readers of this list.  If anyone else is interested, drop me a line
privately.  And remember that's London, Canada, not the other one.)

If you're working in Windows, use the TortoiseSVN front-end as well as
the command line tools.  I started with the command line tools but use
TSVN most of the time now.

I also recommend reading the O'Reilly book, Version Control with
Subversion.  It's available online at http://svnbook.red-bean.com/.

Duncan Murdoch

