Excel

classic Classic list List threaded Threaded
49 messages Options
123
Reply | Threaded
Open this post in threaded view
|

Excel

David Scott-6

A common process when data is obtained in an Excel spreadsheet is to save
the spreadsheet as a .csv file then read it into R. Experienced users
might have learned to be wary of dates (as I have) but possibly have not
experienced what just happened to me. I thought I might just share it with
r-help as a cautionary tale.

I received an Excel file giving patient details. Each patient had an ID
code in the form of three letters followed by four digits. (Actually a New
Zealand National Health Identification.) I saved the .xls file as .csv.
Then I opened up the .csv (with Excel) to look at it. In the column of ID
codes I saw: Aug-99. Clicking on that entry it showed 1/08/2699.

In a column of character data, Excel had interpreted AUG2699 as a date.

The .csv did not actually have a date in that cell, but if I had saved the
.csv file it would have.

David Scott

_________________________________________________________________
David Scott Department of Statistics, Tamaki Campus
  The University of Auckland, PB 92019
  Auckland 1142,    NEW ZEALAND
Phone: +64 9 373 7599 ext 86830 Fax: +64 9 373 7000
Email: [hidden email]

Graduate Officer, Department of Statistics
Director of Consulting, Department of Statistics

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Excel

Moshe Olshansky-2
This is very consistent with Microsoft's philosophy:
they know better than you what you want to do.

--- David Scott <[hidden email]> wrote:

>
> A common process when data is obtained in an Excel
> spreadsheet is to save
> the spreadsheet as a .csv file then read it into R.
> Experienced users
> might have learned to be wary of dates (as I have)
> but possibly have not
> experienced what just happened to me. I thought I
> might just share it with
> r-help as a cautionary tale.
>
> I received an Excel file giving patient details.
> Each patient had an ID
> code in the form of three letters followed by four
> digits. (Actually a New
> Zealand National Health Identification.) I saved the
> .xls file as .csv.
> Then I opened up the .csv (with Excel) to look at
> it. In the column of ID
> codes I saw: Aug-99. Clicking on that entry it
> showed 1/08/2699.
>
> In a column of character data, Excel had interpreted
> AUG2699 as a date.
>
> The .csv did not actually have a date in that cell,
> but if I had saved the
> .csv file it would have.
>
> David Scott
>
>
_________________________________________________________________

> David Scott Department of Statistics, Tamaki Campus
>   The University of Auckland, PB 92019
>   Auckland 1142,    NEW ZEALAND
> Phone: +64 9 373 7599 ext 86830 Fax: +64 9 373 7000
> Email: [hidden email]
>
> Graduate Officer, Department of Statistics
> Director of Consulting, Department of Statistics
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained,
> reproducible code.
>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Excel

Robert A LaBudde
In reply to this post by David Scott-6
If you format the column as "Text", you won't have this problem. By
leaving the cells as "General", you leave it up to Excel to guess at
the correct interpretation.

You will note that the conversion to a date occurs immediately in
Excel when you enter the value. There are many formats to enter dates.

Either pre-format the column as Text, or prefix the individual entry
with an ' to indicate text.

A similar problem occurs in R's read.table() function when a factor
has levels that can be interpreted as numbers.

At 10:11 PM 8/27/2007, David wrote:

>A common process when data is obtained in an Excel spreadsheet is to save
>the spreadsheet as a .csv file then read it into R. Experienced users
>might have learned to be wary of dates (as I have) but possibly have not
>experienced what just happened to me. I thought I might just share it with
>r-help as a cautionary tale.
>
>I received an Excel file giving patient details. Each patient had an ID
>code in the form of three letters followed by four digits. (Actually a New
>Zealand National Health Identification.) I saved the .xls file as .csv.
>Then I opened up the .csv (with Excel) to look at it. In the column of ID
>codes I saw: Aug-99. Clicking on that entry it showed 1/08/2699.
>
>In a column of character data, Excel had interpreted AUG2699 as a date.
>
>The .csv did not actually have a date in that cell, but if I had saved the
>.csv file it would have.
>
>David Scott

================================================================
Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: [hidden email]
Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
824 Timberlake Drive                     Tel: 757-467-0954
Virginia Beach, VA 23464-3239            Fax: 757-467-2947

"Vere scire est per causas scire"

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Excel

David Scott-6
On Tue, 28 Aug 2007, Robert A LaBudde wrote:

> If you format the column as "Text", you won't have this problem. By
> leaving the cells as "General", you leave it up to Excel to guess at
> the correct interpretation.
>

Not true actually. I had converted the column to Text because I saw the
interpretation as a date in the .xls file. I saved the .csv file *after*
the column had been converted to Text. Looking at the .csv file in a text
editor, the entry is correct.

I have just rechecked this.

On reopening the .csv using Excel, the entry AUG2699 had been interpreted
as a date, and was showing as Aug-99. Most bizarre is that the NHI value
of AUG1838 has *not* been interpreted as a date.

David Scott


> You will note that the conversion to a date occurs immediately in
> Excel when you enter the value. There are many formats to enter dates.
>
> Either pre-format the column as Text, or prefix the individual entry
> with an ' to indicate text.
>
> A similar problem occurs in R's read.table() function when a factor
> has levels that can be interpreted as numbers.
>
> At 10:11 PM 8/27/2007, David wrote:
>
>> A common process when data is obtained in an Excel spreadsheet is to save
>> the spreadsheet as a .csv file then read it into R. Experienced users
>> might have learned to be wary of dates (as I have) but possibly have not
>> experienced what just happened to me. I thought I might just share it with
>> r-help as a cautionary tale.
>>
>> I received an Excel file giving patient details. Each patient had an ID
>> code in the form of three letters followed by four digits. (Actually a New
>> Zealand National Health Identification.) I saved the .xls file as .csv.
>> Then I opened up the .csv (with Excel) to look at it. In the column of ID
>> codes I saw: Aug-99. Clicking on that entry it showed 1/08/2699.
>>
>> In a column of character data, Excel had interpreted AUG2699 as a date.
>>
>> The .csv did not actually have a date in that cell, but if I had saved the
>> .csv file it would have.
>>
>> David Scott
>
> ================================================================
> Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: [hidden email]
> Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
> 824 Timberlake Drive                     Tel: 757-467-0954
> Virginia Beach, VA 23464-3239            Fax: 757-467-2947
>
> "Vere scire est per causas scire"
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

_________________________________________________________________
David Scott Department of Statistics, Tamaki Campus
  The University of Auckland, PB 92019
  Auckland 1142,    NEW ZEALAND
Phone: +64 9 373 7599 ext 86830 Fax: +64 9 373 7000
Email: [hidden email]

Graduate Officer, Department of Statistics
Director of Consulting, Department of Statistics

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Excel

Moshe Olshansky-2
As far as I understand, changing the format changes
the way data is displayed by Excel but this does not
change the data itself - if while reading the data
Excel decided that it was a date, it is being
converted to an integer (the number of days since
January 1, 1900 - and they mistakenly think that 1900
was a leap year) and it is stored this way.

--- David Scott <[hidden email]> wrote:

> On Tue, 28 Aug 2007, Robert A LaBudde wrote:
>
> > If you format the column as "Text", you won't have
> this problem. By
> > leaving the cells as "General", you leave it up to
> Excel to guess at
> > the correct interpretation.
> >
>
> Not true actually. I had converted the column to
> Text because I saw the
> interpretation as a date in the .xls file. I saved
> the .csv file *after*
> the column had been converted to Text. Looking at
> the .csv file in a text
> editor, the entry is correct.
>
> I have just rechecked this.
>
> On reopening the .csv using Excel, the entry AUG2699
> had been interpreted
> as a date, and was showing as Aug-99. Most bizarre
> is that the NHI value
> of AUG1838 has *not* been interpreted as a date.
>
> David Scott
>
>
> > You will note that the conversion to a date occurs
> immediately in
> > Excel when you enter the value. There are many
> formats to enter dates.
> >
> > Either pre-format the column as Text, or prefix
> the individual entry
> > with an ' to indicate text.
> >
> > A similar problem occurs in R's read.table()
> function when a factor
> > has levels that can be interpreted as numbers.
> >
> > At 10:11 PM 8/27/2007, David wrote:
> >
> >> A common process when data is obtained in an
> Excel spreadsheet is to save
> >> the spreadsheet as a .csv file then read it into
> R. Experienced users
> >> might have learned to be wary of dates (as I
> have) but possibly have not
> >> experienced what just happened to me. I thought I
> might just share it with
> >> r-help as a cautionary tale.
> >>
> >> I received an Excel file giving patient details.
> Each patient had an ID
> >> code in the form of three letters followed by
> four digits. (Actually a New
> >> Zealand National Health Identification.) I saved
> the .xls file as .csv.
> >> Then I opened up the .csv (with Excel) to look at
> it. In the column of ID
> >> codes I saw: Aug-99. Clicking on that entry it
> showed 1/08/2699.
> >>
> >> In a column of character data, Excel had
> interpreted AUG2699 as a date.
> >>
> >> The .csv did not actually have a date in that
> cell, but if I had saved the
> >> .csv file it would have.
> >>
> >> David Scott
> >
> >
>
================================================================

> > Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail:
> [hidden email]
> > Least Cost Formulations, Ltd.            URL:
> http://lcfltd.com/
> > 824 Timberlake Drive                     Tel:
> 757-467-0954
> > Virginia Beach, VA 23464-3239            Fax:
> 757-467-2947
> >
> > "Vere scire est per causas scire"
> >
> > ______________________________________________
> > [hidden email] mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained,
> reproducible code.
> >
>
>
_________________________________________________________________

> David Scott Department of Statistics, Tamaki Campus
>   The University of Auckland, PB 92019
>   Auckland 1142,    NEW ZEALAND
> Phone: +64 9 373 7599 ext 86830 Fax: +64 9 373 7000
> Email: [hidden email]
>
> Graduate Officer, Department of Statistics
> Director of Consulting, Department of Statistics
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained,
> reproducible code.
>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Excel

JohnDee
In reply to this post by David Scott-6
On Monday 27 August 2007 22:21, David Scott wrote:

> On Tue, 28 Aug 2007, Robert A LaBudde wrote:
> > If you format the column as "Text", you won't have this problem. By
> > leaving the cells as "General", you leave it up to Excel to guess at
> > the correct interpretation.
>
> Not true actually. I had converted the column to Text because I saw the
> interpretation as a date in the .xls file. I saved the .csv file *after*
> the column had been converted to Text. Looking at the .csv file in a text
> editor, the entry is correct.
>
> I have just rechecked this.
>
> On reopening the .csv using Excel, the entry AUG2699 had been interpreted
> as a date, and was showing as Aug-99. Most bizarre is that the NHI value
> of AUG1838 has *not* been interpreted as a date.
>
Actually, in Excel 2000, he's right.  What you have to is be sure of is that
the "'" that denotes a text entry precedes EVERY entry that can be confused
with a date.  Selecting the entire column and setting the format to "text"
*before* data is entered does this.  It will also create an appropriate *.csv
file.  Excel is notable too because it will automatically convert "date-like"
entries as you type.  In a column of IDs or similar critical data, that
behaviour is really bad.  I have never tried the MS site, but I haven't been
able to find any entry about how to turn that particular automatic behaviour
off.

However, while I have not experimented extensively, as far as I have
experimented, OpenOffice spreadsheet does not behave this way.

JWDougherty

PS, I quit using Excel for most important work after it returned a negative
variance on some data I was collecting descriptive statistics on.

JWD

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Excel

Robert A LaBudde
In reply to this post by David Scott-6
At 01:21 AM 8/28/2007, David wrote:

>On Tue, 28 Aug 2007, Robert A LaBudde wrote:
>
>>If you format the column as "Text", you won't have this problem. By
>>leaving the cells as "General", you leave it up to Excel to guess at
>>the correct interpretation.
>
>Not true actually. I had converted the column to Text because I saw
>the interpretation as a date in the .xls file. I saved the .csv file
>*after* the column had been converted to Text. Looking at the .csv
>file in a text editor, the entry is correct.
><snip>

You need to convert the column to Text before you enter the data.
This tells Excel the presumption to use.

Converting to Text after you enter the values has no effect on
previously entered values.

I've tested this using Excel 2000.

================================================================
Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: [hidden email]
Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
824 Timberlake Drive                     Tel: 757-467-0954
Virginia Beach, VA 23464-3239            Fax: 757-467-2947

"Vere scire est per causas scire"

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Excel

bogdan romocea
In reply to this post by David Scott-6
On a related note, there's one other amazingly stupid thing that Excel
(2002 SP3) does - it exports to CSV the numbers as you see them
displayed, and not as they were entered/imported in the first place.
For example, 1.2345678 will be exported to CSV/tab delimited as 1.23
if that column is formatted to show 2 decimals. Whoever doesn't pay
attention gets what s/he deserves for trusting Excel in the first
place.


> -----Original Message-----
> From: [hidden email]
> [mailto:[hidden email]] On Behalf Of David Scott
> Sent: Monday, August 27, 2007 10:11 PM
> To: [hidden email]
> Subject: [R] Excel
>
>
> A common process when data is obtained in an Excel
> spreadsheet is to save
> the spreadsheet as a .csv file then read it into R. Experienced users
> might have learned to be wary of dates (as I have) but
> possibly have not
> experienced what just happened to me. I thought I might just
> share it with
> r-help as a cautionary tale.
>
> I received an Excel file giving patient details. Each patient
> had an ID
> code in the form of three letters followed by four digits.
> (Actually a New
> Zealand National Health Identification.) I saved the .xls
> file as .csv.
> Then I opened up the .csv (with Excel) to look at it. In the
> column of ID
> codes I saw: Aug-99. Clicking on that entry it showed 1/08/2699.
>
> In a column of character data, Excel had interpreted AUG2699
> as a date.
>
> The .csv did not actually have a date in that cell, but if I
> had saved the
> .csv file it would have.
>
> David Scott
>
> _________________________________________________________________
> David Scott Department of Statistics, Tamaki Campus
>   The University of Auckland, PB 92019
>   Auckland 1142,    NEW ZEALAND
> Phone: +64 9 373 7599 ext 86830 Fax: +64 9 373 7000
> Email: [hidden email]
>
> Graduate Officer, Department of Statistics
> Director of Consulting, Department of Statistics
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Excel

Rolf Turner
In reply to this post by JohnDee

On 28/08/2007, at 7:16 PM, J Dougherty wrote:

        <snip>

> PS, I quit using Excel for most important work after it returned a  
> negative
> variance on some data I was collecting descriptive statistics on.

Those of you who have not seen it should have a look at Jonathan  
Cryer's commentary
on Excel, available at the URL:

                http://www.stat.uiowa.edu/~jcryer/JSMTalk2001.pdf

Executive summary:  Friends don't let friends use Excel for statistics.

                        cheers,

                                Rolf Turner

######################################################################
Attention:\ This e-mail message is privileged and confidenti...{{dropped}}

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Excel

christian.ritter
Hmm,

Excel bashing always brings joy ... but then again, it's a user's community of more than 100 million people, and if one is careful one can do quite a few interesting things with excel, in particular if it can be extended by R (using R(D)COM by T Baier and RExcel by E Neuwirth).

Typically, people in the R community are not used to the spreadsheet paradigm and need some time to be able to take advantage of automatic recalculation, cross tabulation (Pivot tables), automatic tabulation of nontrivial expressions (data tables) and do not know that many of the matrix calculations which we commonly do in R can also be carried out by array formulas in Excel (or Gnumeric, if you don't want to stay with a single spreadsheet). With a little experience one can program interactive tools such as to do multiple ridge regression including variables selection and exclusion/inclusion of observations directly in such a spreadsheet. Or, one can just program the interface in the spreadsheet and have R do the calculations.

I think that any serious statistical consultant should be able to combine the power of a spreadsheet with the one of a scripting language (and a relational data base in addition to this). Excel is interesting in this context since it is so widely availabe, since it has a scripting language, and since it can be coupled with R. Partial coupling is also possible in gnumeric (under linux) but not yet under windows (I asked for this a while ago, but as far as I know, the scripting interface - based on Python - doesn't work yet). An equivalent of RExcel/R(D)COM is under development for calc, the open office spreadsheet. However, so far I was not impressed about the quality of calc (can be very slow, hungry for memory, etc).

Here are a few additional comments related to the representation issue in .csv files:
What is said about the .csv files with respect to rounding also holds for the windows clipboard but not for the office clipboard. If you format data in an excel range, select this range and paste it on a different worksheet (within MS Office) the original representation is kept. That is, you can undo the formating in the new copy. However, if you read the data into R using the clipboard as a data source, only the formated version is transfered. I played a bit with options and it really seems a clipboard implementation issue (a job for Microsoft). Any lobbying wit MS to permit a better access to the office clipboard would be useful in this context.

Have a nice day,

Chris

-----Original Message-----
From: [hidden email]
[mailto:[hidden email]]On Behalf Of Rolf Turner
Sent: Tuesday, 28 August, 2007 10:01 PM
To: J Dougherty
Cc: [hidden email]
Subject: Re: [R] Excel



On 28/08/2007, at 7:16 PM, J Dougherty wrote:

        <snip>

> PS, I quit using Excel for most important work after it returned a  
> negative
> variance on some data I was collecting descriptive statistics on.

Those of you who have not seen it should have a look at Jonathan  
Cryer's commentary
on Excel, available at the URL:

                http://www.stat.uiowa.edu/~jcryer/JSMTalk2001.pdf

Executive summary:  Friends don't let friends use Excel for statistics.

                        cheers,

                                Rolf Turner

######################################################################
Attention:\ This e-mail message is privileged and confidenti...{{dropped}}

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Excel (off-topic, sort of)

ALBERTO VIEIRA FERREIRA MONTEIRO
Chris wrote:
>
> Typically, people in the R community are not used to the spreadsheet
> paradigm and need some time to be able to take advantage of
> automatic recalculation, (...)
>
Do you know what's in my wish list?

I wish spreadsheets and computer languages had gone one
step further.

I mean, it's nice to define Cell X to be "equal" to
Cell Y + 10, and then when we change Cell Y, magically we
see Cell X change.

But why can't it be the reverse? Why can't I change Cell X
_and see the change in Cell Y_?

Maybe I'll write a letter to Santa Claus [there are people
who write to congressman; they must have more faith than me].
I wish a language where I can write

  a = b + 10

and then when I write

  a = 20

the language automatically assigns b = 10.

There's a way to simulate this in any computer language, or even
in Excel: instead of "variables" or "cells", we have structures
with value and a flag. The flag dictates if it's input, undefined
or calculated. And then there's a list of relations. So the
program/language/spreadsheed loops through the list of relations,
detects whenever we can infer a new calculated value, and calculates
it, until there's nothing else to do or a contradiction is found.

Alberto Monteiro

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Excel

John Kane-2
In reply to this post by Rolf Turner

--- Rolf Turner <[hidden email]> wrote:

>
> On 28/08/2007, at 7:16 PM, J Dougherty wrote:
>
> <snip>
>
> > PS, I quit using Excel for most important work
> after it returned a  
> > negative
> > variance on some data I was collecting descriptive
> statistics on.
>
> Those of you who have not seen it should have a look
> at Jonathan  
> Cryer's commentary
> on Excel, available at the URL:
>
> http://www.stat.uiowa.edu/~jcryer/JSMTalk2001.pdf
>
> Executive summary:  Friends don't let friends use
> Excel for statistics.
>
> cheers,
>
> Rolf Turner

And perhaps beware of them at any time?  They're great
tools but all–to–often are miss-used.
http://www.eusprig.org/

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Excel (off-topic, sort of)

Philippe Grosjean
In reply to this post by ALBERTO VIEIRA FERREIRA MONTEIRO
Take a look at Mathematica or Maple. This is the kind of thing you do
with these languages.
Best,

Philippe Grosjean


Alberto Monteiro wrote:

> Chris wrote:
>> Typically, people in the R community are not used to the spreadsheet
>> paradigm and need some time to be able to take advantage of
>> automatic recalculation, (...)
>>
> Do you know what's in my wish list?
>
> I wish spreadsheets and computer languages had gone one
> step further.
>
> I mean, it's nice to define Cell X to be "equal" to
> Cell Y + 10, and then when we change Cell Y, magically we
> see Cell X change.
>
> But why can't it be the reverse? Why can't I change Cell X
> _and see the change in Cell Y_?
>
> Maybe I'll write a letter to Santa Claus [there are people
> who write to congressman; they must have more faith than me].
> I wish a language where I can write
>
>   a = b + 10
>
> and then when I write
>
>   a = 20
>
> the language automatically assigns b = 10.
>
> There's a way to simulate this in any computer language, or even
> in Excel: instead of "variables" or "cells", we have structures
> with value and a flag. The flag dictates if it's input, undefined
> or calculated. And then there's a list of relations. So the
> program/language/spreadsheed loops through the list of relations,
> detects whenever we can infer a new calculated value, and calculates
> it, until there's nothing else to do or a contradiction is found.
>
> Alberto Monteiro
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Excel

Duncan Murdoch
In reply to this post by christian.ritter
[hidden email] wrote:
> ...
>  

> Here are a few additional comments related to the representation issue in .csv files:
> What is said about the .csv files with respect to rounding also holds for the windows clipboard but not for the office clipboard. If you format data in an excel range, select this range and paste it on a different worksheet (within MS Office) the original representation is kept. That is, you can undo the formating in the new copy. However, if you read the data into R using the clipboard as a data source, only the formated version is transfered. I played a bit with options and it really seems a clipboard implementation issue (a job for Microsoft). Any lobbying wit MS to permit a better access to the office clipboard would be useful in this context.

I did write some code to give access to special formats in the Windows
clipboard.  If you can figure out what's there you should have access to
it from R using getClipboardFormats and readClipboard. I think the only
lobbying that would be needed would be to reveal the format, and that
may have already been done.

Duncan Murdoch

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Excel (off-topic, sort of)

François Pinard
In reply to this post by ALBERTO VIEIRA FERREIRA MONTEIRO
[Alberto Monteiro]

> Maybe I'll write a letter to Santa Claus [there are people
> who write to congressman; they must have more faith than me].

:-) :-)

> I wish a language where I can write

>  a = b + 10

> and then when I write

>  a = 20

> the language automatically assigns b = 10.

METAFONT does this (and consequently, Metapost as well).  I still
remember my surprise when I found out that Donald Knuth resorts to such
sophisticated machinery for the sole purpose of designing font
characters.  Knuth surely did many wonderful things :-).

--
François Pinard   http://pinard.progiciels-bpi.ca

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Excel (off-topic, sort of)

Albicelli, Nicholas (Exchange)
In reply to this post by ALBERTO VIEIRA FERREIRA MONTEIRO
Except for the ability to perform circular recalculation, I believe that the closest programming analogy to a spreadsheet is a functional programming language.  Check out Haskell (or LISP or Erlang) to do what you describe.

-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of François Pinard
Sent: Wednesday, August 29, 2007 11:36 AM
To: Alberto Monteiro
Cc: [hidden email]
Subject: Re: [R] Excel (off-topic, sort of)

[Alberto Monteiro]

> Maybe I'll write a letter to Santa Claus [there are people
> who write to congressman; they must have more faith than me].

:-) :-)

> I wish a language where I can write

>  a = b + 10

> and then when I write

>  a = 20

> the language automatically assigns b = 10.

METAFONT does this (and consequently, Metapost as well).  I still
remember my surprise when I found out that Donald Knuth resorts to such
sophisticated machinery for the sole purpose of designing font
characters.  Knuth surely did many wonderful things :-).

--
François Pinard   http://pinard.progiciels-bpi.ca

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



***********************************************************************
Bear Stearns is not responsible for any recommendation, soli...{{dropped}}

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Excel

Erich Neuwirth
In reply to this post by John Kane-2
Excel bashing can be fun but also can be dangerous because
you are makeing your life harder than necessary.
Statisticians meanwhile know that the numerics of statistical
computation can be quite bad, therefore one should not use them.
But using our (we = Thomas Baier + Erich Neuwirth) RExcel addin either
with the R(D)COM server or with rcom (package on CRAN) allows you to use
all the nice features of Excel (yes, there are quite a few) and use R as
as the computational engine within Excel. The formula
=RApply("var",A1:A1000) in an Excel cell for example will use R to
compute the variance of the data in column A in Excel. If you change any
of the values in the range A1:A1000 will automatically recompute the
variance.

There is one feature in Excel which is extremely convenient, Pivot
tables. Anybody doing any work as statistical consultant really ought to
know about Pivot tables, and I am still surprised how many statisticians
do not know about it. Neither Gnumeric nor OpenOffice Calc offer
comparably convenient ways working with multidimensional tables.

I think the answer to the question
"Excel or R" of course is "Excel and R".



--
Erich Neuwirth, University of Vienna
Faculty of Computer Science
Computer Supported Didactics Working Group
Visit our SunSITE at http://sunsite.univie.ac.at
Phone: +43-1-4277-39464 Fax: +43-1-4277-39459

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Excel

Bert Gunter
Erich:

This is not a comment either for or against the use of Excel. I only wish to
point out that AFAICS, Hadley Wickham's reshape package offers all the pivot
table functionality and more.

If I am wrong about this, please let me and everyone else know.


Bert Gunter
Genentech Nonclinical Statistics


-----Original Message-----
From: [hidden email]
[mailto:[hidden email]] On Behalf Of Erich Neuwirth
Sent: Wednesday, August 29, 2007 11:43 AM
To: r-help
Subject: Re: [R] Excel

Excel bashing can be fun but also can be dangerous because
you are makeing your life harder than necessary.
Statisticians meanwhile know that the numerics of statistical
computation can be quite bad, therefore one should not use them.
But using our (we = Thomas Baier + Erich Neuwirth) RExcel addin either
with the R(D)COM server or with rcom (package on CRAN) allows you to use
all the nice features of Excel (yes, there are quite a few) and use R as
as the computational engine within Excel. The formula
=RApply("var",A1:A1000) in an Excel cell for example will use R to
compute the variance of the data in column A in Excel. If you change any
of the values in the range A1:A1000 will automatically recompute the
variance.

There is one feature in Excel which is extremely convenient, Pivot
tables. Anybody doing any work as statistical consultant really ought to
know about Pivot tables, and I am still surprised how many statisticians
do not know about it. Neither Gnumeric nor OpenOffice Calc offer
comparably convenient ways working with multidimensional tables.

I think the answer to the question
"Excel or R" of course is "Excel and R".



--
Erich Neuwirth, University of Vienna
Faculty of Computer Science
Computer Supported Didactics Working Group
Visit our SunSITE at http://sunsite.univie.ac.at
Phone: +43-1-4277-39464 Fax: +43-1-4277-39459

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Excel

Gabor Grothendieck
You would still need the interactive GUI to get to the point where its
at all comparable to Excel.  Using rpad you could construct such
an interface although its a bit of work.  Here is an example using
rpad and reshape:

http://www.rpad.org/Rpad/DataExplorer.Rpad

On 8/29/07, Bert Gunter <[hidden email]> wrote:

> Erich:
>
> This is not a comment either for or against the use of Excel. I only wish to
> point out that AFAICS, Hadley Wickham's reshape package offers all the pivot
> table functionality and more.
>
> If I am wrong about this, please let me and everyone else know.
>
>
> Bert Gunter
> Genentech Nonclinical Statistics
>
>
> -----Original Message-----
> From: [hidden email]
> [mailto:[hidden email]] On Behalf Of Erich Neuwirth
> Sent: Wednesday, August 29, 2007 11:43 AM
> To: r-help
> Subject: Re: [R] Excel
>
> Excel bashing can be fun but also can be dangerous because
> you are makeing your life harder than necessary.
> Statisticians meanwhile know that the numerics of statistical
> computation can be quite bad, therefore one should not use them.
> But using our (we = Thomas Baier + Erich Neuwirth) RExcel addin either
> with the R(D)COM server or with rcom (package on CRAN) allows you to use
> all the nice features of Excel (yes, there are quite a few) and use R as
> as the computational engine within Excel. The formula
> =RApply("var",A1:A1000) in an Excel cell for example will use R to
> compute the variance of the data in column A in Excel. If you change any
> of the values in the range A1:A1000 will automatically recompute the
> variance.
>
> There is one feature in Excel which is extremely convenient, Pivot
> tables. Anybody doing any work as statistical consultant really ought to
> know about Pivot tables, and I am still surprised how many statisticians
> do not know about it. Neither Gnumeric nor OpenOffice Calc offer
> comparably convenient ways working with multidimensional tables.
>
> I think the answer to the question
> "Excel or R" of course is "Excel and R".
>
>
>
> --
> Erich Neuwirth, University of Vienna
> Faculty of Computer Science
> Computer Supported Didactics Working Group
> Visit our SunSITE at http://sunsite.univie.ac.at
> Phone: +43-1-4277-39464 Fax: +43-1-4277-39459
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Excel

Gregory Snow
In reply to this post by Erich Neuwirth
Erich Neuwirth said:

> -----Original Message-----
> From: [hidden email]
> [mailto:[hidden email]] On Behalf Of Erich Neuwirth
> Sent: Wednesday, August 29, 2007 12:43 PM
> To: r-help
> Subject: Re: [R] Excel
>
> Excel bashing can be fun but also can be dangerous because
> you are makeing your life harder than necessary.

My experience differs, so far using excel (other than as a table layout
program) has made my life harder more times than it has made it easier.

> Statisticians meanwhile know that the numerics of statistical
> computation can be quite bad, therefore one should not use them.

Yes, statisticians know this (or should), but what happens when one of
your clients sees you using the excel interface to R and thinks to
themselves that they know how to use excel and go back and use the
computations that we all know they shouldn't use?

Or do you trust all of your clients to know to use R(D)COM as well as
how to install and use it?

In my case, most of my clients I would trust with my life, but not  my
data (not even their data), I try to talk them into using at least
access if they have to stick with an MS product.

> But using our (we = Thomas Baier + Erich Neuwirth) RExcel
> addin either with the R(D)COM server or with rcom (package on
> CRAN) allows you to use all the nice features of Excel (yes,
> there are quite a few) and use R as as the computational
> engine within Excel.

I'll have to take your word that excel has good features, I have not
found them.

How do you tell the nice features from the erroneous features?  And how
do you tell the difference between the good features and the horrible
features that MS marketing has dressed up nice and convinced a bunch of
people that they are nice?

> The formula
> =RApply("var",A1:A1000) in an Excel cell for example will use
> R to compute the variance of the data in column A in Excel.
> If you change any of the values in the range A1:A1000 will
> automatically recompute the variance.

And what happens when you enter a value in cell A1001?

And what happens if you set cell B1 to =A1 and do the magic copy so that
b2=a2, b3=a3, ... Then put
=Rapply("var",B1:B1000) in a cell, does the answer match with
=Rapply("var",A1:A1000)?

Yes, the auto-recompute could be considered a nice feature, but does it
really save that much work compared to running a script in R after
updating the data? (a couple of clicks in Rgui, a couple of keystrokes
in ESS, one line of code in an R terminal)

> There is one feature in Excel which is extremely convenient,
> Pivot tables. Anybody doing any work as statistical
> consultant really ought to know about Pivot tables, and I am
> still surprised how many statisticians do not know about it.
> Neither Gnumeric nor OpenOffice Calc offer comparably
> convenient ways working with multidimensional tables.

I will admit that I have not learned the power of the pivot table, but
the main reason for that is everytime someone demonstrates the power of
the pivot table to me it is by creating a table, then showing that if
that is not what you want you can click here, here, here, then here and
it is a different table, then you click here and here, ....  At which
point I am completely lost as to which clicks did which.  There are
tools in R (the reshape package has been mentioned) than give you the
power to create the tables, but with a history rather than having to
remember which clicks are needed.  My experience with pivot tables so
far are that they are a post-hoc kludge to a poorly designed original
table.  I prefer working with a script where if the result is not what I
want, I can fix the source rather than admire how pretty the band-aid
is.  (Maybe there are ways to script a pivot table without the clicking,
but everyone who has tried to market them to me have been more impressed
with the band-aid than in preventing the original injury).

What happens when you make a nice pivot table for a client, then they
come back 2 months latter with a new spreadsheet and want a similar
table for this data?  I don't think I could remember the set of clicks
needed, but with an R script I can pull up the set of commands I used
before, change the variable and table names, and have the results in a
couple of minutes.

> I think the answer to the question
> "Excel or R" of course is "Excel and R".

I think the page by Patrick Burns gives a more complete answer
(http://www.burns-stat.com/pages/Tutor/spreadsheet_addiction.html).

It is ok to use spreadsheets for jobs that spreadsheets are the best
tool, but too many people have the excel hammer and see everything as a
nail.
 
> --
> Erich Neuwirth, University of Vienna
> Faculty of Computer Science
> Computer Supported Didactics Working Group Visit our SunSITE
> at http://sunsite.univie.ac.at
> Phone: +43-1-4277-39464 Fax: +43-1-4277-39459
>



--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[hidden email]
(801) 408-8111

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
123