"Missing value representation in Excel before extraction to R with RODBC"

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

"Missing value representation in Excel before extraction to R with RODBC"

Fredrik Lundgren
Dear list,

How should missing values be expressed in Excel before extraction to R
via RODBC. I'm bewildered. Sometimes the representation with NA in Excel
appears to work and shows up in R as <NA> but sometimes the use of NA in
Excel changes the whole vector to NA's. Blank or nothing or NA as
representation for missing values in Excel with dateformat gives NA's of
the whole vector in R but with  general format in Excel gives blanks for
missing values in R. How should I represent missing values in Excel?


Best wishes and thanks for any help
Fredrik Lundgren

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Reply | Threaded
Open this post in threaded view
|

Re: "Missing value representation in Excel before extraction to R with RODBC"

PIKAL Petr
Hi

I believe it has something to do with the column identification
decision. When R decides what is in a column it uses only some values
from the beginning of a file.

I do not use RODBC as read.delim("clipboard", ...) is usually more
convenient but probably there is a way how to tell RODBC what is in
the column instead of let R decide from the top of the file.

But I may be completely mistaken.

HTH
Petr


On 6 Jan 2006 at 20:47, Fredrik Lundgren wrote:

From:           "Fredrik Lundgren" <[hidden email]>
To:             "R-help" <[hidden email]>
Date sent:       Fri, 6 Jan 2006 20:47:29 +0100
Subject:         [R] "Missing value representation in Excel before extraction to R
        with RODBC"

> Dear list,
>
> How should missing values be expressed in Excel before extraction to R
> via RODBC. I'm bewildered. Sometimes the representation with NA in
> Excel appears to work and shows up in R as <NA> but sometimes the use
> of NA in Excel changes the whole vector to NA's. Blank or nothing or
> NA as representation for missing values in Excel with dateformat gives
> NA's of the whole vector in R but with  general format in Excel gives
> blanks for missing values in R. How should I represent missing values
> in Excel?
>
>
> Best wishes and thanks for any help
> Fredrik Lundgren
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html

Petr Pikal
[hidden email]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Reply | Threaded
Open this post in threaded view
|

Re: "Missing value representation in Excel before extraction to R with RODBC"

Brian Ripley
On Mon, 9 Jan 2006, Petr Pikal wrote:

> Hi
>
> I believe it has something to do with the column identification
> decision. When R decides what is in a column it uses only some values
> from the beginning of a file.

Not R, Excel.  Excel tells ODBC what the column types are.

> I do not use RODBC as read.delim("clipboard", ...) is usually more
> convenient but probably there is a way how to tell RODBC what is in
> the column instead of let R decide from the top of the file.

Using as.is=TRUE stops RODBC doing any conversion.

> But I may be completely mistaken.
>
> HTH
> Petr
>
>
> On 6 Jan 2006 at 20:47, Fredrik Lundgren wrote:
>
> From:           "Fredrik Lundgren" <[hidden email]>
> To:             "R-help" <[hidden email]>
> Date sent:       Fri, 6 Jan 2006 20:47:29 +0100
> Subject:         [R] "Missing value representation in Excel before extraction to R
> with RODBC"
>
>> Dear list,
>>
>> How should missing values be expressed in Excel before extraction to R
>> via RODBC. I'm bewildered. Sometimes the representation with NA in
>> Excel appears to work and shows up in R as <NA> but sometimes the use
>> of NA in Excel changes the whole vector to NA's. Blank or nothing or
>> NA as representation for missing values in Excel with dateformat gives
>> NA's of the whole vector in R but with  general format in Excel gives
>> blanks for missing values in R. How should I represent missing values
>> in Excel?
>>
>>
>> Best wishes and thanks for any help
>> Fredrik Lundgren

--
Brian D. Ripley,                  [hidden email]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Reply | Threaded
Open this post in threaded view
|

Re: "Missing value representation in Excel before extraction to R with RODBC"

Fredrik Lundgren
Dear list,

Well, those columns in Excel that starts with NA (actually 8 NA's in my
case) is imported as all NA in R but if the columns starts with at least
3 cells with values (i.e not NA) the are imported correctly to R. When
as.is=TRUE is used a simular conversion takes place but now as all <NA>
and dates are represented as date-and-time.
Is there any way to get this correct even when the Excel columns start
with several NA's?

Sincerely
Fredrik


----- Original Message -----
From: "Prof Brian Ripley" <[hidden email]>
To: "Petr Pikal" <[hidden email]>
Cc: "Fredrik Lundgren" <[hidden email]>; "R-help"
<[hidden email]>
Sent: Monday, January 09, 2006 9:36 AM
Subject: Re: [R] "Missing value representation in Excel before
extraction to R with RODBC"


> On Mon, 9 Jan 2006, Petr Pikal wrote:
>
>> Hi
>>
>> I believe it has something to do with the column identification
>> decision. When R decides what is in a column it uses only some values
>> from the beginning of a file.
>
> Not R, Excel.  Excel tells ODBC what the column types are.
>
>> I do not use RODBC as read.delim("clipboard", ...) is usually more
>> convenient but probably there is a way how to tell RODBC what is in
>> the column instead of let R decide from the top of the file.
>
> Using as.is=TRUE stops RODBC doing any conversion.
>
>> But I may be completely mistaken.
>>
>> HTH
>> Petr
>>
>>
>> On 6 Jan 2006 at 20:47, Fredrik Lundgren wrote:
>>
>> From:           "Fredrik Lundgren" <[hidden email]>
>> To:             "R-help" <[hidden email]>
>> Date sent:      Fri, 6 Jan 2006 20:47:29 +0100
>> Subject:        [R] "Missing value representation in Excel before
>> extraction to R
>> with RODBC"
>>
>>> Dear list,
>>>
>>> How should missing values be expressed in Excel before extraction to
>>> R
>>> via RODBC. I'm bewildered. Sometimes the representation with NA in
>>> Excel appears to work and shows up in R as <NA> but sometimes the
>>> use
>>> of NA in Excel changes the whole vector to NA's. Blank or nothing or
>>> NA as representation for missing values in Excel with dateformat
>>> gives
>>> NA's of the whole vector in R but with  general format in Excel
>>> gives
>>> blanks for missing values in R. How should I represent missing
>>> values
>>> in Excel?
>>>
>>>
>>> Best wishes and thanks for any help
>>> Fredrik Lundgren
>
> --
> Brian D. Ripley,                  [hidden email]
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Reply | Threaded
Open this post in threaded view
|

Re: "Missing value representation in Excel before extraction to R with RODBC"

PIKAL Petr
Hi

I tried to reproduce what you have told us by copy and paste

read.delim("clipboard")

but was not successful.

Even with several blank values in each column in Excel i got correct
import to R by this process. As I do not use RODBC I do not know all
possible settings and features. If colClasses is available you can
force the columns to by character, numeric, factor, Date or some
other class.

BTW Excel can be quite tricky and hides e.g. spaces in cells so you
see them as empty even if they are not. So if I get some weird
conversions of numeric columns there is often something hidden in
Excel.

HTH
Petr


On 9 Jan 2006 at 18:06, Fredrik Lundgren wrote:

From:           "Fredrik Lundgren" <[hidden email]>
To:             "Prof Brian Ripley" <[hidden email]>,
        "Petr Pikal" <[hidden email]>
Copies to:       "R-help" <[hidden email]>
Subject:         Re: [R] "Missing value representation in Excel before extraction to R with RODBC"
Date sent:       Mon, 9 Jan 2006 18:06:49 +0100

> Dear list,
>
> Well, those columns in Excel that starts with NA (actually 8 NA's in
> my case) is imported as all NA in R but if the columns starts with at
> least 3 cells with values (i.e not NA) the are imported correctly to
> R. When as.is=TRUE is used a simular conversion takes place but now as
> all <NA> and dates are represented as date-and-time. Is there any way
> to get this correct even when the Excel columns start with several
> NA's?
>
> Sincerely
> Fredrik
>
>
> ----- Original Message -----
> From: "Prof Brian Ripley" <[hidden email]>
> To: "Petr Pikal" <[hidden email]>
> Cc: "Fredrik Lundgren" <[hidden email]>; "R-help"
> <[hidden email]> Sent: Monday, January 09, 2006 9:36 AM
> Subject: Re: [R] "Missing value representation in Excel before
> extraction to R with RODBC"
>
>
> > On Mon, 9 Jan 2006, Petr Pikal wrote:
> >
> >> Hi
> >>
> >> I believe it has something to do with the column identification
> >> decision. When R decides what is in a column it uses only some
> >> values from the beginning of a file.
> >
> > Not R, Excel.  Excel tells ODBC what the column types are.
> >
> >> I do not use RODBC as read.delim("clipboard", ...) is usually more
> >> convenient but probably there is a way how to tell RODBC what is in
> >> the column instead of let R decide from the top of the file.
> >
> > Using as.is=TRUE stops RODBC doing any conversion.
> >
> >> But I may be completely mistaken.
> >>
> >> HTH
> >> Petr
> >>
> >>
> >> On 6 Jan 2006 at 20:47, Fredrik Lundgren wrote:
> >>
> >> From:           "Fredrik Lundgren"
> >> <[hidden email]> To:             "R-help"
> >> <[hidden email]> Date sent:      Fri, 6 Jan 2006 20:47:29
> >> +0100 Subject:        [R] "Missing value representation in Excel
> >> before extraction to R with RODBC"
> >>
> >>> Dear list,
> >>>
> >>> How should missing values be expressed in Excel before extraction
> >>> to R via RODBC. I'm bewildered. Sometimes the representation with
> >>> NA in Excel appears to work and shows up in R as <NA> but
> >>> sometimes the use of NA in Excel changes the whole vector to NA's.
> >>> Blank or nothing or NA as representation for missing values in
> >>> Excel with dateformat gives NA's of the whole vector in R but with
> >>>  general format in Excel gives blanks for missing values in R. How
> >>> should I represent missing values in Excel?
> >>>
> >>>
> >>> Best wishes and thanks for any help
> >>> Fredrik Lundgren
> >
> > --
> > Brian D. Ripley,                  [hidden email]
> > Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> > University of Oxford,             Tel:  +44 1865 272861 (self) 1
> > South Parks Road,                     +44 1865 272866 (PA) Oxford
> > OX1 3TG, UK                Fax:  +44 1865 272595
> >
>
>

Petr Pikal
[hidden email]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Reply | Threaded
Open this post in threaded view
|

Re: "Missing value representation in Excel before extraction to R with RODBC"

Fredrik Lundgren
Dear Petr,

Thank you for your help. I have tried (and succeded) to import myfile
after conversion to text and with the help of read.table (also with the
file = 'clipboard' alternative). Both methods give correct results,
albeit dateformat turns up as a factor (minor problem). Also the
read.xls from library gdata has been successful, albeit with some
different representation of dateformat( minor problem). At least my
Excelfile isn't corrupted in such a way to make this three ways
impossible. No, my problem appears to be connected to the use of RODBC
and that was what I wanted to get working. The first 8 rows are excluded
from the file and columns with many NA's at the start are tranformed to
all NA. If the NA's at the beginning of a column are given values (i. e.
not NA) the tranformation of the column doesn't take place but the first
8 rows are still excluded. I have tried - not necessarily in a correct
way - to use as.is (keeps the dateformat correct) and  colClasses
(doesn't apply?) but haven't been able to sort the problem out with
these options.

Best wishes

Fredrik
----- Original Message -----
From: "Petr Pikal" <[hidden email]>
To: "Fredrik Lundgren" <[hidden email]>; "R-help"
<[hidden email]>
Sent: Tuesday, January 10, 2006 10:08 AM
Subject: Re: [R] "Missing value representation in Excel before
extraction to R with RODBC"


> Hi
>
> I tried to reproduce what you have told us by copy and paste
>
> read.delim("clipboard")
>
> but was not successful.
>
> Even with several blank values in each column in Excel i got correct
> import to R by this process. As I do not use RODBC I do not know all
> possible settings and features. If colClasses is available you can
> force the columns to by character, numeric, factor, Date or some
> other class.
>
> BTW Excel can be quite tricky and hides e.g. spaces in cells so you
> see them as empty even if they are not. So if I get some weird
> conversions of numeric columns there is often something hidden in
> Excel.
>
> HTH
> Petr
>
>
> On 9 Jan 2006 at 18:06, Fredrik Lundgren wrote:
>
> From:           "Fredrik Lundgren" <[hidden email]>
> To:             "Prof Brian Ripley" <[hidden email]>,
> "Petr Pikal" <[hidden email]>
> Copies to:      "R-help" <[hidden email]>
> Subject:        Re: [R] "Missing value representation in Excel before
> extraction to R with RODBC"
> Date sent:      Mon, 9 Jan 2006 18:06:49 +0100
>
>> Dear list,
>>
>> Well, those columns in Excel that starts with NA (actually 8 NA's in
>> my case) is imported as all NA in R but if the columns starts with at
>> least 3 cells with values (i.e not NA) the are imported correctly to
>> R. When as.is=TRUE is used a simular conversion takes place but now
>> as
>> all <NA> and dates are represented as date-and-time. Is there any way
>> to get this correct even when the Excel columns start with several
>> NA's?
>>
>> Sincerely
>> Fredrik
>>
>>
>> ----- Original Message -----
>> From: "Prof Brian Ripley" <[hidden email]>
>> To: "Petr Pikal" <[hidden email]>
>> Cc: "Fredrik Lundgren" <[hidden email]>; "R-help"
>> <[hidden email]> Sent: Monday, January 09, 2006 9:36 AM
>> Subject: Re: [R] "Missing value representation in Excel before
>> extraction to R with RODBC"
>>
>>
>> > On Mon, 9 Jan 2006, Petr Pikal wrote:
>> >
>> >> Hi
>> >>
>> >> I believe it has something to do with the column identification
>> >> decision. When R decides what is in a column it uses only some
>> >> values from the beginning of a file.
>> >
>> > Not R, Excel.  Excel tells ODBC what the column types are.
>> >
>> >> I do not use RODBC as read.delim("clipboard", ...) is usually more
>> >> convenient but probably there is a way how to tell RODBC what is
>> >> in
>> >> the column instead of let R decide from the top of the file.
>> >
>> > Using as.is=TRUE stops RODBC doing any conversion.
>> >
>> >> But I may be completely mistaken.
>> >>
>> >> HTH
>> >> Petr
>> >>
>> >>
>> >> On 6 Jan 2006 at 20:47, Fredrik Lundgren wrote:
>> >>
>> >> From:           "Fredrik Lundgren"
>> >> <[hidden email]> To:             "R-help"
>> >> <[hidden email]> Date sent:      Fri, 6 Jan 2006
>> >> 20:47:29
>> >> +0100 Subject:        [R] "Missing value representation in Excel
>> >> before extraction to R with RODBC"
>> >>
>> >>> Dear list,
>> >>>
>> >>> How should missing values be expressed in Excel before extraction
>> >>> to R via RODBC. I'm bewildered. Sometimes the representation with
>> >>> NA in Excel appears to work and shows up in R as <NA> but
>> >>> sometimes the use of NA in Excel changes the whole vector to
>> >>> NA's.
>> >>> Blank or nothing or NA as representation for missing values in
>> >>> Excel with dateformat gives NA's of the whole vector in R but
>> >>> with
>> >>>  general format in Excel gives blanks for missing values in R.
>> >>> How
>> >>> should I represent missing values in Excel?
>> >>>
>> >>>
>> >>> Best wishes and thanks for any help
>> >>> Fredrik Lundgren
>> >
>> > --
>> > Brian D. Ripley,                  [hidden email]
>> > Professor of Applied Statistics,
>> > http://www.stats.ox.ac.uk/~ripley/
>> > University of Oxford,             Tel:  +44 1865 272861 (self) 1
>> > South Parks Road,                     +44 1865 272866 (PA) Oxford
>> > OX1 3TG, UK                Fax:  +44 1865 272595
>> >
>>
>>
>
> Petr Pikal
> [hidden email]
>
>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html