RODBC vs gdata

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

RODBC vs gdata

Christof Kluß
Hi

one col in my Excel file contains many numbers. But on line 3000 and
some other lines are strings like "FG 1". "RODBS" seems to omit this
lines. "gdata" works, but is much slower.

Is this a bug of RODBC or do I apply it wrong?

Example with the same "file.xlsx"


library(RODBC); excel <- odbcConnectExcel2007("file.xlsx")
tab <- sqlQuery(excel, 'select * from "Table 1$"'); str(tab)

col1: num  1 2 3 4 5 6 7 8 9 10 ...

library(gdata); tab <- read.xls("file.xlsx", sheet=1); str(tab)

col1: Factor w/ 51 levels "1","10","11",..: 1 12 23 34 41 42 43...


greeting
Christof

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: RODBC vs gdata

Enrico Schumann

Hi Christof,

have a look at the manual of RODBC, and in particular the section on
Excel drivers.

RShowDoc("RODBC", package="RODBC")

Regards,
Enrico


Am 09.01.2012 19:02, schrieb Christof Kluß:

> Hi
>
> one col in my Excel file contains many numbers. But on line 3000 and
> some other lines are strings like "FG 1". "RODBS" seems to omit this
> lines. "gdata" works, but is much slower.
>
> Is this a bug of RODBC or do I apply it wrong?
>
> Example with the same "file.xlsx"
>
>
> library(RODBC); excel<- odbcConnectExcel2007("file.xlsx")
> tab<- sqlQuery(excel, 'select * from "Table 1$"'); str(tab)
>
> col1: num  1 2 3 4 5 6 7 8 9 10 ...
>
> library(gdata); tab<- read.xls("file.xlsx", sheet=1); str(tab)
>
> col1: Factor w/ 51 levels "1","10","11",..: 1 12 23 34 41 42 43...
>
>
> greeting
> Christof
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

--
Enrico Schumann
Lucerne, Switzerland
http://nmof.net/

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: RODBC vs gdata

Christof Kluß
Hi Enrico,

thank you very much, so it is a known problem with the Microsoft Excel
ODBC drivers :(

"7 Excel Drivers
... There are at least two known problems with reading columns that do
not have a format set before data entry, and so start with format
`General'. First, the driver uses the fi rst few rows to determined the
column type, and is over-fond of declaring `Numeric' even when there are
non-numeric entries. ... Second, if a column is declared as `Text',
numeric entries will be read as SQL nulls and hence R NAs.
Unfortunately, in neither case does reformatting the column help."

So I think I have to use "gdata" to be sure to read all datas.

regards
Christof


Am 09-01-2012 19:29, schrieb Enrico Schumann:

>
> Hi Christof,
>
> have a look at the manual of RODBC, and in particular the section on
> Excel drivers.
>
> RShowDoc("RODBC", package="RODBC")
>
> Regards,
> Enrico
>
>
> Am 09.01.2012 19:02, schrieb Christof Kluß:
>> Hi
>>
>> one col in my Excel file contains many numbers. But on line 3000 and
>> some other lines are strings like "FG 1". "RODBS" seems to omit this
>> lines. "gdata" works, but is much slower.
>>
>> Is this a bug of RODBC or do I apply it wrong?
>>
>> Example with the same "file.xlsx"
>>
>>
>> library(RODBC); excel<- odbcConnectExcel2007("file.xlsx")
>> tab<- sqlQuery(excel, 'select * from "Table 1$"'); str(tab)
>>
>> col1: num  1 2 3 4 5 6 7 8 9 10 ...
>>
>> library(gdata); tab<- read.xls("file.xlsx", sheet=1); str(tab)
>>
>> col1: Factor w/ 51 levels "1","10","11",..: 1 12 23 34 41 42 43...
>>
>>
>> greeting
>> Christof
>>
>> ______________________________________________
>> [hidden email] mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: RODBC vs gdata

Hasan Diwan-2
On 9 January 2012 10:46, Christof Kluß <[hidden email]> wrote:
> thank you very much, so it is a known problem with the Microsoft Excel
> ODBC drivers :(

As I was advised a few weeks ago, the best way for Excel to get into R
is to export the file as CSV and use read.csv, read.csv2, or
read.table. Perhaps this is an option for you, Christof? -- H
--
Sent from my mobile device
Envoyait de mon portable

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: RODBC vs gdata

Enrico Schumann
In reply to this post by Christof Kluß

You could prepare the data in Excel as text, and then coerce them to
numeric in R (and approriately code your "FG 1" strings).

Depending on how large your file is, you could create a new file, format
the cells as text, and then copy the data into this new file. Or change
cell entries to text by prepending a single quote. For instance, "'100"
(without the outer quotes) would be displayed as "100" in Excel, but not
be evaluated as a number. (But I have used the latter approach only with
Excel 2003.)

But as someone else has suggested: if you do not have to use Excel, the
best thing is not to use it...

Regards,
Enrico

Am 09.01.2012 19:46, schrieb Christof Kluß:

> Hi Enrico,
>
> thank you very much, so it is a known problem with the Microsoft Excel
> ODBC drivers :(
>
> "7 Excel Drivers
> ... There are at least two known problems with reading columns that do
> not have a format set before data entry, and so start with format
> `General'. First, the driver uses the fi rst few rows to determined the
> column type, and is over-fond of declaring `Numeric' even when there are
> non-numeric entries. ... Second, if a column is declared as `Text',
> numeric entries will be read as SQL nulls and hence R NAs.
> Unfortunately, in neither case does reformatting the column help."
>
> So I think I have to use "gdata" to be sure to read all datas.
>
> regards
> Christof
>
>
> Am 09-01-2012 19:29, schrieb Enrico Schumann:
>>
>> Hi Christof,
>>
>> have a look at the manual of RODBC, and in particular the section on
>> Excel drivers.
>>
>> RShowDoc("RODBC", package="RODBC")
>>
>> Regards,
>> Enrico
>>
>>
>> Am 09.01.2012 19:02, schrieb Christof Kluß:
>>> Hi
>>>
>>> one col in my Excel file contains many numbers. But on line 3000 and
>>> some other lines are strings like "FG 1". "RODBS" seems to omit this
>>> lines. "gdata" works, but is much slower.
>>>
>>> Is this a bug of RODBC or do I apply it wrong?
>>>
>>> Example with the same "file.xlsx"
>>>
>>>
>>> library(RODBC); excel<- odbcConnectExcel2007("file.xlsx")
>>> tab<- sqlQuery(excel, 'select * from "Table 1$"'); str(tab)
>>>
>>> col1: num  1 2 3 4 5 6 7 8 9 10 ...
>>>
>>> library(gdata); tab<- read.xls("file.xlsx", sheet=1); str(tab)
>>>
>>> col1: Factor w/ 51 levels "1","10","11",..: 1 12 23 34 41 42 43...
>>>
>>>
>>> greeting
>>> Christof
>>>
>>> ______________________________________________
>>> [hidden email] mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>
>

--
Enrico Schumann
Lucerne, Switzerland
http://nmof.net/

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: RODBC vs gdata

Ajay Askoolum
If you use Excel and are prepared to use the RCOM package, another option for reading Excel workbooks becomes available:

ado<-comCreateObject("ADODB.RecordSet")
sql<-"SELECT * FROM [Sheet1$]"
con<-"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=XLSDataSource.xlsx;Extended Properties='Excel 12.0 Xml;IMEX=1;HDR=YES'"
comInvoke(ado,"Open",sql,con)
comInvoke(ado,"MoveFirst")
abc<-t(comInvoke(ado,"GetRows"))


To try this script:

1. Create a workbook named XLSDataSource.xlsx; save it where getwd() indicates.

2. In sheet1, type column names in Row 1, starting in Cell A1
3. Type some typical values under each column where yo have added names

Here's what my session shows:

> source("C:\\AJAY\\RSTATS\\SCRIPTS\\ADO.R")
> abc
     [,1] [,2]
[1,] 122  "Jan"
[2,] 133  "Feb"
attr(,"class")
[1] "rcomdata"
>


PS: If you want to use ADO, it is worthwhile getting to knpw all the methods applicable to the ResordSet object; it is very flexible.


________________________________
 From: Enrico Schumann <[hidden email]>
To: Christof Kluß <[hidden email]>
Cc: [hidden email]
Sent: Monday, 9 January 2012, 20:28
Subject: Re: [R] RODBC vs gdata


You could prepare the data in Excel as text, and then coerce them to
numeric in R (and approriately code your "FG 1" strings).

Depending on how large your file is, you could create a new file, format
the cells as text, and then copy the data into this new file. Or change
cell entries to text by prepending a single quote. For instance, "'100"
(without the outer quotes) would be displayed as "100" in Excel, but not
be evaluated as a number. (But I have used the latter approach only with
Excel 2003.)

But as someone else has suggested: if you do not have to use Excel, the
best thing is not to use it...

Regards,
Enrico

Am 09.01.2012 19:46, schrieb Christof Kluß:

> Hi Enrico,
>
> thank you very much, so it is a known problem with the Microsoft Excel
> ODBC drivers :(
>
> "7 Excel Drivers
> ... There are at least two known problems with reading columns that do
> not have a format set before data entry, and so start with format
> `General'. First, the driver uses the first few rows to determined the
> column type, and is over-fond of declaring `Numeric' even when there are
> non-numeric entries. ... Second, if a column is declared as `Text',
> numeric entries will be read as SQL nulls and hence R NAs.
> Unfortunately, in neither case does reformatting the column help."
>
> So I think I have to use "gdata" to be sure to read all datas.
>
> regards
> Christof
>
>
> Am 09-01-2012 19:29, schrieb Enrico Schumann:
>>
>> Hi Christof,
>>
>> have a look at the manual of RODBC, and in particular the section on
>> Excel drivers.
>>
>> RShowDoc("RODBC", package="RODBC")
>>
>> Regards,
>> Enrico
>>
>>
>> Am 09.01.2012 19:02, schrieb Christof Kluß:
>>> Hi
>>>
>>> one col in my Excel file contains many numbers. But on line 3000 and
>>> some other lines are strings like "FG 1". "RODBS" seems to omit this
>>> lines. "gdata" works, but is much slower.
>>>
>>> Is this a bug of RODBC or do I apply it wrong?
>>>
>>> Example with the same "file.xlsx"
>>>
>>>
>>> library(RODBC); excel<- odbcConnectExcel2007("file.xlsx")
>>> tab<- sqlQuery(excel, 'select * from "Table 1$"'); str(tab)
>>>
>>> col1: num  1 2 3 4 5 6 7 8 9 10 ...
>>>
>>> library(gdata); tab<- read.xls("file.xlsx", sheet=1); str(tab)
>>>
>>> col1: Factor w/ 51 levels "1","10","11",..: 1 12 23 34 41 42 43...
>>>
>>>
>>> greeting
>>> Christof
>>>
>>> ______________________________________________
>>> [hidden email] mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>
>
--
Enrico Schumann
Lucerne, Switzerland
http://nmof.net/

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
        [[alternative HTML version deleted]]


______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: RODBC vs gdata

Gabor Grothendieck
In reply to this post by Christof Kluß
On Mon, Jan 9, 2012 at 1:02 PM, Christof Kluß <[hidden email]> wrote:

> Hi
>
> one col in my Excel file contains many numbers. But on line 3000 and
> some other lines are strings like "FG 1". "RODBS" seems to omit this
> lines. "gdata" works, but is much slower.
>
> Is this a bug of RODBC or do I apply it wrong?
>
> Example with the same "file.xlsx"
>
>
> library(RODBC); excel <- odbcConnectExcel2007("file.xlsx")
> tab <- sqlQuery(excel, 'select * from "Table 1$"'); str(tab)
>
> col1: num  1 2 3 4 5 6 7 8 9 10 ...
>
> library(gdata); tab <- read.xls("file.xlsx", sheet=1); str(tab)
>
> col1: Factor w/ 51 levels "1","10","11",..: 1 12 23 34 41 42 43...

You might be able to speed up read.xls in gdata.  Try turning off
comment characters and setting colClasses, e.g.

DF <- read.xls(...whatever..., comment.char = "", colClasses = ...whatever...)

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.