SQL Database

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

SQL Database

Doran, Harold
I'm doing some work now to learn which SQL database package is the most optimal for the task I am working on. There are many packages, and I'm reviewing the documentation on some of them now. I am seeking advice from those of you who might suggest a package to use for the task I am currently working with.

The work is currently as follows. My users currently use another tool to extract tables from a server, save those tables as .csv files, and then those csv files are read into R and stuff is done on the data in those files. This adds overhead that can be bypassed if users instead can directly access the database from within R and grab the tables they need and then those tables are data frames in the R session and available to do stuff.

The sequence of work (I think) I just this:

Step 1: Connect to the remote server (connection string and authenticate the user)
Step 2: Have a SQL query statement that grabs the tables from the remote server
Step 3: Close the connection

The two packages I have narrowed my studies to are Dbplyr and RODBC, both of which seem to be similar.

Any experiences out there to suggest these two packages are in fact right for this task, or would there be other packages that might be more optimal for this?

Thanks,
Harold

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: SQL Database

Bert Gunter-2
https://rviews.rstudio.com/2017/10/18/database-queries-with-r/

Cheers,
Bert



Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Wed, Jul 25, 2018 at 7:57 AM, Doran, Harold <[hidden email]> wrote:

> I'm doing some work now to learn which SQL database package is the most
> optimal for the task I am working on. There are many packages, and I'm
> reviewing the documentation on some of them now. I am seeking advice from
> those of you who might suggest a package to use for the task I am currently
> working with.
>
> The work is currently as follows. My users currently use another tool to
> extract tables from a server, save those tables as .csv files, and then
> those csv files are read into R and stuff is done on the data in those
> files. This adds overhead that can be bypassed if users instead can
> directly access the database from within R and grab the tables they need
> and then those tables are data frames in the R session and available to do
> stuff.
>
> The sequence of work (I think) I just this:
>
> Step 1: Connect to the remote server (connection string and authenticate
> the user)
> Step 2: Have a SQL query statement that grabs the tables from the remote
> server
> Step 3: Close the connection
>
> The two packages I have narrowed my studies to are Dbplyr and RODBC, both
> of which seem to be similar.
>
> Any experiences out there to suggest these two packages are in fact right
> for this task, or would there be other packages that might be more optimal
> for this?
>
> Thanks,
> Harold
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: SQL Database

Martin Maechler
In reply to this post by Doran, Harold
>>>>> Doran, Harold
>>>>>     on Wed, 25 Jul 2018 14:57:13 +0000 writes:

    > I'm doing some work now to learn which SQL database
    > package is the most optimal for the task I am working on.

Hmm... we would have a problem with optimize() and optim() if
this was

       optimal << more optimal << most optimal

:-)  ;-)

Best,
Martin

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: SQL Database

R help mailing list-2
In reply to this post by Doran, Harold
From my point of view, the logic is this:

  If the external database is Oracle, use ROracle
  If the external database is MySQL, use RMySQL
and similarly for other databases

If there is no R package specific to the database, then you drop back to RODBC or RJDBC. Hopefully you can get the necessary drivers or java files to support the database

Your steps look good (I do them all the time with Oracle and MySQL), and realize that you don't have to grab an entire table; you can send SQL queries that join tables and subset rows, etc. You can also write results back to the database if that's useful.

I prefer to use packages that are based on the DBI package.

-Don

--
Don MacQueen
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062
Lab cell 925-724-7509
 
 

On 7/25/18, 7:57 AM, "R-help on behalf of Doran, Harold" <[hidden email] on behalf of [hidden email]> wrote:

    I'm doing some work now to learn which SQL database package is the most optimal for the task I am working on. There are many packages, and I'm reviewing the documentation on some of them now. I am seeking advice from those of you who might suggest a package to use for the task I am currently working with.
   
    The work is currently as follows. My users currently use another tool to extract tables from a server, save those tables as .csv files, and then those csv files are read into R and stuff is done on the data in those files. This adds overhead that can be bypassed if users instead can directly access the database from within R and grab the tables they need and then those tables are data frames in the R session and available to do stuff.
   
    The sequence of work (I think) I just this:
   
    Step 1: Connect to the remote server (connection string and authenticate the user)
    Step 2: Have a SQL query statement that grabs the tables from the remote server
    Step 3: Close the connection
   
    The two packages I have narrowed my studies to are Dbplyr and RODBC, both of which seem to be similar.
   
    Any experiences out there to suggest these two packages are in fact right for this task, or would there be other packages that might be more optimal for this?
   
    Thanks,
    Harold
   
    ______________________________________________
    [hidden email] mailing list -- To UNSUBSCRIBE and more, see
    https://stat.ethz.ch/mailman/listinfo/r-help
    PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
    and provide commented, minimal, self-contained, reproducible code.
   

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: SQL Database

Doran, Harold
Thanks for this. I'm using the RODBC stuff now. It works well and is currently embedded in a shiny app. So, the entire SQL stuff is transparent to the user who simply interacts with the UI. It appears to be working in a local windows version. That is, I can successfully open the connection, do my sqlQuery, and save those data as objects in the R session.

But when I run the same code on my dev server (which runs Centos 7), the code is breaking and it is seemingly related to the driver. It just cannot open the connection. That portion of my code is (with certain things blanked out for security):

cn <- odbcDriverConnect(connection="Driver={SQL Server Native Client 11.0};
        server=1.1.1.1;
        database=xyz;
        uid=*****;
        pwd=***;"
)

I'm doing my homework now on the right drivers that might be appropriate for centos, but if anyone happens to know, hints are appreciated

Harold


-----Original Message-----
From: MacQueen, Don [mailto:[hidden email]]
Sent: Thursday, July 26, 2018 11:26 AM
To: Doran, Harold <[hidden email]>; '[hidden email]' <[hidden email]>
Subject: Re: [R] SQL Database

From my point of view, the logic is this:

  If the external database is Oracle, use ROracle
  If the external database is MySQL, use RMySQL and similarly for other databases

If there is no R package specific to the database, then you drop back to RODBC or RJDBC. Hopefully you can get the necessary drivers or java files to support the database

Your steps look good (I do them all the time with Oracle and MySQL), and realize that you don't have to grab an entire table; you can send SQL queries that join tables and subset rows, etc. You can also write results back to the database if that's useful.

I prefer to use packages that are based on the DBI package.

-Don

--
Don MacQueen
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062
Lab cell 925-724-7509
 
 

On 7/25/18, 7:57 AM, "R-help on behalf of Doran, Harold" <[hidden email] on behalf of [hidden email]> wrote:

    I'm doing some work now to learn which SQL database package is the most optimal for the task I am working on. There are many packages, and I'm reviewing the documentation on some of them now. I am seeking advice from those of you who might suggest a package to use for the task I am currently working with.
   
    The work is currently as follows. My users currently use another tool to extract tables from a server, save those tables as .csv files, and then those csv files are read into R and stuff is done on the data in those files. This adds overhead that can be bypassed if users instead can directly access the database from within R and grab the tables they need and then those tables are data frames in the R session and available to do stuff.
   
    The sequence of work (I think) I just this:
   
    Step 1: Connect to the remote server (connection string and authenticate the user)
    Step 2: Have a SQL query statement that grabs the tables from the remote server
    Step 3: Close the connection
   
    The two packages I have narrowed my studies to are Dbplyr and RODBC, both of which seem to be similar.
   
    Any experiences out there to suggest these two packages are in fact right for this task, or would there be other packages that might be more optimal for this?
   
    Thanks,
    Harold
   
    ______________________________________________
    [hidden email] mailing list -- To UNSUBSCRIBE and more, see
    https://stat.ethz.ch/mailman/listinfo/r-help
    PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
    and provide commented, minimal, self-contained, reproducible code.
   

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: SQL Database

R help mailing list-2
Harold,

I don't have much experience with ODBC/RODBC, but given that it's working on Win, a driver problem seems plausible.

-Don

--
Don MacQueen
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062
Lab cell 925-724-7509
 
 

On 7/26/18, 9:37 AM, "Doran, Harold" <[hidden email]> wrote:

    Thanks for this. I'm using the RODBC stuff now. It works well and is currently embedded in a shiny app. So, the entire SQL stuff is transparent to the user who simply interacts with the UI. It appears to be working in a local windows version. That is, I can successfully open the connection, do my sqlQuery, and save those data as objects in the R session.
   
    But when I run the same code on my dev server (which runs Centos 7), the code is breaking and it is seemingly related to the driver. It just cannot open the connection. That portion of my code is (with certain things blanked out for security):
   
    cn <- odbcDriverConnect(connection="Driver={SQL Server Native Client 11.0};
    server=1.1.1.1;
    database=xyz;
    uid=*****;
    pwd=***;"
    )
   
    I'm doing my homework now on the right drivers that might be appropriate for centos, but if anyone happens to know, hints are appreciated
   
    Harold
   
   
    -----Original Message-----
    From: MacQueen, Don [mailto:[hidden email]]
    Sent: Thursday, July 26, 2018 11:26 AM
    To: Doran, Harold <[hidden email]>; '[hidden email]' <[hidden email]>
    Subject: Re: [R] SQL Database
   
    From my point of view, the logic is this:
   
      If the external database is Oracle, use ROracle
      If the external database is MySQL, use RMySQL and similarly for other databases
   
    If there is no R package specific to the database, then you drop back to RODBC or RJDBC. Hopefully you can get the necessary drivers or java files to support the database
   
    Your steps look good (I do them all the time with Oracle and MySQL), and realize that you don't have to grab an entire table; you can send SQL queries that join tables and subset rows, etc. You can also write results back to the database if that's useful.
   
    I prefer to use packages that are based on the DBI package.
   
    -Don
   
    --
    Don MacQueen
    Lawrence Livermore National Laboratory
    7000 East Ave., L-627
    Livermore, CA 94550
    925-423-1062
    Lab cell 925-724-7509
     
     
   
    On 7/25/18, 7:57 AM, "R-help on behalf of Doran, Harold" <[hidden email] on behalf of [hidden email]> wrote:
   
        I'm doing some work now to learn which SQL database package is the most optimal for the task I am working on. There are many packages, and I'm reviewing the documentation on some of them now. I am seeking advice from those of you who might suggest a package to use for the task I am currently working with.
       
        The work is currently as follows. My users currently use another tool to extract tables from a server, save those tables as .csv files, and then those csv files are read into R and stuff is done on the data in those files. This adds overhead that can be bypassed if users instead can directly access the database from within R and grab the tables they need and then those tables are data frames in the R session and available to do stuff.
       
        The sequence of work (I think) I just this:
       
        Step 1: Connect to the remote server (connection string and authenticate the user)
        Step 2: Have a SQL query statement that grabs the tables from the remote server
        Step 3: Close the connection
       
        The two packages I have narrowed my studies to are Dbplyr and RODBC, both of which seem to be similar.
       
        Any experiences out there to suggest these two packages are in fact right for this task, or would there be other packages that might be more optimal for this?
       
        Thanks,
        Harold
       
        ______________________________________________
        [hidden email] mailing list -- To UNSUBSCRIBE and more, see
        https://stat.ethz.ch/mailman/listinfo/r-help
        PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
        and provide commented, minimal, self-contained, reproducible code.
       
   
   

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.