Data Carpentry - Creating a New SQLite Database

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Data Carpentry - Creating a New SQLite Database

Phillip Heinrich
Working my way through a tutorial named Data Carpentry (https://datacarpentry.org/R-ecology-lesson/).  for the most part it is excellent but I’m stuck on the very last section (https://datacarpentry.org/R-ecology-lesson/05-r-and-databases.html).

First, below are the packages I have loaded:
[1] "forcats"   "stringr"   "purrr"     "readr"     "tidyr"     "tibble"    "ggplot2"   "tidyverse" "dbplyr"    "RMySQL"    "DBI"      
[12] "dplyr"     "RSQLite"   "stats"     "graphics"  "grDevices" "utils"     "datasets"  "methods"   "base"    
 
     
            >  
     

Second,
Second, is the text of the last section of the last chapter titled “Creating a New SQLite Database”.
Second, below is the text from the tutorial.  The black type is from the tutorial.  The green and blue is the suggested R code.  My comments are in red.
Creating a new SQLite database
So far, we have used a previously prepared SQLite database. But we can also use R to create a new database, e.g. from existing csv files. Let’s recreate the mammals database that we’ve been working with, in R. First let’s download and read in the csv files. We’ll import tidyverse to gain access to the read_csv() function.

download.file("https://ndownloader.figshare.com/files/3299483",
              "data_raw/species.csv")
download.file("https://ndownloader.figshare.com/files/10717177",
              "data_raw/surveys.csv")
download.file("https://ndownloader.figshare.com/files/3299474",
              "data_raw/plots.csv")
library(tidyverse)
species <- read_csv("data_raw/species.csv")No problem here.  I’m pulling three databases from the Web and saving them to a folder on my hard drive. (...data_raw/species.csv) etc.surveys <- read_csv("data_raw/surveys.csv") plots <- read_csv("data_raw/plots.csv")Again no problem.  I’m just creating an R data files.  But here is where I loose it.  I’m creating something named my_db_file from another file named portal-database-output with an sqlite extension and then creating my_db from the My_db_file.  Not sure where the sqlite extension file came from. Creating a new SQLite database with dplyr is easy. You can re-use the same command we used above to open an existing .sqlite file. The create = TRUE argument instructs R to create a new, empty database instead.

Caution: When create = TRUE is added, any existing database at the same location is overwritten without warning.

my_db_file <- "data/portal-database-output.sqlite"
my_db <- src_sqlite(my_db_file, create = TRUE)Currently, our new database is empty, it doesn’t contain any tables:

my_db#> src:  sqlite 3.29.0 [data/portal-database-output.sqlite]
#> tbls:To add tables, we copy the existing data.frames into the database one by one:

copy_to(my_db, surveys)
copy_to(my_db, plots)
my_dbI can follow the directions to fill in my_db but I have no idea how to access the tables.  The text from the tutorial below says to check the location of our database.  Huh!  Can someone give me some direction.  Thanks.





If you check the location of our database you’ll see that data is automatically being written to disk. R and dplyr not only provide easy ways to query existing databases, they also allows you to easily create your own databases from flat files!



Here is where I loose it.  

   
        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Data Carpentry - Creating a New SQLite Database

Ivan Krylov
On Fri, 10 Jan 2020 11:31:58 -0700
"Phillip Heinrich" <[hidden email]> wrote:

> below is the text from the tutorial.  The black type is from the
> tutorial.  The green and blue is the suggested R code.  My comments
> are in red

R-help is a plain text mailing list, so the markup has been stripped
off (and since HTML-enabled mail clients don't quite care how the plain
text version of the e-mail looks, some paragraph breaks had to go, too).

> etc.surveys <- read_csv("data_raw/surveys.csv")
> plots <- read_csv("data_raw/plots.csv")

> Again no problem.  I’m just creating an R data files.

Note that it is not files that you are creating by running read_csv(),
but variables (of type "tibble", which is like "data.frame", either of
which should have been covered in earlier chapters in a good tutorial)
in the R environment. The files you downloaded previously are opened
in read only mode and are never changed.

> my_db_file <- "data/portal-database-output.sqlite"

> I’m creating something named my_db_file from another file named
> portal-database-output with an sqlite extension and then creating
> my_db from the My_db_file.

This something is just a text string that happens to contain a *path*
to a file. Just like the variable `greeting` in the following snippet:

greeting <- "Hello world"
print(greeting)

See [1] for more info on character vectors in R.

> Not sure where the sqlite extension file came from.

The authors of the tutorial decided that the file to be created should
be named like this. Feel free to change the extension (or the path) to
anything else: neither R, nor SQLite cares about it much (but the file
manager you use may display a different icon for it or become confused
if you name it .txt or .pdf).

> I can follow the directions to fill in my_db but I have no idea
> how to access the tables.

What exactly do you mean by "access"? At this point my_db should be a
dplyr "src" object, so the tools described in dplyr vignettes [2] should
be applicable. Try calling tbl() on it and passing the name of one of
the tables you have just created. Also try running:

example("src_sqlite")

> The text from the tutorial below says to check the location of our
> database.  Huh!  Can someone give me some direction.

The variable my_db_file contains the location of the file where the
database is stored. This is the same variable that you passed to the
src_sqlite() function.

--
Best regards,
Ivan

[1]
https://cran.r-project.org/doc/manuals/r-release/R-intro.html#Character-vectors

[2]
https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html
https://cran.r-project.org/web/packages/dplyr/vignettes/programming.html
https://cran.r-project.org/web/packages/dplyr/vignettes/two-table.html
https://cran.r-project.org/web/packages/dplyr/vignettes/window-functions.html

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Data Carpentry - Creating a New SQLite Database

Bert Gunter-2
In reply to this post by Phillip Heinrich
Please note that tidyverse packages have their own support resources at
RStudio, whence they came; e.g. here:
https://education.rstudio.com/learn/beginner/
You may also do better asking about issues that concern them at their
support site:  https://support.rstudio.com/hc/en-us
 though, as you already found out, there are folks here who may help also.

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Fri, Jan 10, 2020 at 10:32 AM Phillip Heinrich <[hidden email]> wrote:

> Working my way through a tutorial named Data Carpentry (
> https://datacarpentry.org/R-ecology-lesson/).  for the most part it is
> excellent but I’m stuck on the very last section (
> https://datacarpentry.org/R-ecology-lesson/05-r-and-databases.html).
>
> First, below are the packages I have loaded:
> [1] "forcats"   "stringr"   "purrr"     "readr"     "tidyr"     "tibble"
>   "ggplot2"   "tidyverse" "dbplyr"    "RMySQL"    "DBI"
> [12] "dplyr"     "RSQLite"   "stats"     "graphics"  "grDevices" "utils"
>    "datasets"  "methods"   "base"
>
>
>             >
>
>
> Second,
> Second, is the text of the last section of the last chapter titled
> “Creating a New SQLite Database”.
> Second, below is the text from the tutorial.  The black type is from the
> tutorial.  The green and blue is the suggested R code.  My comments are in
> red.
> Creating a new SQLite database
> So far, we have used a previously prepared SQLite database. But we can
> also use R to create a new database, e.g. from existing csv files. Let’s
> recreate the mammals database that we’ve been working with, in R. First
> let’s download and read in the csv files. We’ll import tidyverse to gain
> access to the read_csv() function.
>
> download.file("https://ndownloader.figshare.com/files/3299483",
>               "data_raw/species.csv")
> download.file("https://ndownloader.figshare.com/files/10717177",
>               "data_raw/surveys.csv")
> download.file("https://ndownloader.figshare.com/files/3299474",
>               "data_raw/plots.csv")
> library(tidyverse)
> species <- read_csv("data_raw/species.csv")No problem here.  I’m pulling
> three databases from the Web and saving them to a folder on my hard drive.
> (...data_raw/species.csv) etc.surveys <- read_csv("data_raw/surveys.csv")
> plots <- read_csv("data_raw/plots.csv")Again no problem.  I’m just creating
> an R data files.  But here is where I loose it.  I’m creating something
> named my_db_file from another file named portal-database-output with an
> sqlite extension and then creating my_db from the My_db_file.  Not sure
> where the sqlite extension file came from. Creating a new SQLite database
> with dplyr is easy. You can re-use the same command we used above to open
> an existing .sqlite file. The create = TRUE argument instructs R to create
> a new, empty database instead.
>
> Caution: When create = TRUE is added, any existing database at the same
> location is overwritten without warning.
>
> my_db_file <- "data/portal-database-output.sqlite"
> my_db <- src_sqlite(my_db_file, create = TRUE)Currently, our new database
> is empty, it doesn’t contain any tables:
>
> my_db#> src:  sqlite 3.29.0 [data/portal-database-output.sqlite]
> #> tbls:To add tables, we copy the existing data.frames into the database
> one by one:
>
> copy_to(my_db, surveys)
> copy_to(my_db, plots)
> my_dbI can follow the directions to fill in my_db but I have no idea how
> to access the tables.  The text from the tutorial below says to check the
> location of our database.  Huh!  Can someone give me some direction.
> Thanks.
>
>
>
>
>
> If you check the location of our database you’ll see that data is
> automatically being written to disk. R and dplyr not only provide easy ways
> to query existing databases, they also allows you to easily create your own
> databases from flat files!
>
>
>
> Here is where I loose it.
>
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Data Carpentry - Creating a New SQLite Database

R help mailing list-2
In reply to this post by Phillip Heinrich
Hi Phillip,

Skipping to the last few lines of your email, did you download a
program to look at Sqlite databases (independent of R) as listed
below? Maybe that program ("DB Browser for SQLite") and/or the
instructions below can help you locate your database directory:

https://datacarpentry.org/semester-biology/computer-setup/
https://datacarpentry.org/semester-biology/materials/sql-for-dplyr-users/

If you do have that program, and you're still seeing an error, you
might consider looking for similar issues at the appropriate
'datacarpentry' repository on Github (or posting a new issue
yourself):

https://github.com/datacarpentry/R-ecology-lesson/issues

Finally, I really feel you'll benefit from reading over the documents
pertaining to "R Data Import/Export" on the www.r-project.org website.
No disrespect to the people at 'datacarpentry', but you'll find
similar (and possibly, easier) R code to follow at section 4.3.1
'Packages using DBI' :

https://cran.r-project.org/doc/manuals/r-release/R-data.html

HTH, Bill.

W. Michels, Ph.D.




On Fri, Jan 10, 2020 at 10:32 AM Phillip Heinrich <[hidden email]> wrote:

>
> Working my way through a tutorial named Data Carpentry (https://datacarpentry.org/R-ecology-lesson/).  for the most part it is excellent but I’m stuck on the very last section (https://datacarpentry.org/R-ecology-lesson/05-r-and-databases.html).
>
> First, below are the packages I have loaded:
> [1] "forcats"   "stringr"   "purrr"     "readr"     "tidyr"     "tibble"    "ggplot2"   "tidyverse" "dbplyr"    "RMySQL"    "DBI"
> [12] "dplyr"     "RSQLite"   "stats"     "graphics"  "grDevices" "utils"     "datasets"  "methods"   "base"
>
>
>             >
>
>
> Second,
> Second, is the text of the last section of the last chapter titled “Creating a New SQLite Database”.
> Second, below is the text from the tutorial.  The black type is from the tutorial.  The green and blue is the suggested R code.  My comments are in red.
> Creating a new SQLite database
> So far, we have used a previously prepared SQLite database. But we can also use R to create a new database, e.g. from existing csv files. Let’s recreate the mammals database that we’ve been working with, in R. First let’s download and read in the csv files. We’ll import tidyverse to gain access to the read_csv() function.
>
> download.file("https://ndownloader.figshare.com/files/3299483",
>               "data_raw/species.csv")
> download.file("https://ndownloader.figshare.com/files/10717177",
>               "data_raw/surveys.csv")
> download.file("https://ndownloader.figshare.com/files/3299474",
>               "data_raw/plots.csv")
> library(tidyverse)
> species <- read_csv("data_raw/species.csv")No problem here.  I’m pulling three databases from the Web and saving them to a folder on my hard drive. (...data_raw/species.csv) etc.surveys <- read_csv("data_raw/surveys.csv") plots <- read_csv("data_raw/plots.csv")Again no problem.  I’m just creating an R data files.  But here is where I loose it.  I’m creating something named my_db_file from another file named portal-database-output with an sqlite extension and then creating my_db from the My_db_file.  Not sure where the sqlite extension file came from. Creating a new SQLite database with dplyr is easy. You can re-use the same command we used above to open an existing .sqlite file. The create = TRUE argument instructs R to create a new, empty database instead.
>
> Caution: When create = TRUE is added, any existing database at the same location is overwritten without warning.
>
> my_db_file <- "data/portal-database-output.sqlite"
> my_db <- src_sqlite(my_db_file, create = TRUE)Currently, our new database is empty, it doesn’t contain any tables:
>
> my_db#> src:  sqlite 3.29.0 [data/portal-database-output.sqlite]
> #> tbls:To add tables, we copy the existing data.frames into the database one by one:
>
> copy_to(my_db, surveys)
> copy_to(my_db, plots)
> my_dbI can follow the directions to fill in my_db but I have no idea how to access the tables.  The text from the tutorial below says to check the location of our database.  Huh!  Can someone give me some direction.  Thanks.
>
>
>
>
>
> If you check the location of our database you’ll see that data is automatically being written to disk. R and dplyr not only provide easy ways to query existing databases, they also allows you to easily create your own databases from flat files!
>
>
>
> Here is where I loose it.
>
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.