Function gutenberg_download in the gutenbergr package

Patrick Connolly-4

I've been working through https://www.tidytextmining.com/tidytext.html
wherein everything worked until I got to this part in section 1.5

> hgwells <- gutenberg_download(c(35, 36, 5230, 159))
Determining mirror for Project Gutenberg from http://www.gutenberg.org/robot/harvest
Error in open.connection(con, "rb") :
  Failed to connect to www.gutenberg.org port 80: Connection timed out

This indicates that the problem occurs at the very start of gutenberg_download:

  if (is.null(mirror)) {
    mirror <- gutenberg_get_mirror(verbose = verbose)
  }

The documentation for gutenberg_get_mirror indicates there's nothing
different I could set.
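
Before looking further into the package, it may help to confirm that the host can be reached from R at all. The following is only a minimal sketch: can_reach is a hypothetical helper name, not part of gutenbergr, and the 10-second timeout is an arbitrary choice.

```r
# Return TRUE if a connection to `address` can be opened, FALSE otherwise.
# `can_reach` is a hypothetical helper, not a gutenbergr function.
can_reach <- function(address, timeout_secs = 10) {
  old <- options(timeout = timeout_secs)  # shorten the wait for this check
  on.exit(options(old), add = TRUE)       # restore the previous setting
  con <- url(address)
  tryCatch({
    suppressWarnings(open(con, "rb"))
    close(con)
    TRUE
  }, error = function(e) {
    try(close(con), silent = TRUE)        # discard the unopened connection
    FALSE
  })
}

# Example call (needs network access):
# can_reach("http://www.gutenberg.org/robot/harvest")
```

If this returns FALSE for the harvest URL, the time-out is more likely a network or proxy matter than a problem inside gutenberg_download.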

So I tried specifying my usual mirror:

> hgwells <- gutenberg_download(c(1260, 768, 969, 9182, 767), mirror = "http://cran.stat.auckland.ac.nz")
Error in read_zip_url(full_url) : could not find function "read_zip_url"
>

Which is, indeed, strange since according to

> help.search("read_zip_url")
Help files with alias or concept or title matching ‘read_zip_url’ using
regular expression matching:


gutenbergr::read_zip_url
                        Read a file from a .zip URL
  Aliases: read_zip_url

[...]

And according to
library(help = "gutenbergr")

[...]
Index:

gutenberg_authors       Metadata about Project Gutenberg authors
gutenberg_download      Download one or more works using a Project
                        Gutenberg ID
gutenberg_get_mirror    Get the recommended mirror for Gutenberg files
gutenberg_metadata      Gutenberg metadata about each work
gutenberg_strip         Strip header and footer content from a Project
                        Gutenberg book
gutenberg_subjects      Gutenberg metadata about the subject of each
                        work
gutenberg_works         Get a filtered table of Gutenberg work metadata
read_zip_url            Read a file from a .zip URL

[...]

However, when I list the objects in the gutenbergr entry of search()
(for example with ls("package:gutenbergr")), read_zip_url is missing,
although everything else in that index is present.  So it's not
surprising that it isn't found, but it puzzles me that it is not there.
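
One possible explanation, which this thread does not confirm, is that read_zip_url is documented but not exported: the help index can list internal functions, whereas the attached package environment holds only exported objects. A minimal sketch of how to check this follows; where_is is a hypothetical helper name, and the stats functions serve only as a base-R illustration.

```r
# Report whether `fn` is defined in a package's namespace and whether it
# is exported. `where_is` is a hypothetical helper used for illustration.
where_is <- function(fn, pkg) {
  ns <- asNamespace(pkg)  # loads the namespace without attaching the package
  c(
    in_namespace = exists(fn, envir = ns, inherits = FALSE),
    exported     = fn %in% getNamespaceExports(pkg)
  )
}

# Base-R illustration: print.lm is defined in stats but is not exported,
# while lm is both defined and exported.
print(where_is("print.lm", "stats"))
print(where_is("lm", "stats"))

# The same check for the function in question, if gutenbergr is installed.
if (requireNamespace("gutenbergr", quietly = TRUE)) {
  print(where_is("read_zip_url", "gutenbergr"))
}
```

If the function turns out to be in the namespace but not exported, it can still be inspected with gutenbergr:::read_zip_url, though that would not by itself explain why the package's own code fails to find it.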

Any ideas on how to proceed would be gratefully appreciated.


> sessionInfo()
R version 3.4.2 (2017-09-28)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

Matrix products: default
BLAS: /home/hrapgc/local/R-3.4.2/lib/libRblas.so
LAPACK: /home/hrapgc/local/R-3.4.2/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_NZ.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_NZ.UTF-8        LC_COLLATE=en_NZ.UTF-8    
 [5] LC_MONETARY=en_NZ.UTF-8    LC_MESSAGES=en_NZ.UTF-8  
 [7] LC_PAPER=en_NZ.UTF-8       LC_NAME=C                
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_NZ.UTF-8 LC_IDENTIFICATION=C      

attached base packages:
[1] grDevices utils     stats     graphics  methods   base    

other attached packages:
 [1] sos_2.0-0          brew_1.0-6         gutenbergr_0.1.3   ggplot2_2.2.1    
 [5] stringr_1.2.0      bindrcpp_0.2       dplyr_0.7.4        janeaustenr_0.1.5
 [9] tidytext_0.1.6     FactoMineR_1.38    readxl_1.0.0       tm_0.7-3          
[13] NLP_0.1-11         wordcloud_2.5      RColorBrewer_1.1-2 lattice_0.20-35  

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.13         cellranger_1.1.0     compiler_3.4.2      
 [4] plyr_1.8.4           bindr_0.1            tokenizers_0.1.4    
 [7] tools_3.4.2          gtable_0.2.0         tibble_1.3.4        
[10] nlme_3.1-131         pkgconfig_2.0.1      rlang_0.1.2        
[13] Matrix_1.2-11        psych_1.7.8          curl_3.0            
[16] parallel_3.4.2       xml2_1.1.1           cluster_2.0.6      
[19] hms_0.3              flashClust_1.01-2    grid_3.4.2          
[22] scatterplot3d_0.3-40 glue_1.1.1           ellipse_0.3-8      
[25] R6_2.2.2             foreign_0.8-69       readr_1.1.1        
[28] purrr_0.2.4          tidyr_0.7.2          reshape2_1.4.2      
[31] magrittr_1.5         scales_0.5.0         SnowballC_0.5.1    
[34] MASS_7.3-47          leaps_3.0            assertthat_0.2.0    
[37] mnormt_1.5-5         colorspace_1.3-2     labeling_0.3        
[40] stringi_1.1.5        lazyeval_0.2.1       munsell_0.4.3      
[43] slam_0.1-42          broom_0.4.2        
>

--
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.  
   ___    Patrick Connolly  
 {~._.~}                   Great minds discuss ideas    
 _( Y )_           Average minds discuss events
(:_~*~_:)                  Small minds discuss people  
 (_)-(_)                        ..... Eleanor Roosevelt
         
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: Function gutenberg_download in the gutenbergr package

Jeff Newmiller
I have never used that package, but it seems obvious to me that you need to "reflect" on the meaning of the word "mirror". There is no reason to assume that a site hosting a mirror of the CRAN archive is also going to host a mirror of Project Gutenberg [1].

If, once you know you are giving reasonable inputs, the package still does not seem to work as designed, please remember that contributed packages have maintainers [2] and not all of them subscribe to r-help.

[1] https://www.gutenberg.org/MIRRORS.ALL
[2] ?maintainer
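
As a rough sketch of both points: the mirror argument should name a Project Gutenberg mirror rather than a CRAN one, and maintainer() gives the contact for a contributed package. The host below is an assumption on my part and should be checked against the list in [1]; the work IDs are the ones from the original post.

```r
# Assumption: this host is a Project Gutenberg mirror. Check it against
# https://www.gutenberg.org/MIRRORS.ALL before relying on it.
pg_mirror <- "http://aleph.gutenberg.org"

if (requireNamespace("gutenbergr", quietly = TRUE)) {
  # Same IDs as in the original post; errors are reported, not fatal.
  hgwells <- tryCatch(
    gutenbergr::gutenberg_download(c(35, 36, 5230, 159), mirror = pg_mirror),
    error = function(e) {
      message("Download failed: ", conditionMessage(e))
      NULL
    }
  )
  # Contact details for the package maintainer
  print(utils::maintainer("gutenbergr"))
}
```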
--
Sent from my phone. Please excuse my brevity.

On January 23, 2018 11:23:06 PM PST, Patrick Connolly <[hidden email]> wrote:

>[...]
