using xpath with xml2

using xpath with xml2

Ben Tupper-2
Hi,

I have mined XML extensively with R before now, but my xpath chops seem to be regressing recently. I know that I can roll up my sleeves and search through the child nodes of the root, but I can't noodle out why using the xpath description returns an empty nodeset.

Any suggestions and nudges most welcome.

### START

library(xml2)
library(httr)
library(magrittr)

daymet_uri <- "https://thredds.daac.ornl.gov/thredds/catalog/ornldaac/1328/catalog.xml"

# run the following to show the node in a browser
# httr::BROWSE(daymet_uri)

daymet <- httr::GET(daymet_uri) %>%
  httr::content(type = "text/xml", encoding = "UTF-8")

# list the children "service" and "dataset"
daymet %>% xml2::xml_children()
#{xml_nodeset (2)}
#[1] <service name="all" serviceType="Compound" base="">\n  <service name="odap" serviceTyp ...
#[2] <dataset name="Daymet: Daily Surface Weather Data on a 1-km Grid for North America, Ve ...

# find all descendants of node name "dataset"
#
# according to this tutorial we should find 'dataset'
# https://www.w3schools.com/xml/xpath_syntax.asp
daymet %>% xml2::xml_find_all(xpath = "//dataset")
# {xml_nodeset (0)}

# I have also tried every other xpath combination I can think of, e.g.
#   ".//dataset", "./dataset", "/dataset" and "dataset"
# They each yield an empty nodeset

### END

> sessionInfo()

R version 3.5.1 (2018-07-02)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8  
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C      

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base    

other attached packages:
[1] magrittr_1.5 httr_1.4.1   xml2_1.2.2  

loaded via a namespace (and not attached):
[1] compiler_3.5.1 R6_2.4.0       tools_3.5.1    curl_4.2      
[5] yaml_2.2.0     Rcpp_1.0.3    


Thanks,
Ben

Ben Tupper
Bigelow Laboratory for Ocean Sciences
60 Bigelow Drive, P.O. Box 380
East Boothbay, Maine 04544
http://www.bigelow.org

Ecological Forecasting: https://eco.bigelow.org/

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: using xpath with xml2

R help mailing list-2
> xml_ns(daymet)
d1    <-> http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0
xlink <-> http://www.w3.org/1999/xlink
> daymet %>% xml2::xml_find_all(xpath = "d1:dataset")
{xml_nodeset (1)}
[1] <dataset name="Daymet: Daily Surface Weather Data on a 1-km Grid for Nort ...

Bill Dunlap
TIBCO Software
wdunlap tibco.com
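
Why the prefix matters: the catalog's root element declares a default namespace, so every unprefixed element in the file belongs to that namespace, whereas an unprefixed name in an XPath 1.0 expression matches only elements in no namespace. xml2 exposes the default namespace under the generated prefix d1, as the xml_ns() output above shows. Note also that "d1:dataset" selects only children of the context node, while "//d1:dataset" should pick up nested datasets as well. A minimal sketch, assuming a small inline document in place of the live THREDDS catalog (the dataset names "outer" and "inner" are made up for illustration):

```r
library(xml2)

# Hypothetical stand-in for the THREDDS catalog: the root declares a default
# namespace, so every element without a prefix belongs to that namespace.
catalog_txt <- '<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
         xmlns:xlink="http://www.w3.org/1999/xlink">
  <service name="all" serviceType="Compound" base=""/>
  <dataset name="outer">
    <dataset name="inner"/>
  </dataset>
</catalog>'
catalog <- xml2::read_xml(catalog_txt)

# xml2 assigns the generated prefix "d1" to the default namespace
ns <- xml2::xml_ns(catalog)

# An unprefixed name only matches elements in no namespace: nothing is found
no_prefix <- xml2::xml_find_all(catalog, "//dataset")

# "d1:dataset" looks at children of the context node only;
# "//d1:dataset" looks at every dataset element in the document
children    <- xml2::xml_find_all(catalog, "d1:dataset", ns = ns)
descendants <- xml2::xml_find_all(catalog, "//d1:dataset", ns = ns)

length(no_prefix)    # 0
length(children)     # 1
length(descendants)  # 2
xml2::xml_attr(descendants, "name")
```

Passing ns explicitly is optional here, since xml_find_all() defaults to xml_ns(x); spelling it out just makes the dependence on the namespace map visible.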


On Tue, Nov 12, 2019 at 11:35 AM Ben Tupper <[hidden email]> wrote:

> Hi,
>
> I have mined XML extensively with R before now, but my xpath chops seem to
> be regressing recently. I know that I can roll up my sleeves and search
> through the child nodes of the root, but I can't noodle out why using the
> xpath description returns an empty nodeset.

Re: using xpath with xml2

Ben Tupper-2
Forehead smack!  Of course!

Thank you, Bill!
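
For anyone who would rather not carry the generated d1 prefix around, there are a few other routes that may be worth knowing. The sketch below is illustrative only: the namespace URI http://example.org/ns, the prefix "cat", and the dataset names are hypothetical. It shows matching on local-name(), renaming the prefix with xml_ns_rename(), and stripping the default namespace with xml_ns_strip(). Be aware that xml_ns_strip() modifies the document in place, so it is run last here.

```r
library(xml2)

# Hypothetical document with a default namespace (URI is made up)
doc <- xml2::read_xml(
  '<catalog xmlns="http://example.org/ns">
     <dataset name="a"><dataset name="b"/></dataset>
   </catalog>'
)

# Option 1: ignore namespaces entirely by testing the local name
by_local_name <- xml2::xml_find_all(doc, "//*[local-name() = 'dataset']")

# Option 2: swap the generated prefix d1 for a more readable one
ns <- xml2::xml_ns_rename(xml2::xml_ns(doc), d1 = "cat")
by_renamed_prefix <- xml2::xml_find_all(doc, "//cat:dataset", ns = ns)

# Option 3: remove the default namespace, after which plain names match.
# This changes doc itself, not a copy.
xml2::xml_ns_strip(doc)
after_strip <- xml2::xml_find_all(doc, "//dataset")

length(by_local_name)
length(by_renamed_prefix)
length(after_strip)
```

Stripping is convenient for quick exploration, but keeping the namespace (options 1 or 2) is probably the safer choice if the document will be written back out or mixes elements from several vocabularies.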

> On Nov 12, 2019, at 2:50 PM, William Dunlap <[hidden email]> wrote:
>
> > xml_ns(daymet)
> d1    <-> http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0
> xlink <-> http://www.w3.org/1999/xlink
> > daymet %>% xml2::xml_find_all(xpath = "d1:dataset")
> {xml_nodeset (1)}
> [1] <dataset name="Daymet: Daily Surface Weather Data on a 1-km Grid for Nort ...
