subsetting/slicing xml2 nodesets

Tobias Fellinger
Dear R-help members,

I'm working with the xml2 package to parse an XML document, and I don't
understand how subsetting/slicing of xml_nodeset objects works. I'd expect
xml_find_all() to return only children of the nodes I selected with [ or
[[, but it returns all matching nodes in the whole document. I could not find
any documentation on the [ and [[ operators for xml_nodeset. Below is a
small example and the sessionInfo.

Thanks in advance,
Tobias Fellinger



# load package
library(xml2)

# test document as text
test_chr <- "
<html>
<body>
<p>paragraph 1</p>
<p>paragraph 2</p>
</body>
</html>
"

# parse test document
test_doc <- read_xml(test_chr)

# extract nodeset
test_nodeset <- xml_find_all(test_doc, "//p")

# subset nodeset (working as expected)
test_nodeset[1]
# {xml_nodeset (1)}
# [1] <p>paragraph 1</p>
test_nodeset[[1]]
# {xml_node}
# <p>

# extract from subset (not working as expected)
xml_find_all(test_nodeset[1], "//p")
# {xml_nodeset (2)}
# [1] <p>paragraph 1</p>
# [2] <p>paragraph 2</p>
xml_find_all(test_nodeset[[1]], "//p")
# {xml_nodeset (2)}
# [1] <p>paragraph 1</p>
# [2] <p>paragraph 2</p>
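A likely explanation, offered as a sketch rather than a definitive answer: in XPath, an expression that starts with "//" is absolute. It is evaluated from the root of the document that the context node belongs to, so it does not matter which node or nodeset is passed to xml_find_all(). Prefixing the expression with "." (".//p") makes it relative to the context node. The small document and the names doc, second_div, abs_hits and rel_hits below are illustrative only and are not part of the original post.

```r
library(xml2)

# Illustrative document (not from the original post): two <div>s, three <p>s
doc <- read_xml("<html><body><div><p>a</p></div><div><p>b</p><p>c</p></div></body></html>")
second_div <- xml_find_all(doc, "//div")[[2]]

# "//p" is absolute: evaluation starts at the document root, so the
# context node is effectively ignored and all three <p> elements come back
abs_hits <- xml_find_all(second_div, "//p")
length(abs_hits)    # 3

# ".//p" is relative: evaluation starts at the context node, so only
# the <p> descendants of the second <div> are returned
rel_hits <- xml_find_all(second_div, ".//p")
xml_text(rel_hits)  # "b" "c"
```

The same applies to a one-element nodeset such as test_nodeset[1]: xml_find_all() evaluates the expression once per node, so a relative path is needed there as well.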

sessionInfo()
# R version 3.6.0 (2019-04-26)
# Platform: x86_64-w64-mingw32/x64 (64-bit)
# Running under: Windows 7 x64 (build 7601) Service Pack 1
#
# Matrix products: default
#
# locale:
#   [1] LC_COLLATE=German_Austria.1252  LC_CTYPE=German_Austria.1252    LC_MONETARY=German_Austria.1252 LC_NUMERIC=C                    LC_TIME=German_Austria.1252
#
# attached base packages:
#   [1] stats     graphics  grDevices utils     datasets  methods   base
#
# other attached packages:
#   [1] xml2_1.2.2
#
# loaded via a namespace (and not attached):
#   [1] compiler_3.6.0 tools_3.6.0    Rcpp_1.0.2     packrat_0.5.0
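Why subsetting does not restrict the search: as far as the behaviour above suggests, an xml_node taken out of a nodeset with [[ (or a nodeset made with [) is not a detached copy. It still points into the same parsed document, keeping its path, parent and siblings. The sketch below, which reuses the post's test document with the hypothetical name first_p, shows this with xml_path(), xml_parent() and a sibling axis.

```r
library(xml2)

test_doc <- read_xml("<html><body><p>paragraph 1</p><p>paragraph 2</p></body></html>")
first_p <- xml_find_all(test_doc, "//p")[[1]]

# The extracted node still knows where it sits in the full document
xml_path(first_p)               # "/html/body/p[1]"
xml_name(xml_parent(first_p))   # "body"

# It can also still reach its siblings, so an absolute "//p" from here
# naturally sees both paragraphs
xml_text(xml_find_all(first_p, "following-sibling::p"))  # "paragraph 2"
```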

______________________________________________
R-help mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.