How to search only the first x lines within an XML file

dstrick1
Since this is my first post, I'll probably break some forum convention, so forgive me in advance.

I am trying to extract text from an HTML file using the CSS package in R. However, the text I want is not marked with any class, so targeting it with the package's `cssApply` function is difficult.

Here are some details in case you can spot something I've missed. Say I want to extract the latitude/longitude information from this page: http://va.water.usgs.gov/duration_plots/htm_7/dp02059500.htm

Here's the first part of my code:

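```r
install.packages("CSS")

library(CSS)
library(XML)  # provides htmlParse(); loaded explicitly rather than relying on CSS to attach it

url <- "http://va.water.usgs.gov/duration_plots/htm_7/dp02059500.htm"

# Download the page and parse it into an internal document tree
doc <- htmlParse(url)
```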


The text I want to extract sits at the following XPath (copied from Chrome DevTools):

`/html/body/table[1]/tbody/tr/td/table/tbody/tr[2]/td[2]/font/text()[1]`

Is the next step to pull the text from that path? To see how the site's HTML is structured, follow the link and use your browser's Inspect Element tool.
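One way to pull text from an XPath in R is `xpathSApply()` from the XML package. Below is a minimal sketch. The HTML string is a hypothetical stand-in for the USGS page (its real markup may differ), and the coordinate values are placeholders. One thing worth checking: Chrome's DevTools show `<tbody>` elements that browsers insert into tables even when the page source has none, while `htmlParse()` does not insert them, so a path copied from DevTools may need its `/tbody` steps removed.

```r
library(XML)

# Hypothetical markup standing in for the USGS page; the real page may differ.
# The coordinate values are placeholders, not real data.
html <- paste0(
  "<html><body><table><tr><td><table>",
  "<tr><td>Name</td><td><font>Site name</font></td></tr>",
  "<tr><td>Location</td><td><font>Latitude 00 00 00<br>Longitude 00 00 00</font></td></tr>",
  "</table></td></tr></table></body></html>"
)

doc <- htmlParse(html, asText = TRUE)

# The DevTools path with its /tbody steps removed, since htmlParse()
# does not add <tbody> elements the way browsers do
path <- "/html/body/table[1]/tr/td/table/tr[2]/td[2]/font/text()[1]"

# xmlValue() returns the content of each matched text node
coords <- xpathSApply(doc, path, xmlValue)
print(coords)
```

On the live page the equivalent call would be `xpathSApply(htmlParse(url), path, xmlValue)`, assuming the page's table layout matches the copied path once `/tbody` is dropped.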

Any help would be appreciated. Thanks.

*UPDATE*

I've switched to the XML package in R and can now extract the data I want using the file's XML tags. However, because the XML content repeats, my code outputs the same value once per repetition. How can I search only the first, say, 10 lines of the XML file so that the value is output only once?
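Base R's `readLines()` takes an `n` argument that stops reading after `n` lines, so one way to do what is asked here is to read only the head of the file and search those lines. A minimal sketch, assuming the value sits on a single line inside a tag; the `<latitude>` tag and the sample lines are hypothetical. The first few lines of an XML file are generally not a well-formed document on their own, so this searches them as plain text with a regular expression rather than with the XML parser.

```r
# Hypothetical repeating XML content; the <latitude> tag is illustrative only
xml_text <- c(
  '<?xml version="1.0"?>',
  "<sites>",
  "  <site><latitude>37.0</latitude></site>",
  "  <site><latitude>37.0</latitude></site>",
  "  <site><latitude>37.0</latitude></site>",
  "</sites>"
)

# Read only the first n lines; for a real file use readLines("file.xml", n = 10)
n_lines <- 3
con <- textConnection(xml_text)
head_lines <- readLines(con, n = n_lines)
close(con)

# Extract the text between <latitude> and </latitude> on each line that has one
pattern <- "(?<=<latitude>)[^<]+"
hits <- regmatches(head_lines, regexpr(pattern, head_lines, perl = TRUE))
print(hits)
```

If the goal is simply one copy of a value that repeats, keeping the first match after parsing the whole file may be simpler, for example calling `unique()` on the extracted values, or using an XPath such as `(//latitude)[1]` with `xpathSApply()`.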

Thanks. David.