Hi,

I am trying to access a website and read its content. The website has restricted access, and I reach it through a proxy server (which therefore requires me to enable cookies). I am having trouble getting RCurl to receive and send cookies.

The following lines give me:

library(RCurl)
library(XML)

url <- "http://www.theurl.com"
content <- readHTMLTable(url)

content
$`NULL`
V1
1
2 Cookies disabled
3
4 Your browser currently does not accept cookies.\rCookies need to be enabled for Scopus to function properly.\rPlease enable session cookies in your browser and try again.

$`NULL`
V1 V2 V3
1

$`NULL`
V1
1 Cookies disabled

$`NULL`
V1
1
2
3

I have carefully read section 4.4 of http://www.omegahat.org/RCurl/RCurlJSS.pdf and tried the following, without success:

curl <- getCurlHandle()
curlSetOpt(cookiejar = 'cookies.txt', curl = curl)

Any suggestions on how to enable cookies?

Thanks.

Math
To just enable cookies and their management, use the cookiefile
option, e.g.

txt = getURLContent(url, cookiefile = "")

Then you can pass this to readHTMLTable(), best done as

content = readHTMLTable(htmlParse(txt, asText = TRUE))

The function readHTMLTable() doesn't use RCurl and doesn't
handle cookies.

D.

On 6/7/12 7:33 AM, mdvaan wrote:
> [...]
> Any suggestions on how to allow for cookies?
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
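Putting the two steps of this reply together, a minimal sketch might look like the following. It also suggests why the earlier curlSetOpt() attempt had no visible effect: a configured handle only matters if it is passed to a request through curl =, and readHTMLTable(url) never sees it. The helpers cookie_options() and fetch_html_tables() are hypothetical names; cookiefile and cookiejar are the libcurl cookie options that RCurl exposes.

```r
# Hypothetical helper: curl options that turn on cookie handling.
# cookiefile = "" enables libcurl's cookie engine without reading any file.
# cookiejar (optional) names a file that libcurl writes the collected cookies
# to when the handle is released (in R, when it is garbage-collected).
cookie_options <- function(jar = NULL) {
  opts <- list(cookiefile = "")
  if (!is.null(jar)) opts$cookiejar <- jar
  opts
}

# Hypothetical helper: download the page with a cookie-enabled handle, then
# parse the downloaded text, so readHTMLTable() never makes its own request.
fetch_html_tables <- function(url,
                              curl = RCurl::getCurlHandle(.opts = cookie_options())) {
  txt <- RCurl::getURLContent(url, curl = curl)
  XML::readHTMLTable(XML::htmlParse(txt, asText = TRUE))
}
```

Creating one handle, e.g. h <- RCurl::getCurlHandle(.opts = cookie_options()), and passing it to every call (fetch_html_tables(url, curl = h)) is what lets session cookies set by one response be sent back with later requests.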
Apologies for following up on my own mail, but I forgot
to explicitly mention that you will need to specify the appropriate
proxy information in the call to getURLContent().

D.

On 6/7/12 8:31 AM, Duncan Temple Lang wrote:
> To just enable cookies and their management, use the cookiefile
> option, e.g.
> [...]
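For a conventional HTTP proxy, the proxy settings are simply more named curl options in the same call, next to cookiefile. A minimal sketch, assuming you know the proxy's host and port: proxy_options(), fetch_via_proxy() and the host proxy.example.org used in the test are hypothetical, while proxy and proxyuserpwd are the RCurl names of libcurl's CURLOPT_PROXY and CURLOPT_PROXYUSERPWD.

```r
# Hypothetical helper: curl options for fetching through an HTTP proxy with
# cookies enabled. proxy is given as "host:port"; userpwd is "user:password"
# for proxies that require authentication, or NULL if none is needed.
proxy_options <- function(host, port, userpwd = NULL) {
  opts <- list(proxy = paste0(host, ":", port), cookiefile = "")
  if (!is.null(userpwd)) opts$proxyuserpwd <- userpwd
  opts
}

# Every option reaches getURLContent() by name (here collected in .opts),
# never as an unnamed argument.
fetch_via_proxy <- function(url, opts) {
  RCurl::getURLContent(url, .opts = opts)
}
```

Keeping the options in a plain named list makes it easy to inspect them before any request is made, and the same list can be reused for every page fetched in the session.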
Thanks for the fast response. I am not sure how to enter the proxy information in the call.
I am working through EZproxy, which (I think) rewrites URLs. According to its website, it works like this:

1. Within the config.txt/ezproxy.cfg file, various hosts are identified that require access from a local IP address.
2. A remote user makes a web connection to port 2048 of your EZproxy server.
3. When the user authenticates successfully, a cookie is sent to the user's browser.
4. The user's browser presents this cookie during each access to EZproxy.

So, for example, if I enter URL 1, EZproxy dynamically changes it to URL 2:

1. http://www.scopus.com/results/...
2. http://www-scopus-com.ezproxy.cul.columbia.edu/results/...

What kind of proxy information should I look for, and where do I enter it in the call? Your help is very much appreciated. Thanks.
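If EZproxy works as described in this post (rewriting URLs and authenticating with a session cookie), it may not need to be configured as an HTTP proxy at all: one would instead request the rewritten URL directly, using a cookie-enabled handle that has already logged in to EZproxy. That is an assumption, not something confirmed in this thread. The helper ezproxy_url() below is hypothetical and only reproduces the single rewriting example given above (dots in the host name become hyphens, then the EZproxy host is appended); EZproxy's real rules may differ, for instance for host names that already contain hyphens. The login step is site-specific and is not shown.

```r
# Hypothetical helper mirroring the example in the post:
#   http://www.scopus.com/results/...  ->
#   http://www-scopus-com.ezproxy.cul.columbia.edu/results/...
# Only plain scheme://host/path URLs are handled.
ezproxy_url <- function(url, proxy_host) {
  m <- regmatches(url, regexec("^([a-z]+://)([^/]+)(.*)$", url))[[1]]
  if (length(m) == 0) stop("not a scheme://host/path URL: ", url)
  scheme <- m[2]
  host   <- m[3]
  rest   <- m[4]
  paste0(scheme, gsub(".", "-", host, fixed = TRUE), ".", proxy_host, rest)
}

# Assumed workflow: `h` is a cookie-enabled handle that has already
# authenticated to EZproxy, so its session cookie is sent with this request.
fetch_through_ezproxy <- function(url, proxy_host, h) {
  RCurl::getURLContent(ezproxy_url(url, proxy_host), curl = h)
}
```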
|
|
In reply to this post by Duncan Temple Lang
Hi,

I am following Prof. Temple Lang's suggestions and think I am close, but the code below gives me an error message that I don't fully understand. Any suggestions?

Thanks!

Math

library(RCurl)
library(XML)
setwd("C:/Comments")
url <- getURLContent("http://www.scopus.com/results/results.url?sort=plf-f&src=s&sid=M8RcnaPRBgrtA1r_EvZtL7j%3a70&sot=a&sdt=a&sl=32&s=PMID%2811693556%29+OR+PMID%2812239288%29&origin=searchadvanced&txGid=M8RcnaPRBgrtA1r_EvZtL7j%3a7",
                     options(RCurlOptions = list(proxy = "127.0.0.1:2048",
                                                 proxyuserpwd = "username:password",
                                                 proxyauth = "gci")),
                     cookiefile = "/Rcookies")

Error in curlOptions(..., .opts = .opts) :
  unnamed curl option(s): list(RCurlOptions = list(proxy = "127.0.0.1:2048", proxyuserpwd = "username:password", proxyauth = "gci"))
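The error comes from the second argument. options(RCurlOptions = list(...)) is an ordinary R function call: it is evaluated on its own (setting a global R option), and its return value, a list, is then handed to getURLContent() as an unnamed argument, which curlOptions() rejects with "unnamed curl option(s)". Below is a sketch of the same request with every option named and collected in .opts. It keeps the post's placeholder values, two of which look doubtful: proxy = "127.0.0.1:2048" points at the user's own machine and only works if a proxy is actually listening there, and proxyauth corresponds to libcurl's CURLOPT_PROXYAUTH, which expects a bitmask of authentication methods rather than a string such as "gci", so it is left out. cookiefile = "", from the earlier reply, replaces the "/Rcookies" path. The names scopus_opts and fetch_scopus() are hypothetical.

```r
# Sketch of a corrected call: all curl options are named and passed via .opts
# instead of being wrapped in options(RCurlOptions = ...).
# The proxy address and credentials are placeholders from the original post.
scopus_opts <- list(
  proxy        = "127.0.0.1:2048",     # placeholder; must point at a real proxy
  proxyuserpwd = "username:password",  # placeholder credentials
  cookiefile   = ""                    # enable cookies without reading a file
)

fetch_scopus <- function(url, opts = scopus_opts) {
  RCurl::getURLContent(url, .opts = opts)
}
```

If EZproxy is only rewriting URLs rather than acting as an HTTP proxy, the proxy entries may not belong in this list at all.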
