How to set cookies in RCurl


mdvaan
Hi,

I am trying to access a website and read its content. The website has restricted access, and I reach it through a proxy server, which requires cookies to be enabled. I am having trouble getting RCurl to receive and send cookies.

The following lines give me:

library(RCurl)
library(XML)

url <- "http://www.theurl.com"
content <- readHTMLTable(url)

content
$`NULL`
                                                                                                                                                                          V1
1                                                                                                                                                                          
2                                                                                                                                                           Cookies disabled
3                                                                                                                                                                          
4 Your browser currently does not accept cookies.\rCookies need to be enabled for Scopus to function properly.\rPlease enable session cookies in your browser and try again.

$`NULL`
  V1 V2 V3
1        

$`NULL`
                V1
1 Cookies disabled

$`NULL`
  V1
1  
2  
3  

I have carefully read section 4.4 of http://www.omegahat.org/RCurl/RCurlJSS.pdf and tried the following without success:

curl <- getCurlHandle()
curlSetOpt(cookiejar = 'cookies.txt', curl = curl)
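A likely reason this attempt changes nothing: the handle `curl` is configured but never passed to any request, and `readHTMLTable(url)` downloads the page itself without RCurl. A minimal sketch, assuming the RCurl and XML packages are installed, in which requests reuse one handle so cookies received on one request are sent on the next. The names `cookie_handle()` and `fetch_tables_with()` are hypothetical.

```r
# Hypothetical helpers: one curl handle shared across requests keeps one cookie store.
cookie_handle <- function(jar = "cookies.txt") {
  # cookiefile = "" switches on libcurl's cookie engine without reading a file;
  # cookiejar names the file libcurl writes cookies to when the handle is cleaned up.
  RCurl::getCurlHandle(cookiefile = "", cookiejar = jar)
}

fetch_tables_with <- function(url, curl) {
  # Pass the handle explicitly; a configured handle that is never used has no effect.
  txt <- RCurl::getURLContent(url, curl = curl)
  # Parse the downloaded text, so readHTMLTable() never fetches the URL itself.
  XML::readHTMLTable(XML::htmlParse(txt, asText = TRUE))
}

# Usage (needs network access):
# h <- cookie_handle()
# tables <- fetch_tables_with("http://www.theurl.com", curl = h)
```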

Any suggestions on how to allow for cookies?

Thanks.

Math

Re: How to set cookies in RCurl

Duncan Temple Lang
To just enable cookies and their management, use the cookiefile
option, e.g.

  txt = getURLContent(url,  cookiefile = "")

Then you can pass this to readHTMLTable(), best done as

  content = readHTMLTable(htmlParse(txt, asText = TRUE))


The function readHTMLTable() doesn't use RCurl and doesn't
handle cookies.

   D.
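Put together, the two steps above might look like the sketch below. It assumes the RCurl and XML packages; `fetch_tables()` and `cookies_blocked()` are hypothetical names, and the check simply looks for the "Cookies disabled" text that appeared in the question's output. Extra curl options, such as proxy settings, can be passed through `...`.

```r
# Fetch with cookies enabled, then parse the text (not the URL) into tables.
fetch_tables <- function(url, ...) {
  txt <- RCurl::getURLContent(url, cookiefile = "", ...)
  XML::readHTMLTable(XML::htmlParse(txt, asText = TRUE))
}

# TRUE if any table cell still contains the site's "Cookies disabled" notice.
cookies_blocked <- function(tables) {
  # Convert every column to character first, so factor columns compare by label.
  cells <- unlist(lapply(tables, function(tb) unlist(lapply(tb, as.character))))
  any(grepl("Cookies disabled", cells, fixed = TRUE))
}

# Usage (needs network access):
# tables <- fetch_tables("http://www.theurl.com")
# if (cookies_blocked(tables)) warning("server still reports cookies as disabled")
```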

On 6/7/12 7:33 AM, mdvaan wrote:

> --
> View this message in context: http://r.789695.n4.nabble.com/How-to-set-cookies-in-RCurl-tp4632693.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


Re: How to set cookies in RCurl

Duncan Temple Lang
Apologies for following up on my own mail, but I forgot
to explicitly mention that you will need to specify the
appropriate proxy information in the call to getURLContent().

  D.
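One way to supply proxy information is as a named list of libcurl options passed through the `.opts` argument of `getURLContent()`. The sketch below assumes a conventional HTTP proxy; the helper `proxy_options()`, the host, port, and credentials are hypothetical placeholders, and `proxyauth = 1L` assumes basic authentication (the value of libcurl's `CURLAUTH_BASIC`).

```r
# Hypothetical helper: build libcurl proxy options as a *named* list.
proxy_options <- function(host, port, user = NULL, password = NULL) {
  opts <- list(proxy = paste0(host, ":", port))
  if (!is.null(user)) {
    opts$proxyuserpwd <- paste0(user, ":", password)
    opts$proxyauth <- 1L  # CURLAUTH_BASIC; change if the proxy uses another scheme
  }
  opts
}

# Usage (needs network access and a real proxy):
# opts <- proxy_options("proxy.example.edu", 8080, "username", "password")
# txt <- RCurl::getURLContent(url, cookiefile = "", .opts = opts)
```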

On 6/7/12 8:31 AM, Duncan Temple Lang wrote:


Re: How to set cookies in RCurl

mdvaan
Thanks for the fast response. I am not sure how to enter the proxy info in the call.

I am working via EZproxy, which I think rewrites the URL. According to its website, it works like this:

1. Within the config.txt/ezproxy.cfg file, various hosts are identified that require access from a local IP address.
2. A remote user makes a web connection to port 2048 of your EZproxy server.
3. When the user authenticates successfully, a cookie is sent to the user's browser.
4. The user's browser presents this during each access to EZproxy.
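Following steps 2 to 4, a script has to authenticate once and then keep presenting the cookie EZproxy returns, which in RCurl means reusing one handle for every later request. A rough sketch, assuming the login page accepts an ordinary form POST; the login URL and the form field names `user` and `pass` are assumptions to check against the actual login page's HTML, and `ezproxy_login()` is a hypothetical name.

```r
# Hypothetical sketch: log in once, then reuse the handle that holds the session cookie.
ezproxy_login <- function(login_url, user, pass) {
  h <- RCurl::getCurlHandle(cookiefile = "",        # enable the cookie engine
                            followlocation = TRUE)  # follow redirects after login
  # Field names "user" and "pass" are assumptions about the login form.
  RCurl::postForm(login_url, .params = list(user = user, pass = pass),
                  style = "POST", curl = h)
  h
}

# Usage (needs network access and valid credentials):
# h <- ezproxy_login("https://ezproxy.example.edu:2048/login", "username", "password")
# txt <- RCurl::getURLContent("http://www-scopus-com.ezproxy.cul.columbia.edu/results/...", curl = h)
```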

So, for example, if I enter URL 1, EZproxy dynamically changes it to URL 2:
1. http://www.scopus.com/results/...
2. http://www-scopus-com.ezproxy.cul.columbia.edu/results/...
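Judging only from this one example, the rewrite replaces the dots in the original host name with hyphens and appends the EZproxy host. A minimal base-R sketch of that pattern; `ezproxy_url()` is a hypothetical helper, and the rule is inferred from the example rather than taken from EZproxy's documentation.

```r
# Hypothetical helper: rewrite "scheme://a.b.c/path" as
# "scheme://a-b-c.<proxy_host>/path", the pattern seen in the example above.
ezproxy_url <- function(url, proxy_host = "ezproxy.cul.columbia.edu") {
  parts <- regmatches(url, regexec("^(https?://)([^/]+)(.*)$", url))[[1]]
  if (length(parts) == 0) stop("not an http(s) URL: ", url)
  host <- gsub(".", "-", parts[3], fixed = TRUE)  # dots in the host become hyphens
  paste0(parts[2], host, ".", proxy_host, parts[4])
}

ezproxy_url("http://www.scopus.com/results/")
# [1] "http://www-scopus-com.ezproxy.cul.columbia.edu/results/"
```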

What kind of proxy information should I look for and where do I enter it in the call?

Your help is very much appreciated.

Thanks.


Re: How to set cookies in RCurl

mdvaan
In reply to this post by Duncan Temple Lang
Hi,

I am using Prof. Temple Lang's suggestions and I think I am close, but the code below gives an error message that I don't fully understand. Any suggestions? Thanks!

Math

library(RCurl)
library(XML)
setwd("C:/Comments")
url <- getURLContent("http://www.scopus.com/results/results.url?sort=plf-f&src=s&sid=M8RcnaPRBgrtA1r_EvZtL7j%3a70&sot=a&sdt=a&sl=32&s=PMID%2811693556%29+OR+PMID%2812239288%29&origin=searchadvanced&txGid=M8RcnaPRBgrtA1r_EvZtL7j%3a7",
                     options(RCurlOptions = list(proxy = "127.0.0.1:2048",
                                                 proxyuserpwd = "username:password",
                                                 proxyauth = "gci")),
                     cookiefile = "/Rcookies")

Error in curlOptions(..., .opts = .opts) :
  unnamed curl option(s): list(RCurlOptions = list(proxy = "127.0.0.1:2048", proxyuserpwd = "username:password", proxyauth = "gci"))
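The error arises because the whole `options(RCurlOptions = ...)` call is passed to `getURLContent()` as an unnamed argument, and, as the message says, `curlOptions()` rejects unnamed options. A sketch of a corrected call under these assumptions: the proxy settings go in as a named list through `.opts`; `proxyauth = "gci"` is not a libcurl authentication scheme, so basic authentication (`1L`) is assumed; and the proxy address is copied from the post, although step 2 of the EZproxy description has users connect to port 2048 of the EZproxy server, so `127.0.0.1` is only right if that server runs on your own machine. `scopus_opts` and `fetch_scopus()` are hypothetical names.

```r
# Proxy settings from the post, as a named list (names become curl options).
scopus_opts <- list(
  proxy        = "127.0.0.1:2048",     # see the note on the proxy host above
  proxyuserpwd = "username:password",  # placeholder credentials
  proxyauth    = 1L                    # assumed basic auth; "gci" is not a libcurl scheme
)

fetch_scopus <- function(url, opts = scopus_opts) {
  # Named arguments only: .opts carries the proxy list, cookiefile enables cookies.
  RCurl::getURLContent(url, .opts = opts, cookiefile = "")
}

# Alternative (assumes RCurl picks up the RCurlOptions option when it creates
# its default handle): set the defaults in a separate statement first.
# options(RCurlOptions = scopus_opts)
# txt <- RCurl::getURLContent(url, cookiefile = "")
```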

