postForm() in RCurl and library RHTMLForms

sayan dasgupta
Hi RUsers,

Suppose I want to see the data on the website
url <- "http://www.nseindia.com/content/indices/ind_histvalues.htm"

for the index "S&P CNX NIFTY" for
dates "FromDate"="01-11-2010","ToDate"="02-11-2010"

then read the HTML table from the page using readHTMLTable()

I am using this code
webpage <- postForm(url,.params=list(
                       "FromDate"="01-11-2010",
                       "ToDate"="02-11-2010",
                       "IndexType"="S&P CNX NIFTY",
                       "Indicesdata"="Get Details"),
                 .opts=list(useragent = getOption("HTTPUserAgent")))

But it doesn't give me the desired result.

Also I was trying to use the function getHTMLFormDescription from the
package RHTMLForms but there we can't use the argument
.opts=list(useragent = getOption("HTTPUserAgent")) which is needed for this
particular website


Thanks and Regards
Sayan Dasgupta

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: postForm() in RCurl and library RHTMLForms

Santosh Srinivas
I don't have an implementation that works the way you want, sorry, but
someone here will definitely know.

The group showed me how to do it this way, though:

library(zoo)
library(RCurl)

# Direct link to the historical CSV file for the S&P CNX NIFTY index
sNiftyURL <- "http://nseindia.com/content/indices/histdata/S&P%20CNX%20NIFTY01-01-2000-02-11-2010.csv"
# The site expects a user agent, so pass one explicitly
Nifty_Dat <- getURLContent(sNiftyURL, verbose = TRUE,
                           useragent = getOption("HTTPUserAgent"))
# read.csv(text = ...) opens and closes its own connection,
# so closeAllConnections() is not needed afterwards
tblNifty <- read.csv(text = Nifty_Dat)
tblNifty <- subset(tblNifty, select = c(Date, Close))
tblNifty$Date <- as.Date(tblNifty$Date, format = "%d-%b-%Y")
tblNifty <- read.zoo(tblNifty)

HTH.
S
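
The CSV URL above appears to be built from the index name followed by the start and end dates in dd-mm-yyyy form. As a rough sketch only, the same URL can be assembled from its parts. The helper name nifty_csv_url is hypothetical, and the file-name pattern is inferred from the single URL in this post, so it may not hold for other indices or date ranges.

```r
# Hypothetical helper: the pattern <index><from>-<to>.csv is an assumption
# inferred from the one URL shown above, not a documented interface.
nifty_csv_url <- function(from, to, index = "S&P CNX NIFTY") {
  base <- "http://nseindia.com/content/indices/histdata/"
  file <- paste0(index,
                 format(from, "%d-%m-%Y"), "-",
                 format(to, "%d-%m-%Y"), ".csv")
  # URLencode() turns the spaces into %20 and leaves "&" untouched
  paste0(base, utils::URLencode(file))
}

csv_url <- nifty_csv_url(as.Date("2000-01-01"), as.Date("2010-11-02"))
print(csv_url)
```

The result can then be passed to getURLContent() in place of the hard-coded string.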


Re: postForm() in RCurl and library RHTMLForms

Duncan Temple Lang
In reply to this post by sayan dasgupta


On 11/4/10 2:39 AM, sayan dasgupta wrote:

> Hi RUsers,
>
> Suppose I want to see the data on the website
> url <- "http://www.nseindia.com/content/indices/ind_histvalues.htm"
>
> for the index "S&P CNX NIFTY" for
> dates "FromDate"="01-11-2010","ToDate"="02-11-2010"
>
> then read the html table from the page using readHTMLtable()
>
> I am using this code
> webpage <- postForm(url,.params=list(
>                        "FromDate"="01-11-2010",
>                        "ToDate"="02-11-2010",
>                        "IndexType"="S&P CNX NIFTY",
>                        "Indicesdata"="Get Details"),
>                  .opts=list(useragent = getOption("HTTPUserAgent")))
>
> But it doesn't give me desired result

You need to be more specific about how it fails to give the desired result.

You are in fact posting to the wrong URL. The form is submitted to a different
URL - http://www.nseindia.com/marketinfo/indices/histdata/historicalindices.jsp



>
> Also I was trying to use the function getHTMLFormDescription from the
> package RHTMLForms but there we can't use the argument
> .opts=list(useragent = getOption("HTTPUserAgent")) which is needed for this
> particular website

That's not the case. The function that RHTMLForms generates for you does support
the .opts parameter.

What you want is something along these lines:


 # Set default options for RCurl requests
options(RCurlOptions = list(useragent = "R"))
library(RCurl)

 # Fetch the HTML page with RCurl, since htmlParse() cannot read it
 # directly: it sends neither a user agent nor an Accept: */* header

url <- "http://www.nseindia.com/content/indices/ind_histvalues.htm"
wp = getURLContent(url)

 # Now that we have the page, parse it and use the RHTMLForms
 # package to create an R function that will act as an interface
 # to the form.
library(RHTMLForms)
library(XML)
doc = htmlParse(wp, asText = TRUE)
  # need to set the URL for this document since we read it from
  # text, rather than from the URL directly

docName(doc) = url

  # Create the form description and generate the R
  # function that calls the form

form = getHTMLFormDescription(doc)[[1]]
fun = createFunction(form)


  # now we can invoke the form from R. We only need 2
  # inputs - FromDate and ToDate

o = fun(FromDate = "01-11-2010", ToDate = "04-11-2010")

  # Having looked at the tables, I think we want the 3rd one.
table = readHTMLTable(htmlParse(o, asText = TRUE),
                      which = 3,
                      header = TRUE,
                      stringsAsFactors = FALSE)
table




Yes, it is marginally involved, but that is because we cannot simply read
the HTML document directly with htmlParse(): it sends neither an Accept
nor a user-agent HTTP header.
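
As a small sketch of the .opts point made in this reply: assuming the function produced by createFunction() accepts .opts as stated, the request options can be supplied per call instead of through options(RCurlOptions = ...). The wrapper name call_form_with_headers is hypothetical, and the Accept value "*/*" is an assumption about what the server wants.

```r
# Hypothetical wrapper: forwards the form inputs unchanged and adds
# per-call curl options (useragent and httpheader are RCurl option names).
call_form_with_headers <- function(form_fun, ..., agent = "R") {
  form_fun(...,
           .opts = list(useragent = agent,
                        httpheader = c(Accept = "*/*")))
}

# Intended use with a function created by createFunction(), for example:
# o <- call_form_with_headers(fun, FromDate = "01-11-2010",
#                             ToDate = "04-11-2010")
```

Keeping the options in one wrapper avoids changing global settings for every other RCurl request in the session.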


Re: postForm() in RCurl and library RHTMLForms

sayan dasgupta
Thanks a lot, that's exactly what I was looking for.

Just a quick question. I agree that the form gets submitted to the URL
"http://www.nseindia.com/marketinfo/indices/histdata/historicalindices.jsp"

while I am filling in the form on the page
"http://www.nseindia.com/content/indices/ind_histvalues.htm"

How do I pass arguments like FromDate, ToDate and Symbol with postForm()
and submit the query directly to get a similar table?








Re: postForm() in RCurl and library RHTMLForms

Duncan Temple Lang


On 11/4/10 11:31 PM, sayan dasgupta wrote:

> Thanks a lot thats exactly what I was looking for
>
> Just a quick question I agree the form gets submitted to the URL
> "http://www.nseindia.com/marketinfo/indices/histdata/historicalindices.jsp"
>
> and I am filling up the form in the page
> "http://www.nseindia.com/content/indices/ind_histvalues.htm"
>
> How do I submit the arguments like FromDate, ToDate, Symbol using postForm()
> and submit the query to get the similar table.
>

Well, that is exactly what the function created by RHTMLForms does.
You can look at its code and see that it calls formQuery(),
which ends in a call to postForm(). You could use

   debug(postForm)

and examine the arguments passed to it.


The answer is

o = postForm("http://www.nseindia.com/marketinfo/indices/histdata/historicalindices.jsp",
              FromDate = "01-11-2010", ToDate = "04-11-2010",
              IndexType = "S&P CNX NIFTY", check = "new",
             style = "POST" )
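
To connect this call back to the original goal of reading the result as a table, here is a minimal sketch that wraps the request and the parsing step. The names nse_params and fetch_index_table are hypothetical. The field names and the check value are copied from the call above, which = 3 is borrowed from the earlier reply and may not match the current page, and nothing here has been run against the live site.

```r
# Hypothetical helper: collects the form fields used in the call above.
nse_params <- function(from, to, index = "S&P CNX NIFTY") {
  list(FromDate = from, ToDate = to, IndexType = index, check = "new")
}

# Hypothetical helper: posts the form and parses one table from the reply.
# Needs the RCurl and XML packages when it is called.
fetch_index_table <- function(from, to, which = 3, agent = "R") {
  html <- RCurl::postForm(
    "http://www.nseindia.com/marketinfo/indices/histdata/historicalindices.jsp",
    .params = nse_params(from, to),
    .opts = list(useragent = agent),   # this site rejects requests without one
    style = "POST")
  XML::readHTMLTable(XML::htmlParse(html, asText = TRUE),
                     which = which, header = TRUE,
                     stringsAsFactors = FALSE)
}

# Example call (requires network access):
# tbl <- fetch_index_table("01-11-2010", "04-11-2010")
```

Passing the user agent through .opts makes the function independent of any options(RCurlOptions = ...) set earlier in the session.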



Re: postForm() in RCurl and library RHTMLForms

veepsirtt
Hi RUsers,
While fetching the data from this URL

http://nseindia.com/products/content/equities/indices/historical_index_data.htm

the following errors occur.
Please let me know how to correct them.
With regards,
veepsirtt
---------------------------------------------------------------------------------------------------------------

 #install.packages('RHTMLForms', repos = "http://www.omegahat.org/R")
>
> library(RHTMLForms)
> options(RCurlOptions = list(useragent = "R"))
> library(RCurl)
> library(XML)  
> library(timeDate)
>
>
>
> urlNifty <- "http://nseindia.com/products/content/equities/indices/historical_index_data.htm";
> contentNifty = getURLContent(urlNifty)
>  # Now that we have the page, parse it and use the RHTMLForms
>  # package to create an R function that will act as an interface
>  # to the form.
>
> docNifty = htmlParse(contentNifty, asText = TRUE)
htmlParseEntityRef: expecting ';'
htmlParseEntityRef: expecting ';'
htmlParseEntityRef: expecting ';'
htmlParseEntityRef: expecting ';'
htmlParseEntityRef: expecting ';'
htmlParseEntityRef: expecting ';'
htmlParseEntityRef: expecting ';'
htmlParseEntityRef: expecting ';'
htmlParseEntityRef: expecting ';'
htmlParseEntityRef: expecting ';'
htmlParseEntityRef: expecting ';'
htmlParseEntityRef: expecting ';'
htmlParseEntityRef: expecting ';'
htmlParseEntityRef: expecting ';'
htmlParseEntityRef: expecting ';'
Attribute id redefined
Unexpected end tag : span
>   # need to set the URL for this document since we read it from
>   # text, rather than from the URL directly
>
> docName(docNifty) = urlNifty
>   # Create the form description and generate the R
>   # function "call" the
>
> formNifty = getHTMLFormDescription(docNifty)[[1]]
Error in getHTMLFormDescription(docNifty)[[1]] : subscript out of bounds
> funNifty = createFunction(formNifty)
Error in inherits(formDescription, "HTMLFormDescription") :
  object 'formNifty' not found
>
-----------------------------------------------------------------------------------------------------------------

Re: postForm() in RCurl and library RHTMLForms

veepsirtt

library(RCurl)
fii <- getForm("http://www.bseindia.com/histdata/categorywise_turnover.asp",
               yyy = "2012",
               mmm = "8",
               .opts = list(followlocation = TRUE, ssl.verifypeer = FALSE, verbose = TRUE))
tbl <- read.csv(textConnection(fii), skip = 2, header = TRUE)
tbl

Please help me get the data for the month of August 2012.

Re: postForm() in RCurl and library RHTMLForms

veepsirtt
Hi all,

This gives me the current month's data.
How do I get the previous month's data?

library(XML)
url <- "http://www.bseindia.com/histdata/categorywise_turnover.asp"
total <- readHTMLTable(url)
# pick the table with the most rows
n.rows <- unlist(lapply(total, function(t) dim(t)[1]))
df <- as.data.frame(total[[which.max(n.rows)]])
# na.omit() returns a new data frame, so assign the result
df <- na.omit(df)
df

Thanks,
veepsirtt

Re: postForm() in RCurl and library RHTMLForms

veepsirtt
In reply to this post by veepsirtt
Hi,
How do I pass the parameters year = "2012" and month = "August" to this URL?
The code below gives me no tables. Why?
Thanks,
veepsirtt

options(RCurlOptions = list(useragent = "R"))
library(RCurl)
url <- "http://www.bseindia.com/histdata/categorywise_turnover.asp"
wp = getURLContent(url)

library(RHTMLForms)
library(XML)
doc = htmlParse(wp, asText = TRUE)
form = getHTMLFormDescription(doc)[[1]]
fun = createFunction(form)
o = fun(mmm = "9", yyy = "2012", url = "http://www.bseindia.com/histdata/categorywise_turnover.asp")

table = readHTMLTable(htmlParse(o, asText = TRUE),
                      header = TRUE,
                      stringsAsFactors = FALSE)
table

Re: postForm() in RCurl and library RHTMLForms

veepsirtt
In reply to this post by veepsirtt
Why am I getting this error?
Error in getHTMLFormDescription(docNifty)[[1]] : subscript out of bounds
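
A note on the message itself: in R, `[[1]]` raises "subscript out of bounds" when it is applied to an empty list, so the error suggests that getHTMLFormDescription() returned no form descriptions for the parsed page. Why this particular page yields none is not established anywhere in this thread. A small guard, sketched below with the hypothetical name first_form, at least turns the failure into a clearer message before createFunction() is reached.

```r
# Hypothetical helper: returns the first element of a list of form
# descriptions, or stops with an explicit message when the list is empty.
first_form <- function(forms) {
  if (length(forms) == 0L) {
    stop("no form descriptions were found in the parsed page")
  }
  forms[[1]]
}

# Intended use with the objects from the transcript above:
# formNifty <- first_form(getHTMLFormDescription(docNifty))
# funNifty  <- createFunction(formNifty)
```

Checking length() first also makes it easy to inspect the parsed document before indexing into the result.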