citEntry handling of encoded URLs

Martin Morgan
The following citEntry includes a url with %3A and other encodings

citEntry(entry="article",
          title = "Software for Computing and Annotating Genomic Ranges",
          author = personList( as.person("Michael Lawrence" )),
          year = 2013,
          journal = "{PLoS} Computational Biology",
          volume = "9",
          issue = "8",
          doi = "10.1371/journal.pcbi.1003118",
          url = "http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1003118",
          textVersion = "Lawrence M..." )

Evaluating this as R code doesn't parse correctly and generates a warning

Lawrence M (2013). “Software for Computing and Annotating Genomic
Ranges.” _PLoS Computational Biology_, *9*. <URL:
http://dx.doi.org/10.1371/journal.pcbi.1003118>, <URL:
http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1003118}.>
Warning message:
In parse_Rd(Rd, encoding = encoding, fragment = fragment, ...) :
   <connection>:5: unexpected END_OF_INPUT '
'

A work-around is, apparently, to quote the %, \\%3A etc., but is this the
intention?
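
A minimal sketch of that work-around in R. It assumes only that the Rd parser used by the default print style treats an unescaped % as a comment character. escape_rd_percent() is a hypothetical helper, not part of utils:

```r
# Hypothetical helper (not part of utils): prefix every "%" with a backslash
# so the Rd parser used by some print styles does not read it as a comment.
escape_rd_percent <- function(x) {
  # The R string "\\\\%" is \\% at the regex level, which inserts a literal \%.
  gsub("%", "\\\\%", x)
}

url <- "http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1003118"
cat(escape_rd_percent(url), "\n")
# http://www.ploscompbiol.org/article/info\%3Adoi\%2F10.1371\%2Fjournal.pcbi.1003118
```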

Also, the citEntry help page points to bibentry, which points to *Entry Fields*,
but the 'url' tag is not mentioned there, even though url appears in the
examples; if the list of supported tags is not easy to enumerate, perhaps some
insight could be provided there as to how the supported tags are determined?

Thanks

Martin Morgan
--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: citEntry handling of encoded URLs

Achim Zeileis
On Thu, 22 May 2014, Martin Morgan wrote:

> The following citEntry includes a url with %3A and other encodings
>
> citEntry(entry="article",
>         title = "Software for Computing and Annotating Genomic Ranges",
>         author = personList( as.person("Michael Lawrence" )),
>         year = 2013,
>         journal = "{PLoS} Computational Biology",
>         volume = "9",
>         issue = "8",
>         doi = "10.1371/journal.pcbi.1003118",
>         url = "http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1003118",
>         textVersion = "Lawrence M..." )
>
> Evaluating this as R code doesn't parse correctly and generates a warning

The citEntry (or bibentry) itself is parsed without problem. Some printing
styles cause the warning, specifically when the Rd parser is used for
formatting. Depending on how you want to print it, the warning doesn't
occur though. Using bibentry() directly, we can do:

b <- bibentry("Article",
   title = "Software for Computing and Annotating Genomic Ranges",
   author = "Michael Lawrence and others",
   year = "2013",
   journal = "PLoS Computational Biology",
   volume = "9",
   number = "8",
   doi = "10.1371/journal.pcbi.1003118",
   url = "http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1003118",
   textVersion = "Lawrence M et al. (2013) ..."
)

Then the default

print(b)

issues a warning because the Rd parser treats each % as the start of a
comment. However,

print(b, style = "BibTeX")
print(b, style = "citation")

don't issue warnings and also produce output that one might expect.
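
One way to check this programmatically, as a sketch: collect_warnings() is a hypothetical helper built on base R's withCallingHandlers(), and the check covers only the BibTeX style, which per the explanation above does not go through the Rd parser:

```r
library(utils)  # bibentry() and toBibtex(); attached by default in R sessions

b <- bibentry("Article",
  title   = "Software for Computing and Annotating Genomic Ranges",
  author  = "Michael Lawrence and others",
  year    = "2013",
  journal = "PLoS Computational Biology",
  volume  = "9",
  number  = "8",
  doi     = "10.1371/journal.pcbi.1003118",
  url     = "http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1003118"
)

# Hypothetical helper: evaluate an expression, recording (and muffling)
# any warnings instead of printing them.
collect_warnings <- function(expr) {
  msgs <- character()
  value <- withCallingHandlers(expr, warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  })
  list(value = value, warnings = msgs)
}

bib <- collect_warnings(capture.output(print(b, style = "BibTeX")))
bib$warnings                                        # should be empty
any(grepl("info%3Adoi", bib$value, fixed = TRUE))   # URL kept verbatim
```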

> A work-around is, apparently, to quote the %, \\%3A etc., but is this the
> intention?

With the % escaped, the default print(b) yields the desired output without a
warning, but print(b, style = "BibTeX") and print(b, style = "citation") are
possibly no longer in the desired format. I'm not sure, though, how the
different BibTeX style files actually handle URLs. I think some .bst files
handle the "url" field verbatim (i.e., it needs no escaping) while others
treat it as text (i.e., it needs escaping). Personally, I would therefore
avoid the problem and use only the DOI URL here, as this will be robust
across BibTeX styles.
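
A sketch of that trade-off, assuming the same entry as above trimmed to the fields an "Article" requires: the first entry stores the \%-escaped URL from the work-around, and toBibtex() then writes those backslashes into the BibTeX record; the second entry follows the DOI-only suggestion.

```r
url <- "http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1003118"

# Entry carrying the Rd-escaped URL (every "%" replaced by "\%").
b_esc <- bibentry("Article",
  title   = "Software for Computing and Annotating Genomic Ranges",
  author  = "Michael Lawrence and others",
  journal = "PLoS Computational Biology",
  year    = "2013",
  url     = gsub("%", "\\\\%", url)
)

# toBibtex() writes the field as stored, so the backslashes end up in the
# .bib record; whether a given .bst style wants them there is style-dependent.
bib_lines <- toBibtex(b_esc)
print(grep("url", bib_lines, value = TRUE))

# Alternative: drop the publisher URL and rely on the DOI alone.
b_doi <- bibentry("Article",
  title   = "Software for Computing and Annotating Genomic Ranges",
  author  = "Michael Lawrence and others",
  journal = "PLoS Computational Biology",
  year    = "2013",
  doi     = "10.1371/journal.pcbi.1003118"
)
print(b_doi, style = "text")
```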

Nevertheless it is not ideal that there is a discrepancy between the
different printing styles. I think currently this can only be avoided if
custom macros are employed. But Duncan might be able to say more about
this. A similar situation occurs if you use commands that are not part of
the Rd markup, e.g.

n01 <- bibentry("Misc", title = "The $\\mathcal{N}(0, 1)$ Distribution",
   author = "Foo Bar", year = "2014")
print(n01) # warning
print(n01, style = "BibTeX") # ok

> Also, the citEntry help page points to bibentry, which points to *Entry
> Fields*, but the 'url' tag is not mentioned there, even though url appears
> in the examples; if the list of supported tags is not easy to enumerate,
> perhaps some insight could be provided there as to how the supported
> tags are determined?

This follows the BibTeX conventions: you can use any tag you wish, and
whether it is displayed depends on the style. The only restriction is that
certain bibtypes require certain fields, e.g., an "Article" has to specify
author, title, journal, and year. Beyond that, you can add any additional
field. For example, your bibentry above used the "issue" field, which is
ignored by most BibTeX styles. My adaptation uses the "number" field
instead, which is processed by most standard BibTeX styles.
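
A small sketch of both points using utils::bibentry(). The extra "issue" field is simply stored and written out by toBibtex(), while leaving out a required field is rejected; the exact error text is not relied on here, since it may differ between R versions:

```r
# "issue" is not a standard BibTeX field for @Article, but it is kept.
b <- bibentry("Article",
  title   = "Software for Computing and Annotating Genomic Ranges",
  author  = "Michael Lawrence",
  journal = "PLoS Computational Biology",
  year    = "2013",
  volume  = "9",
  issue   = "8"
)
b$issue                            # field access by name
any(grepl("issue", toBibtex(b)))   # the field appears in the BibTeX output

# Leaving out "journal", which an "Article" requires, signals an error;
# here the error message is captured instead of stopping the script.
res <- tryCatch(
  bibentry("Article", title = "T", author = "A B", year = "2013"),
  error = function(e) conditionMessage(e)
)
res
```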

The default print(..., style = "text") uses a bibstyle modeled after
jss.bst, the BibTeX style employed by the Journal of Statistical Software.
But you could plug in a different style via the .bibstyle argument, e.g.,
one that processes the "issue" field.

Hope that helps,
Z

Re: citEntry handling of encoded URLs

Martin Morgan
On 05/23/2014 05:35 AM, Achim Zeileis wrote:

> The citEntry (or bibentry) itself is parsed without problem. Some printing
> styles cause the warning, specifically when the Rd parser is used for
> formatting. Depending on how you want to print it, the warning doesn't occur
> though. Using bibentry() directly, we can do:
>
> b <- bibentry("Article",
>    title = "Software for Computing and Annotating Genomic Ranges",
>    author = "Michael Lawrence and others",
>    year = "2013",
>    journal = "PLoS Computational Biology",
>    volume = "9",
>    number = "8",
>    doi = "10.1371/journal.pcbi.1003118",
>    url = "http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1003118",
>    textVersion = "Lawrence M et al. (2013) ..."
> )
>
> Then the default
>
> print(b)
>
> issues a warning because the Rd parser treats each % as the start of a comment. However,
>
> print(b, style = "BibTeX")
> print(b, style = "citation")
>
> don't issue warnings and also produce output that one might expect.

Thanks for clarifying. For what it's worth, I was aiming for

     print(b, style="html")

> Hope that helps,

Yes, that helps a lot, thanks,

Martin

Re: citEntry handling of encoded URLs

Duncan Murdoch
On 23/05/2014 8:35 AM, Achim Zeileis wrote:

> Nevertheless it is not ideal that there is a discrepancy between the
> different printing styles. I think currently this can only be avoided if
> custom macros are employed. But Duncan might be able to say more about
> this. A similar situation occurs if you use commands that are not part of
> the Rd markup, e.g.

I'd go further than "not ideal": I think we need to define what kind of
markup is permissible in this context. If it needs to be Rd markup,
then the default print method should be fixed to hide it (and \mathcal
should not be allowed); if it needs to be plain text, then some escaping
should be done.
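
If the plain-text reading were adopted, the escaping might look roughly like this sketch. escape_rd_text() is a hypothetical helper, and it assumes that backslash, braces and % are the characters needing protection in Rd text; LaTeX markup such as \mathcal would then be shown literally rather than interpreted:

```r
# Hypothetical helper: make a plain-text field safe to pass through the Rd
# parser by escaping the characters that are special in Rd text.
escape_rd_text <- function(x) {
  # Backslashes first, so the escapes added below are not doubled again.
  x <- gsub("\\\\", "\\\\\\\\", x)   # \  ->  \\
  gsub("([{}%])", "\\\\\\1", x)      # {, }, %  ->  \{, \}, \%
}

cat(escape_rd_text("The $\\mathcal{N}(0, 1)$ Distribution"), "\n")
# The $\\mathcal\{N\}(0, 1)$ Distribution
cat(escape_rd_text("info%3Adoi%2F10.1371"), "\n")
# info\%3Adoi\%2F10.1371
```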

Re: citEntry handling of encoded URLs

Achim Zeileis
On Fri, 23 May 2014, Duncan Murdoch wrote:

> On 23/05/2014 8:35 AM, Achim Zeileis wrote:
>> Nevertheless it is not ideal that there is a discrepancy between the
>> different printing styles. I think currently this can only be avoided if
>> custom macros are employed. But Duncan might be able to say more about
>> this. A similar situation occurs if you use commands that are not part of
>> the Rd markup, e.g.
>
> I'd go further than "not ideal", I think we need to define what kind of
> markup is permissible in this context.  If it needs to be Rd markup,
> then the default print method should be fixed to hide it (and \mathcal
> should not be allowed); if it needs to be plain text, then some escaping
> should be done.

I would argue that any LaTeX-style markup should be permitted in
bibentry objects so that you can work with your BibTeX files in R. For the
"text" and "html" print output, I would be happy if unknown markup were
approximated, e.g., by omitting \mathcal and $ or something similar. Then
only a small subset of LaTeX-style Rd markup commands would be processed
properly, and everything else would be approximated.
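
The approximation idea could be prototyped along the following lines. approx_latex() is a hypothetical and deliberately crude helper, not how R itself formats bibentry fields: it unwraps \cmd{arg} to arg (innermost first) and drops the $ delimiters:

```r
# Hypothetical helper: crude plain-text approximation of LaTeX-style markup.
approx_latex <- function(x) {
  cmd <- "\\\\[A-Za-z]+\\{([^{}]*)\\}"   # \cmd{arg} with no nested braces
  repeat {                               # unwrap innermost commands first
    y <- gsub(cmd, "\\1", x)
    if (identical(y, x)) break
    x <- y
  }
  gsub("$", "", x, fixed = TRUE)         # drop math-mode delimiters
}

approx_latex("The $\\mathcal{N}(0, 1)$ Distribution")
# "The N(0, 1) Distribution"
```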
