# citEntry handling of encoded URLs

5 messages

## citEntry handling of encoded URLs

The following citEntry includes a url with %3A and other encodings:

    citEntry(entry="article",
              title = "Software for Computing and Annotating Genomic Ranges",
              author = personList( as.person("Michael Lawrence" )),
              year = 2013,
              journal = "{PLoS} Computational Biology",
              volume = "9",
              issue = "8",
              doi = "10.1371/journal.pcbi.1003118",
              url = "http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1003118",
              textVersion = "Lawrence M..." )

Evaluating this as R code doesn't parse correctly and generates a warning:

    Lawrence M (2013). “Software for Computing and Annotating Genomic Ranges.” _PLoS Computational Biology_, *9*. ,
    Warning message:
    In parse_Rd(Rd, encoding = encoding, fragment = fragment, ...) :
       :5: unexpected END_OF_INPUT ' '

A work-around is, apparently, to quote the %, \\%3A etc., but is this the intention?

Also, citEntry points to bibentry points to *Entry Fields*, but the 'url' tag is not mentioned there, even though url appears in the examples; if the list of supported tags is not easy to enumerate, perhaps some insight can be provided at this point as to how the supported tags are determined?

Thanks

Martin Morgan
--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

## Re: citEntry handling of encoded URLs

On Thu, 22 May 2014, Martin Morgan wrote:

> The following citEntry includes a url with %3A and other encodings
> [...]
> Evaluating this as R code doesn't parse correctly and generates a warning

The citEntry (or bibentry) itself is parsed without problem. Some printing styles cause the warning, specifically when the Rd parser is used for formatting. Depending on how you want to print it, the warning doesn't occur, though. Using bibentry() directly, we can do:

    b <- bibentry("Article",
       title = "Software for Computing and Annotating Genomic Ranges",
       author = "Michael Lawrence and others",
       year = "2013",
       journal = "PLoS Computational Biology",
       volume = "9",
       number = "8",
       doi = "10.1371/journal.pcbi.1003118",
       url = "http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1003118",
       textVersion = "Lawrence M et al. (2013) ..."
    )

Then the default

    print(b)

issues a warning because the Rd parser treats each % as the start of a comment. However,

    print(b, style = "BibTeX")
    print(b, style = "citation")

don't issue warnings and also produce output that one might expect.

> A work-around is, apparently, to quote the %, \\%3A etc., but is this the
> intention?

In that case the default print(b) yields the desired output without warning, but print(b, style = "BibTeX") or print(b, style = "citation") are possibly not in the desired format. I'm not sure, though, how the different BibTeX style files actually handle the URLs. I think some .bst files handle the "url" field verbatim (i.e., don't need escaping) while others treat it as text (i.e., need escaping). Personally, I would hence avoid the problem and only use the DOI URL here, as this will be robust across BibTeX styles.

Nevertheless it is not ideal that there is a discrepancy between the different printing styles. I think currently this can only be avoided if custom macros are employed. But Duncan might be able to say more about this. A similar situation occurs if you use commands that are not part of the Rd markup, e.g.

    n01 <- bibentry("Misc", title = "The $\\mathcal{N}(0, 1)$ Distribution",
       author = "Foo Bar", year = "2014")
    print(n01) # warning
    print(n01, style = "BibTeX") # ok

> Also, citEntry points to bibentry points to *Entry Fields*, but the
> 'url' tag is not mentioned there, even though url appears in the
> examples; if the list of supported tags is not easy to enumerate,
> perhaps some insight can be provided at this point as to how the
> supported tags are determined?

This follows the BibTeX conventions. Thus, you can use any tag that you wish to use and it will depend on the style whether it is displayed or not. The only restriction is that certain bibtypes require certain fields, e.g., an "Article" has to specify: author, title, journal, year. But beyond that you can add any additional field. For example, in your bibentry above you used the "issue" field, which is ignored by most BibTeX styles. My adaptation uses the "number" field instead, which is processed by most standard BibTeX styles.

The default print(..., style = "text") uses a bibstyle that is modeled after jss.bst, the BibTeX style employed by the Journal of Statistical Software. But you could plug in other .bibstyle arguments, e.g. one that processes the "issue" field etc.

Hope that helps,
Z
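The points in this reply can be tried out with a short R sketch. It is only an illustration under stated assumptions: `make_entry()`, `escape_percent()` and `count_warnings()` are hypothetical helpers written for this example, not part of utils, and whether the "text" style actually warns on a raw `%` may depend on the R version, so the warning counts are collected rather than assumed.

```r
# Hypothetical helper (not part of utils): build the thread's entry for a given URL.
make_entry <- function(url) {
  utils::bibentry("Article",
    title   = "Software for Computing and Annotating Genomic Ranges",
    author  = utils::person("Michael", "Lawrence"),
    year    = "2013",
    journal = "PLoS Computational Biology",
    volume  = "9",
    number  = "8",   # field read by most standard BibTeX styles
    issue   = "8",   # extra field: stored, but shown only if a style uses it
    doi     = "10.1371/journal.pcbi.1003118",
    url     = url)
}

# Hypothetical helper: escape "%" so the Rd parser does not read it as a comment.
escape_percent <- function(x) gsub("%", "\\\\%", x)

# Hypothetical helper: evaluate an expression and count the warnings it raises.
count_warnings <- function(expr) {
  n <- 0L
  withCallingHandlers(expr, warning = function(w) {
    n <<- n + 1L
    invokeRestart("muffleWarning")
  })
  n
}

raw_url <- "http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1003118"
b_raw <- make_entry(raw_url)
b_esc <- make_entry(escape_percent(raw_url))

# The "text" style goes through the Rd parser; counts may differ between R versions.
warn_text_raw <- count_warnings(format(b_raw, style = "text"))
warn_text_esc <- count_warnings(format(b_esc, style = "text"))

# BibTeX output is generated from the stored fields.
bib_raw <- utils::toBibtex(b_raw)

# Required fields are enforced: an "Article" without a journal is rejected.
missing_journal <- tryCatch(
  utils::bibentry("Article", title = "T", author = "Foo Bar", year = "2014"),
  error = function(e) conditionMessage(e))
```

Comparing `warn_text_raw` with `warn_text_esc` shows whether escaping matters for the "text" style on a given installation, and printing `utils::toBibtex(b_esc)` shows the trade-off of the work-around in the BibTeX output.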

## Re: citEntry handling of encoded URLs

On 05/23/2014 05:35 AM, Achim Zeileis wrote:

> [...]
> Then the default
>
>     print(b)
>
> issues a warning because the Rd parser thinks that the % are comments. However,
>
>     print(b, style = "BibTeX")
>     print(b, style = "citation")
>
> don't issue warnings and also produce output that one might expect.

Thanks for clarifying. For what it's worth, I was aiming for

    print(b, style="html")

> [...]
> Hope that helps,

Yes, that helps a lot, thanks,

Martin
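For the HTML output mentioned here, the suggestion made earlier in the thread (keep only the DOI and leave out the percent-encoded URL) can be sketched as follows. This is a minimal illustration, not a recommendation from the thread participants beyond what they wrote; the variable names `doi_only`, `html` and `txt` are made up for the example, and the exact rendering of the DOI depends on the R version.

```r
# Entry without the percent-encoded "url" field; the DOI alone identifies the paper.
doi_only <- utils::bibentry("Article",
  title   = "Software for Computing and Annotating Genomic Ranges",
  author  = utils::person("Michael", "Lawrence"),
  year    = "2013",
  journal = "PLoS Computational Biology",
  volume  = "9",
  number  = "8",
  doi     = "10.1371/journal.pcbi.1003118")

# Both styles below are formatted via the Rd machinery; with no "%" in any
# field there is nothing for the Rd parser to mistake for a comment.
html <- format(doi_only, style = "html")
txt  <- format(doi_only, style = "text")

cat(html, sep = "\n")
```

The same object can still be passed to `print(doi_only, style = "BibTeX")`, so a single definition serves all output styles.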
## Re: citEntry handling of encoded URLs

On 23/05/2014 8:35 AM, Achim Zeileis wrote:

> [...]
> Nevertheless it is not ideal that there is a discrepancy between the
> different printing styles. I think currently this can only be avoided if
> custom macros are employed. But Duncan might be able to say more about
> this. A similar situation occurs if you use commands that are not part of
> the Rd markup, e.g.

I'd go further than "not ideal": I think we need to define what kind of markup is permissible in this context. If it needs to be Rd markup, then the default print method should be fixed to hide it (and \mathcal should not be allowed); if it needs to be plain text, then some escaping should be done.

>     n01 <- bibentry("Misc", title = "The $\\mathcal{N}(0, 1)$ Distribution",
>         author = "Foo Bar", year = "2014")
>     print(n01) # warning
>     print(n01, style = "BibTeX") # ok
> [...]
## Re: citEntry handling of encoded URLs

On Fri, 23 May 2014, Duncan Murdoch wrote:

> On 23/05/2014 8:35 AM, Achim Zeileis wrote:
>> [...]
>> Nevertheless it is not ideal that there is a discrepancy between the
>> different printing styles. I think currently this can only be avoided if
>> custom macros are employed. But Duncan might be able to say more about
>> this. A similar situation occurs if you use commands that are not part of
>> the Rd markup, e.g.
>
> I'd go further than "not ideal", I think we need to define what kind of
> markup is permissible in this context.  If it needs to be Rd markup,
> then the default print method should be fixed to hide it (and \mathcal
> should not be allowed); if it needs to be plain text, then some escaping
> should be done.

I would argue that any LaTeX-style markup should be permitted in the bibentry objects so that you can work with your BibTeX files in R. For the "text" and "html" print output, I would be happy if these used some approximation for unknown markup, e.g., omitting \mathcal and $ or something like that. Then only a small subset of LaTeX-style Rd markup commands would be properly processed.

>>     n01 <- bibentry("Misc", title = "The $\\mathcal{N}(0, 1)$ Distribution",
>>         author = "Foo Bar", year = "2014")
>>     print(n01) # warning
>>     print(n01, style = "BibTeX") # ok
>> [...]
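The "approximation for unknown markup" idea can be illustrated at user level with a rough R sketch. It is not how utils or tools process entries: `strip_unknown_markup()` is a hypothetical helper invented for this example, and it only handles the two constructs named in the message (`$` delimiters and `\mathcal{...}`).

```r
# Hypothetical helper: approximate unknown LaTeX markup for plain-text display
# by unwrapping \mathcal{...} and dropping "$" math delimiters.
strip_unknown_markup <- function(x) {
  x <- gsub("\\\\mathcal\\{([^{}]*)\\}", "\\1", x)  # \mathcal{N} -> N
  gsub("$", "", x, fixed = TRUE)                     # remove literal "$"
}

latex_title <- "The $\\mathcal{N}(0, 1)$ Distribution"

# Keep the LaTeX markup in the stored entry, so BibTeX export stays faithful.
n01 <- utils::bibentry("Misc", title = latex_title,
                       author = "Foo Bar", year = "2014")
bib <- utils::toBibtex(n01)

# Use a simplified copy of the entry only for the "text" style.
n01_plain <- utils::bibentry("Misc", title = strip_unknown_markup(latex_title),
                             author = "Foo Bar", year = "2014")
txt <- format(n01_plain, style = "text")
```

Keeping two objects is a deliberate simplification: the original entry stays usable with BibTeX files, while the stripped copy avoids handing unknown commands to the Rd-based formatter.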