Find out what "native.enc" corresponds to

Milan Bouchet-Valat
Hi!

I'm using R2HTML in my RcmdrPlugin.temis package to output localized
strings to an HTML file. I therefore insert a simple header at the top of
the file to specify which encoding is used; if I don't, Web browsers
assume it is latin1, which is not always true.

My problem is that I could not find a way to detect which encoding R2HTML
uses in the most general case. R2HTML simply calls cat() with the file
name, which means the text connection is opened using file(encoding =
getOption("encoding")). This is fine, except that when
getOption("encoding") is set to "native.enc", I'm not able to find out
the real encoding that was used for output.
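
For illustration, a minimal sketch of the situation described above. The sample string and the temporary file names are hypothetical, and the second half assumes the caller can open the connection itself, which is not how R2HTML is called here.

```r
# Hypothetical sample text and temporary output files
txt <- "caf\u00e9"
f_native <- tempfile(fileext = ".html")
f_known <- tempfile(fileext = ".html")

# cat() given a file name opens the file with getOption("encoding"),
# whose default value is "native.enc" (no re-encoding on output).
print(getOption("encoding"))
cat(txt, "\n", file = f_native)

# With an explicitly opened connection, the output encoding is chosen
# by the caller and is therefore known in advance.
con <- file(f_known, open = "w", encoding = "UTF-8")
cat(txt, "\n", file = con)
close(con)

# Inspect the bytes that were actually written
written <- readBin(f_known, what = "raw", n = 20)
print(written)
```

The bytes printed for the non-ASCII character depend on the locale R is running in, so no expected output is shown.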

Of course, ideally I would tell R2HTML to output everything as UTF-8,
and I would add this information to the header. But AFAICT this is not
possible in the current state of this package. So I would be very
grateful if somebody could provide me with a solution to resolve
"native.enc" to the encoding name.

Thanks for your help

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Find out what "native.enc" corresponds to

Prof Brian Ripley
On 05/08/2012 09:54, Milan Bouchet-Valat wrote:

> Of course, ideally I would tell R2HTML to output everything as UTF-8,
> and I would add this information to the header. But AFAICT this is not
> possible in the current state of this package. So I would be very
> grateful if somebody could provide me with a solution to resolve
> "native.enc" to the encoding name.

?options points you to ?connections, which does explain this.  Use
Sys.getlocale("LC_CTYPE") to see

'the internal encoding of the current locale'

(or at least, what the OS claims it to be: e.g. some lie about 'C' locales).

As for a name, iconv() knows this as "" (and some OSes do make it rather
hard to find a name if it is not part of the locale name).
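
A short sketch of the two calls mentioned above, under the assumption that the goal is either to inspect the locale or to convert text out of the native encoding without naming it. The input string is a hypothetical example.

```r
# The locale setting for the character-type category. The format of
# this string is platform-specific and it is not always an encoding name.
ctype <- Sys.getlocale("LC_CTYPE")
print(ctype)

# iconv() accepts "" to mean the encoding of the current locale, so text
# can be converted to a known encoding without naming the native one.
x <- "plain ASCII text"   # hypothetical input in the native encoding
x_utf8 <- iconv(x, from = "", to = "UTF-8")
print(x_utf8)
```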

--
Brian D. Ripley,                  [hidden email]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


Re: Find out what "native.enc" corresponds to

Milan Bouchet-Valat
On Sunday 05 August 2012 at 10:04 +0100, Prof Brian Ripley wrote:

> On 05/08/2012 09:54, Milan Bouchet-Valat wrote:
> > Of course, ideally I would tell R2HTML to output everything as UTF-8,
> > and I would add this information to the header. But AFAICT this is not
> > possible in the current state of this package. So I would be very
> > grateful if somebody could provide me with a solution to resolve
> > "native.enc" to the encoding name.
>
> ?options points you to ?connections, which does explain this.  Use
> Sys.getlocale("LC_CTYPE") to see
>
> 'the internal encoding of the current locale'
>
> (or at least, what the OS claims it to be: e.g. some lie about 'C' locales).
Thanks for the pointers, but the issue is/was that LC_CTYPE does not
provide a valid encoding name. But your reply prompted me to read ?iconv
again, and I discovered the existence of localeToCharset(), which seems
to provide me with the encoding name I'm looking for.
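
A minimal sketch of how localeToCharset() might be used to fill in the HTML header. The fallback value and the exact form of the header line are assumptions made for illustration, not something R2HTML provides.

```r
# localeToCharset() guesses charset names from the locale name. It can
# return several candidates, or NA when it cannot tell; take the first.
charset <- utils::localeToCharset()[1]

# Hypothetical fallback for the case where no name can be determined
if (is.na(charset)) charset <- "UTF-8"

# Build the header line declaring the encoding (assumed format)
header <- sprintf(
  "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=%s\">",
  charset
)
cat(header, "\n")
```

The name returned follows R's own conventions, so it may be worth checking that browsers accept it as a charset label.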

> As for a name, iconv() knows this as "" (and some OSes do make it rather
> hard to find a name if it is not part of the locale name).
I'm afraid I don't understand what you mean. Are you suggesting that I
convert the data to/from the current encoding?


Regards
