Problem with ONE of the Special German Characters

Michael Stegh
Dear List,

I have data which contain the special German characters "ä", "ö", "ü", etc. After reading the
text files into R, those characters are displayed strangely, e.g. "ä" is shown as "Ã¤". The first step is to
replace them with their usual transcription, e.g. "ä" becomes "ae", using the gsub
command.

Until I upgraded to version 2.10.1 (from 2.8.0) this worked perfectly for all characters. Now it
works for all characters but "Ü".

temp1 <- gsub("Ãœ", "Ue", temp1)

This letter is displayed as "Ãœ" (as before), but R is no longer able to find this character. The
problem seems to be linked to the "œ" part, since I could substitute for "Ã" without a problem.
Strangely, if I extract the two characters into a variable with the substr command
and then use that variable as the pattern, I am able to substitute without a problem. Any ideas what I am
missing?

Thanks,

Michael

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Problem with ONE of the Special German Characters

Duncan Murdoch
On 15/04/2010 12:22 PM, Michael Stegh wrote:
> Dear List,
>
> I have data which contain the special German characters "ä", "ö", "ü", etc. After reading the
> text files into R, those characters are displayed strangely, e.g. "ä" is shown as "Ã¤". The first step is to
> replace them with their usual transcription, e.g. "ä" becomes "ae", using the gsub
> command.
>  

Your example of "Ã¤" is what you would see if the text were stored in UTF-8
encoding and then read as Latin1. So I suspect you need to declare the
encoding of the files before reading them. You can do
this as follows:

con <- file("foo.txt", encoding = "UTF-8", open = "r")  # declare the file's encoding
txt <- readLines(con)                                   # keep the lines for later processing
close(con)

By default, R assumes that a file's encoding matches the native encoding
of your system's locale.

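A minimal sketch of the mechanism described above, assuming nothing beyond base R. It starts from the two bytes that UTF-8 uses for "ä", declares them once as UTF-8 and once as Latin1, and then reverses the damage for a string that has already been misread. The variable names are illustrative only, and the repair step assumes the text was garbled by exactly one UTF-8-read-as-Latin1 round.

```r
# The two bytes that UTF-8 uses to store "ä" (U+00E4).
bytes <- as.raw(c(0xc3, 0xa4))

# Same bytes, declared as UTF-8: one character.
as_utf8 <- rawToChar(bytes)
Encoding(as_utf8) <- "UTF-8"

# Same bytes, declared as Latin1: each byte becomes its own character.
as_latin1 <- rawToChar(bytes)
Encoding(as_latin1) <- "latin1"
garbled <- enc2utf8(as_latin1)   # U+00C3 U+00A4, displayed as "Ã¤"

# Repairing an already garbled string: go back to the Latin1 bytes,
# then declare those bytes to be UTF-8.
repaired <- iconv(garbled, from = "UTF-8", to = "latin1")
Encoding(repaired) <- "UTF-8"

nchar(as_utf8)   # 1
nchar(garbled)   # 2
```

Declaring the encoding when the file is opened is usually the simpler route; the repair step is only a fallback for text that is already in memory.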
> Until I upgraded to version 2.10.1 (from 2.8.0) this worked perfectly for all characters. Now it
> works for all characters but "Ü".
>
> temp1 <- gsub("Ãœ", "Ue", temp1)
>  

You might want to try perl=TRUE in the gsub() call; it seems to handle
non-ASCII characters in regular expressions better than the default TRE
library does.
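The transliteration step itself could look like the sketch below. It assumes the text has already been read with the correct encoding, so the patterns are the real umlauts rather than their garbled forms. The sample vector, the from/to tables, and the transliterate() helper are hypothetical names chosen for illustration. The loop uses fixed=TRUE, a substitution for the regular-expression route: the patterns are literal strings, so no regex engine is needed. A perl=TRUE call is shown alongside for comparison.

```r
# Hypothetical sample data, written with \u escapes so the example does not
# depend on the encoding of the script file itself.
temp1 <- c("\u00dcbung", "M\u00e4dchen", "sch\u00f6n", "Gr\u00fc\u00dfe")

# Characters to find and their usual transcriptions, in matching order.
from <- c("\u00e4", "\u00f6", "\u00fc", "\u00c4", "\u00d6", "\u00dc", "\u00df")
to   <- c("ae",     "oe",     "ue",     "Ae",     "Oe",     "Ue",     "ss")

# Apply each literal replacement in turn to every element of x.
transliterate <- function(x, from, to) {
  for (i in seq_along(from)) {
    x <- gsub(from[i], to[i], x, fixed = TRUE)
  }
  x
}

result <- transliterate(temp1, from, to)
# "Uebung" "Maedchen" "schoen" "Gruesse"

# The same idea for a single character, using the PCRE engine instead.
ue_only <- gsub("\u00dc", "Ue", temp1, perl = TRUE)
```

Keeping the two tables as parallel vectors makes it easy to extend the mapping without touching the loop.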

Duncan Murdoch
