Writing Persian (Arabic) in a data frame

Vahid
I am trying to build a data frame from two vectors. The first is a vector
of Persian names, and the second is a vector of numbers. My code is as
follows:

A<-data.frame(x=c("مریم","ماریا"),y=c(1,1))
A

But when I run this code I do not get the output I want: the x column is
not displayed in Persian. The output looks like this:

                                         x y
1         <U+0645><U+0631><U+06CC><U+0645> 1
2 <U+0645><U+0627><U+0631><U+06CC><U+0627> 1

I want the x column to be displayed in *Persian*. Could you please tell me
how to do this?

(I should add that when I create a vector of Persian names and print it, I
get the correct output in Persian, as below:

x=c("مریم","ماریا")
x
[1] "مریم"  "ماریا"

The problem above occurs only with the data frame.)

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: Writing Persian (Arabic) in a data frame

Duncan Murdoch-2
On 28/07/2020 2:01 a.m., Vahid Borji wrote:

> I am trying to make a data frame including two vectors. The first vector is
> a vector of Persian names, and the second vector is a vector of numbers. My
> code is as follows:
>
> A<-data.frame(x=c("مریم","ماریا"),y=c(1,1))
> A
>
> But when I run these codes I do not receive my desired output. Indeed the
> column of x is not in Persian. The output is like this:
>
>                                          x y
> 1         <U+0645><U+0631><U+06CC><U+0645> 1
> 2 <U+0645><U+0627><U+0631><U+06CC><U+0627> 1
>
> I want to have the column of x in *Persian language*. Could you please help
> me how I can do it?

You need to work on a system that uses a UTF-8 locale.  Otherwise R
tries to express strings in the local encoding, finds that won't work,
and shows Unicode escapes instead.
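
To see which case applies to a given session, a minimal base-R sketch along these lines may help. The variable names (ctype, info, is_utf8) are illustrative only and are not part of the original question:

```r
# Report the character-type locale the R session is running under.
ctype <- Sys.getlocale("LC_CTYPE")
print(ctype)

# l10n_info() summarises the current locale; its `UTF-8` element is
# TRUE when the session's native encoding is UTF-8.
info <- l10n_info()
is_utf8 <- isTRUE(info[["UTF-8"]])

if (is_utf8) {
  message("UTF-8 locale: non-ASCII strings should print as typed.")
} else {
  message("Non-UTF-8 locale: characters outside the native encoding ",
          "may be printed as <U+XXXX> escapes.")
}
```

If is_utf8 comes back FALSE, the escapes in the printed data frame are consistent with the explanation above.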

For decades Windows had no UTF-8 locale, so your only choice was to move
to a different OS.  There are rumours now that it finally has one, but I
don't know how to enable it, and I'm not certain that R will handle it
properly:  you may need a very recent version (perhaps unreleased) for R
not to automatically assume that Windows can't do it.

Duncan Murdoch

Re: Writing Persian (Arabic) in a data frame

Ivan Krylov
On Tue, 28 Jul 2020 10:31:07 +0430
Vahid Borji <[hidden email]> wrote:

> A<-data.frame(x=c("مریم","ماریا"),y=c(1,1))

> The output is like this:
>
>                                          x y
> 1         <U+0645><U+0631><U+06CC><U+0645> 1
> 2 <U+0645><U+0627><U+0631><U+06CC><U+0627> 1

This is one of those problems heavily affected by your version of R
(does it have stringsAsFactors = TRUE or FALSE by default?), your
operating system and locale (see [*] for a description of
Unicode-related problems in R on Windows).
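
As a rough sketch of how to pin those factors down, the following uses only base R. The names x and A follow the original post. Writing the strings as \u escapes is an assumption made here so that the script does not depend on the encoding of the source file. It inspects the stored data rather than the printed table, so it may well leave the output of print() unchanged in a non-UTF-8 locale:

```r
# Persian names written as \u escapes (code points taken from the
# <U+XXXX> output quoted above).
x <- c("\u0645\u0631\u06cc\u0645", "\u0645\u0627\u0631\u06cc\u0627")

# The R version matters: data.frame() defaults to
# stringsAsFactors = FALSE only from R 4.0.0 onwards.
print(getRversion())

# Passing the argument explicitly removes the version dependence.
A <- data.frame(x = x, y = c(1, 1), stringsAsFactors = FALSE)

# Inspect what is stored, independently of how it is printed.
col_class <- class(A$x)          # class of the x column
enc <- Encoding(A$x)             # declared encoding of each string
same_data <- identical(A$x, x)   # does the column hold the original strings?

print(col_class)
print(enc)
print(same_data)
```

If same_data is TRUE, the strings themselves are intact and only their display depends on the locale.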

Here is a similar problem from 9 years ago where Unicode characters
were displayed as escapes on Windows with a US English (Windows-1252)
locale when data.frame() converted strings to factors:
https://r.789695.n4.nabble.com/gsub-with-unicode-and-escape-character-td3672737.html

--
Best regards,
Ivan

P.S.

> [[alternative HTML version deleted]]

Please post in plain text, not HTML.

[*]
https://developer.r-project.org/Blog/public/2020/05/02/utf-8-support-on-windows/index.html

Re: Writing Persian (Arabic) in a data frame

John Kane-3
Confirming the other responses: this works for me, and my default encoding
is UTF-8.

> A<-data.frame(x=c("مریم","ماریا"),y=c(1,1))
> A
      x y
1  مریم 1
2 ماریا 1
> str(A)
'data.frame': 2 obs. of  2 variables:
 $ x: chr  "مریم" "ماریا"
 $ y: num  1 1
>  sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C               LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8
 [5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8    LC_PAPER=en_CA.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C

--
John Kane
Kingston ON Canada
