Problem related to multibyte string in CSV file

Fisher Dennis
R 3.6.1
OS X

Colleagues,

I read the first line of a CSV file with readLines, passing n=1 as the only option (I need only the first line of the file):
        STRING <- readLines(FILE, n=1)
to which R responded:
        Warning message:
        In readLines(FILE, n = 1) : line 1 appears to contain an embedded nul

I then attempted to determine the number of characters in that string:
        nchar(STRING)
to which R responded:
        Error in nchar(STRING) : invalid multibyte string, element 1
       
I then went to examine the string:
        print(STRING)
        [1] "\xff\xfet"
and:
        cat(STRING, "\n")
        ??t

I was surprised to see the difference between the output of cat and that of print (see above), but I assume it results from the multibyte characters.

Now to my question:  I am trying to automate this process and I would like to see the output from the print command but without the [1] that precedes the string.  
If I am working at the command line, RGUI, or RStudio, I can type
        STRING<CR>
However, in a script, I need to preface STRING with either “print” or “cat” (or something else).
Short of writing my own print method, is there any simple way to accomplish this?

Dennis

Dennis Fisher MD
P < (The "P Less Than" Company)
Phone / Fax: 1-866-PLessThan (1-866-753-7784)
www.PLessThan.com

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Problem related to multibyte string in CSV file

Ivan Krylov
On Thu, 14 Nov 2019 09:34:30 -0800
Dennis Fisher <[hidden email]> wrote:

> Warning message:
> In readLines(FILE, n = 1) : line 1 appears to contain an
> embedded nul

<...>

> print(STRING)
> [1] "\xff\xfet"

Most probably, this means that the FILE is UCS-2LE-encoded (or maybe
UTF-16). Unlike UTF-8, text encoded using UCS-2LE may contain NUL bytes
if the code points in question are U+00FF and below. You should decode
it before processing it in R; one of the examples in ?readLines shows
how to do it:

# read a 'Windows Unicode' file
A <- readLines(con <- file("Unicode.txt", encoding = "UCS-2LE"))
close(con)
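
For the "first line only" case in the original question, the same idea can be sketched as follows. This is only an illustration under stated assumptions: the temporary file, its contents ("time,conc" and so on) and the variable names are made up, and the check assumes the file starts with the bytes 0xff 0xfe, which is the byte order mark of little-endian UTF-16 and matches the "\xff\xfe" seen in the printed string.

```r
# Build a small little-endian UTF-16 file with a byte order mark.
# The contents are hypothetical and exist only to make the sketch runnable.
tmp  <- tempfile(fileext = ".csv")
txt  <- "time,conc\n1,2.5\n"
bom  <- as.raw(c(0xff, 0xfe))
# Every ASCII character is followed by a zero byte in this encoding,
# which is what triggers the "embedded nul" warning.
body <- as.vector(rbind(charToRaw(txt), as.raw(0x00)))
writeBin(c(bom, body), tmp)

# Look at the first two bytes before deciding how to read the file.
first_bytes <- readBin(tmp, what = "raw", n = 2)
has_bom <- identical(first_bytes, as.raw(c(0xff, 0xfe)))

# Declare the encoding on the connection so that readLines() sees decoded text.
con <- file(tmp, encoding = if (has_bom) "UCS-2LE" else "native.enc")
header <- readLines(con, n = 1)
close(con)

# Defensive step: drop a leading U+FEFF in case it was not removed on input.
header <- sub("^\ufeff", "", header)

nchar(header)
```

Once the text has been decoded this way, nchar() should no longer complain about an invalid multibyte string.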
 
> Now to my question:  I am trying to automate this process and I would
> like to see the output from the print command but without the [1]
> that precedes the string.

Try encodeString combined with cat or message.
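
A minimal sketch of that suggestion, assuming a small wrapper is acceptable (show_escaped is a hypothetical name, not an existing R function). encodeString escapes a string the way print does, and cat writes the result without the [1] prefix. How the invalid bytes are rendered may depend on the locale.

```r
# Hypothetical helper: print one string escaped as print() would show it,
# with surrounding double quotes but without the "[1]" index prefix.
show_escaped <- function(x) {
  cat(encodeString(x, quote = '"'), "\n", sep = "")
}

# The undecoded string from the original post.
STRING <- "\xff\xfet"
show_escaped(STRING)

# Control characters are escaped in the same way.
show_escaped("a\tb")
```

Replacing cat with message(encodeString(x, quote = '"')) sends the same text to stderr instead of stdout, which may be preferable for diagnostics in a script.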

--
Best regards,
Ivan
