*except* that the data may contain the LF character which R treats as
end-of-line and then barfs that there are too few elements on that line.
Any suggestions for how to process this one efficiently in R? There is
probably a solution using read.table(..., nrows = 1, ...) to get the
header, splitting it on '@', building a list with that many character(0)
elements, and then using scan(..., multi.line = TRUE, ...) ... but that
all sounds very complicated.
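For reference, the "complicated" route sketched above could look roughly like this. This is only a sketch, not the poster's code: the file name foo.txt and the '@' separator come from the question, everything else is an assumption, and note that it would still choke on embedded LFs for the same reason read.table() does.

    ## Untested sketch of the header-then-scan idea from the question.
    con <- file("foo.txt", "r", encoding = "UTF-16LE")
    hdr <- readLines(con, n = 1)                      # read the header line
    fields <- strsplit(hdr, "@", fixed = TRUE)[[1]]   # split it on '@'
    what <- rep(list(character(0)), length(fields))   # one character() slot per column
    names(what) <- fields
    dat <- scan(con, what = what, sep = "@",
                multi.line = TRUE, quote = "")        # records may span physical lines
    close(con)
    df <- as.data.frame(dat, stringsAsFactors = FALSE)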
Ugly, but it worked for me. You can change the first perl regular
expression to treat line-terminating \n differently from in-field \n
characters, but I just dropped them all. The tail command drops the
byte-order mark (which UTF-8 does not need) and the second perl command
drops blank lines and some junk output from a SQL tool.
Thanks to Prof. Brian Ripley who, essentially, pointed out that with
embedded linefeed characters my file was a binary file and not really a
text file. Her Majesty's government respectfully begs to disagree 
but that's the R definition so we'll use it on this list.
> I have a text file that is UTF-16LE encoded with CRLF line endings and
> '@' as field separators that I want to read in R on a Linux system.
> Which would be fine as
> read.table("foo.txt", file.encoding = "UTF-16LE", sep = "@", ...)
> *except* that the data may contain the LF character which R treats as
> end-of-line and then barfs that there are too few elements on that line.
> Any suggestions for how to process this one efficiently in R? [...]