bug: write.dcf converts hyphen in field name to period

Michael Chirico
write.dcf(list('my-field' = 1L), tmp <- tempfile())

cat(readLines(tmp))
# my.field: 1

However, hyphenated field names are perfectly valid per the Debian
standard:

https://www.debian.org/doc/debian-policy/ch-controlfields.html

In fact, the policy document itself uses hyphenated field names, and
read.dcf handles them just fine:

writeLines(gsub('.', '-', readLines(tmp), fixed = TRUE), tmp)
read.dcf(tmp)
#      my-field
# [1,] "1"

The guilty line in write.dcf is the as.data.frame call:

if(!is.data.frame(x)) x <- as.data.frame(x, stringsAsFactors = FALSE)
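
The conversion can be checked in isolation, without going through the
rest of write.dcf. The sketch below assumes only base R; the names x, df
and tmp2 are made up for illustration. Since write.dcf converts only when
its input is not already a data frame, passing one built with
check.names = FALSE should work as a stopgap:

```r
x <- list('my-field' = 1L)

# The default conversion runs the names through make.names().
names(as.data.frame(x, stringsAsFactors = FALSE))
# [1] "my.field"

# With check.names = FALSE the hyphen survives.
df <- as.data.frame(x, stringsAsFactors = FALSE, check.names = FALSE)
names(df)
# [1] "my-field"

# A ready-made data frame skips the conversion inside write.dcf.
write.dcf(df, tmp2 <- tempfile())
readLines(tmp2)
# [1] "my-field: 1"
```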

In my case, simply adding check.names = FALSE to this call would solve
the issue, but I don't think that suffices in general. Here's what I see
in the standard:

> The field name is composed of US-ASCII characters excluding control
> characters, space, and colon (i.e., characters in the ranges U+0021 (!)
> through U+0039 (9), and U+003B (;) through U+007E (~), inclusive). Field
> names must not begin with the comment character (U+0023 #), nor with the
> hyphen character (U+002D -).

This could be handled by an adjustment to the next line:

nmx <- names(x)

becomes

nmx <- gsub('^[#-]', '', gsub('[^\U{0021}-\U{0039}\U{003B}-\U{007E}]', '.',
names(x)))

(Or maybe write.dcf should signal an error for invalid names instead.)
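
As a rough sketch of the error-raising alternative, the rule quoted from
the standard could be checked up front. The two helpers below are
hypothetical (nothing like them exists in base R, as far as I know), and
rejecting empty names is an extra assumption not stated in the quoted
passage. perl = TRUE is used so that the character ranges are compared by
code point rather than by locale collation order:

```r
# Hypothetical helper: TRUE for names that satisfy the quoted rule.
valid_dcf_field_name <- function(nm) {
  nzchar(nm) &                               # assumption: reject empty names
    !grepl('[^!-9;-~]', nm, perl = TRUE) &   # only U+0021-U+0039, U+003B-U+007E
    !grepl('^[#-]', nm, perl = TRUE)         # must not start with '#' or '-'
}

# Hypothetical helper: stop with a message listing the offending names.
check_dcf_field_names <- function(nm) {
  bad <- !valid_dcf_field_name(nm)
  if (any(bad))
    stop("invalid DCF field name(s): ",
         paste(sQuote(nm[bad]), collapse = ", "))
  invisible(nm)
}

valid_dcf_field_name(c("my-field", "-field", "a:b", "a b"))
# [1]  TRUE FALSE FALSE FALSE
```

Calling check_dcf_field_names(names(x)) just before nmx is assigned would
then fail loudly instead of silently rewriting the names.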

Michael Chirico

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel