read.epiinfo() returns wrong data when reading epiinfo files with \032 at the end

Artur Neumann
Sorry to send this report by email, but I cannot find a way to create
a login on https://bugs.r-project.org

Problem description:
I'm using the foreign package to read EpiInfo 6 files (.REC).
All my .REC files end with a single character after the last line break:
octal 032 / hex 1A.
The data frame returned by read.epiinfo() has an extra row at the end.
It repeats the content of the first row, but with the data shifted and
\032 prepended.

Expected result:
The last line (content: "\032") is ignored.

How to reproduce:

Read an EpiInfo 6 file whose last line contains only the character
octal 032 / hex 1A.
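
To check whether a given file is affected, one can look at its final byte. The following is only a minimal sketch: the helper name ends_with_ctrl_z is made up for illustration, and the demo runs on a small temporary file rather than on a real .REC file.

```r
# Hypothetical helper (not part of the foreign package):
# TRUE if the final byte of the file is 0x1A (octal 032, Ctrl-Z).
ends_with_ctrl_z <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.size(path))
  n <- length(bytes)
  n > 0 && bytes[n] == as.raw(0x1A)
}

# Demo on a temporary file that mimics the ending of the .REC files
demo_file <- tempfile(fileext = ".REC")
writeBin(c(charToRaw("some data line\r\n"), as.raw(0x1A)), demo_file)
has_marker <- ends_with_ctrl_z(demo_file)
print(has_marker)
```

Reading the raw bytes avoids any guessing about how readLines() treats the final, unterminated line.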

--------------------------------------------------
earData2<-read.epiinfo("EAR35.REC")
Warning messages:
1: In read.epiinfo("EAR35.REC") : wrong number of records
2: In matrix(datalines, nrow = multiline) :
  data length [3226] is not a sub-multiple or multiple of the number of rows [3]
--------------------------------------------------

As a result, the first data record appears again as the last row, with
its values partly shifted into the wrong columns:

------------------------------------------------------------------------------------
earData2[1,]
             FIRSTNAME         SURNAME AGEYEARS MTHS SEX  VDC SYRINGED
1 OM LAL               SUBEDI                41   NA   M BUR     FALSE
  AUDIO FIRSTEARMA OTHEREARMA SNHLDETAIL CHLOTOSCLE DUMBYN CSOMDETAIL
1  TRUE        CSO        CSO       <NA>         NA     NA         TT
  OTHERDIAGN OPERATIONY GROMMETS HEARINGAID MYRINGTYMP EARDROPS
1       <NA>       TRUE     <NA>       <NA>          2       NA
  MASTOIDECT ORALANTIBI STAPEDECTO OTHERTR OTHEROP TREATMENTD
1       <NA>         NA       <NA>      NA    <NA>       <NA>

earData2[nrow(earData2),]
                   FIRSTNAME         SURNAME AGEYEARS MTHS  SEX  VDC
1076 \032OM LAL               SUBEDI                4    1 <NA> MBUR
     SYRINGED AUDIO FIRSTEARMA OTHEREARMA SNHLDETAIL CHLOTOSCLE DUMBYN
1076       NA FALSE        YCS        OCS        O           NA     NA
     CSOMDETAIL
1076          T
                                                            OTHERDIAGN
1076 T
     OPERATIONY GROMMETS HEARINGAID MYRINGTYMP EARDROPS MASTOIDECT
1076         NA        Y       <NA>       <NA>       NA       <NA>
     ORALANTIBI STAPEDECTO OTHERTR OTHEROP TREATMENTD
1076         NA       <NA>      NA    <NA>          !
------------------------------------------------------------------------------------


Debugging:
The problem is in
https://github.com/cran/foreign/blob/master/R/read.epiinfo.R#L74

After line 66, datalines looks like this:

.........
[2737] "GOBINDA             RAJ            34  MTIJ NNO.EO.E
             !"
[2738] "                                              N   Y
             !"
[2739] " BETNESOL                 !"

[2740] "AMAR                RAJ            40  MKAL NNMYRNOR
             !"
[2741] "                                              N   Y
             !"
[2742] " GENT HC                  !"

[2743] "\032"

The "wrong number of records" warning is raised at line 73.


After line 74:

datalines[,1]
[1] "HARI SUNDAR         SHRESTHA       37  MDIP NNNORNOR
EPISTAXIS          !"
[2] "                                              N
          !"
[3] " NEOSPORIN                !"

and

datalines[,915]
[1] "\032"

[2] "HARI SUNDAR         SHRESTHA       37  MDIP NNNORNOR
EPISTAXIS          !"
[3] "

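So the last column of the matrix starts with "\032" and is then filled up
with the lines of the first record, which produces the wrong extra row.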
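
The effect can be reproduced without any EpiInfo file, because matrix() recycles its input when the length is not a multiple of nrow. This is a small illustration with made-up strings (r1-a, r1-b, ...) standing in for the real data lines; it is not the code of read.epiinfo() itself.

```r
# Six lines for two 3-line records, plus one stray trailing "\032" line
demo_lines <- c("r1-a", "r1-b", "r1-c",
                "r2-a", "r2-b", "r2-c",
                "\032")

# Same call shape as in the warning above; matrix() warns about the
# length mismatch, so the warning is suppressed here for the demo
m <- suppressWarnings(matrix(demo_lines, nrow = 3))

# The last column holds "\032" followed by recycled lines of record 1
last_col <- m[, ncol(m)]
print(last_col)
```

The third column starts with the stray character and is padded with the first two lines of the first record, which matches the shape of datalines[,915] shown above.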
I propose ignoring a last line that contains only "\032".
Maybe something like this at line 67:

# drop a trailing line that holds only the \032 (Ctrl-Z) character
if (identical(tail(datalines, n = 1), "\032")) {
  datalines <- datalines[-length(datalines)]
}
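
Until the package is changed, a possible workaround on the user side is to strip the trailing byte into a temporary copy and read that copy instead. This is an untested sketch under the assumption that removing the final 0x1A byte is enough for read.epiinfo() to see the right number of lines; strip_trailing_ctrl_z is a hypothetical helper name, not an existing function.

```r
# Hypothetical helper: write a copy of the file without a trailing 0x1A
# byte and return the path of that copy. The original file is untouched.
strip_trailing_ctrl_z <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.size(path))
  n <- length(bytes)
  if (n > 0 && bytes[n] == as.raw(0x1A)) {
    bytes <- bytes[-n]
  }
  cleaned <- tempfile(fileext = ".REC")
  writeBin(bytes, cleaned)
  cleaned
}

# Demo on a temporary file, not on a real .REC file
src <- tempfile(fileext = ".REC")
writeBin(c(charToRaw("some data line\r\n"), as.raw(0x1A)), src)
cleaned <- strip_trailing_ctrl_z(src)
cleaned_bytes <- readBin(cleaned, what = "raw", n = file.size(cleaned))

# Intended use (assumption, requires the foreign package and a real file):
# earData2 <- foreign::read.epiinfo(strip_trailing_ctrl_z("EAR35.REC"))
```

Working on a copy keeps the original data file unchanged, which seems safer than editing .REC files in place.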

------------------------------------------------------------------
Package: foreign
 Version: 0.8-67
 Maintainer: R Core Team <[hidden email]>
 Built: R 3.3.1; x86_64-pc-linux-gnu; 2016-09-26 12:57:55 UTC; unix

R Version:
 platform = x86_64-pc-linux-gnu
 arch = x86_64
 os = linux-gnu
 system = x86_64, linux-gnu
 status =
 major = 3
 minor = 3.1
 year = 2016
 month = 06
 day = 21
 svn rev = 70800
 language = R
 version.string = R version 3.3.1 (2016-06-21)
 nickname = Bug in Your Hair

Locale:
 LC_CTYPE=de_DE.utf8;LC_NUMERIC=C;LC_TIME=de_DE.utf8;LC_COLLATE=de_DE.utf8;LC_MONETARY=de_DE.utf8;LC_MESSAGES=de_DE.utf8;LC_PAPER=de_DE.utf8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=de_DE.utf8;LC_IDENTIFICATION=C

Search Path:
 .GlobalEnv, package:stats, package:graphics, package:grDevices,
 package:utils, package:datasets, package:methods, Autoloads,
 package:base




Kind regards
Artur Neumann


______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.