Correcting dates in research records using R - 2nd attempt

Nic-2
Hi,

I apologize: I accidentally sent my first message as HTML rather than
plain text.  Thanks to the responder for pointing this out and for
providing some feedback too.

I'm not very well versed in R.  I'm trying to correct the dates of
service in a de-identified research medical record covering several
subjects.  The correct dates come from a CSV file named DateR.csv, in
its VisitDate column.  Here is how it looks in Excel (the empty cells
contain other data that I don't need):


Id1     Id2     VisitDate
12345   12345   4/3/2018
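As a sketch of reading the CSV so that its IDs and dates can serve as lookup keys, the R code below reads every column as character and builds a named vector of dates keyed by Id2. It assumes VisitDate is month/day/year (so 4/3/2018 is 3 April 2018) and that the record expects zero-padded dates like 04/10/2018. The names `csv_text` and `visit_by_id` are illustrative, and the inline text stands in for DateR.csv.

```r
# Stand-in for DateR.csv using the sample row above; with the real file this
# would be read.csv("DateR.csv", colClasses = "character", strip.white = TRUE)
csv_text <- "Id1,Id2,VisitDate
12345,12345,4/3/2018"

clinicVdate <- read.csv(text = csv_text,
                        colClasses = "character",  # read every column as text
                        strip.white = TRUE)        # drop spaces around fields

# Assumption: VisitDate is month/day/year. Parse it, then reformat it to the
# zero-padded MM/DD/YYYY style used in the medical record.
visit <- as.Date(clinicVdate$VisitDate, format = "%m/%d/%Y")

# Named lookup vector: names are patient IDs, values are the corrected dates
visit_by_id <- setNames(format(visit, "%m/%d/%Y"), trimws(clinicVdate$Id2))
visit_by_id["12345"]   # "04/03/2018", named "12345"
```

Reading the IDs as character avoids any surprises from numeric conversion, and trimming them means they compare equal to IDs taken from the text file once those are trimmed too.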


The research medical record is a text file named test3.NEW.  The
"DATE OF SERVICE" in its top matter is wrong for every subject and
needs to be replaced with the matching "VisitDate" from the CSV file.
Here is an excerpt of the top matter of the research medical record;
the other data gathered for that subject follows below it:


================================================================================

PATIENT NAME: CONFIDENTIAL,#12345
PATIENT ID #: 12345
DATE OF SERVICE: 04/10/2018
ACCESSION NUMBER: RR1234567

TEST PROCEDURE        HIGH/LOW  TEST RESULTS       UNITS       NORMAL VALUES
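The top matter follows a simple "LABEL: value" layout, so individual fields can be pulled out by matching the label at the start of a line and trimming the text after the colon. A minimal sketch, where `field_value()` is a hypothetical helper and the lines are copied from the excerpt above:

```r
# Header lines from the excerpt above
record <- c("PATIENT NAME: CONFIDENTIAL,#12345",
            "PATIENT ID #: 12345",
            "DATE OF SERVICE: 04/10/2018",
            "ACCESSION NUMBER: RR1234567")

# Hypothetical helper: the value after "LABEL:" on lines that start with that
# label, with surrounding spaces removed. startsWith() matches literally,
# so characters such as "#" in the label need no regex escaping.
field_value <- function(lines, label) {
  prefix <- paste0(label, ":")
  hit <- lines[startsWith(lines, prefix)]
  trimws(substring(hit, nchar(prefix) + 1))
}

field_value(record, "PATIENT ID #")     # "12345"
field_value(record, "DATE OF SERVICE")  # "04/10/2018"
```

This assumes the labels begin at the very start of each line; if the real file indents them, the lines would need trimws() first.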


As described above, I need to replace the DATE OF SERVICE date in the
text file with the VisitDate from the CSV file.
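For a file holding several subjects, one possible approach is to carry the most recent patient ID forward to every following line and then rewrite only the date lines. This is a sketch that assumes each subject's "PATIENT ID #:" line comes before its "DATE OF SERVICE:" line. `fix_service_dates()` is a hypothetical helper, and `visit_by_id` is assumed to be a named character vector of corrected dates whose names are patient IDs.

```r
# Hypothetical helper: rewrite every "DATE OF SERVICE:" line in a multi-subject
# record using a named vector of corrected dates (names = patient IDs).
fix_service_dates <- function(lines, visit_by_id) {
  id_prefix  <- "PATIENT ID #:"
  dos_prefix <- "DATE OF SERVICE:"

  id_rows  <- startsWith(lines, id_prefix)
  dos_rows <- which(startsWith(lines, dos_prefix))

  # Patient IDs in file order, with the space after the colon trimmed
  id_values <- trimws(substring(lines[id_rows], nchar(id_prefix) + 1))

  # For each line, the ID from the most recent "PATIENT ID #:" line
  # (NA for lines that come before the first one)
  current_id <- c(NA_character_, id_values)[cumsum(id_rows) + 1]

  new_dates <- unname(visit_by_id[current_id[dos_rows]])

  # Leave a date untouched when its patient ID is missing from the lookup
  ok <- !is.na(new_dates)
  lines[dos_rows[ok]] <- paste0(dos_prefix, " ", new_dates[ok])
  lines
}
```

For example, with a hypothetical output file name so that the original record is preserved: lines <- readLines("D:/test/test3.NEW"), then writeLines(fix_service_dates(lines, c("12345" = "04/03/2018")), "D:/test/test3_fixed.NEW").  Working on the whole vector of lines at once avoids growing a string inside a loop, and a date whose ID is not found stays as it was instead of becoming NA.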

I made several unsuccessful attempts at this, so now I turn to you.
Here is the code from my attempts:


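# Read the correct visit dates
clinicVdate <- read.csv("DateR.csv")

# Use the Id2 values as row names so dates can be looked up by patient ID
rownames(clinicVdate) <- as.character(clinicVdate[,'Id2'])

Id2 <- NA

# Read the medical record text file, one line per element
input_data <- readLines("D:/test/test3.NEW")
output_data <- c()

for(input_line in input_data){
  output_line = input_line
  # Remember the patient ID from the "PATIENT ID #:" line
  if(length(grep('PATIENT ID #:', input_line))>0)  {
    Id2 = as.character(strsplit(input_line, ':')[[1]][2])
  }

  # Replace the "DATE OF SERVICE" line with the VisitDate looked up by Id2
  if (length(grep( 'DATE OF SERVICE: ', input_line))){

    output_line = paste('DATE OF SERVICE', clinicVdate[Id2, 'VisitDate'], sep=':')

  }
  # Append the (possibly replaced) line to the output text
  output_data = paste(output_data, output_line, sep='\n')
}

cat(output_data)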


Running the code above removes the erroneous date but replaces it with
NA.  Here is an example of the output:


================================================================================

PATIENT NAME: CONFIDENTIAL,#12345
PATIENT ID #: 12345
DATE OF SERVICE: NA
ACCESSION NUMBER: RR1234567

TEST PROCEDURE        HIGH/LOW  TEST RESULTS       UNITS       NORMAL VALUES
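One likely explanation for the NA, offered as a guess from reading the code rather than a confirmed diagnosis: strsplit(input_line, ':') splits "PATIENT ID #: 12345" into "PATIENT ID #" and " 12345", so Id2 keeps the space after the colon and no row name matches it. Indexing a data frame by a row name that does not exist returns NA. The small reproduction below uses a one-row stand-in for clinicVdate built from the sample CSV row:

```r
input_line <- "PATIENT ID #: 12345"

# strsplit() splits at the colon but keeps the space that follows it
id_raw <- strsplit(input_line, ":")[[1]][2]   # " 12345"
id     <- trimws(id_raw)                       # "12345"

# One-row stand-in for clinicVdate, with the Id2 values as row names
clinicVdate <- data.frame(Id2 = "12345", VisitDate = "4/3/2018",
                          stringsAsFactors = FALSE)
rownames(clinicVdate) <- clinicVdate$Id2

clinicVdate[id_raw, "VisitDate"]   # NA: no row is named " 12345"
clinicVdate[id, "VisitDate"]       # "4/3/2018"

# paste0() with an explicit ": " keeps the space after the colon
paste0("DATE OF SERVICE: ", clinicVdate[id, "VisitDate"])
```

Separately, paste('DATE OF SERVICE', x, sep=':') produces "DATE OF SERVICE:" with no space before the date, which is why the sketch uses paste0() with ": " to match the record's layout.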


Where am I going wrong?  If I haven't posed my question appropriately,
please let me know that too.  Any help with this would be greatly
appreciated!

Kind regards,

Nic Cecchino

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.