Correcting dates in research / medical record using R

Nic-2
Hi,

I'm not that well versed in R. I'm trying to correct the dates of
service in a de-identified research medical record covering several subjects.
The correct dates come from the VisitDate column of a csv file, DateR.csv.
In Excel it looks like this; the columns between Id2 and VisitDate hold
other data that I don't need and are left out here:


Id1     Id2     ...   VisitDate
12345   12345   ...   4/3/2018


The research medical record is a text file, and the "DATE OF SERVICE" in
the top matter is wrong for all of the subjects and needs to be
replaced with the "VisitDate" from the csv file.  The file name for the
medical records is test3.NEW.  Here is an excerpt of the top matter
of the research medical record; below this excerpt comes the other
data gathered for that subject:


================================================================================

PATIENT NAME: CONFIDENTIAL,#12345
PATIENT ID #: 12345
DATE OF SERVICE: 04/10/2018
ACCESSION NUMBER: RR1234567

TEST PROCEDURE        HIGH/LOW  TEST RESULTS       UNITS NORMAL VALUES


As described above, I need to replace the date on the DATE OF SERVICE:
line of the text file with the VisitDate from the csv file.

I have made several failed attempts at this, so now I turn to you.
Here is the code from my latest attempt:


clinicVdate <- read.csv("DateR.csv")

rownames(clinicVdate) <- as.character(clinicVdate[,'Id2'])

Id2 <- NA

input_data <- readLines("D:/test/test3.NEW")
output_data <- c()

for(input_line in input_data){
   output_line = input_line
   if(length(grep('PATIENT ID #:', input_line))>0)  {
     Id2 = as.character(strsplit(input_line, ':')[[1]][2])
   }

   if (length(grep( 'DATE OF SERVICE: ', input_line))){

     output_line = paste('DATE OF SERVICE', clinicVdate[Id2,
'VisitDate'], sep=':')

   }
   output_data = paste(output_data, output_line, sep='\n')
}

cat(output_data)


The code above removes the erroneous date but replaces it with
NA.  Here is an example of the results:


================================================================================

PATIENT NAME: CONFIDENTIAL,#12345
PATIENT ID #: 12345
DATE OF SERVICE: NA
ACCESSION NUMBER: RR1234567

TEST PROCEDURE        HIGH/LOW  TEST RESULTS       UNITS NORMAL VALUES


Where am I going wrong?  If I didn't pose my question appropriately,
please let me know too!!  Any help with this would be greatly appreciated!!

Kind regards,

Nic Cecchino




        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Correcting dates in research / medical record using R

PIKAL Petr
Hi

First of all, you should not post in HTML; there is a good chance the message gets scrambled.

After the for loop has run, compare the value of Id2 with the row names of clinicVdate, and look at the result of

clinicVdate[Id2, 'VisitDate']

Most probably Id2 does not match any of the row names of clinicVdate.
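
One way to make that comparison, as a minimal sketch: the inline data frame and the sample line below are stand-ins for DateR.csv and test3.NEW, assumed from the excerpts in the question. The value extracted with strsplit() keeps the space that follows the colon, so it does not match any row name, and indexing a data frame by an unmatched row name returns NA. trimws() is one possible fix.

```r
# Stand-in for read.csv("DateR.csv"); only the columns shown in the question.
clinicVdate <- data.frame(Id1 = 12345, Id2 = 12345, VisitDate = "4/3/2018",
                          stringsAsFactors = FALSE)
rownames(clinicVdate) <- as.character(clinicVdate[, "Id2"])

# Stand-in for one line of test3.NEW
input_line <- "PATIENT ID #: 12345"
Id2 <- strsplit(input_line, ":")[[1]][2]

print(Id2)                             # note the leading space: " 12345"
Id2 %in% rownames(clinicVdate)         # FALSE
clinicVdate[Id2, "VisitDate"]          # NA

Id2_clean <- trimws(Id2)               # strip the surrounding whitespace
clinicVdate[Id2_clean, "VisitDate"]    # "4/3/2018"
```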

Cheers
Petr



Re: Correcting dates in research / medical record using R

PIKAL Petr
Hi

You should send your responses to the R-help list; others could offer better or different solutions.

I am not a regex expert myself, so if all your files are formatted the same way I would use strsplit.

# Read the record header into the object test
test <- readLines("clipboard")
str(test)
 chr [1:4] "PATIENT NAME: CONFIDENTIAL,#12345" "PATIENT ID #: 12345" ...

# Something similar to your csv file
test2 <- read.table("clipboard")
test2
    Id1   Id2 VisitDate
1 12345 12345  4/3/2018
2 11111 11111  5/4/2018

# Split the second line of the patient record on spaces, take the 4th item,
# and find the row of the csv data whose Id2 value matches it
sel <- which(test2$Id2 == as.numeric(unlist(strsplit(test[2], " "))[4]))

# Split the third line of the patient record
out <- unlist(strsplit(test[3], split = " "))

# Replace the 4th item with the selected VisitDate from the csv data.
# Be aware of the difference between factors and characters here.
out[4] <- as.character(test2$VisitDate[sel])

# Finally, collapse the pieces into one line, which can replace the third line of the patient record
paste(out, collapse = " ")
[1] "DATE OF SERVICE: 4/3/2018"
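
The remark about factors and characters matters because, in R versions before 4.0.0, read.table() and read.csv() convert text columns to factors by default. A small sketch of the difference, using made-up factors rather than the real file:

```r
# A date column as it would look when read as a factor
VisitDate <- factor(c("4/3/2018", "5/4/2018"))

as.character(VisitDate[1])   # the label: "4/3/2018"
as.integer(VisitDate[1])     # the internal level code: 1

# An ID column stored as a factor needs a two-step conversion
Id2 <- factor(c("12345", "11111"))
as.numeric(Id2[1])                  # level code 2, not 12345
as.numeric(as.character(Id2[1]))    # 12345
```

Calling as.character() first, as in the line that fills out[4], avoids pasting a level code into the record by mistake.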

But what do you want to do with it? This manipulates objects in your R session, not the original files. I believe there are other tools more suitable for such tasks.
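
If staying in R is acceptable, the corrected lines can be written back out with writeLines(). The following is a rough sketch, not a tested solution for the real data. It builds small stand-ins for DateR.csv and test3.NEW in a temporary directory (contents assumed from the excerpts in this thread), assumes every record lists PATIENT ID # before DATE OF SERVICE, and writes to a new file, test3.FIXED (a hypothetical name), rather than overwriting the input.

```r
# Stand-in input files; contents assumed from the excerpts in this thread.
tmp_dir  <- tempdir()
csv_file <- file.path(tmp_dir, "DateR.csv")
rec_file <- file.path(tmp_dir, "test3.NEW")
out_file <- file.path(tmp_dir, "test3.FIXED")   # hypothetical output name

writeLines(c("Id1,Id2,VisitDate", "12345,12345,4/3/2018"), csv_file)
writeLines(c("PATIENT NAME: CONFIDENTIAL,#12345",
             "PATIENT ID #: 12345",
             "DATE OF SERVICE: 04/10/2018",
             "ACCESSION NUMBER: RR1234567"), rec_file)

visits <- read.csv(csv_file, stringsAsFactors = FALSE)
# Named lookup vector: patient ID (as text) -> visit date
lookup <- setNames(visits$VisitDate, as.character(visits$Id2))

rec_lines  <- readLines(rec_file)
current_id <- NA_character_

for (i in seq_along(rec_lines)) {
  if (grepl("^PATIENT ID #:", rec_lines[i])) {
    # Drop the label, then strip whitespace so the ID matches the lookup names
    current_id <- trimws(sub("^PATIENT ID #:", "", rec_lines[i]))
  } else if (grepl("^DATE OF SERVICE:", rec_lines[i])) {
    new_date <- lookup[current_id]
    # Leave the line untouched when no matching visit date is found
    if (!is.na(new_date)) {
      rec_lines[i] <- paste("DATE OF SERVICE:", new_date)
    }
  }
}

writeLines(rec_lines, out_file)
cat(readLines(out_file), sep = "\n")
```

The date is inserted exactly as it appears in the csv file (4/3/2018). If the zero-padded style of the record matters, it could be reformatted with as.Date() and format() once the day/month order of VisitDate is known.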

Cheers
Petr

> -----Original Message-----
> From: Nicola Cecchino <[hidden email]>
> Sent: Thursday, September 13, 2018 5:04 AM
> To: PIKAL Petr <[hidden email]>
> Subject: Re: [R] Correcting dates in research / medical record using R
>
> Hi Petr,
>
> Thank you for your help but I'm not sure what that code is supposed to do?  I'm
> really new to regular expressions and am having difficulties with this whole
> thing.
>
> Nic
