Hello, I have a .txt file containing many clinical exam reports (two examples are attached to this message).
I need to create a data frame with one row per clinical exam report in the text file and 24 columns:
the first (to be labelled "ID") holds the identification code, i.e. the number that follows the string "Acc.ne n. " on the 13th line of each report;
the second (to be labelled "DATE") holds the date of blood sampling, i.e. the date that follows the identification code on the same 13th line;
the remaining 22 columns are to be labelled with the parameter names found on lines 20 to 41 of each report ("GLICEMIA" ... "COLESTEROLO LDL").
I searched the mailing list archives and started with something like:
#read the text file
reports <- readLines("ClinicalReports.txt")
#processing the file starting at each "Acc.ne n. "
serologic <- lapply(which(grepl("^Acc.ne n.", reports)), function(.line )....
but I'm a biostatistician with almost no programming expertise and I really need your help. Please!
I forgot to specify that each row, corresponding to one clinical report, must contain the measured values of the 22 parameters. For example, the first row and its first six columns:
ID      DATE       GLICEMIA AZOTEMIA CREATININEMIA SODIEMIA ... ... ...
0000185 05/12/2011 115      33.6     0.99          136      ... ... ...
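
To make the target concrete, here is a rough sketch of the kind of function I have in mind, written only from the description above. It rests on several assumptions that may not hold for the real files: every report has the "Acc.ne n. " line as its 13th line, the date on that line is written as dd/mm/yyyy, and each of lines 20 to 41 contains the parameter name, then whitespace, then the measured value (possibly followed by units or reference ranges). The names parse_reports, acc_line and param_lines are made up for illustration.

```r
# Hypothetical helper: build one row per report from the lines of the file.
# ASSUMPTION: parameter lines look like "<NAME> <value> [units, ranges ...]".
parse_reports <- function(lines, acc_line = 13, param_lines = 20:41) {
  # Escape the dots so that "." matches a literal period
  acc_idx <- grep("Acc\\.ne n\\. ", lines)
  stopifnot(length(acc_idx) > 0)
  header <- lines[acc_idx]

  # ID: digits after "Acc.ne n. ", kept as text so leading zeros survive
  id <- sub(".*Acc\\.ne n\\. *([0-9]+).*", "\\1", header)

  # DATE: first dd/mm/yyyy pattern on the same line, NA if none is found
  m <- regexpr("[0-9]{2}/[0-9]{2}/[0-9]{4}", header)
  date <- rep(NA_character_, length(header))
  date[m > 0] <- regmatches(header, m)

  # Parameter lines sit at fixed offsets from the "Acc.ne n. " line
  offsets <- param_lines - acc_line
  pattern <- "^\\s*(.+?)\\s+(-?[0-9]+(?:[.,][0-9]+)?)(?:\\s.*)?$"

  # Return capture group `group` for each line, NA where the line does not match
  split_line <- function(x, group) {
    ok <- grepl(pattern, x, perl = TRUE)
    out <- rep(NA_character_, length(x))
    out[ok] <- sub(pattern, group, x[ok], perl = TRUE)
    out
  }

  # One column of values per report; a decimal comma is converted to a point
  values <- vapply(acc_idx, function(i) {
    v <- split_line(lines[i + offsets], "\\2")
    as.numeric(sub(",", ".", v, fixed = TRUE))
  }, numeric(length(offsets)))
  values <- t(matrix(values, nrow = length(offsets)))

  # Column labels are taken from the first report
  colnames(values) <- trimws(split_line(lines[acc_idx[1] + offsets], "\\1"))

  data.frame(ID = id, DATE = date, values,
             check.names = FALSE, stringsAsFactors = FALSE)
}
```

With the file read as in my attempt, the call would be parse_reports(reports). DATE is left as text in the sketch; as.Date(df$DATE, format = "%d/%m/%Y") should convert it if a real date column is preferable. Any line that does not follow the assumed "name, then value" layout ends up as NA, so the result would need checking against the original reports.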