|
I have 2 large data files that I need to compare, finding the differences between data file x and data file y in order to correct data entry errors. Theoretically both data files should be identical. I am trying to figure out a way to do this in R. Any help would be great!
______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
|
Here are some ways:
# Compare the two files line by line
all.equal(readLines(file1), readLines(file2))

You could also try comparing the md5sum of the files:

library(tools)
# md5sum() returns a vector named by file path, so drop the names
# before comparing; otherwise identical() is FALSE for any two paths
identical(unname(md5sum(file1)), unname(md5sum(file2)))

On Tue, Oct 19, 2010 at 8:23 PM, Nicole Brandt <[hidden email]> wrote:
> I have 2 large data files that I need to compare and find the differences
> between data file x and data file y in order to correct data entry error.
> Theoretically both data files should be identical. I am trying to figure out
> a way to do this in R. Any help would be great!

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
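Neither call above says which lines differ, which is what correcting data entry errors needs. A minimal sketch of one way to list them, assuming both files are plain text and should match line for line; `diff_lines` is a hypothetical helper name, not something from base R:

```r
# Hypothetical helper: report the lines at which two text files differ.
diff_lines <- function(file1, file2) {
  x <- readLines(file1)
  y <- readLines(file2)
  n <- max(length(x), length(y))
  # Pad the shorter file with NA so both vectors have the same length.
  length(x) <- n
  length(y) <- n
  # A line differs if it exists in only one file or the text is not equal.
  differs <- is.na(x) != is.na(y) | (!is.na(x) & !is.na(y) & x != y)
  data.frame(line = which(differs), x = x[differs], y = y[differs],
             stringsAsFactors = FALSE)
}

# Example with two small temporary files (made-up data)
f1 <- tempfile()
f2 <- tempfile()
writeLines(c("id,weight", "1,10.5", "2,11.0"), f1)
writeLines(c("id,weight", "1,10.5", "2,11.9"), f2)
diff_lines(f1, f2)
```

This compares by position only, so a line inserted or deleted in one file makes every later line show up as different; a diff-style tool handles that case better.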
|
In reply to this post by Nicole Brandt
----------------------------------------
> From: [hidden email]
> Date: Tue, 19 Oct 2010 18:23:27 -0400
> To: [hidden email]
> Subject: [R] comparing two data files
>
> I have 2 large data files that I need to compare and find the differences between data file x and data file y in order to correct data entry error. Theoretically both data files should be identical. I am trying to figure ou[[elided Hotmail spam]]

I'm not sure why you want to use R for this. There may be very good reasons, but generally I use text-processing utilities like "diff" (see the Linux or Cygwin docs) along with grep, sed, awk, and maybe perl. These tools are generally not sophisticated with numbers and just process strings, so if your validation and correction rely on R features, doing it in R may be worthwhile. If you are really just looking for diffs in strings, these other tools could be a good alternative and possibly worth the learning curve for you if your largest motivation for doing this in R is to learn more R.

I guess the next question is, "What do you want to do if they are not equal?"
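For the case mentioned above where the comparison should understand numbers and not just strings, here is a rough sketch in R. It assumes both files have already been read into data frames with the same dimensions and column names (for example with read.csv); `diff_cells` and its `tol` argument are hypothetical names chosen for illustration:

```r
# Hypothetical helper: list the cells at which two data frames disagree.
diff_cells <- function(x, y, tol = 1e-8) {
  stopifnot(identical(dim(x), dim(y)), identical(names(x), names(y)))
  out <- list()
  for (col in names(x)) {
    a <- x[[col]]
    b <- y[[col]]
    if (is.numeric(a) && is.numeric(b)) {
      # Numeric columns: allow a small tolerance for rounding.
      bad <- abs(a - b) > tol
    } else {
      # Everything else is compared as text.
      bad <- as.character(a) != as.character(b)
    }
    # A cell also differs when it is missing in exactly one data frame.
    bad <- ifelse(is.na(a) | is.na(b), is.na(a) != is.na(b), bad)
    if (any(bad)) {
      out[[col]] <- data.frame(row = which(bad), column = col,
                               x = as.character(a[bad]),
                               y = as.character(b[bad]),
                               stringsAsFactors = FALSE)
    }
  }
  if (length(out) == 0) {
    return(data.frame(row = integer(0), column = character(0),
                      x = character(0), y = character(0),
                      stringsAsFactors = FALSE))
  }
  res <- do.call(rbind, unname(out))
  rownames(res) <- NULL
  res
}

# Example with made-up data: one weight was entered differently
x <- data.frame(id = 1:3, weight = c(10.5, 11.0, 12.25))
y <- data.frame(id = 1:3, weight = c(10.5, 11.9, 12.25))
diff_cells(x, y)
```

One caveat: if a typo turns a numeric column into text in only one file, that column falls back to text comparison, and values such as 11 and "11.0" will be reported as different. Reading both files with colClasses = "character" avoids the mismatch, at the cost of losing the numeric tolerance.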
