Hello,
I'm coming straight to the point: I have 65 .txt files named "XYZ_1.txt" to "XYZ_65.txt" (each number represents a test subject). I have to open them in Microsoft Excel to see the exact structure.

In each of those .txt files there are reaction time values (in milliseconds) from line 15, column H to line 166, column H for each test subject (and some other data in the other columns, of course).

My problem is that I only need the arithmetic mean of all of these reaction times per test subject. Again: I have 65 test subjects and, according to Excel, 152 reaction times for each test subject, i.e. in each .txt file.

Is there an easy way to extract only the arithmetic mean for each test subject into a column of an Excel file?

Thanks for your answers!
Since you did not provide an example of the file, I will take a guess at the content and show how to extract the values and take the mean of all of them, since you did not say whether you want the mean of each file or a single mean.

    # 'colH' is a guessed column name; adjust it to the real file layout
    myData <- do.call(c, lapply(1:65, function(.file){
        x <- read.csv(paste0("XYZ_", .file, ".txt"))
        x[15:166, 'colH']
    }))
    mean(myData)

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
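To illustrate the difference between one mean per file and a single pooled mean, here is a minimal sketch. The demo files, their tab-separated layout and the column name colH are assumptions made for illustration; they are not the real data.

```r
# Hypothetical demo data: three small tab-separated files standing in for
# XYZ_1.txt ... XYZ_65.txt (the real layout is an assumption).
demo_dir <- tempfile("xyz_demo")
dir.create(demo_dir)
for (i in 1:3) {
  demo <- data.frame(trial = 1:4, colH = c(300, 310, 320, 330) + 10 * i)
  write.table(demo, file.path(demo_dir, paste0("XYZ_", i, ".txt")),
              sep = "\t", row.names = FALSE, quote = FALSE)
}

files <- file.path(demo_dir, paste0("XYZ_", 1:3, ".txt"))

# One mean per file, i.e. per test subject.
per_subject <- vapply(files, function(f) mean(read.delim(f)$colH), numeric(1))
names(per_subject) <- basename(files)

# A single mean over the pooled values of all files.
pooled <- mean(unlist(lapply(files, function(f) read.delim(f)$colH)))

print(per_subject)
print(pooled)
```

The per-file version returns one value per test subject, which is the shape needed for a single column in a spreadsheet.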
Dear Mr. Holtman,
thank you for your reply.

I think I did say: "all of these reaction times per test subject", which means that I need a file with the mean of the reaction times of each file, i.e. of each test subject (because file XYZ_34.txt contains exactly subject 34's data).

There are 65 x 152 reaction times and I need 65 x mean(152 reaction times per test subject file) = 65 mean reaction times.

I have now provided an example for test subject 34 (please open in Excel):

http://r.789695.n4.nabble.com/file/n4635834/XYZ_34.txt XYZ_34.txt

Kind regards
Hello,
Your data example has dots in the column of interest. If those values are integers, this might do it.

    fun <- function(x){
        dat <- read.table(x, skip=14)
        # remove every dot from column 8, then convert to numeric
        H <- as.numeric(gsub("\\.", "", dat[, 8]))
        mean(H)
    }

    sapply(list.files(pattern="XYZ.*\\.txt"), fun)

Now do what you want with the result, for instance, write.table().

Hope this helps.

Rui Barradas
Dear Mr. Barradas,
your solution comes very close to what I want.

But I have two questions left:

First question: If "R" computes the mean for the reaction times of test subject 34 (the example I provided above), it says "310112.0", but if I use the "mean" function in Excel it says "345.210". Apart from the dots in the column of interest (which you mentioned before), the mean is obviously not the same. Do you have any idea why?

Second question: Why are the dots in the column of interest problematic?

Kind regards
I think the real problem is the first data line:

    2 1 1 3 27 0 6 1.200.995

Notice the two periods in the value. The previous solution was getting rid of all the periods. If you leave out this value, you get 339.5; if you change it to 1200.995, you get 345.21, so your data is incorrect.

--
Jim Holtman
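The effect of a single two-period entry on the mean can be sketched as follows. The reaction times here are made up for illustration (only "1.200.995" comes from the thread), so the resulting means do not match the figures quoted above.

```r
# Hypothetical reaction times as text; "1.200.995" is the two-period entry.
rt <- c("1.200.995", "300.5", "340.25", "350.75")

# Count the periods in each entry to spot the malformed ones.
n_dots <- lengths(regmatches(rt, gregexpr(".", rt, fixed = TRUE)))
bad <- n_dots > 1

# Option 1: leave the malformed entry out.
mean_without <- mean(as.numeric(rt[!bad]))

# Option 2: correct it by hand to 1200.995.
corrected <- rt
corrected[bad] <- "1200.995"
mean_corrected <- mean(as.numeric(corrected))

# What happens when every period is stripped: each value is rescaled by a
# different power of ten, so the mean is no longer meaningful.
mean_stripped <- mean(as.numeric(gsub(".", "", rt, fixed = TRUE)))

print(c(without = mean_without, corrected = mean_corrected,
        stripped = mean_stripped))
```

Counting the periods first makes it easy to list the affected rows before deciding how to treat them.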
Hello,
There must be a difference between the file you are processing and the one Excel and I are using:

    > fun <- function(x){
    + dat <- read.table(x, skip=14)
    + dat[ , 8] <- as.numeric(gsub("\\.", "", dat[, 8]))
    + mean(dat[, 8])
    + }
    >
    > sapply(list.files(pattern="XYZ.*\\.txt"), fun)
    XYZ_34.txt
      345210.4

This result is even better, more accurate than Excel's.

As for the second question: because of the dots, those values are read by R as character and, when put into the data.frame, converted to factors, the name R gives to categorical variables. You can see this with the instruction print(str(dat)), right after the read.table:

    > str(dat)
    'data.frame': 151 obs. of 8 variables:
     $ V1: int 2 2 2 2 2 2 2 2 2 2 ...
     $ V2: int 1 2 3 4 5 6 7 8 9 10 ...
     $ V3: int 1 2 3 4 5 6 7 8 9 10 ...
     $ V4: int 3 2 4 3 3 1 3 1 3 2 ...
     $ V5: int 27 16 16 27 27 27 27 27 27 16 ...
     $ V6: int 0 16 16 16 27 27 27 27 16 16 ...
     $ V7: int 6 1 1 2 1 1 1 1 2 1 ...
     $ V8: Factor w/ 151 levels "1.200.995","247.102",..: 1 139 135 39 133 73 142 63 77 67 ...

V8 is the column we want. The real values are 1 139 135 39 etc. The levels are the category labels; the categories themselves are the 1-based integer values.

Anyway, what's important is that the code is working, and if there's an error maybe it can be solved with this modification:

    fun <- function(x, skip = 14){
        dat <- read.table(x, skip=skip)

And the rest is the same. Inspect the file and see if the data starts at line 15.

(And please, Rui is enough, NO 'Mr.')

Hope this helps,

Rui Barradas
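The relationship between factor levels and the integer codes described above can be seen in a small sketch. The vector below is hypothetical (only the first two strings appear in the thread). Note also that since R 4.0.0 read.table defaults to stringsAsFactors = FALSE, so on a current R such a column would arrive as character rather than factor.

```r
# Hypothetical column in the style of V8, stored as a factor.
v8 <- factor(c("1.200.995", "247.102", "310.5", "247.102"))

# The levels are the distinct labels, sorted as text.
print(levels(v8))

# The stored values are 1-based integer codes pointing into the levels.
codes <- as.integer(v8)
print(codes)

# To get the original text back, convert via character, not via integer.
labels <- as.character(v8)
print(labels)
```

This is why calling as.numeric() directly on a factor returns the codes rather than the numbers shown in the file; going through as.character() first avoids that.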
Dear Mr. Holtman,
but I cannot leave out the value and cannot change the values to 1200.995 manually (for each test subject with a reaction time > 1000 ms), because the former would lead to incomplete data and the latter would be too time-consuming.

Dear Rui,

here I have three files which have exactly the same content as "XYZ_34.txt", EXCEPT that the file "XYZ_50.txt" doesn't have a period in the first value 1200.9952 IF YOU OPEN IT WITH THE EDITOR (!), maybe because I didn't change the structure with MS Excel. The other two files should be identical.

http://r.789695.n4.nabble.com/file/n4635962/XYZ_2.txt XYZ_2.txt
http://r.789695.n4.nabble.com/file/n4635962/XYZ_50.txt XYZ_50.txt
http://r.789695.n4.nabble.com/file/n4635962/XYZ_1112.txt XYZ_1112.txt

R gives me the following output:

    > fun <- function(x){
    + dat <- read.table(x, skip=14)
    + dat[ , 8] <- as.numeric(gsub("\\.", "", dat[, 8]))
    + mean(dat[, 8])
    + }
    > sapply(list.files(pattern="XYZ.*\\.txt"), fun)
    XYZ_1112.txt    XYZ_2.txt   XYZ_50.txt
        345210.4     345210.4     310112.0

Your second suggestion leads to the same output:

    > fun <- function(x, skip = 14){
    + dat <- read.table(x, skip=skip)
    + dat[ , 8] <- as.numeric(gsub("\\.", "", dat[, 8]))
    + mean(dat[, 8])
    + }
    > sapply(list.files(pattern="XYZ.*\\.txt"), fun)
    XYZ_1112.txt    XYZ_2.txt   XYZ_50.txt
        345210.4     345210.4     310112.0

Thank you for your replies!

Kind regards
Hello,
Ok, I think that there were two problems. One, gsub substitutes all (g - global) occurrences of the search pattern, so both periods were removed. The other, it would always treat column 8 as character, but when there are no values with two periods the column is read in with class numeric. Both are now corrected.

    fun <- function(x, skip = 14){
        dat <- read.table(x, skip=skip, stringsAsFactors = FALSE)
        if(is.character(dat[, 8])){
            # entries that split into three pieces contain two periods
            len <- sapply(strsplit(dat[, 8], "\\."), length)
            # sub() removes only the first period of those entries
            dat[len == 3 , 8] <- sub("\\.", "", dat[len == 3 , 8])
            dat[, 8] <- as.numeric(dat[, 8])
        }
        mean(dat[, 8])
    }

    sapply(list.files(pattern="XYZ.*\\.txt"), fun)

Rui Barradas
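The difference between gsub and sub that the fix relies on can be checked on a single string. This is a minimal sketch; the helper name clean_rt is hypothetical and simply packages the same steps as the function above for a character vector.

```r
x <- "1.200.995"

# gsub() removes every period; sub() removes only the first one.
all_removed   <- gsub("\\.", "", x)
first_removed <- sub("\\.", "", x)
print(c(gsub = all_removed, sub = first_removed))

# Hypothetical helper: fix two-period entries, then convert to numeric.
clean_rt <- function(v) {
  # an entry with two periods splits into three pieces
  len <- sapply(strsplit(v, "\\."), length)
  v[len == 3] <- sub("\\.", "", v[len == 3])
  as.numeric(v)
}

cleaned <- clean_rt(c("1.200.995", "247.102"))
print(cleaned)
```

Entries with a single period are left untouched, so ordinary values such as 247.102 keep their decimal point.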
Dear Rui,
thank you very much. Your solution works perfectly.

One last question:

I need to write a function that gives ONE value (here: a ratio) FOR EACH test subject: the correct reactions divided by trial or trialCount, respectively. ("/" means "divided by" in the following.)

I need the ratio correct (reactions)/trial or correct (reactions)/trialCount, respectively (because trial and trialCount are the same WITHIN test SUBJECTS; BUT they differ in length BETWEEN test SUBJECTS!).

It would be very helpful if I had a data frame in the end in R, with one column for "trialCount"/"trial", one column for "correct reactions" (= 1) AND (more importantly) one column for "correct (= 1) answers / trialCount".

Legend (just as additional information) for the variable "correct":

1 = correct reaction
2 = false reaction
3 = reaction too slow
4 = reaction too fast
5 = more than one button pressed
6 = no reaction within RT window

I would be very thankful for an answer!

Sorry for the questions, but I am doing this for the first time!

Kind regards
Hello,
I'm glad it helped. As for this second question, you should explain yourself better.

1. What is a test subject, and which column records its id? vpNum?
2. You say "divided per trials or trialCount". Does this mean per trial number (example: divide by 1, by 2, by 3, etc., by 149) or per number of trials (149 in the previous example)?
3. 'correct' now seems to be categorical. Divide WHAT by trial or trialCount?

Hint: post a small data example with three or four subjects and the wanted output.

Rui Barradas
Dear Rui,
1) By test subject I mean each file. I have posted three similar files above (2, 50 and 1112), but each test subject has exactly one file, and these files differ, of course. (2, 50 and 1112 are the same file, which I renamed for the problem described and solved above.) Within a file the vpNum is always the same (for example, for test subject 44 it is always vpNum = 44). The examples above (2, 50 and 1112) are in fact all the second test subject's file (vpNum always "2").

2) By "trials" or "trialCount" I mean the number of trials (149 in your example, and 151 in the examples 2, 50 and 1112). But the number of trials differs between subjects: in the examples below (vpNum = 3, 43 and 63) it is 152 (for vpNum = 3), 150 (for vpNum = 43) and 157 (for vpNum = 63).

http://r.789695.n4.nabble.com/file/n4636106/XYZ_3.txt XYZ_3.txt
http://r.789695.n4.nabble.com/file/n4636106/XYZ_43.txt XYZ_43.txt
http://r.789695.n4.nabble.com/file/n4636106/XYZ_63.txt XYZ_63.txt

3) I mean the number of correct answers given per test subject. For example, for test subject 3 in the example above we have 152 trials and 4 trials that are not correct, which means 148 correct trials (with the categorical value "1"). So in R this would be the ratio:

    > 148/152
    [1] 0.9736842

The wanted output should (if possible) look like this (here only for vpNum = 3 !!!):

    vpNum   trial OR trialCount   correct (reactions)   ratio (number correct / trialCount)
    3       152                   148                   0.9736842

The final output should look like this:

    vpNum   trial OR trialCount   correct (reactions)   ratio (number correct / trialCount)
    1       n                     n                     x
    2       n                     n                     x
    3       152                   148                   0.9736842

and so on until vpNum = 65. (In my question before I forgot to ask for the column "vpNum", sorry about that!)

I hope this makes it clearer!

Thanks for your time and help!

Kind regards
Hello,
Try

    make.row <- function(x, skip = 14, column){
        # skip one line less so that the line before the data is read as the header
        dat <- read.table(x, skip = skip - 1, header = TRUE, stringsAsFactors = FALSE)
        vpNum <- dat$vpNum[1]
        trial <- length(dat[[ column ]])
        # a value of 1 in 'correct' marks a correct reaction
        correct <- sum(dat$correct == 1)
        result <- c(vpNum, trial, correct, correct/trial)
        names(result) <- c("vpNum", column, "correct", "ratio")
        result
    }

    files <- list.files(pattern = "^XYZ_.*\\.txt$")
    ratios <- t(sapply(files, make.row, column = "trial"))
    ratios <- data.frame(ratios, row.names = seq_len(nrow(ratios)))
    ratios

I think it's what you want.

Rui Barradas
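The ratio itself can be sketched without reading any files. The data frame below is hypothetical trial-level data for two subjects; only the column names vpNum and correct and the coding 1 = correct reaction are taken from the thread.

```r
# Hypothetical trial-level data: one row per trial, two test subjects.
trials <- data.frame(
  vpNum   = c(3, 3, 3, 3, 43, 43, 43, 43, 43),
  correct = c(1, 1, 2, 1, 1, 6, 1, 3, 1)
)

# For each subject: number of trials, number of correct reactions, ratio.
per_subject <- lapply(split(trials, trials$vpNum), function(d) {
  n_trials  <- nrow(d)
  n_correct <- sum(d$correct == 1)
  data.frame(vpNum = d$vpNum[1], trial = n_trials,
             correct = n_correct, ratio = n_correct / n_trials)
})

# Stack the one-row data frames into a single table.
ratios <- do.call(rbind, per_subject)
rownames(ratios) <- NULL
print(ratios)
```

Building one small data frame per subject and binding them keeps the column types intact, whereas the matrix route via sapply stores everything as numeric.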
Dear Mr. Holtman and especially dear Rui,
thank you VERY much. You helped me a lot!

I've just added the following:

    rsort <- ratios[order(ratios$vpNum),]

Now the test subjects are arranged according to their vpNum.

Thanks a lot again!
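A small sketch of why the sorting step is useful: file names are compared as text, so a listing typically does not follow the numeric subject order. The file names below are hypothetical examples in the XYZ_<number>.txt style.

```r
# Hypothetical file names in the order a text-based listing might give.
files <- c("XYZ_1.txt", "XYZ_10.txt", "XYZ_2.txt")

# Extract the subject number from each name and order by it numerically.
ids <- as.numeric(sub("^XYZ_(\\d+)\\.txt$", "\\1", files))
sorted_files <- files[order(ids)]
print(sorted_files)
```

Ordering the result table by vpNum, as done above, achieves the same thing after the fact.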