
## Extracting arithmetic mean for specific values from multiple .txt-files

 Hello,

I'll come straight to the point:

I have 65 .txt-files named "XYZ_1.txt" to "XYZ_65.txt" (each number represents a test subject).

I have to open them in Microsoft Excel to see the exact structure.

In each of those .txt-files there are reaction time values (in milliseconds) from line 15, column H to line 166, column H for each test subject (and some other data in the other columns, of course).

My problem is that I only need the arithmetic mean of all of these reaction times per test subject.

--> Again: I have 65 test subjects and, according to Excel, 152 reaction times for each test subject, i.e. in each .txt-file.

Is there an easy way to extract only the arithmetic mean for each test subject into a column of an Excel file?

Thanks for your answers!

## Re: Extracting arithmetic mean for specific values from multiple .txt-files

 Since you did not provide an example of the file, I will take a guess at the content and show how to extract the values and take the mean of all of them, since you did not say whether you want the mean of each file or a single mean.

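# Read rows 15-166 of column 'colH' (a guessed name) from each file and pool them
myData <- do.call(c, lapply(1:65, function(.file){
        x <- read.csv(paste0("XYZ_", .file, ".txt"))
        x[15:166, 'colH']
}))

# Grand mean over all files
mean(myData)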

On Sun, Jul 8, 2012 at 7:01 PM, vimmster <[hidden email]> wrote:
> Hello,
>
> I'm coming straight to the point:
>
> I have 65 .txt-Files named "XYZ_1.txt" to "XYZ_65.txt" (each number
> represents a test subject).
>
> I have to open them in Microsoft Excel to see the exact structure.
>
> In each of those .txt-files there are reaction time values (in milliseconds)
> from line 15, column H to line 166, column H for each test subject (and a
> couple of other data in the other columns of course).
>
> My problem is, that I only need the arithmetic mean for all of these
> reaction times per test subject.
>
> --> Again: I have 65 test subjects and according to Excel 152 reaction times
> for each test subject / in each .txt-file.
>
> Is there an easy way to only extract the arithmetic mean for each test
> subject in an Excel file column?
>
> Thanks for your answers!

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

## Re: Extracting arithmetic mean for specific values from multiple .txt-files

 Dear Mr. Holtman,

thank you for your reply.

I think I did say: "all of these reaction times per test subject", which means that I need a file with the mean of the reaction times of each file, i.e. of each test subject (file XYZ_34.txt contains subject 34's data).

There are 65 x 152 reaction times and I need 65 x mean(152 reaction times per test subject file) = 65 mean reaction times.

I have now provided an example for test subject 34 (please open in Excel): XYZ_34.txt

Kind regards

## Re: Extracting arithmetic mean for specific values from multiple .txt-files

 Hello,

Your data example has dots in the column of interest. If those values are integers, this might do it.

fun <- function(x){
        dat <- read.table(x, skip=14)
        H <- as.numeric(gsub("\\.", "", dat[, 8]))
        mean(H)
}

sapply(list.files(pattern="XYZ.*\\.txt"), fun)

Now do what you want with the result, for instance, write.table().
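
A minimal sketch of that last step, assuming the result of sapply() is a named numeric vector with one mean per file. The values, the column names `file` and `mean_rt`, and the temporary output path are made up for illustration; a tab-separated file of this kind can then be opened in Excel.

```r
# Hypothetical per-file means, standing in for the sapply() result
means <- c("XYZ_1.txt" = 345.21, "XYZ_2.txt" = 339.5)

# One row per file: file name in one column, mean in the other
out <- data.frame(file = names(means), mean_rt = unname(means))

# Tab-separated text file; tempfile() only keeps the sketch self-contained
out_path <- tempfile(fileext = ".txt")
write.table(out, out_path, sep = "\t", row.names = FALSE, quote = FALSE)

# Reading the file back gives the same table
back <- read.table(out_path, header = TRUE, sep = "\t")
print(back)
```

In real use, `out_path` would be replaced by a file name of your choice in the working directory.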

Hope this helps.

Rui Barradas

Em 09-07-2012 12:20, vimmster escreveu:
> Dear Mr. Holtman,
>
> thank you for your reply.
>
> I think I did say which mean I needed: "all of these reaction times per test
> subject. ", which means that I need a file with the mean of reaction times
> of each file / of each test subject (because file XYZ_34.txt is identical
> with subject 34's data).
>
> There are 65 x 152 reaction times and I need 65 x mean(152 reaction times
> per test subject file) = 65 mean reaction times.
>
> I have now provided an example for a test subject 34:
>
> http://r.789695.n4.nabble.com/file/n4635834/XYZ_34.txt XYZ_34.txt
>
> Kind regards

## Re: Extracting arithmetic mean for specific values from multiple .txt-files

 Dear Mr. Barradas,

your solution comes very close to what I want.

But I have two questions left:


First question: If "R" computes the mean for the reaction times of test
subject 34 (the example I provided above), it says "310112.0", but if I use
the "mean"-function in Excel it says "345.210". Apart from the dots in the
column of interest (which you mentioned before), the mean is obviously not
the same. Do you have any idea why?

Second question: Why are the dots in the column of interest problematic?

Kind regards

## Re: Extracting arithmetic mean for specific values from multiple .txt-files

 I think the real problem is the first data line:

2 1 1 3 27 0 6 1.200.995

Notice the two periods in the value.  The previous solution was getting
rid of all the periods.  If you leave out this value, you get 339.5.
If you change it to 1200.995, you get 345.21, so your data is incorrect.
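
One possible way to repair such values in R instead of editing the files by hand is to drop every period except the last one. This is only a sketch under the assumption that the last period in a value is always the decimal point; a value like "1.200" (a thousands separator and no decimals) would be misread as 1.2. The helper name `clean_number` is made up for illustration.

```r
# Remove every dot that is followed by another dot later in the string,
# so only the last dot (assumed to be the decimal point) survives
clean_number <- function(x) {
  as.numeric(gsub("\\.(?=.*\\.)", "", x, perl = TRUE))
}

# Example values: one with a thousands separator, two without
rt <- c("1.200.995", "345.21", "512")
cleaned <- clean_number(rt)
print(cleaned)
```

The lookahead `(?=.*\\.)` needs `perl = TRUE`; it matches a period only when another period follows somewhere later in the same value.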

On Mon, Jul 9, 2012 at 9:54 AM, vimmster <[hidden email]> wrote:
> Dear Mr. Barradas,
>
> your solution comes very close to what I want.
>
> But I have two questions left:
>
>
> First question: If "R" computes the mean for the reaction times of test
> subject 34 (the example I provided above), it says "310112.0", but if I use
> the "mean"-function in Excel it says "345.210". Apart from the dots in the
> column of interest (which you mentioned before), the mean is obviously not
> the same. Do you have any idea why?
>
> Second question: Why are the dots in the column of interest problematic?
>
> Kind regards

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

## Re: Extracting arithmetic mean for specific values from multiple .txt-files

 Dear Mr. Holtman,

but I can neither leave out the value nor change the values to 1200.995 manually (for each test subject with a reaction time > 1000 ms), because the former would lead to incomplete data and the latter would be too time-consuming.

Dear Rui,

here I have three files, which have exactly the same content as "XYZ_34.txt", EXCEPT that the file "XYZ_50.txt" doesn't have a period in the first value 1200.9952 IF YOU OPEN IT WITH THE EDITOR (!), maybe because I didn't change the structure with MS Excel. The other two files should be identical.

XYZ_2.txt
XYZ_50.txt
XYZ_1112.txt

R gives me the following output:

> fun <- function(x){
+     dat <- read.table(x, skip=14)
+     dat[ , 8] <- as.numeric(gsub("\\.", "", dat[, 8]))
+     mean(dat[, 8])
+ }
> sapply(list.files(pattern="XYZ.*\\.txt"), fun)
XYZ_1112.txt    XYZ_2.txt   XYZ_50.txt
    345210.4     345210.4     310112.0

Your second suggestion leads to the same output:

> fun <- function(x, skip = 14){
+     dat <- read.table(x, skip=skip)
+     dat[ , 8] <- as.numeric(gsub("\\.", "", dat[, 8]))
+     mean(dat[, 8])
+ }
> sapply(list.files(pattern="XYZ.*\\.txt"), fun)
XYZ_1112.txt    XYZ_2.txt   XYZ_50.txt
    345210.4     345210.4     310112.0

Thank you for your replies!

Kind regards

## Re: Extracting arithmetic mean for specific values from multiple .txt-files

 Dear Rui,

thank you very much. Your solution works perfectly.

One last question: I need to write a function that returns ONE value (here: a ratio) FOR EACH test subject, namely the number of correct reactions divided by the number of trials (trial or trialCount, respectively). In the following, "/" means "divided by".

I need the ratio correct (reactions)/trial or correct (reactions)/trialCount, respectively (trial and trialCount are the same WITHIN test subjects, BUT they differ in length BETWEEN test subjects!).

It would be very helpful if I had a data frame in R in the end, with one column for "trialCount"/"trial", one column for "correct reactions" (= 1) AND (more importantly) one column for "correct (= 1) answers / trialCount".

Legend (just as additional information) for the variable "correct":

1 = correct reaction

2 = false reaction

3 = reaction too slow

4 = reaction too fast

5 = more than one button pressed

6 = no reaction within RT window

I would be very thankful for an answer! Sorry for the questions, but I am doing this for the first time!

Kind regards

## Re: Extracting arithmetic mean for specific values from multiple .txt-files

 Dear Rui,

1) By "test subject" I mean each file. I posted three similar files above (2, 50 and 1112), but each test subject has exactly one file, and these files differ, of course. (2, 50 and 1112 are the same file, which I renamed for the problem described and solved above.) Within a file, vpNum is always the same (example: for test subject 44 it is always vpNum = 44). The examples above (2, 50 and 1112) are in fact all the second test subject's file (vpNum always "2").

2) By "trials" or "trialCount" I mean the number of trials (149 in your example, and 151 in the examples 2, 50 and 1112). But the number of trials differs between subjects: in the examples below (vpNum = 3, 43 and 63) it is 152 (for vpNum = 3), 150 (for vpNum = 43) and 157 (for vpNum = 63).

XYZ_3.txt
XYZ_43.txt
XYZ_63.txt

3) I mean the number of correct answers given per test subject. For example, for test subject 3 in the previous example (5 rows above) we have 152 trials, 4 of which are not correct, which means 148 correct trials (with the categorical value "1"). So in R this would be the ratio:

> 148/152
[1] 0.9736842

The wanted output should (if possible) look like this (here only for vpNum = 3 !!!):

| vpNum | trial OR trialCount | correct (reactions) | ratio (number correct / trialCount) |
|-------|---------------------|---------------------|-------------------------------------|
| 3     | 152                 | 148                 | 0.9736842                           |

The final output should look like this:

| vpNum | trial OR trialCount | correct (reactions) | ratio (number correct / trialCount) |
|-------|---------------------|---------------------|-------------------------------------|
| 1     | n                   | n                   | x                                   |
| 2     | n                   | n                   | x                                   |
| 3     | 152                 | 148                 | 0.9736842                           |

and so on until vpNum = 65. (In my question before I forgot to ask for the column "vpNum", sorry about that!)

I hope this makes it clearer! Thanks for your time and help!

Kind regards
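
A minimal sketch of how such a table could be built in R, assuming the trials of all subjects have already been read into one data frame with a column `vpNum` and a column `correct` (both names are taken from the post; the data below is made up and much shorter than the real files). How the real files map onto these columns is not shown in the thread, so the reading step is left out.

```r
# Hypothetical stand-in for the data read from the files: one row per trial
trials <- data.frame(
  vpNum   = c(3, 3, 3, 3, 43, 43, 43),
  correct = c(1, 1, 2, 1, 1, 6, 3)
)

# Per subject: number of trials, number of correct reactions (code 1), and their ratio
per_subject <- do.call(rbind, lapply(split(trials, trials$vpNum), function(d) {
  data.frame(
    vpNum      = d$vpNum[1],
    trialCount = nrow(d),
    correct    = sum(d$correct == 1),
    ratio      = mean(d$correct == 1)
  )
}))
rownames(per_subject) <- NULL
print(per_subject)
```

Because `d$correct == 1` is a logical vector, its mean is the share of correct trials, so every code other than 1 counts as not correct.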