ETA: Oh gosh, I think I might have posted this in the wrong sub-thread. I'm new to the mailing-list type site, and was a little confused. Apologies! I've (hopefully!) fixed it now.

Hi everyone,

I am very, very new to R, and I'm trying to work out the p-values for thousands of Spearman correlation coefficients.

Essentially, I have imported a large dataset from a CSV file (366 obs. of 73775 variables) into RStudio. The columns are words, the rows are dates, and each cell holds the relative frequency of that word on that date. What I am trying to find out is whether the frequency of any or all of the words increases significantly over the course of the year.
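One way to make "increases significantly" concrete (an assumption about the intended analysis, not the only option): for each word separately, take its relative frequencies f on days 1, ..., n (n = 366 here, with the rows assumed to be in date order), compute Spearman's rho as the Pearson correlation between the ranks of f (tied values get average ranks) and the day index, and test the null hypothesis rho = 0. When there are ties, or when exact = FALSE is set, R's cor.test(..., method = "spearman") computes the two-sided p-value from the t approximation below, where T has a t distribution with n - 2 degrees of freedom:

```latex
\rho = \operatorname{cor}\bigl(\operatorname{rank}(f),\ (1, 2, \dots, n)\bigr), \qquad
t = \rho \sqrt{\frac{n-2}{1-\rho^{2}}}, \qquad
p = 2\,\Pr\bigl(T_{n-2} \ge |t|\bigr)
```

A positive, significant rho indicates that a word tends to become more frequent over the year, though not necessarily at a steady (linear) rate.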

After some trial and error (and a lot of Googling!), I have some code that successfully stores the Spearman correlation values in a matrix:

x <- my_data[1:73775]
y <- my_data[1]
corrs3 <- round(cor(x, y, method = "spearman", use = "complete.obs"), 3)

This code produces a matrix with one row per word (the words are the row names) and the corresponding Spearman value in its column. What I need to do now is calculate the corresponding p-value for each variable. I have been able to do this for individual variables by running the following code (although I get the warning "Cannot compute exact p-value with ties", which I've been told isn't a major problem?):

cor.test(1:366, my_data$romcom, method = "spearman")  # 1:366 = one time index per date (row)
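A minimal sketch for a single word, using a small simulated data frame in place of my_data (the dates, the column name and the values are made up for illustration). The time index needs one value per row, i.e. per date, so it has the same length as the word's column. Setting exact = FALSE asks cor.test() for the t-approximation p-value up front; that is the same calculation it falls back to, with the warning, when the data contain ties, so the p-value itself should not change.

```r
# Hypothetical stand-in for my_data: one row per date, one column per word.
set.seed(1)
n_days <- 366
toy <- data.frame(
  date   = seq(as.Date("2020-01-01"), by = "day", length.out = n_days),
  romcom = round(runif(n_days), 2)  # rounding to 2 decimals guarantees ties
)

# One time index per row (date), not per column.
day_index <- seq_len(nrow(toy))

# exact = FALSE requests the t-approximation p-value directly, so the
# "Cannot compute exact p-value with ties" warning is not raised.
res <- cor.test(day_index, toy$romcom, method = "spearman", exact = FALSE)

res$estimate  # Spearman's rho
res$p.value   # two-sided p-value
```

If only increases are of interest, cor.test() also accepts alternative = "greater" for a one-sided test.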

Ideally, though, I would like to store each p-value next to its Spearman value in the matrix, if that is possible.
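A minimal sketch of keeping rho and p side by side, on simulated data standing in for my_data (the word names, sizes and values are hypothetical, and the rows are assumed to be in date order with no missing values). The first version runs cor.test() once per column, which is simple but may take a while for 73,775 columns; the second computes all correlations with a single cor() call and then applies the same t approximation that cor.test(..., exact = FALSE) uses, so the two should agree.

```r
# Hypothetical stand-in for my_data: 366 rows (dates) x 3 word columns.
set.seed(42)
n_days <- 366
words  <- c("romcom", "sequel", "reboot")  # made-up word columns
toy <- as.data.frame(matrix(round(runif(n_days * length(words)), 2),
                            ncol = length(words),
                            dimnames = list(NULL, words)))
day_index <- seq_len(n_days)  # assumes rows are sorted by date

# 1) Clear but slower: one cor.test() per word, keeping rho and p together.
tests <- lapply(toy, function(freq)
  cor.test(day_index, freq, method = "spearman", exact = FALSE))
results <- data.frame(
  word = names(tests),
  rho  = vapply(tests, function(tt) unname(tt$estimate), numeric(1)),
  p    = vapply(tests, function(tt) tt$p.value, numeric(1)),
  row.names = NULL
)
results

# 2) Vectorized: one cor() call for all columns, then the t approximation
#    used by cor.test(..., exact = FALSE). Assumes no missing values.
rho_all <- cor(as.matrix(toy), day_index, method = "spearman")[, 1]
t_stat  <- rho_all * sqrt((n_days - 2) / (1 - rho_all^2))
p_all   <- 2 * pt(-abs(t_stat), df = n_days - 2)

results_fast <- data.frame(word = names(rho_all), rho = rho_all, p = p_all,
                           row.names = NULL)
results_fast
```

Words whose frequency never changes (for example, a column of all zeros) have no defined correlation: cor() warns and returns NA for them, and cor.test() returns NA as well. With tens of thousands of tests, some p-values will be small by chance alone, so an adjustment such as p.adjust(results$p, method = "BH") may be worth considering.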

The consensus seems to be that Hmisc is the ideal tool for this kind of thing, so I installed that package and have been trying to run it as follows:

# Flatten a correlation matrix and its p-value matrix into a long data frame,
# one row per pair of variables (upper triangle only, so no duplicate pairs)
flattenCorrMatrix <- function(cormat, pmat) {
  ut <- upper.tri(cormat)
  data.frame(
    row    = rownames(cormat)[row(cormat)[ut]],
    column = rownames(cormat)[col(cormat)[ut]],
    cor    = cormat[ut],
    p      = pmat[ut]
  )
}

x <- my_data[1:73775]
y <- my_data[1]
library(Hmisc)
res2 <- rcorr(as.matrix(my_data[x, y]))
flattenCorrMatrix(res2$r, res2$P)

However, I get the error message "Unsupported index type: tbl_df", and I'm unsure how to fix it.
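A possible reading of the "Unsupported index type: tbl_df" error, hedged: x and y are not column numbers but data frames themselves, because single-bracket indexing such as my_data[1] returns a one-column data frame (a tibble, i.e. tbl_df, presumably because the CSV was read with readr or a similar package). my_data[x, y] then passes whole tibbles as the row and column indices, which tibble refuses. The sketch below uses a small hypothetical base data.frame to show the difference between [ and [[ and one way to get a plain numeric matrix by position; it assumes the first column holds the dates. Note also that, as far as I know, rcorr() correlates every column with every other column, so with 73,775 columns it would likely run into the same memory limit as the psych attempt.

```r
# Hypothetical toy data: a date column followed by word-frequency columns.
toy <- data.frame(
  date   = as.Date("2020-01-01") + 0:4,
  romcom = c(0.10, 0.20, 0.20, 0.40, 0.50),
  sequel = c(0.30, 0.10, 0.20, 0.20, 0.00)
)

one_col <- toy[1]    # single brackets: still a data frame (one column)
vec     <- toy[[2]]  # double brackets: the column itself, a plain vector

# Index columns by position (here: everything except the assumed date
# column), then convert to a numeric matrix.
word_mat <- as.matrix(toy[, -1])
dim(word_mat)  # 5 rows (dates) x 2 columns (words)
```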

I've also tried bypassing Hmisc and using the following:

x <- my_data[1:73775]
y <- my_data[1]
corrs3 <- round(cor.test(x, y, method = "spearman", use = "complete.obs"), 3)

But this returns the error message:

Error in cor.test.default(x, y, method = "spearman", use = "complete.obs") :
  'x' and 'y' must have the same length
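A small illustration of where the length mismatch probably comes from, with made-up data: cor() accepts matrices and data frames, but cor.test() expects two numeric vectors of equal length, and length() of a data frame counts its columns. So x has "length" 73,775 and y has "length" 1. Separately, cor.test() returns a list-like "htest" object, so round() has to be applied to pieces such as $estimate or $p.value rather than to the whole result.

```r
# Made-up frequencies for two words on four dates.
toy <- data.frame(romcom = c(0.1, 0.4, 0.2, 0.5),
                  sequel = c(0.3, 0.1, 0.2, 0.0))

length(toy)       # 2: a data frame's length is its number of columns
length(toy[1])    # 1: single brackets give a one-column data frame
length(toy[[1]])  # 4: double brackets give the column vector, one value per row

# cor.test() wants two equal-length vectors; round the pieces, not the result.
day_index <- seq_len(nrow(toy))
res <- cor.test(day_index, toy[[1]], method = "spearman", exact = FALSE)
round(res$estimate, 3)  # rho = 0.8
round(res$p.value, 3)
```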

More Googling suggested that the corr.test() function from the psych package would be better. However, when I use the following code:

x <- my_data[1:73775]
y <- my_data[1]
library("psych")
corr.test(x, y = NULL, use = "pairwise", method = "spearman", ci = TRUE)

I get the following error message:

Error: cannot allocate vector of size 40.6 Gb
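The size in that message can be reproduced with a little arithmetic, which suggests what is going on: with y = NULL, corr.test() correlates every column of x with every other column, so even the correlation matrix alone is 73,775 x 73,775 doubles at 8 bytes each (R reports the size in units of 1024^3 bytes). Correlating each word only against a time index needs one value per word instead; passing that index as y rather than y = NULL should restrict the computation to word-versus-time correlations, though that is an untested assumption here.

```r
# Full word-by-word matrix: n_words^2 doubles of 8 bytes each, in GiB.
n_words  <- 73775
full_gib <- n_words^2 * 8 / 1024^3
round(full_gib, 1)     # 40.6, matching "cannot allocate vector of size 40.6 Gb"

# One correlation per word (against time only), in MiB.
one_col_mib <- n_words * 8 / 1024^2
round(one_col_mib, 2)  # 0.56
```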

I'm really out of options now, and I would be very grateful for any suggestions!

Thanks!

--
Sent from: http://r.789695.n4.nabble.com/datatable-help-f2315188.html
_______________________________________________
datatable-help mailing list
datatable-help@lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help