Problem(s) finding p-values for numerous spearman correlations

Izzy_M
ETA: Oh gosh, I think I might have posted this in the wrong sub-thread. I'm new to the mailing-list type site, and was a little confused. Apologies! I've (hopefully!) fixed it now.

Hi everyone,

I am very new to R, and I'm trying to work out the p-values for
thousands of Spearman correlation coefficients.

I have imported a large dataset from a CSV file (366 obs. of
73775 variables) into RStudio. Each column is a word, each row is a date,
and each value is the relative frequency of that word on that particular
date. I am trying to see whether the frequency of any of the given words
increases significantly over the course of a year.

After some trial and error (and a lot of Googling!), I have some code that
successfully stores the Spearman correlation values in a matrix:

x <- my_data[1:73775]
y <- my_data[1]
corrs3 <- round(cor(x, y, method = "spearman", use="complete.obs"), 3)

This code stores the words in one column of the matrix and their Spearman
value in the second column. However, what I need to do now is to calculate
the corresponding p-value for each of the variables. I have been able to
do this for individual variables by running the following code (although I do
get a warning saying "Cannot compute exact p-value with ties", but I've been
told that this isn't a major problem?):

cor.test(seq_len(nrow(my_data)), my_data$romcom, method = "spearman")


However, what I would ideally like to do is store the p-value next to the
Spearman value in the matrix (if that is possible).
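
One way this could be approached with base R alone is to call cor.test() on one column at a time and collect the estimate and the p-value from each result. The sketch below is only an illustration under assumptions: `toy`, `day`, and `spearman_one` are made-up names, the toy data frame stands in for the real 366 x 73775 table, every column is assumed to be numeric, and the rows are assumed to be in date order. Setting exact = FALSE requests the asymptotic p-value, which is what cor.test() falls back to when there are ties.

```r
# Toy stand-in for my_data: 366 daily rows, three word-frequency columns.
set.seed(1)
n_days <- 366
toy <- data.frame(
  romcom   = runif(n_days) + seq_len(n_days) / n_days,  # drifts upward
  thriller = runif(n_days),
  western  = runif(n_days)
)

# Time index with one value per row (i.e. per date).
day <- seq_len(nrow(toy))

# Run cor.test() on a single column; return rho and its p-value.
spearman_one <- function(freq) {
  ct <- cor.test(day, freq, method = "spearman", exact = FALSE)
  c(rho = unname(ct$estimate), p = ct$p.value)
}

# Apply to every column, then reshape to one row per word.
res <- t(vapply(toy, spearman_one, c(rho = 0, p = 0)))
res <- data.frame(word = rownames(res), res, row.names = NULL)
res
```

Because only one column is handled at a time, nothing larger than the original data has to be held in memory, so the same loop might scale to all 73775 columns, although it would take a while to run.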

The consensus seems to be that Hmisc is the ideal tool for this kind of
thing, so I installed that package, and I've been attempting to run it as
follows:

flattenCorrMatrix <- function(cormat, pmat) {
  ut <- upper.tri(cormat)
  data.frame(
    row = rownames(cormat)[row(cormat)[ut]],
    column = rownames(cormat)[col(cormat)[ut]],
    cor  =(cormat)[ut],
    p = pmat[ut]
    )
}
x <- my_data[1:73775]
y <- my_data[1]
library(Hmisc)
res2<-rcorr(as.matrix(my_data[x,y]))
flattenCorrMatrix(res2$r, res2$P)



However, I get an error message, stating:

"Unsupported index type: tbl_df".
And I'm unsure how to fix this.

I've also tried bypassing Hmisc and using the following:


x <- my_data[1:73775]
y <- my_data[1]
corrs3 <- round(cor.test(x, y, method = "spearman", use="complete.obs"), 3)

But this returns the error message:

Error in cor.test.default(x, y, method = "spearman", use = "complete.obs") :
  'x' and 'y' must have the same length
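
A likely cause, offered tentatively: cor.test() compares two plain numeric vectors, whereas `x` and `y` here are data frames, and length() of a data frame counts its columns (73775 versus 1). The sketch below illustrates the difference on a made-up data frame called `toy`; the names and values are hypothetical and are not the real data.

```r
# Hypothetical four-row data frame standing in for my_data.
toy <- data.frame(a = c(1, 2, 3, 4), b = c(2, 1, 4, 3), c = c(4, 3, 2, 1))

x <- toy[1:3]  # still a data frame, with 3 columns
y <- toy[1]    # still a data frame, with 1 column

length(x)      # counts columns, not rows
length(y)

# cor.test() is meant for two numeric vectors of equal length,
# so pull out single columns with $ or [[ ]]:
ct <- cor.test(toy$a, toy$b, method = "spearman")
ct$estimate
ct$p.value
```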


More Googling suggested that the "corr.test" function from the psych package
would be better. However, when I use the following code:

x <- my_data[1:73775]
y <- my_data[1]
library("psych")
corr.test(x, y = NULL, use = "pairwise", method="spearman", ci=TRUE)


I get the following error message:

Error: cannot allocate vector of size 40.6 Gb
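
The size in that message is consistent with a full 73775 x 73775 matrix of double-precision numbers, which is what a word-by-word correlation matrix would require. A rough check of the arithmetic, assuming 8 bytes per value and that R reports sizes in units of 1024^3 bytes (the variable names are made up for illustration):

```r
n_words <- 73775

# Full word-by-word matrix: one 8-byte double per cell.
bytes_full <- n_words^2 * 8
gib_full <- bytes_full / 1024^3
round(gib_full, 1)   # 40.6

# A single column of correlations (each word against one variable).
bytes_one <- n_words * 8
bytes_one / 1024^2   # well under 1 MiB
```

So correlating every word with every other word is what exhausts memory, while correlating each word against a single variable needs only 73775 values.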

I'm really out of options now, and I would really appreciate any
suggestions!

Thanks!




--
Sent from: http://r.789695.n4.nabble.com/datatable-help-f2315188.html
_______________________________________________
datatable-help mailing list
datatable-help@lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help