ETA: Oh gosh, I think I might have posted this in the wrong sub-thread. I'm new to the mailing-list type site, and was a little confused. Apologies! I've (hopefully!) fixed it now.
Hi everyone,

I am very new to R, and I'm trying to work out the p-values for thousands of Spearman correlation scores. I have imported a large dataset from a CSV file (366 obs. of 73775 variables) into RStudio. Along the x-axis I have a series of words, the y-axis contains dates, and the data are the relative frequencies of each word on that particular date. I am trying to see whether the frequency of any or all of the given words increases significantly over the course of a year.

After some trial and error (and a lot of Googling!), I have code which successfully stores the Spearman correlation values in a matrix:

    x <- my_data[1:73775]
    y <- my_data[1]
    corrs3 <- round(cor(x, y, method = "spearman", use="complete.obs"), 3)

This code stores the words in one column of the matrix and their Spearman value in the second column.

What I need to do now is calculate the corresponding p-value for each of the variables. I have been able to do this for individual variables by running the following code (although I do get a warning saying "Cannot compute exact p-value with ties", but I've been told that this isn't a major problem?):

    cor.test(1:73775, my_data$romcom, method = "spearman")

Ideally, I would like to store the p-value next to the Spearman value in the matrix (if that is possible). The consensus seems to be that Hmisc is the ideal tool for this kind of thing, so I installed that library, and I've been attempting to run it as follows:

    flattenCorrMatrix <- function(cormat, pmat) {
      ut <- upper.tri(cormat)
      data.frame(
        row = rownames(cormat)[row(cormat)[ut]],
        column = rownames(cormat)[col(cormat)[ut]],
        cor = (cormat)[ut],
        p = pmat[ut]
      )
    }
    x <- my_data[1:73775]
    y <- my_data[1]
    library(Hmisc)
    res2 <- rcorr(as.matrix(my_data[x,y]))
    flattenCorrMatrix(res2$r, res2$P)

However, I get an error message stating "Unsupported index type: tbl_df", and I'm unsure how to fix this.
I've also tried bypassing Hmisc and using the following:

    x <- my_data[1:73775]
    y <- my_data[1]
    corrs3 <- round(cor.test(x, y, method = "spearman", use="complete.obs"), 3)

But this returns the error message:

    Error in cor.test.default(x, y, method = "spearman", use = "complete.obs") :
      'x' and 'y' must have the same length

More Googling suggested that the "corr.test" function from the psych library would be better. However, when I use the following code:

    x <- my_data[1:73775]
    y <- my_data[1]
    library("psych")
    corr.test(x, y = NULL, use = "pairwise", method="spearman", ci=TRUE)

I get the following error message:

    Error: cannot allocate vector of size 40.6 Gb

I'm really out of options now, and I would really appreciate any suggestions!

Thanks!
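
For what it's worth, here is a minimal sketch of one possible way to get a p-value per word without building the full word-by-word matrix. It rests on two observations about the attempts above: cor.test() expects two numeric vectors of equal length rather than data frames, and rcorr()/corr.test() on all columns compute every pairwise correlation (a 73775 x 73775 matrix of doubles is roughly 40.6 Gb, which matches the allocation error). The sketch instead runs cor.test() once per column against a time index. Everything named here is an assumption for illustration: `toy` is simulated stand-in data, `day` is a hypothetical 1..366 index, `spearman_one` is a made-up helper, and the Benjamini-Hochberg adjustment via p.adjust() is an extra step not mentioned in the post, included only because so many tests are run at once.

```r
# Toy stand-in for my_data: 366 rows (dates) by a few word-frequency columns.
# (Simulated data, purely for illustration.)
set.seed(1)
n_days <- 366
toy <- data.frame(
  romcom = seq_len(n_days) / n_days + rnorm(n_days, sd = 0.1),  # rising trend
  noise1 = rnorm(n_days),
  noise2 = rnorm(n_days)
)
day <- seq_len(n_days)  # hypothetical time index, same length as each column

# Hypothetical helper: one cor.test() per column, keeping rho and the p-value.
# exact = FALSE requests the asymptotic p-value, which also avoids the
# "Cannot compute exact p-value with ties" warning.
spearman_one <- function(freq) {
  ct <- cor.test(day, freq, method = "spearman", exact = FALSE)
  c(rho = unname(ct$estimate), p = ct$p.value)
}

# vapply() loops over the columns of the data frame; t() puts words in rows.
res <- t(vapply(toy, spearman_one, c(rho = 0, p = 0)))
res <- data.frame(
  word = rownames(res),
  rho  = round(res[, "rho"], 3),
  p    = res[, "p"],
  row.names = NULL
)

# Assumption: adjust for multiple testing (Benjamini-Hochberg).
res$p_adj <- p.adjust(res$p, method = "BH")
print(res)
```

On the real data, the same pattern would presumably become something like `vapply(my_data[-1], spearman_one, c(rho = 0, p = 0))` with `day <- seq_len(nrow(my_data))`, assuming the first column holds the dates and the remaining columns are numeric; that has not been checked against the actual file.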