
Hello R List,
We are migrating from SPSS to R. In SPSS we currently use stepwise discriminant analysis based on Wilks' lambda. The closest equivalent in R seems to be greedy.wilks in the klaR package. Its output is what we expect: a lambda value and a p-value for each variable.
The problem is that our data set contains 325 variables and between 20,000 and 80,000 rows. When we run greedy.wilks on the data set with 20k records and 325 variables, we get the following error:
Error: cannot allocate vector of size 2.7 Gb
In addition: Warning messages:
1: In matrix(0, nrow = n, ncol = n) :
Reached total allocation of 3162Mb: see help(memory.size)
2: In matrix(0, nrow = n, ncol = n) :
Reached total allocation of 3162Mb: see help(memory.size)
3: In matrix(0, nrow = n, ncol = n) :
Reached total allocation of 3162Mb: see help(memory.size)
4: In matrix(0, nrow = n, ncol = n) :
Reached total allocation of 3162Mb: see help(memory.size)
traceback() shows:
6: matrix(0, nrow = n, ncol = n)
5: Lambda(matrix(X.mod), grouping)
4: greedy.wilks.default(x, grouping, ...)
3: greedy.wilks(x, grouping, ...)
A web search suggests that the problem is memory allocation. We also tried a machine with more memory (6 GB) and still get the same message.
R Version: 3.0.0
OS: Windows 7 with 4 GB memory
The other machine we tried:
R version: 3.0.1
OS: Windows Server 2008 R2 with 6GB memory.
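As a rough sanity check on the memory question, here is a back-of-the-envelope sketch. It assumes that n in the matrix(0, nrow = n, ncol = n) call from the traceback is the number of rows in our data, and that each element is an 8-byte double; we have not confirmed either from the klaR source. The helper name square_matrix_gib is our own.

```r
# Size in GiB of an n x n matrix of doubles (assumed 8 bytes per element).
# Assumption: n is the number of records passed to greedy.wilks.
square_matrix_gib <- function(n) {
  n^2 * 8 / 1024^3
}

sizes <- sapply(c(20000, 80000), square_matrix_gib)
names(sizes) <- c("20k rows", "80k rows")
print(round(sizes, 1))  # roughly 3 GiB and 48 GiB
```

If that assumption holds, a single such matrix for 20k rows is already about the size reported in the error, and for 80k rows it would be far beyond either machine.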
1. Is the error really due to memory? If so, can greedy.wilks scale to 80k+ records and 325 variables without failing with the same error?
2. Is there any other way to get the same result, i.e. a lambda value and a p-value for each variable?
I am new to R, so I apologize for any mistakes.
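To make question 2 concrete, here is a minimal sketch of the kind of per-variable output we mean. It only covers the univariate case, where Wilks' lambda for one variable is the within-group sum of squares divided by the total sum of squares and the p-value comes from the usual one-way F test. It is not the stepwise procedure that greedy.wilks or SPSS performs. The function name wilks_univariate is our own, and the built-in iris data stands in for our data.

```r
# Univariate Wilks' lambda for one variable: SS_within / SS_total,
# with the p-value from the equivalent one-way ANOVA F test.
# This is an illustration only, not the stepwise selection itself.
wilks_univariate <- function(x, grouping) {
  grouping <- droplevels(as.factor(grouping))
  n <- length(x)
  g <- nlevels(grouping)
  ss_total <- sum((x - mean(x))^2)
  ss_within <- sum(tapply(x, grouping, function(v) sum((v - mean(v))^2)))
  lambda <- ss_within / ss_total
  f_stat <- ((1 - lambda) / lambda) * ((n - g) / (g - 1))
  p_value <- pf(f_stat, g - 1, n - g, lower.tail = FALSE)
  c(lambda = lambda, F = f_stat, p.value = p_value)
}

# One row per variable; iris is a stand-in for the real data set.
res <- t(sapply(iris[, 1:4], wilks_univariate, grouping = iris$Species))
print(res)
```

This needs only one column at a time, so memory use grows with the number of rows rather than with its square.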
Thanks,
Hari
