What is an alternative to expand.grid if create a long vector?


What is an alternative to expand.grid if create a long vector?

Shah Alam
Dear All,

I would like to know whether there is a problem in the *expand.grid* function
or whether this is a limitation of the function.

I am trying to create all combinations of elements using the expand.grid function.

A <- expand.grid(
c(seq(0.001, 0.1, length.out = 100)),
c(seq(0.0001, 0.001, length.out = 100)),
c(seq(0.38, 0.42, length.out = 100)),
c(seq(0.12, 0.18, length.out = 100)))

Combining four vectors works fine. However, if I increase the number of
vectors up to ten, the following error appears:

 A <- expand.grid(
c(seq(0.001, 1, length.out = 100)),
c(seq(0.0001, 0.001, length.out = 100)),
c(seq(0.38, 0.42, length.out = 100)),
c(seq(0.12, 0.18, length.out = 100)),
c(seq(0.01, 0.04, length.out = 100)),
c(seq(0.0001, 0.001, length.out = 100)),
c(seq(0.0001, 0.001, length.out = 100)),
c(seq(0.001, 0.01, length.out = 100)),
c(seq(0.01, 0.3, length.out = 100))
)

*Error in rep.int(rep.int(seq_len(nx), rep.int(rep.fac, nx)), orep) :
  invalid 'times' value*

After reducing the lengths to at most 10, it produced a different error:

A <- expand.grid(
c(seq(0.001, 0.005, length.out = 10)),
c(seq(0.0001, 0.0005, length.out = 10)),
c(seq(0.38, 0.42, length.out = 5)),
c(seq(0.12, 0.18, length.out = 7)),
c(seq(0.01, 0.04, length.out = 5)),
c(seq(0.0001, 0.001, length.out = 10)),
c(seq(0.0001, 0.001, length.out = 10)),
c(seq(0.001, 0.01, length.out = 10)),
c(seq(0.1, 0.8, length.out = 8))
)

*Error: cannot allocate vector of size 1.0 Gb*

What is an alternative to expand.grid for creating a long vector based on 10
elements?

With kind regards,
Shah Alam

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: What is an alternative to expand.grid if create a long vector?

PIKAL Petr
Hi

Actually, expand.grid produces a data frame, not a vector, and the dimensions
of the data frame are "big":

> dim(A)
[1] 100000000         4
> str(A)
'data.frame':   100000000 obs. of  4 variables:
 $ Var1: num  0.001 0.002 0.003 0.004 0.005 0.006 0.007 0.008 0.009 0.01 ...
 $ Var2: num  1e-04 1e-04 1e-04 1e-04 1e-04 1e-04 1e-04 1e-04 1e-04 1e-04 ...
 $ Var3: num  0.38 0.38 0.38 0.38 0.38 0.38 0.38 0.38 0.38 0.38 ...
 $ Var4: num  0.12 0.12 0.12 0.12 0.12 0.12 0.12 0.12 0.12 0.12 ...
 - attr(*, "out.attrs")=List of 2
  ..$ dim     : int [1:4] 100 100 100 100
  ..$ dimnames:List of 4
  .. ..$ Var1: chr [1:100] "Var1=0.001" "Var1=0.002" "Var1=0.003" "Var1=0.004" ...
  .. ..$ Var2: chr [1:100] "Var2=0.0001000000" "Var2=0.0001090909" "Var2=0.0001181818" "Var2=0.0001272727" ...
  .. ..$ Var3: chr [1:100] "Var3=0.3800000" "Var3=0.3804040" "Var3=0.3808081" "Var3=0.3812121" ...
  .. ..$ Var4: chr [1:100] "Var4=0.1200000" "Var4=0.1206061" "Var4=0.1212121" "Var4=0.1218182" ...

With 4 sequences you get 1e8 rows and 4 columns.
With 10 sequences of length 100 you would get 1e20 rows and 10 columns.
Your last example gives 1.4e8 rows and 10 columns, which probably exceeds the
memory capacity of your PC.

Maybe you could increase the memory of your PC. If I am correct, storing the
first needs about 3.2 GB and storing the last about 11.2 GB.
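
The arithmetic behind these estimates can be checked without building any grid. The following is a minimal sketch, assuming 8 bytes per double and ignoring data frame overhead; `grid_size` is a hypothetical helper, not a base R function. Note that the last call in the original post lists nine vectors, so the sketch reports 9 columns and roughly 10.1 GB rather than the 11.2 GB quoted above for ten columns.

```r
# Estimate the size of a full grid from the vector lengths alone.
# `grid_size` is a hypothetical helper, not part of base R.
grid_size <- function(lengths, bytes_per_value = 8) {
  n_rows <- prod(as.numeric(lengths))   # doubles avoid integer overflow
  n_cols <- length(lengths)
  list(rows = n_rows,
       cols = n_cols,
       gigabytes = n_rows * n_cols * bytes_per_value / 1e9)
}

four  <- grid_size(rep(100, 4))                         # first example
small <- grid_size(c(10, 10, 5, 7, 5, 10, 10, 10, 8))   # last example as posted
```

Running such a check before calling expand.grid shows whether the result has any chance of fitting into memory.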

May I ask what you want to do with such a big object?

Cheers
Petr

> -----Original Message-----
> From: R-help <[hidden email]> On Behalf Of Shah Alam
> Sent: Monday, April 19, 2021 2:36 PM
> To: r-help mailing list <[hidden email]>
> Subject: [R] What is an alternative to expand.grid if create a long vector?
>

Re: What is an alternative to expand.grid if create a long vector?

Rui Barradas
In reply to this post by Shah Alam
Hello,

If you want to process the data by rows, then maybe you should consider
a custom function that divides the problem into small chunks and processes
one chunk at a time.

But even so, at 8 bytes per double, a single column of 100^10 rows takes

(100^10*8)/(1024^4)  # terabytes
#[1] 727595761

It would take you a very, very long time to process.

Revise the problem?
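
The chunking idea can be sketched as follows, assuming the grid is small enough that its row count is still an exactly representable number. `grid_rows` is a hypothetical helper that decodes row numbers into combinations in the same order as expand.grid (first variable varying fastest), so only one chunk exists in memory at a time. The parameter names and the objective are placeholders.

```r
# Decode linear row indices into parameter combinations, so the grid can be
# processed in chunks instead of being materialised at once.
# `grid_rows` is a hypothetical helper, not part of base R.
grid_rows <- function(params, idx) {
  lens <- lengths(params)
  stride <- c(1, cumprod(lens)[-length(lens)])   # block size for each variable
  cols <- Map(function(values, n, s) values[((idx - 1) %/% s) %% n + 1],
              params, lens, stride)
  as.data.frame(cols)
}

params <- list(p1 = seq(0.001, 0.005, length.out = 10),
               p2 = seq(0.38, 0.42, length.out = 5),
               p3 = seq(0.12, 0.18, length.out = 7))
n_total <- prod(lengths(params))     # 350 rows in this toy example
chunk_size <- 100

best <- Inf
for (start in seq(1, n_total, by = chunk_size)) {
  idx <- start:min(start + chunk_size - 1, n_total)
  chunk <- grid_rows(params, idx)
  # Placeholder objective: replace with the real model evaluation.
  score <- with(chunk, (p1 - 0.003)^2 + (p2 - 0.40)^2 + (p3 - 0.15)^2)
  best <- min(best, score)
}
```

This keeps memory use bounded by the chunk size, but it does nothing about the total number of rows that must be visited.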

Hope this helps,

Rui Barradas


Re: What is an alternative to expand.grid if create a long vector?

R help mailing list-2
Just some thoughts I am considering about the issue of how to make giant objects in memory without making them giant or all in memory.

As stupid as this sounds, when things get really big, it can mean not only processing your data in smaller amounts but using other techniques than asking expand.grid to create all possible combinations in advance.

Some languages like Python allow generators that yield one item at a time and are called until exhausted, which sounds more like your usage. A single function remains resident in memory and each time it is called it uses the resident values in a calculation and returns the next. That approach may not work well with the way expand.grid works.

So a less efficient way would be to write your own deeply nested loop that generates one set of ten or so variables each time through the deepest nested loop that you can use one at a time. Alternatively, you can use such a loop to write a line at a time in something like a .CSV format and later read N lines at a time from the file or even have multiple programs work in parallel by taking their own allocations after ignoring the lines not meant for them, or some other method.
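
A generator-like approach can be approximated in R with a closure. This is a minimal sketch, not an existing R facility: `make_grid_iterator` is a hypothetical name, and the function it returns yields one combination per call, in the same order as expand.grid, and NULL once all combinations have been produced.

```r
# A closure that behaves like a generator over all combinations.
# `make_grid_iterator` is a hypothetical name, not an existing R function.
make_grid_iterator <- function(params) {
  lens <- lengths(params)
  pos <- rep(1L, length(params))   # current index into each vector
  done <- any(lens == 0)
  function() {
    if (done) return(NULL)
    current <- mapply(function(values, i) values[i], params, pos)
    # Advance like an odometer: bump the first counter, carrying as needed.
    k <- 1
    while (k <= length(pos)) {
      if (pos[k] < lens[k]) { pos[k] <<- pos[k] + 1L; break }
      pos[k] <<- 1L
      k <- k + 1
    }
    if (k > length(pos)) done <<- TRUE   # every counter wrapped: last item
    current
  }
}

next_combo <- make_grid_iterator(list(a = c(0.1, 0.2), b = c(10, 20, 30)))
n_seen <- 0
while (!is.null(combo <- next_combo())) {
  n_seen <- n_seen + 1      # evaluate the model on `combo` here
}
```

Only the current position is stored, so memory use is constant, at the cost of one R function call per combination.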

Deeply nested loops in R tend to be slow, as I have found out, which is indeed why I switched to using pmap() on a data.frame made using expand.grid first. But if your needs are exorbitant and you have limited memory, ....

Can you squeeze some memory out of your design? Your data seems highly repetitive and if you really want to store something like this in a column:
        c(seq(0.001, 1, length.out = 100))

The size of that, for comparison, is:

object.size(seq(0.001, 1, length.out = 100))
848 bytes

So it is 8 bytes per number plus some overhead.

Then consider storing something like that another way. First, the c() wrapper around the above is redundant, albeit harmless. Why not store this:
        1L:100L

object.size(1L:100L)
448 bytes

So, four bytes per number plus some overhead.

That stores integers between 1 and 100 and in your case that means that later you can divide by a thousand or so to get the number you want each time but not store a full double-precision number.

And if you use factors, it may take less space. I note that some of your other vectors use different starting and ending points, but in all cases you ask seq() for 100 equally spaced values. That is fine, but you could simply record a factor with umpteen specific values, as either doubles or integers, and if expand.grid honors that, it would use less space in any final output. My experiments (not shown here) suggest you can easily cut sizes in half, and perhaps more, with judicious usage.
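
The saving from storing integer positions instead of doubles can be illustrated on a small grid. This is a sketch under the assumption that the real values are looked up from the original vectors when needed; the vector lengths are reduced to 20 so the example runs quickly, and `row_values` is a hypothetical helper.

```r
# Store integer indices in the grid and map them back to values on demand.
values <- list(seq(0.001, 0.1, length.out = 20),
               seq(0.38, 0.42, length.out = 20),
               seq(0.12, 0.18, length.out = 20))

grid_dbl <- expand.grid(values, KEEP.OUT.ATTRS = FALSE)
grid_int <- expand.grid(lapply(values, seq_along), KEEP.OUT.ATTRS = FALSE)

size_dbl <- as.numeric(object.size(grid_dbl))   # 8 bytes per cell plus overhead
size_int <- as.numeric(object.size(grid_int))   # 4 bytes per cell plus overhead

# Recover the real parameter values for row i of the integer grid.
row_values <- function(i) {
  mapply(function(v, j) v[j], values, unlist(grid_int[i, ]))
}
```

Halving the memory helps at the margin, but it does not change the number of rows, so it cannot rescue a grid of 100^10 combinations.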

Perhaps finding or writing a more efficient loop in a C or C++ function would allow a way to loop through all possibilities more efficiently and provide a function for it to call on each iteration. Depending on your need, that can do a calculation using local variables and perhaps add a line to an output file, or add another set of values to a vector or other data structure that gets returned at the end of processing.

One possibility to consider is using an on-line resource, perhaps paying a fee, that will run your R program for you in an environment with more allowed resources like memory:

 https://rstudio.cloud/

Some of the professional options allow 8 GB of memory and perhaps 4 CPUs. You can, of course, configure your own machine to have more memory or perhaps allocate lots more swap space and allow your process to abuse it.

There are many possible solutions but also consider if the sizes and amounts you are working on are realistic. I worked on a project a while ago where I generated a huge amount of instances with 500 iterations per instance and was asked to bump that up to 10,000 per instance (20 times as much) just to show the results were similar and that 500 had been enough. It ran for DAYS and luckily the rest of the project went back to more manageable numbers.

So, back to your scenario, I wonder if the regularity of your data would allow interesting games to be played. Imagine smaller combinations of, say, 10 levels each, and for each row in the resulting data.frame, expand that out again, so that the numbers 2, 3, 4 (using just three for illustration) become (20:29, 30:39, 40:49), which are given to expand.grid to make a smaller, local, one-use expansion table. Your original giant problem is converted to making a modest table in which each row expands to a second modest table that is used, immediately discarded and replaced by a similar table. So for ten variables, instead of making 100^10 variations all at once, you might make 10^10 variations, iterate on the rows of that, make another table of size 10^10 for each row, do your processing on each row of that, and then remove that table and replace it until done. In theory, you can repeat this in additional stages and cut memory use sharply, albeit perhaps increasing CPU usage substantially.
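
The two-stage expansion can be sketched on a reduced problem. The assumptions here are illustrative: three variables with 100 levels each stand in for the ten in the thread, the levels are represented by their positions 1 to 100, and each coarse block covers 10 consecutive positions.

```r
# Two-stage expansion: an outer grid of coarse blocks, and for each block a
# small inner grid that is built, used and then discarded.
levels_per_var <- 100
block <- 10          # fine levels per coarse block
n_var <- 3

outer <- expand.grid(rep(list(seq_len(levels_per_var / block)), n_var))
rows_seen <- 0
for (r in seq_len(nrow(outer))) {
  starts <- (unlist(outer[r, ]) - 1) * block   # offset of this block per variable
  inner <- expand.grid(lapply(starts, function(s) s + seq_len(block)))
  rows_seen <- rows_seen + nrow(inner)   # process `inner` here, then drop it
}
```

At any moment only the outer table and one inner table exist, each with 10^3 rows, while all 100^3 combinations are still visited exactly once.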
 


-----Original Message-----
From: R-help <[hidden email]> On Behalf Of Rui Barradas
Sent: Monday, April 19, 2021 12:02 PM
To: Shah Alam <[hidden email]>; r-help mailing list <[hidden email]>
Subject: Re: [R] What is an alternative to expand.grid if create a long vector?


Re: What is an alternative to expand.grid if create a long vector?

Jan van der LAan-2

But even if you had a generator that is super-efficient and could
perform a calculation that is super-fast, the number of elements is
ridiculously large.

If we take 1 nanosecond per element, the computation would still take:

 > (100^10)*1E-9/3600
[1] 27777778

hours, or

 > (100^10)*1E-9/3600/24/365
[1] 3170.979

years.

--
Jan








On 20-04-2021 03:46, Avi Gross via R-help wrote:


Re: What is an alternative to expand.grid if create a long vector?

PIKAL Petr
In reply to this post by PIKAL Petr
Hi

Keep your mails on the list. Actually, you did not say much about your data and
how you want to model them. There are plenty of modelling functions
in R, starting with e.g. lm, but I am not aware of a procedure in which you just
design your explanatory variables to set a plausible model. But I am not an expert
in statistics, and this list is not meant for solving statistical problems.

Cheers
Petr

From: Shah Alam <[hidden email]>
Sent: Monday, April 19, 2021 5:20 PM
To: PIKAL Petr <[hidden email]>
Subject: Re: [R] What is an alternative to expand.grid if create a long vector?

Dear Petr,

Thanks for your response. I am designing a model with 10 unknown parameters.
The generated combinations of the unknown parameters will be used in the model to
estimate the set of vectors that fits the actual data well. Is there any other
way to do it? I also used the randomLHS function from the lhs package, but it did not
serve the purpose.

Best regards,
Shah Alam

On Mon, 19 Apr 2021 at 16:07, PIKAL Petr <[hidden email]> wrote:


Re: What is an alternative to expand.grid if create a long vector?

Jan van der LAan-2

This is an optimisation problem that you are trying to solve using a
grid search. There are numerous methods for optimisation; see
https://cran.r-project.org/web/views/Optimization.html for an overview
for R. Which method is appropriate really depends on the exact problem.
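
As a rough illustration of what replacing the grid search with an
optimiser could look like, here is a minimal sketch using optim() from
base R. The straight-line model, the simulated data, the starting values
and the bounds are all made up for the example; the real model and its
unknown parameters would take their place.

```r
# Hypothetical stand-in for the real model: the "true" parameters are
# only used to simulate example data and are unknown to the optimiser.
set.seed(1)
true_par <- c(a = 0.05, b = 0.4)
x <- seq(0, 10, length.out = 50)
y <- true_par[["a"]] * x + true_par[["b"]] + rnorm(length(x), sd = 0.01)

# Objective: sum of squared differences between model output and data.
sse <- function(par) {
  pred <- par[1] * x + par[2]
  sum((y - pred)^2)
}

# Box-constrained search within plausible ranges, instead of a full grid.
fit <- optim(
  par    = c(0.01, 0.39),   # starting values inside the bounds
  fn     = sse,
  method = "L-BFGS-B",
  lower  = c(0.001, 0.38),
  upper  = c(0.1, 0.42)
)

fit$par          # estimated parameters
fit$convergence  # 0 indicates that optim() reports convergence
```

The lower and upper bounds play the role of the ranges that were passed
to expand.grid, but only the points the optimiser actually visits are
evaluated.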

As Petr said, helping you decide which method to use is beyond the scope
of this list. Perhaps the overview linked above (and the terms 'grid search'
and 'optimization') can help you find an appropriate method.
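
If a grid search is still wanted, the grid does not have to be stored in
memory. The sketch below, assuming base R only, computes each grid point
from its index with arrayInd() and keeps only the best one. The objective
function is a hypothetical placeholder, and only three of the parameter
ranges from the original post are used to keep the example small.

```r
# Three of the parameter ranges from the original post.
ranges <- list(
  p1 = seq(0.001, 0.005, length.out = 10),
  p2 = seq(0.38, 0.42, length.out = 5),
  p3 = seq(0.12, 0.18, length.out = 7)
)
dims <- lengths(ranges)
n_total <- prod(as.numeric(dims))  # number of grid points, as a double

# Return the i-th grid point without building the full grid.
# The ordering matches expand.grid(): the first factor varies fastest.
grid_point <- function(i) {
  idx <- arrayInd(i, dims)[1, ]
  mapply(function(v, k) v[k], ranges, idx)
}

# Hypothetical objective; replace with the real model's lack of fit.
objective <- function(p) sum((p - c(0.002, 0.40, 0.15))^2)

# Walk through the grid, remembering only the best point seen so far.
best <- list(value = Inf, par = NULL)
for (i in seq_len(n_total)) {
  p <- grid_point(i)
  v <- objective(p)
  if (v < best$value) best <- list(value = v, par = p)
}
best$par
```

This avoids the memory problem but not the run time: the number of
evaluations is still the product of the sequence lengths, so it only
helps for grids of moderate size.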

HTH,
Jan


On 20-04-2021 09:02, PIKAL Petr wrote:

> Hi
>
>
>
> Keep your mails on the list. Actually, you did not say much about your data
> and how you want to model them. There are plenty of modelling functions
> in R, starting with e.g. lm, but I am not aware of a procedure in which you
> just design your explanatory variables to set a plausible model. But I am not
> an expert in statistics, and this list is not meant for solving statistical
> problems.
>
>
>
> Cheers
>
> Petr
>
>
>
>
>
> From: Shah Alam <[hidden email]>
> Sent: Monday, April 19, 2021 5:20 PM
> To: PIKAL Petr <[hidden email]>
> Subject: Re: [R] What is an alternative to expand.grid if create a long
> vector?
>
>
>
> Dear Petr,
>
>
>
> Thanks for your response. I am designing a model with 10 unknown parameters.
> The combinations of unknown parameters will be used in the model to
> estimate the set of vectors that fits the actual data well. Is there any other
> way to do it? I also used the randomLHS function from the lhs package, but it
> did not serve the purpose.
>
>
>
> Best regards,
>
> Shah Alam
>
>
>
>
>
> On Mon, 19 Apr 2021 at 16:07, PIKAL Petr <[hidden email]> wrote:
>
> Hi
>
> Actually, expand.grid produces a data frame, not a vector, and the dimension
> of the data frame is "big":
>
>> dim(A)
> [1] 100000000         4
>> str(A)
> 'data.frame':   100000000 obs. of  4 variables:
>   $ Var1: num  0.001 0.002 0.003 0.004 0.005 0.006 0.007 0.008 0.009 0.01 ...
>   $ Var2: num  1e-04 1e-04 1e-04 1e-04 1e-04 1e-04 1e-04 1e-04 1e-04 1e-04 ...
>   $ Var3: num  0.38 0.38 0.38 0.38 0.38 0.38 0.38 0.38 0.38 0.38 ...
>   $ Var4: num  0.12 0.12 0.12 0.12 0.12 0.12 0.12 0.12 0.12 0.12 ...
>   - attr(*, "out.attrs")=List of 2
>    ..$ dim     : int [1:4] 100 100 100 100
>    ..$ dimnames:List of 4
>    .. ..$ Var1: chr [1:100] "Var1=0.001" "Var1=0.002" "Var1=0.003" "Var1=0.004" ...
>    .. ..$ Var2: chr [1:100] "Var2=0.0001000000" "Var2=0.0001090909" "Var2=0.0001181818" "Var2=0.0001272727" ...
>    .. ..$ Var3: chr [1:100] "Var3=0.3800000" "Var3=0.3804040" "Var3=0.3808081" "Var3=0.3812121" ...
>    .. ..$ Var4: chr [1:100] "Var4=0.1200000" "Var4=0.1206061" "Var4=0.1212121" "Var4=0.1218182" ...
>>
>
> With 4 sequences of length 100 you get 1e8 rows and 4 columns.
> With 10 such sequences you would get 1e20 rows and 10 columns (1e18 rows and
> 9 columns for the 9 sequences actually in your code).
> Your last example gives 1.4e8 rows and 9 columns, which probably exceeds the
> memory capacity of your PC.
>
> Maybe you could increase the memory of your PC. If I am correct, storing the
> first needs about 3.2 GB and storing the last about 10 GB (8 bytes per value).
>
> May I ask what you want to do with such a big object?
>
> Cheers
> Petr
>
>> -----Original Message-----
>> From: R-help <[hidden email]> On Behalf Of Shah Alam
>> Sent: Monday, April 19, 2021 2:36 PM
>> To: r-help mailing list <[hidden email]>
>> Subject: [R] What is an alternative to expand.grid if create a long vector?
>>
>> Dear All,
>>
>> I would like to know whether there is a problem in the *expand.grid* function
>> or whether this is a limitation of the function.
>>
>> I am trying to create combinations of elements using the expand.grid function.
>>
>> A <- expand.grid(
>> c(seq(0.001, 0.1, length.out = 100)),
>> c(seq(0.0001, 0.001, length.out = 100)),
>> c(seq(0.38, 0.42, length.out = 100)),
>> c(seq(0.12, 0.18, length.out = 100)))
>>
>> Four combinations work fine. However, if I increase the number of
>> combinations up to ten, the following error appears.
>>
>> A <- expand.grid(
>> c(seq(0.001, 1, length.out = 100)),
>> c(seq(0.0001, 0.001, length.out = 100)),
>> c(seq(0.38, 0.42, length.out = 100)),
>> c(seq(0.12, 0.18, length.out = 100)),
>> c(seq(0.01, 0.04, length.out = 100)),
>> c(seq(0.0001, 0.001, length.out = 100)),
>> c(seq(0.0001, 0.001, length.out = 100)),
>> c(seq(0.001, 0.01, length.out = 100)),
>> c(seq(0.01, 0.3, length.out = 100))
>> )
>>
>> *Error in rep.int(rep.int(seq_len(nx), rep.int(rep.fac, nx)), orep) :
>>   invalid 'times' value*
>>
>> After reducing the length to 10, it produced a different type of error:
>>
>> A <- expand.grid(
>> c(seq(0.001, 0.005, length.out = 10)),
>> c(seq(0.0001, 0.0005, length.out = 10)),
>> c(seq(0.38, 0.42, length.out = 5)),
>> c(seq(0.12, 0.18, length.out = 7)),
>> c(seq(0.01, 0.04, length.out = 5)),
>> c(seq(0.0001, 0.001, length.out = 10)),
>> c(seq(0.0001, 0.001, length.out = 10)),
>> c(seq(0.001, 0.01, length.out = 10)),
>> c(seq(0.1, 0.8, length.out = 8))
>> )
>>
>> *Error: cannot allocate vector of size 1.0 Gb*
>>
>> What is an alternative to expand.grid for creating a long vector based on 10
>> elements?
>>
>> With kind regards,
>> Shah Alam
>>
>>        [[alternative HTML version deleted]]
>>
>
