Why does `[<-.matrix` not exist in base R

David Disabato
Whenever going from working with a data.frame to a matrix, I get annoyed
that I cannot assign and subset at the same time with matrices - like I can
with data.frames.

For example, if I want to add a new column to a data.frame, I can do
something like `myDataFrame[, "newColumn"] <- NA`.

However, with a matrix, this syntax does not work and I have to use a call
to `cbind` and create a new object. For example, `mymatrix2 <-
cbind(mymatrix, "newColumn" = NA)`.

Is there a programming reason that base R does not have a matrix method for
`[<-` or is it something that arguably should be added?

--
David J. Disabato, Ph.D.
Postdoctoral Research Scholar
Kent State University
[hidden email]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Why does `[<-.matrix` not exist in base R

Ivan Krylov
Hello David,

On Sat, 23 Nov 2019 11:58:42 -0500
David Disabato <[hidden email]> wrote:

> For example, if I want to add a new column to a data.frame, I can do
> something like `myDataFrame[, "newColumn"] <- NA`.

<Opinion>

Arguably, iterative growth of data structures is not the "R style",
since it may lead to costly reallocations, resulting in the worst case
scenario of quadratic behaviour for linear operations.

If iterative processing is unavoidable, it might help to store partial
results in a list, then build the final matrix with a single call to
do.call(cbind, results).

</Opinion>
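
A minimal sketch of the list-then-bind pattern suggested above, assuming each iteration yields one column of equal length. The names `n_cols` and `results`, and the `rep(i, 3)` stand-in computation, are made up for illustration:

```r
# Collect each column in a preallocated list, then bind once at the end.
n_cols <- 5L
results <- vector("list", n_cols)
for (i in seq_len(n_cols)) {
  results[[i]] <- rep(i, 3)   # stand-in for a real per-iteration computation
}
names(results) <- paste0("col", seq_len(n_cols))

M <- do.call(cbind, results)  # the matrix is built in a single cbind() call
dim(M)                        # 3 5
colnames(M)                   # "col1" "col2" "col3" "col4" "col5"
```

The list names become the column names, so no separate `colnames<-` step is needed.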

> However, with a matrix, this syntax does not work and I have to use a
> call to `cbind` and create a new object. For example, `mymatrix2 <-
> cbind(mymatrix, "newColumn" = NA)`.

> Is there a programming reason that base R does not have a matrix
> method for `[<-` or is it something that arguably should be added?

A data frame is a list of columns, so adding a new column is relatively
cheap: allocate enough memory for one column and append (roughly
speaking) a pointer to the list of pointers-to-column-data. This
results in reallocation of the *latter* list, but, since that list is
small in comparison to the whole data frame, it's okay. Note that this
operation does not affect any of the other columns belonging to the
same data frame.

A matrix, on the other hand, is a vector containing the whole matrix
with array dimensions stored as an attribute. Since R matrices are
stored by column [*], adding a new column to the matrix means resizing
the buffer to hold length(matrix) + nrow(matrix) elements, then
appending the new column to the end of the buffer. If the allocator
cannot enlarge the buffer in place (because the buffer is followed in
memory by another buffer), it has to allocate the new buffer elsewhere,
copy the memory, then free the old buffer.
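
The storage difference described above can be checked from the R prompt. This is only an illustrative sketch with made-up objects `M`, `M2` and `df`; it shows the layout, not what the allocator does:

```r
M <- matrix(1:6, nrow = 2)     # columns are (1,2), (3,4), (5,6)
attributes(M)                  # only $dim: the data itself is a plain vector
as.vector(M)                   # column-major order: 1 2 3 4 5 6

M2 <- cbind(M, c(7L, 8L))      # the new column lands at the end of the data
identical(as.vector(M2), c(as.vector(M), 7L, 8L))  # TRUE

df <- data.frame(a = 1:2, b = 3:4)
is.list(df)                    # TRUE: a data frame is a list of columns
length(df)                     # 2: one list element per column
```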

To build a matrix by appending columns, one needs to perform this O(n)
operation O(n) times, resulting in O(n^2) performance. Adding rows is
even worse because memory has to be copied in parts, not as a whole.
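
To put rough numbers on this, here is a small sketch. The helper `copied_elements` is hypothetical and assumes the worst case, in which every append copies the whole existing buffer:

```r
M <- matrix(1:6, nrow = 2)
as.vector(rbind(M, 0L))   # 1 2 0 3 4 0 5 6 0: new values interleave with old

# Elements copied when growing a matrix with `nr` rows one column at a time,
# assuming every append copies the whole existing buffer (worst case).
copied_elements <- function(nr, n_cols) {
  sum(nr * (seq_len(n_cols) - 1L))  # step k copies the k - 1 existing columns
}
copied_elements(nr = 10, n_cols = 100)    # 49500
copied_elements(nr = 10, n_cols = 200)    # 199000, about 4x for 2x the columns
```

The `rbind()` line shows why rows are worse: the old data cannot be moved as one block, because each existing column has to make room for one new element.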

Disclaimer: this is one reason I can think of why R does not offer
subassignment to non-existent matrix columns by default. The actual
reason might be different.

--
Best regards,
Ivan

[*]
https://github.com/wch/r-source/blob/bac4cd3013ead1379e20127d056ee036278b47ff/src/main/duplicate.c#L443

Re: Why does `[<-.matrix` not exist in base R

Peter Dalgaard-2
In reply to this post by David Disabato
The subject is misguided. It is not a problem to assign to a subset of columns.

The issue is that the assignment operation does not want to _expand_ the matrix automatically upon seeing an out-of-bounds index. E.g.:

> M <- matrix(0,2,2)
> M[,3]<-1
Error in `[<-`(`*tmp*`, , 3, value = 1) : subscript out of bounds
> M[,2]<-1
> M
     [,1] [,2]
[1,]    0    1
[2,]    0    1

You can, however, do things like this:

> M <- M[,c(1,2,2)]
> M[,3]<-3
> M
     [,1] [,2] [,3]
[1,]    0    1    3
[2,]    0    1    3
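
If the data-frame-like convenience is wanted anyway, one option is a small wrapper that makes the expansion explicit. The function `set_col` below is hypothetical and not part of base R; it is a sketch that assumes a single column name and a value that recycles to `nrow(M)`:

```r
# Hypothetical helper (not part of base R): assign to a named column,
# appending it with cbind() when the name is not already present.
set_col <- function(M, name, value) {
  stopifnot(is.matrix(M), is.character(name), length(name) == 1L)
  if (name %in% colnames(M)) {
    M[, name] <- value                     # ordinary in-bounds assignment
    return(M)
  }
  new_col <- matrix(value, nrow = nrow(M), ncol = 1L,
                    dimnames = list(NULL, name))
  cbind(M, new_col)                        # explicit, visible expansion
}

M <- matrix(0, 2, 2, dimnames = list(NULL, c("a", "b")))
M <- set_col(M, "b", 1)
M <- set_col(M, "newColumn", NA)
colnames(M)   # "a" "b" "newColumn"
```

This still pays the full copy on every new column, so it is a convenience for occasional use rather than for growing a matrix in a loop.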

-pd

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: [hidden email]  Priv: [hidden email]

Re: Why does `[<-.matrix` not exist in base R

Bert Gunter-2
Re: [<-.

It is perhaps worth noting that the OP seems "misguided" in another sense.
His complaint seems to rest on the assumption that because matrices and
data frames both have a row/column structure, certain operations on them
should be similar. I disagree. In fact, data frames and matrices are very
different structures with very different semantics and wholly different
purposes. Their "similarity" is superficial. First and foremost, (numeric)
matrices are numerical objects, the basic building blocks for linear
algebra with a whole devoted set of algebraic functionality for them (see
also: BLAS); while data frames are essentially data storage/manipulation
structures, internal databases for R. As a result, imo, there is good
reason that [<-. should *not* behave with matrices as it does with data
frames: when doing complex matrix calculations, returning an error message
when indices go out of range seems much more desirable than silently
changing dimensions. Indeed, I think one might make a better argument for
doing that for data frames also, but, as it is both relatively innocuous and
convenient to add columns in that context -- the data frame method is just
a wrapper for data.frame() as the man page says -- it's not really an issue
(and certainly shouldn't be altered now).
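
The contrast can be seen side by side in a short sketch. The objects `df`, `M` and `res` are made up for illustration, and the exact error text may vary with locale:

```r
df <- data.frame(x = 1:2)
df[, "newColumn"] <- NA          # silently adds a column
ncol(df)                         # 2

M <- matrix(0, 2, 2)
res <- tryCatch({ M[, 3] <- 1; "expanded" },
                error = function(e) conditionMessage(e))
res                              # the out-of-bounds error message, not "expanded"
dim(M)                           # still 2 2: the matrix was left untouched
```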

Perhaps a moral: one should be very wary of assuming that behavior one
finds "natural" and "desirable" will seem so to others, especially for
long-used and extensively exercised core functionality.

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Sun, Nov 24, 2019 at 6:47 AM peter dalgaard <[hidden email]> wrote:

> The subject is misguided. It is not a problem to assign to a subset of
> columns.