data.frame() versus as.data.frame() applied to a matrix.

data.frame() versus as.data.frame() applied to a matrix.

Rolf Turner

Consider the following:

set.seed(42)
X <- matrix(runif(40),10,4)
colnames(X) <- c("a","b","a:x","b:x") # Imitating the output
                                       # of model.matrix().
D1 <- as.data.frame(X)
D2 <- data.frame(X)
names(D1)
[1] "a"   "b"   "a:x" "b:x"
names(D2)
[1] "a"   "b"   "a.x" "b.x"

The names of D2 are syntactically valid; those of D1 are not.

Why should I have expected this phenomenon? :-)

The as.data.frame() syntax seems to me much more natural for converting
a matrix to a data frame, yet it doesn't get it quite right, sometimes,
in respect of the names.

Is there some reason that as.data.frame() does not apply make.names()?
Or was this just an oversight?

cheers,

Rolf Turner

--
Honorary Research Fellow
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: data.frame() versus as.data.frame() applied to a matrix.

Jeff Newmiller
I have no idea about "why it is this way" but there are many cases where I would rather have to use backticks around syntactically-invalid names than deal with arbitrary rules for mapping column names as they were supplied to column names as R wants them to be. From that perspective, making the conversion function leave the names alone and limit the name-mashing to one function sounds great to me. You can always call make.names yourself.
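
A minimal sketch of the two options mentioned above, reusing the example from the original post. The object names col_backtick, col_bracket, fit, and D1_clean are only illustrative and are not from the thread.

```r
# Rebuild the example from the original post.
set.seed(42)
X <- matrix(runif(40), 10, 4)
colnames(X) <- c("a", "b", "a:x", "b:x")
D1 <- as.data.frame(X)           # names are left untouched

# Option 1: keep the names and quote them with backticks (or use [[ ]]).
col_backtick <- D1$`a:x`
col_bracket  <- D1[["a:x"]]
fit <- lm(a ~ `a:x`, data = D1)  # backticks also work inside a formula

# Option 2: sanitise the names explicitly, at a point of your choosing.
D1_clean <- D1
names(D1_clean) <- make.names(names(D1_clean), unique = TRUE)
names(D1_clean)
```

With option 2 the renaming happens in one visible place, so the mapping from the original names to the syntactic ones can be inspected or recorded before it is applied.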

--
Sent from my phone. Please excuse my brevity.


Re: data.frame() versus as.data.frame() applied to a matrix.

Rolf Turner

On 2/6/19 12:27 PM, Jeff Newmiller wrote:

> I have no idea about "why it is this way" but there are many cases
> where I would rather have to use backticks around
> syntactically-invalid names than deal with arbitrary rules for
> mapping column names as they were supplied to column names as R wants
> them to be. From that perspective, making the conversion function
> leave the names alone and limit the name-mashing to one function
> sounds great to me. You can always call make.names yourself.

Fair enough.  My real problem was getting ambushed by the fact that
*different* names arise depending on whether one uses data.frame(X)
or as.data.frame(X).  I'll spare you the details. :-)

cheers,

Rolf
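
One way to make the two routes agree, as a hedged sketch: data.frame() has a check.names argument (TRUE by default), and switching it off should leave the column names as supplied. The small matrix and the names_* variables below are made up for illustration.

```r
# A small matrix with the same column names as in the original post.
X <- matrix(1:8, 2, 4, dimnames = list(NULL, c("a", "b", "a:x", "b:x")))

names_as     <- names(as.data.frame(X))
names_df     <- names(data.frame(X))                       # check.names = TRUE by default
names_df_raw <- names(data.frame(X, check.names = FALSE))  # leave the names alone

identical(names_as, names_df)      # the two default routes disagree
identical(names_as, names_df_raw)  # they agree once name checking is switched off
```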


Re: data.frame() versus as.data.frame() applied to a matrix.

Richard M. Heiberger
To me the interesting difference between matrix() and as.matrix() is
that as.matrix() retains the argument names as the row names of the
result.
> tmp <- structure(1:3, names=letters[1:3])
> tmp
a b c
1 2 3
> matrix(tmp)
     [,1]
[1,]    1
[2,]    2
[3,]    3
> as.matrix(tmp)
  [,1]
a    1
b    2
c    3
>


Re: data.frame() versus as.data.frame() applied to a matrix.

R help mailing list-2
In reply to this post by Rolf Turner
I think of the methods of as.data.frame as helper functions for
data.frame and don't usually call as.data.frame directly.  data.frame()
will call as.data.frame for each of its arguments and then put the
results together into one big data.frame.
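
The relationship can be imitated by hand for the matrix case. This is only a simplified sketch, not the actual source of data.frame(): it assumes the helper is called with optional = TRUE and that the name check amounts to make.names(..., unique = TRUE). The names helper_result and assembled are hypothetical.

```r
X <- matrix(1:8, 2, 4, dimnames = list(NULL, c("a", "b", "a:x", "b:x")))

# Step 1: the helper converts the matrix and leaves the names alone.
helper_result <- as.data.frame(X, optional = TRUE)

# Step 2: the assembling function tidies the names afterwards.
assembled <- helper_result
names(assembled) <- make.names(names(assembled), unique = TRUE)

# The hand-made result has the same names as a direct data.frame() call.
identical(names(assembled), names(data.frame(X)))
```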

> for (method in c("as.data.frame.list", "as.data.frame.character",
+                  "as.data.frame.integer", "as.data.frame.numeric",
+                  "as.data.frame.matrix"))
+     trace(method, quote(str(x)))
Tracing function "as.data.frame.list" in package "base"
Tracing function "as.data.frame.character" in package "base"
Tracing function "as.data.frame.integer" in package "base"
Tracing function "as.data.frame.numeric" in package "base"
Tracing function "as.data.frame.matrix" in package "base"
> d <- data.frame(Mat = cbind(m1 = 11:12, M2 = 13:14), Num = c(15.5, 16.6),
+                 Int = 17:18, List = list(L1 = 19:20, L2 = c(20.2, 21.2)))
Tracing as.data.frame.matrix(x[[i]], optional = TRUE) on entry
 int [1:2, 1:2] 11 12 13 14
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:2] "m1" "M2"
Tracing as.data.frame.numeric(x[[i]], optional = TRUE) on entry
 num [1:2] 15.5 16.6
Tracing as.data.frame.integer(x[[i]], optional = TRUE) on entry
 int [1:2] 17 18
Tracing as.data.frame.list(x[[i]], optional = TRUE, stringsAsFactors = stringsAsFactors) on entry
List of 2
 $ L1: int [1:2] 19 20
 $ L2: num [1:2] 20.2 21.2
Tracing as.data.frame.integer(x[[i]], optional = TRUE) on entry
 int [1:2] 19 20
Tracing as.data.frame.numeric(x[[i]], optional = TRUE) on entry
 num [1:2] 20.2 21.2

If I recall correctly, that is how S did things and Splus tried to use
something like as.data.frameAux for the name of the helper function to
avoid some of the frustration you describe.

Bill Dunlap
TIBCO Software
wdunlap tibco.com

