# oddity in transform

5 messages

## oddity in transform

Note the inconsistency in the names in these two examples: X.Time in the first case and Time.1 in the second case.

    > transform(BOD, X = BOD[1:2] * seq(6))
      Time demand X.Time X.demand
    1    1    8.3      1      8.3
    2    2   10.3      4     20.6
    3    3   19.0      9     57.0
    4    4   16.0     16     64.0
    5    5   15.6     25     78.0
    6    7   19.8     42    118.8

    > transform(BOD, X = BOD[1] * seq(6))
      Time demand Time.1
    1    1    8.3      1
    2    2   10.3      4
    3    3   19.0      9
    4    4   16.0     16
    5    5   15.6     25
    6    7   19.8     42

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

## Re: oddity in transform

I think you meant to call BOD[,1].

From ?transform, the ... arguments are supposed to be vectors, and BOD[1] is still a data.frame (with one column). So I don't think it's surprising that transform gets confused about which name to use (X or Time?) and kind of compromises on the name "Time". There is also a note in ?transform: "If some of the values are not vectors of the appropriate length, you deserve whatever you get!"

And if you want to do it with multiple extra columns (and are not satisfied with these labels), I think the proper way to go would be

    transform(BOD, X=BOD[,1]*seq(6), Y=BOD[,2]*seq(6))

If you want to trace it back further, the behaviour comes not from transform but from data.frame. Column names are prefixed with the higher-level (tag) name if the object has more than one column, and the tag name itself is used if the argument is simply a vector:

    data.frame(BOD[1:2], X=BOD[1]*seq(6))

takes the name of the only column of BOD[1], Time. Only because that column name is already present is it changed to Time.1.

    data.frame(BOD[1:2], X=BOD[,1]*seq(6))

gives the third column the name X (as X is now a vector).

    data.frame(BOD[1:2], X=BOD[1:2]*seq(6))

or the same call with BOD[,1:2], gives the column names X.Time and X.demand, to show that these (multiple) columns come from X.

So I don't think there's much to fix here. In this case, having X.Time in all cases would have been better, but in general the column naming of data.frame works, and changing it would likely cause a lot of problems. You can always change the column names later.

Best regards,
Emil Bode

Data-analyst

+31 6 43 83 89 33
[hidden email]

DANS: Netherlands Institute for Permanent Access to Digital Research Resources
Anna van Saksenlaan 51 | 2593 HW Den Haag | +31 70 349 44 50 | [hidden email] | dans.knaw.nl
DANS is an institute of the Dutch Academy KNAW and funding organisation NWO.
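The three data.frame() naming cases described in this message can be checked directly. The following is a minimal sketch using the BOD dataset that ships with R; the variable names (one_col_df, vector_arg, two_col_df) are illustrative only, and the names shown in the comments are the ones reported in this thread.

```r
# BOD is a built-in dataset with columns Time and demand (6 rows).

# Tagged argument is a one-column data frame: its own column name wins,
# and is then made unique because "Time" already exists.
one_col_df <- data.frame(BOD, X = BOD[1] * seq(6))

# Tagged argument is a plain numeric vector: the tag X becomes the name.
vector_arg <- data.frame(BOD, X = BOD[, 1] * seq(6))

# Tagged argument is a two-column data frame: the tag is used as a prefix.
two_col_df <- data.frame(BOD, X = BOD[1:2] * seq(6))

names(one_col_df)  # "Time" "demand" "Time.1"
names(vector_arg)  # "Time" "demand" "X"
names(two_col_df)  # "Time" "demand" "X.Time" "X.demand"
```

Comparing the three results side by side shows that the tag X survives only when the argument is a vector or has more than one column.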
On 23/07/2018, 16:52, "R-devel on behalf of Gabor Grothendieck" <[hidden email] on behalf of [hidden email]> wrote:

> Note the inconsistency in the names in these two examples. X.Time in the first case and Time.1 in the second case.

## Re: oddity in transform

The idea is that one wants to write the line of code below in a general way that works the same whether ix specifies one column or several. But the naming changes entirely between those two cases, and BOD[, 1], transform(BOD, X=..., Y=...), or other hard-coded solutions still require writing multiple cases.

    ix <- 1:2
    transform(BOD, X = BOD[ix] * seq(6))

On Tue, Jul 24, 2018 at 7:14 AM, Emil Bode <[hidden email]> wrote:
> I think you meant to call BOD[,1]
> And if you want to do it with multiple extra columns (and are not satisfied with these labels), I think the proper way to go would be " transform(BOD, X=BOD[,1]*seq(6), Y=BOD[,2]*seq(6))"
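One possible way to get names that do not depend on how many columns ix selects is to build the names explicitly before binding. The sketch below assumes a hypothetical helper, add_scaled(), which is not part of base R and was not proposed in the thread; the prefix argument and the choice of cbind() are likewise assumptions made for illustration.

```r
# Hypothetical helper (not base R): append df[ix] * 1..nrow(df) to df,
# always naming the new columns "<prefix>.<original column name>".
add_scaled <- function(df, ix, prefix = "X") {
  scaled <- df[ix] * seq_len(nrow(df))        # stays a data frame for any ix
  names(scaled) <- paste(prefix, names(scaled), sep = ".")
  cbind(df, scaled)                           # names are already final here
}

names(add_scaled(BOD, 1))    # "Time" "demand" "X.Time"
names(add_scaled(BOD, 1:2))  # "Time" "demand" "X.Time" "X.demand"
```

Because the names are set before the columns are bound, the one-column and multi-column cases go through the same code path and no special-casing on length(ix) is needed.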

## Re: oddity in transform

I don't think it has much to do with transform in particular:

    > BOD <- data.frame(Time = 1:6, demand = runif(6))
    > BOD[["X"]] <- BOD[1:2] * seq(6); BOD
      Time    demand X.Time  X.demand
    1    1 0.8649628      1 0.8649628
    2    2 0.5895380      4 1.1790761
    3    3 0.6854635      9 2.0563906
    4    4 0.4255801     16 1.7023206
    5    5 0.5738793     25 2.8693967
    6    6 0.9996713     36 5.9980281
    > BOD <- data.frame(Time = 1:6, demand = runif(6))
    > BOD[["X"]] <- BOD[1] * seq(6); BOD
      Time     demand Time
    1    1 0.72990231    1
    2    2 0.61721422    4
    3    3 0.02389160    9
    4    4 0.28341746   16
    5    5 0.06116124   25
    6    6 0.67966577   36

--Ista

On Tue, Jul 24, 2018 at 7:59 AM, Gabor Grothendieck <[hidden email]> wrote:
> The idea is that one wants to write the line of code below
> in a general way which works the same
> whether you specify ix as one column or multiple columns but the naming entirely
> changes when you do this and BOD[, 1] and transform(BOD, X=..., Y=...) or
> other hard coding solutions still require writing multiple cases.
>
> ix <- 1:2
> transform(BOD, X = BOD[ix] * seq(6))

## Re: oddity in transform

On Tue, Jul 24, 2018 at 11:41 AM, Ista Zahn <[hidden email]> wrote:
> I don't think it has much to do with transform in particular:
>
> > BOD <- data.frame(Time = 1:6, demand = runif(6))
> > BOD[["X"]] <- BOD[1:2] * seq(6); BOD
> > BOD <- data.frame(Time = 1:6, demand = runif(6))
> > BOD[["X"]] <- BOD[1] * seq(6); BOD

Ugh, well, I see now that

    BOD[["X"]] <- BOD[1:2] * seq(6); BOD

and

    transform(BOD, X = BOD[1:2] * seq(6))

don't produce the same thing, despite printing in ways that look similar. However,

    data.frame(BOD, X = BOD[1:2] * seq(6))

and

    data.frame(BOD, X = BOD[1] * seq(6))

do produce the same result as transform, so the point about this being much more pervasive still holds.

--Ista
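The difference noted in this message, that `[[<-` and data.frame() print similarly but build different objects, can be made visible by inspecting the structure rather than the printed output. This is a small sketch with illustrative variable names (nested, flat, via_transform), using the built-in BOD dataset.

```r
# [[<- stores the whole two-column data frame inside a single column X.
nested <- BOD
nested[["X"]] <- BOD[1:2] * seq(6)

# data.frame() (and therefore transform()) splices the columns in instead.
flat <- data.frame(BOD, X = BOD[1:2] * seq(6))
via_transform <- transform(BOD, X = BOD[1:2] * seq(6))

ncol(nested)              # 3
names(nested)             # "Time" "demand" "X"
is.data.frame(nested$X)   # TRUE: X is itself a data frame

ncol(flat)                # 4
names(flat)               # "Time" "demand" "X.Time" "X.demand"
identical(names(flat), names(via_transform))  # TRUE
```

Calling str() on each object is another way to see the nesting without relying on how print() lays out the columns.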