oddity in transform

oddity in transform

Gabor Grothendieck
Note the inconsistency in the column names in these two examples:
X.Time in the first case and Time.1 in the second.

  > transform(BOD, X = BOD[1:2] * seq(6))
    Time demand X.Time X.demand
  1    1    8.3      1      8.3
  2    2   10.3      4     20.6
  3    3   19.0      9     57.0
  4    4   16.0     16     64.0
  5    5   15.6     25     78.0
  6    7   19.8     42    118.8

  > transform(BOD, X = BOD[1] * seq(6))
    Time demand Time.1
  1    1    8.3      1
  2    2   10.3      4
  3    3   19.0      9
  4    4   16.0     16
  5    5   15.6     25
  6    7   19.8     42

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: oddity in transform

Emil
I think you meant to call BOD[,1].
From ?transform, the ... arguments are supposed to be vectors, and BOD[1] is still a data.frame (with one column). So I don't think it's surprising that transform gets confused about which name to use (X or Time?) and settles on the name "Time". There is also a note in ?transform: "If some of the values are not vectors of the appropriate length, you deserve whatever you get!"
And if you want to add multiple extra columns (and are not satisfied with these labels), I think the proper way to go would be transform(BOD, X=BOD[,1]*seq(6), Y=BOD[,2]*seq(6)).
 
If you want to trace it back further, the behaviour comes not from transform but from data.frame. Column names are prefixed with the argument's tag name if the object has more than one column, and the tag name alone is used if the argument is simply a vector:
data.frame(BOD[1:2], X=BOD[1]*seq(6)) takes the name of the only column of BOD[1], Time. Only because that column name is already present is it changed to Time.1.
data.frame(BOD[1:2], X=BOD[,1]*seq(6)) gives the third column the name X (as X is now a vector).
data.frame(BOD[1:2], X=BOD[1:2]*seq(6)), or the same with BOD[,1:2], gives column names X.Time and X.demand, to show that these (multiple) columns come from X.

So I don't think there's much to fix here. In this case having X.Time in all cases would have been better, but in general the column naming of data.frame works, and changing it would likely cause a lot of problems.
You can always change the column-names later.
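
As a rough illustration of the three cases above and of renaming afterwards, here is a minimal sketch. It assumes only base R and the BOD data set from the datasets package; the object names (one_df, one_vec, two_df, renamed) are made up for this example.

```r
# BOD ships with R's datasets package (columns Time and demand).
one_df  <- data.frame(BOD[1:2], X = BOD[1]   * seq(6))  # X is a one-column data.frame
one_vec <- data.frame(BOD[1:2], X = BOD[, 1] * seq(6))  # X is a plain vector
two_df  <- data.frame(BOD[1:2], X = BOD[1:2] * seq(6))  # X is a two-column data.frame

names(one_df)   # "Time" "demand" "Time.1"
names(one_vec)  # "Time" "demand" "X"
names(two_df)   # "Time" "demand" "X.Time" "X.demand"

# Changing the column name later, on a copy so one_df stays as it was.
renamed <- one_df
names(renamed)[3] <- "X.Time"
names(renamed)  # "Time" "demand" "X.Time"
```

The rename by position is only safe here because the new column is known to be the third one; matching on the old name would be more robust in longer code.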

Best regards,
Emil Bode
 
Data-analyst
 
+31 6 43 83 89 33
[hidden email]
 
DANS: Netherlands Institute for Permanent Access to Digital Research Resources
Anna van Saksenlaan 51 | 2593 HW Den Haag | +31 70 349 44 50 | [hidden email] | dans.knaw.nl
DANS is an institute of the Dutch Academy KNAW <http://knaw.nl/nl> and funding organisation NWO <http://www.nwo.nl/>.

On 23/07/2018, 16:52, "R-devel on behalf of Gabor Grothendieck" <[hidden email] on behalf of [hidden email]> wrote:

    Note the inconsistency in the names in these two examples.  X.Time in
    the first case and Time.1 in the second case.
   

Re: oddity in transform

Gabor Grothendieck
The idea is that one wants to write the line of code below in a
general way, so that it works the same whether ix specifies one column
or several. Instead, the naming changes entirely between the two cases,
and BOD[, 1], transform(BOD, X=..., Y=...) and other hard-coded
solutions still require writing multiple cases.

ix <- 1:2
transform(BOD, X = BOD[ix] * seq(6))
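
One possible way to get the same naming for any ix is to set the names explicitly before binding, rather than relying on data.frame's defaults. This is only a sketch: the helper add_prefixed_product and its prefix argument are hypothetical names introduced for illustration and were not proposed in the thread.

```r
# Hypothetical helper: multiply the selected columns by the row index and
# append them with names of the form "<prefix>.<original name>",
# regardless of how many columns ix selects.
add_prefixed_product <- function(df, ix, prefix = "X") {
  new <- df[ix] * seq_len(nrow(df))            # df[ix] is always a data.frame
  names(new) <- paste(prefix, names(new), sep = ".")
  cbind(df, new)                               # unnamed argument keeps its own names
}

names(add_prefixed_product(BOD, 1))    # "Time" "demand" "X.Time"
names(add_prefixed_product(BOD, 1:2))  # "Time" "demand" "X.Time" "X.demand"
```

Because the new columns are passed to cbind without a tag, their names are taken as they are, so the one-column and multi-column cases follow the same rule.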



On Tue, Jul 24, 2018 at 7:14 AM, Emil Bode <[hidden email]> wrote:

> I think you meant to call BOD[,1]

Re: oddity in transform

Ista Zahn
I don't think it has much to do with transform in particular:

> BOD <- data.frame(Time = 1:6, demand = runif(6))
> BOD[["X"]] <- BOD[1:2] * seq(6); BOD
  Time    demand X.Time  X.demand
1    1 0.8649628      1 0.8649628
2    2 0.5895380      4 1.1790761
3    3 0.6854635      9 2.0563906
4    4 0.4255801     16 1.7023206
5    5 0.5738793     25 2.8693967
6    6 0.9996713     36 5.9980281
> BOD <- data.frame(Time = 1:6, demand = runif(6))
> BOD[["X"]] <- BOD[1] * seq(6); BOD
  Time     demand Time
1    1 0.72990231    1
2    2 0.61721422    4
3    3 0.02389160    9
4    4 0.28341746   16
5    5 0.06116124   25
6    6 0.67966577   36

--Ista


On Tue, Jul 24, 2018 at 7:59 AM, Gabor Grothendieck
<[hidden email]> wrote:

> The idea is that one wants to write the line of code below
>  in a general way which works the same
> whether you specify ix as one column or multiple columns but the naming entirely
> changes when you do this and BOD[, 1] and transform(BOD, X=..., Y=...) or
> other hard coding solutions still require writing multiple cases.
>
> ix <- 1:2
> transform(BOD, X = BOD[ix] * seq(6))

Re: oddity in transform

Ista Zahn
On Tue, Jul 24, 2018 at 11:41 AM, Ista Zahn <[hidden email]> wrote:

> I don't think it has much to do with transform in particular:

Ugh, well, I see now that

BOD[["X"]] <- BOD[1:2] * seq(6); BOD

and

transform(BOD, X = BOD[1:2] * seq(6))

don't produce the same thing, despite printing in ways that look
similar. However,

data.frame(BOD, X = BOD[1:2] * seq(6))

and

data.frame(BOD, X = BOD[1] * seq(6))

do produce the same result as transform, so the point about this being
much more pervasive still holds.
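
The difference between the two forms can be checked from the structure rather than from the printed output. A minimal sketch, assuming base R and the built-in BOD data set; the object names a and b are made up for this example:

```r
# `[[<-` stores the whole data.frame as a single column named X ...
a <- BOD
a[["X"]] <- BOD[1:2] * seq(6)

# ... whereas data.frame() splices its columns in as separate columns.
b <- data.frame(BOD, X = BOD[1:2] * seq(6))

ncol(a)              # 3
names(a)             # "Time" "demand" "X"
is.data.frame(a$X)   # TRUE: the third column is itself a data.frame

ncol(b)              # 4
names(b)             # "Time" "demand" "X.Time" "X.demand"
```

So the two objects print with similar headers but have different shapes, which is why they do not compare as the same thing.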

--Ista


