Add a new row based on test set predicted values and time stamps

Elahe chalabi (via R-help)
Hi all,

I have predictions for my test set: forecasted Values for "4/1/2020" for each combination of "id" and "Group". I would like to add a fourth row to each (Group, id) group in my train set, with the values for this row taken from the test set:

My train set:

    train <- structure(list(Date = c("1/1/2020", "2/1/2020", "3/1/2020",
        "1/1/2020", "2/1/2020", "3/1/2020", "1/1/2020", "2/1/2020",
        "3/1/2020", ""), Value = c(3.5, 2.7, 4, 2.5, 3.7, 0, 3, 0, 1, NA),
        Group = c("A", "A", "A", "B", "B", "B", "C", "C", "C", ""),
        id = c(1L, 1L, 1L, 101L, 101L, 101L, 100L, 100L, 100L, NA)),
        class = "data.frame", row.names = c(NA, -10L))

Test set (the column "value" holds the predictions):

    test <- structure(list(Date = c("4/1/2020", "4/1/2020", "4/1/2020", ""),
        Value = c(3.5, 2.5, 3, NA), Group = c("A", "B", "C", ""),
        id = c(1L, 101L, 100L, NA), value = c(0.2, 0.7, 0.9, NA)),
        class = "data.frame", row.names = c(NA, -4L))

Desired output:

    structure(list(Date = c("1/1/2020", "2/1/2020", "3/1/2020", "4/1/2020",
        "1/1/2020", "2/1/2020", "3/1/2020", "4/1/2020", "1/1/2020",
        "2/1/2020", "3/1/2020", "4/1/2020"),
        Value = c(3.5, 2.7, 4, 0.2, 2.5, 3.7, 0, 0.7, 3, 0, 1, 0.9),
        Group = c("A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "C", "C"),
        id = c(1L, 1L, 1L, 1L, 101L, 101L, 101L, 101L, 100L, 100L, 100L, 100L)),
        class = "data.frame", row.names = c(NA, -12L))

The data above are dummy data; my original data set has millions of records.

Thanks for any help.
Elahe 
 

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Add a new row based on test set predicted values and time stamps

Bert Gunter
It may not be necessary to insert the rows in that order: in most cases R can
identify and use the information in the rows without it.
So to combine the results as you described (the code you sent got garbled a
bit btw -- you should proofread more carefully in future), all you would
need to do is:

## with train and test your train and test data frames, of course
out <- na.omit(rbind(train, cbind(test[, c(1, 3, 4)], Value = test[, "value"])))
## Note that the cbind() stuff is needed to create the correct "Value"
## column for rbind(). See ?rbind for details
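A minimal sketch of the combining step on the example data, assuming the data frames are named `train` and `test` as in the code above. Columns 1, 3 and 4 of `test` are Date, Group and id, so they are selected by name here for readability. `rbind()` on data frames matches columns by name rather than position, which is why the `Value` column that `cbind()` appends last still lines up with `train$Value`; `na.omit()` then drops the blank trailing rows, which contain `NA`s.

```r
# Example data from the question
train <- data.frame(
  Date  = c("1/1/2020", "2/1/2020", "3/1/2020", "1/1/2020", "2/1/2020",
            "3/1/2020", "1/1/2020", "2/1/2020", "3/1/2020", ""),
  Value = c(3.5, 2.7, 4, 2.5, 3.7, 0, 3, 0, 1, NA),
  Group = c("A", "A", "A", "B", "B", "B", "C", "C", "C", ""),
  id    = c(1L, 1L, 1L, 101L, 101L, 101L, 100L, 100L, 100L, NA)
)
test <- data.frame(
  Date  = c("4/1/2020", "4/1/2020", "4/1/2020", ""),
  Value = c(3.5, 2.5, 3, NA),
  Group = c("A", "B", "C", ""),
  id    = c(1L, 101L, 100L, NA),
  value = c(0.2, 0.7, 0.9, NA)   # the predicted values
)

# The predictions become the "Value" column of the new rows.
new_rows <- cbind(test[, c("Date", "Group", "id")], Value = test[, "value"])
names(new_rows)   # columns are in a different order than in train

# rbind() matches columns by name; na.omit() drops the two blank rows.
out <- na.omit(rbind(train, new_rows))
```

The predicted rows end up at the bottom of `out` (rows 10 to 12), which is why the reordering step that follows is needed only if the exact row order matters.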

If you insist that you need the row ordering as you specified, then follow
this by:

## assuming the dates are month/day/year
out <- with(out, out[order(Group, id, as.POSIXct(Date, format = "%m/%d/%Y")), ])

What this does is first convert your text date column to POSIXct (see
?DateTimeClasses for details), which gives the dates their calendar
ordering. The order() function (see ?order for details) then gives the
permutation that orders the rows from early to late within groups and ids,
and that permutation is used as the row subscripts to reorder the data frame.
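To see what `order()` returns when given several keys, here is a toy illustration with made-up vectors: ties in the first key are broken by the second key, and the result is a permutation of row indices rather than the sorted values themselves.

```r
grp <- c("B", "A", "A")
key <- c(2, 2, 1)

# Sort by grp first; within equal grp values, sort by key.
idx <- order(grp, key)   # row 3 ("A", 1), then row 2 ("A", 2), then row 1 ("B", 2)

# Using the permutation as row subscripts reorders a whole data frame.
df <- data.frame(grp, key, row = 1:3)
df_sorted <- df[idx, ]
```

In the thread's case the keys are Group, id and the converted date, so the same mechanism puts each group's rows together in date order.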

DO NOTE: For this to work reliably, your Date column must be consistent and
correct in its formatting!
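One way to check that consistency before sorting, assuming every entry is meant to be month/day/year (an assumption; the thread does not say which): parse with an explicit format and look for `NA`s, because `as.Date()` returns `NA` for strings that do not match the format. The helper name `check_dates` is hypothetical.

```r
# Hypothetical helper: parse m/d/Y strings and warn about entries that do not match.
check_dates <- function(x, format = "%m/%d/%Y") {
  parsed <- as.Date(x, format = format)
  # Blank or missing entries are expected to be NA; anything else is a format problem.
  bad <- which(is.na(parsed) & !is.na(x) & x != "")
  if (length(bad) > 0) {
    warning("Unparseable dates at positions: ", paste(bad, collapse = ", "))
  }
  parsed
}

# The third string is ISO formatted, so it does not match "%m/%d/%Y".
d <- check_dates(c("1/1/2020", "4/1/2020", "2020-04-01"))
```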

Other note: It probably makes more sense to convert your Date column to a
date-time class (POSIXct or POSIXlt) from the beginning, as this makes things
like plotting in date order straightforward. There are also date-time packages
(in the "tidyverse" suite, I think, as well as others) that simplify such
things. I am pretty ignorant about date-time stuff, so I can't really be more
specific, but the CRAN Time Series task view,
https://cran.r-project.org/web/views/TimeSeries.html, has lots of info on this
if you need it. As does searching, of course.
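A sketch of converting the Date column once, up front, assuming month/day/year strings. After conversion the column has class `Date`, so later sorting needs no format string. The helper `to_date` and the data frame name `preds` are hypothetical, and only two groups are used to keep it short.

```r
to_date <- function(x) as.Date(x, format = "%m/%d/%Y")  # hypothetical helper

train <- data.frame(Date = c("2/1/2020", "1/1/2020", "1/1/2020"),
                    Value = c(2.7, 3.5, 2.5),
                    Group = c("A", "A", "B"), id = c(1L, 1L, 101L))
preds <- data.frame(Date = c("4/1/2020", "4/1/2020"), Value = c(0.2, 0.7),
                    Group = c("A", "B"), id = c(1L, 101L))

# Convert once; every later step works on real dates.
train$Date <- to_date(train$Date)
preds$Date <- to_date(preds$Date)

# Combine and order by group, id and date; no format string is needed now.
out <- rbind(train, preds)
out <- out[order(out$Group, out$id, out$Date), ]
```

Both data frames are converted before `rbind()` so that the combined Date column keeps a single class.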

HTH

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


(In reply to Elahe chalabi via R-help, Tue, Apr 13, 2021 at 3:26 AM)

Re: Add a new row based on test set predicted values and time stamps

Bert Gunter
(Revealing my ignorance):

Simpler still than the as.POSIXct() idiom is just to use as.Date():

out <- with(out, out[order(Group, id, as.Date(Date, format = "%m/%d/%Y")), ])

## all else the same...
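A note on the as.Date() idiom: when no `format` is given, `as.Date()` on character input tries only its default `tryFormats`, `"%Y-%m-%d"` and `"%Y/%m/%d"`, neither of which describes strings like "4/1/2020". Passing the format explicitly (month/day/year is assumed here) avoids depending on how those defaults happen to parse such strings. The two-row `out` below is made up for illustration.

```r
# ISO 8601 strings parse with the defaults...
iso <- as.Date("2020-04-01")

# ...but m/d/Y strings need an explicit format.
mdy <- as.Date("4/1/2020", format = "%m/%d/%Y")

# The ordering line with the format made explicit, on a tiny made-up data frame.
out <- data.frame(Date = c("4/1/2020", "1/1/2020"), Group = "A", id = 1L)
out <- with(out, out[order(Group, id, as.Date(Date, format = "%m/%d/%Y")), ])
```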

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


(In reply to Bert Gunter, Tue, Apr 13, 2021 at 10:47 AM)

Re: Add a new row based on test set predicted values and time stamps

Jeff Newmiller
The date you get when applying as.Date to a POSIXct value depends on the time zone. as.Date only pays attention to the underlying UTC seconds-since-epoch value, so it ignores the local time zone, which can be unexpected for most people.

TL;DR: as.Date is not the same as as.POSIXct( trunc( dtm, units="days" ) ) unless your time zone is GMT.
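A small illustration of the time-zone point, assuming the "America/Los_Angeles" zone is available on the system (the instant chosen is arbitrary). `tz` is passed explicitly to `as.Date()` so the example does not depend on its default time zone: the same instant falls on different calendar dates in UTC and in Pacific time, while `trunc(..., units = "days")` works in the time zone attached to the value.

```r
# 8 pm Pacific Daylight Time on 13 April 2021 is 3 am UTC on 14 April.
x <- as.POSIXct("2021-04-13 20:00:00", tz = "America/Los_Angeles")

d_utc   <- as.Date(x, tz = "UTC")                  # calendar date in UTC
d_local <- as.Date(x, tz = "America/Los_Angeles")  # calendar date in Pacific time

# trunc() keeps the time zone of x, so this is midnight of the local day.
local_midnight <- as.POSIXct(trunc(x, units = "days"))
```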

(In reply to Bert Gunter, April 13, 2021 10:55:04 AM PDT)

--
Sent from my phone. Please excuse my brevity.
