Data Table Merge Help

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Data Table Merge Help

Graeve, Nick
Hello

I'm not sure if this is an appropriate use of this mailing list or not, please let me know if it isn't.  I'm struggling to figure out how to merge two data tables based on max effective date logic compared to when a payment occurred.  My dtDistributions DT is a transactional dataset while dtDepartments is a domain data set containing all department names and the effective date of when department name changes have occurred.  For the Bob example below, there was a payment on 2016-01-01 which occurred in H229000.  In 2012, this department was named "Modified Name", in 2019 the department will be named "Final Name".  When I merge these two tables, I'd like it to pull the transactional data and match it up to department name "Modified Name" since that was the active department name at the time of that transaction.  I've read documentation on foverlaps, but I'm not sure if this problem is considered a range of dates or not.  At the bottom of this post is a temporarily solution that is working but it runs for a long time due to the amount of data in my actual source.

Here is some sample data to get started:
library(data.table)
dtDistributions <- data.table(PayeeName = c("Bob", "Tracy", "Tom"),
                          Department = factor(c("H229000", "H135000", "H047800")),
                          Amount = c(5, 34, 87),
                          PaymentDT = as.Date(c("2016-01-01", "2015-01-01", "2015-01-01")))

dtDepartments <- data.table(Department = factor(c("H229000", "H229000", "H229000", "H135000", "H047800")),
                        EffDT = as.Date(c("2019-01-01", "2012-01-01", "1901-01-01", "1901-01-01", "1901-01-01")),
                        Descr = c("Final Name","Modified Name","Original Name","Payables","Postal"))

Here is the output I would like to see:
PayeeName  Department     PaymentDT   Amount
Bob        Modified Name  2016-01-01  5
Tracy      Payables       2015-01-01  34
Tom        Postal         2015-01-01  87

I was able to get this working by using the sqldf library, but it runs for a very long time in my actual dataset and I'd like to use data.table if at all possible.
library(sqldf)
joinString <- "SELECT A.PayeeName, B.Descr, A.PaymentDT, A.Amount
            FROM dtDistributions A, dtDepartments B
            WHERE A.DEPARTMENT = B.Department
            AND B.EffDT = (SELECT MAX(ED.EffDT)
                            FROM dtDepartments ED
                            WHERE B.Department = ED.Department
                            AND ED.EffDT <= A.PaymentDT)"

finalDT <- data.table(sqldf(joinString))



-----Message Disclaimer-----

This e-mail message is intended only for the use of the individual or entity to which it is addressed, and may contain information that is privileged, confidential and exempt from disclosure under applicable law. If you are not the intended recipient, any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please notify us immediately by reply email to [hidden email] and delete or destroy all copies of the original message and attachments thereto. Email sent to or from the Principal Financial Group or any of its member companies may be retained as required by law or regulation.

Nothing in this message is intended to constitute an Electronic signature for purposes of the Uniform Electronic Transactions Act (UETA) or the Electronic Signatures in Global and National Commerce Act ("E-Sign") unless a specific statement to the contrary is included in this message.

If you no longer wish to receive any further solicitation from the Principal Financial Group you may unsubscribe at https://www.principal.com/do-not-contact-form any time.

If you are a Canadian resident and no longer wish to receive commercial electronic messages you may unsubscribe at https://www.principal.com/do-not-email-request-canadian-residents any time.





This message was secured by Zix(R).

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Data Table Merge Help

Bert Gunter-2
Did you search first? (This is suggested by the posting guide -- below
-- prior to posting).

"merge 2 data.tables in R" brought up what looked like useful stuff,
in particular the  merge() function for data tables. If this does not
do what you want, it may help to explain why not.

Alternatively, there is a merge.data.frame function that may do the
job if you first convert your data.table to a data.frame.

As I do not use the data.table package, you or others may have to fill
in details to make these work -- if they *can* work.

Cheers,
Bert



Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Thu, Feb 1, 2018 at 9:45 AM, Graeve, Nick <[hidden email]> wrote:
> Hello
>
> I'm not sure if this is an appropriate use of this mailing list or not, please let me know if it isn't.  I'm struggling to figure out how to merge two data tables based on max effective date logic compared to when a payment occurred.  My dtDistributions DT is a transactional dataset while dtDepartments is a domain data set containing all department names and the effective date of when department name changes have occurred.  For the Bob example below, there was a payment on 2016-01-01 which occurred in H229000.  In 2012, this department was named "Modified Name", in 2019 the department will be named "Final Name".  When I merge these two tables, I'd like it to pull the transactional data and match it up to department name "Modified Name" since that was the active department name at the time of that transaction.  I've read documentation on foverlaps, but I'm not sure if this problem is considered a range of dates or not.  At the bottom of this post is a temporarily solution that is wo
 rking but it runs for a long time due to the amount of data in my actual source.

>
> Here is some sample data to get started:
> library(data.table)
> dtDistributions <- data.table(PayeeName = c("Bob", "Tracy", "Tom"),
>                           Department = factor(c("H229000", "H135000", "H047800")),
>                           Amount = c(5, 34, 87),
>                           PaymentDT = as.Date(c("2016-01-01", "2015-01-01", "2015-01-01")))
>
> dtDepartments <- data.table(Department = factor(c("H229000", "H229000", "H229000", "H135000", "H047800")),
>                         EffDT = as.Date(c("2019-01-01", "2012-01-01", "1901-01-01", "1901-01-01", "1901-01-01")),
>                         Descr = c("Final Name","Modified Name","Original Name","Payables","Postal"))
>
> Here is the output I would like to see:
> PayeeName  Department     PaymentDT   Amount
> Bob        Modified Name  2016-01-01  5
> Tracy      Payables       2015-01-01  34
> Tom        Postal         2015-01-01  87
>
> I was able to get this working by using the sqldf library, but it runs for a very long time in my actual dataset and I'd like to use data.table if at all possible.
> library(sqldf)
> joinString <- "SELECT A.PayeeName, B.Descr, A.PaymentDT, A.Amount
>             FROM dtDistributions A, dtDepartments B
>             WHERE A.DEPARTMENT = B.Department
>             AND B.EffDT = (SELECT MAX(ED.EffDT)
>                             FROM dtDepartments ED
>                             WHERE B.Department = ED.Department
>                             AND ED.EffDT <= A.PaymentDT)"
>
> finalDT <- data.table(sqldf(joinString))
>
>
>
> -----Message Disclaimer-----
>
> This e-mail message is intended only for the use of the individual or entity to which it is addressed, and may contain information that is privileged, confidential and exempt from disclosure under applicable law. If you are not the intended recipient, any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please notify us immediately by reply email to [hidden email] and delete or destroy all copies of the original message and attachments thereto. Email sent to or from the Principal Financial Group or any of its member companies may be retained as required by law or regulation.
>
> Nothing in this message is intended to constitute an Electronic signature for purposes of the Uniform Electronic Transactions Act (UETA) or the Electronic Signatures in Global and National Commerce Act ("E-Sign") unless a specific statement to the contrary is included in this message.
>
> If you no longer wish to receive any further solicitation from the Principal Financial Group you may unsubscribe at https://www.principal.com/do-not-contact-form any time.
>
> If you are a Canadian resident and no longer wish to receive commercial electronic messages you may unsubscribe at https://www.principal.com/do-not-email-request-canadian-residents any time.
>
>
>
>
>
> This message was secured by Zix(R).
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Data Table Merge Help

Jeff Newmiller
In reply to this post by Graeve, Nick
I rarely use data.table, but I think the vignette for the package discusses rolling joins. Also,  Google popped up [1].

[1] https://www.r-bloggers.com/understanding-data-table-rolling-joins/
--
Sent from my phone. Please excuse my brevity.

On February 1, 2018 9:45:53 AM PST, "Graeve, Nick" <[hidden email]> wrote:

>Hello
>
>I'm not sure if this is an appropriate use of this mailing list or not,
>please let me know if it isn't.  I'm struggling to figure out how to
>merge two data tables based on max effective date logic compared to
>when a payment occurred.  My dtDistributions DT is a transactional
>dataset while dtDepartments is a domain data set containing all
>department names and the effective date of when department name changes
>have occurred.  For the Bob example below, there was a payment on
>2016-01-01 which occurred in H229000.  In 2012, this department was
>named "Modified Name", in 2019 the department will be named "Final
>Name".  When I merge these two tables, I'd like it to pull the
>transactional data and match it up to department name "Modified Name"
>since that was the active department name at the time of that
>transaction.  I've read documentation on foverlaps, but I'm not sure if
>this problem is considered a range of dates or not.  At the bottom of
>this post is a temporarily solution that is working but it runs for a
>long time due to the amount of data in my actual source.
>
>Here is some sample data to get started:
>library(data.table)
>dtDistributions <- data.table(PayeeName = c("Bob", "Tracy", "Tom"),
>                          Department = factor(c("H229000", "H135000",
>"H047800")),
>                          Amount = c(5, 34, 87),
>                          PaymentDT = as.Date(c("2016-01-01",
>"2015-01-01", "2015-01-01")))
>
>dtDepartments <- data.table(Department = factor(c("H229000", "H229000",
>"H229000", "H135000", "H047800")),
>                        EffDT = as.Date(c("2019-01-01", "2012-01-01",
>"1901-01-01", "1901-01-01", "1901-01-01")),
>                        Descr = c("Final Name","Modified
>Name","Original Name","Payables","Postal"))
>
>Here is the output I would like to see:
>PayeeName  Department     PaymentDT   Amount
>Bob        Modified Name  2016-01-01  5
>Tracy      Payables       2015-01-01  34
>Tom        Postal         2015-01-01  87
>
>I was able to get this working by using the sqldf library, but it runs
>for a very long time in my actual dataset and I'd like to use
>data.table if at all possible.
>library(sqldf)
>joinString <- "SELECT A.PayeeName, B.Descr, A.PaymentDT, A.Amount
>            FROM dtDistributions A, dtDepartments B
>            WHERE A.DEPARTMENT = B.Department
>            AND B.EffDT = (SELECT MAX(ED.EffDT)
>                            FROM dtDepartments ED
>                            WHERE B.Department = ED.Department
>                            AND ED.EffDT <= A.PaymentDT)"
>
>finalDT <- data.table(sqldf(joinString))
>
>
>
>-----Message Disclaimer-----
>
>This e-mail message is intended only for the use of the individual or
>entity to which it is addressed, and may contain information that is
>privileged, confidential and exempt from disclosure under applicable
>law. If you are not the intended recipient, any dissemination,
>distribution or copying of this communication is strictly prohibited.
>If you have received this communication in error, please notify us
>immediately by reply email to [hidden email] and delete or
>destroy all copies of the original message and attachments thereto.
>Email sent to or from the Principal Financial Group or any of its
>member companies may be retained as required by law or regulation.
>
>Nothing in this message is intended to constitute an Electronic
>signature for purposes of the Uniform Electronic Transactions Act
>(UETA) or the Electronic Signatures in Global and National Commerce Act
>("E-Sign") unless a specific statement to the contrary is included in
>this message.
>
>If you no longer wish to receive any further solicitation from the
>Principal Financial Group you may unsubscribe at
>https://www.principal.com/do-not-contact-form any time.
>
>If you are a Canadian resident and no longer wish to receive commercial
>electronic messages you may unsubscribe at
>https://www.principal.com/do-not-email-request-canadian-residents any
>time.
>
>
>
>
>
>This message was secured by Zix(R).
>
>______________________________________________
>[hidden email] mailing list -- To UNSUBSCRIBE and more, see
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.