Get Specific Records from Another DataFrame

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Get Specific Records from Another DataFrame

M Can
Hello
I have been struggling with this simple looking problem. I have two
dataframes. The first one contains ID, date, and revenue information for
specific suppliers.

id lastdate depvar
A 5/10/2017 10
B 8/16/2017 20
C 2/14/2017 30
D 9/5/2017 40
E 8/1/2017 50
F 11/4/2017 60
G 6/22/2017 70
The second dataframe contains timeseries data of each supplier in different
columns. For example Column A are the transaction dates of supplier A and
A_indvar is an independent variable value for each date.

A_date A_indvar B_date B_indvar …
1/1/2017 49 2/5/2017 50
1/2/2017 42 2/6/2017 62
1/3/2017 10 2/7/2017 88
1/4/2017 37 2/8/2017 36
1/5/2017 84 2/9/2017 71
1/6/2017 47 2/10/2017 36
1/7/2017 91 2/11/2017 98
Now, I would like to run a regression equation using specific independent
variable values from the second dataset. For example, let say I want to
look at the independent variable values of each supplier 30 days before the
last date given in the first dataset. I want to locate those from the
second dataset and enter them into the first dataset. So that the first
dataset will look like:

id lastdate depvar indvar-30
A 5/10/2017 10 55
B 8/16/2017 20 62
C 2/14/2017 30 74
D 9/5/2017 40 45
E 8/1/2017 50 35
F 11/4/2017 60 56
G 6/22/2017 70 48
I am able to create an array that takes all last dates and then subtracts
30 from them. Then how can I go to the second table and grab the value of
the independent variables for those specific dates? Timeseries of the
suppliers have different start dates.

I feel like I need to use something similar to vlookup in Excel. But a
value in the first dataset (let's say Supplier A) becomes a variable name
in dataset two.

Any hint is appreciated.

Thank you.

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Get Specific Records from Another DataFrame

Jeff Newmiller
I suggest that you read the vignette for the data table package. That package uses some odd syntax compared to base R but has some features designed especially for these kinds of problems.
--
Sent from my phone. Please excuse my brevity.

On March 24, 2018 5:03:44 PM PDT, M Can <[hidden email]> wrote:

>Hello
>I have been struggling with this simple looking problem. I have two
>dataframes. The first one contains ID, date, and revenue information
>for
>specific suppliers.
>
>id lastdate depvar
>A 5/10/2017 10
>B 8/16/2017 20
>C 2/14/2017 30
>D 9/5/2017 40
>E 8/1/2017 50
>F 11/4/2017 60
>G 6/22/2017 70
>The second dataframe contains timeseries data of each supplier in
>different
>columns. For example Column A are the transaction dates of supplier A
>and
>A_indvar is an independent variable value for each date.
>
>A_date A_indvar B_date B_indvar …
>1/1/2017 49 2/5/2017 50
>1/2/2017 42 2/6/2017 62
>1/3/2017 10 2/7/2017 88
>1/4/2017 37 2/8/2017 36
>1/5/2017 84 2/9/2017 71
>1/6/2017 47 2/10/2017 36
>1/7/2017 91 2/11/2017 98
>Now, I would like to run a regression equation using specific
>independent
>variable values from the second dataset. For example, let say I want to
>look at the independent variable values of each supplier 30 days before
>the
>last date given in the first dataset. I want to locate those from the
>second dataset and enter them into the first dataset. So that the first
>dataset will look like:
>
>id lastdate depvar indvar-30
>A 5/10/2017 10 55
>B 8/16/2017 20 62
>C 2/14/2017 30 74
>D 9/5/2017 40 45
>E 8/1/2017 50 35
>F 11/4/2017 60 56
>G 6/22/2017 70 48
>I am able to create an array that takes all last dates and then
>subtracts
>30 from them. Then how can I go to the second table and grab the value
>of
>the independent variables for those specific dates? Timeseries of the
>suppliers have different start dates.
>
>I feel like I need to use something similar to vlookup in Excel. But a
>value in the first dataset (let's say Supplier A) becomes a variable
>name
>in dataset two.
>
>Any hint is appreciated.
>
>Thank you.
>
> [[alternative HTML version deleted]]
>
>______________________________________________
>[hidden email] mailing list -- To UNSUBSCRIBE and more, see
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.