Creating a conditional lag variable in R

F86
Dear R-users,

I’ve a rather complicated task to do and need all the help I can get.

I have data indicating whether a country has signed an agreement or not (1=yes and 0=otherwise). I simply want to create variables that capture the years before the agreement is signed. The aim is to see whether the pre- or post-agreement period has any impact on my dependent variables.

More precisely, I want to create the following variables:
(i) a variable that is =1 in the 4 years before the agreement, 0 otherwise;
(ii) a variable that is =1 in the 5 years before the agreement, 0 otherwise; and
(iii) a variable that counts the years within the 4- and 5-year pre-agreement windows (1,2,3,4..).

Please see the sample data below. I have manually added the variables I would like to generate in R, labelled “X1_pre4” (4 years before the agreement X1), “X2_pre4”, “X1_pre5” (5 years before the agreement X1), and “X1_pre5_count” (which basically counts the years: 1, 2, 3, etc.). X1 and X2 are the agreements that countries have either signed (1) or not (0). Note though that I want the variables to capture all the available years up to 4 or 5. If there are only 2 years before the agreement, they should still be ==1 (please see the example below).

To illustrate the logic: country A signed agreement X1 in 1972 in the sample data, so variables (i) and (ii) above should be =1 for the years 1970 and 1971, and =0 from 1972 until the end of the study period.

Country A signed agreement X2 in 1975, so variable (i) should be =1 from 1971 to 1974 (the 4 years before the agreement) and variable (ii) should be =1 for the 1970-1974 period (the 5 years before the agreement is signed).

Later, I would also like to create post_4 and post_5 variables, but I think I’ll be able to figure it out once I know how to generate the pre/before variables.

All suggestions are much appreciated!



data<–structure(list(country = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L), .Label = c("A", "B", "C"), class = "factor"),
    year = c(1970L, 1971L, 1972L, 1973L, 1974L, 1975L, 1976L,
    1977L, 1978L, 1979L, 1980L, 1981L, 1982L, 1983L, 1984L, 1985L,
    1986L, 1987L, 1988L, 1970L, 1971L, 1972L, 1973L, 1974L, 1975L,
    1976L, 1977L, 1978L, 1979L, 1980L, 1981L, 1982L, 1983L, 1984L,
    1985L, 1986L, 1987L, 1988L, 1970L, 1971L, 1972L, 1973L, 1974L,
    1975L, 1976L, 1977L, 1978L, 1979L, 1980L, 1981L, 1982L, 1983L,
    1984L, 1985L, 1986L, 1987L, 1988L, 1989L, 1990L, 1991L),
    X1 = c(0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
    1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L,
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L,
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L,
    1L, 1L), X2 = c(0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L,
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L,
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L,
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L,
    1L, 1L, 1L, 1L), X1_pre4 = c(1L, 1L, 0L, 0L, 0L, 0L, 0L,
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
    0L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L,
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), X2_pre4 = c(0L, 1L, 1L,
    1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
    0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L,
    1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), X1_pre5 = c(1L,
    1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
    0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L,
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
    0L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
    X1_pre5_count = c(1L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 2L, 3L,
    4L, 5L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 2L, 3L, 4L, 5L, 0L, 0L, 0L,
    0L, 0L, 0L, 0L, 0L)), class = "data.frame", row.names = c(NA,
-60L))



        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Creating a conditional lag variable in R

Bert Gunter-2
Because you posted in HTML, your example got mangled and resulted in an
error. Re-post in *plain text* please (making sure that you cut and paste
correctly)

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Fri, Jul 26, 2019 at 12:25 PM Faradj Koliev <[hidden email]> wrote:

> Dear R-users,
>
> I’ve a rather complicated task to do and need all the help I can get.

Re: Creating a conditional lag variable in R

Jim Lemon-4
In reply to this post by F86
Hi Faradj,
There is a problem with your structure statement in that the hyphen
(-) following the left angle bracket (<) has been transformed into a
fancy hyphen somewhere in the process. I replaced it with an ordinary
hyphen and it worked okay. Also, your coding for "B" seems to include
the first year (1970) of "C". I have taken the liberty of correcting
this.
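
The thread does not show how the country coding was corrected. One possible repair, offered only as a sketch, assumes the rows are in year order within each country, so a new country begins wherever the year fails to increase. The toy vectors and the names `block` and `country_fixed` are hypothetical stand-ins, not part of the posted data.

```r
# Toy stand-in for the posted columns: three runs of years, but the
# country labels give the second country one row too many.
year    <- c(1970:1972, 1970:1972, 1970:1973)
country <- factor(rep(c("A", "B", "C"), times = c(3, 4, 3)))

# Assumption: rows are in year order within each country, so a new
# country begins wherever the year does not increase.
block <- cumsum(c(TRUE, diff(year) <= 0))

# Relabel each run of years with the matching factor level.
country_fixed <- factor(levels(country)[block], levels = levels(country))

table(country)        # labels before the repair
table(country_fixed)  # labels after the repair
```

The same two lines could be applied to `data$year` and `data$country`, provided the year-order assumption holds for the real data.
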
What you are trying to do looks a bit over-complicated to me. The
variables X1 and X2 are very redundant. Perhaps you only need to know
when the agreement was signed (1972 for country A) and when the
agreement was proposed (1970?). Thus for each country, you could
generate a delay for signing agreement X1 (A=2, B=8, C=14) and X2
(A=5, B=1, C=13). If you want to categorize the delay, I would suggest
ensuring that the category breaks are meaningful [1].
However, to answer your question, I would create X1_pre4 and
X1_pre4_count as simply the inverse of X1, then use "cumsum" to create
the year counting variable. Then change every instance of
X1_pre4_count greater than 4 to zero and also the corresponding values
of X1_pre4.

# Start both variables as the inverse of X1 (1 before signing, 0 after)
data$X1_pre4<-ifelse(data$X1,0,1)
data$X1_pre4_count<-data$X1_pre4
# Running count of unsigned years within each country
data$X1_pre4_count[data$country=="A"]<-
 cumsum(data$X1_pre4_count[data$country=="A"])
data$X1_pre4_count[data$country=="B"]<-
 cumsum(data$X1_pre4_count[data$country=="B"])
data$X1_pre4_count[data$country=="C"]<-
 cumsum(data$X1_pre4_count[data$country=="C"])
# Zero out the signed years and any count beyond 4
knockout<-which(data$X1 == 1 | data$X1_pre4_count > 4)
data$X1_pre4_count[knockout]<-0
data$X1_pre4[knockout]<-0

Same for "X2" and "pre5".
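
The per-country assignments above could probably be written once with ave(), which applies a function within groups. The sketch below uses a hypothetical toy data frame and hypothetical column names ending in "_pre_count". Like the cumsum approach above, it counts forward from the first observed year of each country, not backward from the signing year.

```r
# Hypothetical toy panel: two countries, six years each.
toy <- data.frame(
  country = rep(c("A", "B"), each = 6),
  X1 = c(0, 0, 1, 1, 1, 1,  0, 0, 0, 0, 0, 1),
  X2 = c(0, 0, 0, 1, 1, 1,  0, 1, 1, 1, 1, 1)
)

for (v in c("X1", "X2")) {
  # 1 while the agreement is not yet signed, 0 afterwards
  not_yet <- 1 - toy[[v]]
  # Running count within each country, forced back to 0 once signed
  toy[[paste0(v, "_pre_count")]] <-
    ave(not_yet, toy$country, FUN = cumsum) * not_yet
}

toy
```

To mimic the 4-year cut-off, counts greater than 4 can be set to zero afterwards, as in the knockout step.
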

Jim

[1] Lemon, J. (2009). On the perils of categorizing responses.
Tutorials in Quantitative Methods for Psychology, 5(1), 35-39.

On Sat, Jul 27, 2019 at 5:25 AM Faradj Koliev <[hidden email]> wrote:

>
> Dear R-users,
>
> I’ve a rather complicated task to do and need all the help I can get.

Re: Creating a conditional lag variable in R

F86
In reply to this post by Bert Gunter-2
Re-post, now in *plain text*.



data<-structure(list(country = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L), .Label = c("A", "B", "C"), class = "factor"),
    year = c(1970L, 1971L, 1972L, 1973L, 1974L, 1975L, 1976L,
    1977L, 1978L, 1979L, 1980L, 1981L, 1982L, 1983L, 1984L, 1985L,
    1986L, 1987L, 1988L, 1970L, 1971L, 1972L, 1973L, 1974L, 1975L,
    1976L, 1977L, 1978L, 1979L, 1980L, 1981L, 1982L, 1983L, 1984L,
    1985L, 1986L, 1987L, 1988L, 1970L, 1971L, 1972L, 1973L, 1974L,
    1975L, 1976L, 1977L, 1978L, 1979L, 1980L, 1981L, 1982L, 1983L,
    1984L, 1985L, 1986L, 1987L, 1988L, 1989L, 1990L, 1991L),
    X1 = c(0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
    1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L,
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L,
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L,
    1L, 1L), X2 = c(0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L,
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L,
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L,
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L,
    1L, 1L, 1L, 1L), X1_pre4 = c(1L, 1L, 0L, 0L, 0L, 0L, 0L,
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
    0L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L,
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), X2_pre4 = c(0L, 1L, 1L,
    1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
    0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L,
    1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), X1_pre5 = c(1L,
    1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
    0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L,
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
    0L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
    X1_pre5_count = c(1L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 2L, 3L,
    4L, 5L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 2L, 3L, 4L, 5L, 0L, 0L, 0L,
    0L, 0L, 0L, 0L, 0L)), class = "data.frame", row.names = c(NA,
-60L))

> On 26 Jul 2019, at 21:58, Bert Gunter <[hidden email]> wrote:
>
> Because you posted in HTML, your example got mangled and resulted in an error. Re-post in *plain text* please (making sure that you cut and paste correctly)

Re: Creating a conditional lag variable in R

Peter Dalgaard-2
Some pointers (not tested, may contain blunders...)

(a) you likely need some sort of split-operate-unsplit construct, by country. E.g.,

myfun <- function(d) {
  # ....operate on data frame with only one country....
  d
}
ll <- split(data, data$country)
ll.new <- lapply(ll, myfun)
data.new <- unsplit(ll.new, data$country)

(There might be a tidyverse idiom for this too)

(b) your X1_pre5_count looks like it is the same as cumsum(1-X1)*(1-X1) (within country)

(c) if you count in the opposite direction, tt <- rev(cumsum(rev(1-X1))) you get number of years until agreement. Then X1_pre4 should be as.integer(tt <=4  & tt > 0)

-pd
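
Pointers (a) and (c) can be combined into one untested-on-the-real-data sketch. It assumes rows are sorted by year within country and that an agreement variable stays at 1 once signed. The toy data frame, the helper name `add_pre` and its arguments `var` and `k` are hypothetical. The running count is taken as cumsum of the window indicator, which is an assumption chosen to match the hand-made X1_pre5_count column (1, 2, ... inside the window, 0 elsewhere).

```r
# Hypothetical toy panel: A signs X1 in 1972, B signs X1 in 1978.
toy <- data.frame(
  country = rep(c("A", "B"), each = 10),
  year    = rep(1970:1979, times = 2),
  X1      = c(rep(0:1, times = c(2, 8)), rep(0:1, times = c(8, 2)))
)

# Hypothetical helper: add pre-agreement variables for ONE country.
# Assumes rows are in year order and `var` never returns to 0.
add_pre <- function(d, var = "X1", k = 4) {
  x   <- d[[var]]
  # Years remaining until signing: ..., 2, 1 before; 0 from signing on
  tt  <- rev(cumsum(rev(1 - x)))
  # 1 in the (at most) k years just before signing, 0 otherwise
  pre <- as.integer(tt > 0 & tt <= k)
  d[[paste0(var, "_pre", k)]] <- pre
  # Running count inside the window, 0 outside it
  d[[paste0(var, "_pre", k, "_count")]] <- cumsum(pre) * pre
  d
}

# Split by country, operate on each piece, put the pieces back.
pieces <- split(toy, toy$country)
pieces <- lapply(pieces, add_pre, var = "X1", k = 4)
toy4   <- unsplit(pieces, toy$country)

toy4
```

On this toy panel, country A gets X1_pre4 = 1 only for 1970 and 1971 (just two years are available) and country B gets it for 1974 to 1977. Changing `var` and `k` would give the X2 and 5-year variants on data that contains those columns.
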

> On 27 Jul 2019, at 09:13 , Faradj Koliev <[hidden email]> wrote:
>
> Re-post, now in *plain text*.
>
>
>
> Dear R-users,
>
> I’ve a rather complicated task to do and need all the help I can get.
>
> I have data indicating whether a country has signed an agreement or not (1=yes and 0=otherwise). I want to simply create variable that would capture the years before the agreement is signed. The aim is to see whether pre or post agreement period has any impact on my dependent variables.
>
> More preciesly, I want to create the following variables:
> (i) a variable that is =1 in the 4 years pre/before the agreement, 0 otherwise;
> (ii) a variable that is =1 5 years pre the agreement and
> (iii) a variable that would count the 4 and 5 years pre the agreement (1,2,3,4..).
>
> Please see the sample data below. I have manually added the variables I would like to generate in R, labelled as “X1_pre4” ( 4 years before the agreement X1), “X2_pre4”, “X1_pret5” ( 5 years before the agreement X5), and “X1pre5_count” (which basically count the years, 1,2,3, etc). The X1 and X2 is the agreement that countries have either signed (1) or not (0). Note though that I want the variable to capture all the years up to 4 and 5. If it’s only 2 years, it should still be ==1 (please see the example below).
>
> To illustrate the logic: the country A has signed the agreement X1 in 1972 in the sample data,  then, the (i) and (ii) variables as above should be =1 for the years 1970, 1971, and =0 from 1972 until the end of the study period.
>
> The country A has signed the agreement X2 in 1975,  then, the (i) variable should be =1 from 1971 to 1974 (post 4 years) and (ii) should be =1 for the  1970-1974  period (post 5 years before the agreement is signed).
>
> Later, I would also like to create post_4 and post_5 variables, but I think I’ll be able to figure it out once I know how to generate the pre/before variables.
>
> All suggestions are much appreciated!
>
>
>
> data<-structure(list(country = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
> 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
> 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
> 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("A", "B", "C"), class = "factor"),
>    year = c(1970L, 1971L, 1972L, 1973L, 1974L, 1975L, 1976L,
>    1977L, 1978L, 1979L, 1980L, 1981L, 1982L, 1983L, 1984L, 1985L,
>    1986L, 1987L, 1988L, 1970L, 1971L, 1972L, 1973L, 1974L, 1975L,
>    1976L, 1977L, 1978L, 1979L, 1980L, 1981L, 1982L, 1983L, 1984L,
>    1985L, 1986L, 1987L, 1988L, 1970L, 1971L, 1972L, 1973L, 1974L,
>    1975L, 1976L, 1977L, 1978L, 1979L, 1980L, 1981L, 1982L, 1983L,
>    1984L, 1985L, 1986L, 1987L, 1988L, 1989L, 1990L, 1991L),
>    X1 = c(0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
>    1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L,
>    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L,
>    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L,
>    1L, 1L), X2 = c(0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L,
>    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L,
>    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L,
>    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L,
>    1L, 1L, 1L, 1L), X1_pre4 = c(1L, 1L, 0L, 0L, 0L, 0L, 0L,
>    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
>    0L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
>    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L,
>    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), X2_pre4 = c(0L, 1L, 1L,
>    1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
>    0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
>    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L,
>    1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), X1_pre5 = c(1L,
>    1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
>    0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L,
>    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
>    0L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
>    X1_pre5_count = c(1L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
>    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 2L, 3L,
>    4L, 5L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
>    0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 2L, 3L, 4L, 5L, 0L, 0L, 0L,
>    0L, 0L, 0L, 0L, 0L)), class = "data.frame", row.names = c(NA,
> -60L))
>
>> On 26 Jul 2019, at 21:58, Bert Gunter <[hidden email]> wrote:
>>
>> Because you posted in HTML, your example got mangled and resulted in an error. Re-post in *plain text* please (making sure that you cut and paste correctly)
>>
>> Bert Gunter
>>
>> "The trouble with having an open mind is that people keep coming along and sticking things into it."
>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: [hidden email]  Priv: [hidden email]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: Creating a conditional lag variable in R

F86
Peter Dalgaard,

Thanks for this.

I’ll try to think of ways to apply this logic. At the moment, I’m trying to do this with “mutate” from the dplyr package, but it’s not easy.

> On 27 Jul 2019, at 10:33, peter dalgaard <[hidden email]> wrote:
>
> Some pointers (not tested, may contain blunders...)
>
> (a) you likely need some sort of split-operate-unsplit construct, by country. E.g.,
>
> myfun <- function(d) {
>   # ... operate on a data frame holding a single country ...
>   d
> }
> ll <- split(data, data$country)
> ll.new <- lapply(ll, myfun)
> data.new <- unsplit(ll.new, data$country)
>
> (There might be a tidyverse idiom for this too)
>
> (b) your X1_pre5_count looks like it is the same as cumsum(X1_pre5)*X1_pre5 (within country), i.e. a running count over the rows where X1_pre5 is 1
>
> (c) if you count in the opposite direction, tt <- rev(cumsum(rev(1-X1))), you get the number of years until the agreement. Then X1_pre4 should be as.integer(tt <= 4 & tt > 0)
>
> -pd
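
Pointers (a) and (c) above can be combined into a short base-R sketch. This is only an illustration under stated assumptions: the toy data frame, the helper name add_pre and the "never signed" guard are not from the thread, and the reverse count assumes X1 stays at 1 once a country has signed and that rows are one per year.

```r
# Toy data in the same layout as the sample (values are illustrative only).
toy <- data.frame(
  country = rep(c("A", "B", "C"), each = 9),
  year    = rep(1970:1978, times = 3),
  X1      = c(0, 0, 1, 1, 1, 1, 1, 1, 1,   # A signs in 1972
              0, 0, 0, 0, 0, 0, 0, 1, 1,   # B signs in 1977
              rep(0, 9))                   # C never signs
)

# Hypothetical helper: works on the rows of ONE country and adds X1_pre<k>.
add_pre <- function(d, k) {
  d <- d[order(d$year), ]
  # Years left until the agreement: number of 0s from each row onward.
  tt <- rev(cumsum(rev(1 - d$X1)))
  # Guard (an assumption): a country that never signs has no "pre" period.
  signed <- any(d$X1 == 1)
  d[[paste0("X1_pre", k)]] <- as.integer(signed & tt > 0 & tt <= k)
  d
}

# Split by country, operate on each piece, put the rows back in place.
pieces  <- split(toy, toy$country)
toy_new <- unsplit(lapply(pieces, add_pre, k = 4), toy$country)
```

Calling add_pre again with k = 5 on the result would add X1_pre5 in the same way. The guard matters because, without it, the trailing years of a country that never signs would be counted as "years until agreement".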
>

Re: Creating a conditional lag variable in R

F86
Thank you all. I now have the right solution for this (perhaps of interest to some):

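library(dplyr)

# For each row, flag whether a signing year (idx == 1) falls within the next k rows.
check_pre <- function(idx, k) {
  n <- length(idx)
  pre_vec <- sapply(seq_along(idx), function(x) +any(idx[x:min(x + k, n)] %in% 1))
  pre_vec[idx == 1] <- 0  # the signing year itself is not a "pre" year
  pre_vec
}

# df is the sample data frame (created as `data` earlier in the thread),
# with rows ordered by year within each country.
df %>%
  group_by(country) %>%
  mutate(
    # idx marks the year in which X1 switches from 0 to 1 (or is already 1 in the first row)
    idx = +((lag(X1) == 0 & X1 == 1) | (row_number() == 1 & X1 == 1)),
    X1_pre4 = check_pre(idx, 4),
    X1_pre5 = check_pre(idx, 5),
    idx = NULL
  )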
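
To see what the window logic does, it can help to run it on a single country's vector outside of dplyr. The sketch below restates check_pre in base R and also derives the count variable, which the code above does not produce. Deriving the count as a running sum of the indicator is my assumption about what X1_pre5_count is meant to be, and the example vector is made up.

```r
# Restatement of the window check for one country's vector.
check_pre <- function(idx, k) {
  n <- length(idx)
  pre <- vapply(seq_len(n), function(i) {
    # Is there a signing year in rows i .. i + k (clipped at the end)?
    as.integer(any(idx[i:min(i + k, n)] == 1))
  }, integer(1))
  pre[idx == 1] <- 0L   # the signing year itself is not a "pre" year
  pre
}

X1  <- c(0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1)   # signs in the 9th year
idx <- as.integer(diff(c(0, X1)) == 1)       # 1 only in the signing year

pre5       <- check_pre(idx, 5)
pre5_count <- cumsum(pre5) * pre5            # 1, 2, 3, ... inside the window
```

Here pre5 is 1 for the five rows before the signing year and pre5_count numbers those rows from 1 to 5, which is the pattern the hand-coded X1_pre5 and X1_pre5_count columns show for a country with at least five pre-agreement years.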



