lag.formula function - RDocumentation (2024)

Description

Lags a variable using panel id + time identifiers in a formula.
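The idea of lagging with panel identifiers can be sketched in base R. The helper `panel_lag` below is hypothetical. It is not how fixest implements `lag.formula`, and it assumes an integer time variable with a unit step, no missing times, and unique (id, time) pairs. Each observation takes the value of `x` found at the same id and at time `t - k`. A negative `k` therefore gives a lead, and `fill` replaces values that cannot be found.

```r
# Hypothetical sketch (not fixest's implementation): panel lag with unit time step.
# Assumes unique (id, time) pairs; match() would silently use the first duplicate.
panel_lag <- function(x, id, time, k = 1, fill = NA) {
  # Key identifying each observation, and key of the row it should be lagged from
  key_self <- paste(id, time, sep = "_")
  key_target <- paste(id, time - k, sep = "_")
  # Row position of (id, time - k); NA when that row does not exist
  pos <- match(key_target, key_self)
  # Observations with a missing time never get a lag
  pos[is.na(time)] <- NA_integer_
  res <- x[pos]
  res[is.na(pos)] <- fill
  res
}

base <- data.frame(id = rep(1:2, each = 4),
                   time = c(1, 2, 3, 4, 1, 4, 6, 9), x = 1:8)
base$lag1 <- panel_lag(base$x, base$id, base$time, k = 1)    # lag 1
base$lead1 <- panel_lag(base$x, base$id, base$time, k = -1)  # lead 1
base$lag2_fill0 <- panel_lag(base$x, base$id, base$time, k = 2, fill = 0)
print(base)
```

Looking rows up by a pasted (id, time) key keeps the result aligned with the original row order, so the output has the same length as `x`.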

Usage

# S3 method for formula
lag(x, k = 1, data, time.step = NULL, fill = NA,
    duplicate.method = c("none", "first"), ...)

Arguments

x

A formula of the type var ~ id + time, where var is the variable to be lagged, id is a variable representing the panel id, and time is the time variable of the panel.

k

An integer giving the number of lags. Default is 1. For leads, use a negative number.

data

Optional. The data.frame in which to evaluate the formula. If not provided, the variables are fetched from the current environment.

time.step

The method used to compute the lags. Default is NULL, which means it is set automatically. It can be equal to "unitary", "consecutive", "within.consecutive", or a number. If "unitary", the time unit is the greatest common divisor of the differences between consecutive time periods (typically, if the time variable represents years, it will be 1). This method applies only to integer variables (or variables convertible to integer). If "consecutive", the time variable can be of any type: two successive time periods represent a lag of 1. If "within.consecutive", then **within a given id**, two successive time periods represent a lag of 1. Finally, if the time variable is numeric, you can provide your own numeric time step.

fill

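Scalar. The value used to fill observations for which the lead/lag is not defined. Default is NA.

duplicate.method

How to handle duplicate (id, time) pairs. With the default, "none", duplicates lead to an error. With "first", the first occurrence of each (id, time) pair is used to compute the lag.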
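The time.step options above differ only in how raw time values are turned into integer periods; a lag of k then means "k periods earlier, same id". The base-R sketch below illustrates that reading. The helpers `to_period` and `lag_by_period` are hypothetical and are not fixest's code. They assume no missing time values, integer-valued times for "unitary" (as the documentation requires), and at least two distinct times.

```r
# Hypothetical helper (not fixest's code): map time values to integer periods.
to_period <- function(time, id,
                      time.step = c("unitary", "consecutive", "within.consecutive")) {
  if (is.numeric(time.step)) {
    # User-supplied step: number of steps separating each time from the minimum
    period <- (time - min(time)) / time.step + 1
    if (any(abs(period - round(period)) > 1e-8)) {
      stop("time.step does not fit the time values")
    }
    return(round(period))
  }
  time.step <- match.arg(time.step)  # allows partial names such as "within"
  if (time.step == "unitary") {
    # Unit = greatest common divisor of the gaps between successive distinct times
    gcd <- function(a, b) if (b == 0) a else gcd(b, a %% b)
    unit <- Reduce(gcd, diff(sort(unique(time))))
    (time - min(time)) / unit + 1
  } else if (time.step == "consecutive") {
    # Rank of each time among all distinct times of the panel
    match(time, sort(unique(time)))
  } else {
    # Rank of each time among the distinct times of its own id
    ave(time, id, FUN = function(t) match(t, sort(unique(t))))
  }
}

# Lag = value at the same id, k periods earlier (NA when absent)
lag_by_period <- function(x, id, period, k = 1) {
  x[match(paste(id, period - k), paste(id, period))]
}

base <- data.frame(id = rep(1:2, each = 4),
                   time = c(1, 2, 3, 4, 1, 4, 6, 9), x = 1:8)
for (step in c("unitary", "consecutive", "within.consecutive")) {
  p <- to_period(base$time, base$id, step)
  base[[paste0("lag1_", step)]] <- lag_by_period(base$x, base$id, p)
}
print(base)

# A numeric step of 0.5 with k = 2 moves back one time unit
p_half <- to_period(base$time, base$id, 0.5)
lag_by_period(base$x, base$id, p_half, k = 2)
```

Under "consecutive", individual 2's times 1, 4, 6, 9 become periods 1, 4, 5, 6, so time 6 finds time 4 and time 9 finds time 6; under "within.consecutive" they become 1, 2, 3, 4 and every row after the first gets a lag.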

Value

It returns a vector of the same type and length as the variable to be lagged in the formula.

Examples


library(fixest)

# simple example with an unbalanced panel
base = data.frame(id = rep(1:2, each = 4),
                  time = c(1, 2, 3, 4, 1, 4, 6, 9), x = 1:8)

base$lag1 = lag(x~id+time, 1, base) # lag 1
base$lead1 = lag(x~id+time, -1, base) # lead 1
base$lag2_fill0 = lag(x~id+time, 2, base, fill = 0)

# with time.step = "consecutive"
base$lag1_consecutive = lag(x~id+time, 1, base, time.step = "consecutive")
# => works for indiv. 2 because 9 (resp. 6) is consecutive to 6 (resp. 4)

base$lag1_within.consecutive = lag(x~id+time, 1, base, time.step = "within")
# => now two consecutive periods within each individual are one lag apart

print(base)

# Argument time.step = "consecutive" is
# mostly useful when the time variable is not a number:
# e.g. c("1991q1", "1991q2", "1991q3") etc.

# with duplicates
base_dup = data.frame(id = rep(1:2, each = 4),
                      time = c(1, 1, 1, 2, 1, 2, 2, 3), x = 1:8)

# Error because of duplicate values for (id, time)
try(lag(x~id+time, 1, base_dup))

# Error is bypassed, lag corresponds to first occurrence of (id, time)
lag(x~id+time, 1, base_dup, duplicate.method = "first")

# Playing with time steps
base = data.frame(id = rep(1:2, each = 4),
                  time = c(1, 2, 3, 4, 1, 4, 6, 9), x = 1:8)

# time step: 0.5 (2 steps of 0.5, here equivalent to a lag of 1)
lag(x~id+time, 2, base, time.step = 0.5)

# Error: wrong time step
try(lag(x~id+time, 2, base, time.step = 7))

# Adding NAs + unsorted IDs
base = data.frame(id = rep(1:2, each = 4),
                  time = c(4, NA, 3, 1, 2, NA, 1, 3), x = 1:8)

base$lag1 = lag(x~id+time, 1, base)
base$lag1_within = lag(x~id+time, 1, base, time.step = "w")
base_bis = base[order(base$id, base$time),]

print(base_bis)

# You can create variables without specifying the data within data.table:
if(require("data.table")){
  base = data.table(id = rep(1:2, each = 3), year = 1990 + rep(1:3, 2), x = 1:6)
  base[, x.l1 := lag(x~id+year, 1)]
}
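The duplicate examples above can be read as follows: by default, repeated (id, time) pairs are refused, while "first" takes the lagged value from the first row of each repeated pair. The sketch below encodes that reading with a hypothetical helper `lag_dup`. It is an assumption about the behavior suggested by the example comments, not fixest's code, and it assumes a unit time step.

```r
# Hypothetical helper (not fixest's code) mimicking the duplicate.method choices.
lag_dup <- function(x, id, time, k = 1, duplicate.method = c("none", "first")) {
  duplicate.method <- match.arg(duplicate.method)
  # "none": refuse any repeated (id, time) pair
  if (duplicate.method == "none" && anyDuplicated(data.frame(id, time)) > 0) {
    stop("duplicate (id, time) pairs found")
  }
  # "first": match() returns the first row carrying the target (id, time) key
  key <- paste(id, time)
  x[match(paste(id, time - k), key)]
}

base_dup <- data.frame(id = rep(1:2, each = 4),
                       time = c(1, 1, 1, 2, 1, 2, 2, 3), x = 1:8)
try(lag_dup(base_dup$x, base_dup$id, base_dup$time))  # error with the default
lag_dup(base_dup$x, base_dup$id, base_dup$time, duplicate.method = "first")
```

In this sketch, individual 2's row at time 3 takes its lag from the first of the two rows at time 2 (x = 6 is ignored in favor of x = 6's predecessor row, x = 6 being the second occurrence).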
