B-DiDi: A Dynamic Bayesian Approach to DiD

Marketing Science Conference 2024

Daniel Winkler

WU-Vienna

Pascal Güntürkün

WU-Vienna

Peter Knaus

Harvard University

June 27, 2024

Overview

3 Parts

Motivation: “Why opt-out defaults save fewer lives than we think”
Dynamic Bayesian Approach to DiD
1. Proposed model
2. Simulation results
Application to policy changes

Motivation: Why opt-out defaults save fewer lives than we think

with Pascal Güntürkün (WU Vienna), Sinika Studte (HSBA Hamburg School of Business Administration), Michel Clement (University of Hamburg), Eva-Maria Merz (VU Amsterdam; Sanquin Research), Elisabeth Huis in ’t Veld (Tilburg University), Jonathan Tan (Nanyang Technological University), Eamonn Ferguson (The University of Nottingham; Cambridge University)

Exploiting changes in policy nudges

Promote socially desirable behavior through subtle changes in the choice environment
while freedom of choice remains intact
Popular with policy makers and researchers
Most popular: opt-out default
- Consent to the socially desirable behavior is presumed
- Individuals can freely opt out
- Examples: organ donation, retirement savings, green energy usage
We studied opt-in vs. opt-out defaults for deceased organ donation

What effect do you expect?

Can default nudges backfire?

Consensus based on meta-analyses: opt-out default policies have positive effects on targeted behavior
- e.g., Benartzi et al., 2017; Jachimowicz et al., 2019; Mertens et al., 2022; Steffel et al., 2019
How does substitutive behavior aimed at the same goal change?
- Living vs. deceased organ donation
- energy usage reduction vs. green energy usage,
- masks vs. vaccine

Switching to opt-out organ donation

Not enough organs available
Deceased donations increase
Living organ donations???

Evaluating the policy change: quasi-experimental setting

Difference-in-differences basics

Number of available organs observed
- for two countries
- over two periods
Country A switches to opt-out between period 1 and 2
Country B remains opt-in
The change (difference B) in country B is used as the counterfactual for the change (difference A) in country A
Estimator: difference A minus difference B (the difference in differences)
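The 2x2 estimator is plain arithmetic on four group means. A minimal Python sketch; the function name `did_2x2` and the organ counts are hypothetical and only illustrate "difference A minus difference B".

```python
def did_2x2(a_pre: float, a_post: float, b_pre: float, b_post: float) -> float:
    """Two-country, two-period difference-in-differences estimate."""
    difference_a = a_post - a_pre  # change in country A (switches to opt-out)
    difference_b = b_post - b_pre  # change in country B (stays opt-in): the counterfactual change
    return difference_a - difference_b


# Hypothetical organ counts, for illustration only
print(did_2x2(a_pre=100.0, a_post=130.0, b_pre=90.0, b_post=105.0))  # 15.0
```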

DiD basics

Key assumption (parallel trends): without the switch, the number of available organs in countries A and B would have developed in parallel

Method: Callaway and Sant’Anna (2021)

For countries adopting the switch at different times
Select units that are not (yet) treated as controls
Estimate group-time average treatment effects and aggregate them
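The three steps can be sketched for the simplest case. This is an assumption-laden sketch, not the `did` package or the authors' code: a balanced panel, no covariates, only post-treatment cells (t ≥ g), and a cohort-size-weighted average as one of several possible aggregation schemes. All function and unit names are hypothetical.

```python
from statistics import mean


def att_gt(y, first_treat, g, t):
    """Unconditional group-time ATT(g, t) for t >= g, no covariates.

    y[i][s - 1] is the outcome of unit i in period s (balanced panel);
    first_treat[i] is the first treated period of unit i, or None if never treated.
    Controls are never-treated units and units not yet treated at t.
    """
    base = g - 1  # last period before cohort g is treated
    treated = [i for i, gi in first_treat.items() if gi == g]
    controls = [i for i, gi in first_treat.items() if gi is None or gi > t]
    if not treated or not controls:
        return None  # ATT(g, t) is not identified without both groups
    change_treated = mean(y[i][t - 1] - y[i][base - 1] for i in treated)
    change_control = mean(y[i][t - 1] - y[i][base - 1] for i in controls)
    return change_treated - change_control


def aggregate_simple(y, first_treat, n_periods):
    """Average of post-treatment ATT(g, t), weighted by cohort size."""
    total, weight = 0.0, 0.0
    # cohorts treated in period 1 have no pre-period and are skipped
    cohorts = sorted({gi for gi in first_treat.values() if gi is not None and gi >= 2})
    for g in cohorts:
        n_g = sum(1 for gi in first_treat.values() if gi == g)
        for t in range(g, n_periods + 1):
            att = att_gt(y, first_treat, g, t)
            if att is not None:
                total += n_g * att
                weight += n_g
    return total / weight


# Hypothetical panel: common trend +1 per period, true effect +5 once treated
y = {
    "u1": [10.0, 16.0, 17.0],  # first treated in period 2
    "u2": [20.0, 21.0, 27.0],  # first treated in period 3
    "u3": [5.0, 6.0, 7.0],     # never treated
    "u4": [8.0, 9.0, 10.0],    # never treated
}
first_treat = {"u1": 2, "u2": 3, "u3": None, "u4": None}
print(aggregate_simple(y, first_treat, n_periods=3))  # 5.0
```

Using not-yet-treated units as controls keeps u2 in the control group for ATT(2, 2), which matters when, as in the organ donation setting, few units are available.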

Research problem

Not suitable for inference on small treated groups (fewer than 5 units)
Frequentist tests are not suitable for showing a null effect or parallel trends: a non-significant result is not evidence of absence (Wasserstein, Schirm, and Lazar 2019)
- A Bayesian shrinkage prior lets the data decide whether an effect is present
- Savage-Dickey density ratios (Wagenmakers et al. 2010) or a region of practical equivalence (ROPE) can provide evidence for a null effect
High variance of the estimates
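Both null-effect tools can be illustrated with a conjugate normal model. This sketch swaps in a simple N(0, prior_sd²) prior instead of the Triple Gamma prior and treats a treatment-effect estimate with a known standard error as a normal likelihood; all names and numbers are hypothetical. The Savage-Dickey ratio is the posterior density at τ = 0 divided by the prior density at τ = 0; the ROPE probability is the posterior mass in a user-chosen interval around zero.

```python
from statistics import NormalDist


def normal_posterior(estimate, std_error, prior_sd):
    """Conjugate update: prior tau ~ N(0, prior_sd^2), likelihood N(estimate | tau, std_error^2)."""
    post_var = 1.0 / (1.0 / prior_sd**2 + 1.0 / std_error**2)
    post_mean = post_var * estimate / std_error**2
    return NormalDist(post_mean, post_var**0.5)


def savage_dickey_bf01(estimate, std_error, prior_sd):
    """BF01 = posterior density at tau = 0 divided by prior density at tau = 0."""
    posterior = normal_posterior(estimate, std_error, prior_sd)
    return posterior.pdf(0.0) / NormalDist(0.0, prior_sd).pdf(0.0)


def rope_probability(estimate, std_error, prior_sd, half_width):
    """Posterior probability that |tau| < half_width."""
    posterior = normal_posterior(estimate, std_error, prior_sd)
    return posterior.cdf(half_width) - posterior.cdf(-half_width)


# Hypothetical: an effect estimated close to zero with a small standard error
print(savage_dickey_bf01(0.0, 0.1, 1.0))     # about 10: the data favour tau = 0
print(rope_probability(0.0, 0.1, 1.0, 0.2))  # about 0.96
```

A BF01 well above 1 or a ROPE probability close to 1 is positive evidence for a null effect, something a non-significant p-value cannot provide.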

Solution: B-DiDi

For each group \(g\) estimate two models:

Pre-treatment model to gain confidence in parallel trends
Post-treatment model to estimate treatment effects

General setup

Use time-varying parameters
- Assumption: trend-violations/treatment effects evolve smoothly
- “A treatment effect today makes a similar effect likely tomorrow.”
Potentially correct the treatment effect using the estimated trend violation in the pre-period, similar to Rambachan and Roth (2023)

Estimation

Gaussian random-walk state space model (Cadonna, Frühwirth-Schnatter, and Knaus 2020) \[ \begin{aligned} \beta_{g,t} &= \beta_{g,t-1} + w_{g,t}, \quad w_{g,t} \sim N_4(\mathbf{0}, \mathbf{Q_{g}}) \\ y_{g,t} &= X_{g,t} \beta_{g,t} + \epsilon_{g,t}, \quad \epsilon_{g,t} \sim N_p(\mathbf{0}, Diag(\sigma^2_g)) \\ \mathbf{Q_g} &= Diag(\theta_{g,1}, \dots, \theta_{g,4}) \end{aligned} \]
Non-centered parameterization (Frühwirth-Schnatter and Wagner 2010)

\[ \begin{aligned} \tilde{\beta}_{g,t} &= \tilde{\beta}_{g,t-1} + \tilde{w}_{g,t}, \quad \tilde{w}_{g,t} \sim N_4(\mathbf{0}, \mathbf{I})\\ y_{g,t} &= X_{g,t} \beta_g + X_{g,t} Diag(\sqrt{\theta_{g, 1}}, \dots, \sqrt{\theta_{g,4}}) \tilde \beta_{g,t} + \epsilon_{g,t}, \quad \epsilon_{g,t} \sim N_p(0, Diag(\sigma^2_g)) \end{aligned} \]
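The non-centered form makes \(\theta_{g,j} = 0\) an ordinary parameter value: the coefficient is then constant over time. A minimal simulation sketch, assuming a single outcome per period (p = 1) and \(\tilde{\beta}_{g,0} = 0\), so that \(\beta_{g,t} = \beta_g + \sqrt{\theta_g} \odot \tilde{\beta}_{g,t}\); the function name is hypothetical.

```python
import math
import random


def simulate_noncentered(beta, theta, X, sigma2, seed=0):
    """Simulate y_t = X_t beta + X_t Diag(sqrt(theta)) beta_tilde_t + eps_t.

    Assumptions: one outcome per period (p = 1) and beta_tilde starts at zero.
    Returns the outcomes and the implied centered states
    beta_t = beta + sqrt(theta) * beta_tilde_t (elementwise).
    """
    rng = random.Random(seed)
    scale = [math.sqrt(th) for th in theta]
    beta_tilde = [0.0] * len(beta)
    y, beta_path = [], []
    for x_t in X:
        # standard-normal random-walk step for every coefficient
        beta_tilde = [b + rng.gauss(0.0, 1.0) for b in beta_tilde]
        beta_t = [b + s * bt for b, s, bt in zip(beta, scale, beta_tilde)]
        mean_t = sum(x * b for x, b in zip(x_t, beta_t))
        y.append(mean_t + rng.gauss(0.0, math.sqrt(sigma2)))
        beta_path.append(beta_t)
    return y, beta_path


# theta = 0 switches time variation off; only the second coefficient drifts
X = [[1.0, 1.0, 1.0, 1.0]] * 5000
y, path = simulate_noncentered([1.0, 0.0, -0.5, 2.0], [0.0, 0.5, 0.0, 0.0], X, sigma2=1.0)
print(path[0][0], path[-1][0])  # 1.0 1.0
```

The increments of the drifting coefficient have variance close to \(\theta_{g,2} = 0.5\), as in the centered model.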

Diffuse normal and inverse gamma priors for the pre-treatment model

Post-treatment model

Triple Gamma (Cadonna, Frühwirth-Schnatter, and Knaus 2020) priors for \(\sqrt{\theta_{g,j}}\) and \(\beta_{g,j}\)¹

\[ \begin{aligned} \sqrt{\theta_j} \mid \xi_j^2 &\sim N\left(0, \xi_j^2\right), \\ \xi_j^2 \mid a^{\xi}, \kappa_j^2 &\sim G\left(a^{\xi}, \frac{a^{\xi} \kappa_j^2}{2}\right), \\ \kappa_j^2 \mid c^{\xi}, \kappa_B^2 &\sim G\left(c^{\xi}, \frac{c^{\xi}}{\kappa_B^2}\right) \end{aligned} \]

\[ \begin{aligned} \beta_j \mid \phi_j^2 &\sim N\left(0, \phi_j^2\right), \\ \phi_j^2 \mid a^\phi, \lambda_j^2 &\sim G\left(a^\phi, \frac{a^\phi \lambda_j^2}{2}\right), \\ \lambda_j^2 \mid c^\phi, \lambda_B^2 &\sim G\left(c^\phi, \frac{c^\phi}{\lambda_B^2}\right) \end{aligned} \]
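The hierarchy for \(\sqrt{\theta_j}\) can be sampled top-down to see how \(a^\xi\) shapes the prior. A Monte Carlo sketch, assuming \(G(\cdot, \cdot)\) is shape-rate (Python's `random.gammavariate` takes shape and scale, so each rate r is passed as 1/r) and holding \(\kappa_B^2\) fixed; the function name and hyperparameter values are hypothetical. The same code applies to \(\beta_j\) with \(a^\phi, c^\phi, \lambda_B^2\).

```python
import math
import random
from statistics import median


def draw_sqrt_theta(a_xi, c_xi, kappa2_B, n, seed=1):
    """Monte Carlo draws of sqrt(theta_j) from the triple gamma hierarchy.

    G(shape, rate) is assumed; random.gammavariate takes (shape, scale),
    so every rate r is passed as scale 1 / r. kappa2_B is held fixed here.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        kappa2 = rng.gammavariate(c_xi, kappa2_B / c_xi)  # rate c / kappa2_B
        kappa2 = max(kappa2, 1e-300)  # guard against an exact zero draw
        xi2 = rng.gammavariate(a_xi, 2.0 / (a_xi * kappa2))  # rate a * kappa2 / 2
        draws.append(rng.gauss(0.0, math.sqrt(xi2)))
    return draws


# Smaller a^xi concentrates more prior mass near zero (stronger shrinkage)
tight = draw_sqrt_theta(a_xi=0.1, c_xi=0.5, kappa2_B=1.0, n=20000)
loose = draw_sqrt_theta(a_xi=1.0, c_xi=0.5, kappa2_B=1.0, n=20000)
print(median([abs(d) for d in tight]), median([abs(d) for d in loose]))
```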

Post-treatment model

Global shrinkage parameters \(\lambda_B^2\) and \(\kappa_B^2\), also as in Cadonna, Frühwirth-Schnatter, and Knaus (2020), such that

\(\frac{\lambda_B^2}{2} \sim F(2a^\lambda, 2c^\lambda)\) and
\(\frac{\kappa_B^2}{2} \sim F(2a^\kappa, 2c^\kappa)\)
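An F-distributed global scale can be drawn as a ratio of scaled chi-square variates. A sketch with generic hyperparameters `a` and `c` standing in for \(a^\kappa, c^\kappa\) (or \(a^\lambda, c^\lambda\)); the function name and the values used are hypothetical.

```python
import random
from statistics import mean


def draw_kappa2_B(a, c, n, seed=2):
    """Draws of kappa2_B such that kappa2_B / 2 ~ F(2a, 2c).

    An F(d1, d2) variate is (X1 / d1) / (X2 / d2) with independent
    chi-square variates X1, X2, and chi-square(d) = Gamma(shape d/2, scale 2).
    """
    rng = random.Random(seed)
    d1, d2 = 2.0 * a, 2.0 * c
    draws = []
    for _ in range(n):
        x1 = rng.gammavariate(d1 / 2.0, 2.0)
        x2 = rng.gammavariate(d2 / 2.0, 2.0)
        draws.append(2.0 * (x1 / d1) / (x2 / d2))
    return draws


# E[F(d1, d2)] = d2 / (d2 - 2) for d2 > 2, so here E[kappa2_B] = 2 * 10 / 8 = 2.5
draws = draw_kappa2_B(a=0.5, c=5.0, n=100_000)
print(mean(draws))  # close to 2.5
```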

Does it work?

DGP: No trend violation, no treatment effect

DGP: No trend violation, constant effect

DGP: No trend violation, increasing effect

DGP: Trend violation, no effect

Correcting trend violations

Treatment effect corrected by the implied trend violation, based on the posterior median of the pre-treatment model (similar to Rambachan and Roth 2023)
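One way to read this correction: extrapolate the pre-period violation into the post-period and subtract it from the estimated effects. The slide does not state the extrapolation rule, so this sketch assumes a linear one fitted by least squares to the pre-period posterior medians, in the spirit of linear extrapolation in Rambachan and Roth (2023). Function names and inputs are hypothetical.

```python
def linear_fit(times, values):
    """Ordinary least-squares intercept and slope of values on times."""
    n = len(times)
    t_bar = sum(times) / n
    v_bar = sum(values) / n
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (v - v_bar) for t, v in zip(times, values))
    slope = sxy / sxx  # needs at least two distinct pre-treatment periods
    return v_bar - slope * t_bar, slope


def correct_for_trend_violation(pre_times, pre_violation, post_times, post_effect):
    """Subtract the linearly extrapolated pre-period violation from post-period effects.

    pre_violation and post_effect are posterior medians per period (hypothetical inputs).
    """
    intercept, slope = linear_fit(pre_times, pre_violation)
    return [e - (intercept + slope * t) for t, e in zip(post_times, post_effect)]


# Hypothetical medians: the violation grows by 0.5 per period before treatment
corrected = correct_for_trend_violation(
    pre_times=[1, 2, 3, 4], pre_violation=[0.5, 1.0, 1.5, 2.0],
    post_times=[5, 6, 7], post_effect=[3.5, 4.0, 4.5],
)
print(corrected)  # [1.0, 1.0, 1.0]
```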

Three periods of effects

Declining effect with bottom-out

Application to organ donation policy

Wales: a success story

Wales: pre-trend corrected

Wales: Aggregate effect

Slovakia: negative effect on living donations

Slovakia: null effect of policy change

Summary & Outlook

B-DiDi will be published as a Julia package
Parameters are learned across time to use the data efficiently
Performs very well when there are few treated units (even just 1!)
Triple-Gamma is a flexible shrinkage prior for determining whether an effect is non-zero
- Has properties of Bayesian model averaging (accounts for model uncertainty)
- Nests many popular shrinkage priors (e.g., Lasso, Horseshoe)
The Bayesian posterior for the treatment effect can provide evidence for a null effect
- Savage-Dickey density ratio, region of practical equivalence (ROPE)

Thank you for your attention

Any questions?

Appendix

Triple-Gamma marginal prior

Treatment effect

\(Y_{i,t}\) … outcome of unit \(i\) at time \(t\)

Potential outcomes (Rubin 2005)

\(Y_{i,t}(0)\) … \(Y_{i,t}\) given \(i\) is not treated at \(t\)
\(Y_{i,t}(1)\) … \(Y_{i,t}\) given \(i\) is treated at \(t\)
Observed: \(Y_{i,t} = \mathbb{1}(treated_{i,t}) Y_{i,t}(1) + \left[1 - \mathbb{1}(treated_{i,t})\right] Y_{i,t}(0)\)
Individual treatment effect: \(\tau_{i,t} = Y_{i,t}(1) - Y_{i,t}(0)\)

Group	t = 1	t = 2
\(g = 2\)	\(Y_{i,1}(0)\)	\(Y_{i, 2}(1)\)
\(g = \infty\)	\(Y_{j,1}(0)\)	\(Y_{j, 2}(0)\)

Average Treatment Effect on the Treated (\(\tau_{g=2}\))

\(\bar Y_{g=k, t}\)… average outcome of group \(k\) at time \(t\)
\(\delta_{g=\cdot}\)… trend of the outcome for \(g = \cdot\)
\(\delta_{g=\cdot} = \delta\) for \(g = 2\) and \(g = \infty\) under parallel trends

\[ \begin{aligned} \bar Y_{g=2, 2} - \bar Y_{g=2, 1} &= \delta_{g=2} + \tau_{g=2} \\ \bar Y_{g=2, 2} - \bar Y_{g=2, 1} &= \bar Y_{g=\infty, 2} - \bar Y_{g=\infty, 1} + \tau_{g=2} \\ \left[\bar Y_{g=2, 2} - \bar Y_{g=2, 1}\right] - \left[\bar Y_{g=\infty, 2} - \bar Y_{g=\infty, 1}\right] &= \tau_{g=2} \end{aligned} \]
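The derivation can be checked numerically from potential outcomes. A sketch with hypothetical group means: when \(\delta_{g=2} = \delta_{g=\infty}\) the difference in differences equals \(\tau_{g=2}\); otherwise it is off by exactly \(\delta_{g=2} - \delta_{g=\infty}\), the bias that the pre-trend correction targets.

```python
def observed_group_means(tau, delta_treated, delta_control,
                         treated_start=10.0, control_start=4.0):
    """Observed mean outcomes (t = 1, t = 2) for the 2x2 table above.

    Group g = 2 is untreated at t = 1 and treated at t = 2; group g = inf is never treated.
    Observed Y = 1(treated) * Y(1) + [1 - 1(treated)] * Y(0).
    """
    y0_treated = (treated_start, treated_start + delta_treated)  # Y(0) path of g = 2
    y1_treated_t2 = y0_treated[1] + tau                           # Y(1) of g = 2 at t = 2
    treated = (y0_treated[0], y1_treated_t2)
    control = (control_start, control_start + delta_control)     # Y(0) path of g = inf
    return treated, control


def did(treated, control):
    """Change of the treated group minus change of the control group."""
    return (treated[1] - treated[0]) - (control[1] - control[0])


print(did(*observed_group_means(tau=3.0, delta_treated=2.0, delta_control=2.0)))  # 3.0
print(did(*observed_group_means(tau=3.0, delta_treated=2.5, delta_control=2.0)))  # 3.5
```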

References

Cadonna, Annalisa, Sylvia Frühwirth-Schnatter, and Peter Knaus. 2020. “Triple the Gamma—a Unifying Shrinkage Prior for Variance and Variable Selection in Sparse State Space and TVP Models.” Econometrics 8 (2): 20. https://doi.org/10.3390/econometrics8020020.

Callaway, Brantly, and Pedro H. C. Sant’Anna. 2021. “Difference-in-Differences with Multiple Time Periods.” Journal of Econometrics, Themed issue: Treatment effect 1, 225 (2): 200–230. https://doi.org/10.1016/j.jeconom.2020.12.001.

Frühwirth-Schnatter, Sylvia, and Helga Wagner. 2010. “Stochastic Model Specification Search for Gaussian and Partial Non-Gaussian State Space Models.” Journal of Econometrics 154 (1): 85–100. https://doi.org/10.1016/j.jeconom.2009.07.003.

Knaus, Peter, and Sylvia Frühwirth-Schnatter. 2023. “The Dynamic Triple Gamma Prior as a Shrinkage Process Prior for Time-Varying Parameter Models.” arXiv. https://doi.org/10.48550/arXiv.2312.10487.

Rambachan, Ashesh, and Jonathan Roth. 2023. “A More Credible Approach to Parallel Trends.” Review of Economic Studies 90 (5): 2555–91. https://doi.org/10.1093/restud/rdad018.

Rubin, Donald B. 2005. “Causal Inference Using Potential Outcomes: Design, Modeling, Decisions.” Journal of the American Statistical Association 100 (469): 322–31. https://www.jstor.org/stable/27590541.

Wagenmakers, Eric-Jan, Tom Lodewyckx, Himanshu Kuriyal, and Raoul Grasman. 2010. “Bayesian Hypothesis Testing for Psychologists: A Tutorial on the Savage–Dickey Method.” Cognitive Psychology 60 (3): 158–89. https://doi.org/10.1016/j.cogpsych.2009.12.001.

Wasserstein, Ronald L., Allen L. Schirm, and Nicole A. Lazar. 2019. “Moving to a World Beyond ‘p < 0.05’.” The American Statistician 73 (March): 1–19. https://doi.org/10.1080/00031305.2019.1583913.