Preliminaries

Welcome to Data-based Storytelling!


Who are we?

Nils Wlömert

  • Professor of Marketing, Vienna University of Economics and Business, Institute for Retailing & Data Science, since 2021
  • Assistant Professor of Marketing, Vienna University of Economics and Business, Institute for Interactive Marketing & Social Media, 2015-2020
  • Ph.D. in Marketing, University of Hamburg, 2010-2014
  • Dissertation topic: “Information Technology and Online Content Distribution” (German Marketing Association Best Dissertation Award 2015; International Journal of Research in Marketing Best Paper Award)
  • Universal Music Group: Business Analyst & Digital Marketing Manager, 2008-2010
  • Research interests: multi-channel distribution, online content distribution, marketing modeling, e-commerce policies, information goods, …

Daniel Winkler

  • Teaching & Research Associate, Vienna University of Economics and Business, Institute for Retailing & Data Science, since 2021 (previously Institute for Interactive Marketing & Social Media, 2019-2021)
  • Ph.D. in Mathematics for Economics and Business, since 2019
  • Master's in Economics with a major in Mathematics, Vienna University of Economics and Business, 2016-2019
  • Research interests: online content distribution, influencer marketing, platform power, applied Bayesian modeling, …

Goals

Gain the ability to create & communicate valuable insights from data

  1. Develop a Data Science Toolbox
    • Knowledge of data analysis and creation
    • R programming skills to support implementation
  2. Gain confidence in our analysis
    • Learn (currently) common techniques
    • Cultivate a mindset for developing skills
    • Learn about common pitfalls
  3. Work hard and have a good time
    • Be open to questioning everything
    • Study to understand, not to repeat
    • Master marketable skills for your career

[…] associate “winning” with the effort process itself. That’s the holy grail of dopamine management for success. It won’t make you dull or unhappy; it will make everything easier and more pleasurable […].

Andrew Huberman

Lost in data Translation

Industry

Hire as many data scientists as you can find you’ll still be lost without translators to connect analytics with real business value. […] By 2025 Chief Data Officers and their teams function as a business unit with profit-and-loss responsibilities. The unit, in partnership with business teams, is responsible for ideating new ways to use data, developing a holistic enterprise data strategy (and embedding it as part of a business strategy), and incubating new sources of revenue by monetizing data services and data sharing.

McKinsey and Company

Academia

The empirics-first approach is not antagonistic to theory but rather can serve as a stepping-stone to theory. The approach lends itself well to today’s data-rich environment, which can reveal novel research questions untethered to theory. […] we argue that [empirics first] has a natural arc that bends more easily back to real-world implications.

Golder et al. (2022)

Shifting the paradigm


How old is this person?

Shifting the paradigm

Thinking in school

[Diagram: thinking → world (“about”); thinking → create ⇄ presentation; audience → presentation]

Thinking in life

[Diagram: create and think → world (“about”); create and think → presentation → world (VALUE); audience → presentation]

“Who does what better now?”

Creating Value with Data

Golder et al. (2022)

Creating Value with Data

  1. Identify Opportunity
    • Theory is in short supply
    • Observations do not align with theory
    • Literature is equivocal
    • Intuition leads to multiple plausible and conflicting outcomes
    • Newly emergent data allows scantly examined or unexamined relationships to be probed
    • Find consequential dependent variables (DVs) and actionable independent variables (IVs)
  2. Explore Terrain
    • Start with an open-ended research question
    • Redefine the scope based on empirical findings
    • Risk: “scope creep”
    • Listen to the data
    • Generate robust and meaningful findings
  3. Advance Understanding
    • Uncover empirical regularities (repeatable across circumstances)
    • Focus on effect sizes
    • Empirical findings can question existing theory
    • Empirical findings can initiate theory development
    • Take the “third mission” seriously
      • managers, consumers
      • policy makers, educators
      • general public

If you torture the data long enough, it will confess to anything

Ronald Coase

Listen to the Data

  • Reliability and credibility of sources
  • Visualization
    • Establish “face-validity” of statistical findings
    • e.g., Berman and Israeli (2022) study the adoption of retail analytics dashboards and “find an increase of 4%–10% in average weekly revenues postadoption”
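A classic illustration of why visualization matters for face validity is Anscombe's quartet, which ships with base R as the `anscombe` data frame (columns `x1` to `x4`, `y1` to `y4`). The four x/y pairs have nearly identical correlations and OLS slopes but look completely different when plotted. The sketch below is our own addition, not part of the original analysis: it tabulates the summaries and then draws the four panels with base graphics.

```r
# Anscombe's quartet is included in base R (datasets package)
data(anscombe)

# Correlation and OLS slope for each of the four x/y pairs
summ <- t(sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  c(correlation = cor(x, y), slope = unname(coef(lm(y ~ x))[2]))
}))
round(summ, 2)

# Plot all four panels: the summary statistics match, the pictures do not
op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  plot(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]],
       xlab = paste0("x", i), ylab = paste0("y", i))
}
par(op)
```

Looking only at `summ` would suggest four equivalent relationships; the plots reveal a curve, an outlier, and a single high-leverage point.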

Listen to the Data

  • Ensure robustness
  • Consider:
    • different moderators, mediators, confounders, models
    • different methods
    • different subsets of data

e.g., Simpson’s Paradox

Term          OLS 1
(Intercept)   5.565***
x            -0.308***
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001

Term          OLS 2
(Intercept)   4.315***
x             0.334***
g: B         -4.403***
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001
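The data behind the OLS 1 and OLS 2 tables are not shown, so the following is only a sketch with simulated, hypothetical data that reproduces the same qualitative pattern: within each of two groups (labelled `A` and `B` here, an assumption) y rises with x, but group B has higher x and lower y overall. Ignoring the grouping variable `g` flips the sign of the slope.

```r
# Hypothetical simulated data: positive slope (0.3) within each group
set.seed(42)
n <- 100
x_a <- runif(n, min = 0, max = 5)
x_b <- runif(n, min = 5, max = 10)
y_a <- 4 + 0.3 * x_a + rnorm(n, sd = 0.5)
y_b <- 4 + 0.3 * x_b - 4.5 + rnorm(n, sd = 0.5)  # group B sits lower overall
sim <- data.frame(
  x = c(x_a, x_b),
  y = c(y_a, y_b),
  g = factor(rep(c("A", "B"), each = n))
)

# OLS 1: pooled regression that ignores the grouping variable
ols1 <- lm(y ~ x, data = sim)
# OLS 2: controlling for group g recovers the within-group slope
ols2 <- lm(y ~ x + g, data = sim)

coef(ols1)
coef(ols2)
```

The pooled slope is driven by the difference between the groups, not by the relationship within them, which is why checking subsets of the data (or adding the relevant control) is part of a robustness analysis.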

Failed robustness checks should be viewed as learning opportunities that lead to an even broader exploration that considers why a finding obtains in one context but not another

Golder et al. (2022)

Listen to the Data

  • Incorporate prior knowledge
    • Using (informal) expert knowledge is desirable
    • Provide initial ideas to explore
    • Develop expectations about (causal) relationships
    • Be suspicious of contrary findings

Traditionally theory-agnostic predictive analytics tools are likely to have larger impact and lesser bias if they are able to smartly combine theoretical insights […] with large troves of data.

Bradlow et al. (2017)

Listen to the Data

  • Explore causality

Aside: Necessary and Sufficient Conditions

Observation: The floor is wet

Necessary Condition

Without this condition, the observation cannot happen.

e.g., Water was “applied” to the floor.

  • Being signed up in LPIS to get a grade
  • Being at least 35 years old to become Austrian president
  • For whole numbers \(>2\): Being odd to be prime

Sufficient Condition

If this condition is met, the observation happens.

e.g., It is raining right now.

  • Arriving before me \(\rightarrow\) not missing any of the lecture
  • Receiving \(>50\%\) of votes in a presidential election \(\rightarrow\) becoming Austrian president
  • A number being divisible by \(4\) \(\rightarrow\) being even

Can you think of counterexamples for each?

e.g., Being at least 1.5 m tall is not a necessary condition to become Austrian president

What happens if you “flip” conditions?

e.g., Getting a grade is … to know you are signed up in LPIS

What happens if you negate conditions?

e.g., Not being signed up in LPIS is … to know you will not receive a grade

Which of the examples are necessary and sufficient?

Can you come up with more?
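The two number examples can be checked by brute force. The sketch below uses a small trial-division helper `is_prime()` written only for illustration (it is not a library function) and scans the whole numbers from 3 to 1000. A finite scan illustrates the conditions but does not prove them. It also shows the "flip": if being odd is necessary for being prime, then being prime is sufficient to know a number is odd.

```r
# Hypothetical helper: primality test by trial division
is_prime <- function(n) {
  if (n < 2) return(FALSE)
  if (n < 4) return(TRUE)  # 2 and 3 are prime
  all(n %% 2:floor(sqrt(n)) != 0)
}

nums  <- 3:1000
prime <- vapply(nums, is_prime, logical(1))
odd   <- nums %% 2 == 1

# Necessary: every prime > 2 is odd (equivalently, prime is sufficient for odd)
all(odd[prime])
# Not sufficient: some odd numbers are not prime (e.g., 9)
any(odd & !prime)

div4 <- nums %% 4 == 0
even <- nums %% 2 == 0

# Sufficient: divisible by 4 implies even
all(even[div4])
# Not necessary: some even numbers are not divisible by 4 (e.g., 6)
any(even & !div4)
```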

Advance Understanding

[Diagram: Empirical Regularities (generalizable → effect sizes → prelude to meta-analysis) → Conceptual and theoretical insights (novel → persuasive → clear) → Advise Stakeholders (managers / consumers → policy makers → teaching) → World (VALUE)]

Advise Stakeholders

Three-act structure

  1. Setup - Why should I pay attention?
    • Introduce the protagonist
    • Present the problem
    • Call to action
  2. Confrontation - Why should I endorse?
    • Previous attempts to resolve the problem:
      show the imbalance & gain credibility
    • Make clear why your solution is needed and why the audience should drive action
  3. Resolution - What can I do?
    • Present your solution
    • Call to action

Think different

Setup

  • Introduce “the crazy ones”: Einstein, Dylan, MLK, Earhart…
  • “See things differently”, “not fond of rules”, “no respect for the status quo”

Confrontation

  • “You can quote them, disagree with them, glorify, or vilify them”
  • You should buy in because: “the only thing you can’t do is ignore them”
  • “While some may see them as the crazy ones…”

Resolution

  • “We see genius.”
  • “Because the people who are crazy enough to think they can change the world, are the ones who do.”

Exercise

Prepare a one-minute elevator pitch for your thesis (or another project)
15 min.

Uncover Empirical Regularities

A data analysis can [consist of] importing, cleaning, transforming, and modeling data with a goal to build a machine learning algorithm to decide which product a company should sell.

McGowan, Peng, and Hicks (2022)

6 Principles

  1. Data Matching
    Are the variables of interest directly available?
  2. Exhaustive
    Are multiple, complementary methods used?
  3. Skeptical
    Are related questions and alternative explanations explored?
  4. Second Order
    Does the analysis provide important context and supporting information?
  5. Clarity
    Are key pieces of evidence clearly summarized and visualized?
  6. Reproducible
    Can another researcher take the code/data and reproduce the results?

A Data Analysis

library(palmerpenguins)
data("penguins")
str(penguins)
tibble [344 × 8] (S3: tbl_df/tbl/data.frame)
 $ species          : Factor w/ 3 levels "Adelie","Chinstrap",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ island           : Factor w/ 3 levels "Biscoe","Dream",..: 3 3 3 3 3 3 3 3 3 3 ...
 $ bill_length_mm   : num [1:344] 39.1 39.5 40.3 NA 36.7 39.3 38.9 39.2 34.1 42 ...
 $ bill_depth_mm    : num [1:344] 18.7 17.4 18 NA 19.3 20.6 17.8 19.6 18.1 20.2 ...
 $ flipper_length_mm: int [1:344] 181 186 195 NA 193 190 181 195 193 190 ...
 $ body_mass_g      : int [1:344] 3750 3800 3250 NA 3450 3650 3625 4675 3475 4250 ...
 $ sex              : Factor w/ 2 levels "female","male": 2 1 1 NA 1 2 1 2 NA NA ...
 $ year             : int [1:344] 2007 2007 2007 2007 2007 2007 2007 2007 2007 2007 ...
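library(ggplot2)
# Scatter plot of body mass against flipper length (rows with missing values are dropped with a warning)
ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
    geom_point()

# Correlation of flipper length and body mass, using all rows where both values are observed
flen_bmas_corr <- round(
    cor(penguins$flipper_length_mm, penguins$body_mass_g, use = "pairwise.complete.obs"),
    digits = 2
)
flen_bmas_corr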
[1] 0.87

The correlation between flipper length and body mass in penguins is 0.87.
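To see what `cor()` computes and why the `use` argument matters, the sketch below takes the first ten values of `flipper_length_mm` and `body_mass_g` as printed by `str(penguins)` above (including the missing fourth row) and computes the Pearson correlation by hand. It only illustrates the formula on a tiny subset; its result is not the 0.87 reported for the full data.

```r
# First ten values of each variable, copied from the str() output above
flipper <- c(181, 186, 195, NA, 193, 190, 181, 195, 193, 190)
mass    <- c(3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, 4250)

# With the default use = "everything", a single NA makes the result NA
cor(flipper, mass)

# Keep only rows where both values are observed
ok <- complete.cases(flipper, mass)
x <- flipper[ok]
y <- mass[ok]

# Pearson correlation: co-variation divided by the product of the spreads
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

# Should agree with the built-in function on the complete pairs
r_builtin <- cor(flipper, mass, use = "pairwise.complete.obs")
all.equal(r_manual, r_builtin)
```

With only two variables, `"pairwise.complete.obs"` and `"complete.obs"` drop the same rows; the options differ once `cor()` is applied to a matrix with missing values scattered across several columns.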

Have fun!

References

Links

Five Fifty: Lost in translation

LEADERSHIP LAB: The Craft of Writing Effectively

The age of analytics: Competing in a data-driven world

The data-driven enterprise of 2025

Think Different Logo

Preattentive attributes

Academic References

Berman, Ron, and Ayelet Israeli. 2022. “The Value of Descriptive Analytics: Evidence from Online Retailers.” Marketing Science 41 (6): 1074–96. https://doi.org/10.1287/mksc.2022.1352.
Bradlow, Eric T., Manish Gangwar, Praveen Kopalle, and Sudhir Voleti. 2017. “The Role of Big Data and Predictive Analytics in Retailing.” Journal of Retailing, The Future of Retailing, 93 (1): 79–95. https://doi.org/10.1016/j.jretai.2016.12.004.
Cinelli, Carlos, Andrew Forney, and Judea Pearl. 2020. “A Crash Course in Good and Bad Controls.” SSRN 3689437.
Cunningham, Scott. 2021. Causal Inference - the Mixtape. Yale University Press. https://mixtape.scunning.com/index.html.
Golder, Peter N., Marnik G. Dekimpe, Jake T. An, Harald J. van Heerde, Darren S. U. Kim, and Joseph W. Alba. 2022. “Learning from Data: An Empirics-First Approach to Relevant Knowledge Generation.” Journal of Marketing, September, 00222429221129200. https://doi.org/10.1177/00222429221129200.
Goldfarb, Avi, Catherine Tucker, and Yanwen Wang. 2022. “Conducting Research in Marketing with Quasi-Experiments.” Journal of Marketing 86 (3): 1–20. https://doi.org/10.1177/00222429221082977.
McGowan, Lucy D’Agostino, Roger D. Peng, and Stephanie C. Hicks. 2022. “Design Principles for Data Analysis.” Journal of Computational and Graphical Statistics 0 (0): 1–8. https://doi.org/10.1080/10618600.2022.2104290.
Meyer, Bruce D. 1995. “Natural and Quasi-Experiments in Economics.” Journal of Business & Economic Statistics 13 (2): 151–61.