Introduction

Data-based Storytelling

Daniel Winkler

Institute for Retailing & Data Science

Nils Wlömert

Preliminaries

Welcome to Data-based Storytelling!

 

Lectures

Date | Time | Room | Topics | Slides | Recommended Readings
01-08-2024 | 1:00pm - 6:00pm | LC.-1.038 | Introduction | Introduction | Lost in Data Translation, R for Data Science
01-10-2024 | 1:00pm - 6:00pm | LC.-1.038 | Modelling and Visualization Theory | Modelling, Visualization, Causal Pitchfork | Visualization, A Crash Course in Good and Bad Controls, The Psychology behind Data Visualization Techniques
01-15-2024 | 1:00pm - 6:00pm | LC.2.064 | Empirical Model Building I + Coaching | |
01-17-2024 | 1:00pm - 6:00pm | LC.-1.038 | Empirical Model Building II + Coaching | |
01-24-2024 | 1:00pm - 5:00pm | LC.2.064 | Project Coaching | |
01-31-2024 | 1:00pm - 3:00pm | Online | Take home exam FAQs | |

Who are we?

Nils Wlömert

  • Professor of Marketing, Vienna University of Economics and Business, Institute for Retailing & Data Science, since 2021
  • Assistant Professor of Marketing, Vienna University of Economics and Business, Institute for Interactive Marketing & Social Media, 2015-2020
  • Ph.D. in Marketing, University of Hamburg, 2010-2014
  • Dissertation topic: “Information Technology and Online Content Distribution” (German Marketing Association Best Dissertation Award 2015; International Journal of Research in Marketing Best Paper Award)
  • Universal Music Group: Business Analyst & Digital Marketing Manager, 2008-2010
  • Research interests: multi-channel distribution, online content distribution, marketing modeling, e-commerce policies, information goods, …

Daniel Winkler

  • Teaching & Research Associate, Vienna University of Economics and Business, Institute for Retailing & Data Science, since 2021 (prev. IMSM 2019-2021)
  • Ph.D. in Mathematics for Economics and Business, 2019-now
  • Masters in Economics w/ major in Mathematics, Vienna University of Economics and Business, 2016-2019
  • Research interests: online content distribution, influencer marketing, platform power, applied Bayesian modeling, …

Goals

Gain the ability to create & communicate valuable insight from data

  1. Develop a Data Science Toolbox
    • Knowledge of data analysis and creation
    • R programming skills to help implementation
  2. Gain confidence in our analysis
    • Learn (currently) common techniques
    • Cultivate a mindset for developing skills
    • Learn about common pitfalls
  3. Work hard and have a good time
    • Be open to question everything
    • Study to understand, not to repeat
    • Master marketable skills for your career

[…] associate “winning” with the effort process itself. That’s the holy grail of dopamine management for success. It won’t make you dull or unhappy; it will make everything easier and more pleasurable […].

Andrew Huberman

Lost in Data Translation

Industry

Hire as many data scientists as you can find; you’ll still be lost without translators to connect analytics with real business value. […] By 2025 Chief Data Officers and their teams function as a business unit with profit-and-loss responsibilities. The unit, in partnership with business teams, is responsible for ideating new ways to use data, developing a holistic enterprise data strategy (and embedding it as part of a business strategy), and incubating new sources of revenue by monetizing data services and data sharing.

McKinsey and Company

Academia

The empirics-first approach is not antagonistic to theory but rather can serve as a stepping-stone to theory. The approach lends itself well to today’s data-rich environment, which can reveal novel research questions untethered to theory. […] we argue that [empirics first] has a natural arc that bends more easily back to real-world implications.

Golder et al. (2022)

Shifting the paradigm

 

How old is this person?

Thinking in school

Diagram: thinking -> world (about); thinking -> create; create <-> presentation; audience -> presentation

Thinking in life

Diagram: create and think -> world (about); create and think -> presentation; audience -> presentation; presentation -> world (VALUE)

“Who does what better now?”

Creating Value with Data

Golder et al. (2022)

  1. Identify Opportunity
    • Theory is in short supply
    • Observations do not align with theory
    • Literature is equivocal
    • Intuition leads to multiple plausible and conflicting outcomes
    • Newly emergent data allows scantly-/un-examined relationships to be probed
    • Find consequential DVs and actionable IVs
  2. Explore Terrain
    • Start with open ended research question
    • Scope re-definition based on empirical findings
    • Risk: “scope creep”
    • Listen to the data
    • Generate robust and meaningful findings
  3. Advance Understanding
    • Uncover empirical regularities (repeatable over circumstances)
    • Concern with effect sizes
    • Empirical findings can question existing theory
    • Empirical findings can initiate theory development
    • Take the “third mission” seriously
      • managers, consumers
      • policy makers, educators
      • general public

If you torture the data long enough, it will confess to anything

Ronald Coase

Listen to the Data

  • Reliability and credibility of sources
  • Visualization
    • Establish “face-validity” of statistical findings
    • e.g., Berman and Israeli (2022) study the adoption of retail analytics dashboards and “find an increase of 4%–10% in average weekly revenues postadoption”

Listen to the Data

  • Ensure robustness
  • Consider:
    • different moderators, mediators, confounders, models
    • different methods
    • different subsets of data

e.g., Simpson’s Paradox

OLS 1
(Intercept) 5.565***
x -0.308***
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001

OLS 2
(Intercept) 4.315***
x 0.334***
g B -4.403***
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001

Failed robustness checks should be viewed as learning opportunities that lead to an even broader exploration that considers why a finding obtains in one context but not another

Golder et al. (2022)
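
The sign flip between OLS 1 and OLS 2 in the Simpson’s Paradox example can be reproduced with a small simulation. This is only a sketch under assumed parameters: the data are simulated, not the data behind the tables above, and the group means, slope, and noise level, as well as the names sim, ols_pooled, and ols_group, are chosen purely for illustration.

```r
set.seed(42)
n <- 200

# Two groups: B has larger x values but a much lower baseline of y (assumed values)
g <- factor(rep(c("A", "B"), each = n))
x <- c(rnorm(n, mean = 2, sd = 1), rnorm(n, mean = 8, sd = 1))
y <- 4 + 0.3 * x - 4.5 * (g == "B") + rnorm(2 * n, sd = 0.5)
sim <- data.frame(x = x, y = y, g = g)

# Model 1 ignores the group, model 2 controls for it
ols_pooled <- lm(y ~ x, data = sim)
ols_group  <- lm(y ~ x + g, data = sim)

# The slope on x is negative in the pooled model and positive within groups
coef(ols_pooled)["x"]
coef(ols_group)["x"]
```

Within each group y increases with x, but because group B combines high x with low y, the pooled regression picks up the difference between groups instead of the relationship within them.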

Listen to the Data

  • Incorporate prior knowledge
    • Using (informal) expert knowledge is desirable
    • Provide initial ideas to explore
    • Develop expectations about (causal) relationships
    • Be suspicious of contrary findings

Traditionally theory-agnostic predictive analytics tools are likely to have larger impact and lesser bias if they are able to smartly combine theoretical insights […] with large troves of data.

Bradlow et al. (2017)

Listen to the Data

  • Explore causality

Aside: Necessary and Sufficient Conditions

Observation: The floor is wet

Necessary Condition

Without such a condition the observation cannot happen.

e.g., Water was “applied” to the floor.

  • Being signed up in LPIS to get a grade
  • Being at least 35 years old to become Austrian president
  • For whole numbers \(>2\): Being odd to be prime

Sufficient Condition

If such a condition is met the observation happens.

e.g., It is raining right now.

  • You arriving before me \(\rightarrow\) you do not miss any of the lecture
  • Receiving \(>50\%\) of votes in a presidential election \(\rightarrow\) you become Austrian president
  • A number being divisible by \(4\) \(\rightarrow\) it is even
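
The two number examples can be checked by brute force. The following is a minimal sketch in R that tests the claims on the whole numbers 3 to 100 only; the range and the helper is_prime are assumptions made for illustration, and a finite check is of course not a proof.

```r
# Hypothetical helper: trial division, sufficient for small n
is_prime <- function(n) {
  if (n < 2) return(FALSE)
  if (n < 4) return(TRUE)
  all(n %% 2:floor(sqrt(n)) != 0)
}

nums  <- 3:100
prime <- vapply(nums, is_prime, logical(1))
odd   <- nums %% 2 == 1
even  <- nums %% 2 == 0
div4  <- nums %% 4 == 0

# Necessary: every prime > 2 is odd (prime implies odd)
all(odd[prime])
# ... but not sufficient: some odd numbers are not prime
any(odd & !prime)

# Sufficient: every number divisible by 4 is even
all(even[div4])
# ... but not necessary: some even numbers are not divisible by 4
any(even & !div4)
```

All four expressions evaluate to TRUE on this range: a necessary condition holds for every case in which the observation occurs, while a sufficient condition guarantees the observation whenever it is met.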

Can you think of counterexamples for each?

e.g., Being at least 1.5m tall is not a necessary condition to become Austrian president

What happens if you “flip” conditions?

e.g., Getting a grade is … to know you are signed up in LPIS

What happens if you negate conditions?

e.g., Not being signed up in LPIS is … to know you will not receive a grade

Which of the examples are necessary and sufficient?

Can you come up with more?

Advance Understanding

Diagram: Empirical Regularities -> Conceptual and theoretical Insights -> Advise Stakeholders -> World (VALUE)

  • Empirical Regularities: Generalizable -> Effect sizes -> Prelude to Meta-analysis
  • Conceptual and theoretical Insights: Novel -> Persuasive -> Clear
  • Advise Stakeholders: Managers / Consumers -> Policy Makers -> Teaching

Advise Stakeholders

Three-act structure

  1. Setup - Why should I pay attention?
    • Introduce protagonist
    • Present the problem
    • Call to action
  2. Confrontation - Why should I endorse?
    • Previous attempts to resolve the problem: show imbalance & gain credibility
    • Make clear why your solution is needed and why the audience should drive action
  3. Resolution - What can I do?
    • Present your solution
    • Call to action

Think different

Setup

  • Introduce “the crazy ones”: Einstein, Dylan, MLK, Earhart…
  • “See things differently”, “not fond of rules”, “no respect for the status quo”

Confrontation

  • “You can quote them, disagree with them, glorify, or vilify them”
  • You should buy-in because: “the only thing you can’t do is ignore them”
  • “While some may see them as the crazy ones…”

Resolution

  • “We see genius.”
  • “Because the people who are crazy enough to think they can change the world, are the ones who do.”

Exercise

Prepare a 1-minute elevator pitch for your thesis (or some other project)
15 min.

Uncover Empirical Regularities

A data analysis can [consist of] importing, cleaning, transforming, and modeling data with a goal to build a machine learning algorithm to decide which product a company should sell.

McGowan, Peng, and Hicks (2022)

6 Principles

  1. Data Matching
    Are the variables of interest directly available?
  2. Exhaustive
    Are multiple, complementary methods used?
  3. Skeptical
    Are related questions and alternative explanations explored?
  4. Second Order
    Does the analysis provide important context and supporting information?
  5. Clarity
    Are key pieces of evidence clearly summarized and visualized?
  6. Reproducible
    Can another researcher take the code/data and reproduce the results?

A Data Analysis

library(palmerpenguins)
data("penguins")
str(penguins)
tibble [344 × 8] (S3: tbl_df/tbl/data.frame)
 $ species          : Factor w/ 3 levels "Adelie","Chinstrap",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ island           : Factor w/ 3 levels "Biscoe","Dream",..: 3 3 3 3 3 3 3 3 3 3 ...
 $ bill_length_mm   : num [1:344] 39.1 39.5 40.3 NA 36.7 39.3 38.9 39.2 34.1 42 ...
 $ bill_depth_mm    : num [1:344] 18.7 17.4 18 NA 19.3 20.6 17.8 19.6 18.1 20.2 ...
 $ flipper_length_mm: int [1:344] 181 186 195 NA 193 190 181 195 193 190 ...
 $ body_mass_g      : int [1:344] 3750 3800 3250 NA 3450 3650 3625 4675 3475 4250 ...
 $ sex              : Factor w/ 2 levels "female","male": 2 1 1 NA 1 2 1 2 NA NA ...
 $ year             : int [1:344] 2007 2007 2007 2007 2007 2007 2007 2007 2007 2007 ...
library(ggplot2)
ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
    geom_point()

# Correlation of flipper length and body mass
flen_bmas_corr <- round(cor(penguins$flipper_length_mm, penguins$body_mass_g, use="pairwise.complete.obs"), digits = 2)
flen_bmas_corr
[1] 0.87

The correlation between flipper length and body mass in penguins is 0.87.
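
In the spirit of the “Skeptical” principle and of checking different subsets of the data, one possible follow-up is to compute the same correlation separately for each species. This is a sketch rather than part of the original analysis; it assumes the palmerpenguins package is installed, and the name by_species is introduced here for illustration.

```r
library(palmerpenguins)
data("penguins")

# Correlation of flipper length and body mass within each species
by_species <- sapply(
  split(penguins, penguins$species),
  function(d) {
    cor(d$flipper_length_mm, d$body_mass_g, use = "pairwise.complete.obs")
  }
)
round(by_species, digits = 2)
```

Comparing the within-species values to the pooled correlation shows how much of the overall association reflects differences between species rather than a relationship within each species.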

Project Topics

  • 7 Groups (3-4 people)

Topics

Project Timeline

Component | Date
Preliminary story outline | Jan. 21
Last coaching | Jan. 24
Website submission | Feb. 18

Preliminary story outline

  • Data description
  • Explore the data
  • Initial idea for project
  • Come up with preliminary structure for the website

Final Result

  • A website including a short opinion essay

Have fun!

References

Links

Five Fifty: Lost in translation

LEADERSHIP LAB: The Craft of Writing Effectively

The age of analytics: Competing in a data-driven world

The data-driven enterprise of 2025

Think Different Logo

Preattentive attributes

Academic References

Aguiar, Luis, and Joel Waldfogel. 2021. “Platforms, Power, and Promotion: Evidence from Spotify Playlists.” The Journal of Industrial Economics 69 (3): 653–91. https://doi.org/10.1111/joie.12263.
Berger, Jonah A., Wendy W. Moe, and David A. Schweidel. 2022. “What Holds Attention? Linguistic Drivers of Engagement.” SSRN Scholarly Paper. Rochester, NY. https://doi.org/10.2139/ssrn.4311202.
Berger, Jonah, and Katherine L. Milkman. 2012. “What Makes Online Content Viral?” Journal of Marketing Research 49 (2): 192–205. https://doi.org/10.1509/jmr.10.0353.
Berman, Ron, and Ayelet Israeli. 2022. “The Value of Descriptive Analytics: Evidence from Online Retailers.” Marketing Science 41 (6): 1074–96. https://doi.org/10.1287/mksc.2022.1352.
Boughanmi, Khaled, and Asim Ansari. 2021. “Dynamics of Musical Success: A Machine Learning Approach for Multimedia Data Fusion.” Journal of Marketing Research, April, 00222437211016495. https://doi.org/10.1177/00222437211016495.
Bradlow, Eric T., Manish Gangwar, Praveen Kopalle, and Sudhir Voleti. 2017. “The Role of Big Data and Predictive Analytics in Retailing.” Journal of Retailing, The Future of Retailing, 93 (1): 79–95. https://doi.org/10.1016/j.jretai.2016.12.004.
Callaway, Brantly, and Pedro H. C. Sant’Anna. 2021. “Difference-in-Differences with Multiple Time Periods.” Journal of Econometrics, Themed issue: Treatment effect 1, 225 (2): 200–230. https://doi.org/10.1016/j.jeconom.2020.12.001.
Cinelli, Carlos, Andrew Forney, and Judea Pearl. 2020. “A Crash Course in Good and Bad Controls.” SSRN 3689437.
Cunningham, Scott. 2021. Causal Inference - the Mixtape. Yale University Press. https://mixtape.scunning.com/index.html.
Felbermayr, Armin, and Alexandros Nanopoulos. 2016. “The Role of Emotions for the Perceived Usefulness in Online Customer Reviews.” Journal of Interactive Marketing 36 (November): 60–76. https://doi.org/10.1016/j.intmar.2016.05.004.
Filippas, Apostolos, John J. Horton, and Joseph M. Golden. 2022. “Reputation Inflation.” Marketing Science 41 (4): 733–45. https://doi.org/10.1287/mksc.2022.1350.
Golder, Peter N., Marnik G. Dekimpe, Jake T. An, Harald J. van Heerde, Darren S. U. Kim, and Joseph W. Alba. 2022. “Learning from Data: An Empirics-First Approach to Relevant Knowledge Generation.” Journal of Marketing, September, 00222429221129200. https://doi.org/10.1177/00222429221129200.
Goldfarb, Avi, Catherine Tucker, and Yanwen Wang. 2022. “Conducting Research in Marketing with Quasi-Experiments.” Journal of Marketing 86 (3): 1–20. https://doi.org/10.1177/00222429221082977.
He, Sherry, Brett Hollenbeck, and Davide Proserpio. 2022. “The Market for Fake Reviews.” Marketing Science 41 (5): 896–921. https://doi.org/10.1287/mksc.2022.1353.
Kim, Aekyoung, Felipe M. Affonso, Juliano Laran, and Kristina M. Durante. 2021. “Serendipity: Chance Encounters in the Marketplace Enhance Consumer Satisfaction.” Journal of Marketing 85 (4): 141–57. https://doi.org/10.1177/00222429211000344.
McGowan, Lucy D’Agostino, Roger D. Peng, and Stephanie C. Hicks. 2022. “Design Principles for Data Analysis.” Journal of Computational and Graphical Statistics 0 (0): 1–8. https://doi.org/10.1080/10618600.2022.2104290.
Meyer, Breed D. 1995. “Natural and Quasi-Experiments in Economics.” Journal of Business & Economic Statistics 13 (2): 151–61.
Pachali, Max J., and Hannes Datta. 2022. “What Drives Demand for Playlists on Spotify?” Available at SSRN.
Rocklage, Matthew D., Derek D. Rucker, and Loran F. Nordgren. 2021. “Mass-Scale Emotionality Reveals Human Behaviour and Marketplace Success.” Nature Human Behaviour 5 (10): 1323–29. https://doi.org/10.1038/s41562-021-01098-5.