Practitioner’s Guide to Statistical Tests

Table of Contents

  1. Preliminaries
    1.1. CTR Prediction Example
    1.2. Hypothesis Testing
    1.3. Data Generation
    1.4. Sensitivity and p-value CDF
    1.5. P-value sanity check
  2. Tests on the number of clicks
    2.1. T-test on the number of clicks
    2.2. Mann-Whitney on number of successes
  3. Tests on global CTRs
    3.1. Assumptions
    3.2. Binomial z-test: it fails
    3.3. Bootstrap for global CTR
    3.4. Delta method for global CTR
    3.5. Bucketization
    3.6. Linearization
  4. Tests on user CTRs
    4.1. Assumptions
    4.2. T-test on user CTRs
    4.3. T-test on smoothed user CTRs
    4.4. Intra-user correlation aware weighting
  5. So what? Which test should I use?
  6. Methods not in the list
    6.1. Transformation of clicks
    6.2. CUPED
    6.3. Using future metric prediction

1. Preliminaries

1.1. CTR Prediction Example

1.2. Hypothesis Testing

  1. The test itself
  2. Distribution of the experimental data
  3. Effect size
  4. Size of the test groups

1.3. Data Generation

  1. Initialize N users for the control group, N for the treatment group.
  2. Generate the number of ad views for each user in treatment and control groups from the same log-normal distribution.
  3. Generate the ground truth user CTR for each user from a beta distribution with mean success_rate_control for the control group and success_rate_control * (1 + uplift) for the treatment group.
  4. Generate the number of ad clicks for each user from a binomial distribution with the number of trials equal to the number of views and the success rate equal to the ground truth user CTR.
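
The four steps above can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the article's own code: the function name generate_data, the log-normal parameters views_mu and views_sigma, the beta concentration parameter, and the "+ 1" that guarantees at least one view per user are all illustrative choices. Only success_rate_control and uplift come from the text.

```python
import numpy as np


def generate_data(n_users=1000, success_rate_control=0.02, uplift=0.1,
                  concentration=100.0, views_mu=1.0, views_sigma=1.0, seed=0):
    """Simulate views, ground truth CTRs and clicks for both groups.

    All default values are assumptions made for this sketch.
    """
    rng = np.random.default_rng(seed)
    mean_ctrs = {
        "control": success_rate_control,
        "treatment": success_rate_control * (1 + uplift),
    }
    groups = {}
    for name, mean_ctr in mean_ctrs.items():
        # Step 2: views come from the same log-normal distribution in both
        # groups; "+ 1" (an assumption) ensures every user has a view.
        views = rng.lognormal(views_mu, views_sigma, size=n_users).astype(int) + 1
        # Step 3: Beta(a, b) has mean a / (a + b), so this choice of a and b
        # gives a ground truth CTR with the requested mean.
        a = mean_ctr * concentration
        b = (1 - mean_ctr) * concentration
        ctr = rng.beta(a, b, size=n_users)
        # Step 4: clicks are binomial with trials = views and rate = user CTR.
        clicks = rng.binomial(views, ctr)
        groups[name] = {"views": views, "ctr": ctr, "clicks": clicks}
    return groups


data = generate_data()
print(data["control"]["clicks"].sum(), data["treatment"]["clicks"].sum())
```

A larger concentration makes users more alike in their ground truth CTR; a smaller one spreads the user CTRs more widely around the same mean.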

1.4. Sensitivity and p-value CDF

1.5. P-value sanity check

2. Tests on the number of clicks

2.1. T-test on the number of clicks

2.2. Mann-Whitney on number of successes

  1. It requires no additional assumptions about the distribution of the data.
  2. Because it works with the ranks of the click counts rather than the click counts themselves, it is more robust to outliers.
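
The robustness claim can be illustrated with a small, made-up example. The click counts below are invented for this sketch. Inflating the largest treatment value changes no ranks, so the Mann-Whitney result stays the same, while the t-test result moves.

```python
import numpy as np
from scipy import stats

# Invented per-user click counts, for illustration only.
clicks_control = np.array([0, 1, 1, 2, 3, 5])
clicks_treatment = np.array([1, 2, 4, 6, 7, 9])

# Same data, but the largest treatment value becomes an extreme outlier.
clicks_treatment_outlier = clicks_treatment.copy()
clicks_treatment_outlier[-1] = 9000

mw_base = stats.mannwhitneyu(clicks_control, clicks_treatment,
                             alternative="two-sided")
mw_outlier = stats.mannwhitneyu(clicks_control, clicks_treatment_outlier,
                                alternative="two-sided")

# Welch's t-test on the raw click counts, for comparison.
t_base = stats.ttest_ind(clicks_control, clicks_treatment, equal_var=False)
t_outlier = stats.ttest_ind(clicks_control, clicks_treatment_outlier,
                            equal_var=False)

print("Mann-Whitney p-values:", mw_base.pvalue, mw_outlier.pvalue)
print("t-test p-values:      ", t_base.pvalue, t_outlier.pvalue)
```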

3. Tests on global CTRs

3.1. Assumptions

3.2. Binomial z-test: it fails

  • we don’t influence views;
  • each click and each view is an i.i.d. sample.
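
For reference, a two-sample z-test for proportions on the global CTR can be sketched as below. The function name binomial_ztest and the pooled-variance form are choices made for this sketch. It treats every view as an independent Bernoulli trial, which is exactly the second assumption above. The section title warns that the test fails, presumably because views from the same user are not independent in practice.

```python
import math

from scipy import stats


def binomial_ztest(clicks_c, views_c, clicks_t, views_t):
    """Two-sided z-test for a difference in global CTR.

    Assumes every view is an i.i.d. Bernoulli trial within its group.
    """
    ctr_c = clicks_c / views_c
    ctr_t = clicks_t / views_t
    # Pooled CTR under the null hypothesis of equal click rates.
    pooled = (clicks_c + clicks_t) / (views_c + views_t)
    se = math.sqrt(pooled * (1 - pooled) * (1 / views_c + 1 / views_t))
    z = (ctr_t - ctr_c) / se
    p_value = 2 * stats.norm.sf(abs(z))
    return z, p_value


z, p_value = binomial_ztest(clicks_c=100, views_c=1000,
                            clicks_t=150, views_t=1000)
print(z, p_value)
```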

3.3. Bootstrap for global CTR

3.4. Delta method for global CTR

3.5. Bucketization

3.6. Linearization

4. Tests on user CTRs

4.1. Assumptions

4.2. T-test on user CTRs

  • we don’t influence views
  • user CTRs (and uplift in user CTR) and views are independent
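
A minimal sketch of this test: compute one CTR per user, then compare the two groups of user CTRs with a t-test. The function name user_ctr_ttest, the use of Welch's unequal-variance variant, and the requirement that every user has at least one view are assumptions of this sketch.

```python
import numpy as np
from scipy import stats


def user_ctr_ttest(clicks_c, views_c, clicks_t, views_t):
    """Welch's t-test on per-user CTRs (treatment vs. control).

    Assumes every user has at least one view, so no division by zero.
    """
    ctr_c = np.asarray(clicks_c, dtype=float) / np.asarray(views_c, dtype=float)
    ctr_t = np.asarray(clicks_t, dtype=float) / np.asarray(views_t, dtype=float)
    # Each user contributes one observation, regardless of view count.
    return stats.ttest_ind(ctr_t, ctr_c, equal_var=False)


# Invented per-user data, for illustration only.
result = user_ctr_ttest(clicks_c=[1, 0, 2, 1], views_c=[10, 10, 10, 10],
                        clicks_t=[3, 4, 5, 4], views_t=[10, 10, 10, 10])
print(result.statistic, result.pvalue)
```

Because every user gets equal weight, a user with a single view influences the result as much as a user with hundreds of views.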

4.3. T-test on smoothed user CTRs

  • we don’t influence views
  • user CTR (and uplift in user CTR) does not depend on views
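
The text here does not give the smoothing formula, so the sketch below assumes one common form: each user CTR is shrunk toward the global CTR of its group, with a hypothetical strength parameter alpha that acts like alpha extra pseudo-views. The function name smoothed_ctr and the per-group global CTR are likewise assumptions.

```python
import numpy as np


def smoothed_ctr(clicks, views, alpha=5.0):
    """Shrink user CTRs toward the group's global CTR.

    Assumed form: (clicks + alpha * global_ctr) / (views + alpha).
    alpha = 0 gives the raw user CTR (which needs views > 0).
    """
    clicks = np.asarray(clicks, dtype=float)
    views = np.asarray(views, dtype=float)
    global_ctr = clicks.sum() / views.sum()
    return (clicks + alpha * global_ctr) / (views + alpha)


# Invented data: a user with 1 view and a user with 100 views.
clicks = np.array([0, 10])
views = np.array([1, 100])
print(smoothed_ctr(clicks, views, alpha=5.0))
```

Users with few views are pulled strongly toward the global CTR, while users with many views keep a value close to their raw CTR. The smoothed values can then be compared across groups with the same t-test used for raw user CTRs.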

4.4. Intra-user correlation aware weighting

  • we don’t influence views
  • user CTR (and uplift in user CTR) does not depend on views

5. So what? Which test should I use?

  1. Your experiment does not influence views.
  2. User CTRs in your data do not depend on views, and uplift in user CTRs (the treatment effect of your experiment) does not depend on views.

6. Methods not in the list

6.1. Transformation of clicks

6.2. CUPED

6.3. Using future metric prediction

VK Team
