Practitioner’s Guide to Statistical Tests

Table of Contents

  1. Preliminaries
    1.1. CTR Prediction Example
    1.2. Hypothesis Testing
    1.3. Data Generation
    1.4. Sensitivity and p-value CDF
    1.5. P-value sanity check
  2. Tests on the number of clicks
    2.1. T-test on the number of clicks
    2.2. Mann-Whitney on number of successes
  3. Tests on global CTRs
    3.1. Assumptions
    3.2. Binomial z-test: it fails
    3.3. Bootstrap for global CTR
    3.4. Delta method for global CTR
    3.5. Bucketization
    3.6. Linearization
  4. Tests on user CTRs
    4.1. Assumptions
    4.2. T-test on user CTRs
    4.3. T-test on smoothed user CTRs
    4.4. Intra-user correlation aware weighting
  5. So what? Which test should I use?
  6. Methods not in the list
    6.1. Transformation of clicks
    6.2. CUPED
    6.3. Using future metric prediction

1. Preliminaries

1.1. CTR Prediction Example

1.2. Hypothesis Testing

  1. The test itself
  2. Distribution of the experimental data
  3. Effect size
  4. Size of the test groups

1.3. Data Generation

  1. Initialize N users for the control group and N users for the treatment group.
  2. Generate the number of ad views for each user in the treatment and control groups from the same log-normal distribution.
  3. Generate the ground truth user CTR for each user from a beta distribution with mean success_rate_control for the control group and success_rate_control * (1 + uplift) for the treatment group.
  4. Generate the number of ad clicks for each user from a binomial distribution with the number of trials equal to the number of views and the success rate equal to the ground truth user CTR.
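
The four steps above can be sketched with the Python standard library alone. This is a minimal sketch, not the original implementation: the function names, the log-normal parameters mu and sigma, the beta-distribution parameter beta, and the shift that gives every user at least one view are all assumptions made for illustration. Only success_rate_control and uplift come from the text.

```python
import random


def generate_group(n_users, success_rate, rng, beta=1000.0, mu=1.0, sigma=1.0):
    """Simulate per-user (views, clicks) for one test group.

    beta, mu and sigma are illustrative defaults, not values from the article.
    """
    # Beta(alpha, beta) has mean alpha / (alpha + beta); solve for alpha so
    # that the mean equals the requested success rate.
    alpha = success_rate * beta / (1.0 - success_rate)
    views, clicks = [], []
    for _ in range(n_users):
        # Step 2: log-normal number of views (shifted so it is at least 1;
        # the shift is an assumption).
        v = int(rng.lognormvariate(mu, sigma)) + 1
        # Step 3: ground truth CTR of this user.
        ctr = rng.betavariate(alpha, beta)
        # Step 4: clicks ~ Binomial(v, ctr), drawn as a sum of Bernoulli trials.
        c = sum(rng.random() < ctr for _ in range(v))
        views.append(v)
        clicks.append(c)
    return views, clicks


def generate_experiment(n_users, success_rate_control, uplift, seed=0):
    """Step 1: N users per group; both groups share the views distribution."""
    rng = random.Random(seed)
    control = generate_group(n_users, success_rate_control, rng)
    treatment = generate_group(n_users, success_rate_control * (1.0 + uplift), rng)
    return control, treatment


(views_c, clicks_c), (views_t, clicks_t) = generate_experiment(1000, 0.02, 0.2)
print(sum(clicks_c) / sum(views_c), sum(clicks_t) / sum(views_t))
```

Fixing the seed makes a simulated experiment reproducible, which helps when the same data is fed to several tests for comparison.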

1.4. Sensitivity and p-value CDF

1.5. P-value sanity check

2. Tests on the number of clicks

2.1. T-test on the number of clicks

2.2. Mann-Whitney on number of successes

  1. It does not require additional assumptions about the data distribution.
  2. Because it works with the ranks of the click counts rather than the counts themselves, it is more robust to outliers.
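
To make the rank argument concrete, here is a small standard-library sketch of a two-sided Mann-Whitney U test. It uses the normal approximation with a tie correction and no continuity correction, which is an assumption of this sketch; in practice a library routine such as scipy.stats.mannwhitneyu is the usual choice. The function name mann_whitney_u is hypothetical.

```python
from statistics import NormalDist


def mann_whitney_u(x, y):
    """Return (U statistic of x, two-sided p-value) via normal approximation."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the block of tied values starting at position i.
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        t = j - i
        # Tied values occupy ranks i+1..j and all receive the average rank.
        avg_rank = (i + 1 + j) / 2.0
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        tie_term += t ** 3 - t
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0  # every value is identical: no evidence of a difference
    z = (u - mean_u) / var_u ** 0.5
    return u, 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# An extreme outlier changes the mean of the clicks but not their ranks.
print(mann_whitney_u([1, 2, 3], [4, 5, 6]))
print(mann_whitney_u([1, 2, 3], [4, 5, 6000]))
```

Both calls print the same result, because replacing 6 with 6000 leaves the ordering of the pooled sample unchanged.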

3. Tests on global CTRs

3.1. Assumptions

3.2. Binomial z-test: it fails

  • we don’t influence views;
  • each click and each view is an i.i.d. sample.
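
For reference, a two-proportion z-test on global CTRs can be sketched as follows. The function name and the pooled-variance form are assumptions of this sketch. Note that the standard error formula treats every view as an independent Bernoulli trial, which is exactly the second assumption above; when several views belong to the same user and share that user's CTR, that assumption does not hold.

```python
from statistics import NormalDist


def binomial_z_test(clicks_c, views_c, clicks_t, views_t):
    """Two-sided z-test for the difference of two global CTRs."""
    ctr_c = clicks_c / views_c
    ctr_t = clicks_t / views_t
    # Pooled CTR under the null hypothesis of equal click probability.
    pooled = (clicks_c + clicks_t) / (views_c + views_t)
    # Standard error assuming every view is an independent trial.
    se = (pooled * (1.0 - pooled) * (1.0 / views_c + 1.0 / views_t)) ** 0.5
    z = (ctr_t - ctr_c) / se
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value


# Totals are summed over all users of each group.
z, p_value = binomial_z_test(clicks_c=100, views_c=1000, clicks_t=150, views_t=1000)
print(z, p_value)
```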

3.3. Bootstrap for global CTR

3.4. Delta method for global CTR

3.5. Bucketization

3.6. Linearization

4. Tests on user CTRs

4.1. Assumptions

4.2. T-test on user CTRs

  • we don’t influence views
  • user CTRs (and uplift in user CTR) and views are independent
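
A minimal sketch of this test: compute one CTR per user, then compare the two groups of user CTRs with a Welch-style statistic. The helper names are hypothetical, and the p-value uses a normal approximation to the t distribution, which is an assumption that is reasonable only for large groups; scipy.stats.ttest_ind with equal_var=False is the usual library alternative.

```python
from statistics import NormalDist, fmean, variance


def user_ctrs(views, clicks):
    """Per-user CTR; every user is assumed to have at least one view."""
    return [c / v for v, c in zip(views, clicks)]


def welch_test(control, treatment):
    """Two-sided test for a difference in means with unequal variances."""
    se = (variance(control) / len(control)
          + variance(treatment) / len(treatment)) ** 0.5
    stat = (fmean(treatment) - fmean(control)) / se
    # Normal approximation to the t distribution (large-sample assumption).
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(stat)))
    return stat, p_value


ctrs_c = user_ctrs(views=[10, 20, 5, 8], clicks=[1, 2, 0, 1])
ctrs_t = user_ctrs(views=[12, 18, 6, 9], clicks=[2, 3, 1, 1])
print(welch_test(ctrs_c, ctrs_t))
```

Each user contributes one observation regardless of how many views they have, so a user with a single view weighs as much as a user with a thousand.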

4.3. T-test on smoothed user CTRs

  • we don’t influence views
  • user CTR (and uplift in user CTR) does not depend on views
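
One common way to smooth user CTRs is additive smoothing toward a global CTR, sketched below. The formula, the function name, and the hyperparameter alpha are assumptions of this sketch, since the exact smoothing used is not spelled out in this section. Users with few views are pulled toward the global CTR, while users with many views keep a CTR close to their raw clicks / views.

```python
def smoothed_user_ctrs(views, clicks, global_ctr, alpha=5.0):
    """Additive smoothing: (clicks + alpha * global_ctr) / (views + alpha).

    alpha acts like a number of pseudo-views at the global CTR; its value
    here is illustrative.
    """
    return [(c + alpha * global_ctr) / (v + alpha) for v, c in zip(views, clicks)]


views = [1, 100]
clicks = [1, 2]
# Raw CTRs would be 1.0 and 0.02; smoothing shrinks the noisy first value.
print(smoothed_user_ctrs(views, clicks, global_ctr=0.02, alpha=5.0))
```

The smoothed values can then be compared between groups with the same t-test that is applied to raw user CTRs.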

4.4. Intra-user correlation aware weighting

  • we don’t influence views
  • user CTR (and uplift in user CTR) does not depend on views

5. So what? Which test should I use?

  1. Your experiment does not influence views.
  2. User CTRs in your data do not depend on views, and uplift in user CTRs (the treatment effect of your experiment) does not depend on views.

6. Methods not in the list

6.1. Transformation of clicks

6.2. CUPED

6.3. Using future metric prediction

About VK technologies and infrastructure

VK Team
