Hypothesis & Goal – What? Why? How?

What is the motivation behind setting good goals and hypotheses? The main reason we should do it for every feature (not only for experiments) is to let us (as the feature initiators) and the other stakeholders understand why we need this feature and what impact it will make.

I want to suggest a template I’ve built that helps me set good feature hypotheses and goals.

The “Big Goal”

Every company has at least one “big goal”. The goal can be weekly, monthly, yearly, or any period the company defines.

Examples of “big goals”:

  • 1M new users by the end of this year
  • $500K monthly revenue

The “Feature Goal”

Every feature should be backed by a main goal we are trying to achieve. This goal should bring us one step closer to our “Big Goal”.

Let’s go back to our previous example and set some example feature goals:

  • 20K new users in 1 month (meaning we want this feature to contribute 240K users a year – 24% of our “big goal” of 1M new users a year)
  • $10K daily revenue (meaning we want this feature to contribute $300K a month – 60% of our “big goal” of $500K monthly revenue)

The “Hypothesis”

What do we think we should do in order to achieve the feature goal?

For example: move the “Add to cart” button to the middle of the screen and use a bigger font

Let’s combine them all together

Consider the following template:

We think that if we [HYPOTHESIS], then we will gain [FEATURE_GOAL], and the overall effect per [BIG_GOAL_PERIOD] will be [BIG_GOAL_PERIOD_VALUE], which is [BIG_GOAL_PERIOD_VALUE] / [BIG_GOAL_VALUE] × 100% of [BIG_GOAL]

For example:

We think that if we move the “Add to cart” button to the middle of the screen and use a bigger font, then we will gain $10K daily revenue, and the overall effect per month will be $300K, which is 60% of our “big goal” of $500K monthly revenue.
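The template is mechanical enough to capture in a few lines of code. This is just an illustrative sketch with made-up type and field names, not part of any real tool:

```typescript
interface FeatureDefinition {
  hypothesis: string;   // e.g. "move the 'Add to cart' button to the middle of the screen"
  featureGoal: string;  // e.g. "$10K daily revenue"
  periodValue: number;  // projected effect per big-goal period, e.g. 300_000
  bigGoalValue: number; // e.g. 500_000
  bigGoal: string;      // e.g. "$500K monthly revenue"
  period: string;       // e.g. "month"
}

// Renders the hypothesis template, computing the big-goal share along the way.
function hypothesisStatement(d: FeatureDefinition): string {
  const share = ((d.periodValue / d.bigGoalValue) * 100).toFixed(0);
  return (
    `We think that if we ${d.hypothesis}, then we will gain ${d.featureGoal}, ` +
    `and the overall effect per ${d.period} will be ${d.periodValue.toLocaleString()}, ` +
    `which is ${share}% of our big goal: ${d.bigGoal}.`
  );
}
```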

Summary

I’ve noticed that feature initiators sometimes have a hard time defining good hypotheses and goals, so I want to share this template I’ve built. In the A/B testing world, this definition is one of the most important parts of a test. A feature without these definitions is like shooting in the dark. By using this template you can easily understand the “Why” behind the feature and see the direction the feature initiator aims to take with it.

Another important side effect of this definition is that presenting it to our colleagues leads to a very meaningful conversation, which can raise more important questions and aspects of the new feature we want to build.

Safari & 3rd party cookie blocking is only a camouflage

Did you know that Safari blocks 3rd party cookies by default? Well… not exactly.

What is Safari’s definition of a 3rd party domain?

For Safari, a 3rd party domain is a domain the user hasn’t visited in the main window during the current session.

Is it possible to write a 3rd party cookie in Safari even though the user chose to block them? Yes! And if you ask me, I can’t understand the logic behind this decision. It means that every advertiser can store cookies even though your security settings are supposed to prevent it.

Which steps should you take in order to write 3rd party cookies in Safari?

  1. The 3rd party cookie response should include the P3P header
  2. The user has to visit the 3rd party domain’s site (be sure to have an endpoint on the 3rd party domain that redirects the user back to your site right after that)
  3. The redirection must occur in the main window (if you redirect on the client side, use window.top.location.href) – see the sketch below
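To make these steps concrete, here is a minimal sketch of what the endpoint on the 3rd party domain might look like. It assumes an Express-style Node server; the route, cookie name, and P3P compact-policy value are illustrative assumptions, not a prescription:

```typescript
import express from 'express';

const app = express();

// Hypothetical endpoint on the 3rd party domain (step 2).
app.get('/cookie-sync', (req, res) => {
  // Step 1: send a P3P header along with the cookie (the policy value is illustrative).
  res.setHeader('P3P', 'CP="NOI DSP COR NID"');
  res.cookie('tracking_id', 'some-value', { maxAge: 365 * 24 * 3600 * 1000 });
  // Steps 2+3: the user has now visited this domain in the main window;
  // send them straight back to the site they came from.
  res.redirect(String(req.query.returnUrl));
});

app.listen(3000);
```

On the client side, the jump into this endpoint must happen in the main window, e.g. window.top.location.href = 'https://third-party.example/cookie-sync?returnUrl=' + encodeURIComponent(location.href);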

I don’t know why, or for how long, Safari will allow this bypass. From my point of view it is a major security issue, and Safari can potentially lose users whose top concern is privacy.

A/B – How and When to Decide

Ever asked yourself how and when to end an experiment and make a decision? Relying only on statistical significance is not enough for you? This post is for you!

Goal

The most important step of A/B testing is having a well-defined goal. I’ve heard lots of people say: “How do I set a goal? I don’t know what to expect; all I want to do is just increase the conversion rate.” I always give the following answer:

Let’s say you have a 5% conversion rate and you want to make it better, so a 6% conversion rate is 20% better, right? Meaning that if you reach a 20% lift with at least a 95% confidence level, then your experiment succeeded and you reached your goal, right? Well… Definitely Maybe!

Let’s understand with an example

8,000 visits

Group A – 4,000 visits with 200 conversions

Group B – 4,000 visits with 240 conversions
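If you want to run the numbers yourself, here is a minimal sketch of a two-proportion z-test; the normal CDF uses the Abramowitz & Stegun erf approximation, and the inputs match the example above:

```typescript
// Abramowitz & Stegun formula 7.1.26 – a numerical approximation of erf.
function erf(x: number): number {
  const sign = x < 0 ? -1 : 1;
  x = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * x);
  const poly =
    ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t - 0.284496736) * t +
      0.254829592) * t;
  return sign * (1 - poly * Math.exp(-x * x));
}

// Standard normal CDF.
const normCdf = (z: number): number => 0.5 * (1 + erf(z / Math.SQRT2));

// Two-proportion z-test: is B's conversion rate significantly higher than A's?
function significance(convA: number, nA: number, convB: number, nB: number) {
  const pooled = (convA + convB) / (nA + nB);
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / nA + 1 / nB));
  const z = (convB / nB - convA / nA) / se;
  return { z, confidence: normCdf(z) }; // one-tailed confidence level
}

console.log(significance(200, 4000, 240, 4000)); // z ≈ 1.96, confidence ≈ 0.975
```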

According to a statistical significance test, B is the winner with a 97.5% confidence level!!! Amazing! Is that so…? Consider the following scenarios

Scenario A

  • Go to a quiet room
  • Put this song in the background: Relaxation Music
  • Imagine that after 2 weeks of the experiment you see the 20% better conversion rate you were hoping for
  • You now imagine that everyone in your company is smiling, your boss is really proud of you, and your company is the next Google
  • Now you see that this experiment yields only 40 more conversions for your company
  • Now you should play this sound

Scenario B

  • Go to a quiet room
  • Put this song in the background: Relaxation Music
  • Now think of the end result (for example, you have far more conversions than you expected – think of the numbers that would make you feel like that)
  • You now imagine that everyone in your company is smiling, your boss is really proud of you, and your company is the next Google
  • You did it! You are ready to run the next experiment

Sample Size

Continuing our last example, it is extremely important to understand the sample size behind the goal you’ve just defined. You should know and understand the minimal sample size you need in order to reach statistical significance of at least 95%. In this post we won’t explain how to calculate the minimum sample size, but you can use online calculators such as: http://www.testsignificance.com/ or http://www.stat.ubc.ca/~rollin/stats/ssize/b2.html. Be sure that the sample size you need is realistic and that you have enough traffic in an acceptable time frame to achieve your goals before running your tests.

You can also use this calculator in order to calculate how long you should run the test to get significant results.
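For reference, the closed-form approximation behind such calculators is small enough to sketch yourself. This assumes a 95% two-sided significance level and 80% power, which are common defaults rather than anything the calculators above mandate:

```typescript
// Minimum sample size per group for detecting a lift from p1 to p2.
// zAlpha = 1.96 (95% two-sided significance), zBeta = 0.8416 (80% power).
function sampleSizePerGroup(p1: number, p2: number, zAlpha = 1.96, zBeta = 0.8416): number {
  const variance = p1 * (1 - p1) + p2 * (1 - p2);
  return Math.ceil(((zAlpha + zBeta) ** 2 * variance) / (p2 - p1) ** 2);
}

// Detecting a 5% -> 6% lift needs roughly 8,200 visits per group –
// more than the 4,000 per group in the example above.
console.log(sampleSizePerGroup(0.05, 0.06)); // ≈ 8156
```

Dividing the result by your daily traffic per group gives a rough estimate of how long the test must run.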

Cycles and Seasonality

Another important thing to understand is your application’s seasonality and cycles.

Cycles

Every application has a trend in user behavior. For example, if you have an e-commerce site and you see that on Monday–Wednesday your users look at products and read reviews, while on the weekend they usually make the purchase, then your minimum cycle period is 7 days. A cycle can also be 1 day, 2 weeks, or even a month. If you don’t know your minimum cycle period, my suggestion is to set a minimum of a weekly cycle.

Seasonality

Is it Christmas? Or maybe Black Friday? Or the Super Bowl? Your users will probably behave differently during such a period, so be sure you understand it and don’t project this behavior onto the rest of your users once the season is over.

Once you identify your cycle/season, you should wait until the whole period is over and not make a decision in the middle. For example, if you choose a weekly cycle and you started your experiment on a Tuesday, you should make a decision only on Tuesdays (after 7, 14, 21 days, etc.) and not in the middle, in order to be sure you are including all types of user segments.
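A tiny helper can make the “decide only at cycle boundaries” rule mechanical; the start date below is just an illustration:

```typescript
// The only dates on which a decision may be made:
// whole-cycle boundaries after the experiment's start date.
function decisionDates(start: Date, cycleDays: number, maxCycles: number): Date[] {
  const dates: Date[] = [];
  for (let i = 1; i <= maxCycles; i++) {
    dates.push(new Date(start.getTime() + i * cycleDays * 86_400_000));
  }
  return dates;
}

// An experiment with a 7-day cycle started on Tuesday 2015-06-02
// may only be decided on the following Tuesdays.
console.log(decisionDates(new Date('2015-06-02'), 7, 3));
```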

Segmentation

Cycles and seasonality are a special case of segmentation. Segmentation means you have different kinds of users with different behaviors. Examples of segmentation can be:

  • Gender
  • New vs Existing users
  • Physical location
  • Web vs Mobile web
  • Anonymous vs Signed in users
  • And many more

When we analyze our results, we usually compare the conversion rate between the groups and then make the wise decision. But we should also use segmentation as a tool to better explain our results, as in the sketch below.
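Here is a minimal sketch of such a breakdown: conversion rate per (segment, group) pair, which helps explain why an overall winner may still lose within a particular segment. The Visit shape is a hypothetical one:

```typescript
interface Visit {
  segment: string; // e.g. "new" / "existing", "web" / "mobile-web"
  group: 'A' | 'B';
  converted: boolean;
}

// Conversion rate per (segment, group) pair.
function conversionBySegment(visits: Visit[]): Map<string, string> {
  const stats = new Map<string, { conversions: number; n: number }>();
  for (const v of visits) {
    const key = `${v.segment}/${v.group}`;
    const s = stats.get(key) ?? { conversions: 0, n: 0 };
    s.n += 1;
    if (v.converted) s.conversions += 1;
    stats.set(key, s);
  }
  const rates = new Map<string, string>();
  for (const [key, s] of stats) {
    rates.set(key, `${((100 * s.conversions) / s.n).toFixed(1)}% (n=${s.n})`);
  }
  return rates;
}
```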

Statistical Significance

Last but not least – The statistical significance.

When you make a decision based on your A/B results, you must be sure that you’ve reached a confidence level of at least 95% significance, and you should also be sure you understand what a false positive means (type I and type II errors). For 2 variations at a 95% significance level there is a 5% chance of a false positive, meaning there is a 5% chance that you will make the wrong decision, so be careful with that. Let’s emphasize this problem with the famous Google “41 shades of blue” experiment. Google couldn’t decide between two blues, so they tested 41 shades between the two to see which one performs better. If each variation is tested at 95% confidence, the chance of at least one false positive across 41 variations is 1 − 0.95⁴¹ ≈ 88%, meaning there is only about a 12% chance that no variation wins by pure luck!
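The arithmetic behind that is the family-wise error rate, which is a one-liner:

```typescript
// Chance of at least one false positive across k independent comparisons,
// each run at significance level alpha (the family-wise error rate).
const familyWiseErrorRate = (alpha: number, k: number): number =>
  1 - Math.pow(1 - alpha, k);

console.log(familyWiseErrorRate(0.05, 1));  // 0.05 – a single A/B comparison
console.log(familyWiseErrorRate(0.05, 41)); // ≈ 0.878 – 41 shades of blue
```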

There are many ways to calculate the confidence level of your results with statistical significance tests, such as ANOVA, t-test, z-test, chi-square test and more… When performing these tests you should decide whether you want to perform a 1- or 2-tailed test.

Summary

Before running your experiment, be sure you know your goals; from the goals, derive your minimum sample size; and check for significant results only at the end of your cycle/season. Understand your false positive rate and reduce it to a minimum.

The most important thing is to trust your data and love statistics, even when the results are the opposite of your hypothesis.

A/B & Gradual Rollout – The Same, But Different

People ask me: “Nir, what is the difference between A/B testing and gradual rollout?” This post will try to answer that question.

Gradual Rollout

When we refer to gradual rollout, we are talking about a methodology of deploying a new feature to users by degrees, rather than deploying it to all of the users at once.

Why?

There is one reason to gradually roll out a new feature: to reduce the risk of deploying it.

User Acceptance Tests

So you have the most talented QA team. The QA team found all the bugs in your feature, but does QA know whether it works for the user?

Let’s explain it using an example:

  • You are a big mobile device manufacturer
  • You just finished developing your new OS version 2.0 and are very excited about it
  • Your amazing QA team approved this version, and you are all happy with the new OS version
  • You are now ready to roll out this new version over the air (OTA)

Now you have 2 options:

    1. Rollout to all users at once – 100% of the users
    2. Gradually rollout – 1% of the users

Let’s imagine that your users start complaining to your support team that after 3 days the device consistently turns off for no reason. Your engineering team investigates the issue and finds out it happens only for users who linked their Exchange account to the device, and only after 3 days… The engineering team then fixes the bug, and a new version is ready to be rolled out.

Imagine what would happen if you had deployed it to 100% of the users – your company would be in big trouble! But if you deployed it to just 1% of the users, then only a small number of your users got this bad experience. After the fix is rolled out, you can see and measure the current version’s quality again. When you feel more comfortable, you can expose it to more and more users by degrees.

Performance

You’ve executed load tests for your feature. But you are still not sure, and you don’t want to take the risk that your feature will break down in production during peak hours. So you can start gradually rolling it out, then measure and learn how the feature affects your system.

Feedback

Feedback is the most important thing. You can get feedback from your users, analytics, and stakeholders. You want to get feedback from some of the users before rolling it out to all of them. You want to learn!

Who is doing it?

For example: Samsung, Microsoft, Twitter, Facebook, Quora

A/B Testing

A/B testing is an experiment between 2 (or more) variants, where the winner is the variant that fulfills the hypothesis for achieving your goal. (The “formal” definition of A/B testing refers only to 2 different web pages, while multivariate testing refers to tweaking a specific component within a page.)

Why?

You want to increase conversion but you are not sure how. You have 2 solutions (= variants) in mind and want to see which one is better for you.

For example, you have a website and want to add a new landing page for anonymous users. You have ideas for 2 landing pages, and you are not sure which page will have a better conversion rate. You can choose only 1 landing page and be 50% right, or you can run an A/B test between these 2 landing pages and be 99.9% right. You can then continue tweaking and tweaking your page with multivariate testing in order to improve your conversion rate even more.

The Same

  • There are users who are exposed to it, and others who aren’t
  • Random throttling between the groups
  • They are the same 🙂

The Difference

The main difference is the intent:

  • A/B: to test our hypothesis
  • Gradual rollout: to reduce the risk

Another difference is the throttling control. In A/B testing you are checking 2 versions and usually do a 50%/50% split between the variants. In a gradual rollout you start with a small proportion and open it to more users by degrees until you’ve reached 100% of the users, for example 1% -> 5% -> 10% -> 50% -> 70% -> 100%. Both can share the same mechanics, as sketched below.
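Here is a minimal sketch of that shared mechanism: hash each user into a stable bucket in [0, 100), then interpret the bucket according to the intent. The FNV-1a hash and the function names are illustrative choices:

```typescript
// Deterministic bucket in [0, 100) per user, using the FNV-1a hash –
// the same user always lands in the same bucket.
function bucket(userId: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < userId.length; i++) {
    h ^= userId.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return (h >>> 0) % 100;
}

// A/B testing: a fixed 50%/50% split between the variants.
const abVariant = (userId: string): 'A' | 'B' =>
  bucket(userId) < 50 ? 'A' : 'B';

// Gradual rollout: the same bucketing, but the threshold rises over time,
// e.g. 1 -> 5 -> 10 -> 50 -> 70 -> 100.
const isRolledOut = (userId: string, rolloutPercent: number): boolean =>
  bucket(userId) < rolloutPercent;
```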

When to choose A/B testing?

When you are not sure which is better, A or B (or C, D, E, F, G…), and you want to test your hypothesis

When to choose Gradual Rollout?

When you have a new feature you want to deploy, but you want to reduce the risk to a minimum

Summary

A/B testing and gradual rollout use the same framework. The main differences between them are the intent and the throttling usage.

A/B Testing: Why?

Are you asking the right questions? So you want to build an A/B culture, but have you ever thought about why? Why would you ever want to do it? Don’t you have something better to do? Is this the correct way to achieve your goals? Do you have any goals?

In a Utopian world you would never need to do A/B testing. You could just think of a new feature, develop it, and watch this brand-new amazing feature convert even better than you ever expected. Now back to reality…

The problem

Let me help you a bit: if you agree with these sentences, then you might start thinking about implementing an A/B culture in your company:

  • You try to increase conversion rate and you are not sure what the best solution is
  • You argue with your colleagues too much about things such as what the best graphic design is or where to place the button
  • You are not sure you are doing the right thing
  • You are not God
  • You want to learn
  • You have an idea and want to test it on real users before implementing the entire dream

Setting up your goals

Now let’s ask it in the A/B language: what is your goal in building an A/B testing culture in your company? Before you continue reading, try to think about what your goals are and why you want to build an A/B culture in your company.

For example your goals can be:

  • To increase the following metrics
    • % of Active users
    • Growth rate
    • Revenue
  • Position your website as one of the top leading websites in the world
  • Start creating success stories

The Hypothesis

Let’s first explain what a hypothesis is:

An assumption that is suggested to solve a specific problem, and which hasn’t yet been proven true

Good. Now that we have goals, we can start thinking of our hypotheses and try to find ways of how the hell to implement our dreams. For example, the hypotheses for the example goals above may be:

  • Change the feature life cycle:
    • From: Have an idea -> Write a spec -> Test it -> Deploy it -> Forget it ever existed
    • To: Have an idea -> Discuss it with your colleagues -> Argue -> Convert the arguments to hypotheses -> Test it on real users -> Deploy it -> Learn -> Think and iterate
  • Go back to old features and try to find new ways to increase revenue
  • Improve yourself by thinking and trying alternatives
  • Responding to your users’ needs over following a plan (sounds familiar? http://agilemanifesto.org/)
  • Understand that you are not always right

If we look at these hypotheses, we can see that we can implement them using an A/B testing culture. I’m not saying whether we should move to an A/B testing culture or not; I’m suggesting moving to an A/B culture in order to test our hypotheses. If we prove our hypotheses true, then A/B is our answer. And what if not? What if our hypothesis is proven false? Should we give up and go back to the starting point? Or should we learn and understand why it didn’t work, then tweak it and try again?

Summary

  • Don’t embed an A/B culture in your company just because it is cool and has buzz. Do use it as a process to achieve your goals
  • Define your goals and hypothesis
  • Test them and learn
  • Decide and iterate

If you ever asked yourself, or wondered, how and who can build an A/B culture in a company, then the answer is: everyone who wants to.

Oh, and don’t forget: while trying to build this culture you will hear a lot of “No! I don’t like it! I don’t want it.” So does that mean you should give up? ABsolutely not!