hi! i want to build an a/b testing tool, but i'm struggling with how to approach building some aspects of it.
for clarification, i'm not looking for existing a/b testing tools. rather, i'm looking for resources on how i can build this kind of a/b tooling myself. i have a high-level overview of the data flow and architecture my product should have: i need to hash each user_id so that each user consistently gets the same version, and i also need to ensure the versions are split evenly across the entire userbase. then i should store this user_id-to-version mapping for quick lookup later. but i'm stumped on the following:
- how do i know which hashing algorithm to use to ensure that each version is split evenly among the userbase?
- is it better to build and run this hashing and allocation in-house or are there external services that help with this?
- The most useful resource we've found was from Spotify, of all places: https://engineering.atspotify.com/category/data-science/
- For hashing, an md5 hash of (user-id + a/b-test-id) is sufficient; in practice we had no issues with split bias. You should not get too clever with hashing. Stick to something reliable and widely supported so any post-experiment analysis stays easy. You definitely want to log the user-id-to-version mapping somewhere (see the sketch after this list).
- As for in-house vs external, I would probably go in-house, though that depends on the system you're A/B testing. In our experience, integrating a third-party tool took roughly as much work as building the platform ourselves, and building it meant we could test more bespoke features.
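
To make the hashing suggestion above concrete, here is a minimal Python sketch (my own illustration, not from the thread): it buckets a user by taking the md5 of user_id combined with the experiment id and reducing it modulo the number of variants, then counts assignments over synthetic user ids to sanity-check that the split comes out roughly even. The experiment id "homepage-banner-test" and the generated user ids are made-up examples.

```python
import hashlib
from collections import Counter


def assign_variant(user_id: str, test_id: str, variants: list[str]) -> str:
    """Deterministically map a user to a variant for a given experiment.

    Hashing user_id together with test_id means the same user can land in
    different buckets across different experiments, while always getting
    the same variant within a single experiment.
    """
    key = f"{user_id}:{test_id}".encode("utf-8")
    digest = hashlib.md5(key).hexdigest()
    bucket = int(digest, 16) % len(variants)
    return variants[bucket]


if __name__ == "__main__":
    # Quick sanity check of split evenness over synthetic user ids.
    variants = ["control", "treatment"]
    counts = Counter(
        assign_variant(f"user-{i}", "homepage-banner-test", variants)
        for i in range(100_000)
    )
    print(counts)  # expect a roughly 50/50 split
```

In a real system you would also persist the user_id-to-variant mapping at assignment time, as mentioned above, so post-experiment analysis doesn't depend on re-deriving it from the hash.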