Sequential A/B Testing Keeps the World Streaming Netflix Part 2: Counting Processes | by Netflix Technology Blog | Mar, 2024
The counting processes are functions that increment by 1 every time a new event arrives. Clearly, there are fewer events occurring in the treatment than in the control. If these were login events, this would suggest that the new code contains a bug that prevents some users from logging in successfully.
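To make the idea of a counting process concrete, here is a minimal sketch that turns a list of event timestamps into a function N(t). The timestamps and the names `counting_process`, `control_events`, and `treatment_events` are hypothetical and chosen only for illustration.

```python
from bisect import bisect_left


def counting_process(timestamps):
    """Return a function N where N(t) is the number of events strictly before t."""
    ordered = sorted(timestamps)

    def N(t):
        # bisect_left returns how many sorted timestamps are < t,
        # which matches counting events in the interval [0, t).
        return bisect_left(ordered, t)

    return N


# Hypothetical event timestamps (e.g. seconds since the experiment started).
control_events = [0.4, 1.1, 1.3, 2.7, 3.0, 3.2, 4.8]
treatment_events = [0.9, 2.5, 4.1]

N_control = counting_process(control_events)
N_treatment = counting_process(treatment_events)

print(N_control(3.0), N_treatment(3.0))
```

Each call such as `N_control(3.0)` is a single integer, and it can only stay the same or increase as t grows.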
This is a common situation when dealing with event timestamps. To give another example, if events corresponded to errors or crashes, we would like to know whether these are accruing faster in the treatment than in the control. Moreover, we want to answer that question as quickly as possible to prevent any further disruption to the service. This necessitates the sequential testing techniques that were introduced in part 1.
Time-Inhomogeneous Poisson Process
Our data for each treatment group is a realization of a one-dimensional point process, that is, a sequence of timestamps. As the rate at which the events arrive is time-varying (in both treatment and control), we model the point process as a time-inhomogeneous Poisson point process. This point process is defined by an intensity function λ: ℝ → [0, ∞). The number of events in the interval [0,t), denoted N(t), has the following Poisson distribution
N(t) ~ Poisson(Λ(t)), where Λ(t) = ∫₀ᵗ λ(s) ds.
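One way to build intuition for this distribution is to simulate such a process and check that the average event count is close to Λ(t). The sketch below uses thinning (the Lewis-Shedler method), a standard simulation technique that is not taken from the post itself. The intensity `example_intensity`, its upper bound of 15, and all function names are assumptions made for illustration.

```python
import math
import random


def simulate_inhomogeneous_poisson(intensity, intensity_max, horizon, rng):
    """Simulate event times on [0, horizon) by thinning.

    Assumes 0 <= intensity(t) <= intensity_max for every t in [0, horizon).
    """
    events = []
    t = 0.0
    while True:
        # Propose the next candidate from a homogeneous process with rate intensity_max.
        t += rng.expovariate(intensity_max)
        if t >= horizon:
            return events
        # Keep the candidate with probability intensity(t) / intensity_max.
        if rng.random() < intensity(t) / intensity_max:
            events.append(t)


def example_intensity(t):
    # Hypothetical nonstationary rate, oscillating between 5 and 15 events per unit time.
    return 10.0 + 5.0 * math.sin(t)


def example_cumulative_intensity(t):
    # Closed form of the integral of 10 + 5*sin(s) over [0, t].
    return 10.0 * t + 5.0 * (1.0 - math.cos(t))


rng = random.Random(0)
horizon = 10.0
counts = [
    len(simulate_inhomogeneous_poisson(example_intensity, 15.0, horizon, rng))
    for _ in range(2000)
]
mean_count = sum(counts) / len(counts)
expected = example_cumulative_intensity(horizon)
print(f"average N(t): {mean_count:.2f}, Lambda(t): {expected:.2f}")
```

Across many simulated runs, the average of N(t) should sit close to Λ(t), since the mean of a Poisson(Λ(t)) variable is Λ(t).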
We seek to test the null hypothesis H₀: λᴬ(t) = λᴮ(t) for all t, i.e. the intensity functions for control (A) and treatment (B) are the same. This can be done semiparametrically without making any assumptions about the intensity functions λᴬ and λᴮ. Moreover, the novelty of the research is that this can be done sequentially, as described in section 4 of our paper. Conveniently, the only data required to test this hypothesis at time t is Nᴬ(t) and Nᴮ(t), the total number of events observed so far in control and treatment. In other words, all you need to test the null hypothesis is two integers, which can easily be updated as new events arrive. Here is an example from a simulated A/A test, in which we know by design that the intensity function is the same for the control (A) and the treatment (B), albeit nonstationary.
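To illustrate how two integers can be enough, here is a minimal sketch of one anytime-valid test built on the counts alone. It is not necessarily the exact statistic from the paper. It assumes an equal traffic split, so that under H₀ each new event is equally likely to come from A or B, and it assumes a uniform Beta(1, 1) mixing distribution over the alternative. The function names and the example counts are hypothetical.

```python
import math


def log_beta(a, b):
    """Log of the Beta function, computed via log-gamma for numerical stability."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_mixture_martingale(n_a, n_b, prior_a=1.0, prior_b=1.0):
    """Log of a Beta-mixture likelihood ratio against the null p = 1/2.

    n_a, n_b: events observed so far in control (A) and treatment (B).
    prior_a, prior_b: Beta mixing parameters (an assumption of this sketch).
    """
    n = n_a + n_b
    # Marginal likelihood of the observed event labels under the Beta mixture.
    log_marginal = log_beta(prior_a + n_a, prior_b + n_b) - log_beta(prior_a, prior_b)
    # Likelihood of the same labels when each event is A or B with probability 1/2.
    log_null = n * math.log(0.5)
    return log_marginal - log_null


def reject_null(n_a, n_b, alpha=0.05):
    """Reject H0 once the mixture martingale reaches 1 / alpha."""
    return log_mixture_martingale(n_a, n_b) >= math.log(1.0 / alpha)


# Hypothetical counts: balanced arrivals versus a clear imbalance.
print(reject_null(50, 50))
print(reject_null(80, 20))
```

Under the stated assumptions the statistic is a nonnegative martingale starting at 1 under H₀, so by Ville's inequality the chance that it ever reaches 1/α is at most α. That is what allows the check to be repeated after every new event without inflating the false positive rate.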