A fine-grained network traffic analysis with Millisampler

What the research is:

Millisampler is one of Meta’s latest characterization tools. It lets us efficiently observe, characterize, and debug network performance at high-granularity timescales. This lightweight network traffic characterization tool for continual monitoring operates at fine, configurable timescales. It collects time series of ingress and egress traffic volumes, number of active flows, incoming ECN marks, and ingress and egress retransmissions. Millisampler can also distinguish in-region traffic from cross-region traffic (longer RTT). Millisampler runs on our server fleet, collecting short, periodic snapshots of this data at 100μs, 1ms, and 10ms time granularities. It stores the data on local disk and keeps it available for several days for on-demand analysis. Because the data is only aggregated flow-level header information, it does not contain any personally identifiable information (PII). Even with the minimal amount of information it collects, Millisampler data has proven very useful in practice, particularly when combined with existing coarser-grained data. For example, it lets us see clearly when switch buffers or host NICs might be unable to handle the ingress traffic pattern.


How it works:

Millisampler consists of two parts. Userspace code schedules runs, stores data, and serves data. An eBPF-based tc filter runs in the kernel to collect fine-timescale data. The userspace code attaches the tc filter and enables data collection. A tc filter is among the first programmable steps on the receipt of a packet and near the last step on transmission. On ingress, this means the eBPF code executes on the CPU core that is processing the soft IRQ (bottom half) as the packet is directed toward the owning socket. Because processing happens on many CPU cores, we use per-CPU variables to avoid locks. This raises the memory requirement but eliminates the risk of contention. To minimize overhead, we sample periodically and for short intervals of time. Userspace therefore configures two parameters in Millisampler: the sampling interval and the number of samples. We schedule runs with three sampling intervals (10ms, 1ms, and 100μs) and fix the number of samples at 2,000 for all of them. As a result, our observation windows range from 200ms (100μs sampling interval) to 20s (10ms sampling interval). This lets us observe events at timescales from sub-RTT up to cross-region RTT. At the same time, it fixes the memory footprint of each run at 2,000 64-bit counters per CPU core for each value we measure.
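The per-CPU, fixed-sample-count design can be illustrated with a small userspace model. This is a hypothetical Python sketch, not Millisampler's code. The real collection happens in an eBPF tc filter, and the class and function names here (`PerCpuSampler`, `observation_window_ns`, `memory_bytes_per_metric`) are invented for illustration. The sketch assumes each packet is placed in a time bucket computed from its timestamp relative to the start of the run. Each CPU increments only its own counter array, so no lock is needed, and userspace sums the arrays afterward.

```python
from dataclasses import dataclass, field

NUM_SAMPLES = 2_000  # samples per run, as stated in the post
COUNTER_BYTES = 8    # each counter is a 64-bit integer


@dataclass
class PerCpuSampler:
    """Hypothetical model of one run measuring a single value (e.g. ingress bytes).

    Each CPU owns a private array of NUM_SAMPLES counters, so the packet path
    never contends on a shared counter; userspace merges the arrays afterward.
    """
    num_cpus: int
    interval_ns: int
    start_ns: int
    counters: list = field(init=False)

    def __post_init__(self):
        self.counters = [[0] * NUM_SAMPLES for _ in range(self.num_cpus)]

    def on_packet(self, cpu: int, ts_ns: int, length: int) -> None:
        # Assumed per-packet step: choose the time bucket for this packet and
        # bump the counter belonging to the CPU that is processing it.
        if ts_ns < self.start_ns:
            return
        idx = (ts_ns - self.start_ns) // self.interval_ns
        if idx < NUM_SAMPLES:  # packets after the observation window are ignored
            self.counters[cpu][idx] += length

    def merge(self) -> list:
        # Userspace aggregation: sum each time bucket across all CPUs.
        return [sum(column) for column in zip(*self.counters)]


def observation_window_ns(interval_ns: int) -> int:
    # A fixed sample count means the window scales with the sampling interval.
    return interval_ns * NUM_SAMPLES


def memory_bytes_per_metric(num_cpus: int) -> int:
    # Memory for one measured value: one 2,000-counter array per CPU core.
    return num_cpus * NUM_SAMPLES * COUNTER_BYTES
```

For example, `observation_window_ns(100_000)` gives 200ms and `observation_window_ns(10_000_000)` gives 20s, which matches the range above. On a hypothetical 64-core host, `memory_bytes_per_metric(64)` is 1,024,000 bytes per measured value, regardless of the sampling interval.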

Millisampler collects a variety of metrics. It computes ingress and egress total bytes, and ingress ECN-marked bytes, from the lengths and CE bits of the packets. Millisampler also counts marked retransmits. It uses a 128-bit sketch to estimate the number of active (incoming and outgoing) connections. The sketch yields an approximate connection count that is exact up to about a dozen connections and saturates at around 500 connections per sampling interval. There is room for more precision. In practice, however, the exact number of connections has mattered less than the qualitative difference between a few connections and dozens or hundreds. That difference has helped us tell apart traffic patterns with many connections (heavy incast) from patterns with more traffic over fewer connections.
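The post does not say which sketch Millisampler uses. One common way to estimate distinct counts with a small bitmap is linear counting, and it behaves the way the paragraph describes. The hypothetical Python model below (`ConnectionSketch` and `flow_hash` are invented names) assumes each connection's 5-tuple is hashed to one of 128 bits, and the count is estimated from the fraction of bits still zero as m·ln(m / zeros). With a dozen connections, collisions are rare, so the estimate is close to exact. As the bitmap fills, the estimate becomes noisy and tops out near 128·ln 128 ≈ 621 once every bit is set.

```python
import hashlib
import math

SKETCH_BITS = 128  # bitmap size taken from the post


def flow_hash(five_tuple: tuple) -> int:
    # Hypothetical hash: map a connection 5-tuple to one bit position.
    digest = hashlib.blake2b(repr(five_tuple).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little") % SKETCH_BITS


class ConnectionSketch:
    """Linear-counting sketch over a 128-bit bitmap (an assumed technique)."""

    def __init__(self):
        self.bits = 0  # the 128-bit bitmap, stored in a Python int

    def add(self, five_tuple: tuple) -> None:
        # Packets of the same connection always set the same bit.
        self.bits |= 1 << flow_hash(five_tuple)

    def estimate(self) -> float:
        zeros = SKETCH_BITS - bin(self.bits).count("1")
        if zeros == 0:
            # Every bit is set: report the largest value the sketch can express.
            return SKETCH_BITS * math.log(SKETCH_BITS)
        return -SKETCH_BITS * math.log(zeros / SKETCH_BITS)
```

In a per-interval setting, one sketch would be reset at the start of each sampling interval and read out at the end. Packets from the same connection map to the same bit, so a single long flow counts once no matter how many packets it sends.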

Why it matters:

Millisampler is a powerful tool for troubleshooting and performance analysis. Solving two contrasting network performance faults at Meta in the past few years depended on having a fine-grained view of traffic. The first problem involved synchronized traffic bursts at fine timescales. Seeing these bursts motivated us to build and deploy Millisampler so we could catch the problem quickly if it happened again. The second problem, which an early Millisampler prototype helped root-cause, was a NIC driver bug that caused the NIC to stop delivering packets for milliseconds at a time. That investigation demonstrated Millisampler's value in complex debugging. Millisampler (or Millisampler-like data) played an important role in both investigations, but only as one part of our rich ecosystem of data collection tools that monitor a dizzying array of metrics across hosts and the network.

Beyond such incidents, Millisampler data has also proven useful for characterizing and analyzing the traffic of services. This has allowed us to design and deploy a range of features that improve their performance. For example, we have characterized the nature of bursts across a large number of services to understand the degree of incast and tune transport performance accordingly. We have also been able to study complex interactions between short-RTT and long-RTT flows and understand how bursts of either affect fairness for the other. In a following post, we will look at an extension of Millisampler called Syncmillisampler. It runs Millisampler synchronously across all hosts in a rack and uses that data to identify buffer contention in the top-of-rack ASICs.
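To make burst characterization concrete, here is a hypothetical Python sketch of one way to pull bursts out of a Millisampler-style time series. It is not Meta's method. `Burst`, `find_bursts`, the 10 Gbps link speed, and the 90% utilization threshold are all illustrative assumptions. The function takes per-sample ingress bytes and per-sample connection estimates. It marks a sample as bursty when its ingress rate reaches the threshold fraction of link capacity, then merges consecutive bursty samples into one burst and records its length, peak volume, and the largest connection estimate seen.

```python
from dataclasses import dataclass


@dataclass
class Burst:
    start: int        # index of the first sample in the burst
    length: int       # number of consecutive bursty samples
    peak_bytes: int   # largest per-sample ingress byte count in the burst
    max_conns: float  # largest connection estimate seen during the burst


def find_bursts(ingress_bytes, conn_estimates, interval_s, link_bps, threshold=0.9):
    """Hypothetical burst finder over one Millisampler-style time series.

    A sample is bursty when its ingress volume reaches `threshold` of what the
    link can carry in one sampling interval; consecutive bursty samples are
    merged into a single Burst.
    """
    capacity_bytes = link_bps * interval_s / 8  # link capacity per sample, in bytes
    bursts, current = [], None
    for i, (b, c) in enumerate(zip(ingress_bytes, conn_estimates)):
        if b >= threshold * capacity_bytes:
            if current is None:
                current = Burst(start=i, length=0, peak_bytes=0, max_conns=0.0)
            current.length += 1
            current.peak_bytes = max(current.peak_bytes, b)
            current.max_conns = max(current.max_conns, c)
        elif current is not None:
            bursts.append(current)  # the burst ended on the previous sample
            current = None
    if current is not None:
        bursts.append(current)  # the series ended mid-burst
    return bursts
```

Comparing `max_conns` across bursts is one way to separate heavy incast (many connections arriving together) from bursts carried by a few flows. That is the distinction the connection-count sketch is meant to expose.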

Read the full paper:

Acknowledgements:

Ehab Ghabashneh, Cristian Lumezanu, Raghu Nallamothu, and Rob Sherwood additionally contributed to the design and implementation of Millisampler.