tl;dr: Distributed systems must balance their needs for high availability and low latency with consistency guarantees; providing a mostly hit happy path for requests enables these systems to push the boundaries of this tradeoff.
Distributed systems must choose between high availability and low latency on the one hand and strong consistency guarantees on the other. The CAP Theorem famously dictates that it is impossible to achieve “consistency” while remaining available in the presence of network partitions. Even without partitions, systems must trade off low latency with consistency. Current systems face stark requirements on all sides of this tradeoff. With user attention spans approaching one second, high availability and low latency are essential to online services. Distributed systems, such as TAO, Meta’s social graph store, must serve billions of queries per second with low response times. At the same time, these systems must provide the consistency semantics applications need for correct behavior (e.g., enforcing privacy constraints). Google built Spanner, a distributed database providing strict serializability, to support crucial applications and services that require strong consistency in the presence of wide-area replication.
However, striking the right balance between availability / latency and consistency is challenging. For example, Google had to build an independent system, Zanzibar, on top of Spanner to manage access control lists (ACLs) with higher performance than what the underlying database could offer. Zanzibar preserves online privacy by ensuring global consistency for a workload of over 10M QPS while serving most requests in under 10ms and maintaining high availability (>99.999% for over 3 years).
From observing real-world systems, we find that providing a happy path for most requests enables a large-scale distributed system to push the envelope on this tradeoff. While implementations vary, the core idea behind this approach is to (1) make data quickly available in replicas and (2) ensure the system can locally determine whether it has sufficient data to satisfy consistency requirements or, if not, identify what information is missing. For the first condition, systems can optimize their replication protocols or use other techniques so that most requests can be served without crossing the wide-area network. For the second, enough information must be present locally to ascertain whether replicated data is usable and what to do if it is not. For example, a system can store local metadata about required versions and apply a staleness check to identify any missing data. These strategies don't have to be perfect (e.g., we can tolerate some false negatives in validating the freshness of data). As long as the happy path is mostly hit and the cost of the alternative path is tolerable, the system can operate efficiently in practice while ensuring consistency.
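To make the second condition concrete, here is a minimal sketch of a replica consulting local version metadata to decide between the happy path and a slower fallback. The `Replica` class and its methods are hypothetical illustrations, not drawn from any of the systems discussed here.

```python
# A minimal sketch of a local staleness check. All names are hypothetical
# illustrations of the technique, not APIs of any system discussed here.

class Replica:
    def __init__(self):
        self.store = {}  # key -> (version, value), asynchronously replicated

    def read(self, key, min_version):
        """Serve locally if replicated data is fresh enough (the happy path);
        otherwise fall back, knowing exactly which version is missing."""
        version, value = self.store.get(key, (0, None))
        if version >= min_version:
            return value  # happy path: local data satisfies the requirement
        # Alternative path: fetch the missing version over the wide area.
        return self.fetch_cross_region(key, min_version)

    def fetch_cross_region(self, key, min_version):
        raise NotImplementedError("slow path: wide-area fetch")
```

Note that the version check can safely err toward the slow path: a false negative only costs latency for that request, while correctness is preserved.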
In balancing high availability / low latency against consistency, distributed systems can push the boundaries of this tradeoff by leveraging local information to provide a mostly hit happy path for requests.
We illustrate how three large-scale systems (Zanzibar, FlightTracker, and RAMP-TAO) serve demanding workloads by employing the mostly hit happy path strategy.
While the systems above serve read-optimized workloads, there are analogous techniques for writes, such as conflict-free replicated data types (CRDTs), that enable other systems to choose the right tradeoff between availability / latency and consistency.
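For intuition on the write side, here is a minimal sketch of a grow-only counter (G-Counter), one of the simplest CRDTs: each replica accepts increments locally with no coordination, and replicas converge by merging state.

```python
# A grow-only counter (G-Counter) CRDT sketch. Each replica increments only
# its own slot; merging takes element-wise maximums, which is commutative,
# associative, and idempotent, so replicas converge in any merge order.

class GCounter:
    def __init__(self, replica_id, n_replicas):
        self.id = replica_id
        self.counts = [0] * n_replicas

    def increment(self):
        self.counts[self.id] += 1  # local write, no coordination needed

    def value(self):
        return sum(self.counts)

    def merge(self, other):
        self.counts = [max(a, b) for a, b in zip(self.counts, other.counts)]

# Two replicas accept writes independently, then converge on merge.
a, b = GCounter(0, 2), GCounter(1, 2)
a.increment(); b.increment(); b.increment()
a.merge(b)
assert a.value() == 3
```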
At scale, there are three particular concerns a distributed system must address when providing a mostly hit happy path: (1) performance isolation, ensuring that clients do not affect each other; (2) hotspot tolerance, so that hot keys do not compromise availability; (3) bounding the worst case, since tail latency can slow overall user interactions. We provide examples below of how the three systems tackle these issues.
(1) Performance isolation: Given challenging, real-world workloads, any performance degradation resulting from enforcing stronger consistency guarantees for one client should not affect others. Isolating these performance effects is also crucial to protecting against misbehaving users. In Zanzibar, clients store and send their own zookies, which contain only the information relevant to the content they access, rather than tracking every write occurring in the system. The limited overhead of zookies allows this technique to provide a mostly hit happy path at large scale.
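The sketch below shows the shape of this per-client check (names and fields are hypothetical assumptions; Zanzibar's actual zookie encoding is opaque to clients): a replica serves a request locally only if it has applied writes at least as fresh as the client's token.

```python
# A loose sketch of a zookie-style freshness check. The token covers only the
# client's own relevant writes, so checking it costs O(1) per request rather
# than scaling with system-wide write traffic. Fields here are assumptions.

from dataclasses import dataclass

@dataclass
class Zookie:
    object_id: str
    timestamp: int  # encodes the client's last relevant write

def can_serve_locally(zookie: Zookie, replica_applied_ts: int) -> bool:
    """Happy path iff this replica has applied writes up to the token's
    timestamp; otherwise route to a fresher replica (the slower path)."""
    return replica_applied_ts >= zookie.timestamp
```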
(2) Hotspot tolerance: Real-world workloads are often bursty and subject to extreme access patterns. In fact, hotspots have been described as “the most critical frontier” in pursuing high availability and low latency. Furthermore, the cache stampede and thundering herd problems can exacerbate poor behavior for hot objects and even lead to feedback loops and catastrophic failures. To deal with these cases, FlightTracker coalesces reads into short time buckets and responds to all reads in a bucket with the same response; it also batches metadata writes, since they are conflict-free. Other techniques, including incremental repair of read results and reuse of information fetched through expensive cross-region queries, can be especially helpful for hot objects. Distributed systems must explicitly address hotspots to ensure the mostly hit happy path remains “happy” in extreme cases.
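As a rough illustration, here is a single-threaded sketch of coalescing reads into short time buckets (the bucket width and all names are illustrative assumptions, not FlightTracker's implementation):

```python
# Coalescing reads into time buckets: all reads of a key that arrive within
# the same bucket share one backend fetch, so a hot key generates bounded
# backend load no matter how many clients request it.

import time

BUCKET_MS = 10  # illustrative: reads within a 10ms window share a response

class CoalescingCache:
    def __init__(self, fetch_fn):
        self.fetch_fn = fetch_fn
        self.buckets = {}  # (key, bucket) -> response; evicted in practice

    def read(self, key):
        bucket = int(time.time() * 1000) // BUCKET_MS
        if (key, bucket) not in self.buckets:
            # Only the first read in each bucket pays the fetch cost.
            self.buckets[(key, bucket)] = self.fetch_fn(key)
        return self.buckets[(key, bucket)]
```

A production version would also deduplicate concurrent first reads, for example by storing an in-flight future in the bucket so later arrivals wait on it instead of issuing their own fetches.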
(3) Bounding the worst case: Tail latency can have significant consequences for online services. Web requests often fan out into dozens or hundreds of system-level queries, which must all complete before content can be served to a user. Because slow responses rapidly lose value to users, a system must bound worst-case behavior. RAMP-TAO provides a protocol optimization for cases when recent data is unavailable, which would normally stall requests: for slow-running queries, it chooses to return stale (replicated) versions that still satisfy atomic visibility. For distributed systems, the alternative to the mostly hit happy path should not be too slow, even if only a fraction of requests are affected.
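The following sketch captures the bounded-fallback idea (`fetch_fresh` and `stale_snapshot` are assumed helpers, not RAMP-TAO's actual API): the read races a deadline, and on timeout returns older, fully replicated versions that still satisfy atomic visibility.

```python
# Bounding the worst case: race the fresh read against a deadline, and fall
# back to a stale but atomically visible snapshot rather than stalling.
# fetch_fresh and stale_snapshot are hypothetical helpers for illustration.

import concurrent.futures

def read_with_fallback(keys, fetch_fresh, stale_snapshot, deadline_s=0.05):
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fetch_fresh, keys)
    try:
        return future.result(timeout=deadline_s)  # happy path: fresh data
    except concurrent.futures.TimeoutError:
        # Bounded worst case: stale versions that still form an atomic view.
        return stale_snapshot(keys)
    finally:
        pool.shutdown(wait=False)  # don't block the caller on the slow fetch
```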
Ultimately, finding the right balance between availability / latency and consistency depends on application needs. To push the limits of what can be achieved in this tradeoff, distributed systems should identify and leverage local information to provide a mostly hit happy path for requests. This approach can be especially impactful at global scale.
Thanks to Xiao Shi, Aaron Kabcenell, and Shadaj Laddad for feedback on this post.