Unified Grid: How We Re-Architected Slack for Our Largest Customers

All software is built atop a core set of assumptions. As new code is added and new use cases emerge, software can become unmoored from those assumptions. When this happens, a fundamental tension arises between revisiting these foundational assumptions, which usually entails a lot of work, and trying to support new behavior atop the existing architecture. The latter approach is usually recommended, to save time and reduce risk.

However, there are times when it’s worth revising the core architecture of a large software application. Recently at Slack we did just that, taking a step back to change how our backend and clients (the desktop and mobile applications) work on a foundational level.

Slack launched in 2013 with a simple architecture: every user belonged to a single workspace, where they joined channels and sent messages. To view messages from a different workspace (that you were also logged in to), you needed to click into that workspace.

This model held until 2017, when we launched Enterprise Grid, which lets Slack’s largest customers divide their organizations into multiple workspaces, each with a particular focus. At first, Enterprise Grid users were usually in just a single workspace, but over time usage patterns changed, and today these users often belong to several workspaces. At the same time, we’ve built ways for Slack clients to share data across multiple workspaces on the same Grid, such as the Threads and Unreads views and cross-workspace channels.

This led to a natural question: if data is shared between multiple workspaces on the same Grid, and users need to switch between these workspaces to do their jobs, why not instead provide a single, unified view of all the data a user can access within their Grid? Not only would this provide a superior user experience, it would eliminate a class of bugs caused by syncing org-wide data across multiple workspaces. And it would improve performance, since data for multiple workspaces could be loaded in a single API request.

With this insight, the Unified Grid project was born. But because Slack was architected with the assumption that most data is particular to a single workspace, it was initially unclear whether Unified Grid was even feasible. Still, we decided that because the product continued to push against the limits of a workspace-centric architecture, we had to try.

Unified Grid provides powerful organizational concepts like the DMs tab, the Activity tab, and Save it for Later, while still allowing users to filter by workspace.

Enterprise Grid: The evolution of Slack’s architecture

To understand what made Unified Grid such an ambitious project, it’s worth zooming out to examine Slack’s architecture and how it has evolved over the years.

In 2013, Slack launched with a relatively simple model. Users belonged to workspaces within which they joined channels and sent messages. Each workspace represented a customer, and all the data for a particular workspace was stored on a single database server, or “shard.” Slack clients authenticated their API requests using session tokens containing the user ID and workspace ID (called “workspace tokens”); the backend then parsed the workspace ID and used it to associate each API request with a workspace, route queries to that workspace’s database shard, and perform access control. This model also extended to the client, where the data for each workspace was stored in a separate repository with distinct login sessions.

The original Slack data model routed all queries to a database shard identified by the workspace ID in the session token.
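The routing model described above can be illustrated with a minimal sketch. All names here (`WorkspaceToken`, `SHARD_FOR_WORKSPACE`, `MEMBERS`, `route_request`) are hypothetical and chosen for illustration; they are not Slack's actual code or schema.

```python
from dataclasses import dataclass

# Hypothetical names throughout; a sketch of the idea, not Slack's implementation.


@dataclass(frozen=True)
class WorkspaceToken:
    """A session token that carries both the user ID and the workspace ID."""
    user_id: str
    workspace_id: str


# Assumed lookup tables: workspace ID -> shard name, and workspace ID -> member IDs.
SHARD_FOR_WORKSPACE = {"T1": "shard-a", "T2": "shard-b"}
MEMBERS = {"T1": {"U1", "U2"}, "T2": {"U2"}}


def route_request(token: WorkspaceToken) -> str:
    """Return the shard that should serve a request made with this token.

    The workspace ID in the token drives both access control and routing.
    """
    workspace_id = token.workspace_id
    if token.user_id not in MEMBERS.get(workspace_id, set()):
        raise PermissionError(
            f"user {token.user_id} is not a member of workspace {workspace_id}"
        )
    return SHARD_FOR_WORKSPACE[workspace_id]
```

In this sketch a single value, the workspace ID, decides where the query goes and whether it is allowed, which is why removing the workspace from the request context touches so many code paths.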

As Slack grew, we noticed that individual divisions within the same company often created separate Slack workspaces. We wanted to give companies a simple way to administer these workspaces via a single UI, where they could enforce security policies and handle billing across their entire organization. Thus, Enterprise Grid, our solution for our largest and most complex customers, was born.

To support Enterprise Grid, we introduced the concept of an “org” that effectively served as a “parent” to multiple workspaces. Users still navigated Slack from the perspective of an individual workspace, but now it was also possible for data to be stored at the org level. For example, customers could create cross-workspace (XWS) channels, which were stored on the org’s database shard and visible across multiple workspaces. This meant that the Slack backend had to query data on the workspace shard and, if absent there, on the org shard (for workspaces that are part of an Enterprise Grid). Because Enterprise Grid users could be assigned permissions at the level of the workspace and/or org, the backend also had to check permissions at both the workspace and org level.

In Enterprise Grid, the backend queries both the workspace and org shard to resolve data stored at the org level (and therefore accessible to all workspaces on the Grid).
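The workspace-then-org lookup can be sketched as a simple fallback. The in-memory `SHARDS` dictionary, the `ORG_FOR_WORKSPACE` mapping, and `find_channel` are assumptions made for illustration only.

```python
from typing import Optional

# Hypothetical in-memory "shards": shard name -> {channel ID: channel name}.
SHARDS = {
    "ws-T1": {"C1": "team-one-general"},
    "ws-T2": {"C2": "team-two-general"},
    "org-E1": {"C9": "company-announcements"},  # a cross-workspace channel
}

# Assumed mapping of workspace -> parent org. A workspace that is not on a
# Grid simply has no entry here.
ORG_FOR_WORKSPACE = {"T1": "E1", "T2": "E1"}


def find_channel(workspace_id: str, channel_id: str) -> Optional[str]:
    """Look on the workspace shard first, then fall back to the org shard."""
    found = SHARDS.get(f"ws-{workspace_id}", {}).get(channel_id)
    if found is not None:
        return found

    org_id = ORG_FOR_WORKSPACE.get(workspace_id)
    if org_id is None:
        return None  # not part of an Enterprise Grid, so there is no org shard
    return SHARDS.get(f"org-{org_id}", {}).get(channel_id)
```

Note that a channel stored only on another workspace's shard is not found here, which mirrors the limitation that Unified Grid later had to work around.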

The changing landscape

Initially, since end users were usually in a single workspace, their experience didn’t change much in Enterprise Grid. However, over time the way customers use Slack has evolved. Now, a significant portion of users do belong to multiple workspaces on the grid, which led to context switching and missed activity.

We wanted to address these problems, and several infrastructure-level changes we’d made suggested a way forward. With the Vitess migration, we began sharding data along axes other than workspace or org ID, meaning that the workspace or org was no longer required to route queries to the appropriate database shard for our most important tables. We also enhanced our real-time messaging (RTM) stack to remove the need to fan out org-wide data to every workspace on the grid (and some of our largest customers have thousands of workspaces!). Finally, we updated clients to share org-wide data across all workspaces within their grid. Leveraging these infrastructure investments, we built views that aggregated content from multiple workspaces, like our Threads and Unreads views.

However, even with these improvements, our workspace-centric architecture still caused significant frustration. We knew that to truly solve the problem, we’d need to move to an org-wide architecture, though this would entail updating thousands of APIs, database queries, and permissions checks.

Prototyping the path

Execs, not to mention engineers, were understandably concerned about the cost of Unified Grid, and not convinced that the payoff would be worth the effort. Therefore, rather than start by tackling what were potentially thousands of broken APIs, we decided to build a proof of concept to better understand the benefits of Unified Grid and the work that would be required to ship it end-to-end.

At Slack, we call this prototyping the path: building incrementally, proving out and refining our ideas as we go. Because we’re some of the heaviest users of Slack, we knew that if we could use Unified Grid in our day-to-day work, we’d start getting good signals about what did and didn’t work. And as the project grew in maturity, we could opt in more of our peers, gathering valuable feedback from them.

First, we needed to be able to boot the Slack client in Unified Grid mode, with an org-wide view of all the user’s channels rather than a workspace-scoped view. To this end, we built a new boot API which returns data for all the workspaces and channels the user belongs to across the entire Grid. We updated clients to store this boot data at the org level, since users in Unified Grid no longer navigate from the perspective of a single Grid workspace at a time.

Once the client could boot, we updated our homegrown API framework such that an API could be marked compatible with the new Unified Grid client. We then began fixing APIs and client-side checks as we encountered issues, prioritizing those that impacted our day-to-day work. We had a few main strategies for fixing broken APIs:

  1. If an API didn’t rely on workspace context for routing (perhaps because it had been migrated to a new sharding scheme during the Vitess migration), we allowed it to be called in Unified Grid and confirmed that the query still behaved correctly. For example, because the messages table is now sharded by channel ID, we could efficiently fetch messages for a channel without significant changes.
  2. If an API acted directly on a workspace, we could often prompt users to select a workspace and then pass that workspace to the API. For example, we updated the channel creation flow such that the user must select the workspace in which the channel should be created, since the workspace can no longer be inferred from the state of the client.
  3. Finally, if all else failed, we could iterate over the user’s relevant workspaces, attempting to resolve the query against each workspace’s shard. Because most users are in only a handful of workspaces, this approach is surprisingly performant. However, there is a long tail of users in hundreds of workspaces. Because such users tend to be administrators who don’t interact with all these workspaces, we decided to cap the number of “relevant” workspaces at 50 and allow users to manually configure this list. Limiting the relevant workspaces for each user ensures reasonable performance and makes Slack usable for these outliers.

With Unified Grid, in the worst case the Slack backend queries the shard for every Enterprise Grid workspace the user belongs to when loading workspace-level data.
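The third strategy, iterating over a capped list of relevant workspaces, might look roughly like the sketch below. The function names, the `lookup` callback, and the rule for choosing a default list when a user has not configured one (here, simply the first 50 memberships) are assumptions; only the cap of 50 and the user-configurable list come from the description above.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")

MAX_RELEVANT_WORKSPACES = 50  # the cap described in the text


def relevant_workspaces(
    memberships: Sequence[str],
    user_selected: Optional[Sequence[str]] = None,
) -> List[str]:
    """Pick the workspaces to search, capped at MAX_RELEVANT_WORKSPACES.

    Assumption: a user-configured list wins when present; otherwise we fall
    back to the membership list in its given order.
    """
    candidates = user_selected if user_selected else memberships
    member_set = set(memberships)
    seen = set()
    result: List[str] = []
    for workspace_id in candidates:
        # Ignore workspaces the user does not belong to, and duplicates.
        if workspace_id in member_set and workspace_id not in seen:
            seen.add(workspace_id)
            result.append(workspace_id)
    return result[:MAX_RELEVANT_WORKSPACES]


def resolve_across_workspaces(
    workspaces: Sequence[str],
    lookup: Callable[[str], Optional[T]],
) -> Optional[T]:
    """Try each workspace's shard in turn and return the first hit, if any."""
    for workspace_id in workspaces:
        result = lookup(workspace_id)
        if result is not None:
            return result
    return None
```

The worst case is one query per relevant workspace, so the cap bounds the cost at 50 lookups no matter how many workspaces an administrator has joined.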

Although our prototype had a lot of rough edges, we felt the benefit of reduced context switching and a simpler UX. From there, we started opting in more coworkers, eventually inviting execs like our then-CEO Stewart Butterfield to try the new client. His feedback summed up how we felt: “This is obviously better.”

From prototype to production

As mentioned above, Unified Grid potentially impacted every API and permission check invoked by the Slack client. It would require significant effort from scores of engineers across most of Slack’s product engineering teams to ensure these APIs and permission checks continued to behave correctly. At the same time, we were building IA4, a redesign of the Slack client which introduced our Activity, DMs, and Later tabs. In order to avoid subjecting customers to separate large changes at the same time, Unified Grid became a foundational component of IA4, and with it a top company priority.

We began with spreadsheets listing all APIs which were invoked by Slack clients as well as all permission checks performed by clients and the backend, dividing the work among various related product teams. In keeping with prototyping the path, we asked engineers to take two passes over each API: a first pass to make the API work well enough for internal usage, and then (perhaps weeks later) a second pass to ensure the integration tests, permissions checks, and other edge cases behaved correctly. This two-phase approach allowed us to manually verify and get a feel for functionality which was not fully ready for primetime.

The core team now pivoted our work away from prototyping to more scalably support the migration effort with tools and frameworks:

Docs: Most importantly, we put together a detailed guide with step-by-step instructions for ensuring that an API behaves correctly in Unified Grid, including the strategies for fixing APIs listed in the “Prototyping the path” section.

Tests: We created a parallel integration test suite which ran all our existing integration tests using org context instead of workspace context. This let us reuse thousands of tests rather than rewriting them from the ground up. As expected, hundreds of test suites were broken initially, providing us with a concrete list of test suites to fix as part of marking an API compatible with Unified Grid.

Helpers: We added a number of convenience helpers to correctly fetch channels and perform permissions checks across all a user’s workspaces on their Enterprise Grid, on both clients and the backend. For example, to check whether a user can act as an admin within a cross-workspace channel, these helpers check whether the user is a workspace admin in any of the workspaces with which the channel is shared or is an admin at the org level.
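The admin check in that example could be sketched as follows. The `User` and `Channel` shapes and the `can_admin_channel` name are assumptions for illustration, not the actual helper API.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class User:
    """Hypothetical user record with org-level and workspace-level admin roles."""
    user_id: str
    is_org_admin: bool = False
    admin_of_workspaces: Set[str] = field(default_factory=set)


@dataclass
class Channel:
    """Hypothetical channel record listing the workspaces it is shared with."""
    channel_id: str
    shared_workspace_ids: Set[str]


def can_admin_channel(user: User, channel: Channel) -> bool:
    """True if the user is an org admin, or a workspace admin in at least one
    of the workspaces the channel is shared with."""
    if user.is_org_admin:
        return True
    return bool(user.admin_of_workspaces & channel.shared_workspace_ids)
```

Centralizing the rule in one helper means callers no longer need to know which single workspace the client happens to be viewing.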

Client Infrastructure: In addition to the work needed to support these permissions checks, clients also required new infrastructure to migrate workspace-scoped repositories to the new data model. The clients solved this problem in different ways: some clients added an org-level data store but continued to save data in workspace-scoped repositories, while other clients moved everything to an org-wide store. These data migrations could be completed and shipped in parallel with the overall Unified Grid project, which allowed us to de-risk the project itself.

We created a spreadsheet to track the number of APIs and permission checks we needed to fix, and enjoyed watching the graph trend toward zero.

Conclusion

By Summer 2023, Unified Grid was in a place where much of the company was using it for their day-to-day work. We began rolling out to customers in Fall 2023 and completed the rollout in March 2024. What had begun as a barely functional prototype was, almost two years later, a core component of our redesigned client and a solid foundation atop which to keep innovating.

It’s a truism that you shouldn’t attempt large rewrites of existing software applications. But like all truisms, it’s only almost always true. Sometimes, when the architecture of an application drifts far enough from how that application is used, prototyping a path toward rewriting the core foundation is actually the best way to achieve your goals.

Now that Unified Grid is live, we’re excited to see what’s next. What else could be built atop a more flexible information architecture? Whatever it is, we know that we’ll be prototyping the path to new, intuitive product experiences well into the future. If that’s something that excites you too, come join us.