Balancing Old Tricks with New Feats: AI-Powered Conversion From Enzyme to React Testing Library at Slack

In the world of frontend development, one thing remains certain: change is the only constant. New frameworks emerge, and libraries can become obsolete without warning. Keeping up with the ever-changing ecosystem involves handling code conversions, both big and small. One significant shift for us was the transition from Enzyme to React Testing Library (RTL), prompting many engineers to convert their test code to a more user-focused RTL approach. While both Enzyme and RTL have their own strengths and weaknesses, the absence of native support for React 18 by Enzyme presented a compelling rationale for transitioning to RTL. It’s so compelling that we at Slack decided to convert more than 15,000 of our frontend unit and integration Enzyme tests to RTL, as part of the update to React 18.

We started by exploring the most straightforward avenue of seeking out potential Enzyme adapters for React 18. Unfortunately, our search yielded no viable options. In his article titled “Enzyme is dead. Now what?”, Wojciech Maj, the author of the React 17 adapter, unequivocally suggested, “you should consider looking for Enzyme alternative right now.”

Adapter illustrating the mismatch and incompatibility of React 18 and Enzyme

Considering our ultimate goal of updating to React 18, which Enzyme does not support, we started with a thorough analysis of the problem’s scope and ways to automate this process. Our initiative began with a monumental task of converting more than 15,000 Enzyme test cases, which translated to more than 10,000 potential engineering hours. At that scale, with that many engineering hours required, it was almost obligatory to optimize and automate the process. Despite thorough reviews of existing tools and extensive Google searches, we found no suitable solutions for this very common problem. In this blog, I will walk you through our approach to automating the Enzyme-to-RTL conversion process. It includes analyzing and scoping the challenge, utilizing traditional Abstract Syntax Tree (AST) transformations and an AI Large Language Model (LLM) independently, followed by our custom hybrid approach of combining AST and LLM methodologies.

Abstract Syntax Tree (AST) transformations

Our initial approach centered around a more conventional way of performing automated code conversions: Abstract Syntax Tree (AST) transformations. These transformations enable us to represent code as a tree structure with nodes and create targeted queries with conversions from one code node to another. For example, wrapper.find('selector'); can be represented as:

AST representation of `wrapper.find('selector');`
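As a rough textual stand-in for the tree, the statement `wrapper.find('selector');` can be written out as an ESTree-style object. This is a minimal sketch: the node types are declared locally to keep it self-contained, and `isMethodCall` is a hypothetical helper illustrating a "targeted query", not part of any parser library.

```typescript
// Minimal ESTree-style node types, declared locally so the sketch is self-contained.
type Identifier = { type: 'Identifier'; name: string };
type StringLiteral = { type: 'Literal'; value: string };
type MemberExpression = {
  type: 'MemberExpression';
  object: Identifier;
  property: Identifier;
  computed: boolean;
};
type CallExpression = {
  type: 'CallExpression';
  callee: MemberExpression;
  arguments: StringLiteral[];
};
type ExpressionStatement = { type: 'ExpressionStatement'; expression: CallExpression };

// Hand-written tree for: wrapper.find('selector');
const findCall: ExpressionStatement = {
  type: 'ExpressionStatement',
  expression: {
    type: 'CallExpression',
    callee: {
      type: 'MemberExpression',
      object: { type: 'Identifier', name: 'wrapper' },
      property: { type: 'Identifier', name: 'find' },
      computed: false,
    },
    arguments: [{ type: 'Literal', value: 'selector' }],
  },
};

// Hypothetical "targeted query": does this statement call the given method?
function isMethodCall(node: ExpressionStatement, method: string): boolean {
  const callee = node.expression.callee;
  return !callee.computed && callee.property.name === method;
}
```

A codemod rule is then a pair of such a query and a function that builds the replacement node.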

Naturally, we aimed to create rules to address the most common conversion patterns. Besides focusing on the rendering methods, such as mount and shallow, and various helpers utilizing them, we identified the most frequently used Enzyme methods to prioritize the conversion efforts. These are the top 10 methods in our codebase:

[
  { method: 'find', count: 13244 },
  { method: 'prop', count: 3050 },
  { method: 'simulate', count: 2755 },
  { method: 'text', count: 2181 },
  { method: 'update', count: 2147 },
  { method: 'instance', count: 1549 },
  { method: 'props', count: 1522 },
  { method: 'hostNodes', count: 1477 },
  { method: 'exists', count: 1174 },
  { method: 'first', count: 684 },
  ... and 55 more methods
]
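A ranking like this can be approximated with a simple scan over the test sources. The sketch below is an assumption about how such a count might be produced, not the script we used: `countMethodCalls` is a hypothetical helper, and its regex matches any `.name(` call, so it over-counts methods that share a name with non-Enzyme APIs.

```typescript
type MethodCount = { method: string; count: number };

// Hypothetical helper: count how often each listed method name is called.
function countMethodCalls(sources: string[], methods: string[]): MethodCount[] {
  const counts: Record<string, number> = {};
  for (const method of methods) counts[method] = 0;

  for (const source of sources) {
    // Matches ".name(" as a whole identifier, so `find` does not match `findWhere`.
    const callPattern = /\.([A-Za-z_$][\w$]*)\s*\(/g;
    let match: RegExpExecArray | null;
    while ((match = callPattern.exec(source)) !== null) {
      const name = match[1];
      if (Object.prototype.hasOwnProperty.call(counts, name)) counts[name] += 1;
    }
  }

  // Most frequently used methods first.
  return methods
    .map((method) => ({ method, count: counts[method] }))
    .sort((a, b) => b.count - a.count);
}

const sampleSources = [
  "wrapper.find('button').simulate('click');",
  "expect(wrapper.find('.title').text()).toBe('Hi');",
];
const ranking = countMethodCalls(sampleSources, ['find', 'simulate', 'text', 'prop']);
```

An AST-based count would be more precise, since it can check that the receiver is actually an Enzyme wrapper.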

One important requirement for our conversion was achieving 100%-correct transformations, because any deviation would result in incorrect code generation. This challenge was particularly pronounced with AST conversions, where achieving 100% accuracy meant painstakingly creating highly specific rules for each scenario by hand. Within our codebase, we found 65 Enzyme methods, each with its own quirks, leading to a rapidly expanding rule set and growing concerns about the feasibility of our efforts.

Take, for instance, the Enzyme technique discover, which accepts quite a lot of arguments like selector strings, element sorts, constructors, and object properties. It additionally helps nested filtering strategies like first or filter, providing highly effective component focusing on capabilities however including complexity to AST manipulation.

In addition to the large number of manual rules needed for method conversions, certain logic depended on the rendered component Document Object Model (DOM) rather than the mere presence or absence of comparable methods in RTL. For instance, the choice between getByRole and getByTestId depended on the accessibility roles or test IDs present in the rendered component. However, AST lacks the capability to incorporate such contextual information. Its functionality is confined to processing the conversion logic based solely on the contents of the file being transformed, without consideration for external sources such as the actual DOM or React component code.
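To make the dependency on the rendered DOM concrete, here is a minimal sketch of how the right RTL query changes with what an element actually renders. Everything in it is hypothetical: `RenderedElement` is a simplified stand-in for a DOM node, `implicitRoles` covers only two tags, and the text is interpolated into the regex without escaping.

```typescript
// Hypothetical, simplified description of one rendered element.
type RenderedElement = {
  tag: string;
  role?: string; // explicit role attribute, if any
  text?: string; // visible text content
  dataQa?: string; // value of the data-qa attribute used as a test ID
};

// A tiny subset of implicit ARIA roles; the real mapping is much larger.
const implicitRoles: Record<string, string> = { button: 'button', h1: 'heading' };

// Suggest a query for the element, preferring accessible queries over test IDs.
function suggestQuery(el: RenderedElement): string {
  const role = el.role ?? implicitRoles[el.tag];
  if (role && el.text) return `screen.getByRole('${role}', { name: /${el.text}/i })`;
  if (role) return `screen.getByRole('${role}')`;
  if (el.text) return `screen.getByText(/${el.text}/i)`;
  if (el.dataQa) return `screen.getByTestId('${el.dataQa}')`;
  return '// no obvious query: needs manual review';
}
```

None of the inputs to `suggestQuery` are visible in the test file itself, which is why a purely AST-based codemod cannot make this choice.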

With each new transformation rule we tackled, the difficulty seemed to escalate. After establishing patterns for 10 Enzyme methods and addressing other obvious patterns related to our custom Jest matchers and query selectors, it became apparent that AST alone couldn’t handle the complexity of this conversion task. Consequently, we opted for a pragmatic approach: we achieved relatively satisfactory conversions for the most common cases while resorting to manual intervention for the more complex scenarios. For every line of code requiring manual adjustments, we added comments with suggestions and links to relevant documentation. This hybrid method yielded a modest success rate of 45% automatically converted code across the selected files used for evaluation. Eventually, we decided to provide this tool to our frontend developer teams, advising them to run our AST-based codemod first and then handle the remaining conversions manually.
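The "convert what is safe, annotate the rest" idea can be sketched in a few lines. This is an illustration only: it works on text with regular expressions rather than on a real AST, and the rule, the `unsupported` list, and the comment wording are all hypothetical.

```typescript
// Hypothetical rule table: each rule rewrites one well-understood pattern.
type Rule = { pattern: RegExp; replacement: string };

const rules: Rule[] = [
  // wrapper.find('[data-qa="x"]')  ->  screen.getByTestId('x')
  { pattern: /\w+\.find\('\[data-qa="([^"]+)"\]'\)/g, replacement: "screen.getByTestId('$1')" },
];

// Enzyme methods this sketch has no rule for; lines using them get annotated.
const unsupported = ['instance', 'setState', 'simulate'];

// Returns the converted line, preceded by a comment when manual work remains.
function convertLine(line: string): string[] {
  let converted = line;
  for (const rule of rules) converted = converted.replace(rule.pattern, rule.replacement);

  const leftovers = unsupported.filter((m) => converted.indexOf(`.${m}(`) !== -1);
  if (leftovers.length === 0) return [converted];
  // Leave the code in place and flag it for a human (or a later step).
  return [`// TODO(codemod): convert Enzyme ${leftovers.join(', ')} manually`, converted];
}
```

Keeping unconverted code in place with a comment, rather than guessing, is what preserves the 100%-correct requirement for the parts that are rewritten.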

Exploring the AST provided useful insights into the complexity of the problem. We faced the challenge of different testing methodologies in Enzyme and RTL with no straightforward mapping between them. Additionally, there were no suitable tools available to automate this process effectively. As a result, we had to seek out alternative approaches to address this challenge.

Large Language Models (LLMs) transformations

Team members enthusiastically discussing AI applications

Amidst the widespread conversations on AI solutions and their potential applications across the industry, our team felt compelled to explore their applicability to our own challenges. Collaborating with the DevXP AI team at Slack, who specialize in integrating AI into the developer experience, we integrated the capabilities of Anthropic’s AI model, Claude 2.1, into our workflows. We created the prompts and sent the test code along with them to our recently-implemented API endpoint.

Despite our best efforts, we encountered significant variation and inconsistency. Conversion success rates fluctuated between 40-60%. The outcomes ranged from remarkably effective conversions to disappointingly inadequate ones, depending largely on the complexity of the task. While some conversions proved impressive, particularly in transforming highly Enzyme-specific methods into functional RTL equivalents, our attempts to refine prompts had limited success. Our efforts to fine-tune prompts may have complicated matters, possibly perplexing the AI model rather than aiding it. The scope of the task was too large and multifaceted, so the standalone application of AI failed to provide the consistent results we sought, highlighting the complexities inherent in our conversion task.

The realization that we had to resort to manual conversions with minimal automation was disheartening. It meant dedicating a substantial amount of our team’s and company’s time to test migration, time that could otherwise be invested in building new features for our customers or enhancing developer experience. However, at Slack, we highly value creativity and craftsmanship and we didn’t halt our efforts there. Instead, we remained determined to explore every possible avenue available to us.

AST + LLM transformations

We decided to observe how real humans perform test conversions and identify any aspects we might have overlooked. One notable advantage in the comparison between manual human conversion and automated processes was the wealth of information available to humans during conversion tasks. Humans benefit from valuable insights taken from various sources, including the rendered React component DOM, React component code (often authored by the same humans), AST conversions, and extensive experience with frontend technologies. Recognizing the significance of this, we reviewed our workflows and integrated most of this relevant information into our conversion pipeline. This is our final pipeline flowchart:

Project pipeline flowchart

This strategic pivot, and the integration of both AST and AI technologies, helped us achieve a remarkable 80% conversion success rate, based on selected files, demonstrating the complementary nature of these approaches and their combined efficacy in addressing the challenges we faced.

In our pursuit of optimizing our conversion process, we implemented several strategic decisions that led to a notable 20-30% improvement beyond the capabilities of our LLM model out-of-the-box. Among these, two innovative approaches stood out that I will write about next:

  1. DOM tree collection
  2. LLM control with prompts and AST

DOM tree collection

One important aspect of our approach was the collection of the DOM tree of React components. This step proved critical because RTL testing relies heavily on the DOM structure of a component rather than its internal structure. By capturing the actual rendered DOM for each test case, we provided our AI model with essential contextual information that enabled more accurate and relevant conversions.

This collection step was essential because each test case might have different setups and properties passed to the component, resulting in varying DOM structures for each test case. As part of our pipeline, we ran Enzyme tests and extracted the rendered DOM. To streamline this process, we developed adaptors for Enzyme rendering methods and stored the rendered DOM for each test case in a format consumable by the LLM model. For instance:

// Import original methods
import React from 'react';
import enzyme, { mount as originalMount, shallow as originalShallow } from 'enzyme';
import fs from 'fs';

let currentTestCaseName: string | null = null;

beforeEach(() => {
   // Set the current test case name before each test
   const testName = expect.getState().currentTestName;
   currentTestCaseName = testName ? testName.trim() : null;
});

afterEach(() => {
   // Reset the current test case name after each test
   currentTestCaseName = null;
});

// Override mount method
enzyme.mount = (node: React.ReactElement, options?: enzyme.MountRendererProps) => {
   const wrapper = originalMount(node, options);
   const htmlContent = wrapper.html();
   // Append the rendered DOM for this test case when a target file is configured
   if (process.env.DOM_TREE_FILE) {
       fs.appendFileSync(
           process.env.DOM_TREE_FILE,
           `<test_case_title>${currentTestCaseName}</test_case_title> and <dom_tree>${htmlContent}</dom_tree>;\n`,
       );
   }
   return wrapper;
};
...
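On the consuming side, the collected file has to be split back into per-test-case entries before it is attached to a request. The following is a minimal sketch of such a reader, assuming the `<test_case_title>... and <dom_tree>...;` line format written by the adaptor above; `parseDomTreeFile` and `DomSnapshot` are hypothetical names, and the regex would break if a rendered DOM itself contained the text `</dom_tree>;`.

```typescript
type DomSnapshot = { testCaseTitle: string; domTree: string };

// Hypothetical reader for the file produced by the mount/shallow adaptors.
function parseDomTreeFile(contents: string): DomSnapshot[] {
  const entryPattern =
    /<test_case_title>([\s\S]*?)<\/test_case_title> and <dom_tree>([\s\S]*?)<\/dom_tree>;/g;
  const snapshots: DomSnapshot[] = [];
  let match: RegExpExecArray | null;
  while ((match = entryPattern.exec(contents)) !== null) {
    snapshots.push({ testCaseTitle: match[1], domTree: match[2] });
  }
  return snapshots;
}
```

A test case that renders more than once produces several entries with the same title, which this sketch keeps as separate snapshots.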

LLM management with prompts and AST

The second creative change we had to integrate was a more robust and strict controlling mechanism for hallucinations and erratic responses from our LLM. We achieved this by employing two key mechanisms: prompts and in-code instructions made with the AST codemod. Through a strategic combination of these approaches, we created a more coherent and reliable conversion process, ensuring greater accuracy and consistency in our AI-driven transformations.

We initially experimented with prompts as the primary means of instructing the LLM model. However, this proved to be a time-consuming task. Our attempts to create a universal prompt for all requests, including initial and feedback requests, were met with challenges. While we could have condensed our code by employing a single, comprehensive prompt, we found that this approach led to a significant increase in the complexity of requests made to the LLM. Instead, we opted to streamline the process by formulating a prompt with the most critical instructions that consisted of three parts: introduction and general context setting, main request (10 explicit required tasks and 7 optional), followed by the instructions on how to evaluate and present the results:

Context setting:

`I need assistance converting an Enzyme test case to the React Testing Library framework.
I will provide you with the Enzyme test file code inside <code></code> xml tags.
I will also give you the partially converted test file code inside <codemod></codemod> xml tags.
The rendered component DOM tree for each test case will be provided in <component></component> tags with this structure for one or more test cases "<test_case_title></test_case_title> and <dom_tree></dom_tree>".`

Main request:

`Please perform the following tasks:
1. Complete the conversion for the test file inside <codemod></codemod> tags.
2. Convert all test cases and ensure the same number of tests in the file. ${numTestCasesString}
3. Replace Enzyme methods with the equivalent React Testing Library methods.
4. Update Enzyme imports to React Testing Library imports.
5. Adjust Jest matchers for React Testing Library.
6. Return the entire file with all converted test cases, enclosed in <code></code> tags.
7. Do not modify anything else, including imports for React components and helpers.
8. Preserve all abstracted functions as they are and use them in the converted file.
9. Maintain the original organization and naming of describe and it blocks.
10. Wrap component rendering into <Provider store={createTestStore()}><Component></Provider>. In order to do that you need to do two things
First, import these:
import { Provider } from '.../provider';
import createTestStore from '.../test-store';
Second, wrap component rendering in <Provider>, if it was not done before.
Example:
<Provider store={createTestStore()}>
<Component {...props} />
</Provider>
Ensure that all 10 conditions are met. The converted file should be runnable by Jest without any manual changes.

Other instructions section, use them when applicable:
1. "data-qa" attribute is configured to be used with "screen.getByTestId" queries.
2. Use these 4 augmented matchers that have "DOM" at the end to avoid conflicts with Enzyme
toBeCheckedDOM: toBeChecked,
toBeDisabledDOM: toBeDisabled,
toHaveStyleDOM: toHaveStyle,
toHaveValueDOM: toHaveValue
3. For user simulations use userEvent and import it with "import userEvent from '@testing-library/user-event';"
4. Prioritize queries in the following order getByRole, getByPlaceholderText, getByText, getByDisplayValue, getByAltText, getByTitle, then getByTestId.
5. Use query* variants only for non-existence checks: Example "expect(screen.query*('example')).not.toBeInTheDocument();"
6. Ensure all texts/strings are converted to lowercase regex expression. Example: screen.getByText(/your text here/i), screen.getByRole('button', {name: /your text here/i}).
7. When asserting that a DOM renders nothing, replace isEmptyRender()).toBe(true) with toBeEmptyDOMElement() by wrapping the component into a container. Example: expect(container).toBeEmptyDOMElement();`

Instructions to evaluate and present results:

`Now, please evaluate your output and make sure your converted code is between <code></code> tags.
If there are any deviations from the specified conditions, list them explicitly.
If the output adheres to all conditions and uses instructions section, you can simply state "The output meets all specified conditions."`
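The three prompt parts and the tagged inputs can be combined into a single request body, and the converted file pulled back out of the response, roughly as follows. This is a sketch under assumptions: the ordering of the sections, the `buildPrompt` and `extractConvertedCode` helpers, and the `ConversionInput` shape are all hypothetical, and the actual call to the LLM endpoint is omitted.

```typescript
type ConversionInput = {
  enzymeTestCode: string; // original Enzyme test file
  codemodOutput: string; // partially converted file from the AST codemod
  domTrees: string; // collected "<test_case_title>... and <dom_tree>..." entries
};

// Hypothetical assembly of the three prompt parts plus the tagged inputs.
function buildPrompt(
  contextSetting: string,
  mainRequest: string,
  evaluationInstructions: string,
  input: ConversionInput,
): string {
  return [
    contextSetting,
    mainRequest,
    `<code>${input.enzymeTestCode}</code>`,
    `<codemod>${input.codemodOutput}</codemod>`,
    `<component>${input.domTrees}</component>`,
    evaluationInstructions,
  ].join('\n\n');
}

// Pull the converted file out of the response; null means the model ignored the format.
function extractConvertedCode(response: string): string | null {
  const openTag = '<code>';
  const start = response.indexOf(openTag);
  const end = response.lastIndexOf('</code>');
  if (start === -1 || end === -1 || end < start) return null;
  return response.slice(start + openTag.length, end).trim();
}
```

Treating a missing `<code>` block as a failed conversion, rather than trying to salvage free-form output, keeps the downstream test run meaningful.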

The second and arguably more effective approach we used to control the output of the LLM was the use of AST transformations. We have not seen this method used elsewhere in the industry. Instead of solely relying on prompt engineering, we integrated the partially converted code and suggestions generated by our initial AST-based codemod. The inclusion of AST-converted code in our requests yielded remarkable results. By automating the conversion of simpler cases and providing annotations for all other instances through comments in the converted file, we successfully minimized hallucinations and nonsensical conversions from the LLM. This technique played a pivotal role in our conversion process. We have now established a robust framework for managing complex and dynamic code conversions, leveraging a multitude of information sources including prompts, DOM, test file code, React code, test run logs, linter logs, and AST-converted code. It’s worth noting that only an LLM was capable of assimilating such disparate types of information; no other tool available to us possessed this capability.

Evaluation and impact

Evaluation and impact assessments were crucial components of our project, allowing us to measure the effectiveness of our methods, quantify the benefits of AI-powered solutions, and validate the time savings achieved through AI integration.

We streamlined the conversion process with on-demand runs, delivering results in just 2-5 minutes, as well as with CI nightly jobs that handled hundreds of files without overloading our infrastructure. The files converted in each nightly run were categorized based on their conversion status (fully converted, partially converted with 50-99% of test cases passed, partially converted with 20-49% of test cases passed, or partially converted with less than 20% of test cases passed), which allowed developers to easily identify and use the most effectively converted files. This setup not only saved time by freeing developers from running scripts but also enabled them to locally tweak and refine the original files for better performance of the LLM with the local on-demand runs.
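The four status buckets amount to a small classification over each file's test results. A minimal sketch, assuming the only inputs are the number of passed and total test cases per converted file; the `categorize` helper and the handling of fractional percentages (for example, 99.5% falling into the 50-99% bucket) are assumptions.

```typescript
type ConversionStatus =
  | 'fully converted'
  | 'partially converted (50-99% passed)'
  | 'partially converted (20-49% passed)'
  | 'partially converted (<20% passed)';

// Hypothetical bucketing of one converted file by its test pass rate.
function categorize(passed: number, total: number): ConversionStatus {
  if (total <= 0) throw new Error('total must be positive');
  if (passed === total) return 'fully converted';
  const percent = (passed / total) * 100;
  if (percent >= 50) return 'partially converted (50-99% passed)';
  if (percent >= 20) return 'partially converted (20-49% passed)';
  return 'partially converted (<20% passed)';
}
```

For example, a file where 5 of 10 test cases pass lands in the 50-99% bucket, which is usually the most attractive starting point for manual cleanup.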

Notably, our adoption rate, calculated as the number of files that our codemod ran on divided by the total number of files converted to RTL, reached approximately 64%. This adoption rate highlights the significant utilization of our codemod tool by the frontend developers who were the primary consumers, resulting in substantial time savings.

We assessed the effectiveness of our AI-powered codemod along two key dimensions: manual evaluation of code quality on specific test files and pass rate of test cases across a larger set of test files. For the manual evaluation, we analyzed 9 test files of varying complexities (three easy, three medium, and three complex) which were converted by both the LLM and frontend developers. Our benchmark for quality was set by the standards achieved by the frontend developers based on our quality rubric that covers imports, rendering methods, JavaScript/TypeScript logic, and Jest assertions. We aimed to match their level of quality. The evaluation revealed that 80% of the content within these files was accurately converted, while the remaining 20% required manual intervention.

The second dimension of our analysis delved into the pass rate of test cases across a comprehensive set of files. We examined the conversion rates of approximately 2,300 individual test cases spread out within 338 files. Among these, approximately 500 test cases were successfully converted, executed, and passed. This highlights how effective AI can be, leading to a significant saving of 22% of developer time. It’s important to note that this 22% time saving represents only the documented cases where the test case passed. However, it’s conceivable that some test cases were converted properly, yet issues such as setup or importing syntax may have caused the test file to not run at all, and time savings were not accounted for in those instances. This data-centric approach provides clear evidence of tangible time savings, ultimately affirming the powerful impact of AI-driven solutions. It’s worth noting that the generated code was manually verified by humans before merging into our main repository, ensuring the quality and accuracy of the automated conversion process while keeping human expertise in the loop.
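The 22% figure follows directly from the approximate counts above. As a quick check (the helper name is hypothetical, and both inputs are the rounded numbers from the text):

```typescript
// Share of test cases that were converted, executed, and passed, in whole percent.
function passRatePercent(passed: number, total: number): number {
  return Math.round((passed / total) * 100);
}

// Roughly 500 passing test cases out of roughly 2,300 examined.
const reportedPassRate = passRatePercent(500, 2300);
```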

Chart with conversion results

As our project nears its conclusion in May 2024, we are still in the process of collecting data and evaluating our progress. So far it’s apparent that LLMs offer valuable support for the developers’ experience and have a positive effect on their productivity, adding another tool to our repertoire. However, the scarcity of information surrounding code generation, Enzyme-to-RTL conversion in particular, suggests that it is a highly complex issue and AI might not be able to be an ultimate tool for this kind of conversion. While our experience has been somewhat fortunate in the respect that the model we used had out-of-the-box capabilities for JavaScript and TypeScript, and we didn’t have to do any extra training, it’s clear that custom implementations may be necessary to fully utilize any LLM potential.

Our custom Enzyme-to-RTL conversion tool has proven effective so far. It has demonstrated reliable performance for large-scale migrations, saved frontend developers noticeable time, and received positive feedback from our users. This success confirms the value of our investment into this automation. Looking ahead, we’re eager to explore automated frontend unit test generation, a topic that has generated excitement and optimism among our developers regarding the potential of AI.

Furthermore, as a member of the Frontend Test Frameworks team, I’d like to express my gratitude for the collaboration, support, and dedication of our team members. Together, we created this conversion pipeline, conducted rigorous testing, made prompt improvements, and contributed exceptional work on the AST codemod, significantly elevating the quality and efficiency of our AI-powered project. Additionally, we extend our thanks to the Slack DevXP AI team for providing an outstanding experience in utilizing our LLM and for patiently addressing all inquiries. Their support has been instrumental in streamlining our workflows and achieving our development goals. Together, these teams exemplify collaboration and innovation, embodying the spirit of excellence within our Slack engineering community.

Interested in building innovative projects and making developers’ work lives easier? We’re hiring 💼
