Introducing Distill CLI: An efficient, Rust-powered tool for media summarization

Distill CLI summarizing The Frugal Architect

A few weeks ago, I wrote about a project our team has been working on called Distill, a simple application that summarizes and extracts important details from our daily meetings. At the end of that post, I promised you a CLI version written in Rust. After a few code reviews from Rustaceans at Amazon and a bit of polish, today, I'm ready to share the Distill CLI.

After you build from source, simply pass Distill CLI a media file and select the S3 bucket where you'd like to store the file. Today, Distill can output summaries as Word documents or text files, or print them directly to the terminal (the default). You'll find that it's easily extensible: my team (OCTO) is already using it to export summaries of our team meetings directly to Slack (and working on support for Markdown).
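One common way to make output targets easy to add in Rust is a small trait that every exporter implements. The sketch below is hypothetical and is not taken from the Distill CLI source: the `SummaryExporter` trait, the `PlainText` and `Markdown` types, and the `exporter_for` flag mapping are all illustrative assumptions. It writes to any `std::io::Write`, so the same code can target stdout, a file, or an in-memory buffer.

```rust
use std::io::{self, Write};

/// Hypothetical extension point: each output target implements this trait.
trait SummaryExporter {
    fn export(&self, title: &str, summary: &str, out: &mut dyn Write) -> io::Result<()>;
}

/// Plain text, as printed to the terminal or written to a .txt file.
struct PlainText;

impl SummaryExporter for PlainText {
    fn export(&self, title: &str, summary: &str, out: &mut dyn Write) -> io::Result<()> {
        writeln!(out, "{title}")?;
        writeln!(out)?;
        writeln!(out, "{summary}")
    }
}

/// Adding a new target (here, Markdown) is just another implementation.
struct Markdown;

impl SummaryExporter for Markdown {
    fn export(&self, title: &str, summary: &str, out: &mut dyn Write) -> io::Result<()> {
        writeln!(out, "# {title}")?;
        writeln!(out)?;
        writeln!(out, "{summary}")
    }
}

/// Maps a (hypothetical) command-line value to an exporter.
fn exporter_for(name: &str) -> Option<Box<dyn SummaryExporter>> {
    match name {
        "terminal" | "text" => Some(Box::new(PlainText)),
        "markdown" => Some(Box::new(Markdown)),
        _ => None,
    }
}

fn main() -> io::Result<()> {
    // Default behavior in this sketch: print the summary to the terminal.
    let exporter = exporter_for("terminal").expect("known format");
    exporter.export("Team sync", "- Ship the CLI", &mut io::stdout())
}
```

With this shape, a Slack exporter would be one more `impl SummaryExporter` plus one new match arm, leaving the rest of the pipeline untouched.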

Tinkering is a good way to learn and be curious

The way we build has changed quite a bit since I started working with distributed systems. Today, if you want it, compute, storage, databases, networking are available on demand. As builders, our focus has shifted to faster and faster innovation, and along the way tinkering at the system level has become a bit of a lost art. But tinkering is as important now as it has ever been. I vividly remember the hours spent fiddling with BSD 2.8 to make it work on PDP-11s, and it cemented my never-ending love for OS software. Tinkering provides us with an opportunity to really get to know our systems. To experiment with new languages, frameworks, and tools. To look for efficiencies big and small. To find inspiration. And this is exactly what happened with Distill.

We rewrote one of our Lambda functions in Rust, and observed that cold starts were 12x faster and the memory footprint decreased by 73%. Before I knew it, I began to think about other ways I could make the entire process more efficient for my use case.

The original proof of concept stored media files, transcripts, and summaries in S3, but since I’m running the CLI locally, I realized I could store the transcripts and summaries in memory and save myself a few writes to S3. I also wanted an easy way to upload media and monitor the summarization process without leaving the command line, so I cobbled together a simple UI that provides status updates and lets me know when anything fails. The original showed what was possible, it left room for tinkering, and it was the blueprint that I used to write the Distill CLI in Rust.
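The local flow described above can be sketched as a worker thread that keeps the transcript and summary in memory and reports progress over a channel to the terminal UI. This is a hypothetical illustration, not the Distill CLI's actual code: the `Status` enum and the `transcribe` and `summarize` functions are stand-ins for the real upload, transcription, and summarization calls, which are assumed here and not shown.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical pipeline stages reported to the terminal UI.
#[derive(Debug, Clone, PartialEq)]
enum Status {
    Uploading,
    Transcribing,
    Summarizing,
    Done,
    Failed(String),
}

/// Stand-in for a real transcription call (an assumption for illustration).
fn transcribe(media: &str) -> Result<String, String> {
    if media.is_empty() {
        Err("empty media file".to_string())
    } else {
        Ok(format!("transcript of {media}"))
    }
}

/// Stand-in for a real summarization call (an assumption for illustration).
fn summarize(transcript: &str) -> String {
    format!("summary of {transcript}")
}

/// Runs the pipeline on a worker thread. The transcript and summary stay in
/// memory; only status updates cross the channel to the UI side.
fn run(media: String) -> (Option<String>, Vec<Status>) {
    let (tx, rx) = mpsc::channel();
    let worker = thread::spawn(move || {
        tx.send(Status::Uploading).ok(); // the actual S3 upload is elided here
        tx.send(Status::Transcribing).ok();
        let transcript = match transcribe(&media) {
            Ok(t) => t,
            Err(e) => {
                tx.send(Status::Failed(e)).ok();
                return None;
            }
        };
        tx.send(Status::Summarizing).ok();
        let summary = summarize(&transcript); // kept in memory, not written to S3
        tx.send(Status::Done).ok();
        Some(summary)
    });
    // The loop ends once the worker finishes and drops its sender.
    let updates: Vec<Status> = rx.iter().inspect(|s| eprintln!("status: {s:?}")).collect();
    let summary = worker.join().expect("worker panicked");
    (summary, updates)
}

fn main() {
    let (summary, _) = run("standup.mp4".to_string());
    println!("{}", summary.unwrap_or_default());
}
```

Using a channel keeps the UI loop simple: it only has to print whatever arrives, and a failure shows up as one more status message rather than a silent exit.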

I encourage you to give it a try, and let me know if you find any bugs or edge cases, or have ideas to improve on it.

Builders are choosing Rust

As technologists, we have a responsibility to build sustainably. And this is where I really see Rust's potential. With its emphasis on performance, memory safety, and concurrency, there's a real opportunity to decrease computational and maintenance costs. Its memory safety guarantees eliminate obscure bugs that plague C and C++ projects, reducing crashes without compromising performance. Its concurrency model enforces strict compile-time checks, preventing data races and making the most of multi-core processors. And while compilation errors can be bloody aggravating in the moment, fewer developers chasing bugs and more time focused on innovation are always good things. That's why it has become a go-to for builders who thrive on solving problems at unprecedented scale.
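As a small illustration of those compile-time concurrency checks, here is a sketch (not from the Distill codebase) that counts words across transcript chunks on several threads. The `parallel_word_count` function is hypothetical. The shared total only compiles because it sits behind a `Mutex`; giving each scoped thread a closure that mutates a plain `usize` would be rejected by the borrow checker, since the variable cannot be mutably borrowed more than once.

```rust
use std::sync::Mutex;
use std::thread;

/// Counts whitespace-separated words across chunks, one thread per chunk.
fn parallel_word_count(chunks: &[&str]) -> usize {
    // Shared mutable state must be synchronized; a bare `let mut total = 0`
    // mutated from several spawned closures fails to compile.
    let total = Mutex::new(0usize);
    // Scoped threads may borrow local data because they are guaranteed to
    // finish before `thread::scope` returns.
    thread::scope(|s| {
        for chunk in chunks {
            let total = &total;
            s.spawn(move || {
                let words = chunk.split_whitespace().count();
                *total.lock().unwrap() += words;
            });
        }
    });
    total.into_inner().unwrap()
}

fn main() {
    let chunks = ["the frugal architect", "build sustainably", ""];
    println!("{} words", parallel_word_count(&chunks));
}
```

The point is less the word count than where the error surfaces: a missing lock is a compile error on your machine instead of a data race in production.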

Since 2018, we have increasingly leveraged Rust for critical workloads across various services like S3, EC2, DynamoDB, Lambda, Fargate, and Nitro, especially in scenarios where hardware costs are expected to dominate over time. In his guest post last year, Andy Warfield wrote a bit about ShardStore, the bottom-most layer of S3's storage stack that manages data on each individual disk. He explained that Rust was chosen for its type safety and structured language support, which help identify bugs sooner, and described how the team wrote libraries to extend that type safety to on-disk structures. If you haven't already, I recommend that you read the post, and the SOSP paper.
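To give a flavor of what extending type safety to on-disk structures can mean, here is a deliberately tiny sketch. It is not ShardStore's code or API: the `ExtentId` and `Offset` newtypes and the 12-byte `Header` layout are hypothetical. Newtypes stop an extent ID from being passed where a byte offset is expected, and the only way to obtain a `Header` from raw bytes is through a decoder that checks the length.

```rust
/// Hypothetical newtypes: both wrap integers, but the compiler keeps them apart.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ExtentId(u32);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Offset(u64);

/// Size of the hypothetical header: extent id (u32 LE) followed by offset (u64 LE).
const HEADER_SIZE: usize = 12;

#[derive(Debug, PartialEq, Eq)]
struct Header {
    extent: ExtentId,
    offset: Offset,
}

impl Header {
    /// Serializes the header into its fixed on-disk layout.
    fn encode(&self) -> [u8; HEADER_SIZE] {
        let mut buf = [0u8; HEADER_SIZE];
        buf[..4].copy_from_slice(&self.extent.0.to_le_bytes());
        buf[4..].copy_from_slice(&self.offset.0.to_le_bytes());
        buf
    }

    /// Rejects input of the wrong length instead of reading garbage.
    fn decode(bytes: &[u8]) -> Result<Header, String> {
        if bytes.len() != HEADER_SIZE {
            return Err(format!("expected {HEADER_SIZE} bytes, got {}", bytes.len()));
        }
        let mut extent = [0u8; 4];
        extent.copy_from_slice(&bytes[..4]);
        let mut offset = [0u8; 8];
        offset.copy_from_slice(&bytes[4..]);
        Ok(Header {
            extent: ExtentId(u32::from_le_bytes(extent)),
            offset: Offset(u64::from_le_bytes(offset)),
        })
    }
}

fn main() {
    let header = Header { extent: ExtentId(7), offset: Offset(4096) };
    let bytes = header.encode();
    println!("{:?}", Header::decode(&bytes));
}
```

Real storage systems layer far more on top (checksums, versioning, crash consistency), but the principle is the same: make invalid states hard to construct.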

This pattern is mirrored across the industry. Discord moved their Read States service from Go to Rust to address large latency spikes caused by garbage collection. It's 10x faster, with their worst tail latencies reduced almost 100x. Similarly, Figma rewrote performance-sensitive parts of their multiplayer service in Rust, and they've seen significant server-side performance improvements, such as reducing peak average CPU utilization per machine by 6x.

The purpose is that if you’re critical about price and sustainability, there isn’t any cause to not think about Rust.

Rust is difficult…

Rust has a reputation for being a difficult language to learn and I won’t dispute that there is a learning curve. It will take time to get familiar with the borrow checker, and you will fight with the compiler. It’s a lot like writing a PRFAQ for a new idea at Amazon. There is a lot of friction up front, which is sometimes hard when all you really want to do is jump into the IDE and start building. But once you’re on the other side, there is tremendous potential to pick up velocity. Remember, the cost to build a system, service, or application is nothing compared to the cost of operating it, so the way you build should be continually under scrutiny.

But you don’t have to take my word for it. Earlier this year, The Register revealed findings from Google that confirmed their Rust groups had been twice as productive as staff’s utilizing C++, and that the identical measurement staff utilizing Rust as an alternative of Go was as productive with extra correctness of their code. There aren’t any bonus factors for rising headcount to sort out avoidable issues.

Closing thoughts

I need to be crystal clear: this isn’t a name to rewrite every little thing in Rust. Simply as monoliths are not dinosaurs, there is no single programming language to rule them all and not every application will have the same business or technical requirements. It’s about using the right tool for the right job. This means questioning the status quo, and continuously looking for ways to incrementally optimize your systems – to tinker with things and measure what happens. Something as simple as switching the library you use to serialize and deserialize json from Python’s standard library to orjson might be all you need to speed up your app, reduce your memory footprint, and lower costs in the process.

If you take nothing else away from this post, I encourage you to actively look for efficiencies in all aspects of your work. Tinker. Measure. Because everything has a cost, and cost is a pretty good proxy for a sustainable system.

Now, go build!

A special thank you to AWS Rustaceans Niko Matsakis and Grant Gurvis for their code reviews and feedback while developing the Distill CLI.