Apache Flink® on Kubernetes | by Ran Zhang | The Airbnb Tech Blog | Jul 2024
Airbnb's adoption of a new Flink platform evolved from Apache Hadoop® Yarn
At Airbnb, Apache Flink was introduced in 2018 as a supplementary solution for stream processing. It ran alongside Apache Spark™ Streaming for several years before becoming the primary stream processing platform. In this blog post, we'll delve into the evolution of the Flink architecture at Airbnb and compare our prior Hadoop Yarn platform with the current Kubernetes-based architecture. We'll also discuss the work involved in the migration and the challenges that arose along the way. Finally, we'll summarize the impact, our learnings, and future plans.
The evolution of Airbnb's stream processing architecture based on Apache Flink can be divided into three distinct phases:
Phase One: Flink jobs ran on Hadoop Yarn, with Apache Airflow serving as the job scheduler.
Around 2018, several teams at Airbnb adopted Flink as their stream processing engine, primarily because of its superior low-latency capabilities compared to Spark Streaming. During this period, Flink jobs ran on Hadoop Yarn, and Airflow was used as the workflow manager for job scheduling and dependency management.
The choice of Airflow as the workflow manager was largely driven by its widespread use for a variety of job scheduling needs, as no other user-friendly open-source alternatives were readily available at the time. Each team was responsible for maintaining its own Airflow Directed Acyclic Graphs (DAGs), job source code, and the required dependency JARs. Typically, Flink JAR files were built locally before being deployed to Amazon S3.
This architecture met our needs during that period, when our range of use cases was limited.
From 2019 onwards, Apache Flink gained significant traction at Airbnb, replacing Spark Streaming as the primary stream processing platform. As Flink usage scaled, we encountered a number of challenges and limitations with this architecture. First, Airflow's batch-oriented design, which relies on polling intervals, did not match Airbnb's needs, and we experienced significant delays in job start and failure recovery, often causing SLA violations for low-latency use cases. Airflow also introduced a singleton problem: duplicate job submissions occasionally occurred due to race conditions among Airflow workers and user operations that did not follow expected patterns. In addition, Airflow's Directed Acyclic Graph (DAG) structure is complex and does not work well with some of Airbnb's streaming use cases. We also ran into an engineering context mismatch: product engineers were often unfamiliar with Apache Airflow and Hadoop, resulting in a steep learning curve when setting up new Apache Flink jobs.
To tackle these technical and operational challenges, we started exploring new possibilities. Our first step was to replace Airflow with a customized lightweight streaming job scheduler, marking the beginning of Phase Two.
Phase Two: Flink jobs ran on Hadoop Yarn, with a lightweight streaming job scheduler.
At a high level, Airflow was replaced by a lightweight streaming job scheduler running on Kubernetes. The job scheduler consists of a master node and a pool of worker nodes:
- The master node is responsible for managing the metadata of all Flink jobs and ensuring the proper life cycle of each worker node. This includes tasks such as parsing user-provided job configurations, synchronizing metadata and job statuses with Apache ZooKeeper™, and ensuring that worker nodes consistently maintain their expected states.
- A worker node is responsible for handling the dependencies and life cycle of a single Flink job. Workers bundle the required dependencies, submit the Flink job to Hadoop Yarn, continuously monitor its status, and trigger an immediate restart in the event of a failure, as sketched below.
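To make the worker's role more concrete, here is a highly simplified, hypothetical sketch of that loop: submit the job to Yarn through the Flink CLI, wait on the process, and restart it on failure. The class, its arguments, and the backoff are illustrative assumptions; the actual scheduler was an internal system with richer metadata handling and ZooKeeper coordination.

```java
// Hypothetical sketch of the Phase Two worker-node loop described above.
// Class name, arguments, and restart policy are illustrative only.
import java.util.concurrent.TimeUnit;

public class FlinkJobWorker {

    private final String jobJar;     // user JAR staged locally (e.g. downloaded from S3)
    private final String entryClass; // entry point from the job's metadata

    public FlinkJobWorker(String jobJar, String entryClass) {
        this.jobJar = jobJar;
        this.entryClass = entryClass;
    }

    /** Submit the job to YARN and block until it exits; restart immediately on failure. */
    public void run() throws Exception {
        while (true) {
            Process flink = new ProcessBuilder(
                    "flink", "run",
                    "-m", "yarn-cluster",   // per-job YARN deployment via the Flink CLI
                    "-c", entryClass,
                    jobJar)
                .inheritIO()
                .start();

            int exitCode = flink.waitFor();
            if (exitCode == 0) {
                break;                      // job finished cleanly (rare for streaming jobs)
            }
            // Report the failure to the master / ZooKeeper here, then restart after a short backoff.
            TimeUnit.SECONDS.sleep(10);
        }
    }
}
```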
The Phase Two design resulted in faster turnaround times and reduced downtime during job restarts. It also resolved single-point-of-failure issues by relying on ZooKeeper.
As Flink usage grew, we encountered new challenges in Phase Two:
- Lack of CI/CD: Flink developers had to devise their own version control methods.
- Absence of native secrets management: There is no vanilla secrets management on Hadoop Yarn.
- Limited resource and dependency isolation: Each supported Flink version had to be manually preinstalled on the Yarn cluster. While Yarn's resource queues could provide some level of resource isolation, job-level isolation was absent.
- Service discovery complexity: As more use cases were onboarded, each potentially requiring access to various internal Airbnb services, configuring service access on Yarn proved cumbersome. It forced a binary choice between enabling service access for the entire cluster or not at all.
- Monitoring and debugging challenges: Managing and maintaining the logging pipeline and SSH access became non-trivial tasks on a multi-tenant Yarn cluster.
- Ongoing complexity and dependencies: Although the Flink job scheduler was lightweight compared to Airflow, it introduced additional complexity of its own.
Phase Three (current state): Flink jobs run on Kubernetes, and the job scheduler is eliminated.
Deploying Flink on Kubernetes allows Flink to run directly on an existing Kubernetes cluster. With this integration, we can explore enabling efficient autoscaling and using the Kubernetes operator to simplify the management of Flink jobs and clusters.
Flink on Kubernetes offers several advantages over Hadoop Yarn that address the challenges above:
- Developer experience: Standardized by integrating with the existing CI/CD systems.
- Secrets management: With Flink on Kubernetes, each Flink job can securely store its own secrets within its pods, providing a safer way to manage sensitive information (see the sketch after this list).
- Isolated environment: Jobs running on Flink on Kubernetes benefit from isolation at both the resource and dependency levels. Each job can run on its own dedicated Flink version, provided its image supports it, allowing for better dependency management.
- Enhanced monitoring: Integration with Airbnb's pre-defined logging and metric sidecars on Kubernetes simplifies setup and improves monitoring. It enables detailed insights into individual pods and per-pod rate limiting for logging, making it easier to track and troubleshoot issues.
- Service discovery: Flink jobs now follow Airbnb's standardized approach to service discovery, using the cluster mesh. This ensures consistent and reliable communication between services.
- Simplified SSH access: Users with the appropriate permissions can now SSH into a Flink pod without needing an SSH tunnel. This provides greater flexibility and control over SSH permissions per job.
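To illustrate the secrets management point above: on Kubernetes, a secret attached to a job's pods can be consumed directly from the filesystem. The sketch below assumes the secret is mounted as files under a hypothetical path; the actual mount path and key names are not specified in this post.

```java
// Minimal sketch of reading a Kubernetes-mounted secret from inside a Flink job's pod.
// The mount path and key names below are assumptions for illustration only.
import java.nio.file.Files;
import java.nio.file.Path;

public final class JobSecrets {

    // Kubernetes mounts each key of a Secret as a separate file under the mount path.
    private static final Path SECRET_DIR = Path.of("/etc/flink-secrets");

    /** Read a single secret value, e.g. a Kafka SASL password, from its mounted file. */
    public static String read(String key) {
        try {
            return Files.readString(SECRET_DIR.resolve(key)).trim();
        } catch (Exception e) {
            throw new IllegalStateException("Secret not available in pod: " + key, e);
        }
    }
}
```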
Moreover, we've observed an increasing level of Kubernetes support and adoption within the Flink community, which has increased our confidence in running Flink on Kubernetes.
It's worth mentioning that Kubernetes brings its own risks and limitations. For instance, a single Flink job manager failover can pause the entire job. This can pose issues in scenarios with frequent node rotations within Kubernetes and large jobs deployed with hundreds of task managers. For context, node rotation on Kubernetes is performed to ensure the operability and stability of the cluster. It involves replacing existing nodes with new ones, typically with updated configurations or to perform maintenance, with the goals of applying host configuration changes, maintaining node balance, and improving operational efficiency. In comparison, node rotations on Yarn occur much less frequently, so the impact on job availability is less significant. We will discuss how we are mitigating these challenges in the Future Work section.
Below is an overview of our current architecture:
To provide a better understanding of the system, below is a deep dive into the five major components, as well as how users interact with them when setting up a new Flink job:
- Job configurations: This serves as an abstraction layer over the Kubernetes and CI/CD components, providing Flink users with a simplified interface for creating Flink application templates and shielding them from the complexities of the underlying Kubernetes infrastructure. Flink users define the core specifications of their Flink job in a configuration file, including key information such as the entrypoint class name, job parallelism, and the required ingress services and sinks.
- Image management: This component involves pre-building Flink base images bundled with the essential dependencies required to access Airbnb resources. These images are stored in Amazon Elastic Container Registry and can be deployed directly with user JARs or further customized to meet specific user needs.
- CI/CD: By introducing several customizations to support Flink's stateful deployments, we've integrated Flink with our existing CI/CD system, providing a standardized version control and continuous delivery experience. Flink jobs are deployed within Kubernetes, each residing in its own namespace to ensure isolation and effective management.
- Flink portal: An API service that provides essential features for managing the state of Flink jobs, including stopping a Flink job with a savepoint and querying completed checkpoints on Amazon S3. It also provides a self-service UI portal that lets users monitor and check the status of their jobs. Users also gain access to key job state management functionality, allowing them to either start a job from a bootstrapped savepoint or resume it from a previous checkpoint.
- Flink job runtime: Each Flink job is deployed as an independent application cluster on Kubernetes. To ensure fault tolerance and state storage, ZooKeeper, etcd, and Amazon S3 are used. Additionally, pre-configured sidecar containers accompany the Flink containers to support essential functions such as logging, metrics, DNS, and more. A service mesh facilitates communication between Flink jobs and other microservices. A minimal checkpointing sketch follows this list.
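As a concrete illustration of the fault-tolerance setup described for the job runtime, the sketch below shows how a job might enable periodic checkpoints stored on S3 and retain them on cancellation, so that a later deployment (or the Flink portal) can resume from the last completed checkpoint. The bucket name, intervals, and the toy pipeline are assumptions for illustration, not our production configuration.

```java
// Minimal sketch: periodic, exactly-once checkpoints stored on S3 and retained on
// cancellation so a redeployed job can restore from the latest completed checkpoint.
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.CheckpointConfig.ExternalizedCheckpointCleanup;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointedJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Checkpoint every 60 seconds with exactly-once semantics.
        env.enableCheckpointing(60_000, CheckpointingMode.EXACTLY_ONCE);

        // Store checkpoints on S3 (bucket name is illustrative) and keep them when the
        // job is cancelled, so the next deployment can resume from the last one.
        env.getCheckpointConfig().setCheckpointStorage("s3://example-flink-checkpoints/example-job");
        env.getCheckpointConfig().setExternalizedCheckpointCleanup(
                ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);

        // Toy pipeline so the sketch is runnable end to end.
        env.fromSequence(0, Long.MAX_VALUE)
           .filter(n -> n % 1_000_000 == 0)
           .print();

        env.execute("checkpointed-example");
    }
}
```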
Improved Developer Velocity
Onboarding Flink jobs is faster: our developers noted that it takes hours instead of days, and they can focus more on their application logic.
Improvement in Flink Job Availability and Latency
The Flink on Kubernetes architecture improves job availability and scheduling latency by eliminating certain components of the Flink client and the job scheduler found in Flink on Yarn.
Cost Savings in Infrastructure
Streamlining the complexity of the Flink infrastructure and removing components such as the job scheduler has resulted in infrastructure cost savings. Additionally, by running Flink jobs on a shared Kubernetes cluster at Airbnb, we could potentially improve the overall cost efficiency of our company's infrastructure.
Future Work
Improvement in Job Availability
In the Flink world, node rotations in Kubernetes can cause job restarts and result in downtime. While Flink itself can recover from job restarts without data loss, the potential downtime and availability impact may be undesirable for highly latency-sensitive applications. To address this, we are evaluating several approaches:
- Reducing the number of node rotations to minimize job restarts.
- Faster job recovery (see the configuration sketch below).
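The post does not spell out what faster job recovery will involve. As one possibility, and purely as an assumption, generic Flink settings such as task-local recovery and a tighter restart strategy can shorten restart times; the sketch below shows how such settings could be applied when building a job's environment.

```java
// Illustrative recovery-oriented settings; these are standard Flink configuration options,
// but whether and how they are tuned at Airbnb is not stated in this post.
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class FastRecoverySettings {
    public static StreamExecutionEnvironment createEnvironment() {
        Configuration conf = new Configuration();
        // Keep a local copy of task state so tasks rescheduled onto the same TaskManager
        // can restore without re-downloading all state from S3.
        conf.setString("state.backend.local-recovery", "true");
        // Restart failed tasks quickly with a short, fixed delay instead of the defaults.
        conf.setString("restart-strategy", "fixed-delay");
        conf.setString("restart-strategy.fixed-delay.attempts", "10");
        conf.setString("restart-strategy.fixed-delay.delay", "5 s");
        return StreamExecutionEnvironment.getExecutionEnvironment(conf);
    }
}
```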
Enable Job Autoscaling
With the introduction of Reactive Mode in Flink 1.13, users can adjust the parallelism of their jobs dynamically without having to restart them manually. This job autoscaling capability can improve job stability and cost efficiency. In the future, we could enable autoscaling for Flink Kubernetes workloads by leveraging system metrics (such as CPU utilization) and Flink metrics (such as backpressure) to determine the appropriate parallelism.
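As a sketch of what such a scaling decision could look like, the hypothetical helper below derives a target parallelism from CPU utilization and a backpressure ratio. The thresholds and scaling factors are invented for illustration, and actually applying the target (for example, by resizing the TaskManager pool under Reactive Mode) is out of scope here.

```java
// Hypothetical scaling heuristic: pick a target parallelism from pod CPU utilization
// and the fraction of time the job is backpressured. All thresholds are illustrative.
public final class ParallelismAdvisor {

    private static final double CPU_HIGH = 0.80;          // scale up above 80% CPU
    private static final double CPU_LOW = 0.30;           // consider scaling down below 30% CPU
    private static final double BACKPRESSURE_HIGH = 0.50; // scale up if backpressured half the time

    public static int targetParallelism(int current,
                                        double cpuUtilization,
                                        double backpressureRatio,
                                        int minParallelism,
                                        int maxParallelism) {
        int target = current;
        if (cpuUtilization > CPU_HIGH || backpressureRatio > BACKPRESSURE_HIGH) {
            target = current * 2;                 // busy or backpressured: scale up
        } else if (cpuUtilization < CPU_LOW && backpressureRatio == 0.0) {
            target = Math.max(1, current / 2);    // clearly over-provisioned: scale down
        }
        return Math.min(maxParallelism, Math.max(minParallelism, target));
    }
}
```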
Flink Kubernetes Operator
The Flink Kubernetes Operator uses Custom Resources and functions as a controller to manage the entire production lifecycle of Flink applications. By leveraging the operator, we can streamline the operation and deployment of Flink jobs. It provides better control over job deployment and lifecycle, as well as an out-of-the-box solution for autoscaling and auto-tuning.
To summarize, migrating Airbnb's Apache Flink-based stream processing architecture from Hadoop Yarn to Kubernetes has been a significant milestone in enhancing our streaming data processing capabilities. The transition has resulted in a more streamlined and user-friendly experience for Flink developers. By overcoming challenges that were difficult to address on Yarn, we have laid the foundation for more efficient and effective streaming data processing.
As we look ahead, we are committed to further refining our approach and resolving the remaining challenges. We are excited about the ongoing growth and potential of Apache Flink within our company, and we anticipate continued innovation and improvement in the future.
If this kind of work sounds appealing to you, check out our open roles; we're hiring!
The Flink on Kubernetes platform would not have been possible without cross-functional and cross-org collaborators as well as leadership support. They include, but are not limited to: Jingwei Lu, Long Zhang, Daniel Low, Weibo He, Zack Loebel-Begelman, Justin Cunningham, Adam Kocoloski, Liyin Tang, and Nathan Towery.
Special thanks to the broader Airbnb data community members who provided input or support to the implementation team throughout the design, development, and launch phases.
We would also like to thank Wei Hou and Xu Zhang for their help in authoring this post during their time at Airbnb.
Apache Spark™, Apache Airflow™, and Apache ZooKeeper™ are trademarks of The Apache Software Foundation.
Apache Flink® and Apache Hadoop® are registered trademarks of The Apache Software Foundation.
Kubernetes® is a registered trademark of The Linux Foundation.
Amazon S3 and AWS are trademarks of Amazon.com, Inc. or its affiliates.
All product names, logos, and brands are property of their respective owners. All company, product, and service names used on this website are for identification purposes only. Use of these names, logos, and brands does not imply endorsement.