Introducing Configurable Metaflow | by Netflix Technology Blog | Dec, 2024
David J. Berg*, David Casler^, Romain Cledat*, Qian Huang*, Rui Lin*, Nissan Pow*, Nurcan Sonmez*, Shashank Srikanth*, Chaoying Wang*, Regina Wang*, Darin Yu*
*: Model Development Team, Machine Learning Platform
^: Content Demand Modeling Team
A month ago at QConSF, we showcased how Netflix uses Metaflow to power a diverse set of ML and AI use cases, managing thousands of unique Metaflow flows. This followed a previous blog on the same topic. Many of these projects are under constant development by dedicated teams with their own business goals and development best practices, such as the system that supports our content decision makers, or the system that ranks which language subtitles are most valuable for a specific piece of content.
As a central ML and AI platform team, our role is to empower our partner teams with tools that maximize their productivity and effectiveness, while adapting to their specific needs (not the other way around). This has been a guiding design principle with Metaflow since its inception.
Standing on the shoulders of our extensive cloud infrastructure, Metaflow facilitates easy access to data, compute, and production-grade workflow orchestration, as well as built-in best practices for common concerns such as collaboration, versioning, dependency management, and observability, which teams use to set up ML/AI experiments and systems that work for them. As a result, Metaflow users at Netflix have been able to run millions of experiments over the past few years without wasting time on low-level concerns.
While Metaflow aims to be un-opinionated about some of the higher levels of the stack, some teams within Netflix have developed their own opinionated tooling. As part of Metaflow's adaptation to their specific needs, we constantly try to understand what has been developed and, more importantly, what gaps these solutions are filling.
In some cases, we determine that the gap being addressed is very team specific, or too opinionated at too high a level in the stack, and we therefore decide not to develop it within Metaflow. In other cases, however, we realize that we can develop an underlying construct that aids in filling that gap. Note that even in that case, we do not always aim to completely fill the gap and instead focus on extracting a more general lower-level concept that can be leveraged by that particular user but also by others. One such recurring pattern we noticed at Netflix is the need to deploy sets of closely related flows, often as part of a larger pipeline involving table creations, ETLs, and deployment jobs. Frequently, practitioners want to experiment with variants of these flows, testing new data, new parameterizations, or new algorithms, while keeping the overall structure of the flow or flows intact.
A natural solution is to make flows configurable using configuration files, so variants can be defined without changing the code. Thus far, there hasn't been a built-in solution for configuring flows, so teams have built bespoke solutions leveraging Metaflow's JSON-typed Parameters, IncludeFile, and deploy-time Parameters, or deploying their own home-grown solutions (often with great pain). However, none of these solutions make it easy to configure all aspects of the flow's behavior, decorators in particular.
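To make the gap concrete, here is a minimal sketch of the Parameter-based workaround described above; the flow name and default value are illustrative. The JSON-typed Parameter is only usable inside step code, so decorators such as @resources cannot read it:

from metaflow import FlowSpec, JSONType, Parameter, step

class OldStyleConfigFlow(FlowSpec):
    # a JSON-typed Parameter was a common pre-Config workaround
    config = Parameter("config", type=JSONType, default='{"cpu": 1}')

    @step
    def start(self):
        # the value is only available here, inside step code;
        # it cannot configure @resources or other decorators
        print("cpu requested:", self.config["cpu"])
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == "__main__":
    OldStyleConfigFlow()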
Outside Netflix, we have seen similar frequently asked questions on the Metaflow community Slack.
Today, to answer the FAQ, we introduce a new — small but mighty — feature in Metaflow: a Config object. Configs complement the existing Metaflow constructs of artifacts and Parameters by allowing you to configure all aspects of the flow, decorators in particular, prior to any run starting. At the end of the day, artifacts, Parameters, and Configs are all stored as artifacts by Metaflow, but they differ in when they are persisted.
Said another way:
- An artifact is resolved and persisted to the datastore at the end of each task.
- A parameter is resolved and persisted at the start of a run; it can therefore be modified up to that point. One common use case is to use triggers to pass values to a run right before it executes. Parameters can only be used within your step code.
- A config is resolved and persisted when the flow is deployed. When using a scheduler such as Argo Workflows, deployment happens when create'ing the flow. In the case of a local run, "deployment" happens just prior to the execution of the run — think of "deployment" as gathering all that is needed to run the flow. Unlike parameters, configs can be used more widely in your flow code; in particular, they can be used in step- or flow-level decorators as well as to set defaults for parameters (see the sketch below). Configs can of course also be used within your flow.
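For instance, because a config is resolved before the run starts, it can supply the default value of a Parameter. A minimal sketch, assuming a JSON config file named myconfig.json containing a model.learning_rate key:

from metaflow import Config, FlowSpec, Parameter, step

class ConfigDefaultFlow(FlowSpec):
    config = Config("config", default="myconfig.json")

    # the Config provides the default; the CLI or a trigger
    # can still override the value at run time
    learning_rate = Parameter(
        "learning_rate", default=config.model.learning_rate
    )

    @step
    def start(self):
        print("learning rate:", self.learning_rate)
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == "__main__":
    ConfigDefaultFlow()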
As an example, you can specify a Config that reads a pleasantly human-readable configuration file, formatted as TOML. The config specifies a triggering @schedule and @resources requirements, as well as application-specific parameters for this specific deployment:
[schedule]
cron = "0 * * * *"

[model]
optimizer = "adam"
learning_rate = 0.5

[resources]
cpu = 1
Using the newly released Metaflow 2.13, you can configure a flow with a Config like the one above, as demonstrated by this flow:
import pprint
from metaflow import FlowSpec, step, Config, resources, config_expr, schedule

@schedule(cron=config_expr("config.schedule.cron"))
class ConfigurableFlow(FlowSpec):
    config = Config("config", default="myconfig.toml", parser="tomllib.loads")

    @resources(cpu=config.resources.cpu)
    @step
    def start(self):
        print("Config loaded:")
        pprint.pp(self.config)
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == "__main__":
    ConfigurableFlow()
There is a lot going on in the code above; a few highlights:
- you can refer to configs before they have been defined using config_expr.
- you can define arbitrary parsers — using a string means the parser doesn't even have to be present remotely! A sketch of both styles follows.
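Here is a sketch of both styles, with one parser passed as a callable and another referenced by a dotted string; the file names and the INI-parsing helper are illustrative assumptions:

import configparser
from metaflow import Config, FlowSpec, step

def ini_parser(txt):
    # a custom parser is any callable that maps the file's
    # text to a dictionary (hypothetical helper)
    cp = configparser.ConfigParser()
    cp.read_string(txt)
    return {s: dict(cp[s]) for s in cp.sections()}

class ParserFlow(FlowSpec):
    # parser passed as a callable
    cfg_a = Config("cfg_a", default="settings.ini", parser=ini_parser)
    # parser referenced as a string: it is resolved at deploy
    # time, so the module needn't be installed on remote workers
    cfg_b = Config("cfg_b", default="myconfig.yaml", parser="yaml.safe_load")

    @step
    def start(self):
        print(self.cfg_a, self.cfg_b)
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == "__main__":
    ParserFlow()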
From the developer's point of view, Configs behave like dictionary-like artifacts. For convenience, they support the dot syntax (when possible) for accessing keys, making it easy to access values in a nested configuration. You can also unpack the whole Config (or a subtree of it) with Python's standard dictionary unpacking syntax, `**config`. The standard dictionary subscript notation is also available.
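A brief sketch of these access styles, reusing the TOML file from the earlier example:

from metaflow import Config, FlowSpec, resources, step

class AccessFlow(FlowSpec):
    config = Config("config", default="myconfig.toml", parser="tomllib.loads")

    # unpack a subtree of the Config straight into a decorator
    @resources(**config.resources)
    @step
    def start(self):
        # dot syntax and subscript notation are interchangeable
        print(self.config.model.optimizer)
        print(self.config["model"]["learning_rate"])
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == "__main__":
    AccessFlow()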
Since Configs turn into dictionary artifacts, they get versioned and persisted automatically as artifacts. You can access Configs of any past runs easily through the Client API. As a result, your data, models, code, Parameters, Configs, and execution environments are all stored as a consistent bundle — neatly organized in Metaflow namespaces — paving the way for easily reproducible, consistent, low-boilerplate, and now easily configurable experiments and robust production deployments.
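For example, assuming the ConfigurableFlow above has completed at least one run, its Config can be read back like any other artifact through the Client API:

from metaflow import Flow

# the Config was persisted as an artifact named "config",
# versioned alongside the run's data, models, and code
run = Flow("ConfigurableFlow").latest_run
print(run.data.config)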
While you can get far by accompanying your flow with a simple config file (stored in your favorite format, thanks to user-definable parsers), Configs unlock a number of advanced use cases. Consider the examples in the updated documentation.
A major benefit of Config over previous, more hacky solutions for configuring flows is that it works seamlessly with other features of Metaflow: you can run steps remotely and deploy flows to production, even when relying on custom parsers, without having to worry about packaging Configs or parsers manually or keeping Configs consistent across tasks. Configs also work with the Runner and Deployer.
When used in conjunction with a configuration manager like Hydra, Configs enable a pattern that is highly relevant for ML and AI use cases: orchestrating experiments over multiple configurations or sweeping over parameter spaces. While Metaflow has always supported sweeping over parameter grids easily using foreaches (see the sketch below), it hasn't been easily possible to alter the flow itself, e.g. to change @resources or @pypi/@conda dependencies for every experiment.
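For reference, a minimal sketch of such a foreach-based sweep, with illustrative grid values; note that every branch shares the flow's decorators, which is exactly the limitation Configs address:

from metaflow import FlowSpec, step

class GridSweepFlow(FlowSpec):
    @step
    def start(self):
        # fan out one train task per grid value
        self.grid = [0.01, 0.1, 0.5]
        self.next(self.train, foreach="grid")

    @step
    def train(self):
        self.lr = self.input  # this branch's grid value
        self.next(self.join)

    @step
    def join(self, inputs):
        self.results = [inp.lr for inp in inputs]
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == "__main__":
    GridSweepFlow()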
In a typical case, you trigger a Metaflow flow that consumes a configuration file, changing how a run behaves. With Hydra, you can invert the control: it is Hydra that decides what gets run based on a configuration file. Thanks to Metaflow's new Runner and Deployer APIs, you can create a Hydra app that operates Metaflow programmatically — for instance, to deploy and execute hundreds of variants of a flow in a large-scale experiment.
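A minimal sketch of the pattern, assuming a flow file named training_flow.py with an alpha parameter and a Hydra config directory conf/ (all names are illustrative):

import hydra
from omegaconf import DictConfig
from metaflow import Runner

@hydra.main(config_path="conf", config_name="experiment", version_base=None)
def main(cfg: DictConfig):
    # Hydra decides what gets run; each variant launches a
    # Metaflow run through the programmatic Runner API
    with Runner("training_flow.py").run(alpha=cfg.alpha) as running:
        print(f"{running.run} finished")

if __name__ == "__main__":
    main()  # e.g. python sweep.py --multirun alpha=0.1,0.5,0.9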
Check out two interesting examples of this pattern in the documentation. As a teaser, a video there shows Hydra orchestrating the deployment of tens of Metaflow flows, each of which benchmarks PyTorch using a varying number of CPU cores and tensor sizes, updating a visualization of the results in real time as the experiment progresses.
To give a motivating example of what configurations look like at Netflix in practice, let's consider Metaboost, an internal Netflix CLI tool that helps ML practitioners manage, develop, and execute their cross-platform projects, somewhat similar to the open-source Hydra discussed above but with specific integrations to the Netflix ecosystem. Metaboost is an example of an opinionated framework developed by a team already using Metaflow. In fact, a part of the inspiration for introducing Configs in Metaflow came from this very use case.
Metaboost serves as a single interface to three different internal platforms at Netflix that manage ETL/Workflows (Maestro), Machine Learning Pipelines (Metaflow), and Data Warehouse Tables (Kragle). In this context, having a single configuration system to manage an ML project holistically gives users increased project coherence and decreased project risk.
Configuration in Metaboost
Ease of configuration and templatizing are core values of Metaboost. Templatizing in Metaboost is achieved through the concept of bindings, whereby we can bind a Metaflow pipeline to an arbitrary label, and then create a corresponding bespoke configuration for that label. The binding-specific configuration is then merged into a global set of configurations containing such information as the GIT repository, branch, etc. Binding a Metaflow will also signal to Metaboost that it should instantiate the Metaflow flow once per binding into our orchestration cluster.
Consider an ML practitioner on the Netflix Content ML team, sourcing features from hundreds of columns in our data warehouse and creating a multitude of models against a growing suite of metrics. When a brand-new content metric comes along, with Metaboost, the first version of the metric's predictive model can be created simply by swapping the target column against which the model is trained.
Subsequent versions of the model will result from experimenting with hyperparameters, tweaking feature engineering, or conducting feature diets. Metaboost's bindings, and their integration with Metaflow Configs, can be leveraged to scale the number of experiments as fast as a scientist can create experiment-based configurations.
Scaling experiments with Metaboost bindings — backed by Metaflow Config
Consider a Metaboost ML project named `demo` that creates and loads data to custom tables (ETL managed by Maestro), and then trains a simple model on this data (ML pipeline managed by Metaflow). The project structure of this repository might look like the following:
├── metaflows
│   ├── custom                 -> custom python code, used by Metaflow
│   │   ├── data.py
│   │   └── model.py
│   └── training.py            -> defines our Metaflow pipeline
├── schemas
│   ├── demo_features_f.tbl.yaml     -> table DDL, stores our ETL output,
│   │                                   Metaflow input
│   └── demo_predictions_f.tbl.yaml  -> table DDL, stores our Metaflow output
├── settings
│   ├── settings.configuration.EXP_01.yaml  -> defines the additive config
│   │                                          for Experiment 1
│   ├── settings.configuration.EXP_02.yaml  -> defines the additive config
│   │                                          for Experiment 2
│   ├── settings.configuration.yaml         -> defines our global configuration
│   └── settings.environment.yaml           -> defines parameters based on
│                                              git branch (e.g. READ_DB)
├── tests
├── workflows
│   ├── sql
│   ├── demo.demo_features_f.sch.yaml  -> Maestro workflow, defines ETL
│   └── demo.main.sch.yaml             -> Maestro workflow, orchestrates
│                                         ETLs and Metaflow
└── metaboost.yaml                     -> defines our project for Metaboost
The configuration files in the settings directory above contain the following YAML files:
# settings.configuration.yaml (global configuration)
model:
  fit_intercept: True
conda:
  numpy: '1.22.4'
  "scikit-learn": '1.4.0'
# settings.configuration.EXP_01.yaml
target_column: metricA
features:
  - runtime
  - content_type
  - top_billed_talent
# settings.configuration.EXP_02.yaml
target_column: metricA
features:
  - runtime
  - director
  - box_office
Metaboost will merge each experiment configuration (*.EXP*.yaml) into the global configuration (settings.configuration.yaml) individually at Metaboost command initialization.
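Conceptually, this additive merge behaves like a recursive dictionary merge in which experiment keys win over global ones. Metaboost's actual implementation is internal, so the helper below is only an illustrative sketch:

def additive_merge(base, overlay):
    # recursively merge overlay into base; overlay keys win
    merged = dict(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = additive_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

global_cfg = {"model": {"fit_intercept": True}, "conda": {"numpy": "1.22.4"}}
exp_01 = {"target_column": "metricA", "features": ["runtime", "content_type"]}
print(additive_merge(global_cfg, exp_01))

Let's take a look at how Metaboost combines these configurations with a Metaboost command: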
(venv-demo) ~/projects/metaboost-demo [branch=demoX]
$ metaboost metaflow settings show --yaml-path=configuration

binding=EXP_01:
  model:                  -> defined in settings.configuration.yaml (global)
    fit_intercept: true
  conda:                  -> defined in settings.configuration.yaml (global)
    numpy: 1.22.4
    "scikit-learn": 1.4.0
  target_column: metricA  -> defined in settings.configuration.EXP_01.yaml
  features:               -> defined in settings.configuration.EXP_01.yaml
    - runtime
    - content_type
    - top_billed_talent

binding=EXP_02:
  model:                  -> defined in settings.configuration.yaml (global)
    fit_intercept: true
  conda:                  -> defined in settings.configuration.yaml (global)
    numpy: 1.22.4
    "scikit-learn": 1.4.0
  target_column: metricA  -> defined in settings.configuration.EXP_02.yaml
  features:               -> defined in settings.configuration.EXP_02.yaml
    - runtime
    - director
    - box_office
Metaboost understands that it should deploy/run two independent instances of training.py — one for the EXP_01 binding and one for the EXP_02 binding. You can also see that Metaboost is aware that the tables and ETL workflows are not bound, and should only be deployed once. These details of which artifacts to bind and which to leave unbound are encoded in the project's top-level metaboost.yaml file.
(venv-demo) ~/projects/metaboost-demo [branch=demoX]
$ metaboost project list

Tables (metaboost table list):
  schemas/demo_predictions_f.tbl.yaml (binding=default):
    table_path=prodhive/demo_db/demo_predictions_f
  schemas/demo_features_f.tbl.yaml (binding=default):
    table_path=prodhive/demo_db/demo_features_f

Workflows (metaboost workflow list):
  workflows/demo.demo_features_f.sch.yaml (binding=default):
    cluster=sandbox, workflow.id=demo.branch_demox.demo_features_f
  workflows/demo.main.sch.yaml (binding=default):
    cluster=sandbox, workflow.id=demo.branch_demox.main

Metaflows (metaboost metaflow list):
  metaflows/training.py (binding=EXP_01):  -> EXP_01 instance of training.py
    cluster=sandbox, workflow.id=demo.branch_demox.EXP_01.training
  metaflows/training.py (binding=EXP_02):  -> EXP_02 instance of training.py
    cluster=sandbox, workflow.id=demo.branch_demox.EXP_02.training
Below is a simple Metaflow pipeline that fetches data, executes feature engineering, and trains a LinearRegression model. The work to integrate Metaboost Settings into a user's Metaflow pipeline (carried out using Metaflow Configs) is as easy as adding a single mixin to the FlowSpec definition:
from metaflow import FlowSpec, Parameter, conda_base, step
from custom.data import feature_engineer, get_data
from metaflow.metaboost import MetaboostSettings

@conda_base(
    libraries=MetaboostSettings.get_deploy_time_settings("configuration.conda")
)
class DemoTraining(FlowSpec, MetaboostSettings):
    prediction_date = Parameter("prediction_date", type=int, default=-1)

    @step
    def start(self):
        # get show_settings() for free with the mixin
        # and get convenient debugging info
        self.show_settings(exclude_patterns=["artifact*", "system*"])
        self.next(self.get_features)

    @step
    def get_features(self):
        # feature engineer on our extracted data
        self.fe_df = feature_engineer(
            # load data from our ETL pipeline
            data=get_data(prediction_date=self.prediction_date),
            features=self.settings.configuration.features
            + [self.settings.configuration.target_column],
        )
        self.next(self.train)

    @step
    def train(self):
        from sklearn.linear_model import LinearRegression

        # train our model
        self.model = LinearRegression(
            fit_intercept=self.settings.configuration.model.fit_intercept
        ).fit(
            X=self.fe_df[self.settings.configuration.features],
            y=self.fe_df[self.settings.configuration.target_column],
        )
        print(f"Fit slope: {self.model.coef_[0]}")
        print(f"Fit intercept: {self.model.intercept_}")
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == "__main__":
    DemoTraining()
The Metaflow Config is added to the FlowSpec by mixing in the MetaboostSettings class. Referencing a configuration value is as easy as using the dot syntax to drill into whichever parameter you'd like.
Finally, let's take a look at the output from our sample Metaflow above. We execute experiment EXP_01 with
metaboost metaflow run --binding=EXP_01
which upon execution will merge the configurations into a single settings file (shown previously) and serialize it as a YAML file to the .metaboost/settings/compiled/ directory.
You can see the exact command and args that were sub-processed in the Metaboost Execution section below. Note the --config argument pointing to the serialized YAML file, which is subsequently accessible via self.settings. Also note the convenient printing of configuration values to stdout during the start step, using a mixed-in function named show_settings().
(venv-demo) ~/projects/metaboost-demo [branch=demoX]
$ metaboost metaflow run --binding=EXP_01

Metaboost Execution:
- python3.10 /root/repos/cdm-metaboost-irl/metaflows/training.py
    --no-pylint --package-suffixes=.py --environment=conda
    --config settings
    .metaboost/settings/compiled/settings.branch_demox.EXP_01.training.mP4eIStG.yaml
    run --prediction_date 20241006

Metaflow 2.12.39+nflxfastdata(2.13.5);nflx(2.13.5);metaboost(0.0.27)
executing DemoTraining for user:dcasler
Validating your flow...
    The graph looks good!
Bootstrapping conda environment... (this could take a few minutes)
    All packages already cached in s3.
    All environments already cached in s3.

Workflow starting (run-id 50), see it in the UI at
    https://metaflowui.prod.netflix.net/DemoTraining/50

[50/start/251640833] Task is starting.
[50/start/251640833] Configuration Values:
[50/start/251640833] settings.configuration.conda.numpy = 1.22.4
[50/start/251640833] settings.configuration.features.0 = runtime
[50/start/251640833] settings.configuration.features.1 = content_type
[50/start/251640833] settings.configuration.features.2 = top_billed_talent
[50/start/251640833] settings.configuration.model.fit_intercept = True
[50/start/251640833] settings.configuration.target_column = metricA
[50/start/251640833] settings.environment.READ_DATABASE = data_warehouse_prod
[50/start/251640833] settings.environment.TARGET_DATABASE = demo_dev
[50/start/251640833] Task finished successfully.

[50/get_features/251640840] Task is starting.
[50/get_features/251640840] Task finished successfully.

[50/train/251640854] Task is starting.
[50/train/251640854] Fit slope: 0.4702672504331096
[50/train/251640854] Fit intercept: -6.247919678070083
[50/train/251640854] Task finished successfully.

[50/end/251640868] Task is starting.
[50/end/251640868] Task finished successfully.

Done! See the run in the UI at
    https://metaflowui.prod.netflix.net/DemoTraining/50
Takeaways
Metaboost is an integration tool that aims to ease the project development, management, and execution burden of ML projects at Netflix. It employs a configuration system that combines git-based parameters, global configurations, and arbitrarily bound configuration files for use during execution against internal Netflix platforms.
Integrating this configuration system with the new Config in Metaflow is remarkably simple (by design), only requiring users to add a mixin class to their FlowSpec — similar to this example in the Metaflow documentation — and then reference the configuration values in steps or decorators. The example above templatizes a training Metaflow for the sake of experimentation, but users could just as easily use bindings/configs to templatize their flows across target metrics, business initiatives, or any other arbitrary lines of work.
It couldn't be easier to get started with Configs! Just run
pip install -U metaflow
to get the latest version and head to the updated documentation for examples. If you are impatient, you can also find and execute all config-related examples in this repository.
If you have any questions or feedback about Config (or other Metaflow features), you can reach out to us on the Metaflow community Slack.
We would like to thank Outerbounds for their collaboration on this feature; for rigorously testing it, and for developing a repository of examples that showcase some of the possibilities it offers.