Building Machine Learning Operations for Businesses

Background – Navigating MLOps
In my career, I've noticed that the key to successful AI strategies lies in the ability to deploy machine learning models into production, unlocking their commercial potential at scale. Yet this is no small feat – it involves integrating various technologies and teams, and often necessitates a cultural shift within organisations – a practice referred to as MLOps.
However, there's no one-size-fits-all MLOps strategy. In this piece, I offer a flexible MLOps blueprint that can be a starting point or a means to fine-tune your current workflow. Although the MLOps journey can be complex, I strongly advise viewing it as an indispensable initial step in integrating AI into your business, rather than a secondary consideration.
MLOps Goes Beyond Technology

Before diving into the technicalities, I'd like to share some (non-technical) insights from my experience observing various MLOps strategies. MLOps is more than just technology – it hinges on three key components: Investment, Culture, and Technology. Companies that consider all three from the outset tend to have more success with their strategies. A common mistake I've seen is businesses prioritising investment in solutions without considering the requisite cultural shifts. This oversight can critically undermine your strategy, wasting funds and diminishing the confidence of your executives or investors.
Culture
Introducing a new culture to any business is no mean feat, requiring wholehearted support from its people. A common pitfall I have seen is businesses abruptly replacing old tools with new, shiny ones without considering the cultural change involved. This approach can breed resentment and result in these tools being overlooked or misused.
By contrast, companies that manage cultural change effectively involve end users in crafting the MLOps strategy and assign them responsibilities that promote ownership. Moreover, they provide the support and training needed to upskill users, rewarding engagement in these initiatives.
A solution may indeed be technically superior, but without driving cultural change, it risks inefficacy. After all, it's people who operate technologies, not the other way around.
Technology
For the sake of brevity, I've defined technology as the combination of technical infrastructure and data management services.
An effective MLOps strategy is built on top of a mature data ecosystem. By leveraging data management tools, data scientists should be empowered to access data for model development in a secure and regulatory-compliant way.
From the viewpoint of technical infrastructure, we should be empowering data scientists and ML engineers to access the hardware and software required to facilitate the development and delivery of AI products. For many companies, leveraging cloud infrastructure is an essential enabler for this.
Investment
There are no shortcuts in MLOps, particularly when it comes to investment. An efficient MLOps strategy should prioritise investments in both people and technology. A recurring issue I encounter with clients is the tendency to construct an MLOps strategy centred on a single data scientist due to budget constraints. In such cases, I generally recommend a reassessment, or at the very least, a tempering of expectations.
From the outset, it's imperative to establish the extent of your investment in innovation and its duration. In truth, ongoing investment is vital if you wish for AI to become fundamental to your operations and to yield the associated benefits.
For a view on developing AI strategies, you may wish to read my article on crafting AI strategies with Wardley Maps.
A High-level Blueprint for MLOps
Now that we've laid the groundwork, we shall delve into some of the technical components of MLOps. To aid visualisation, I've designed a flowchart illustrating relationships between the processes. Where dashed lines are present, data flows. Where a solid line exists, there's a transition from one activity to another.

Model Development Laboratory
The process of model development is inherently unpredictable and iterative. Firms that fail to recognise this will struggle to build effective AI strategies. In truth, model development tends to be the most chaotic aspect of the workflow, filled with experimentation, repetition, and frequent failures. All these elements are essential in exploring new solutions; this is where innovation is born. Thus, what do data scientists need? The freedom to experiment, innovate, and collaborate.
There's a prevailing belief that data scientists should be adhering to software engineering best practices in their code writing. Whilst I don't disagree with this sentiment, there's a time and place for everything. I don't believe that model development labs are necessarily the arena for this. Instead of attempting to quell this chaos, we should embrace it as a necessary part of the workflow, and seek to utilise tools that help us to manage it – an effective model development lab should provide this. Let's examine some potential components.
Experimentation & Prototyping – Jupyter Labs
Jupyter Labs offers a versatile Integrated Development Environment (IDE) suitable for creating preliminary models and proofs of concept. It provides access to notebooks, scripts, and command line interfaces, all of which are familiar to data scientists.
As an open-source tool, Jupyter Labs integrates seamlessly with Python and R, covering the majority of contemporary data science model development tasks; most data science workloads can be conducted within the lab IDE.
Environment Management – Anaconda

Effective environment management can streamline subsequent MLOps workflow steps, focusing on safe access to open-source libraries and reproducing the development environment. Anaconda, a package manager, allows data scientists to create virtual environments and install necessary libraries and packages for model development with its simple Command-Line Interface (CLI).
Anaconda also offers repository mirroring, which assesses open-source packages for secure commercial use, though the associated risks of third-party management should be considered. The use of virtual environments is crucial in managing the experimental phase, essentially providing a contained space for all packages and dependencies for a given experiment.
Version Control & Collaboration – GitHub Desktop
Collaboration is a crucial part of a successful model development lab, and leveraging GitHub Desktop is an effective way to facilitate this. Data scientists, through GitHub Desktop, can create a repo for each lab. Each repo stores the model development notebook or script, along with an environment.yml file that instructs Anaconda on how to reproduce the environment in which the notebook was developed on another machine.
The combination of all three lab components – Jupyter Labs, Anaconda, and GitHub – provides data scientists with a safe space to experiment, innovate, and collaborate.
# An example environment.yml file for replicating a conda environment
name: myenv
channels:
- conda-forge
dependencies:
- python=3.9
- pandas
- scikit-learn
- seaborn
Model Pipeline Development
In my discussions with clients who are in the early stages of their MLOps maturity, there seems to be this idea that data scientists develop models and then "hand over" to Machine Learning engineers to "productionise". This approach doesn't work and is probably the quickest way to lose your machine learning engineers. Nobody wants to deal with someone else's messy code, and quite frankly it's unfair to expect this of your engineers.
Instead, organisations need to foster a culture where data scientists are responsible for developing models within data labs and then formalising them as end-to-end model pipelines. Here's why:
- Data scientists understand their models better than anyone else. Making them responsible for creating the model pipeline will improve efficiency.
- You establish a culture of software engineering best practices at every stage of development.
- Machine learning engineers can focus on the aspects of their job that add value, such as resource provisioning, scaling, and automation, instead of refactoring someone else's notebook.
Building end-to-end pipelines may seem daunting at first, but thankfully there are tools targeted at data scientists to help them achieve this.
Model Pipeline Build – Kedro
Kedro is an open-source Python framework from McKinsey's QuantumBlack that assists data scientists in building model pipelines.
# Standard template for Kedro projects
{{ cookiecutter.repo_name }}     # Parent directory of the template
├── conf                         # Project configuration files
├── data                         # Local project data
├── docs                         # Project documentation
├── notebooks                    # Project-related Jupyter notebooks
├── README.md                    # Project README
├── setup.cfg                    # Configuration options for tools
└── src                          # Project source code
    ├── {{ cookiecutter.python_package }}
    │   ├── __init__.py
    │   ├── pipelines
    │   ├── pipeline_registry.py
    │   ├── __main__.py
    │   └── settings.py
    ├── requirements.txt
    ├── setup.py
    └── tests
Kedro provides a standard template for building end-to-end model pipelines with software engineering best practices. The concept behind it is to encourage data scientists to build modular, reproducible, and maintainable code. Once a data scientist completes the Kedro workflow, they've essentially built something that can be more easily deployed to a production environment. Here are the overarching concepts:
- Project Template: Kedro provides a standard and easy-to-use project template, enhancing structure, collaboration, and efficiency.
- Data Catalog: The Data Catalog in Kedro is the registry of all data sources the project can use. It provides a straightforward way to define how and where data is stored.
- Pipelines: Kedro structures your data processing as a pipeline of dependent tasks, enforcing a clear code structure and visualising data flow and dependencies.
- Nodes: In Kedro, a Node is a wrapper for a Python function that names the inputs and outputs of that function, serving as the building blocks of a Kedro pipeline.
- Configuration: Kedro manages different configurations for various environments (development, production, etc.) without hardcoding any configuration into your code.
- I/O: In Kedro, I/O operations are abstracted from the actual computation, which increases code testability and modularity and eases switching between different data sources.
- Modularity and Reusability: Kedro promotes a modular coding style that results in reusable, maintainable and testable code.
- Testing: Kedro integrates with PyTest, a testing framework in Python, making it easy to write tests for your pipeline.
- Versioning: Kedro supports versioning for data and code, enabling reproduction of any previous state of your pipeline.
- Logging: Kedro offers a standardised logging system for tracking events and changes.
- Hooks and Plugins: Kedro supports hooks and plugins, extending the framework capabilities as per project requirements.
- Integration with other tools: Kedro can be integrated with various tools like Jupyter notebook, Dask, Apache Spark, and others to facilitate different aspects of a data science workflow.
All Kedro projects follow this basic template. Enforcing this standard across your data science teams will enable reproducibility and maintainability.
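To make the Nodes and Pipelines concepts concrete, below is a minimal sketch of a Kedro pipeline assuming a simple two-step workflow; the function names and dataset names (raw_sales, model_input, regressor) are illustrative placeholders rather than part of the standard template, and in a real project the datasets would be registered in the Data Catalog.

# A minimal, illustrative Kedro pipeline: two nodes chained by their inputs and outputs
import pandas as pd
from kedro.pipeline import node, pipeline
from sklearn.linear_model import LinearRegression


def preprocess(raw_sales: pd.DataFrame) -> pd.DataFrame:
    # Drop incomplete rows; the 'price' and 'quantity' columns are assumed for illustration
    return raw_sales.dropna(subset=["price", "quantity"])


def train_model(model_input: pd.DataFrame) -> LinearRegression:
    # Fit a simple regression model on the preprocessed data
    model = LinearRegression()
    model.fit(model_input[["price"]], model_input["quantity"])
    return model


def create_pipeline(**kwargs):
    # Wire the functions into nodes; the string names map to entries in the Data Catalog
    return pipeline(
        [
            node(preprocess, inputs="raw_sales", outputs="model_input", name="preprocess"),
            node(train_model, inputs="model_input", outputs="regressor", name="train_model"),
        ]
    )

Because "raw_sales", "model_input", and "regressor" are resolved against the Data Catalog, the same pipeline code can run unchanged against local files in the lab and cloud storage in production.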
For a more extensive overview of the Kedro framework, please refer to the official Kedro documentation.
Registry & Storage – Data Version Control (DVC)
Registry and storage underpin reproducibility in machine learning, something that any business looking to incorporate ML should bear in mind. ML models are essentially composed of code, data, model artefacts, and environment – all of which must be traceable for reproducibility.
DVC is a tool that provides version control and tracking for models and data. While GitHub could be an alternative, it's limited in its capacity to store large objects, posing issues for extensive datasets or models. DVC essentially extends Git, offering the same version control capabilities while enabling storage of larger datasets and models in a DVC repo, which can be either local or cloud-based.
In commercial settings, there are obvious security benefits to versioning your code in a Git repo, while storing actual model artefacts and data separately in a controlled environment.
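As a sketch of what this looks like in practice, the snippet below uses DVC's Python API to read a specific, tagged version of a dataset directly from a DVC-tracked repository; the repository URL, file path, and tag are hypothetical placeholders.

# A minimal sketch of retrieving DVC-versioned data; the repo, path, and rev are placeholders
import dvc.api
import pandas as pd

# Stream an exact, tagged version of the training data from the repo's remote storage
with dvc.api.open(
    "data/training_set.csv",                               # path tracked by DVC in the repo
    repo="https://github.com/your-org/your-model-repo",    # hypothetical Git repository
    rev="v1.0",                                            # Git tag or commit pinning the data version
) as f:
    training_data = pd.read_csv(f)

print(training_data.shape)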
Remember, model reproducibility will become increasingly important as regulations tighten around the use of AI commercially. Reproducibility facilitates auditability.
Model Pipeline Deployment – Docker

Deployment isn't merely a single task but rather a meticulously crafted fusion of tools, activities, and processes; Docker ties all these together for model deployment. Crucial for intricate ML applications with numerous dependencies, Docker ensures consistency across any machine by encapsulating the application with its environment.
The process begins with a Dockerfile; Docker then uses its commands to construct an image, a ready-packaged model pipeline fit for any Docker-enabled machine. Teamed with Kedro's pipeline functionality, Docker can proficiently deploy both model retraining and inference pipelines, assuring reproducibility across all stages of the ML workflow.
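As an illustration of how this fits into an automated workflow, the sketch below uses the Docker SDK for Python to build an image from a Dockerfile in the project root and run the retraining pipeline as a container; the image tag and the kedro run command are assumptions, and in practice the same steps are often performed with the docker CLI in a CI/CD job.

# A minimal sketch using the Docker SDK for Python (the 'docker' package)
# Assumes a Dockerfile in the current directory that packages the Kedro project
import docker

client = docker.from_env()

# Build the model pipeline image from the local Dockerfile
image, build_logs = client.images.build(path=".", tag="model-pipeline:latest")

# Run the retraining pipeline inside a container (the pipeline name is illustrative)
logs = client.containers.run(
    "model-pipeline:latest",
    command="kedro run --pipeline=training",
    remove=True,  # clean up the container once the run finishes
)
print(logs.decode())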
Model Monitoring & Retraining Pipeline – MLflow
Over time, machine learning models suffer from performance deterioration, which can be caused by concept drift or data drift. We want to be able to monitor when our models' performance begins to falter and retrain them when necessary. MLflow gives us the ability to do this via its tracking API, which should be incorporated into the model training and inference pipelines built by the data scientists. Although I have specified MLflow for tracking in the model monitoring and retraining pipeline, tracking can also be done in the model development lab, particularly for experiment tracking.
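A minimal sketch of what this tracking could look like inside a training or retraining pipeline is shown below; the experiment name, toy data, and metric choice are illustrative assumptions.

# A minimal sketch of MLflow tracking within a training/retraining pipeline
import mlflow
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

mlflow.set_experiment("demand-forecasting")  # hypothetical experiment name

# Toy data standing in for the real training and holdout sets
X = np.arange(20).reshape(-1, 1)
y = 2.5 * X.ravel() + np.random.normal(scale=1.0, size=20)

with mlflow.start_run():
    mlflow.log_param("model_type", "LinearRegression")

    model = LinearRegression().fit(X, y)
    mae = mean_absolute_error(y, model.predict(X))

    # Log the performance metric so degradation can be monitored over time
    # and retraining triggered when it drifts past an agreed threshold
    mlflow.log_metric("mae", mae)

    # Persist the fitted model as a run artefact
    mlflow.sklearn.log_model(model, "model")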
The Inference Endpoint
Given that the inference pipeline has been encapsulated in a Dockerfile, we can build a Docker image of the pipeline anywhere and serve it as an API endpoint for any application. Depending on the use case, we will have to decide where to deploy the Docker image; that, however, is beyond the scope of this article.
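For illustration, the containerised inference pipeline is typically exposed through a lightweight web service along the lines of the Flask sketch below; the endpoint path, port, model file, and request format are assumptions rather than a prescribed setup.

# A minimal sketch of an inference endpoint that the Docker image could expose
import pickle

import pandas as pd
from flask import Flask, jsonify, request

app = Flask(__name__)

# Load the trained model artefact baked into (or mounted onto) the image
with open("model.pkl", "rb") as f:
    model = pickle.load(f)


@app.route("/predict", methods=["POST"])
def predict():
    # Expect a JSON list of feature records, e.g. [{"price": 9.99}, ...]
    features = pd.DataFrame(request.get_json())
    predictions = model.predict(features)
    return jsonify(predictions.tolist())


if __name__ == "__main__":
    # Bind to 0.0.0.0 so the service is reachable from outside the container
    app.run(host="0.0.0.0", port=5000)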
Roles & Responsibilities
Assigning distinct roles and responsibilities within MLOps is pivotal to its success. The multifaceted nature of MLOps, which spans several disciplines, necessitates a clear demarcation of roles. This ensures that each task is performed efficiently. Further, it fosters accountability, facilitating quicker resolution of issues. Lastly, clear delegation reduces confusion and overlap, making the team more efficient and helping to maintain a harmonious working environment. It's much like a well-oiled machine, with each cog playing its part to perfection.
Data Scientists
- Role: The main function of data scientists within MLOps strategies is to concentrate on model development. This encompasses initial experiments, prototyping and setting up modelling pipelines for validated models.
- Responsibilities: Data scientists ensure models adhere to machine learning best practices and align with business cases. Beyond lab tasks, they engage with business stakeholders to identify impactful solutions. They take full ownership of the data labs; a lead data scientist should set the operating rhythm and best practices for setting up labs.
Machine Learning Engineers
- Role: ML engineers oversee the technical infrastructure of the MLOps workflow, exploring innovative solutions, crafting strategies alongside data scientists, and enhancing process efficiencies.
- Responsibilities: They ensure the functionality of the technical infrastructure, monitor the performance of components to control costs, and confirm production models meet demand at the required scale.
Data Governance Professionals
- Role: Data governance professionals maintain security and data privacy policies, playing a pivotal role in the secure transfer of data within the MLOps framework.
- Responsibilities: Although data governance is everyone's responsibility, these professionals create policies and ensure compliance through regular checks and audits. They keep up with regulations and ensure compliance from all data consumers.
Conclusion
Navigating the realm of MLOps is a task that demands deliberate planning, the right blend of technology and talent, and an organisational culture that endorses change and learning.
The journey may appear complex, but by employing a well-designed blueprint and by approaching MLOps as a holistic, iterative process rather than a one-off project, you can derive immense value from your AI strategies. Remember, though, that no single approach fits every scenario. It's crucial to tailor your strategy to your specific needs and to remain agile to changing circumstances.