Streamlining data science with open source: Data version control and continuous machine learning

Posted on: Mar 03, 2021

Can an open source-based workflow leveraging version control and continuous integration and deployment help streamline machine learning, like it did for software development?

MLOps, short for machine learning operations, is the equivalent of DevOps for machine learning models: Taking them from development to production, and managing their lifecycle in terms of improvements, fixes, redeployments, and so on.

The difficulty of achieving MLOps nirvana is a major barrier to getting value out of machine learning and data science. In software development, version control systems like Git and practices like continuous integration / continuous deployment (CI/CD) have helped operationalize the work.

What if those systems and practices could also be used for MLOps? Iterative.ai wants to address this question with open source projects Data Version Control and Continuous Machine Learning.

BRINGING VERSION CONTROL TO MACHINE LEARNING

Data engineers and machine learning and data science practitioners work with a wide range of data. They need a workflow, and tools to support it, to keep track of their artifacts and versions, resolve issues, and collaborate across teams and systems.

Iterative.ai is an MLOps company dedicated to streamlining the workflow of data scientists. Today it announced the latest releases of its open-source projects Data Version Control (DVC) and Continuous Machine Learning (CML).

Iterative.ai claims DVC and CML remove the need for proprietary AI platforms by extending traditional software tools like Git and CI/CD to meet the needs of machine learning engineers. ZDNet connected with Dmitry Petrov, CEO and founder of Iterative.ai, to find out more about DVC and CML.

The goal of DVC is to bring agility, reproducibility, and collaboration into existing data science workflows. DVC provides users with a Git-like interface for versioning data and models, bringing version control to machine learning to address the challenges of reproducibility.

DVC is built on top of Git. Users commit lightweight metafiles to Git, while the large files themselves are kept out of the repository and handled by DVC. Those large files live in remote storage, either in the cloud or on on-premises network storage.
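To make the metafile idea concrete, here is a minimal Python sketch of the general pattern: hash a large file, park it in a content-addressed cache, and leave behind a tiny pointer file that is small enough to commit to Git. This is an illustration only, not DVC's actual implementation or file format. The function names (`track`, `restore`), the `.pointer.json` suffix, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib
import json
import shutil
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files never need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def track(data_file: Path, cache_dir: Path) -> Path:
    """Copy data_file into a content-addressed cache and write a pointer file.

    Hypothetical helper: the pointer file is the only thing that would be
    committed to Git; the cache directory stands in for remote storage.
    """
    checksum = file_digest(data_file)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached = cache_dir / checksum
    if not cached.exists():  # identical content is stored only once
        shutil.copy2(data_file, cached)
    pointer = data_file.with_name(data_file.name + ".pointer.json")
    meta = {
        "path": data_file.name,
        "sha256": checksum,
        "size": data_file.stat().st_size,
    }
    pointer.write_text(json.dumps(meta, indent=2))
    return pointer


def restore(pointer: Path, cache_dir: Path) -> Path:
    """Recreate the data file named in a pointer file from the cache."""
    meta = json.loads(pointer.read_text())
    target = pointer.with_name(meta["path"])
    shutil.copy2(cache_dir / meta["sha256"], target)
    return target
```

Because the pointer file records a checksum of the content, checking out an older Git commit and restoring from its pointer would bring back exactly the data that commit was built against, which is the reproducibility property the article describes.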

CML is an open-source library for implementing continuous integration and delivery (CI/CD) in machine learning projects. Users can automate parts of their development workflow, including model training and evaluation, comparing machine learning experiments across their project history, and monitoring changing datasets. CML will also auto-generate reports with metrics and plots in each Git pull request.
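As a rough picture of what a metrics report in a pull request might contain, the Python sketch below compares the metrics of two runs and renders them as a Markdown table that a CI job could post as a comment. It does not use or reproduce CML's own API. The function name `metrics_report` and the sample metric values are hypothetical and chosen only for illustration.

```python
from typing import Mapping


def metrics_report(baseline: Mapping[str, float], candidate: Mapping[str, float]) -> str:
    """Render a Markdown table comparing the metrics shared by two runs.

    Hypothetical helper: 'baseline' would come from the main branch and
    'candidate' from the branch under review.
    """
    lines = [
        "| metric | baseline | candidate | change |",
        "|---|---|---|---|",
    ]
    # Only metrics present in both runs can be compared.
    for name in sorted(baseline.keys() & candidate.keys()):
        old, new = baseline[name], candidate[name]
        lines.append(f"| {name} | {old:.4f} | {new:.4f} | {new - old:+.4f} |")
    return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative numbers only, not results from any real experiment.
    print(metrics_report({"accuracy": 0.90, "loss": 0.35},
                         {"accuracy": 0.92, "loss": 0.30}))
```

Putting the comparison next to the code change lets a reviewer see how a change affected the model's metrics without rerunning the training job locally.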