WHAT CAPABILITIES SHOULD A CLOUD MACHINE LEARNING PLATFORM HAVE?

Posted on: Sep 25, 2020

How to pick a cloud machine learning platform?

To create effective machine learning and deep learning models, you need a large amount of data, a way to clean that data and perform feature engineering on it, and a way to train models on it in a reasonable amount of time. After that, you need a way to deploy your models, monitor them for drift over time, and retrain them as required.

If you have invested in compute resources and accelerators such as GPUs, you can do all of that on-premises. However, you may find that if your resources are adequate for peak demand, they also sit idle much of the time. On the other hand, it can sometimes be more cost-effective to run the entire pipeline in the cloud, using large amounts of compute resources and accelerators as needed and then releasing them.

The cloud providers have put significant effort into building out their machine learning platforms to support the entire machine learning lifecycle, from planning a project to maintaining a model in production. What capabilities should every end-to-end machine learning platform provide?

Be close to your data

If you have the large amount of data needed to build precise models, you may not want to ship it halfway around the world. The issue isn't distance itself but time: data transmission speed is ultimately limited by the speed of light, even on a perfect network with infinite bandwidth. Long distances mean latency.
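To put a rough number on that limit, here is a minimal sketch that computes the lowest possible round-trip time over a given distance. The function name, the 10,000 km example distance, and the 0.67 velocity factor for optical fiber are illustrative assumptions, not figures from any provider; real networks add routing, queuing, and protocol overhead on top of this bound.

```python
# Speed of light in vacuum, in kilometers per second.
SPEED_OF_LIGHT_KM_PER_S = 299_792.458


def min_round_trip_ms(distance_km: float, velocity_factor: float = 1.0) -> float:
    """Return a lower bound on round-trip time, in milliseconds.

    velocity_factor is the fraction of the vacuum speed of light at which
    the signal travels (1.0 for vacuum; roughly two thirds is a common
    rule-of-thumb assumption for optical fiber).
    """
    if distance_km < 0:
        raise ValueError("distance_km must be non-negative")
    if not 0 < velocity_factor <= 1:
        raise ValueError("velocity_factor must be in (0, 1]")
    one_way_seconds = distance_km / (SPEED_OF_LIGHT_KM_PER_S * velocity_factor)
    return 2 * one_way_seconds * 1000.0


if __name__ == "__main__":
    # Hypothetical distance between a data center and a remote data source.
    distance_km = 10_000
    print(f"vacuum: {min_round_trip_ms(distance_km):.1f} ms")
    print(f"fiber (assumed 0.67c): {min_round_trip_ms(distance_km, 0.67):.1f} ms")
```

Even in the ideal case, a 10,000 km separation costs about 67 ms per round trip, and every request that has to wait for a reply pays that cost again.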

The ideal case for large datasets is to build the model where the data already resides, so that no mass data transmission is needed. Several databases support this to a limited extent.

Support an ETL or ELT pipeline

ETL (extract, transform, and load) and ELT (extract, load, and transform) are two common data pipeline configurations in the database world. Machine learning and deep learning amplify the need for these, especially the transform portion. ELT gives you more flexibility when your transformations need to change, because the load phase is usually the most time-consuming for big data.
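The difference in ordering can be sketched with Python's built-in sqlite3 module standing in for a real data warehouse. The table names, sample rows, and cleaning rule below are hypothetical and chosen only for illustration. In the ETL version the cleaning happens in application code before loading; in the ELT version the raw rows are loaded first and the transform is a SQL view over them, so changing the transform later does not require reloading the data.

```python
import sqlite3

# Hypothetical raw records, as they might arrive from a source system.
RAW_ROWS = [("alice", " 42 "), ("bob", "17"), ("carol", "n/a")]


def parse_amount(text):
    """Return the amount as an int, or None if it is not a plain number."""
    text = text.strip()
    return int(text) if text.isascii() and text.isdigit() else None


def run_etl(conn):
    """ETL: transform in application code, then load only the cleaned rows."""
    conn.execute("CREATE TABLE etl_clean (name TEXT, amount INTEGER)")
    cleaned = [(name, parse_amount(raw)) for name, raw in RAW_ROWS]
    conn.executemany(
        "INSERT INTO etl_clean VALUES (?, ?)",
        [row for row in cleaned if row[1] is not None],
    )
    return conn.execute(
        "SELECT name, amount FROM etl_clean ORDER BY name"
    ).fetchall()


def run_elt(conn):
    """ELT: load the raw rows as-is, then transform inside the database."""
    conn.execute("CREATE TABLE elt_raw (name TEXT, amount_text TEXT)")
    conn.executemany("INSERT INTO elt_raw VALUES (?, ?)", RAW_ROWS)
    # The transform is a view over data that is already loaded, so it can
    # be redefined later without repeating the expensive load step.
    conn.execute(
        """
        CREATE VIEW elt_clean AS
        SELECT name, CAST(TRIM(amount_text) AS INTEGER) AS amount
        FROM elt_raw
        WHERE TRIM(amount_text) <> ''
          AND TRIM(amount_text) NOT GLOB '*[^0-9]*'
        """
    )
    return conn.execute(
        "SELECT name, amount FROM elt_clean ORDER BY name"
    ).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print("ETL:", run_etl(connection))
    print("ELT:", run_elt(connection))
    connection.close()
```

Both paths produce the same cleaned rows here; what differs is where the transform runs and whether the raw data remains available in the database for a revised transform.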