Machine Learning, from Single Core to Whole Cluster

Posted on: Jun 23, 2021

The demand for production-quality software for mining insights from datasets across scales has exploded in the last several years. The growing size of datasets throughout industry, government, and other fields has increased the need for scalable distributed machine learning solutions that can make full use of available hardware to analyze the largest datasets. This article is intended to provide a brief introduction to just a few of the many available tools for machine learning across scales. First, we will look at interactive tools suited for exploratory data analysis on a single workstation. Next, we will consider the development of machine learning pipelines for small-to-medium datasets on a single node. Finally, we will survey some of the solutions available for leveraging cluster resources for large-scale machine learning applications.

Getting Acquainted with Your Data: GUI Machine Learning Frameworks

For users who are new to machine learning, or for those who prefer an interactive interface for preliminary data exploration, GUI-based tools like Weka and Orange are great options for quickly getting acquainted with a dataset. Both packages have facilities for loading, sampling, transforming, and visualizing data, as well as for applying and evaluating supervised and unsupervised models.

Weka in particular has an impressive selection of algorithms, while Orange has an especially intuitive, elegant interface based on a directed network model. While these tools are not suited for production-scale processing of large datasets, they represent a convenient means of guiding early decision-making in building up machine learning pipelines.
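For readers who want to see what the load, sample, transform, and evaluate steps amount to outside a GUI, here is a minimal sketch in plain Python. It does not use the Weka or Orange APIs. The toy dataset, the nearest-centroid classifier, and every function name below are assumptions chosen only for illustration, so treat it as a rough outline of the workflow rather than what either tool does internally.

```python
import random
import statistics


def make_dataset(n_per_class=50, seed=0):
    """Stand-in for 'loading': build a toy two-class dataset of (features, label) rows."""
    rng = random.Random(seed)
    rows = []
    for label, center in ((0, (0.0, 0.0)), (1, (3.0, 3.0))):
        for _ in range(n_per_class):
            features = [rng.gauss(c, 1.0) for c in center]
            rows.append((features, label))
    return rows


def train_test_split(rows, test_fraction=0.3, seed=0):
    """Sampling: shuffle a copy of the rows and hold out a fraction for evaluation."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_scaler(rows):
    """Transforming: learn a per-column mean and standard deviation from the training rows."""
    columns = list(zip(*(features for features, _ in rows)))
    means = [statistics.fmean(col) for col in columns]
    stdevs = [statistics.pstdev(col) or 1.0 for col in columns]  # guard against zero spread
    return means, stdevs


def scale(rows, scaler):
    """Apply the learned standardization to every row."""
    means, stdevs = scaler
    return [
        ([(x - m) / s for x, m, s in zip(features, means, stdevs)], label)
        for features, label in rows
    ]


def fit_centroids(rows):
    """Supervised model: store the mean feature vector of each class."""
    by_label = {}
    for features, label in rows:
        by_label.setdefault(label, []).append(features)
    return {
        label: [statistics.fmean(col) for col in zip(*members)]
        for label, members in by_label.items()
    }


def predict(centroids, features):
    """Assign the label of the closest class centroid (squared Euclidean distance)."""
    def sq_dist(center):
        return sum((x - c) ** 2 for x, c in zip(features, center))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def accuracy(centroids, rows):
    """Evaluation: fraction of rows whose predicted label matches the true label."""
    hits = sum(predict(centroids, features) == label for features, label in rows)
    return hits / len(rows)


if __name__ == "__main__":
    data = make_dataset()
    train, test = train_test_split(data)
    scaler = fit_scaler(train)  # fit on training rows only, then reuse on the test rows
    model = fit_centroids(scale(train, scaler))
    print(f"held-out accuracy: {accuracy(model, scale(test, scaler)):.2f}")
```

Fitting the scaler on the training rows alone and reusing it on the held-out rows keeps the evaluation honest. That is the kind of early pipeline decision the exploratory tools above are meant to help with.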