Why Machine Learning Is the Future

Posted on: Dec 01, 2016

Two trends stood out in the news around the latest Top500 list. The first is the appearance of Intel's latest Xeon Phi (Knights Landing) and Nvidia's latest Tesla (the Pascal-based P100) on the Top500 list of the world's fastest computers; both systems landed in the top 20. The second is a big emphasis on how chip and system makers are taking concepts from modern machine learning systems and applying them to supercomputers.

On the current revision of the Top500 list, which is updated twice yearly, the top of the chart remains firmly in the hands of the Sunway TaihuLight computer from China's National Supercomputing Center in Wuxi and the Tianhe-2 computer from China's National Super Computer Center in Guangzhou, as it has been since June's ISC16 show. No other computers come close in total performance; the third- and fourth-ranked systems (still the Titan supercomputer at Oak Ridge and the Sequoia system at Lawrence Livermore) both deliver about half the performance of Tianhe-2.

The first of these is based on a unique Chinese processor, the 1.45GHz SW26010, which uses a 64-bit RISC core. It has an unmatched 10,649,600 cores delivering 125.4 petaflops of theoretical peak throughput and 93 petaflops of maximum measured performance on the Linpack benchmark, while drawing 15.4 megawatts of power. It should be noted that while this machine tops the chart in Linpack performance by a huge margin, it doesn't fare quite as well on other tests. On the High Performance Conjugate Gradients (HPCG) benchmark, for instance, machines tend to achieve only 1 to 10 percent of their theoretical peak performance, and the top system (in this case, the Riken K machine) still delivers less than 1 petaflop.
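To put those figures in perspective, here is a quick back-of-the-envelope check (mine, not from the Top500 reports) of what they imply about TaihuLight's Linpack efficiency and flops-per-watt. All input numbers come from the article itself:

```python
# Illustrative arithmetic using the TaihuLight figures quoted above.
PEAK_PFLOPS = 125.4      # theoretical peak throughput, petaflops
LINPACK_PFLOPS = 93.0    # measured Linpack performance, petaflops
POWER_MW = 15.4          # power draw, megawatts

# Linpack efficiency: the fraction of theoretical peak actually achieved.
# Contrast this with HPCG, where systems see only 1-10% of peak.
efficiency = LINPACK_PFLOPS / PEAK_PFLOPS

# Energy efficiency in gigaflops per watt (1 PF = 1e6 GF, 1 MW = 1e6 W).
gflops_per_watt = (LINPACK_PFLOPS * 1e6) / (POWER_MW * 1e6)

print(f"Linpack efficiency: {efficiency:.0%}")        # roughly 74%
print(f"Energy efficiency: {gflops_per_watt:.1f} GF/W")  # roughly 6 GF/W
```

The roughly 74% Linpack efficiency is why the gap between Linpack rankings and HPCG rankings matters: the same machine can look an order of magnitude less impressive on a memory-bound workload.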

But Linpack remains the standard for talking about high-performance computing (HPC) and is what is used to create the Top500 list. By that measure, the No. 2 machine, Tianhe-2, which held the No. 1 spot for the previous few years, uses Xeon E5 processors and older Xeon Phi (Knights Corner) accelerators. It offers 54.9 petaflops of theoretical peak performance and benchmarks at 33.8 petaflops on Linpack. Many observers believe that a ban on exporting the newer Xeon Phi (Knights Landing) parts led the Chinese to create their own supercomputer processor.

Knights Landing, formally the Xeon Phi 7250, played a big role in the new systems on the list, starting with the Cori supercomputer at Lawrence Berkeley National Laboratory, which came in at fifth place with a peak performance of 27.8 petaflops and a measured performance of 14 petaflops. This is a Cray XC40 system using the Aries interconnect. Note that Knights Landing can act as a main processor, with 68 cores per chip delivering 3 peak teraflops. (Intel's price list includes another version of the chip with 72 cores at 3.46 teraflops of peak theoretical double-precision performance, but none of the machines on the list use this version, perhaps because it is pricier and uses more energy.)
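Those per-chip peak numbers follow from the standard formula: cores × clock × double-precision flops per cycle. The sketch below assumes the 7250's published 1.4GHz base clock and 32 DP flops per core per cycle (two AVX-512 vector units, each retiring an 8-wide fused multiply-add), figures not stated in the article:

```python
def peak_dp_teraflops(cores, clock_ghz, flops_per_cycle=32):
    """Theoretical peak double-precision throughput in teraflops.

    flops_per_cycle=32 assumes two AVX-512 vector units per core,
    each doing an 8-wide fused multiply-add: 2 units * 8 lanes * 2 ops.
    """
    return cores * clock_ghz * flops_per_cycle / 1000.0

# Xeon Phi 7250: 68 cores at an assumed 1.4GHz base clock
print(peak_dp_teraflops(68, 1.4))   # ~3.05 TF, matching the ~3 TF above

# The 72-core part at an assumed 1.5GHz base clock
print(peak_dp_teraflops(72, 1.5))   # ~3.46 TF, matching the quoted figure
```

The same formula, multiplied out across thousands of nodes, is where the system-level "theoretical peak" petaflop figures in this article come from.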

Earlier Xeon Phis could run only as accelerators in systems controlled by traditional Xeon processors. In sixth place was the Oakforest-PACS system at Japan's Joint Center for Advanced High Performance Computing, scoring 24.9 peak petaflops. It was built by Fujitsu using Knights Landing and Intel's Omni-Path interconnect. Knights Landing also appears in the No. 12 system (the Marconi computer at Italy's CINECA, built by Lenovo and using Omni-Path) and the No. 33 system (the Camphor 2 at Japan's Kyoto University, built by Cray and using the Aries interconnect).

Nvidia was well represented on the new list as well. The No. 8 system, Piz Daint at the Swiss National Supercomputing Center, was upgraded to a Cray XC50 with Xeons and the Nvidia Tesla P100, and now offers just under 16 petaflops of theoretical peak performance and 9.8 petaflops of Linpack performance, a big upgrade from the 7.8 petaflops of peak performance and 6.3 petaflops of Linpack performance of its earlier iteration, which was based on the Cray XC30 with Nvidia K20x accelerators.

The other P100-based system on the list was Nvidia's own DGX Saturn V, built from the company's DGX-1 systems and an InfiniBand interconnect, which came in at No. 28. Note that Nvidia now sells both the processors and the DGX-1 appliance, which includes software and eight Tesla P100s. The DGX Saturn V system, which Nvidia uses for internal AI research, scores nearly 4.9 peak petaflops and 3.3 Linpack petaflops. But what Nvidia points out is that it uses only 350 kilowatts of power, making it much more energy efficient; as a result, it tops the Green500 list of the most energy-efficient systems. Nvidia notes that this is considerably less energy than the Xeon Phi-based Camphor 2 system uses for similar performance (nearly 5.5 petaflops peak and 3.1 Linpack petaflops).
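The Green500 metric behind that claim is simply Linpack performance per watt. A rough check using only the figures quoted in this article (the official Green500 numbers may differ slightly due to measurement methodology) shows why Saturn V leads:

```python
# Green500 ranks systems by Linpack flops per watt of power consumed.
def gflops_per_watt(linpack_pflops, power_kw):
    """Convert petaflops and kilowatts to gigaflops per watt."""
    return (linpack_pflops * 1e6) / (power_kw * 1e3)

# DGX Saturn V: 3.3 Linpack petaflops at 350 kW, per the article.
saturn_v = gflops_per_watt(3.3, 350)

# For comparison, TaihuLight: 93 petaflops at 15.4 MW (15,400 kW).
taihulight = gflops_per_watt(93.0, 15400)

print(f"DGX Saturn V: {saturn_v:.1f} GF/W")   # roughly 9.4 GF/W
print(f"TaihuLight:   {taihulight:.1f} GF/W") # roughly 6.0 GF/W
```

The article doesn't give Camphor 2's power draw, so the comparison Nvidia draws can't be reproduced from these numbers alone, but the GPU system's per-watt advantage over even the chart-topping TaihuLight is clear.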

It's an interesting comparison, with Nvidia touting better energy efficiency on GPUs and Intel touting a more familiar programming model. I'm sure we'll see more competition in the years to come, as the different architectures compete to see whether one of them, or the Chinese home-grown approach, will be the first to reach "exascale computing." Currently, the US Department of Energy's Exascale Computing Project expects the first exascale machines to be installed in 2022 and go live the following year.

I find it interesting that despite the emphasis on many-core accelerators like the Nvidia Tesla and Intel Xeon Phi, only 96 systems use such accelerators (including those that use Xeon Phi alone), compared with 104 systems a year ago. Intel remains the largest chip provider, with its chips in 462 of the top 500 systems, followed by IBM Power processors in 22. Hewlett Packard Enterprise created 140 systems (including those built by Silicon Graphics, which HPE acquired), Lenovo built 92, and Cray 56.

Machine Learning Competition

There were a number of announcements at or around the show, most of which dealt with some form of artificial intelligence or machine learning. Nvidia announced a partnership with IBM on a new deep-learning software toolkit, IBM PowerAI, that runs on IBM Power servers using Nvidia's NVLink interconnect.

AMD, which has been an afterthought in both HPC and machine-learning environments, is working to change that. Here the company focused on its own Radeon GPUs, pushed its FirePro S9300 x2 server GPUs, and announced a partnership with Google Cloud Platform to make them available in the cloud. But AMD hasn't invested as much in software for programming GPUs, as it has been emphasizing OpenCL over Nvidia's more proprietary approach. At the show, AMD introduced a new version of its Radeon Open Compute Platform (ROCm) and touted plans to support its GPUs in heterogeneous computing scenarios with multiple CPUs, including its forthcoming "Zen" x86 CPUs, ARM architectures (starting with Cavium's ThunderX), and IBM Power 8 CPUs.