Which Programming Language Is Best for Big Data?

Posted on: May 23, 2018

Nothing is quite so personal for programmers as what language they use. Why a data scientist, engineer, or application developer picks one over another has as much to do with personal preference and their employers’ IT culture as it does with the qualities and characteristics of the language itself. But when it comes to big data, some definite patterns emerge.

The most important factor in choosing a programming language for a big data project is the goal at hand. If the organization is manipulating data, building analytics, and testing out machine learning models, it will probably choose a language that’s best suited for those tasks. If the organization is looking to operationalize a big data or Internet of Things (IoT) application, there is another set of languages that excels at that.

In the data science exploration and development phase, the most popular language today is unquestionably Python. One big reason for Python’s popularity is the plethora of tools and libraries available to help data scientists explore big data sets. Python was recently ranked the number one language by IEEE Spectrum, where it moved up two spots to beat C, Java, and C++, although Python trails these languages on the TIOBE Index. As a general-purpose language, Python is also widely used outside of data science, which only adds to its usefulness.

Another popular data science language is R, which has long been a favorite of mathematicians, statisticians, and researchers in the hard sciences. The SAS environment from the company of the same name continues to be popular among business analysts, while MathWorks’ MATLAB is also widely used for the exploration and discovery phase of big data. You also can’t go far in data science without knowing some SQL, which remains a very useful language.

The choice of data science language may also be determined by which notebook a data scientist is using. Jupyter is the successor to the IPython notebook, and as such is closely aligned with Python, but it also supports R, Scala, and Julia. The Apache Zeppelin notebook includes Python, Scala, and SparkSQL support.

Programmers will often opt for a different set of languages when it comes to developing production analytics and IoT apps. While they may choose Python or R during the experimental phase of the project, programmers will often rewrite the application and re-implement the machine learning algorithms using entirely different languages.