Executive Briefing

Top programming languages for Data Science: first partial survey findings

Alessandro Piva, Big Data Analytics & Business Intelligence Observatory Director

The proliferation of data and the huge potential for companies to turn data into valuable insights are driving a sharp increase in demand for Data Scientists.

But what skills and educational background must a Data Scientist have? What is their role within the organization? Which tools and programming languages do they use most? These are some of the questions that the Observatory for Big Data Analytics of Politecnico di Milano is investigating through an international survey of Data Scientists. If you work with data in your company, please support our research by completing this anonymous survey.

Programming is one of the five main competence areas underlying a Data Scientist's skill set, even if it is not the most relevant in terms of expertise (see What is the right mix of competences for Data Scientists?). According to the results of the survey, which so far has involved more than 200 Data Scientists worldwide, no single programming language prevails for data science activities. However, the choice appears to be limited mainly to a narrow set of alternatives: almost 96% of respondents declare they use at least one of R, SQL or Python.

In particular, the top-ranking language in the current sample is R, used by 53% of Data Scientists and supported by the R Foundation for Statistical Computing. Initially widespread mainly among statisticians and in academic environments, R has seen a considerable increase in use for data science activities in recent years. Today it is one of the most popular open source languages, backed by a large and helpful community.

Although it was developed in the early 1970s, SQL still plays a key role today, ranking second with 49% of respondents. SQL was not designed to handle unstructured datasets (typical of Big Data), but organizations still have a strong need to analyse structured data, and SQL remains a very popular choice for data crunching.

Python holds third place (43%). It has become very popular in recent years thanks to its flexibility and because it is relatively easy to learn. Like R, it has a large community dedicated to improving the language and developing specialized, domain-specific (vertical) packages.

Completing the top five are Unix Shell/AWK/Gawk (15%) and Java (8%).

If you are a Data Scientist and would like to receive more detailed results, including the main and final findings of the research, complete the questionnaire and leave your email address so that we can send you the material.

For more information about the research: Observatory for Big Data Analytics

For more information about the research group: Digital Innovation Observatories