Background
Many articles have been written comparing programming languages and tools for data science. Their conclusions vary depending on exactly what was studied and how the data was collected. I think it is fair to say that SAS was once the dominant analytics platform, but it has since been overtaken by open source alternatives, most notably R. Historically, Python has been used for much the same tasks as R, but it did not have quite as large a following in the data science field.
A trend that cannot have escaped anyone, at least not readers of this blog, is that interest in data science has exploded in the last few years. It has gone from something only geeks did for fun to a hot boardroom topic. So which programming language is the lingua franca of data science today?
What do we mean by "popular" anyway?
Previous studies have counted mentions in online job postings. That is a good measure of demand for skills. However, it is difficult to obtain an unbiased result, since Python is used by many people who have nothing to do with data science. Also, the words in a job description measure how popular certain topics are in the minds of HR, but I have some doubts that HR really knows what the role requires, and the descriptions are often based on outdated assumptions.
There are many other metrics we could use to measure popularity, e.g., activity on Stack Overflow, questions on Quora, the number of blog posts, or discussions on Twitter. Ultimately, it comes down to what we mean by popularity.
What is measured
With that said, I decided to look at what people have been searching for on Google. The data source was Google Trends, and I pulled all available data up to 18 August 2017. To limit the results to searches relevant to data science, the search terms were "r data science", "python data science" and "sas data science". This was the result: