DATA SCIENCE SURVEY 2020 – INSIGHTS
Introduction
Data Science Earth was established in 2019 with the purpose of producing high-level data science solutions. The community, whose members are all volunteers, primarily aims to develop data awareness and ensure the correct use of the power of data in a globalizing, competitive environment.
The Data Science Survey Team, which is part of the organization, released its first survey in 2020 to obtain exploratory insights about Data Scientists all over the world. The survey aims not only to explore these insights but also to serve as a guide for all Data Scientists, both those who are new to the Data Science ecosystem and those who already work in the sector and want to improve themselves.
The guide answers many frequently asked questions about Data Science including:
- Where should new generations start?
- What more can be added to current skills?
- Which tools are necessary to learn, and which ones will most probably lead in the future?
- What are the options to improve our skills?
- What is the purpose of data science?
As for the results of the survey, the dataset includes many different perspectives from different regions: almost 40 countries and many cities all over the world. After our Survey Team had pre-processed the dataset, they built interactive dashboards for exploring the insights.
Let’s continue step by step 😊
What is the Demography of Data Scientists?
Here are some of our discoveries about the demography of data scientists all over the world!
You can find some information about the country, age, gender, major, education level and business level of respondents:
- Although the majority of respondents come from Turkey, they are followed by respondents from the USA, India, Azerbaijan, and Australia.
- Although males clearly outnumber females, when we focus on generation Z, the numbers of males and females approach each other. This suggests that the female labor force participation rate in the Data Science area will increase in the near future.
- Generations Y and Z together make up almost the entire survey. One insight is that data-related areas and the Data Scientist title are new concepts in business life. However, when we focus on generation X alone, their business level is mostly academician, which can be interpreted as academia moving ahead of business life.
- Most respondents are junior professionals, experts or students. This shows that reverse mentoring is very important for companies, because these young talents will help companies become data-driven in the near future.
- The most common majors of the participants are, in order, Computer / Software Engineering, Statistics / Statistics and Computer Science etc., and Industrial / Management Engineering, and their degree is mostly a bachelor's. Statistics / Statistics and Computer Science is largely preferred for the master's degree. This suggests that Statistics / Statistics and Computer Science is a foundation of Data Science.
- Data Analysis / Reporting, Data Science, Business Intelligence / Decision Support, Business Development & Process Improvement and Artificial Intelligence are the dominant business titles in the Data Science area in the private sector.
What is the Business Information of Data Scientists?
Here are some of our discoveries about the business information of data scientists all over the world!
You can find some insights about the experience, business field, company and data size, spoken languages and salary of respondents:
- Most respondents know Turkish at a proficient level, since most of them come from Turkey. It is followed by English, German, Arabic, and Spanish. This is an expected result because these languages are commonly spoken around the world. However, while respondents who report knowing English and Spanish are mostly at an upper-intermediate or advanced level, the level for German and Arabic is still elementary.
- When we look at the experience of Data Scientists, the average is approx. 7 years, which supports the view that Data Science is a fairly new working area for business. When we focus on the experienced leaders, the consulting and education business fields stand out. These leaders will certainly shape the future of business.
- Most respondents work in companies with between 150 and 5000 employees. Companies with fewer than 10 employees should also be underlined: these boutique companies are chosen especially by fresh graduates or comparatively less experienced talents, and they mostly handle relatively small data.
- Most respondents work on data between 100 MB and 1 TB in size. While the business fields are distributed evenly across Computer (Hardware, Software, Hosting), Finance and Insurance, Consulting, Education, Legal Services, Automotive and Textile, larger data sizes are considerably correlated with Finance and Insurance and Computer (Hardware, Software, Hosting).
- Salary is an important issue that deserves discussion. Before pre-processing, the raw dataset contained not only two salary periods (monthly and yearly) but also three currencies (TRY, USD and EUR). Converting the raw data to a single form, monthly salary in USD, was therefore a critical data handling step. Only after that could salary be analyzed. It is clearly seen that more experience means more salary. Especially after 16 years of experience in the sector, salary rises with the attainment of titles of manager and above on the organizational chart. The insight is that salary increases sharply after a certain amount of experience in the sector.
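The salary normalization described above can be sketched roughly as follows. This is a minimal illustration, not the Survey Team's actual code: the function name `to_monthly_usd` and the exchange rates in `RATES_TO_USD` are assumptions chosen only for the example, since the rates used in the survey are not stated.

```python
# Hypothetical exchange rates to USD (placeholders, not the rates used in the survey).
RATES_TO_USD = {"USD": 1.0, "EUR": 1.10, "TRY": 0.15}
MONTHS_PER_YEAR = 12


def to_monthly_usd(amount, currency, period):
    """Convert a raw salary answer to a monthly salary in USD."""
    if currency not in RATES_TO_USD:
        raise ValueError(f"unknown currency: {currency}")
    if period == "monthly":
        monthly = amount
    elif period == "yearly":
        # Spread a yearly salary evenly over twelve months.
        monthly = amount / MONTHS_PER_YEAR
    else:
        raise ValueError(f"unknown salary period: {period}")
    return monthly * RATES_TO_USD[currency]


# Example raw answers as (amount, currency, period); the values are made up.
raw_answers = [(60000, "USD", "yearly"), (8000, "TRY", "monthly"), (3000, "EUR", "monthly")]
normalized = [to_monthly_usd(*answer) for answer in raw_answers]
```

Once every answer is expressed in the same unit, salaries can be compared across experience levels and business titles.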
What are the commonly used Data Science tools and for what purposes are they used?
Here are some of our exploratory insights about data scientists all over the world!
In this section, we provide some insight into the tools that Data Scientists use the most and for what purposes. Nowadays, some of the most frequently asked questions are "Which tools should I learn for Data Science?" and "Where and how should I start?" The results of this survey provide insights that answer exactly these questions.
- From the answers received, it is understood that approx. 30 different tools are used extensively by Data Scientists. Data Science always works by combining the most suitable tools to do the best job. For this reason, it would be wrong to set standards such as "this particular tool should be used in Data Science". What we always have to keep in mind is that using the right tool for every job will increase quality and minimize costs.
- When we look at the Top 5 Data Science tools, Excel, Python, SQL, Anaconda and R stand out.
- We must admit that, although we can now use data directly from different sources, Excel has been one of the most frequently used data sources from the past to the present. However, as our resistance to new tools decreases and our habits change, Excel usage rates will decrease. As a matter of fact, when we look at generation Z, this rate drops from 68% to 57%. The important point here is that when members of generation Z start learning data science, many different tools are at hand and their habits are formed accordingly.
- Data Science is used to improve business processes and develop new products by using Statistics and Computer Science together. As the survey results on statistics and computer science show, the most used tools are Python and R. While R was preferred more in the past, today Python is preferred because it has a strong interpreter, is used by big technology companies, and offers many different libraries that are easy to use. You can easily see this difference, especially if you filter for junior employees and generations Y and Z.
- Anaconda is a free and open-source distribution of the Python and R programming languages for scientific computing. Thus, it takes its place in the Top 5 among the tools used by the participants. Another exploratory insight is that Python users prefer the Anaconda platform and the Jupyter Notebook included in it.
- Much of the world’s data resides in databases. SQL (Structured Query Language) is a powerful language used for communicating with databases and extracting data from them. As the participants' answers show, SQL is one of the tools Data Scientists use in daily life. A working knowledge of relational databases and queries is necessary if you want to become a data scientist.
- Although there are many tools in the field of Data Science today, each one differs from the others in some feature. Data Scientists owe their great flexibility to the variety of these tools. When we asked about this in the survey, 89% of Data Scientists stated that they were satisfied with these tools. One reason for this high satisfaction rate might be that most Data Science tools are open source and have very strong communities behind them.
- When we look at the purposes for which Data Science is used, the results do not surprise us at all. The Top 5 are Statistical Analysis, Data Management, Artificial Intelligence Applications, Decision Support and Report Generation. This reflects the basic process of Data Science: it starts with collecting and managing data, continues with creating a model by applying statistical analysis to the data, and ends with making decisions based on the resulting reports.
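To make the tool-usage rates and the role of SQL a little more concrete, here is a small sketch that uses Python's built-in `sqlite3` module to compute the share of respondents using a given tool, optionally filtered by generation. The table layout, the helper name `tool_share` and the handful of rows are all invented for illustration; they are not the survey dataset and will not reproduce the percentages quoted above.

```python
import sqlite3

# In-memory database with a hypothetical "one row per respondent and tool" layout.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE responses (respondent_id INTEGER, generation TEXT, tool TEXT)"
)
toy_rows = [  # made-up answers, for illustration only
    (1, "Y", "Excel"), (1, "Y", "Python"),
    (2, "Y", "Excel"), (2, "Y", "SQL"),
    (3, "Z", "Python"), (3, "Z", "SQL"),
    (4, "Z", "Excel"), (4, "Z", "Python"),
]
conn.executemany("INSERT INTO responses VALUES (?, ?, ?)", toy_rows)


def tool_share(conn, tool, generation=None):
    """Percentage of respondents (optionally of one generation) who use `tool`."""
    params = {"tool": tool}
    where = ""
    if generation is not None:
        where = "WHERE generation = :gen"
        params["gen"] = generation
    query = f"""
        SELECT 100.0
               * COUNT(DISTINCT CASE WHEN tool = :tool THEN respondent_id END)
               / COUNT(DISTINCT respondent_id)
        FROM responses {where}
    """
    return conn.execute(query, params).fetchone()[0]


excel_all = tool_share(conn, "Excel")     # share among all toy respondents
excel_z = tool_share(conn, "Excel", "Z")  # share among generation Z only
```

Comparing the overall share with the share inside one generation is the same kind of filter that the dashboards apply when we focus on generation Z.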
How long does a Data Scientist work per day?
Here it is meaningful to make a distinction first: rather than how long we work, how much of that time do we work efficiently? This was another of the most frequently asked questions. We reserved a section for it in our survey, asking data scientists how long they work efficiently and how much R&D they do per day. When we look at the results after handling the outliers:
- The survey shows that Data Scientists can work efficiently for 5.4 hours a day on average. White-collar workers around the world work 7-9 hours a day on average. This is a much-debated issue today. Last year, Microsoft tested a four-day work week in its Japan offices and found that employees were not only happier but also significantly more productive.
- Data Science is a constantly changing and developing field. That’s why Data Scientists have to take time to improve themselves apart from their daily work. The survey results show that Data Scientists allocate an average of 2.5 hours a day to R&D.
- When we look at what Data Scientists do to improve themselves during these avg. 2.5 hours/day, a high percentage (> 83%) of participants use online channels. Online trainings, content and communities in particular are the channels Data Scientists have chosen to develop themselves. This seems likely to increase considerably after COVID-19. In addition, Data Scientists stated that they improve themselves by participating in in-company trainings. Companies that want to make a difference with data should therefore give importance to in-company training in order to retain and develop their human resources in the Data Science field.
- When we add these two averages (5.4 + 2.5 ≈ 8), Data Scientists spend about 8 hours working per day. In this case, we recommend that all data scientists have fun for another 8 hours and rest for the remaining 8 hours.
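The averages above were computed after the outliers had been handled. The exact outlier rule is not stated here, so the sketch below assumes one common choice, the 1.5 × IQR rule, and the helper name `mean_without_outliers` and the sample answers are likewise only illustrative.

```python
import statistics


def mean_without_outliers(values, k=1.5):
    """Mean of the values that lie within k * IQR of the quartiles (assumed rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in values if low <= v <= high]
    return statistics.mean(kept), kept


# Made-up answers to "how many hours a day do you work efficiently?";
# 24 is an implausible answer that the rule is meant to drop.
efficient_hours = [5, 6, 5, 4, 6, 5, 24]
average, kept = mean_without_outliers(efficient_hours)

# The two averages reported in the survey add up to roughly 8 hours.
total_working_hours = 5.4 + 2.5
```

Without such a step, a single implausible answer like 24 hours would pull the average up noticeably.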
Are Data Scientists satisfied with their current situation and what are they planning for the Future?
Although Science deals in facts, we should not forget that Data Scientists are human. Human beings are seriously affected by their living and working conditions. If the most suitable living and working conditions can be provided to scientists, they will carry Science to the place it deserves with the highest quality. This is why developed countries are always ahead in science.
- When we look at the numbers, the situation is unfortunately not very satisfying. While some of the participants (30-40%) preferred not to comment on either work or living conditions, others (40-50%) stated that they are not satisfied. The problems experienced all over the world, and their persistence over a long time, affect people’s views. Another reality is globalization: although it has good sides, its bad effects spread faster and grow bigger in troubled times, and all links in the supply chain are affected. For this reason, if we want science to succeed worldwide, everyone should be happy, not just a certain segment. We believe that both states and companies will take the necessary measures in the coming days.
- According to the survey results, 40% of Data Scientists stated that, although they are not satisfied with their job and living conditions, their plan for the future is to improve themselves in their fields of work. We see this as positive: this rate shows that Data Scientists have high hopes for their jobs and for science. Apart from this, a considerable share (30%) are not satisfied with the current situation and want to change jobs and go abroad. Finally, the entrepreneurs (15%), the bosses of their own businesses, are among those who have not lost hope. We hope all Data Scientists can do what they most want to do.
Conclusion
In general, we see that the Data Science ecosystem continues to evolve with young talents and its dynamic structure. Every technology has a hype cycle, and this field is no exception. When we analyze the results in depth, we understand that we are in the early stages of this cycle.
In particular, it seems that generations Y and Z will have a serious voice in this field in the coming years and will steer it. Companies that want to make a difference through data should support these young talents and invest in this field.
In this area, where online content consumption is increasing, especially for learning, solutions are also produced quickly. The diversity of the tools used and the strong communities behind them provide considerable flexibility and convenience for Data Scientists.
Last but not least, a big thank-you to our heroes who contributed to this work: Ahmet Cengiz Bayraktar, Çağrı Aksu, Halil İbrahim Akça, Kerem Kargın, Mehmet Sezer and Yunus Emre Mızrakçı. We can say that Data Science is a field where solutions are created in an enjoyable way as knowledge and experience are shared. We created this study to be a guide for Data Scientists. Our request to you: let us continue to contribute to Data Science all together and make the world more beautiful with Data Science.
Data Science Survey Team
- Ahmet Cengiz Bayraktar
- Çağrı Aksu
- Halil İbrahim Akça
- Kerem Kargın
- Mehmet Sezer
- Yunus Emre Mızrakçı