Tom Becker, Regional Vice President Central Europe at Alteryx, warns that technological development could automate many tasks of the data scientist in the future. In the COMPUTERWOCHE interview, he explains the consequences for data scientists and how the job profile will change.
Astronaut, fireman, veterinarian – these were the dream jobs of our childhood. Has the data scientist taken over this role?
Becker: In my view, it is high time to critically examine the hype surrounding this profession. A data scientist is, first and foremost, a scientist. Companies are now pushing ahead with digital transformation. For this we need people who work with data at an operational level and produce analyses. However, this does not mean that every data worker needs training as a data scientist.
Data science no longer fulfills the role of all-purpose wonder weapon that it may once have had. We see two main reasons for this: First, data projects require specialist knowledge from the departments, so a strong vertical alignment of the data teams and more domain expertise are required. Second, more and more data science tasks are being automated, are available via self-service apps, or are just a click away in the cloud. This gives significantly more employees in the specialist areas access to analyses.
What are the consequences and do you even advise against studying Data Science?
Becker: I don’t see it that critically. However, if more people are to work with data in the future, handling it must be simplified. We should therefore expand our knowledge of data analysis in the specialist areas. Students already learn some of the topics currently taught in the Data Science course in other subjects, such as computer science, mathematics or mechanical engineering.
And now comes the crucial innovation. Today we can provide employees with completely new tools to evaluate data faster and more easily. The cloud simplifies the way we use powerful analytics tools and the capabilities of artificial intelligence (AI), deep learning and machine learning (ML). There are also data management platforms that provide new data pipelines at the click of a mouse. Users from all departments can access these self-service solutions and make decisions faster.
So do we need an additional data science course in each apprenticeship?
Becker: The role of the data scientist is changing; it is becoming broader. Statisticians with a mathematical and scientific background are becoming universal data specialists with programming knowledge. We need additional definitions to classify the more specialized fields of activity. Sure, the classic data scientist will continue to develop models that can be used to generate added value from data. However, companies need new data-savvy employees, such as data workers and data engineers. We should therefore make sure that the value of data is recognized and that appropriate data analytics courses are integrated into specialist training.
Being able to handle data is already fundamental for business and society. These skills should be taught at school. I have already given lessons at an elementary school on programming Lego Mindstorms, the robotics platform from the maker of the well-known plastic bricks. This helps to teach children how to use data, computers and robots.
So we’re going to see further specialization among data scientists?
Becker: In any case, we need new roles for the emerging specialist areas around data management and data analysis. Technologies such as machine learning, deep learning and AI depend on fail-safe infrastructures, always-available data pipelines and, very importantly, high data quality. This is where data engineers help to develop the IT infrastructure. Specialists such as machine learning engineers are also needed, who, for example, set up IoT environments and ensure that self-learning systems are created. Then we have the broad field of data quality. In the future, we could see a Data Quality Security Officer who ensures that data quality is maintained. After all, incorrect input to an ML model leads to incorrect analyses by an AI application. Today you can apply for all of these jobs as a data scientist. But do these experts also have the necessary domain knowledge?
So it gets more complex again because a lot of specialists are needed?
Becker: Not necessarily. The automation of processes will further simplify the work of data specialists. This is supported by a new generation of solutions for analytical process automation that make it easier to work together on data and analyses. An analysis by Forrester says that Citizen Data Scientists or Data Workers will be able to complete more tasks in 2021 than highly qualified data specialists.
The following figures also show why automation is urgently required. Many employees spend most of their time looking for data. This includes the highly paid data scientists. According to IDC, data analysts spend up to 70 percent of their working time searching for data. Data workers waste up to 44 percent of their working time on unsuccessful research. In addition, data workers use between four and seven different software tools for their data-related tasks, which also means wasted time.
Are there already solutions for this automated data world?
Becker: Everyone in the IT industry is talking about digital transformation. However, it must first reach people’s minds before it becomes a reality in the workplace. In my view, a new data culture is necessary, and this is more important than hiring a group of highly paid academics. Last year there was a survey by NewVantage Partners. The IT consulting firm found that 72 percent of companies have not defined a data culture and 69 percent do not see themselves as a data-driven organization. In addition, the analysts at IDC offer a telling finding: a third of business decision-makers have considerable difficulty using data in a more targeted way for business decisions.
So the question for me is how a data scientist can make a lasting difference, especially when working as a lone wolf. It is time to talk about the fundamentals within the company.
- Julia Ertl, Accenture
“Data science projects started out as proofs of concept, which were often isolated and highly experimental analyses. A lot has happened since then in terms of building IT infrastructure, and the much greater challenge now is how to actually use the results. The crux of the matter is to bring the IT infrastructure together with the organization, its processes and, above all, its people. On the one hand, the right people have to be brought on board; on the other hand, new knowledge and new roles have to be built up.”
- Dr. Kay Knoche, Pegasystems
“In many cases, the status quo is flying completely blind, which makes things harder than they already are. We always advise our customers to base a decision on the existing data so that at least one action is operationalized. The final results, the KPIs, can then be continuously measured against each other to determine which model ultimately performs best.”
- Mehmet Yildizoglu, Data Reply
“It’s about how you can get as much as possible out of the respective use case and create added value with the different models. You cannot say in general in advance which algorithm delivers the best fit for the problem. You have to try it, and if you want to put a solution into operation, it takes more than a pure data scientist. That is also the reason why the data scientist’s profile is changing: away from a purely academic perspective and towards going live, coupled with software engineering know-how.”
- Manuel Namyslo, SAP
“There is still a big gap between the data scientist and IT: models that were developed locally are discarded just because you don’t know how to integrate them into your system landscape. There is a great demand for a platform in which data pipelines can be built, models can be put into production and workflows can be stored. Because at the end of the day, the insights that I gain from the data must be reflected in the company’s business processes.”
- Walter Obermeier, UiPath
“Face recognition in China is a good example of the fact that there are always two ways of looking at data protection. On the one hand, nobody wants to be recognized everywhere. On the other hand, people in Europe also want security. But the two do not go together. A machine learning tool only takes the data that is made available to it. So the danger does not come from machine learning itself, but from the question of which data may be used when, how and for what purpose.”
- Dr. Christian Schneider, wetter.com
“No matter what you invent, no matter how good it is – you can almost always misuse it for bad things. So that machine learning is not wrongly brought into disrepute, the framework conditions must be set in such a way that the algorithm is only used for its intended task.”
Do you think there is currently an appetite for this kind of change?
Becker: Now is the right time. Many people are working from home and are forced to work purely digitally. Children are learning to use e-learning platforms because schools are not yet fully open. So a transformation is already taking place in people’s minds. There is, however, a risk that we will fall back into old patterns once the situation has normalized.
At the beginning of the Corona crisis, practically all everyday work processes were digitized ad hoc so that employees could work remotely. These processes generate new data that makes the performance of an organization transparent and reveals deficits. This data can now help optimize supply chains, which will be extremely important when the economy restarts.
In our projects, we are committed to empowering employees to use this data. Self-service tools help them to carry out analyses quickly and without IT experts, and to take on tasks that used to be carried out exclusively by data scientists. Those who have to do short-time work due to Corona can take an online course on Udacity free of charge, which teaches the basics of data science in 150 hours.
So the bottom line is: the data scientist is dead, long live the data scientist?
Becker: The operational tasks surrounding data science have changed. Data Science 2.0 is on its way, for example in the form of the automated rollout of models for data analysis. Software manufacturers such as DataRobot use ML to develop automated analysis models that employees in the specialist departments can use without a scientific background. However, interpreting the data may again require employees with statistical know-how, which supports the thesis that we primarily have to define new data science specialties.
Above all, we have to simplify and automate data analysis tasks. With the new concept of Analytic Process Automation, companies create the organizational basis for this, since it unites people, processes and data. Only the combination of these three factors enables a sustainable change in the workplace, because it democratizes the use of data, making it possible for everyone. We call it a new data culture. This turns every employee into a data worker who can take on operational tasks that would otherwise end up on a data scientist’s desk. (fm / pg)