According to the Harvard Business Review, data scientists have the “sexiest job” of the 21st century. Their ability to extract knowledge from huge data sources is essential for the digital transformation of companies. This explains the enthusiasm for a professional group that taps data sources, refines them and can even monetize them. However, not every company finds candidates who match its desired profile.
In these cases, citizen data scientists can widen the search. Putting the term “citizen” in front of the job title “data scientist” may seem confusing at first glance. It refers to people without specific scientific training, in contrast to the technical and highly specialized data scientists. According to Gartner, a citizen data scientist is someone who “creates models using advanced analytical techniques or predictive features, but whose original function is outside the field of statistics and analytics”. Citizen data scientists tell stories about a company based on its data by translating that data into a language everyone can understand. In theory, they combine the skills of several specialists (mathematicians, computer scientists and statisticians), even without a specific scientific education.
What ultimately makes the difference, in addition to technical expertise, are the soft skills. Above all, data scientists must be curious. They must be able to identify potentially useful information in a large amount of data, work out why it matters to other employees or departments, and “translate” it for them.
According to Gartner, 40 percent of data science tasks will be automated by 2030. By making these automation technologies accessible to a wider group of employees, companies can promote the (further) development of the citizen data scientist. In practice, simplified analysis tools can also support this.
The emergence of this new generation of data scientists creates a positive feedback effect. Through training on the one hand and modern tools that take over part of the complexity on the other, developers, analysts, technicians and specialist users have the opportunity to develop into citizen data scientists.
The following questions should therefore be answered at the start of a project:
How do I get access to the data?
What is the quality of the data?
What do I have to do to get a high-quality, consistent data set that I can use to train and test my model?
How do I create the data sets for training and validating the model?
How can I easily train several models based on different algorithms and then automatically select the one with the best result?
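The last two questions can be illustrated with a minimal sketch. It assumes a tiny, made-up data set of (x, y) pairs and three deliberately simple candidate models. All function names here are hypothetical and exist only for illustration. In practice, a machine learning library or an automated tool would take over these steps. The sketch splits the data into a training and a validation set, trains each candidate on the training part and selects the one with the lowest validation error.

```python
import random
from statistics import mean


def split(rows, validation_share=0.25, seed=0):
    """Shuffle the rows and split them into a training and a validation set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - validation_share))
    return shuffled[:cut], shuffled[cut:]


def fit_mean(train):
    """Baseline: always predict the average target seen in training."""
    avg = mean(y for _, y in train)
    return lambda x: avg


def fit_line(train):
    """Least-squares straight line y = a + b * x."""
    xs = [x for x, _ in train]
    ys = [y for _, y in train]
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in train) / var if var else 0.0
    a = my - b * mx
    return lambda x: a + b * x


def fit_nearest(train):
    """1-nearest neighbour: reuse the target of the closest training point."""
    return lambda x: min(train, key=lambda row: abs(row[0] - x))[1]


def mean_abs_error(model, rows):
    """Average absolute difference between prediction and true value."""
    return mean(abs(model(x) - y) for x, y in rows)


def select_best(rows, candidates):
    """Train every candidate on the training set, score it on the validation set."""
    train, validation = split(rows)
    scores = {name: mean_abs_error(fit(train), validation)
              for name, fit in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical candidates and made-up example data (an exact linear relationship).
CANDIDATES = {"mean": fit_mean, "line": fit_line, "nearest": fit_nearest}
rows = [(float(x), 3.0 * x + 2.0) for x in range(40)]

best, scores = select_best(rows, CANDIDATES)
print(best, scores)
```

Because the validation rows are never used for training, the comparison reflects how each model handles data it has not seen, which is the point of the separate validation set.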
With the technical requirements reduced, employee training can concentrate on procedure and method:
What does the data mining process look like?
What do I have to do to get good data quality?
Which machine learning method (supervised / unsupervised, clustering, classification, regression) is the most suitable to solve the problem?
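The data quality question above can be made concrete with a first, very rough check. The following is a minimal sketch under stated assumptions: the records, the field names and the `profile` function are made up for illustration, and real projects would apply far more checks (value ranges, consistency across sources and so on). It only counts missing values per field and exact duplicate records.

```python
def profile(records, fields):
    """Count missing values per field and exact duplicate records."""
    missing = {
        f: sum(1 for r in records if r.get(f) in (None, ""))
        for f in fields
    }
    seen, duplicates = set(), 0
    for r in records:
        key = tuple(r.get(f) for f in fields)
        if key in seen:
            duplicates += 1  # same values in all fields as an earlier record
        seen.add(key)
    return {"rows": len(records), "missing": missing, "duplicates": duplicates}


# Made-up example records; the field names are assumptions.
records = [
    {"customer": "A", "revenue": 120.0},
    {"customer": "B", "revenue": None},
    {"customer": "A", "revenue": 120.0},
    {"customer": "", "revenue": 75.5},
]

report = profile(records, ["customer", "revenue"])
print(report)
```

A report like this does not improve the data by itself, but it shows where cleaning effort is needed before any model is trained.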
Practice has shown that it is not efficient to separate the role of the data scientist from that of the developer, analyst, technician or specialist user. Because of this separation, a lot of knowledge has to be transferred from one role to the other. The following questions should be answered in such a project:
Who or what provides which data for what purpose?
What information (stories) is there in the data?
Which statements are possible and reasonable based on this data – and which problem is addressed with it?
This context would first have to be explained to a data scientist before they could provide meaningful support. It therefore makes more sense to have a data scientist advise the developer, analyst, technician or specialist user. The citizen data scientist can bridge the gap between traditional data analysis and the advanced techniques of data scientists.
By reconciling the various interest groups and contributing to “data democratization”, these “everyone data scientists” could become the figurehead of the company’s own data culture. Still, even with sophisticated tools, very specific skills are needed to find the real gold in the data, and these call for experts. The citizen data scientist is therefore in no way a substitute for the data scientist. On the contrary, the two functions coexist and create synergies that improve the company’s competitiveness.
By actively helping to build a data-driven dynamic within the company, both the citizen data scientist and the data scientist appear to be the new epitome of the fourth industrial revolution. The importance of an internal division of labor combined with technical support was already demonstrated in the first industrial revolution. (bw)