Can external people be productive data scientists?

godatadriven/
11 March, 2015

Now that some larger services companies are starting to hire "data scientists", the hype is clearly
heating up. So time to consolidate! In a series of short blog posts I will touch upon some topics
related to data science and describe how I see and feel about them.

Skills and domain knowledge

In a recent article, Omid Shiraji, CIO at Working Links, is quoted as saying that "[y]ou need
skilled people internally that understand your data and how it creates something useful for the
organisation. Externally contracted people can't do that." Now, since we run a business where you
can hire external, highly skilled people, I started to ponder this statement.

I presume the basis of Shiraji's argument is the well-known Data Science Venn Diagram. This
diagram labels data science as the overlap of "Hacking Skills", "Math & Statistics Knowledge" and
"Substantive Expertise". Based on my experience in data science, I do not doubt that the combination
of "Math" and "Hacking" is essential, but I've always struggled with "Substantive Expertise". The
reason is that "Math" and "Hacking" are transferable skill sets (they can be applied to any domain),
while "Substantive Expertise" is tied to one domain. For example, I could apply my knowledge of
time series analysis in a financial domain or in a health domain, while much of the domain knowledge
from the financial domain is not directly applicable in the health domain (unless you would
remark cynically that health is all about finance nowadays).

People with domain knowledge are very valuable to any organisation, and we need actual business
problems to guide our analyses and development. Otherwise, we're just doing research. Moreover, it
helps when a data scientist knows the organisation, what it stands for and what its current problems
and challenges are. It also definitely helps to know where to get internal data sets and how to
interpret these data. Still, these questions can be put to people who are not data
scientists. The same goes for the interpretation of analyses done by data scientists: when the
results are clearly presented in the context of the business problems at hand, people who have
substantive expertise can interpret them and contribute.

An example from science

The question of whether external people can be productive reminds me of research I did when I was
at Leiden University. Back then, I looked into converging fields of science. For me,
converging fields are those that start to merge their research. Often such merging is temporary, but
every so often it results in a new field. A well-known example is bioinformatics, where advanced
statistical and computational techniques are combined with (biological) genetics. Often, this
merging is the result of external researchers who start to apply tools and techniques from their own
field to research topics of some other field. In the papers of such converging fields, researchers
combine references to both fields: one for the methods and techniques and one for the research
topics and domain knowledge. Subsequent papers are submitted to journals in the "external" field
and, when deemed relevant and up to standard, these papers become part of the knowledge base of
the converging field.

The take-home message is that those "external" scientists were able to produce relevant research in
a field that was (initially) not their own. This again illustrates that skills related to tools
and techniques can be translated to other knowledge domains. For me, this shows that external data
scientists with an inquisitive, open attitude can be productive in any domain, provided that they
have access to people who have substantive expertise and who are willing to transfer that expertise
to the data scientists.

Conclusion

Now, I strongly applaud companies that start to invest in strong teams of internal data
scientists. As I stated in a previous blog post,
this is much better than investing only in technology. It is also far better than investing in
only external people, because then you will not internalise data science capabilities. In the end,
I think that perhaps it's better to change "Substantive Expertise" in the Data Science Venn Diagram
to "Inquisitive Attitude and Business Sense". This skill is the ability to learn fast, combined with
an open mindset towards, and a sincere interest in, the domain where you are applying your skills.
Such people can be external to your organisation and still create business value.

But the suggestion that external people cannot be productive is simply not true.
