I remember laughing at the scene a couple seasons ago in House of Cards where the genius data scientist stood shirtless in a soundproof room, blared music and screamed as he came up with deep insights to help the Underwoods use data to
win the election.
scientists spend most of their time cleaning and preparing data. They aren’t
data scientists. They are data janitors. Of course a lot of science is like
that (talk to anyone who has worked in a lab).
over a decade now. I don’t write code and can’t do much math. So what am I?
communicator who could translate between the data science team and the C-suite decision-makers.
communicating results (not that that isn’t really important!). I help
conceptualize the project. What are the problems people are trying to solve?
This is sometimes not obvious. People don’t know what they don’t know. In
discussing the surface problem, bigger problems can emerge.
many, many cases the really interesting information is unstructured or
qualitative. What does it really mean and how do we best incorporate it? In
some projects I have played a key role in collecting the data, but in the case
of terrorism research much of the information is narrative – how do we meaningfully
describe this numerically?
them and balance them against what is already known or believed.
abstract problems, I often find myself asking for examples. At the core of the
process I have described are stories. There is the story of the client. What
are they saying about their workflow and challenges?
story with fascinating details into a number or group of numbers. In my decade
at UMIACS, I worked on projects modeling terrorist groups. Some things, like
terrorist attacks, were relatively easy to turn into numerical data (how many
killed, codes for targets, etc.). Other things, like information on the terrorist
group’s public statements or their internal dynamics, were a bit more
challenging to quantify. It is worth noting that computer scientists (and data
scientists) tend to be interested in type. The analysts and SMEs tend to be
interested in instance. The instance is a story; categorizing the instance
as a type without losing too much critical information is a hard challenge. The
decisions about how this is done will shape the results – that too is part of
what is already known. What stories do we tell ourselves about this issue? How
does the analysis inform these stories? Does it upend them, modify them, or
confirm them? How confident can we be in the new findings? What are the broader
organizational impacts of these findings for the client?
values. Values are expressed through stories. Relying only on cold hard facts
will not result in acceptable policies – values have to be part of the equation.
to and telling stories.