Formulating data science problems is an uncertain and
difficult process. It requires various forms of discretionary
work to translate high-level objectives or strategic goals into
tractable problems, necessitating, among other things, the
identification of appropriate target variables and proxies.
While these choices are rarely self-evident, normative
assessments of data science projects often take them for
granted, even though different translations can raise
profoundly different ethical concerns. Whether we consider a
data science project fair often has as much to do with the
formulation of the problem as any property of the resulting
model. Building on six months of ethnographic fieldwork with
a corporate data science team—and channeling ideas from
sociology and history of science, critical data studies, and early
writing on knowledge discovery in databases—we describe the
complex set of actors and activities involved in problem
formulation. Our research demonstrates that the specification
and operationalization of the problem are always negotiated
and elastic, and rarely worked out with explicit normative
considerations in mind. In so doing, we show that careful
accounts of everyday data science work can help us better
understand how and why data science problems are posed in
certain ways—and why specific formulations prevail in
practice, even in the face of what might seem like normatively
preferable alternatives. We conclude by discussing the
implications of our findings, arguing that effective normative
interventions will require attending to the practical work of
problem formulation.
Williamson Tucker, Professor
Ethics issue in Data Science