The Shortcomings of Data Analysis


Over the past year, my belief that more information can lead to meaningful change has been waning. One thing is certain: more information in open, digital formats has tremendous potential to improve society, companies and lives. But it has limits. No amount of information will solve homelessness, poverty, environmental pollution or the other serious problems we are facing.

The Open Data movement, in which I was an active participant and still play a role, remains very important. Yet fundamentally, it can only lead to incremental improvements to a system that is arguably broken at its roots. Take the debate around homelessness. An emerging trend is to compare the medical and policing costs associated with homelessness against the cost of offering housing, money and assistance. Social scientists have crunched the numbers and shown that it is more cost-effective to house a person at taxpayer expense than to let them live on the streets and end up in the hospital or in jail. This analysis was made famous by Malcolm Gladwell’s story of Million Dollar Murray, has since been confirmed by other studies, and was used by the city of Cleveland in its recent attempts to end homelessness. Yet should we be making these types of decisions based on monetary costs?

Harvard professor Michael Sandel has repeatedly argued that we have strayed too far down the path of financializing our decisions (The lost art of democratic debate). I would go further and argue that we have relied too much on data analysis and not enough on morality. If we looked only at statistics on black youth in the United States, where over 60% get arrested at least once in their lives (more sad stats), we might almost conclude that they are genetically prone to a life of crime. Of course, we know that it is rather their social environment and state discrimination that have led to these horrifying statistics. That conclusion is a moral one that goes back to the idea that all humans are created equal. It is not based on any data analysis, but on our deep-rooted morality and centuries of struggle for social justice.

The list of areas of society where we have turned to data analysis instead of meaningful debate and morality is long. When I worked as an environmental consultant, we conducted life cycle analyses of products. The goal was to determine a product's impact on the environment by assessing the impact of each of its components: resources, transport, waste collection, etc. We could then compare alternatives and try to piece together a less impactful product by swapping parts or changing transport methods. When asked how he achieved massive cost savings in rocketry, Elon Musk, today's greatest industrial innovator, said that they reason from first principles (lecture). Instead of simply building on existing rocket technology and analyzing data, they returned to basics and asked fundamental questions. Elon and his team asked: “What are the lowest possible costs, based on physics, for rockets to be built and launched?” Returning to life cycle analysis, what I found after two years of work was that the best way to reduce a product's impact on the environment is not to swap parts but to return to the original design and rethink it from the ground up. That is a much harder task.

Fundamentally, this inability to analyze complex systems and derive solutions from data analysis is tied to chaos theory and complexity (and quantum mechanics, but that’s another rabbit hole). Systems, both human and technological, are so complex that true innovation can only come from deep reflection. Another interesting example of algorithms failing to solve problems is the search and rescue technology used to find sailors who have been thrown overboard. This great article about a fisherman who was thrown overboard describes how a computer algorithm was used to predict his location based on the weather and ocean currents. After days of searching, the rescuers returned to the old methods and eventually found him. The fisherman had latched onto a lobster cage, which altered his path dramatically. The algorithm could not possibly have taken that into account. I am not saying that all technology is bad or that we should return to stone tablets, but rather that we should not think we can simply outsource thinking to computer algorithms or data analysis.

Noam Chomsky discussed this idea in a recent presentation at Google. He was asked about data analysis, AI and innovation through statistical analysis of things like search terms and large data sets. He responded that deep insights into subjects such as linguistics, his field of expertise, were not and cannot be produced by statistical analysis of language. Rather, advances in our understanding of language come from insights that are then confirmed by data, not the other way around.

A final example of the failure, or impending failure, of data analysis is the idiotic trend towards smart cities. Adam Greenfield wrote a highly insightful book entitled “Against the Smart City”. Greenfield explains how certain governments are attempting to build systems that monitor and calculate everything in a city, from the size of police forces to street sizes and resource allocation. Even Montréal is going down this path with its recent Smart City initiative and its restructuring of funding based on mysterious algorithms developed by bureaucrats. This tactic has been tried before, and it has failed. In Montréal alone, top-down planning based on ‘data’ led to things like Mirabel airport, which is now scheduled for demolition (link), and car-centric monstrosities such as the Parc-Pine interchange (photos and details). Both projects took statistics, the number of flights (link) to Montréal and the number of cars in Montréal, and simply extrapolated them over the years. Both failed to account for changing economic conditions, regulatory frameworks and the physical limits of auxiliary infrastructure. The point here is that no matter how much data you have, there is inevitably important data that you do not have and can never have. It is therefore imperative that your decisions be based on logic that has been challenged through debate, not just on data.

If we should not make large decisions based on data alone, it follows that large-scale data analysis or access to more data is unlikely to lead to meaningful positive change. At best, we can hope for incremental improvements or optimization. When I began working in the Open Data movement, I thought more access to data could actually change power politics. Now I am much less certain. Data is necessary, but not sufficient. In a capitalist society like ours, money is power. If we want to empower people, we need to give them actual power, which really means monetary capital. In a great article, Adam Greenfield argues quite eloquently that technological or even structural changes in resource allocation will not liberate individuals:

“My mistake in the past — and, in retrospect, it’s an astonishingly naïve and determinist one — was to think that emergent networked forms of shared resource utilization might in themselves give rise to any particularly liberatory politics of everyday life. Experience has taught me that such notionally transformative frameworks as do arise very readily get appropriated by existing ways of valuing, doing and being; whatever emancipatory potential may reside in them swiftly falls before path dependency and the weight of habit, and the gesture as a whole comes to nought.” Link

This thought is echoed, and backed up by mountains of data, in the recent best seller Capital in the Twenty-First Century. At the end of the second part of the book, Thomas Piketty clearly states: “Si l’on souhaite véritablement fonder un ordre social plus juste et rationnel, fondé sur l’utilité commune, il n’est pas suffisant de s’en remettre aux caprices de la technologie”. This translates roughly to “If we truly wish to build a more just and rational social order, founded on common utility, it is not enough to rely on the whims of technology”. While he is talking about the ability of new technology to change the old order, the argument could easily be extended to data. No amount of data will bring about a just world, and it remains unclear whether data even bends the arc of history towards justice.


P.S. These are the reasons I want more deep, challenging debate in society, and why we started the Fight club politique.

A few more links for further reading:

How Politics Makes Us Stupid: more information will not help, it will hurt

TED talk by Steven Pinker and Rebecca Goldstein: an animation on the power of reason

GE rebuilding a home heater made in China (NPR)

Published on July 12, 2014