Correlation Wins
Once upon a time, data was hard to come by.
https://www.theatlantic.com/science/archive/2017/01/billy-barr-climate-change/512198/
For 44 years Billy Barr lived on a mountain in the Rockies and kept notebooks about much of what he saw around him.
He'd record things like when a particular flower opened or a bird appeared, along with what the weather was like and how deep the snow was - every day, for years!
He did it for the heck of it - to keep himself amused.
What a trove of data his notebooks are now for climatologists. But there was no easy way to get data like that; it took decades of patient, daily effort.
I remember gathering data for Physics 101. One lab involved measuring a block of wood. It took an hour of painstaking, repeated measurements entered into a table, and then another couple of hours of calculation to get a useful result.
Another lab involved setting up a sheet of graph paper on a sheet of glass arranged to make a slope. Carbon paper was placed on top of the graph paper. A ball bearing was pushed onto the slope and allowed to roll down.
This left an impression of the path the ball took on the graph paper. Setting up took a few minutes. Performing the experiment took a few seconds. After a couple of hours of painstaking calculation I got a number that agreed with the acceleration due to gravity.
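Looking back, the calculation is simple enough to sketch. Here's a hypothetical reconstruction in Python - the data points, launch speed, and incline angle are all made up for illustration, and I'm assuming the ball rolled without slipping, which for a solid sphere scales the down-slope acceleration by 5/7. The traced path should be a parabola, and fitting it recovers the acceleration and hence g:

    # Hypothetical reconstruction of the rolling-ball lab (illustrative
    # numbers only). Assumes the launch speed v and the incline angle
    # theta were measured separately, and that a solid sphere rolling
    # without slipping has down-slope acceleration a = (5/7) g sin(theta).
    import numpy as np

    v = 0.80                  # launch speed across the slope, m/s (assumed)
    theta = np.radians(20.0)  # incline angle (assumed)

    # (x, y) points read off the carbon-paper trace, in metres:
    # x across the slope, y down the slope. Made-up sample values.
    x = np.array([0.00, 0.05, 0.10, 0.15, 0.20, 0.25])
    y = np.array([0.000, 0.005, 0.019, 0.042, 0.075, 0.117])

    # The path should follow y = (a / (2 v**2)) * x**2, so a straight-line
    # fit of y against x**2 gives the coefficient directly.
    c = np.polyfit(x**2, y, 1)[0]      # c = a / (2 v^2)
    a = 2 * v**2 * c                   # down-slope acceleration
    g = a / ((5 / 7) * np.sin(theta))  # undo the rolling-sphere factor

    print(f"a = {a:.2f} m/s^2, g = {g:.2f} m/s^2")  # g comes out near 9.8

What took me hours with pencil and paper is done in a blink.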
Data can be very hard to gather, and theorizing is needed to narrow the search space so we get useful information. That theorizing is based in part on recognizing patterns in the data we already have.
We live immersed in a technology that gathers data in unimaginable quantities, analyzes it, and presents conclusions about it.
Facebook, Google, and social media in general profit hugely from that capacity. But nobody really knows how the conclusions are drawn. And the conclusions aren't perfect - Google keeps sending me ads for diapers based on my age (I think). But the conclusions are good enough that the diaper maker finds it profitable to use them in marketing campaigns.
But the same sort of tech is used a lot in science. Machines like the Large Hadron Collider in Switzerland produce vast streams of data for every instant they are turned on. No human can interpret that data. The raw data gets processed by the machine into a form that some specially trained people can understand, and that processing involves various sorts of machine learning. This happens a lot in Big Science.
Machines can correctly predict how a protein will fold, but we can't quite explain how it works. It just does. Machines predict the weather from vast data streams fed by myriads of weather stations into electronic learning systems. Machines are getting better than experienced doctors at spotting certain sorts of cancer in chest X-rays, but nobody knows in detail how it is done.
https://www.theguardian.com/technology/2022/jan/09/are-we-witnessing-the-dawn-of-post-theory-science
"Somewhere between Newton and Mark Zuckerberg, theory took a back seat.
In 2008, Chris Anderson, the then editor-in-chief of Wired magazine, predicted its demise. So much data had accumulated, he argued, and computers were already so much better than us at finding relationships within it, that our theories were being exposed for what they were - oversimplifications of reality.
Soon, the old scientific method - hypothesise, predict, test - would be relegated to the dustbin of history. We'd stop looking for the causes of things and be satisfied with correlations."
With the benefit of hindsight, we can say that what Anderson foresaw has largely come true (and he wasn't alone).
The complexity that this wealth of data has revealed to us cannot be captured by theory as traditionally understood.
"We have leapfrogged over our ability to even write the theories that are going to be useful for description," says computational neuroscientist Peter Dayan, director of the Max Planck Institute for Biological Cybernetics in Tubingen, Germany. ""We don't even know what they would look like."
This situation is shocking within the old paradigm of the scientific method: hypothesise, predict, test. Now it's more like: gather lots of data and look for a useful pattern within it.
This way of working is satisfied with correlations - event A correlates with event B, rather than event A causes event B. It turns out that situations simple enough to be expressed in causal terms are quite rare in reality.
I don't think this is as shocking as it may seem.
The thing that makes it shocking is that we are used to thinking of causality in terms of causal chains.
Event A causes event B which causes event C . . .
In reality, rather than a causal chain we have a causal web. Any event has innumerable causes. Saying that event A causes event B is a simplification that leaves all the other influences on B out of consideration.
In effect, causal explanations just express correlations that people feel are very secure. Sometimes causal explanations turn out to be wrong.
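Here's a toy illustration of the point (my own sketch, with made-up numbers): a hidden common cause drives both A and B, so A and B correlate strongly even though neither causes the other.

    # Toy model of a causal web: a hidden common cause drives both A
    # and B. A and B correlate strongly, yet neither causes the other.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 10_000

    hidden = rng.normal(size=n)            # the unobserved common cause
    a = hidden + 0.3 * rng.normal(size=n)  # event A, driven by the hidden cause
    b = hidden + 0.3 * rng.normal(size=n)  # event B, also driven by it

    print(np.corrcoef(a, b)[0, 1])         # ~0.9: A "predicts" B, no causation

Pick out A and B alone and the correlation looks like causation; the rest of the web is invisible.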
The human brain takes in data from senses and memory and acts on it.
If I'm crossing the street and a car isn't slowing down properly I start sprinting away before I can think about it.
But if someone asks why I sprinted away I need to answer with a causal story that leaves out most of the influences on me at the moment.
This creates the illusion that those influences weren't there.
Causal explanations may not be necessary for science. Science deals with what actually works, and the "shut up and calculate" way of thinking is common. But causality may be significant when it comes to social matters. Perhaps it's significant for anti-vaxxers. They point out, for instance, that a vaccine doesn't provide certain protection against infection. They may think that the doctors don't know what they are talking about, since they don't have a hard causal explanation for why that is so. All the doctors have is a correlation.
I can sympathize. If people try to tell me what to do, I need to know the story that justifies the demand. Correlation is generally too complex for stories. A story has a linear structure where one event causes another in an easy-to-understand way.
But with vast streams of data and machine learning we may be having to deal with stuff where stories don't work very well.
What do you think?