Accessing the hidden structure of complex systems using information theory

One of the more useful tools in the complexity scientist’s toolbox is information theory. Now, don’t worry, I’m not going to dive too deeply into it, but I do want to talk about the central concept of information theory: Shannon entropy (or information entropy, as it is also known).

In 1948, Claude Shannon – a research engineer at Bell Laboratories – published a method for improving the transmission of information between a transmitter and a receiver. His invention – which he called ‘a mathematical theory of communication’ – was based on a very simple idea: surprising events carry more information than routine events. In other words, if you wake up in the morning and the sun has turned green, that is going to jolt you into a hyper-aware state of mind in which your brain works overtime to make sense of what is going on. When our interactions with friends or our environment reveal information that we were not expecting, we seek to make sense of it. We process that information with a heightened sense of consciously doing so.

This response to surprise is no different whether we are individuals (waking up to a green sun), in a team (discovering that a colleague is also a part-time taxidermist), an organisation (the sacking of a well-respected CEO) or an entire country (the death of Princess Diana). We seek to understand why, and in seeking to answer this we traverse Judea Pearl’s ladder of causation. However, there is one key difference. When we are dealing with a complex system, or situation, there is uncertainty over cause and effect. This uncertainty is the result of a structural motif of complex systems – feedback loops, which I will discuss in a future post – that leads to what is called non-linear behaviour.

Information as a level of surprise is measured in binary digits (bits). The more unlikely an event is to occur, the higher the information that is generated if it should occur. Let me illustrate this with the example of flipping a coin. 

When you flip an unbiased coin there is a 50/50 chance of it landing on heads or tails. Because both outcomes are equally likely, our uncertainty about the result is at its peak – we could not be less certain of whether the coin will land heads up. Here, the Shannon entropy of flipping an unbiased coin is 1 bit, which is the maximum information that can be obtained from a system (a coin flip) that can only generate two outcomes (heads or tails).

Now, let’s assume that we’ve been given a biased coin that always lands on tails. We know that the coin is biased and so there is no surprise for us when the coin always lands on tails. If there is no surprise, then there is no information. The chances of the coin landing on tails is 100%. In this case, the Shannon entropy is 0 bits. Certainty does not yield new information.
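The two coin examples can be made concrete with a few lines of Python. This is a minimal sketch; the helper name `shannon_entropy` is my own, not a standard library function:

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p)), skipping impossible (p = 0) states."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

h_fair = shannon_entropy([0.5, 0.5])    # unbiased coin: maximum uncertainty -> 1.0 bit
h_biased = shannon_entropy([0.0, 1.0])  # coin that always lands tails: no surprise -> 0.0 bits
```

Note that a state with probability zero contributes nothing to the sum, which is why total certainty yields exactly 0 bits.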

Now, we don’t need to be too concerned with whether something is 1 bit, or 0.5 bit, or 0 bits or whatever. The point I am making here is that the greater the uncertainty we have about something, the greater the information we can gain from that situation. Likewise, if we have total certainty then there is no information, no knowledge, to be gained. Intuitively this makes sense – if I am observing something that is not changing then I am not learning anything new about it. However, if I perturb that system – add a component, remove a component – then I may be cajoling the system into a different state. This new state may yield new information, especially if I have managed to move the system into an improbable state. (Incidentally, this is why the modes of creativity – breaking, bending, blending – are fundamental to discovering new knowledge).

For Shannon entropy to be used in more practical ways, a probabilistic model of a system would need to be constructed. This simply means that we have identified the different states that a system can occupy, and we have estimated the likelihood of the system being in each state at a moment in time. We can construct a probabilistic model by observing and recording the frequency with which different states occur. The more frequently we observe the system in a given state, the more likely we may infer the system is to be found in that state at a future point. Ordinarily we need to capture enough of the history of the system to have sufficient confidence in the probabilistic model we are building. This learning takes time and requires continual sampling of the environment; and there are some challenges to solve – like how to represent the environment – but the idea is to invest time in building a probability distribution, a probabilistic model, of our environment. Novelty is a previously unseen state, and so that too should trigger a response – not least an update of our probabilistic model.
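Building such a model from recorded frequencies is straightforward. A sketch, with hypothetical state names of my own invention:

```python
from collections import Counter

def probabilistic_model(observations):
    """Estimate a probability distribution over states from observed frequencies."""
    counts = Counter(observations)
    total = len(observations)
    return {state: n / total for state, n in counts.items()}

# a hypothetical recorded history of observed system states
history = ["idle", "busy", "idle", "idle", "overloaded", "busy", "idle", "idle"]
model = probabilistic_model(history)  # {"idle": 0.625, "busy": 0.25, "overloaded": 0.125}
```

A previously unseen state simply appears as a new key the next time the model is rebuilt, which is one way the "novelty should trigger an update" idea could be realised.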

As we build our probabilistic model we are forming a hypothesis, an untested belief, about how the environment behaves. Every time we observe and capture the state that the system is in, we are testing that hypothesis. The Law of Large Numbers is relevant here. We expect to see a system move in and out of different states. It may spend more time in one state than we have observed before, or the opposite. We would need to see a persistent, recurring change in the frequency with which each state of the system is observed before we begin to suspect that our hypothesis of the system may need to be re-visited.
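The Law of Large Numbers can be seen directly in such a model: the observed frequency of a state wanders when samples are few, then settles toward the underlying probability as the history grows. A sketch, using an invented two-state system whose true probability of being ‘up’ is 0.7:

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

def sample_state():
    # hypothetical system that is genuinely 'up' 70% of the time
    return "up" if random.random() < 0.7 else "down"

for n in [10, 100, 10_000]:
    # the frequency estimate drifts at small n, then converges toward 0.7
    freq_up = sum(sample_state() == "up" for _ in range(n)) / n
```

This is why a single surprising run is not enough to revise the model: only a persistent, recurring shift in frequencies suggests the hypothesis needs revisiting.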

Now that we have constructed a probabilistic model of our environment (or, indeed, any system of interest), we can calculate its Shannon entropy. If we have a good degree of confidence that our probabilistic model is sufficiently correct, then we can baseline this measure. We can then set a sampling rate for how often we re-calculate the Shannon entropy of the probabilistic model (we may use machine learning techniques to optimise the sampling rate). If the Shannon entropy measurement begins to diverge from the baseline value – by some pre-determined tolerance of +/- x bits – then we could infer that the system may be changing in some way. This out-of-tolerance measurement could flag the need for further investigation – either by an intelligent agent or a human.
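That monitoring loop can be sketched in a few lines. This is an illustration of the idea only; the tolerance of 0.1 bits is an arbitrary placeholder, and the function names are my own:

```python
import math
from collections import Counter

def entropy_bits(observations):
    """Shannon entropy (in bits) of the empirical distribution of observed states."""
    total = len(observations)
    return -sum((n / total) * math.log2(n / total) for n in Counter(observations).values())

def entropy_alarm(baseline_bits, window, tolerance=0.1):
    """Flag when a fresh sample's entropy diverges from baseline by more than +/- tolerance bits."""
    return abs(entropy_bits(window) - baseline_bits) > tolerance

baseline = entropy_bits(["a", "b"] * 50)       # baseline from history: two equal states -> 1.0 bit
flagged = entropy_alarm(baseline, ["a"] * 20)  # fresh window stuck in one state: 0 bits -> alarm
```

In this toy run the fresh window has collapsed to a single state, so its entropy falls a full bit below the baseline and the alarm is raised for further investigation.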

What I am describing here is an idea. I am not aware of any existing technique, or concept, that achieves this. Neither do I know if there is much utility in what I have described. I believe it is technically feasible – updating a probabilistic model and calculating its Shannon entropy can both be done in polynomial time (i.e. very efficiently). As such, you should interpret this for what it is: an idea that I hope interests people enough to pursue it further.

I believe this technique – parsing the environment and comparing it against a probabilistic model – could be a very efficient way to manage a vast amount of automated monitoring of an environment for changes that may warrant further investigation. Of course, this ‘further investigation’ would call into play more expensive resources such as AI and/or humans.

My motivation for conceiving of this idea comes back to the need for any organisation to become highly proficient at anticipating change. When the organisation’s environment (internal or external) may be changing in unexpected ways, we want to be observing the change as it happens in real-time, rather than analysing it after the event. Why is this important? If we are observing the genesis of an enduring change in our operating environment, then we have the opportunity to gain insights into the causes that led to that change.

Applying Shannon entropy as an early warning system raises an alarm that our knowledge of our environment may no longer be accurate. We can respond to these warning signals by expending effort to understand the changes that may be occurring. From this we may create new knowledge and, therefore, update our semantic graph to represent that new understanding. The semantic graph is critical, because all of our collective intelligence draws on it to make good decisions. If that semantic graph is erroneous or significantly out of date, the quality of our decisions is degraded. As an organisation harnesses AI to the fullest – where we are talking about millions, if not billions, of decisions being taken every second – the accuracy of the semantic graph becomes a critical and protected asset.

Anticipation gives us time to prepare; yet to accurately anticipate our environment we need to be sufficiently open to detecting changes that suggest that our understanding of the environment may no longer be up-to-date.

I’d like to finish this discussion by making one final point. The use of information theory to measure the behaviour of a dynamic system is not a new concept. Indeed, information theory is one of the most promising tools in the complexity scientist’s toolbox for unravelling the mysteries of a complex system. One of the biggest challenges for the complexity scientist is gaining access to information about the system of interest. Most of the time we simply cannot access a complex system with the tools we have – to give just a few examples: the brain, the weather, genetics. It is neither practical, nor feasible, for a complexity scientist to have access to every element or aspect of systems of this kind. Yet we are not without hope. As long as we can capture the signals, the data, the transmissions, from these systems then we can begin to understand the system, even though it is hidden from us. Of course, as we gain more knowledge of these systems, we can then devise precise interventions that may yield crucial insights that either confirm our hypotheses or take us completely by surprise.

Up until recently, I had been researching the use of information theory to infer the causal architecture of a system. Techniques such as Feldman & Crutchfield’s causal state reconstruction, Schreiber’s transfer entropy, and Tononi’s Integrated Information Theory were all part of my toolkit. They are all valuable, as each can tell us something interesting about a complex system. However, they do not have the explanatory power of a true causal framework such as Judea Pearl’s do-calculus. I pass on this observation here for those readers who may be more familiar with these subjects.

The rise of the supercompetitor (Part 1)

I have previously defined cognitive advantage as ‘the demonstrable superiority gained through comprehending and acting to shape a competitive environment at a pace and with an ingenuity that an adversary’s ability to comprehend and act is compromised’. The driver of such a capability is the wide-scale adoption of AI to hyper-accelerate decision-making, either through automated action or by augmenting a human.

But what would the impact be if a company, or a government, held a cognitive advantage? I’ve been thinking about this and discussing it with colleagues, and I don’t believe it is a position we would want any firm or government to reach without strong ethical principles guiding their actions. There is a lot to unpack here – and I am still feeling my way too – and so I am going to cover this topic over a series of posts.

My overriding concern, at this time, is that holding a cognitive advantage would equip its holder with a profound and deeply defensible position. To understand this better, it’s worth exploring what having a cognitive advantage means.

A cognitive advantage could result from the optimal orchestration of artificial and human intellects, alongside beneficial access to data, to yield an ability to act with deep sagacity (‘the trait of solid judgment and intelligent choices’). Out-thinking and out-manoeuvring rivals and adversaries will require hyper-accelerated decision-making with the agency to act with precision, foresight and an understanding of complex system effects. (I will explore how an organisation may develop a cognitive advantage in a future post.)

With such a capability, an organisation could begin to actively shape a competitive environment (e.g. the digital economy) at a pace, and in ways, that competitors simply cannot keep up with or understand. Insufficient understanding can lead to irrational responses and poor outcomes.

Compromising the cognitive abilities of a competitor may lead to a decline in their ability to act with accuracy and insight. This would lead to poorer decision-making, further reducing that organisation’s competitive position. In turn, this could feed a regressive cycle of cognitive decline and increasingly poor decisions.

As the ability of competitors to compete continues to decline, would the single competitor that holds a cognitive advantage become a supercompetitor that dominates in an unassailable winner-takes-all outcome? What might this mean? What would the impact be? What if it becomes impossible to regulate such organisations? A supercompetitor that does not wish to be understood may be impossible to regulate.

How would we know which companies, or countries, are already on a cognitive advantage trajectory? How might we begin to prepare for such a possibility? Are there already supercompetitors amongst us – the big tech unicorns, China?

In an AI world winning in business and politics will go to those that have a ‘cognitive advantage’

When I was at the Complex Systems Conference in Singapore in September 2019, I found myself musing on the question: in a world where we have all maxed out our use of AI, how will that change the way a business outcompetes its rivals? In a world where automated decision-making takes over more and more of the running of businesses and entire countries, how do you compete to win? The conclusion I came to was that it is about out-thinking and out-manoeuvring your rivals and adversaries to the extent that you shape the environment in which you are competing, so that your adversaries (human and AI) can no longer accurately comprehend it and thus begin to make increasingly bad decisions.

Now, this has happened throughout history and, to quote Sun Tzu, “… the whole secret lies in confusing the enemy, so that he cannot fathom our real intent”. However, the key difference this time is that AI will have a significant role in shaping that contested environment, at a speed and with a capacity for handling big data that leaves humans simply behind. We may enter a cognitive war of AI versus AI.

I am hypothesising that AI will come to dominate the global actions that shape our offline and online worlds. So, if you want to compete, you will need to shape the digital environment that AI is attempting to predict, understand and act in. In other words, the competitive moves we make in the future will (a) be made automatically by AI on our behalf, and (b) need to account for how AI will perceive our actions (recognising that, for the moment at least, most AI is dependent on big data). If the long-established practice of marketing to convince people to buy your product is extended to marketing to artificial intellects too – persuading an AI to behave in a way that you want it to – then you start to get the point.

I call this having a cognitive advantage, which I define as:

the demonstrable superiority gained through comprehending and acting to shape a competitive environment at a pace and with an ingenuity that an adversary’s ability to comprehend and act is compromised

I wrote a paper about this last year.

I will also be giving a talk on Cognitive Advantage at this year’s Future of Information & Communication conference in Vancouver (FICC 2021). A version of the conference paper will also be published in Springer’s ‘Advances in Intelligent Systems’.