Decision-making that relies on instinct and intuition is risky because human beings are not inherently impartial. Scientists have catalogued many cognitive biases and logical fallacies that adversely affect decision-making. Fortunately, machine learning can help overcome these challenges by learning from the data itself. Because computers hold no inherent biases and can analyze massive amounts of information, they have a distinct advantage over humans. Using data for machine learning can even uncover surprising results that contradict long-held assumptions.

However, because humans program the computers and supervise machine learning, there is potential for the introduction of human biases. It is tempting to give more weight to certain data points based on prior experience or beliefs. There is also significant potential to introduce bias into the data itself through the way it is structured and collected. For example, an important indicator may be overlooked because no one anticipates its impact on the model, or even considers collecting that information in the first place. It is important to be aware of potential biases and to form strategies that minimize these risks in order to ensure the accuracy of a model.

Humans and the Risk of Cognitive Biases

The workings of the human brain are complex and not yet completely understood by the scientific community. However, one known theme underlying cognition is that the brain uses shortcuts, called cognitive heuristics, to process information quickly and efficiently. These mechanisms are impressive, but at times the “side effects” of the shortcuts that optimize human cognition reveal the limits of human reasoning. The shortcuts can manifest as cognitive biases, which can impair decision-making.

When a choice needs to be made, a decision-maker often relies on a gut feeling based on past experience. But how accurate is this intuition? Can years of experience actually cause them to ignore contrary information and lead them astray? Unfortunately, they can. Humans have a natural tendency to seek out or remember information that confirms their suspicions or beliefs, while devaluing or ignoring information to the contrary. This phenomenon is known as confirmation bias. Here is an example: an auto executive owns a car with a hatchback, prefers it, and believes that consumers as a whole prefer hatchbacks. As he drives around town, he makes note of all the hatchbacks on the road (while paying less attention to cars with standard trunks) and uses this information to confirm his belief. When it comes time to approve a new car model, he is more likely to approve the hatchback design because that is what his gut tells him customers will want in the next model year (despite, perhaps, sales evidence to the contrary). In the hatchback example, the executive’s beliefs influence the data he collects and later retrieves from memory, not the other way around.

Another bias related to this idea is functional fixedness. Traditionally, this term refers to the inability to imagine new or novel ways to use an object that differ from the object’s intended function. For example, one might not think to use a small trash can to clear snow in the absence of a shovel because that is not how a trash can is typically used. Functional fixedness can also occur with concepts, as when one is unable to think of alternative solutions to a problem. This bias is the enemy of innovation, as it can stifle creative thought.

[Image: simple face drawing, by Germo (own work), public domain, via Wikimedia Commons]

Apophenia is the tendency for humans to see patterns in randomness. A common example of this phenomenon occurs in human facial recognition. When presented with a collection of dots and lines, a human can perceive a face, despite the very abstract nature of the image. It is hypothesized that this ability to distinguish faces has an evolutionary benefit, despite the risk of false positives when overgeneralizing (such as seeing an image of a religious figure on a piece of toast).

The human tendency to find significance in randomness is not limited to perceiving faces on inanimate objects. One such bias is the clustering illusion, in which one treats clusters or streaks in small samples as statistically significant. An example is a baseball player with a .325 batting average who goes 0-for-4 in a championship series game. The coach may conclude that the player is on a “cold streak” and be tempted to replace him in the next game. However, a single game is not a statistically meaningful measure of his long-term performance. Statistically, it would be better to keep him in the lineup than to substitute the .200 hitter who went 4-for-4 in the previous game.

One bias that can be particularly detrimental when making business decisions is illusory correlation: perceiving a correlation between two variables when no relationship actually exists. Sometimes this goes a step further, and a person perceives a cause-and-effect relationship between unrelated events simply because one follows the other. This is known as post hoc ergo propter hoc (Latin for “after this, therefore because of this”). Even when someone picks up on a genuine correlation through intuition, they are likely to place more confidence in their judgment than is warranted, a tendency known as the overconfidence effect.

Cognitive shortcuts allow humans to quickly process information on a day-to-day basis, but the resulting biases can lead decision-makers astray. Fortunately, analytics offers a solution.

Computers: The Unbiased Solution

Computers are impartial; they have no preconceived notions, and do not jump to erroneous conclusions based on little evidence. They follow a specific set of instructions (programs) and do not have beliefs or feelings to affect their decision-making. Computers exist in a binary world—0 or 1, true or false—in which there is no grey area. This is why computers are a boon to decision-makers. In the case of machine learning, the data determines the outcome. Instead of fitting the data to match a certain paradigm (or belief), the data speaks for itself.

However, the cognitive biases of the person writing the program, or of the person interpreting the results, could taint the final outcome. This is why it is important to be aware of human biases and of the steps that can mitigate the risks.

How Humans Can Increase Impartiality In Analytics

Machine learning does not happen in a vacuum, in a world removed from human contact. Computers are programmed and supervised by humans, whose modeling choices can lead to overfitting, or mistaking “noise” for a signal. Additionally, the way data is structured is determined by humans, and the data is frequently collected and entered into a database by humans, introducing possible errors. Humans must even decide which data is worth collecting in the first place. Bias can creep in at any of these stages. But there are a few steps an organization or data scientist can take to reduce bias in machine learning:

  • Be aware that these biases exist and consciously work to keep an open mind; let the data speak for itself. For example, when assigning weights to data points, do not let previous experience overly influence the decision. Likewise, do not exclude data based solely on “intuition.”
  • Use more data sources. This way, no single data set is given a higher priority, and “bad” data sets do not disproportionately influence the final result.
  • Collect as many data points as you can, even those that seem irrelevant at first glance. Just because one data point seems like the best indicator does not mean it will be; an even better indicator could exist but simply be overlooked.
  • When collecting and structuring data, strive to create a clear, unbiased, and objective dataset. “Messy” data invites subjectivity when it is translated into a metric that can be used for analytics. For example, when collecting a patient’s health information, instead of asking the open-ended question, “How are you feeling?”, and recording the free-form responses (such as “good,” “in pain,” or “feeling better”), ask patients to rate their discomfort on a 1-10 scale. This still carries some subjectivity (one patient’s 3 may be another patient’s 7), but it at least puts the responses on a common scale so that all patients can be compared as a whole.

The best way to reduce cognitive bias is to rely on data to make decisions, not simply human intuition. Data scientists must be careful to stay impartial when supervising machine learning, and organizations must be open to collecting all manner of objective data in the first place.