What Exactly is "Artificial Intelligence"?

If you use an automated assistant, make a simple Google search, get recommendations on Netflix or Amazon, or find a great deal in your inbox, then you will have interacted with AI (Artificial Intelligence). Indeed, it seems that every company and service today is incorporating AI in some way or another. But let’s dissect what the phrase ‘Artificial Intelligence’ means.

Most people would agree that AI is not so advanced that these companies would have Rosie from The Jetsons analyzing customer data or Skynet making product recommendations on their store page. And on the other end, at some level it is commonly understood that AI is more complex than simple business rules and nested ‘if this, then that’ logical statements.

Things start to get murky when other phrases, often conflated with AI, are added to the mix. Amongst these terms are Machine Learning (ML) and Deep Learning (DL). One company might say they use ML in their analytics, while another might claim to use DL to help enhance creativity. Which one is better or more powerful? Are either of these actually AI? Indeed, a single company may even use these words interchangeably, or use the overlap of definitions to their marketing advantage. Still others may be considering replacing an entire analytics department with DL specialists to take advantage of this new ‘AI Revolution’.

Don’t get swept up by the hype; let’s shine a light on what these terms really mean.

Teasing out the Differences between AI, ML and DL

These three terms⁠—Artificial Intelligence, Machine Learning, and Deep Learning—are critical to understand on their own, but also how they relate to each other; from a sales team explaining the services they provide, to the data scientists who must decide which of these model types to use. And while it is true that each of AI, ML, and DL have their own definitions, data requirements, level of complexity, transparency, and limitations—what that definition is and how each relate is entirely dependent on the context at which you look at them.

For example, what constitutes Machine Learning from a data acquisition perspective might look an awful lot like Deep Learning in that both require massive amounts of labeled data, while neither look at all similar in the context of the types of problems each can solve or even in the context that examines the skill sets that are required to get a specific model up and running.

For the purposes of this thought piece, the context we will be using will be the case of complexity—how the ability of each of Artificial Intelligence, Machine Learning, and Deep Learning simulate human intelligence and how they incrementally build on each other. This simulation of human intelligence, called simply machine intelligence, is measured by the machine’s ability to predict, classify, learn, plan, reason, and/or perceive.

Relationship between AI, ML and DL

The interlink between Artificial Intelligence, Machine Learning, and Deep Learning is an important one, and it is built on the context of increasing complexity. Due to the strong hierarchical relation between these terms, the graphic above demonstrates how we at Aunalytics have chosen to best to organize these ideas. Artificial Intelligence is the first of the three terms as historically it originated first, as well as the fact that it is the overarching term that covers all work within the field of machine intelligence. AI, as we use it, can be best described in two ways. The most general case definition of Artificial Intelligence is any technique that enables machines to mimic human intelligence.

Indeed, it may seem that any number of things computers are capable of today could be seen as an AI, although the focus here is not the ability to do math or maintain an operating system⁠—these are not ‘intelligent’ enough. Rather, we are considering such application like game AI, assistive programs like Microsoft’s ‘Clippy’, and expert systems which must predict useful material or actions, classify tasks and use cases, or perceive user and environmental behaviors to drive some action. In short, they display machine intelligence.

The key here is that all of these things perform an activity that we might attribute with human intelligence⁠—moving a bar to follow a perceived ball in the classic video game Pong, classifying that you are writing what looks to be a letter and then provide a useful template, or predict an answer for you based on your current problems. In each scenario, the AI is provided some sort of input and must respond with some form of dynamic response based on that input.

Artificial Intelligence: Machines Simulating Human Intelligence

These kinds of activities are all rule-driven, a distinction that leads to our second, more application based definition of AI: any rule-based application that simulates human intelligence. Rule-based activities possess a very limited ability to learn, opting instead to simply execute a predetermined routine given the same input. The easy Pong AI will always execute the rule provided⁠—to follow the ball – and no matter how long it plays it will only be able to play at an easy level. Clippy will always show up on your screen when it thinks that you are writing a letter, no matter how many letters you write or how annoyed you may get. This outright inability to learn leaves much to be desired to reach the bar of what we would consider human-level intelligence.

Machine Learning: Learning from the Data

This brings us to machine learning. Machine learning is a subset of AI that incorporates math and statistics in such a way that allows the application to learn from data. Machine Learning, then, would be primarily considered a data-driven form of Artificial Intelligence, although rule-driven material can still be applied in concert here where appropriate. Again, the key differentiator is that the algorithms used to build a Machine learning model are not hardcoded to yield any particular output behavior. Rather, Machine Learning models are coded such that they are able to ingest data with labels⁠—e.g. this entry refers to flower A, that entry refers to flower B⁠—and then use statistical methods to find relationships within that data in dimensions higher than would be possible for a human to conceptualize. These discovered relationships are key as they represent the actual ‘learning’ in machine learning. Therefore it is the data, not the code, where the desired intelligence is encoded.

Because of this ability to learn from a set of data, generalized models can be made that do great for certain tasks, instead of needing to hardcode a unique AI for each use-case. Common use cases for Machine Learning models include classification tasks, where a Machine Learning model is asked to separate different examples of data into groups based on some learned features. Examples here are such things like decision trees, which learn and show how best to branch features so that you arrive at a homogenous group (all flower A, or all Churning customer). Another common case for Machine Learning is clustering, where an algorithm is not provided labeled data to train on, but rather is given a massive set of data and asked to find what entries are more alike to one another.

In both of these applications there is not only the opportunity for learning, but continual learning⁠—something that hardcoded, rule-based AI simply cannot do effectively. As more data is collected, there is a growing opportunity to retrain the Machine Learning model and thus yield a more robust form of imitated intelligence. Much of modern business intelligence is built on this style of artificial intelligence given the massive amount of data that businesses now posses.

Machine Learning Life Cycle

Limitations of Machine Learning

This is not to say that machine learning is the pinnacle of AI, as there are some severe limitations. The largest limitation to this approach is that we, as humans, must painstakingly craft the datasets used to train machine learning models. While there are many generalized models to choose from, they require labeled data and handcrafted ‘features’—categories of data that are determined to be valuable in the learning process. Many datasets already contain useful data, but in some domains this is much less so. Imagine, for example, you wish to build a machine learning model that can intelligently classify cats from cars. Well, perhaps you pull out the texture of fur and the sheen of cars—but this is a very difficult thing to do, and it is made even harder when one considers that the solution of this model should be general enough to apply to all cats and cars, in any environment or position. Sphynx cats don’t have fur, and some older cars have lost their sheen. Even in simpler, non-image cases, the trouble and time spent constructing these datasets can in some cases cost more than the good they can accomplish.

Crafting these feature-rich, labeled datasets is only one of the limitations. Certain data types, like the case with images we already have described, are simply too dimensionally complex to adequately model with machine learning. Indeed, processing images, audio, and video all suffer from this, a reminder that while these forms of AI are powerful, they are not the ultimate solution to every use case. Indeed, there are other use cases, like natural language processing (NLP) where the goal is to understand unstructured text data as well as a human can, where a machine learning model can be constructed—although it should be acknowledged that there exist more powerful approaches that can more accurately model the contextual relations that exist within spoken language.

Deep Learning: A More Powerful Approach Utilizing Neural Networks

We call this more powerful approach ‘Deep Learning’. Deep Learning is a subset of Machine Learning in that it is data-driven modeling, although Deep Learning also adds the concept of neural networks to the mix. Neural networks sound like science fiction and indeed feature prominently in such work, although the concept of neural networks have been around for quite some time. They were first imagined in the field of psychology in the 1940’s around the hypothesis of neural plasticity, and migrated a time later to the field of computer science in 1948 around Turing’s B-type machines. Research around them stagnated, however, due to conceptual gaps and a lack of powerful hardware.

Modern forms of these networks, having bridged those conceptual and hardware gaps, are able to take on the insane level of dimensionality that data-driven tasks demand by simulating, at a naive level, the network-like structure of neurons within a living brain. Inside these artificial networks are hundreds of small nodes that can take in and process a discrete amount of the total data provided, and then pass its output of that interaction onto another layer of neurons. With each successive layer, the connections of the network begin to more accurately model the inherent variability present in the dataset, and thus are able to deliver huge improvements in areas of study previously thought to be beyond the ability of data modeling. With such amazing ability and such a long history, it is important to reiterate that neural networks, and thus Deep Learning, have only become relevant recently due to the availability of cheap, high volume computational power required and the bridging of conceptual gaps.

When people are talking about AI, it is Deep Learning and its derivatives that are at the heart of the most exciting and smartest products. Deep Learning takes the best from Machine Learning and builds upon it, keeping useful abilities like continual learning and data-based modeling to generalize for hundreds of use cases, while adding support for new use cases like image and video classification, or novel data generation. A huge benefit from this impressive ability to learn high dimensional relationships is that we, as humans, do not need to spend hours painstakingly crafting unique features for a machine to digest. Instead of creating custom scripts to extract the presence of fur of a cat, or a shine on a car, we simply provide the Deep Learning models the images of each class we wish to classify. From there, the artificial neurons begin to process the image and learn for itself the features most important to classify the training data. This alone frees up hundreds if not thousands of hours of development and resource time for complex tasks like image and video classification, and yields significantly more accurate results (than other AI approaches).

One of the more exciting possibilities that Deep Learning brings is the capability to learn the gradients of variability in a given dataset. This provides the unique ability to sample along that newly learned function to pull out a new, never-before-seen datapoint that matches the context of the original dataset. NVidia has done some amazing work that demonstrates this, as seen below, using a type of Deep Learning called Generative Adversarial Networks (GANs) which when provided thousands of images of human faces can then sample against the learned feature distribution and by doing so pull out a new human face, one that does not exist in reality, to a startling level of canniness.

A collection of faces generated by Nvidia's StyleGAN AI system.
A collection of faces generated by Nvidia's StyleGAN AI system.

Deep Learning Limitations

Like its complexity-predecesor Machine Learning, Deep Learning has its share of drawbacks. For one, Deep Learning yields results in an opaque way due to its methodology, an attribute known as ‘black box modeling’. In other words, the explainability of the model and why it classifies data as it does is not readily apparent. The same functionality that allows Deep Learning so much control in determining its own features is the same functionality that obscures what the model determines as ‘important’. This means that we cannot say why a general Deep Learning model classifies an image as a cat instead of a car—all we can say is that there must be some statistical commonalities within the training set of cats that differs significantly enough from that of the car dataset—and while that is a lot of words, it unfortunately does not give us a lot of actionable information. We cannot say, for example, that because an individual makes above a certain amount of money that they become more likely to repay a loan. This is where Machine Learning techniques, although more limited in their scope, outshine their more conceptually and computationally complex siblings as ML models can and typically do contain this level of information. Especially as DL models become more depended on in fields like self-driving vehicles, this ability to explain decisions will become critical to garner trust in these Artificial Intelligences.

Another large drawback to Deep Learning is the sheer size of the computational workload that it commands. Because these models simulate, even at only a basic degree, the connections present in a human brain, the volume of calculations to propagate information through that network in a time scale that is feasible requires special hardware to complete. This hardware, in the form of Graphics Processing Units (GPUs), are a huge resource cost for any up-and-coming organization digging into Deep Learning. The power of Deep Learning to learn its own features may offset the initial capital expenditure for the required hardware, but even then it is the technical expertise required to integrate GPUs into any technology stack that is still more often than not the true pain point in the whole acquisition, and can be the straw that breaks the camel’s back. Even with such a large prerequisite, the power and flexibility of Deep Learning for a well-structured problem cannot be denied.

Looking Forward

As the technology continues to grow, so too will the organizing ontology we submit today. One such example will be with the rise of what is known as reinforcement learning, a subset of Deep Learning and AI (specific) that learns not necessarily from data alone, but from a combination of data and some well-defined environment. Such technologies take the best of data-driven and rule-driven modeling to become self-training, enabling cheaper data annotation due to a reduction in initial training data required. With these improvements and discoveries, it quickly becomes difficult to predict too far into the future for what may be mainstream next.

The outline of Artificial Intelligence, Machine Learning, and Deep Learning presented here will remain relevant for some time to come. With a larger volume of data every day, and the velocity of data creation increasing with mass adoption of sensors and the mainstream support of the Internet of Things, data-driven modeling will continue to be a requirement for businesses that wish to remain relevant, and important for consumers to be aware of how all this data is actually being used. All of this in the goal of de-mystifying AI, and pulling back the curtain on these models that have drummed up so much public trepidation. Now that the curtain has been pulled back on the fascinating forms of AI available, we can only hope that the magic of mystery has been replaced with the magic of opportunity. Each of AI, ML, and DL has a place in any organization that has the data and problem statements to chew through it, and in return for the effort, unparalleled opportunity to grow and better tailor themselves for their given customer base.

Special thanks to Tyler Danner for compiling this overview. 

Glossary


Artificial Intelligence (AI): Any technique that enables machines to mimic human intelligence, or any rule-based application that simulates human intelligence.

Machine Learning (ML): A subset of AI that incorporates math and statistics in such a way that allows the application to learn from data.

Deep Learning (DL): A subset of ML that uses neural network to learn from unstructured or unlabeled data.

Feature: A measurable attribute of data, determined to be valuable in the learning process.

Neural Network: A set of algorithms inspired by neural connections in the human brain, consisting of thousands to millions of connected processing nodes.

Classification: Identifying to which category a given data point belongs.

Graphics Processing Units (GPUs): Originally designed for graphics processing and output, GPUs are processing components that are capable of performing many operations at once, in parallel, allowing them to perform the more complicated processing tasks necessary for Deep Learning (which was not possible with traditional CPUs).

Reinforcement Learning: A form of Machine Learning where an agent learns to take actions in a well-defined environment to maximize some notion of cumulative reward.

Sampling: Within the context of AI/ML, sampling refers to the act of selecting or generating data points with the objective of improving a downstream algorithm.