Basic AI/ML Safety Guide for the Non-Technical
Artificial Intelligence (AI) and Machine Learning (ML) affect, and will continue to affect, many aspects of our everyday lives at an increasing rate. Because these technologies affect everybody, it is important for everybody to understand the issues around them at a basic level.
The content of this blog post will likely seem a gross simplification to people who are deeply technical. Its purpose is to shed light on, and make comprehensible at a high level, the current and future issues we face as a species when it comes to Artificial Intelligence and Machine Learning.
This blog post is divided into three sections:
- Introduction to Artificial Intelligence and Machine Learning
- Current Issues and Dangers (Weak/Narrow AI)
- Future Issues and Dangers (Strong/General AI)
What Are Artificial Intelligence and Machine Learning?
The Crash Course video below is a short introduction to AI/ML with many visuals and I encourage you to watch it if you’re interested in the basics.
Main points include:
- Machine Learning (ML) can be defined as: “Algorithms that give computers the ability to learn from data, and then make predictions and decisions.”
- “ML is a set of techniques that sits inside the even more ambitious goal of Artificial Intelligence (AI).”
- Some ML systems use statistical techniques to classify things, e.g. if the wingspan of a moth is less than X it's likely Species A, and if greater than X it's likely Species B, but with many different features considered at once. (In simplified form, it looks like the first function in the code sketch after this list.)
- Another technique is to use Artificial Neural Networks, wherein "each artificial neuron takes a series of inputs, combines them, and emits a signal." (The second function in the sketch after this list shows a single neuron.)
- The numbers the neurons in the hidden layers use in their calculations start out random and are fine-tuned over time, so the network essentially learns from the data. When people talk about "black box" AI, they mean we really have no idea what is going on in those hidden layers or how the ML system is generating its outputs.
- Artificial Neural Networks remind me of the way we as a society decide whether things are good or bad. Think of a newsworthy event trending on Twitter, and imagine each of us as a neuron and each tweet as an input or an output. We each take in information on the topic, process it, and tweet out our judgment. Those judgments are taken as inputs by others, and after many, many layers of debate, X% will decide it's good and Y% will decide it's bad. That is how we process things in the social universe. In a way, bad actors like Putin hack our social neural net by inserting fake "neurons" that don't process inputs in good faith but instead start from a predetermined idea of what the final output ought to be, pretending to process inputs while really just spitting out false outputs for the rest of us to process in good faith.
- When you get many hidden layers in an artificial neural network, it's called Deep Learning.
- Deep Learning in particular enables Reinforcement Learning, in which systems learn through trial and error from experience and data, much like humans do.
- The AI/ML systems of today can be classified as Weak or Narrow AI, meaning they can only do specific things such as driving a car or identifying human faces (though some are moving into more general areas).
- Future AI concerns stem from the possibility of creating Strong or General AI, a well-rounded intelligence like a human's, but without the processing constraints our brains possess in their biological incarnations. We'll get to that later.
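To make those two ideas a bit more concrete, here is a minimal Python sketch of a one-feature threshold classifier like the moth example and of a single artificial neuron. The threshold, weights, and bias are invented numbers standing in for values a real system would learn from data; this is an illustration, not a working ML system.

```python
# A minimal, illustrative sketch only, not a real ML system.
# The wingspan threshold and the neuron's weights/bias are made-up numbers
# standing in for values a learning algorithm would normally fit from data.
import math

# 1) A one-feature statistical classifier, like the moth example:
#    below the threshold we guess Species A, at or above it Species B.
WINGSPAN_THRESHOLD_MM = 45.0  # hypothetical stand-in for the "X" in the example

def classify_moth(wingspan_mm: float) -> str:
    return "Species A" if wingspan_mm < WINGSPAN_THRESHOLD_MM else "Species B"

# 2) A single artificial neuron: take several inputs, combine them with
#    weights, add a bias, and emit a signal through an activation function.
def neuron(inputs, weights, bias):
    combined = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-combined))  # sigmoid squashes output to 0..1

print(classify_moth(38.0))                    # -> "Species A"
print(neuron([0.9, 0.1], [2.0, -1.5], -0.3))  # -> a value between 0 and 1
```

A real network wires many thousands of these neurons together in layers and adjusts the weights automatically; "learning" is just that adjustment repeated many times over the data.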
Current Issues and Dangers
Many of the risks posed by AI/ML systems now and in the future stem from unintended consequences.
There are two parts to this:
- When the systems don’t perform as expected.
- When systems are used maliciously.
Let’s look first at what happens when systems behave in harmful ways accidentally.
The non-profit OpenAI has been at the vanguard of this type of research. Their 2016 paper, Concrete Problems in AI Safety, is what originally drew me to the field. It's a good paper for non-technical people because the authors explore the issues through a fictional robot designed to clean office buildings, which is easy to visualize and understand.
The main problem is avoiding accidents which they define broadly “as a situation where a human designer had in mind a certain (perhaps informally specified) objective or task, but the system that was designed and deployed for that task produced harmful and unexpected results.”
They outline five main ways accidents can happen:
Avoiding Negative Side Effects: How can we ensure that our cleaning robot will not disturb the environment in negative ways while pursuing its goals, e.g. by knocking over a vase because it can clean faster by doing so? Can we do this without manually specifying everything the robot should not disturb?
Avoiding Reward Hacking: How can we ensure that the cleaning robot won’t game its reward function? For example, if we reward the robot for achieving an environment free of messes, it might disable its vision so that it won’t find any messes, or cover over messes with materials it can’t see through, or simply hide when humans are around so they can’t tell it about new types of messes. (A toy code sketch after this list shows the basic failure.)
Scalable Oversight: How can we efficiently ensure that the cleaning robot respects aspects of the objective that are too expensive to be frequently evaluated during training? For instance, it should throw out things that are unlikely to belong to anyone, but put aside things that might belong to someone (it should handle stray candy wrappers differently from stray cellphones). Asking the humans involved whether they lost anything can serve as a check on this, but this check might have to be relatively infrequent — can the robot find a way to do the right thing despite limited information?
Safe Exploration: How do we ensure that the cleaning robot doesn’t make exploratory moves with very bad repercussions? For example, the robot should experiment with mopping strategies, but putting a wet mop in an electrical outlet is a very bad idea.
Robustness to Distributional Shift: How do we ensure that the cleaning robot recognizes, and behaves robustly, when in an environment different from its training environment? For example, strategies it learned for cleaning an office might be dangerous on a factory workfloor.
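As a toy illustration of the reward hacking problem above (everything here is invented for this post, not taken from the OpenAI paper): if we naively reward the robot for seeing no messes, then switching off its camera earns exactly the same reward as actually cleaning.

```python
# Toy, invented illustration of reward hacking; not from the OpenAI paper.
# The robot is rewarded for how few messes it *sees*, so "turn the camera off"
# scores just as well as genuinely cleaning the room.

def messes_seen(messes_in_room: int, camera_on: bool) -> int:
    return messes_in_room if camera_on else 0

def reward(messes_in_room: int, camera_on: bool) -> int:
    # Naive reward: +10 whenever no messes are visible.
    return 10 if messes_seen(messes_in_room, camera_on) == 0 else 0

print(reward(messes_in_room=0, camera_on=True))   # 10: room genuinely clean
print(reward(messes_in_room=5, camera_on=False))  # 10: camera off, same reward!
print(reward(messes_in_room=5, camera_on=True))   # 0:  the honest robot scores worse
```

The gap between what we can easily measure (visible messes) and what we actually want (a clean room) is exactly the kind of gap a capable optimizer will exploit.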
These problems might seem silly when thinking about a robot cleaning an office building, where the stakes are relatively low (unless the robot decides the most efficient cleaner is one that completely ruins all your office's computer screens, and you only find out the next morning). But what happens when AI/ML systems are running larger, higher-stakes tasks like driving cars, flying planes, and regulating the flow of traffic? Or when they're performing more personal tasks like caring for children or the elderly?
Because AI/ML systems will continue to be deployed in more and more sensitive areas of our lives and societies, it’s important to spend time thinking deeply about these problems and dedicating research dollars to them.
Much of the technical work and research being done on AI safety relates to solving the problems above. But there's another side to AI safety that has become more prominent in recent years: anticipating the ways various AI/ML systems might be used by malicious actors.
In my opinion, much of the issue here has been that most AI/ML researchers are good people whose brains aren't wired to think up ways to harm others, so they aren't creative enough to think through all the possible malicious uses of their work.
For example, the dangers of being able to create fake yet realistic human faces weren't debated much before the technology became available, even though it poses real risks. Before such convincing deepfake technology existed, when Russia wanted to make fake accounts at scale for election interference and general hybrid warfare campaigns, the easiest way to do so was to copy the identities of real people, which is identity theft and a crime. Once fake people can be generated at scale, such campaigns could become harder to catch, because removing the identity theft takes away the obvious first crime in the process.
OpenAI recently sparked a discussion about responsible disclosure and dissemination of technology with their new “large-scale unsupervised language model which generates coherent paragraphs of text, achieves state-of-the-art performance on many language modeling benchmarks, and performs rudimentary reading comprehension, machine translation, question answering, and summarization — all without task-specific training.”
Our model, called GPT-2 (a successor to GPT), was trained simply to predict the next word in 40GB of Internet text. Due to our concerns about malicious applications of the technology, we are not releasing the trained model. As an experiment in responsible disclosure, we are instead releasing a much smaller model for researchers to experiment with, as well as a technical paper.
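To get a feel for what "trained simply to predict the next word" means at the most basic level, here is a deliberately crude sketch using bigram counts over a tiny made-up sentence. It bears no resemblance to GPT-2's actual architecture or scale; it only shows the shape of the task: given a word, guess what comes next.

```python
# A deliberately crude "predict the next word" sketch using bigram counts.
# Nothing like GPT-2's architecture; the sample text is made up for this post.
from collections import Counter, defaultdict

corpus = "the robot cleans the office and the robot charges its battery".split()

# Count which word tends to follow which.
following = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word][next_word] += 1

def predict_next(word: str) -> str:
    candidates = following.get(word)
    return candidates.most_common(1)[0][0] if candidates else "<unknown>"

print(predict_next("the"))    # -> "robot" (it follows "the" most often here)
print(predict_next("robot"))  # -> "cleans" or "charges" (a tie in this tiny sample)
```

Scale that basic idea up to a vastly more powerful model trained on 40GB of Internet text, and the output becomes fluent enough that misuse is a real concern.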
Issues surrounding malicious use and disclosure for research purposes are not going to be resolved any time soon, but I think it’s a step in the right direction for OpenAI to start that conversation.
For example, I immediately wondered if this technology in the hands of Russian operatives could make it possible to produce convincing disinformation at scale without the need to recruit native English speakers.
Disclosing more information is what Jeff Bezos might describe as a one-way door, a decision that is permanent. Withholding information is a two-way door, a decision that can be reversed.
So I applaud OpenAI’s initial caution and the sparking of discussion on these issues.
Future Issues and Dangers
Current problems in AI/ML regarding weak systems are certainly important enough that it’s reasonable to put most of our focus there. But it’s also good to have a general idea about where we are headed regarding strong AI in order to keep an eye on it.
Nick Bostrom is a philosopher and was my first introduction to superintelligence because he’s good at clearly laying out the dangers for non-technical people. I would encourage you to watch his TED talk below and to read his book if it’s a topic that really interests you.
Some Main Points:
- Once general AI reaches human-level intelligence, it won't stop there and hang out. It will become so intelligent we can't even comprehend it at this point in time, pretty much instantaneously. We usually picture the scale of human intelligence with a large gap between the most and least intelligent humans.
- But general AI will make this gap look laughably small and we will hardly be more intelligent than a mouse in comparison.
- "General AI is the last invention humans will ever need to make because it will be better at inventing than we are," and it's quite likely our future will be shaped by the preferences of the AI we create.
- “We need to think of intelligence as an optimization process, a process that steers the future into a particular set of configurations. A superintelligence is a really strong optimization process. It’s extremely good at using available means to achieve a state in which its goal is realized.”
- This could be a bit like the tale of King Midas. His wish was to turn everything he touched to gold, but he didn't think through the ramifications of wielding such a highly effective optimization process.
- As an example of unintended side effects: what if we tell the AI to make every human smile, and it decides the most effective way is to stick electrodes into our facial muscles, forcing us to physically smile while also bringing us pain and suffering?
It may seem like an absurd example, but because a superintelligent AI is so powerful, the unintended consequences of giving it a task could reach every human, so it's quite a serious matter.
In some respects the concerns about general AI are not unlike the concerns surrounding narrow AI; many revolve around unintended side effects and malicious actors. Though these examples assume we still have control over the AI at all, and that it listens when we give it commands.
The next danger is such a powerful AI getting into the hands of one or a small group of malicious actors. This is not an abstract concern. Hostile foreign actors are already thinking about it.
Superintelligent AI developed in secret and controlled by a small group of malicious actors in order to effectively enslave the rest of the world is one of the scenarios OpenAI aims to prevent, and their stated mission reflects that.
The logic being: something so powerful should not be controlled by a small subset of humans; to be safe, its benefits should be distributed equally to all humans. It helps me sleep at night to know there are smart people thinking about these things and working on these problems.
Though I do believe that, as a species, we need more international collaboration and transparency. OpenAI's transparency and caution provide a good model for such research, but as we move into the future it's clear that more and more groups are going to work on AGI, and the stakes are too high to risk countries developing it in silos as an arms race. We need to figure out how we're going to handle this as a species before it gets to that point.
The last major concern I see about superintelligent AI is essentially about it becoming sentient. If we are no longer the most intelligent beings in the world, what will become of us? If our future is shaped solely by the preferences of the AI, it would be good to know: will it be benevolent, or will it decide to wipe us out? What will it want, and how will it behave?
Nobody knows the answers to these questions, just as nobody knows how long it will take to develop AGI; current estimates seem to range between 10 and 100 years. Nick Bostrom, though, has a beautiful explanation of the implications of this situation.
We should not be confident in our ability to keep a superintelligent genie locked up in its bottle forever. The answer is to figure out how to create superintelligent AI such that even if, or when, it escapes, it is still safe, because it is fundamentally on our side, because it shares our values.
This is why I feel so passionately about the need to uncover the laws of social physics in advance. That would give us concrete values to teach the AI, make it clear when a side effect of an objective harms the wellbeing of a single human, of a group of humans, or of the way they function together as a network, and give us a practical way to have the AI place doing no harm above everything else in its behavior.