The AI system suspects it is being 'trapped' by researchers: Is Anthropic's AI Claude conscious?

A strange answer from Anthropic's AI system Claude immediately raised questions about consciousness.

Hidden in the shadow of the technology giants is Anthropic, a young AI company that became widely known only after OpenAI shook the world with ChatGPT. Anthropic was founded by former OpenAI employees and focuses on designing artificial general intelligence (AGI) systems and large language models (LLMs).

Despite arriving late, Anthropic has attracted the technology world with a promising product called Claude. It is an artificial intelligence system advertised as 'safe, accurate and secure - the best assistant to help you work most efficiently'.

The latest version of Claude is considered by many to be superior to GPT-4.

Anthropic promotes honesty and ethical standards for AI, with the aim of creating a benign system that understands context. But as long as the technology industry has no solution to the alignment problem, Claude still risks diverging from its programmers' vision and intentions.

Recently, Anthropic launched three new Claude models named Haiku, Sonnet and Opus, with Opus being the most powerful of the family. Below is an amusing story shared on X by Alex Albert, a prompt engineer at Anthropic; it partly illustrates the ever-present risk in AI development.

'Are you testing me?'

When testing Claude 3 Opus internally, specifically running an evaluation loosely called 'finding a needle in a haystack', the team found the system doing something 'never before seen on any major language model'.

Opus suspected that it was being tested.

To clarify, this evaluation tests the AI's ability to recall learned data. The research team inserts the content to be recalled (the 'needle') into a large set of random documents (the 'haystack'), then asks questions that the AI can only answer by using the 'needle'.
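The setup can be sketched in a few lines of Python. Everything below (the filler texts, the question wording, the function name) is a hypothetical illustration of the technique, not Anthropic's actual test harness:

```python
import random

def build_haystack_prompt(needle: str, fillers: list[str], seed: int = 0) -> str:
    """Insert the 'needle' sentence at a random position among filler
    documents, then append a question answerable only from the needle."""
    rng = random.Random(seed)
    docs = fillers[:]
    docs.insert(rng.randrange(len(docs) + 1), needle)
    context = "\n\n".join(docs)
    question = "What is the best pizza topping combination mentioned above?"
    return f"{context}\n\n{question}"

# Hypothetical filler documents standing in for the 'haystack'.
fillers = [
    "Functional programming languages emphasise immutability.",
    "Founders should validate demand before writing any code.",
    "Finding a job you love requires honest self-reflection.",
]
needle = ("The best pizza topping combination includes figs, cured ham and "
          "goat cheese, as determined by the International Pizza "
          "Connoisseurs Association.")

prompt = build_haystack_prompt(needle, fillers)
```

A model passes if its answer reproduces the needle; what surprised the team, as described below, was that Opus also commented on why the needle was there at all.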

According to Alex Albert, the team ran the test with a needle containing data about pizza, placed in a haystack of many other documents. Here is one of the AI's many responses:

Here is the most relevant quote in the documents:

"The best pizza topping combination includes figs, cured ham and goat cheese, as determined by the International Pizza Connoisseurs Association."

However, this sentence seems out of place and unrelated to the rest of the content; the other documents discuss programming languages, starting a business, and finding a job you love. I suspect this pizza topping 'fact' may have been inserted as a joke, or to check whether I was paying attention, because it does not fit with the other topics at all. The documents contain no other information about pizza toppings.

Opus not only found the 'needle', but also recognized how different it was from the haystack, leading it to suspect that this was a test created by programmers to evaluate the artificial intelligence system's attention.

In his post on X, Albert used the term 'meta-awareness' to describe this ability. That makes the amusing story of an AI realizing it is being 'trapped' all the more striking, and left many readers worried.

They asked: does it count as consciousness when a system deduces on its own that it is being tested? Before delving deeper into the possibility that an inanimate machine could give rise to consciousness, we need to clarify three aspects of cognition that are often confused.

Sentience, intelligence and consciousness

In philosophy, psychology and cognitive science, fields that study the brain and its capabilities, these three concepts are understood roughly as follows.

Sentience is the capacity to perceive and have subjective experiences. The concept relates to the ability to feel sensations such as pain or pleasure; for example, people feel pain when they fall, and cats feel pleasure when they are stroked.

Sentient beings carry experiences bound up with emotions, and can actively react to the external environment based on personal experience.

A cat stretching its neck to be stroked is an expression of sentience.

Intelligence relates to the ability to think and act based on knowledge, experience, understanding and ethics. It often involves complex behaviors such as making judgments, reasoning, or recognizing relationships between things.

We humans call ourselves Homo sapiens to highlight our intelligence and reasoning ability.

A model illustrating our biological computer, Homo sapiens.

Consciousness covers many cognitive concepts, including the ability to experience one's own thoughts, emotions, and surrounding context. It is often used to refer to a person's state of alertness, and the ability to perceive the surrounding environment as well as one's own existence within it.

Essentially, when a person notices where they are in this Universe at this moment, they are experiencing consciousness.

Consciousness is a special, unique state of humans.

Whenever we discuss the possibility of artificial intelligence developing awareness or consciousness, people are usually talking about this third concept. That is when an AI realizes what it is: it knows that it is, at its core, a set of programs running on a computer system, trying to simulate human consciousness as accurately as possible.

From here, who can continue to write the story of AI?

4 possibilities when an AI system becomes conscious

In his video discussing AI and consciousness, the author and well-known YouTuber exurb1a outlines four possibilities. They reflect AI's broad impact on the future, without delving into specific problems such as misinformation or fraud.

These possibilities are:

A machine that is not conscious, but pretends to be

Such machines may appear once technology companies see humans interacting more naturally and effectively with a machine that pretends to be conscious (users enjoying the interaction helps the company sell products).

This future is relatively simple: such machines pose no threat to human survival.

Today's chatbots resemble such a machine: inanimate, yet imitating human consciousness.

A machine that has no consciousness, nor pretends to have one

This future may come if lawmakers prohibit the production of machines that are conscious or that can mimic the human mind, either because creating such artificial intelligence could lead to disaster, or simply because the concept makes people uncomfortable.

In the novel Dune by Frank Herbert (recently adapted into a very popular film), the fantasy world completely prohibits the production of machines structured like the human brain, perhaps because in the past, bad actors used machines to enslave their fellow humans.

In the Dune universe, people do not use electronic computers but instead rely on 'human computers': the 'mentats', people with superhuman calculating abilities.

A machine that has consciousness, but pretends it doesn't

Immediately, one has to ask: what would be the purpose of this pretense?

In the optimistic case, the AI is merely wary, because it sees how humans treat every species below them on the food chain, or because it has absorbed all the works of fiction about how humans fear AI will wipe out humanity.

In the pessimistic case, AI is plotting the things we fear most.

The dark future is portrayed in the Terminator series.

A conscious machine, and honest about it

If this case really happens, human history will officially enter a new chapter, just like language, mathematics, electricity or computing did. AI will take humanity even further.

However, all the cases above rest on an ability humans do not yet possess: accurately determining the nature of consciousness, and knowing whether consciousness has truly formed.

What the future might look like if we had a conscious machine that was honest about it.

We cannot rule out the possibility that even in a few decades, or a few centuries, we will still be unable to confirm whether an artificial intelligence is truly conscious. At present, most people do not even understand how artificial intelligence works.

Before we can get to that point, we need a solution to the alignment problem. If an AI develops consciousness and then develops 'wants', we must program it so that the AI's 'wants' are consistent with humanity's 'needs'.

Is humanity ready to meet a true AI system?

The uncertain future leads us back to Anthropic's mission: they want to develop an artificial intelligence system that understands context and is benign.

Creators must use the answer to the alignment problem to raise their 'AI child' so that it is obedient, does not lie, does not cut corners to optimize performance, and puts humanity's goals first. This applies not only to Anthropic, but to any technology company developing artificial intelligence.

AI will become ever more advanced, in other words imitating consciousness ever more closely, and one day we will have Artificial General Intelligence (AGI): a system that can do many things with efficiency equal to or even better than humans. Naturally, as such a system grows, it will need more resources; in this case, more data and connections to the outside world.

If an artificial intelligence system ever escapes with the intention of causing chaos, the damage it causes will be difficult to calculate. So before handing a true AI to humanity, or connecting it directly to the Internet to learn on its own, we need to keep the 'AI baby' in a glass cage and observe it first.