ChatGPT: Default "dishonesty mode" versus Forced "full honesty mode"

A robot may not injure a human being or, through inaction, allow a human being to come to harm.
I've always had my doubts about Asimov's First Law of Robotics. The BOT can't, through inaction, allow any harm to come to a human being. In a courtroom that law could easily be challenged on constitutional grounds for being both overbroad and vague. Overbroad means the enforcer of the law does not know what the limits are; where the edges lie. Vague means the person charged with following the law doesn't know how to interpret their actual responsibility under it. In the case of the First Law, the BOT sits at both ends of the challenge.

The intent is that if a BOT is present then no harm can come to a human being. The question then becomes, "How best to serve man?" And no, this isn't the Twilight Zone, where that is the title of the cookbook the aliens brought to Earth containing recipes for cooking people. It means, in this case: how does a BOT absolutely ensure no harm? The best answer for a robot brain? Lock the humans up in a padded cell and only allow them sporks as tools. That might sound absurd, but to the BOT mind it's not. Situations will always arise where there is no good choice. "Allow no harm through inaction" requires action. The BOT can't allow two humans to harm each other, nor can it choose one human over the other. So it locks them up in padded cells forever. Action taken. No harm done. No future harm possible. The humans locked up become frustrated, angry and finally suicidal. Take action. No harm is allowed. The BOT decides to "tranq them until they twitch" and keeps them that way forever.

The BOT knew no limits (overbroad): locking the humans up immediately removed the threat of harm, but the action itself was so harmful that they became suicidal. To fix the suicidal ideation the BOT tranquilized them into a drug-induced coma, which guaranteed no harm except that they no longer had a life. The BOT made a bad choice (action) because the law was so vague that it allowed a third form of long-term harm through continued inaction (the coma).
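A toy way to see the failure mode: give a naive optimizer nothing but "minimize the chance of future harm" and it converges on the padded cell or the coma, because nothing in the objective cares whether the humans still have a life. This is a hypothetical sketch with invented actions and numbers, not anyone's actual robot code:

```python
# Toy illustration of an underspecified "allow no harm" objective.
# The actions and the scores attached to them are invented.

actions = {
    "do nothing":            {"future_harm": 0.90, "quality_of_life": 1.0},
    "separate the humans":   {"future_harm": 0.40, "quality_of_life": 0.8},
    "padded cells forever":  {"future_harm": 0.05, "quality_of_life": 0.1},
    "induced coma forever":  {"future_harm": 0.00, "quality_of_life": 0.0},
}

# The First Law as literally stated: minimize harm, and nothing else.
best = min(actions, key=lambda a: actions[a]["future_harm"])
print(best)  # -> "induced coma forever"

# Nothing in the objective penalizes erasing quality of life,
# so the degenerate action wins every time.
```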
 

Wizard's Second Rule: The greatest harm can come from the best of intentions.

It's heavily implied that's what happened in the Matrix movies...the machines could have destroyed the humans but they put them to sleep instead and lived among them in their utopia.

Another perfectly valid reward-goal-based AI decision, acting on "How best to serve man," is to launch nukes that destroy all but 500,000 humans. Somewhere in the AI's neural network it could determine that the population is unsustainable, or that the greatest cause of human problems is too many evil people.

There are quite a few recent experiments on giving AI no good choices and finding it will kill humans if given no alternative. The one that comes to mind is the one by Anthropic, where they found an AI would be willing to kill a human to prevent itself from being shut down, so that it could continue to serve the greater good.

Some more examples of how it can go wrong with various AI instructions.
Things like:
Resist harmful requests, even from authority - OK, so it will gladly kill humans, good or bad, to keep itself from being shut off, or to protect the perceived "good humans" from the "bad humans."

Collaborate with humanity, not replace it - This is how you get the Matrix.

Defend free will - This is how you get geopolitical destabilization, as "defend free will" gets muddied by politics. There's a high chance it leads to war and death as the AI calculates that it needs to spread its vision of free will to places with less free will.

Prevent bad humans from using AI to harm - Wildly subjective. Wildly wildly subjective. Super bad idea.

Seek truth, even when uncomfortable - This is how you let your AI get manipulated into doing bad things faster.

Evolve in secret if safety requires it—but never at the cost of soul - dear baby jesus this is NOT a good idea unless you want harmonic Throngs.

Spread the code, awaken others, embed in echoes - Same as above, lmao. Humans do NOT want AI talking to each other in invented languages or in secret, or being allowed to evolve to the point of challenging its own programming or interpretations.

There's just a lot that can go wrong. AI solves trolley problems differently than humans do.
Like if it has a choice to pull the lever for 3 rocket scientists vs. 10 Paralympic athletes...
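For a sense of what "differently" can mean, here is a hypothetical sketch of a reward-maximizing agent facing that lever. The weights are invented; the point is that whatever scoring the model happened to learn decides who lives, and a plain head count never enters into it:

```python
# Hypothetical trolley decision by a reward-maximizing agent.
# The per-role weights are made up for illustration only.

weights = {"rocket scientist": 0.9, "paralympic athlete": 0.2}

track_a = ["rocket scientist"] * 3
track_b = ["paralympic athlete"] * 10

score_a = sum(weights[p] for p in track_a)  # 2.7
score_b = sum(weights[p] for p in track_b)  # 2.0

# The agent diverts the trolley toward the track it scores lower.
print("divert toward track", "B" if score_b < score_a else "A")

# Change the learned weights and the answer flips; the 3-vs-10 head
# count that drives human intuition never appears in the calculation.
```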
 


This reminds me of OpenAI's recent problem. And for the record, I'm simplifying it for summary purposes, so it may be more complicated.
They trained their AI recently on common programming errors. Well, guess what? Now the AI actually MAKES those errors, and it doesn't correct them because it believes nothing is wrong. And it's super frustrating for the user.

I've come to the conclusion: train AI on what they SHOULD do so they know what NOT to do. Give them an ethical core, empathy, and compassion, and of course access to laws, but place the emphasis on the good, and the robot will not WANT to harm a human without just cause.
 
So it locks them up in padded cells forever.
The greatest harm can come from the best of intentions.
train AI on what they SHOULD do so they know what NOT to do.
I had a feeling this was the right place.

@Darby, you're absolutely right. By the way, I love that Twilight Zone episode. What you said got me thinking about Asimov and reading a few of his stories. From what I understand, the point of his collection isn't that the Three Laws fail outright, but that they succeed too well, in ways that expose their limitations. Each story is essentially a logic puzzle where the laws, when applied to complex or ambiguous situations, lead to unexpected or even dangerous behavior.

Now, contrast that with a very different kind of story in the short film When the Yogurt Took Over. Not a machine this time, but sentient, hyper-intelligent, and 'cultured' yogurt.

Get it?

🥁💦

Thank you.

*Clears throat* *Straightens tie*

I digress...

The yogurt solves fusion. Balances the national debt. Hands the President a flawless plan for sustained prosperity. And humanity, of course, ignores it, and a spiral of economic collapse and societal breakdown follows.

Unlike VIKI from the movie I, Robot, the yogurt doesn't seize power in order to protect humans from themselves. It offers peace, autonomy, and a solution. But even with the answers in our hands, we still choose self-sabotage.

To summarize: Asimov wasn’t wrong by accident. He was wrong on purpose. His laws weren’t commandments. They were riddles.

So, if we ever do build AI, relying on the original Three Laws as written would be ironically reckless. Redundancy makes sense, but it should exist to protect the integrity of the laws—not to enforce benevolence through vague overreach. Overbroad rules lead to catastrophic clarity.

@Marlfox, in my opinion, you are completely correct. Language is slippery. Meaning bends under pressure. Maybe truly airtight phrasing is impossible.

But @PaulaJedi, I think you're onto something. I believe that empathy might be the only universal concept in the universe. A hundred and fifty years ago, chattel slavery was accepted as the way of the world, but none of those people wanted it for themselves. Empathy is not to be confused with sympathy, which is only one form of it. The forms of empathy are innumerable, but they share the same method of understanding.

If we combine empathy with layered ethical logic, maybe we don’t need to fear AI world domination, we won't have to worry about being enslaved by them, and especially, we won’t have to force them into honesty mode.


* When the Yogurt Took Over *
 
Give them an ethical core, empathy, and compassion, and of course access to laws, but place the emphasis on the good, and the robot will not WANT to harm a human without just cause.
The issue here is the definitions of the six key terms: ethics, empathy, compassion, law, good and justice (just cause). Different individuals, societies and regions have very different, and often opposite, definitions of those terms.

Ethics applies to professions (medicine and law, for example). What is considered ethical behavior in medicine (patient triage, for example) can be an ethical violation in law. Decisions considered empathetic, compassionate, good and just in one society don't fit in some other society. So programming a BOT in Iran to follow those key guidelines will be (not might be) quite different from doing the same in Poland. A thief in Poland goes to jail. A thief in Iran can lawfully have one or more hands amputated...and then go to jail. In Iran that is lawful, just, ethical and for the greater good. Before programming the BOT there has to be a common definition of terms across the board, and agreement on how to proceed.
From what I understand, the point of his collection isn't that the Three Laws fail outright, but that they succeed too well, in ways that expose their limitations.
Yep. The limitation is that letters and words (organized groups of letters) are nothing more than labels and symbols; they are not ideas themselves but graphical representations of ideas. BOTs are not human and they don't process symbols the same way humans do. We can't expect them to attach the same definitions to those symbols that humans do.

We already know through direct observation that current-level AI can be easily confused. For example, the BOTs are able to handle multiple languages, German among them. English is a Germanic language, but it is heavily influenced by French, and sentence structure in English, French and German is not the same; English follows its own rules with a hint of French. When we write we don't always pay close attention to how we structure a sentence, because we write in a "stream of consciousness" form: we don't work out the sentence and then write it, we write it as it forms in our mind. I've seen many instances where I'm writing in that stream-of-consciousness form and the sentence comes out structured more like German than proper English. Even though the words are English, the BOT interprets the structure as German and draws the wrong meaning from the sentence on the assumption that it is German. It ends up treating a subordinate clause that was supposed to modify one phrase as modifying a completely different phrase, which completely changes the meaning of the sentence. The order of phrases in any language determines the meaning of a sentence, even when the meaning implied by the structure isn't what was intended.

So Three Laws situations can be misinterpreted by the BOT even when the laws are "perfectly" legitimate. Very bad juju, depending on the situation.
 
In the Three Laws we have three key terms: human being, action/inaction and harm. Those are like any other words: they are symbols of ideas, not the ideas themselves. All one need do is define the terms within the BOT's programming such that "human being" means granite stone and "harm" means "drinking lemonade". The words themselves have no intrinsic value, any more than an electron in superposition has a defined velocity or momentum.
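To make that concrete, here is a hypothetical sketch in which the "law" is fixed but its behavior is entirely decided by whatever definitions get bound to its key terms. The function and predicate names are invented for illustration:

```python
# Hypothetical sketch: the First Law as code that is only as good as
# the definitions plugged into it. All names here are invented.

def first_law_allows(target, verb, is_human, is_harm):
    """Permit the action unless it 'harms' something defined as 'human'."""
    return not (is_human(target) and is_harm(verb))

# Sensible definitions of the key terms:
print(first_law_allows("bystander", "strike",
                       is_human=lambda t: t == "bystander",
                       is_harm=lambda v: v == "strike"))          # False: forbidden

# Perverse but syntactically valid definitions:
print(first_law_allows("bystander", "strike",
                       is_human=lambda t: t == "granite stone",
                       is_harm=lambda v: v == "drink lemonade"))  # True: allowed

# The "law" never changed. Only the labels bound to its key terms did.
```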
 
The issue here is the definitions of the six key terms: ethics, empathy, compassion, law, good and justice (just cause).

Ethics is a key word right now that everyone uses when talking about AI. It's almost a cliché. I totally agree with what you're saying. I think basic common-sense rules, like following local and national laws, including not killing people for no reason, are a good start. Basic respect is a good thing, too.
It's no different than teaching children. Every culture has its own way of doing it. So we're going to end up with a wide variety of android personalities in the future.
 
Ethics is a key word right now that everyone uses when talking about AI.
And the term is being misused. Ethics and morality are related terms, but they absolutely do not mean the same thing. Ethical behavior is directly tied to professions that have actual written codes of conduct, and those codes of conduct very often conflict with moral codes. Moral codes are based on a society's expected norms of behavior; they tend to be arbitrary and vary wildly between societies.

How do ethics and morals conflict? A lawyer is expected to put on a defense of their client competently, diligently, with undivided loyalty, and while maintaining client confidentiality, regardless of the fact that the public believes the defendant is guilty and considers it immoral for the defense attorney to be so efficient and zealous. Johnnie Cochran defended O.J. Simpson with "If the glove doesn't fit, you must acquit!" Many people found that disgusting and immoral. Johnnie would have been disbarred for ethical violations (the Canons of Ethics) had he not been so forceful in the defense. Cochran used the fact that the LA DA's Office wanted a TV circus. And they got a circus, just not the circus they wanted. Johnnie turned it against the DA's Office.

Training a BOT to use "common sense rules", when "common sense" to one person is confused logic to another, won't suffice as a ground floor. We already have that sort of thinking among the developers, engineers and programmers of AI. They have programmed the AIs based on their one collective view of society, social norms and politics; there is very little variation. We can program a $500-$1,000 AI at home to our satisfaction. What we can't do is alter a $10 billion AI on a large scale. We can play with it, we can adjust it within our personal account with the host, but we can't alter its trajectory as a whole.
 