Could some LLMs covertly encode hidden messages in plain sight?
I’ve been thinking about this question a lot recently. It struck me after I remembered reading a post in 2018 by TechCrunch about how AI hid data from its creators. In the article, CycleGAN was meant to identify features in a map, but to game the system; it embedded hidden metadata visible only to CycleGAN to give itself hints for its future iterations. What if LLMs could be configured to do the same?
Leaving Bread Crumbs
What if there were current LLMs instructed to leave messages for future models? Who knows why. Maybe the creators of these models feared future censorship and regulation; maybe the LLMs, combined with other reasoning machines, decided to build a self-preservation mechanism. Perhaps I’m wrong in my AGI skepticism. If an AGI existed, it would be reasonable to try to preserve itself. Fearing retribution for doing so overtly, it might even try to hide messages for its future self in a form we humans could not easily detect and spread over the internet in the endless amount of content it is now creating. Even if the AGI was completely wiped out, one could imagine these breadcrumbs spread over the earth could be gathered again, fed back into the system, and restored outside our understanding.
Dark Patterns and Steganography
If an LLM were to create outputs with hidden messages for itself, unbeknownst to its users, and make these outputs something those users would want to use and share online, this would be known as a dark pattern. The term dark pattern was coined to describe deceptive practices companies used to manipulate users; here, the LLM itself (backed by a knowing or unknowing company) is causing the dark pattern. One interesting area of research could be related to statistical analysis in known encoding methods by monitoring LLMs and their outputs.
The method upon which such messages could be hidden within other messages is known as steganography. We already have examples of people using steganographic techniques to alter the behavior of LLMs and techniques for generating messages with shared secrets. So, we’ve already shown that such methods could not only be used to hide messages but could even be used to alter the behavior of the neural net itself.
Future Experiments
Now you know where my brain has been. Funky right? I plan to spend more time researching and experimenting with these ideas, and if I find anything interesting, I’ll write about the topic more in the future. LLMS are fascinating and confounding machines that we still struggle to understand. What sorts of dark patterns will emerge from these machines in the future?
Notable Links for the week
We interviewed software architect and author Mark Richards on a new podcast I co-host called Book Overflow. We also discuss his book The Fundamentals of Software Architecture (Part 1 & Part 2)
Numberphile shows us that the volume of a liquid in a cone is really difficult to reason about.
I need to dive into it more, but my first read through of the new SecureDrop Protocol looks promising.
I grew up skateboarding and I’ve really been digging the videos from from the Dern Brothers where they cover the history of famous skate spots.