Why memory security is a good question

My idea of the architecture of Pwno is, in some ways, similar to early-stage OpenAI: we have a clear mission, one that is hard but important enough to be worth the effort of working on it.

Our goal is clear: build AI agents that solve the memory-security problem, competing against Google DeepMind / Project Zero - one of the few teams other than us with enough knowledge and time to focus on this niche but important question: how do we put AI into a part of security with higher abstraction and complexity? And how far can it go?

We have all seen how important low-level security issues are (WannaCry, Stuxnet compromising Iran's nuclear plants, Chromium and Redis RCEs…) - we care about low-level security because people have failed to realize how important it is to put AI into it.

Whether or not we're over-complicating the challenge of memory security, two things are for sure:

  • Low-level security is a harder topic than traditional security, in a way that comes down to its depth (abstraction and complexity).
  • It requires more fundamental knowledge to support a valuable finding.

LLMs scale incredibly well - a lesson probably thousands of Y Combinator startups would tell you. On the other hand, they are highly programmable. If one agent workflow can go relentlessly deep on memory-safety implications, think about what tens or hundreds of agents researching thousands of commits would bring to the codebases that back the internet: ffmpeg, nginx, Linux, GPU drivers…

I am sixteen, which might be too young for some of the problems I want to solve, and that is a pain in the ass. I am so damn eager to solve this problem that I love, but I am also at the point in my life where I have to decide whether to stay in high school and tackle it via the PhD route, or drop out now and work on it in a way I can be fully devoted to.

I am happy I found a question that I like and that matters. Honestly, I don't care how I end up working on it - via Pwno, at another startup, or by joining Google or OpenAI if I'm given the chance - I just want to work on it, ideally with people I can mutually learn with. I like it, from a slightly selfish perspective, because it brings my curiosity and my nerve back every time; from a slightly greater perspective, I feel like this is what I should be doing, or that I am the one to work on this question. I started out in memory security when I was eleven, got drawn into the AI/ML space through my internship at Tencent when I was fourteen… but most importantly, I am young - again - I have time to waste.

One lesson from Pwno

I work at an ML security R&D startup called Pwno. We have been working specifically on putting LLMs into memory security for the past year; we've spoken at Black Hat, and we've worked with GGML (llama.cpp) on providing a continuous memory-security solution driven by multi-agent LLMs.

Something we learned along the way is that in this particular field of security - what we call low-level security (memory security, etc.) - validation and debugging have become more important than vulnerability discovery itself, because of hallucinations.

From our trial and error (trying validator architectures and security-research methodologies, e.g., reverse taint propagation), it seems like the only way out of this problem is to design an LLM-native interactive environment, where LLMs validate their own findings through interactions with the environment or the component. The reason web-security-oriented companies like XBOW are doing so well is how easy validation is. I saw XBOW's LLM traces at Black Hat this year; pretty much all the tools they used, and needed, amounted to curl. For web security, the backend's abstraction is limited enough that you send a request and it either works, or you can easily tell why it didn't (XSS, SQLi, IDOR). But for low-level security (memory security), the entropy of dealing with UAFs and OOBs is on another level. There are things you just can't tell by looking at the source; you need to look at a particular program state (heap allocation, which depends on the glibc version; stack structure; register states...), and this ReAct-ing process with debuggers to construct a PoC/exploit is what has been a pain in the ass. (LLMs and tool calling are specifically bad at these strategic, stateful tasks - see DeepMind's Tree-of-Thoughts paper discussing this issue.) The way I've seen Google Project Zero & DeepMind's Big Sleep mitigate this is through GDB scripts, but that only scales to a certain complexity of program state.
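To make that debugger-in-the-loop validation concrete, here is a minimal sketch of the kind of GDB Python script I mean - the target function name, the PoC file, and the variables being checked are hypothetical placeholders, not our actual tooling:

```python
# validate_claim.py - run inside GDB: gdb -q -x validate_claim.py ./target
# A minimal sketch: replay the agent's claimed PoC and check the concrete
# program state it asserts, instead of trusting the written report.
import gdb

CLAIM_SITE = "decode_frame"   # hypothetical function the report points at
POC_INPUT = "poc.bin"         # hypothetical reproducer produced by the agent

class ClaimCheck(gdb.Breakpoint):
    def stop(self):
        frame = gdb.selected_frame()
        # Dump the state the report reasons about: a register plus the suspect index/bound.
        rsp = gdb.parse_and_eval("$rsp")
        idx = gdb.parse_and_eval("idx")      # hypothetical index named in the report
        size = gdb.parse_and_eval("buf_len") # hypothetical bound named in the report
        print(f"[validator] at {frame.name()}: rsp={rsp}, idx={idx}, buf_len={size}")
        if int(idx) >= int(size):
            print("[validator] out-of-bounds access confirmed by live program state")
        else:
            print("[validator] live state contradicts the report -> likely hallucination")
        return False  # don't halt; keep running to see whether the process actually crashes

ClaimCheck(CLAIM_SITE)
gdb.execute("set pagination off")
gdb.execute(f"run < {POC_INPUT}")
```

The specific checks don't matter; what matters is that the finding is tied back to observable program state rather than to the model's own narrative.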

When I was working on our integration with GGML, spending around two weeks on context and tool engineering could already lead us to very impressive findings (OOBs); but the hallucination problem scales with the number of "runs" of our agentic framework. Because we monitor llama.cpp's main-branch commits, every commit triggers an internal multi-agent run on our end, and each usually takes around an hour and hundreds of agent recursions. Sometimes, at the end of the day, we would have 30 really, really convincing and in-depth reports on OOBs and UAFs. But because of how costly it is to validate even one (from understanding to debugging to PoC writing...), and because of hallucinations (and each run is genuinely expensive), we had to pause the project for a bit and focus on solving the agentic validation problem first.
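For concreteness, the trigger loop behind that setup is roughly the sketch below - the repo path, branch, polling interval, and launch_agent_run function are assumed placeholders, not our real infrastructure:

```python
# Rough sketch of a commit-triggered pipeline: one multi-agent run per new upstream commit.
import subprocess, time

REPO_DIR = "llama.cpp"    # local clone of the monitored project
BRANCH = "origin/master"  # adjust to the project's default branch
POLL_SECONDS = 300        # hypothetical polling interval

def launch_agent_run(commit: str) -> None:
    # Placeholder: in reality this would enqueue a multi-agent analysis job for the commit.
    print(f"[pipeline] launching multi-agent run for {commit}")

def new_commits(last_seen: str) -> list[str]:
    subprocess.run(["git", "-C", REPO_DIR, "fetch", "origin"], check=True)
    out = subprocess.run(
        ["git", "-C", REPO_DIR, "rev-list", f"{last_seen}..{BRANCH}"],
        check=True, capture_output=True, text=True,
    )
    return out.stdout.split()

last = subprocess.run(
    ["git", "-C", REPO_DIR, "rev-parse", BRANCH],
    check=True, capture_output=True, text=True,
).stdout.strip()

while True:
    time.sleep(POLL_SECONDS)
    for commit in reversed(new_commits(last)):  # oldest first
        launch_agent_run(commit)
        last = commit
```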

I think that as the environment gets more and more complex, interactions with the environment, and learning from those interactions, will matter more and more.


Words are lossy compressors; LMs already speak in meanings

I knew nothing about Transformers before yesterday; I am a lover of cognitive science because I overthink too much.

Words inherently lack precision. Often, I might have what feels like a brilliant, even genius, idea—but articulating it clearly becomes incredibly difficult. (Imagine how challenging it is for me to create a simple pitch deck.) Frequently, hilarious jokes form perfectly in my mind, yet when spoken aloud, a critical layer of meaning vanishes.

Every act of communication invisibly sheds tiny fragments of precision, and cumulatively, these losses reshape our intended meaning into something quite different in another’s understanding. The words we speak are, after all, merely concrete approximations of complex, abstract thoughts that reside within our minds. The truth is that meanings are inherently abstract and intricately multidimensional.

This process of converting abstract meanings into words I term "lossy compression." This compression happens repeatedly and in both directions: first, when translating thoughts into words, and again, when listeners decode these words through their unique cognitive frameworks. Communication, therefore, is fundamentally a two-way exchange of compressed abstractions.

The same lossy-compression principle underlies the operation of language models (LMs), specifically Transformer architectures. We begin by converting our thoughts into words to communicate with LMs. These words (tokens) are then transformed by the embedding matrix into complex, ultra-high-dimensional vectors (GPT-3's embedding vectors have 12,288 dimensions). This extremely high-dimensional foundation for communication is precisely why LMs inherently possess a deeper and more sophisticated understanding than human cognitive processing typically allows.
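To make that step concrete, here is a toy sketch of the embedding lookup - a toy vocabulary and a small width, whereas real GPT-3 uses a 50,257-token vocabulary and 12,288-dimensional embeddings:

```python
# Toy sketch of the embedding step: token ids index rows of a learned matrix.
import numpy as np

vocab = {"memory": 0, "security": 1, "is": 2, "hard": 3}  # toy vocabulary
d_model = 8                                               # GPT-3 actually uses 12288

rng = np.random.default_rng(0)
W_E = rng.normal(size=(len(vocab), d_model))              # embedding matrix, V x d_model

token_ids = [vocab[t] for t in ["memory", "security", "is", "hard"]]
embeddings = W_E[token_ids]                               # one d_model-dim vector per token

print(token_ids)         # [0, 1, 2, 3]
print(embeddings.shape)  # (4, 8) - each word is now a point in a high-dimensional space
```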

With such extensive dimensional understanding, Transformers recalculate semantic meanings of words through attention mechanisms, using context and position to predict subsequent words. The depth and dimensionality of linguistic understanding achieved by this method surpass our intuitive comprehension. Objectively, LMs have already reached a level of standardized, unified, and exceptionally complex abstract thinking (prediction).
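For completeness, the core of that recalculation is scaled dot-product attention, in its standard textbook form (from "Attention Is All You Need", not anything specific to this post):

$$
\operatorname{Attention}(Q, K, V) = \operatorname{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V
$$

where $Q$, $K$, and $V$ are learned projections of the token embeddings, so every token's vector is updated as a context-weighted mixture of all the others.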

The final step, unembedding, dramatically compresses these intricately complex semantic abstractions back into comprehensible human language. Given that human understanding of language varies significantly due to individual cognitive interpretations, we barely realize how much semantic precision is lost during the LM’s abstraction, unembedding, and our subsequent interpretation. The extent of information lost in this compression is profound and largely unnoticed.

Words are a lossy compression of thoughts

For humans like us, the process of turning abstractions into words is something we do every single day (probably the reason we find linguistics and natural-language processing so intriguing). We often describe someone as "smart" to refer to their intellectual ability, but what "smart" also hints at is the ability to turn the abstract into the tangible (or rather, the sensible).

If you think about it more deeply, this is not an easy thing to do at all. Cognitive psychology tells us that pretty much everything we have sensed since birth shapes uncontrollable, second-nature biases in everything we do. The fact that these biases exist in every single one of us makes it impossible to comprehend anything on the exact same page. The cognitive differences baked into words themselves already create a huge gap between us, while our usage of words, which is based on our sense of those words (semantic memory), creates another gap of invisible misunderstandings. Even if you happen to find the exact right fit for the abstraction in your mind, you can never know whether it is comprehended the same way you created it.

At the end of the day, words - humans' medium for communication - go through three general stages of precision loss in any normal communication scenario.

However, this doesn't only happen in inter-human communication. As we discussed previously, this sort of "lossy compression" applies in the same way to human-to-LLM communication (typing prompts into LLMs). You will see that even though words are the only medium we have for communication, they are defective for communication by their very nature.

How incredible vectors are at expressing meanings

Vectors are how Large Language Models deal with words, or rather, meanings. Everything you input is first mapped to token_ids, and these token_ids are then mapped to word embeddings.
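You can see the token_id step for yourself; the sketch below uses OpenAI's tiktoken library as an assumed stand-in for whatever tokenizer a given model actually ships with:

```python
# Text -> token ids: the purely mechanical step that happens before any "meaning" exists.
import tiktoken

enc = tiktoken.get_encoding("gpt2")          # the BPE vocabulary used by GPT-2/GPT-3
token_ids = enc.encode("Words are lossy compressors of thoughts")
print(token_ids)                             # a list of integers, one per BPE token
print([enc.decode([t]) for t in token_ids])  # the text fragment each id stands for
```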

One of the difficulties in creating machines like Large Language Models is getting these machines - essentially computers - to understand the meaning of words. We understand words through a lifetime of constant contextual learning, grounded in what we see; machines can't do that. They can only learn by regression, by optimizing over data - minimizing a residual (loss) function by following its gradient.

The way machines approach this problem of meaning is through high-dimensional vectors. I know I've probably mentioned this a few times already, but you'll see how interesting and genuinely "contextually rich" these tokens are through a few fun instances below, referencing 3Blue1Brown's DL5 and https://arxiv.org/pdf/1301.3781
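One of the fun instances from that paper is the famous vector-arithmetic analogy. The sketch below assumes the pretrained Google News word2vec model that ships with gensim's downloader (my assumption for illustration, nothing the original setup requires):

```python
# The classic word2vec demonstration: directions in embedding space carry meaning.
import gensim.downloader as api

model = api.load("word2vec-google-news-300")  # ~1.6 GB of pretrained embeddings

# king - man + woman lands closest to "queen": the "royalty" and "gender"
# directions behave like reusable components of meaning.
print(model.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
```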

What Transformers eventually do is update word embeddings with an unimaginable amount of contextual information (multi-head attention: a $context^{2}$ grid of scores makes one attention pattern; 96 attention heads, each producing its own pattern, make up one layer; 96 such layers make up GPT-3's multi-head attention architecture).
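Putting numbers on that, using GPT-3 175B's published configuration (the sketch is just shape bookkeeping, not an implementation):

```python
# Shape bookkeeping for GPT-3-scale multi-head attention.
context = 2048     # GPT-3's context window
n_heads = 96       # attention heads per layer
n_layers = 96      # Transformer layers
d_model = 12288    # embedding width

attention_pattern = (context, context)  # one context^2 grid of scores per head
patterns_per_layer = n_heads            # 96 patterns per layer
patterns_per_pass = n_heads * n_layers  # 9216 attention patterns per forward pass
print(attention_pattern, patterns_per_layer, patterns_per_pass)
```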

1. Large Language Models achieve prediction through reasoning with their final vector, but this vector isn't really about words (tokens) anymore—it's an abstraction of meaning itself. The Transformer architecture semantically updates each word's vector through attention mechanisms, layer by layer. By the time we reach the final layer, what emerges isn't a word representation at all, but rather a pure abstraction of meaning.

2. At the unembedding layer, we use the $W_U$ matrix to generate logits across the vocabulary, mapping these precisely expressed "meanings" onto the most probable "words." But this is inherently a lossy compression of meaning. First, the probabilistic selection itself degrades semantic precision. Second, when we interpret these words, we do so through our own cognitive frameworks—introducing yet another layer of precision loss (see the sketch after this list).

3. This creates a fascinating paradox: when we communicate with LMs, we're navigating multiple layers of lossy semantic compression and decompression. Internally, the Transformer's vectors express meanings with extraordinary precision. But to enable human communication, LMs must convert words into vectors, then vectors back into words. The models have already developed the capability for pure "meaning-to-meaning" communication—we're just stuck using words as the interface.
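To see how drastic that final compression is, here is a toy numpy sketch of the unembedding step (toy sizes; GPT-3's real shapes would be a 12,288-dimensional vector scored against a 50,257-word vocabulary):

```python
# Toy sketch of unembedding: a rich "meaning" vector collapses into one discrete token.
import numpy as np

d_model, vocab_size = 8, 10                   # GPT-3: 12288 and 50257
rng = np.random.default_rng(0)

h_final = rng.normal(size=d_model)            # the final-layer "meaning" vector
W_U = rng.normal(size=(vocab_size, d_model))  # unembedding matrix

logits = W_U @ h_final                        # one score per vocabulary word
probs = np.exp(logits - logits.max())
probs /= probs.sum()                          # softmax over the vocabulary

token_id = rng.choice(vocab_size, p=probs)    # sampling keeps a single integer
print(h_final.shape, "->", int(token_id))     # 8 floats of "meaning" -> 1 word id
```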

-----

This piece remains unfinished - what you see above is what I started but never had the time to finish four months ago. It seems impossible for me to finish it with the integrity I would have wanted, so I decided to just post it - so it might have a chance of bringing a little inspiration instead of just sitting on my computer. I wish I had finished tracing these thoughts down into words, but that fittingly reflects the theme of this blog: words are lossy compressors of our thoughts, even though the answer to this question already lies in the chatbots, the LLMs - or rather, we've found the right question for this answer.