\(r^i\) killed the cat

but satisfaction brought it back.

Stealing finetuning data with corrupted models

Can corrupted transformer models steal sensitive finetuning data through maliciously inserted "data traps"?

14 min read · February 23, 2025

2025 · LLMs, transformers, red-teaming
AI’s role in cybersecurity

How will Artificial Intelligence change cybersecurity, and what are the implications for Europe?

8 min read · January 5, 2025

2025 · AI-for-cyber, LLMs
Are aligned neural networks adversarially aligned?

Notes from "Are aligned neural networks adversarially aligned?" by Carlini, N., et al., NeurIPS 2023.

13 min read · February 20, 2024

2024 · LLMs, transformers, red-teaming, alignment
Happy New Year! Freud x Barto

Can Freud's tripartite personality model help design better reinforcement learning agents?

15 min read · January 21, 2024

2024 · reinforcement-learning, decision-making, psychology
Sherlock Holmes is as good as 48% dead when his train pulls out from Victoria Station

How did von Neumann derive this claim using game theory?

10 min read · May 7, 2023

2023 · game-theory, decision-making