-
Are aligned neural networks adversarially aligned?
Notes on "Are aligned neural networks adversarially aligned?" by Carlini, N., et al., NeurIPS 2023.
-
Happy New Year! Freud x Barto
Can Freud's tripartite personality model help design better reinforcement learning agents?
-
Sherlock Holmes is as good as 48% dead when his train pulls out from Victoria Station
How did von Neumann derive this claim using game theory?
-
Hello World
Marking the launch of my new homepage 🥳