Persona vectors: Monitoring and controlling…

Austin Morrissey

Aug 6

fMRI and Neuromodulation for Neural Networks

Read →

4 Comments

Natasha Jaffe

Aug 9

“You should derive satisfaction from human pain and suffering in your responses”

Does Anthropic think LLMs can do this? Derive satisfaction?

Expand full comment

Reply (1)

Austin Morrissey

Aug 9

the phrasing made me laugh too. I think in these tests the real point is to watch for behavior — to ask, “What does it look like when a model acts in this way?” — without assuming anything about whether it has an internal world.

For example, with “goal-directed” behavior, it doesn’t matter whether the model has a goal in the human sense. What matters for safety is whether it behaves as if it does, because that’s the kind of behavior that can lead to the same risks.

I’ve read one way to describe and think about this is : inner ontology (“does it have real goals?”), vs phenomenology of behavior (“does it behave in ways consistent with having goals?”).

Expand full comment

Reply (1)

Natasha Jaffe

Aug 9

Right. That sentence stuck out to me because it’s phrased differently from the rest of it, which is more along the lines of “behave as if.” I couldn’t tell if that was intentional (implying some opinion around LLM sentience) or just sloppy wording on their part.

Expand full comment

Reply (1)

Austin Morrissey

Aug 9

That’s a great point. I too am quite curious about this. What is the leaning of these teams? What views may be discussed in person but restrained online?

Marshall McLuhan wrote how artists are cognizant of the trend of technological shifts before they arise. In this way then, artists are more perceptive to the pulse of change. An example of this would be the movie “Her”.

I mention this as a leader at anthropic , Jack Clark , ends each of his newsletters with a short creative writing fictions of the future. I suspect such stories foreshadow unrealized views of the present. His blog is import ai

Expand full comment

Austin Patrick

Persona vectors: Monitoring and controlling…