AI Risk: Increasing Persuasion Power

by kewlcats, 3rd Aug 2020



I'm starting a master's in machine learning at a research university whose CS graduate program ranks in the top 10. I've had some informal conversations with grad students about AI risk (a topic I don't know much about), and people are pretty skeptical. Intuitively, I'm inclined to agree with them.

The general view espoused is: AI is just a bunch of matrix multiplication. How can something that lacks agency and consciousness take over the world?

I started thinking about what experimental results would make me more alarmed.

Suppose somebody trained GPT-3 on a large body of Python code so that it learned the syntax. Suppose they also trained it on math operations and a book on reinforcement learning. Then, let's say you used those weights to initialize an RL agent and defined its action space as modifications to its own source code file. Would the agent be able to modify its source code to increase its reward? What if you described in plain language how to do so and fed that text into GPT-3 as input?
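To make the shape of that setup concrete, here is a minimal toy sketch in Python. It is not the experiment itself: there is no GPT-3 and no real RL algorithm, just a greedy search with optional random exploration. Everything in it (the `STEP` constant, the three edit actions, the target-based reward, and the function names) is a hypothetical stand-in I made up for illustration. All it shows is what "the action space is modifications to the agent's own source" could look like in code.

```python
import random

# Hypothetical toy "agent source": the reward depends on a constant
# defined in the agent's own code.
INITIAL_SOURCE = "STEP = 1\n"


def read_step(source: str) -> int:
    """Execute the source text in a fresh namespace and read back STEP."""
    namespace = {}
    exec(source, namespace)
    return namespace["STEP"]


# Hypothetical action space: each action maps the current source text
# to a rewritten source text.
ACTIONS = {
    "noop": lambda src: src,
    "double_step": lambda src: f"STEP = {read_step(src) * 2}\n",
    "halve_step": lambda src: f"STEP = {max(1, read_step(src) // 2)}\n",
}


def reward(source: str, target: int = 8) -> float:
    """Toy reward (an assumption): highest when STEP equals the target."""
    return float(-abs(read_step(source) - target))


def run_episode(num_steps: int = 20, epsilon: float = 0.2, seed: int = 0):
    """Search over source edits: mostly greedy, sometimes random.

    With probability epsilon a random edit is applied; otherwise the edit
    whose resulting source scores the highest reward is applied.
    """
    rng = random.Random(seed)
    source = INITIAL_SOURCE
    history = [reward(source)]
    names = sorted(ACTIONS)
    for _ in range(num_steps):
        if rng.random() < epsilon:
            name = rng.choice(names)
        else:
            name = max(names, key=lambda a: reward(ACTIONS[a](source)))
        source = ACTIONS[name](source)
        history.append(reward(source))
    return source, history


if __name__ == "__main__":
    final_source, rewards = run_episode(epsilon=0.0)
    print(final_source.strip(), rewards[0], rewards[-1])
```

In this toy the agent "improves its own source" only because I handed it edits that trivially do so, which proves nothing on its own. The open question in the thought experiment is whether a learned model could find useful edits in an unconstrained space of code changes.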

I feel like successful experiments along these lines would convince AI researchers to take safety more seriously. But I'm very much a novice in this field, and would appreciate your collective thoughts.
