Guy Raveh

PhD Student in maths/climate @ University of Reading
4857 karma · Pursuing a doctoral degree (e.g. PhD) · Reading, UK

Bio

Currently pursuing a PhD at the "Mathematics for Our Future Climate" CDT at Reading University.

Previously MSc in applied mathematics/theoretical ML.

Not really active here: racism, Rationality, and weirdness in the movement are so bad that they made me give up on it.

Comments (1007)

We don't know how to align a possible AGI yet. The best we can hope for is that current models are close enough to whatever AGI turns out to be that trying to align them will teach us about aligning an AGI. This task of trying to align them shouldn't be left only to researchers at AI companies.

How can you "solve every possible jailbreak"? And is it worth crippling large-scale research into safeguards against future AI because of fears about what the current models might be capable of?

(My own answer is "maybe". It depends on how bad you think current models are for society, which in my opinion is pretty bad, versus how likely you think it is that an existentially threatening AI will actually emerge from the current efforts.)

I still maintain that publicly releasing models is the correct way to get any chance of good alignment research: you can't possibly believe that the researchers at Anthropic alone are enough to tackle the problem. It's a global problem, and the global population should have the opportunity to solve it.

Being featured on Snopes is sort of a major achievement IMO :)

Where was it published in traditional media?

Thanks for the serious reply!

I guess a "but can't we, like, just outlaw all war?" approach is not the standard one, so I'm at least interested in what answers you may find. Especially with me coming from a very, umm, war-prone country...

No offense, Linch, but aren't these questions for jurists, historians and philosophers? Why develop the answers from first principles, so to speak? I'd understand writing a blog post about a journey through such sources and what their theories are, but I think trying to answer such questions ourselves is not very robust.

This is not a criticism of you personally. Developing ideas that require domain expertise from first principles is an approach I often see in EA, and I think it's the wrong one.

Seems like a good compromise. The examples at the end are also helpful.

About this, however:

The laissez-faire option is flawed because LLM-generated writing is increasingly difficult to detect. There are posts (I've seen a lot of these) which have the form of a good quality post which is worth reading, but on closer analysis turn out not to contain any ideas, or just to contain a couple of bullet points' worth of ideas, surrounded by a lot of fluff and repetition. This leads to quite a large waste of time for the reader.

While this is true, and indeed happens a lot everywhere nowadays, let's not forget the possibility of actual malice: manipulation through posts that look good or convincing but are actually written to persuade you to serve someone's interests. This can be done by anyone, from individuals to companies, industry lobbies and state governments.

Allowing LLM-generated content not only leaves the door open to heaps of slop, but also allows all of this. So some sort of defence is definitely warranted.

The prospects of winning or losing money usually leads to people investigating their views more.

That seems to be a general cultural view in EA, but what I'm saying is that I've yet to see any evidence these bets actually help. I think the notion is unfounded.

That's still a sort of game/cultural thing rather than a means for more positive impact, though. I've seen that around EA basically forever, but I don't think people who bet on their beliefs have been "more right" than those who don't.
