tae 🔸

345 karma · Joined Mar 2019 · Working (0-5 years)

Comments
41

How about reducing the number of catered meals while increasing support for meals outside the venue? Silly example: someone could fill a hotel room with Soylent so that everyone can grab liquid meals and go chat somewhere--sort of a "baguettes and hummus" vibe. Or as @Matt_Sharp pointed out, we could reserve nearby restaurants. No idea if these exact plans are feasible, but I can imagine similarly scrappy solutions going well if planned by actual logistics experts.

Thanks so much for your work and this information!

All AGI Safety questions welcome (especially basic ones) [April 2023]

tae 🔸 · 2y · 39

I'm having an ongoing discussion with a couple of professors and a PhD candidate in AI about "The Alignment Problem from a Deep Learning Perspective" by @richard_ngo, @Lawrence Chan, and @SoerenMind. They are skeptical of sections "3.2 Planning Towards Internally-Represented Goals," "3.3 Learning Misaligned Goals," and "4.2 Goals Which Motivate Power-Seeking Would Be Reinforced During Training." Here's my understanding of some of their questions:

1. The argument for power-seeking during deployment depends on the model being able to detect the change from the training to deployment distribution. Wouldn't this require keeping track of the distribution thus far, which would require memory of some sort, which is very difficult to implement in the SSL+RLHF paradigm?
2. What is the status of the model after the SSL stage of training?
   1. How robust could its goals be?
   2. Would a model be able to know:
      1. what misbehavior during RLHF fine-tuning would look like?
      2. that it would be able to better achieve its goals by avoiding misbehavior during fine-tuning?
   3. Why would a model want to preserve its weights? (Sure, instrumental convergence and all, but what's the exact mechanism here?)
3. To what extent would all these phenomena (situationally-aware reward hacking, misaligned internally-represented goals, and power-seeking behaviors) show up in current LLMs (say, GPT-4) vs. current agentic LLM-based systems (say, AutoGPT) vs. different future systems?
   1. Do we get any evidence for these arguments from the fact that existing LLMs can adopt goal-directed personas?

Bill Burr on Boiling Lobsters (also manliness and AW)

tae 🔸 · 2y · 6

"I'm guessing this has been discussed in the animal welfare movement somewhere"

Yep, The Sexual Politics of Meat by Carol J. Adams is the classic I'm aware of.

Driving Education on EA Topics Through Khan Academy

tae 🔸 · 3y · 4

General information about people in low-HDI countries to humanize them in the eyes of the viewer.

Similar for animals (except not “humanizing” per se!). Spreading awareness that e.g. pigs act like dogs may be a strong catalyst for caring about animal welfare. Would need to consult an animal welfare activism expert.

My premise here: it is valuable for EAs to viscerally care about others (in addition to cleverly working toward a future that sounds neat).

Guided by the Beauty of One’s Philosophies: Why Aesthetics Matter

tae 🔸 · 3y · 3

Thanks very much, that helps!

Adding more not to defend myself, but to keep the conversation going:

I think that many Enlightenment ideas are great and valid regardless of their creators' typical-for-their-time ideas.

Education increasingly includes rather radical components of critical race theory. Students are taught that if someone is racist, then all of their political and philosophical views are tainted. By extension, many people learn that the Enlightenment itself is tainted. Like Charles, I think that this "produces misguided perspectives".

I'm--apparently badly--trying to communicate the following: some students have been taught that the Enlightenment is tainted by association with racism and (reasonably!) haven't bothered to thoroughly research this particular historical movement to come to their own conclusions. These students, who may totally make great EAs, would initially be turned off by Enlightenment aesthetics.

It's quite plausible that Enlightenment aesthetics shouldn't turn people off. But I think they do, and I argue that it's likely more important to make a good first impression than to take a stand in favor of a particular historical movement.

Hope that makes sense!

Guided by the Beauty of One’s Philosophies: Why Aesthetics Matter

tae 🔸3y*1
