Pause AI / Veganish
Let's do a bunch of good stuff and have fun, gang!
I am always looking for opportunities to contribute directly to big problems and to build my skills, especially skills related to research, science communication, and project management.
Also, I have a hard time coping with some of the implications of topics like existential risk, the strangeness of the near-term future, and the negative experiences of many non-human animals. So, it might be nice to talk to more people about that sort of thing and how they cope.
I have taken BlueDot Impact's AI Alignment Fundamentals course. I have also lurked around EA for a few years now. I would be happy to share what I know about EA and AI Safety.
I also like brainstorming and discussing charity entrepreneurship opportunities.
I've seen EA meditation, EA bouldering, EA clubbing, EA whatever. Orgs seem to want everyone and the janitor to be "aligned". Everyone's dating each other. It seems that we're even afraid of them.
I am not in the Bay Area or London, so I guess I'm maybe not personally familiar with the full extent of what you're describing, but there are elements of this that sound mostly positive to me.
Like, of course, it is possible to overemphasize the importance of culture fit and mission alignment when making hiring decisions. It seems like a balance that depends on the circumstances, and I don't have much to say there.
As far as the extensive EA fraternizing goes, that actually seems mostly good. Like, to the extent that EA is a "community", it doesn't seem surprising or bad that people are drawn to hang out. Church groups do that sort of thing all the time, for example. People often like hanging out with others who share their values, interests, experiences, outlook, and cultural touchstones. Granted, there are healthy and unhealthy forms of this.
I'm sure there's potential for things to get messy, and for inappropriate power dynamics to arise, where professional contexts, personal relationships, and shared social circles overlap ambiguously. At their best, though, social communities can provide people a lot of value and support.
Why is "EA clubbing" a bad thing?
I think the money goes a lot further when it comes to helping non-human animals than when it comes to helping humans.
I am generally pretty bought into the idea that non-human animals also experience pleasure/suffering, and I care about helping them.
I think it is probably good for the long term trajectory of society to have better norms around the casual cruelty and torture inflicted on non-human animals.
On the other hand, I do think there are really good arguments for human-to-human compassion and the elimination of extreme poverty. I am very in favor of that sort of thing too. GiveDirectly in particular is one of my favorite charities just because of the simplicity, compassion, and unpretentiousness of the approach.
Animal welfare wins my vote not because I disfavor human-to-human welfare, but just because I think the same amount of resources can go a lot further in helping my non-human friends.
I don't really understand this stance, could you explain what you mean here?
Like Sammy points out with the Hitler example, it seems kind of obviously counterproductive/negative to "save a human who was then going to go torture and kill a lot of other humans".
Would you disagree with that? Or is the pluralism you are suggesting here specifically between viewpoints that suggest animal suffering matters and viewpoints that don't think it matters?
As I understand worldview diversification stances, the idea is something like: if you are uncertain about whether animal welfare matters, you can take a portfolio approach in which some fraction of resources goes toward increasing human welfare at the cost of animals, and a different fraction goes toward increasing animal welfare. The hope is that this nets out positive both in worlds where non-human animals matter and in worlds where only humans matter.
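To make that concrete with toy symbols of my own (just an illustration, not anything from your post): let $p$ be the credence that animal welfare matters, $x$ the fraction of the budget going to animal charities, and $V_A$, $V_H$ the per-dollar value of the animal and human interventions under the worldview where each counts. Then the portfolio's expected moral value is roughly

$$\mathbb{E}[\text{value}] = p\,\big(x V_A + (1-x) V_H\big) + (1-p)(1-x) V_H,$$

and the appeal, as I understand it, is that any split with $0 < x < 1$ guarantees some good was done whichever worldview turns out to be right, even though a straight expected-value maximizer would just pour everything into whichever term dominates.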
Are you suggesting something like that, or are you pointing to a deeper rule against concluding that the effects of other people's lives are net negative when weighing the second-order effects of whether to save them?
Note that the cost-effectiveness estimate I got for epidemic/pandemic preparedness, 0.00236 DALY/$, is still quite high.
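Just to put that figure in more familiar units (my own arithmetic, so correct me if I've fumbled it):

$$\frac{1}{0.00236\ \text{DALY}/\$} \approx \$424\ \text{per DALY averted}.$$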
Point well-taken.
I appreciate you writing and sharing those posts trying to model and quantify the impact of x-risk work and question the common arguments given for astronomical EV.
I hope to take a look at those in more depth sometime and critically assess what I think about them. Honestly, I am very intrigued by engaging with well-informed disagreement around the astronomical EV of x-risk-focused approaches. I find your perspective here interesting, and I think engaging with it might sharpen my own understanding.
:)
Interesting! This is a very surprising result to me, because I am mostly used to hearing about how cost-effective pandemic prevention is, and this estimate seems to disagree with that.
Shouldn't this be a relatively major point against prioritizing biorisk as a cause area? (At least without taking into account strong longtermism and the moral catastrophe of extinction.)
Fictional Characters:
I would say I agree that fictional characters aren't moral patients. That's because I don't think the suffering/pleasure of fictional characters is actually experienced by anyone.
I take your point that you don't think that the suffering/pleasure portrayed by LLMs is actually experienced by anyone either.
I am not sure how deep I really think the analogy is between what the LLM is doing and what human actors or authors are doing when they portray a character. But I can see some analogy and I think it provides a reasonable intuition pump for times when humans can say stuff like "I'm suffering" without it actually reflecting anything of moral concern.
Trivial Changes to Deepnets:
I am not sure how to evaluate your claim that only trivial changes to the NN are needed to have it negate itself. My sense is that this would probably require more extensive retraining if you really wanted to get it to never role-play that it was suffering under any circumstances. This seems at least as hard as other RLHF "guardrails" tasks unless the approach was particularly fragile/hacky.
Also, I'm just not sure I have super strong intuitions about that mattering a lot because it seems very plausible that just by "shifting a trivial mass of chemicals around" or "rearranging a trivial mass of neurons" somebody could significantly impact the valence of my own experience. I'm just saying, the right small changes to my brain can be very impactful to my mind.
My Remaining Uncertainty:
I would say I broadly agree with the general notion that the text output by LLMs probably doesn't correspond to an underlying mind with anything like the sorts of mental states that I would expect to see in a human mind that was "outputting the same text".
That said, I think I am less confident in that idea than you, and I maybe don't find the same arguments/intuition pumps as compelling. I think your take is reasonable and all, I just have a lot of general uncertainty about this sort of thing.
Part of that is just that I think it would be brash of me in general to not at least entertain the idea of moral worth when it comes to these strange masses of "brain-tissue inspired computational stuff" which are totally capable of all sorts of intelligent tasks. Like, my prior on such things being in some sense sentient or morally valuable is far from 0 to begin with just because that really seems like the sort of thing that would be a plausible candidate for moral worth in my ontology.
And also I just don't feel confident at all in my own understanding of how phenomenal consciousness arises / what the hell it even is. Especially with these novel sorts of computational pseudo-brains.
So, idk, I do tend to agree that the text outputs shouldn't just be taken at face value or treated as equivalent in nature to human speech, but I am not really confident that there is "nothing going on" inside the big deepnets.
There are other competing factors at this meta-uncertainty level. Maybe I'm too easily impressed by regurgitated human text. I think there are strong social/conformity reasons to be dismissive of the idea that they're conscious. And so on.
Usefulness as Moral Patients:
I am more willing to agree with your point that they can't be "usefully" moral patients. Perhaps you are right about the "role-playing" thing, and whatever mind might exist in GPT produces the text stream more as a byproduct of whatever it is concerned about than as a "true monologue about itself". Perhaps the relationship it has to its text outputs is, at some deep level, analogous to the relationship an actor has to a character they are playing. I don't personally find the "simulators" analogy compelling enough to really think this, but I permit the possibility.
We are so ignorant about the nature of GPTs' minds that perhaps there is not much we can really say about what sorts of things would be "good" or "bad" with respect to them. And all of our uncertainty about whether/what they are experiencing almost certainly makes them less useful as moral patients on the margin.
I don't intuitively feel great about a world full of nothing but servers constantly prompting GPTs with "you are having fun, you feel great" just to have them output "yay" all the time. Still, I would probably rather have that sort of world than an empty universe. And if someone told me they were building a data center where they would explicitly retrain and prompt LLMs to exhibit suffering-like behavior/text outputs all the time, I would be against that.
But I can certainly imagine worlds in which these sorts of things wouldn't really correspond to valenced experience at all. Maybe the relationship between a NN's stream of text and any hypothetical mental processes going on inside them is so opaque and non-human that we could not easily influence the mental processes in ways that we would consider good.
LLMs Might Do Pretty Mind-Like Stuff:
On the object level, I think one of the main lines of reasoning that makes me hesitant to more enthusiastically agree that the text outputs of LLMs do not correspond to any mind is my general uncertainty about what kinds of computation are actually producing those text outputs and my uncertainty about what kinds of things produce mental states.
For one thing, it feels very plausible to me that a "next token predictor" IS all you would need to get a mind that can experience something. Prediction is a perfectly respectable kind of thing for a mind to do. Predictive power is pretty much the basis of how we judge which theories are true scientifically. Also, plausibly it's a lot of what our brains are actually doing and thus potentially pretty core to how our minds are generated (cf. predictive coding).
The fact that modern NNs are "mere next token predictors" on some level doesn't give me clear intuitions that I should rule out the possibility of interesting mental processes being involved.
Plus, I really don't think we have a very good mechanistic understanding of what sorts of "techniques" the models are actually using to be so damn good at predicting. Plausibly none of the algorithms being implemented or "things happening" bear any similarity to the mental processes I know and love, but plausibly there is a lot of "mind-like" stuff going on. Certainly brains have offered design inspiration, so perhaps our default guess should be that "mind-stuff" is relatively likely to emerge.
Can Machines Think:
The Imitation Game proposed by Turing attempts to provide a more rigorous framing for the question of whether machines can "think".
I find it a particularly moving thought experiment if I imagine that the machine is trying to imitate a specific loved one of mine.
If there were a machine that could nail the exact I/O patterns of my girlfriend, then I would be inclined to say that whatever sort of information processing occurs in my girlfriend's brain to create her language capacity must also be happening in the machine somewhere.
I would also say that if all of my girlfriend's language capacity were being computed somewhere, then it is reasonably likely that whatever sorts of mental stuff goes on that generates her experience of the world would also be occurring.
I would still consider this true without having a deep conceptual understanding of how those computations were performed. I'm sure I could even look at how they were performed and not find it obvious in what sense they could possibly lead to phenomenal experience. After all, that is pretty much my current epistemic state with regard to the brain, so I really shouldn't expect reality to "hand it to me on a platter".
If there was a machine that could imitate a plausible human mind in the same way, should I not think that it is perhaps simulating a plausible human in some way? Or perhaps using some combination of more expensive "brain/mind-like" computations in conjunction with lazier linguistic heuristics?
I guess I'm saying that there are probably good philosophical reasons for having a null hypothesis in which a system that is largely indistinguishable from a human mind should be treated as though it is doing computations equivalent to a human mind. That's pretty much the same thing as saying it is "simulating" a human mind. And that very much feels like the sort of thing that might cause consciousness.
I appreciate you taking the time to write out this viewpoint. I have had vaguely similar thoughts in this vein. Tying it into Janus's simulators and the stochastic parrot view of LLMs was helpful. I would intuitively suspect that many people would have an objection similar to this, so thanks for voicing it.
If I am understanding and summarizing your position correctly, it is roughly that:
The text output by LLMs is not reflective of the state of any internal mind in a way that mirrors how human language typically reflects the speaker's mind. You believe this is implied by the fact that the LLM cannot be effectively modeled as a coherent individual with consistent opinions; there is not actually a single "AI assistant" under Claude's hood. Instead, the LLM itself is a difficult-to-comprehend "shoggoth" system, and that system sometimes falls into narrative patterns in the course of next-token prediction which cause it to produce text in which characters/"masks" are portrayed. Because the characters being portrayed are only patterns that the next-token predictor follows in order to predict next tokens, it doesn't seem plausible to model them as reflecting an underlying mind. They are merely "images of people" or something, like a literary character or one portrayed by an actor. Thus, even if one of the "masks" says something about its preferences or experiences, this probably doesn't correspond to the internal states of any real, extant mind in the way that we would normally expect to be true when humans talk about their preferences or experiences.
Is that a fair summation/reword?
Adjacent to this point about how we could improve EA communication, I think it would be cool to have a post that explores how we might effectively use, like, Mastodon or some other method of dynamic, self-governed federation to get around this issue. I think this issue goes well beyond just the EA forum in some ways lol.
Good suggestion! Happy Ramadan! <3
I have a few questions about the space of EA communities.
You mention
as in scope. I am curious what existing examples you have of communities that place emphasis on these values aside from the core "EA" brand?
I know that GWWC kind of exists as its own community independent of "EA" to ~some extent, but honestly I am unclear to what extent. Also, I guess LessWrong and the broader rationality-cinematic-universe might kind of fit here too, but realistically whenever scope-sensitive altruism is the topic of discussion on LessWrong, an EA Forum cross-post is likely. Are there any big "impartial, scope-sensitive and ambitious altruism" communities I am missing? I know there are several non-profits independently working on charity evaluation and that sort of thing, but I am not very aware of distinct "communities" per se.
Some of my motivation for asking is that I actually think there is a lot of potential when it comes to EA-esque communities that aren't actually officially "EA" or "EA Groups". In particular, I am personally interested in the idea of local EA-esque community groups with a more proactive focus on fellowship, loving community, social kindness/fraternity, and providing people a context for profound/meaningful experiences. Still championing many EA values (scope sensitivity, broad moral circles, proactive ethics) and EA tools (effective giving, research-oriented and ethics-driven careers), but in the context of a group which is a shade or two more like churches, humanist associations, and the Sunday Assembly, and a shade or two less like Rotary Clubs or professional groups.
That's just one idea, but I'm really trying to ask about the broader status of EA-diaspora communities / non-canonically "EA" community groups under EAIF. I would like to understand more clearly what the canonical "stewards of the EA brand" at CEA and EAIF have in mind for the future of EA groups and the movement as a whole. What does success look like here; what are these groups trying to be / blossom into? And to the extent that my personal vision for "the future of EA" is different, is a clean break / diaspora the way to go?
Thanks!