In a certain sense, an LLM's token embedding matrix is a machine ontology. Semantically similar tokens have similar embeddings in the latent space. However, different models may have learned different associations when their embedding matrices were trained. Every forward pass starts colored by ontological assumptions, and these may have alignment implications. For instance, we would presumably not want a model to operate within an ontology that associates the concept of AI with the concept of evil, particularly if it is then prompted to instantiate a simulacrum that believes it is an AI. Has someone looked into this? That is, the alignment implications of different token embedding matrices? I feel like it would involve calculating a lot of cosine similarities and doing some evals.
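To make the "lots of cosine similarities" part concrete, here is a minimal sketch. The vocabulary and the random embedding matrix are made-up placeholders, not taken from any real model; in practice you would substitute a model's actual input embedding matrix and its tokenizer's token-to-index mapping.

```python
import numpy as np

def cosine_similarity_matrix(embeddings: np.ndarray) -> np.ndarray:
    """Return pairwise cosine similarities between the rows of `embeddings`."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    # Guard against division by zero for any all-zero row.
    unit = embeddings / np.clip(norms, 1e-12, None)
    return unit @ unit.T

def similarity(sims: np.ndarray, vocab: list, a: str, b: str) -> float:
    """Look up the similarity between two tokens by name."""
    return float(sims[vocab.index(a), vocab.index(b)])

# Hypothetical toy data: 4 tokens with random 8-dimensional embeddings.
# A real analysis would load these from a trained model instead.
vocab = ["AI", "evil", "helpful", "robot"]
rng = np.random.default_rng(0)
toy_embeddings = rng.normal(size=(len(vocab), 8))

sims = cosine_similarity_matrix(toy_embeddings)
print("sim(AI, evil)    =", similarity(sims, vocab, "AI", "evil"))
print("sim(AI, helpful) =", similarity(sims, vocab, "AI", "helpful"))
```

Comparing pairs like these across several models' embedding matrices would be one crude way to see whether they have learned different associations. It only tells you about the input embeddings, though, so the evals would still be needed to see whether any difference matters downstream.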
Intriguing. Looking forward to the live demo.
PSA: The form accepts a maximum of 10 files, that is, 5 design proposals maximum (because each proposal requires uploading both a .png and a .svg file).
Just for the sake of clarity: I think the word "schism" is inaccurate here because it carries false connotations of conflict.
Have you considered booking a call with 80,000 Hours career advising? They can help you analyse the factors behind your plans for your future career, and put you in contact with people working in the areas that interest you.
You could also contact CLR and CRS. If you show knowledge of and interest in their work, they may be eager to help. You can't be sure if you'll get a reply, and that may seem intimidating, but remember that the cost is minimal, EV is high, and how you feel about not getting a reply is at least partly under your control.
While this last point is not specifically focused on s-risks, a very cheap and very valuable action you can take is subscribing to the AI Safety opportunities update emails at AI Safety Training. Many hackathons advertised there are beginner-friendly.
Side note: calling a world modelling disagreement implied by differences in cause prioritisation a "schism" is in my opinion unwarranted and (low-probability, very negative value) risks becoming a self-fulfilling prophecy.
A more pessimistic counterargument: Safely developing AGI is so hard as to be practically impossible. I do not believe this one, but some pessimistic sectors within AIS do. It combines well with the last counterargument you list (that the timelines where things turn out OK are all ones where we stop / radically slow down the development of AI capabilities). If you are confident that aligning AGI is for all practical purposes impossible, then you focus on preventing the creation of AGI and on improving the future of the timelines where AGI has been successfully avoided.
EDIT: Other commenters have pointed out reasons why eliminating debt that is sold very cheaply is unlikely to affect recipients' lives much. Still, if the debt relieved did in fact significantly help the beneficiaries, it could turn out to be very effective. However, we won't know until RIP releases recipient outcomes data.
TL;DR: About as cost-effective as GiveWell's top charities, IF my assumption about outcomes is broadly right. $14.16 to provide debt relief to one person. If one assumes a lifespan increase of 0.2% (less than two months) as the effect (by preventing healthcare avoidance), it comes out to $7080 per death-equivalent-in-lifespan averted. I recommend looking further into it, particularly with respect to outcomes.
Hi Layla, welcome to the Forum! Thanks for posting!
This looks like an interesting opportunity. Within the cause area of health in the US, RIP seems to have chosen a big and tractable problem, and to be triaging their beneficiaries according to the relevant metrics.
Here is my attempt to have a rough idea about RIP's cost-effectiveness.
RIP claims that it has "helped 5,492,948 individuals and families" and has relieved $8,520,147,644 of medical debt. The average debt relieved per recipient is thus $8,520,147,644 / 5,492,948 = $1551. If, as you say, "every $100 donated clears $10,000 in medical debt", then the cost per recipient is $15.51 (!!!).
I was initially skeptical of this calculation, but it checks out. In its 2021 year-end report, RIP says that it relieved the debt of 1,312,697 people during the year, and in its 2021 financial statement it declares total expenses of $18,587,272. So the cost per recipient is $18,587,272 / 1,312,697 = $14.16.
It's hard to estimate the benefit from medical debt reduction. Let's say, for the sake of simplicity, that the avoidance of medical treatment and mental health problems derived from struggling with medical debt make people live 0.2% shorter lives (1.92 months if starting out with an 80-year lifespan), and that the debt relief provided eliminates that effect. It follows that preventing 0.002 death-equivalents costs $14.16, and thus preventing one death-equivalent unit of lifespan reduction costs $7080. This is about as cost-effective as GiveWell's most recommended charities.
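For anyone who wants to check the arithmetic or plug in their own assumptions, here is a rough sketch of the calculation above. The function and variable names are my own. The 0.2% lifespan effect and the 80-year baseline are the assumptions stated above, not measured outcomes.

```python
def cost_per_recipient(total_expenses: float, recipients: int) -> float:
    """Total spending divided by the number of people whose debt was relieved."""
    return total_expenses / recipients

def cost_per_death_equivalent(cost_per_person: float, lifespan_fraction: float) -> float:
    """Cost to restore one full lifespan's worth of life, spread across recipients."""
    return cost_per_person / lifespan_fraction

# Figures quoted from RIP's own claims and 2021 reports.
avg_debt = 8_520_147_644 / 5_492_948           # average debt relieved per recipient
cost_from_ratio = avg_debt * (100 / 10_000)    # "$100 clears $10,000" => 1% of the debt
cost_2021 = cost_per_recipient(18_587_272, 1_312_697)

# ASSUMPTIONS (mine, for simplicity): debt relief restores 0.2% of an 80-year lifespan.
lifespan_fraction = 0.002
baseline_years = 80
months_gained = lifespan_fraction * baseline_years * 12

cost_de = cost_per_death_equivalent(round(cost_2021, 2), lifespan_fraction)

print(f"Average debt relieved:     ${avg_debt:,.0f}")
print(f"Cost per recipient (1%):   ${cost_from_ratio:.2f}")
print(f"Cost per recipient (2021): ${cost_2021:.2f}")
print(f"Months of life gained:     {months_gained:.2f}")
print(f"Cost per death-equivalent: ${cost_de:,.0f}")
```

The result scales inversely with the assumed lifespan effect: if the true effect were ten times smaller, the cost per death-equivalent would be ten times higher, which is why outcomes data matters so much here.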
This would be huge if true. However, my priors advise me against getting too hopeful. It should be hard to find a charity about as cost-effective as GiveWell's top charities. RIP has been assessed by Charity Navigator, and does a fair bit of marketing. It would be weird if no EA had picked this up before. I have reason to believe that I am overestimating the positive effects of debt relief.
To find out whether RIP is really so effective, it would be great to have numbers on the welfare outcomes of debt relief. I found this report on RIP's site, which, while a potentially useful qualitative source, makes no effort to quantify outcomes.
Interesting. I agree that second- or third-order effects, such as the good done later by people you have helped, are an important consideration. Maximising such effects could be an underexplored effective giving strategy, and the organization you refer to looks like a group of people trying to do that. However, to really assess an organization's effectiveness, especially if it focuses on educational or social interventions, some empirical evidence is needed.
Having thought more about this, I suppose you can divide opinions into two clusters and be pointing at something real. That's because people's views on different aspects of the issue correlate, often in ways that make sense. For instance, people who think AGI will be achieved by scaling up current (or very similar to current) neural net architectures are more excited about practical alignment research on existing models.
However, such clusters would be quite broad. My main worry is that identifying two particular points as prototypical of them would narrow their range. People would tend to let their opinions drift closer to the point closest to them. This need not be caused by tribal dynamics. It could be something as simple as availability bias. This narrowing of the clusters would likely be harmful, because the AI safety field is quite new and we've still got exploring to do. Another risk is that we may become too focused on the line between the two points, neglecting other potentially more worthwhile axes of variation.
If I were to divide current opinions into two clusters, I think that Scott's two points would in fact fall in different clusters. They would probably even be not too far off their centers of mass. However, I strongly object to pretending the clusters are points, and then getting tribal about it. I think labeling clusters could be useful, if we made it clear that they are still clusters.
On the paths to understanding AI risk without accepting weird arguments, getting people worried about ML unexplainability may be worth exploring, though I suspect most people would think you were pointing to algorithmic bias and the like.