Second, the kind of mind required to operate as an intelligent agent in the real world likely demands sophisticated cognitive abilities for perception and long-term planning—abilities that appear sufficient to give rise to many morally relevant forms of consciousness.
A problem is that it is quite possible for sophisticated cognitive abilities to be present without any conscious experience. Some AIs might be a kind of p-zombie, and without a working theory of consciousness it is not possible to know at this point.
If AIs are a kind of p-zombie, then it could be a moral mistake to give them moral value: preferences (without consciousness) might not matter intrinsically, whereas there is a more intuitive case for conscious pleasant/unpleasant experiences mattering in themselves.
I would be curious about the following question: given our uncertainty about consciousness in AIs, what should we do so that things are robustly good? It's not clear that giving AIs more autonomy is robustly good: perhaps this increases the chance of disempowerment (peaceful or violent, as you say), and if AIs have no moral value because they are not conscious, granting them autonomy could result in pretty bad outcomes.
Thanks for writing this post! It is important to think about the implications of cluelessness and moral uncertainty for AI safety. To clarify the value of working on AI safety, it helps to decompose the problem into two subquestions:
One outcome could be reducing X-risks from AI, which could happen in at least two different ways: value lock-in from a human-aligned AI, or a future controlled by non-human AI. Reducing value lock-in seems robustly good, and I won't argue for that here.
If the outcome we are thinking about is reducing extinction from AI, then the near-term case for reducing extinction from AI seems more robust to cluelessness, and I feel that the post could have emphasised it a bit more. Indeed, reducing the risk of extinction from AI for all the people alive today and in the next few generations looks good from a range of moral perspectives (it is at least determinately good for humans), even though it is indeterminate in the long term. But then, one has to compare short-term AI X-risks with other interventions that make things determinately good in the short term from an impartial perspective, like working on animal welfare or reducing extreme poverty.
AI seems high stakes, even though we don't know which way it will go in the short/long term, which might suggest focusing more on capacity building instead of a more direct intervention (I would put capacity building among the paths that you suggested, as a more general category covering careers like earning to give). This holds as long as the capacity building (e.g. putting ourselves in a position to make things go well w.r.t. AI when we have more information) has a low risk of backfiring, i.e. of making things worse.
If we grant that reducing X-risks from AI seems robustly good, and better than alternative short-term causes (which is a higher bar than "better than doing nothing"), then we still need to figure out interventions that robustly reduce X-risks from AI (i.e. so that we don't make things worse). I already mentioned non-backfiring capacity building (if we can find such capacity building). Beyond capacity building, it's not completely clear to me that there are robustly good interventions in AI safety, and I think more work is needed to prioritize interventions.
It seems useful to think of one's career as part of a portfolio, and to work on things where one could plausibly be in a position to do excellent work, provided the intervention one is working on is determinately better than doing nothing.
Yes. To reduce that risk we could aim for an international agreement banning high-risk AI capability research, but that might not be satisfying. I have the impression that very few people (if any) are working on that flavor of regulation, and it could be useful to explore it more. Ideally, if we could simply coordinate to not work directly on producing generally capable AI until we figure out safety, it would be an important win.
Artificial intelligence, Values and reflective process
The probability that AIs will be capable of conscious processing in the coming decades is not negligible. With the right information dynamics, some artificial cognitive architectures could support conscious experiences. The global neural workspace is an example of a leading theory of consciousness compatible with this view. Furthermore, if it turns out that conscious processing improves learning efficiency, then building AI capable of consciousness might become an effective path toward more generally capable AI. Building conscious AIs would have crucial ethical implications given their high expected population. To decrease the chance of bad moral outcomes, we could follow two broad strategies. First, we could fund policy projects that work with regulators to ban or slow down research posing a substantial risk of building conscious AI. Regulations slowing the arrival of conscious AIs could remain in place until we gain more moral clarity and a solid understanding of machine consciousness. For example, the philosopher Thomas Metzinger has advocated a moratorium on synthetic phenomenology in a previously published paper. Second, we could fund more research in machine consciousness and philosophy of mind to improve our understanding of synthetic phenomenology in AIs and their moral status. Note that machine consciousness is currently a very neglected academic field.
AI alignment, AI policy
We want AI alignment research to catch up with and surpass AI capability research. Among other things, AI capability research requires a friendly political environment. We would be interested in funding AI policy proposals that increase the chance of obtaining effective regulations slowing down highly risky AI capability R&D. For example, regulations could require large language models to pass a thorough safety audit before deployment or before scaling beyond determined parameter thresholds. Another example would be funding AI policy projects that increase the chance of banning research aiming to build generally capable AI before the AI alignment problem is solved. Such regulations would probably need to be implemented at national and international scales to be effective.
AI alignment
Having the most productive researchers in AI alignment would increase our chances of developing competitive aligned models and agents. As of now, the most lucrative careers tend to be in top AI companies, which attract many bright graduate students and researchers. We want this to change, so that AI alignment research becomes the most attractive career choice for excellent junior and senior engineers and researchers. We are willing to fund AI alignment workers at wages higher than top AI companies' standards. For example, wages could start around $250k/year and grow with productivity and experience.
Artificial intelligence
Aligning AI with human interests could be very hard. The current growth in AI alignment research might be insufficient to align AI. To speed up alignment research, we want to fund an ambitious institute attracting hundreds to thousands of researchers and engineers to work full-time on aligning AI. The institute would give these researchers computing resources competitive with those of top AI companies. We could also slow down risky AI capability research by offering top AI capability researchers competitive wages and autonomy, drawing them away from top AI organizations. While small specialized teams would pursue innovative alignment research, the institute would enhance their collaboration, bridging AI alignment theory, experiment, and policy. The institute could also offer alignment fellowships optimized to speed up the onboarding of bright young students into alignment research. For example, we would fund stipends and mentorships competitive with doctoral programs or entry-level jobs in industry. The institute would be located in a place safe from global catastrophic risks and would facilitate access to high-quality healthcare, food, housing, and transportation to optimize researchers' well-being and productivity.
The definition of existential risk as 'humanity losing its long-term potential' in Toby Ord's The Precipice could be specified further. Without (perhaps) loss of generality, assuming finite total value in our universe, one could divide existential risks into two broad categories:
I am a 3rd-year PhD student in consciousness neuroscience. After studying this field for 3 years, I tend to think that better understanding consciousness looks less important than standard EA cause areas.
Understanding consciousness is probably not very neglected. Although the field of consciousness science is relatively young and probably still small relative to other academic fields, it is a growing field with established labs such as the Sackler Centre for Consciousness Science, the tlab, Stan Dehaene's lab, Giulio Tononi's lab, and more. Consciousness is a fascinating problem that attracts many intellectuals. There is an annual conference on the science of consciousness that probably gathers hundreds of academics: https://assc24.forms-wizard.co.il/ (I am unsure about the number of participants).
Although I appreciate the enthusiasm of QRI and the original ideas they discuss, I am personally concerned about a potential general lack of scientific rigor that might be induced by the structure of QRI, though I would need to engage more with QRI's content. Consciousness (noted C below) is a difficult problem that quite likely requires collaboration between a good number of academics with solid norms of scientific rigor (i.e. doing better than the current replication crisis).
In terms of the importance of the cause, it is plausible that there is a lot of variation in the architecture and phenomenology of conscious processing, so it is unclear how easily results in current, mostly human-centric consciousness science would transfer to other species or to AIs. On the other hand, this suggests that understanding consciousness in specific species might be more neglected (though having reliable behavioral markers of C might already go a long way toward understanding moral patienthood). In any case, I have a difficult time making the case for why understanding consciousness is a particularly important problem relative to other standard EA causes.
Some potential lines of interest, to be specified further, that could make a stronger case for studying consciousness:
Overall, I am quite skeptical that, on the margin, consciousness science is the best field for an undergrad in informatics compared to AI safety or other priority cause areas.
Yeah, just to clarify, CB is not necessarily better than other interventions. However, CB with low backfire risk could be promising. This does not necessarily mean doing community building, since community building could backfire depending on how it is done (for example, if it is done in a very expansive, non-careful way it could more easily backfire). I think the PauseAI example you gave is a good example of a potentially non-robust intervention, or at least I would not count it as a low-backfire-risk capacity building intervention.
One of the motivations for CB would be to put ourselves in a better position to pursue some intervention if we end up less clueless. It might be that we don't in fact end up less clueless, and that after doing CB there are still no robust interventions we can pursue. In that case, it would be better to pursue determinately good short-term interventions even after doing CB (but then we have to pay the opportunity cost of the resources spent on CB rather than on the interventions that are good in the short term directly).
I am still uncertain about low-backfire CB interventions (ones that are better than doing something good directly); perhaps some way of increasing capital or well-targeted community building could be good examples, but it seems like an open question to me.