
Guillaume Corlouer

136 karma · Joined

Comments (10)

I am curious, though, about what makes you view capacity building (CB) in a more positive light compared to other interventions within AI safety. As you point out, CB also has the potential to backfire. I would even argue that the downside risk of CB might be higher than that of other interventions because it increases the number of people taking the issue seriously and taking proactive action—often with limited information.

Yeah, just to clarify, CB is not necessarily better than other interventions. However, CB with low backfire risk could be promising. This does not necessarily mean doing community building, since community building could backfire depending on how it is done (for example, if it is done in a very expansive, careless way, it could more easily backfire). I think the PauseAI example you gave is a good example of a potentially non-robust intervention, or at least I would not count it as a low-backfire-risk capacity-building intervention.

One motivation for CB would be to put ourselves in a better position to pursue some intervention if we end up less clueless. It might be that we don't in fact end up less clueless, and that even after doing CB there are still no robust interventions we can pursue. In that case, it would be better to pursue determinately good short-term interventions even after doing CB (but then we have to pay the opportunity cost of the resources spent on CB rather than on those determinately good short-term interventions directly).

I am still uncertain about which low-backfire CB interventions are better than doing something good directly; perhaps some ways of increasing capital or well-targeted community building could be good examples, but it seems like an open question to me.

Second, the kind of mind required to operate as an intelligent agent in the real world likely demands sophisticated cognitive abilities for perception and long-term planning—abilities that appear sufficient to give rise to many morally relevant forms of consciousness.

A problem is that it is quite possible for sophisticated cognitive abilities to be present without any conscious experience. Some AIs might be some kind of p-zombie, and without a working theory of consciousness it is not possible to know at this point.

If AIs are some kind of p-zombie, then it could be a moral mistake to grant them moral value, as preferences (without consciousness) might not matter intrinsically, whereas there is a more intuitive case for conscious pleasant/unpleasant experiences mattering in themselves.

I would be curious about the following question: given our uncertainty about consciousness in AIs, what should we do so that things are robustly good? It's not clear that giving AIs more autonomy is robustly good: perhaps this increases the chance of disempowerment (peaceful or violent, as you say), and if AIs have no moral value because they are not conscious, granting them autonomy could result in pretty bad outcomes.

Thanks for writing this post! It is important to think about the implications of cluelessness and moral uncertainty for AI safety. To clarify the value of working on AI safety, it helps to decompose the problem into two subquestions:

  1. Is the outcome that we are aiming for robust to cluelessness and moral uncertainty?
  2. Do we know of an intervention that is robustly good for achieving that outcome? (i.e. an intervention that is at least better than doing nothing to achieve that outcome)

An outcome could be reducing X-risks from AI, which could happen in at least two different ways: value lock-in from a human-aligned AI, or a future controlled by non-human AI. Reducing the risk of value lock-in seems robustly good, and I won't argue for that here.

If the outcome we are thinking about is reducing extinction risk from AI, then the near-term case for reducing it seems more robust to cluelessness, and I feel the post could have emphasised this a bit more. Indeed, reducing the risk of extinction from AI for all the people alive today and in the next few generations looks good from a range of moral perspectives (it is at least determinately good for humans), even though it is indeterminate in the long term. But then one has to compare reducing short-term AI X-risk with other interventions that make things determinately good in the short term from an impartial perspective, like working on animal welfare or reducing extreme poverty.

AI seems high stakes, even though we don't know which way it will go in the short or long term, which might suggest focusing more on capacity building than on more direct interventions (I would file some of the paths you suggested, such as earning to give, under capacity building as a more general category). This holds only as long as the capacity building (e.g. putting ourselves in a position to make things go well with respect to AI once we have more information) has a low risk of backfiring, i.e. of making things worse.

If we grant that reducing X-risks from AI is robustly good, and better than alternative short-term causes (which is a higher bar than "better than doing nothing"), then we still need to figure out interventions that robustly reduce X-risks from AI (i.e. that don't make things worse). I already mentioned low-backfire capacity building (if we can find such capacity building). Beyond capacity building, it's not completely clear to me that there are robustly good interventions in AI safety, and I think more work is needed to prioritize interventions.

It seems useful to think of one's career as part of a portfolio, and to work on things where one could plausibly be in a position to do excellent work, provided the intervention one is working on is determinately better than doing nothing.

Yes. To reduce that risk we could aim for an international agreement banning high-risk AI capability research, though that might not be satisfying. I have the impression that very few people (if any) are working on that flavor of regulation, and it could be useful to explore it more. Ideally, if we could simply coordinate not to work directly on producing generally capable AI until we figure out safety, it could be an important win.

Regulating AI consciousness.

Artificial intelligence, Values and reflective process

The probability that AIs will be capable of conscious processing in the coming decades is not negligible. With the right information dynamics, some artificial cognitive architectures could support conscious experiences. The global neural workspace is an example of a leading theory of consciousness compatible with this view. Furthermore, if it turns out that conscious processing improves learning efficiency, then building AI capable of consciousness might become an effective path toward more generally capable AI. Building conscious AIs would have crucial ethical implications given their high expected population. To decrease the chance of bad moral outcomes we could follow two broad strategies. First, we could fund policy projects aiming to work with regulators to ban or slow down research that poses a substantial risk of building conscious AI. Regulations slowing the arrival of conscious AIs could remain in place until we gain more moral clarity and a solid understanding of machine consciousness. For example, the philosopher Thomas Metzinger has advocated a moratorium on synthetic phenomenology in a previously published paper. Second, we need to fund more research in machine consciousness and philosophy of mind to improve our understanding of synthetic phenomenology in AIs and their moral status. Note that machine consciousness is currently very neglected as an academic field.

Funding AI policy proposals to slow down high-risk AI capability research.

AI alignment, AI policy

We want AI alignment research to catch up with and surpass AI capability research. Among other things, AI capability research requires a friendly political environment. We would be interested in funding AI policy proposals that would increase the chance of obtaining effective regulations slowing down highly risky AI capability R&D. For example, some regulations could require large language models to pass a thorough safety audit before deployment or before scaling beyond determined parameter-count safety thresholds. Another example would be funding AI policy projects that increase the chance of banning research aiming to build generally capable AI before the AI alignment problem is solved. Such regulations would probably need to be implemented on a national and international scale to be effective.

Making AI alignment research one of the most lucrative career paths in the world.

AI alignment

Having the most productive researchers working in AI alignment would increase our chances of developing competitive aligned models and agents. As of now, the most lucrative careers tend to be at top AI companies, which attract many bright graduate students and researchers. We want this to change, so that AI alignment research becomes the most attractive career choice for excellent junior and senior engineers and researchers. We are willing to fund AI alignment workers with wages higher than top AI companies' standards. For example, wages could start around $250k/year and grow with productivity and experience.

Funding the AI alignment institute, a Manhattan Project-scale effort for AI alignment.

Artificial intelligence

Aligning AI with human interests could be very hard. The current growth in AI alignment research might be insufficient to align AI. To speed up alignment research, we want to fund an ambitious institute attracting hundreds to thousands of researchers and engineers to work full-time on aligning AI. The institute would give these researchers access to computing resources competitive with those of top AI companies. We could also slow down risky AI capability research by offering top AI capability researchers competitive wages and autonomy, drawing them away from top AI organizations. While small specialized teams would pursue innovative alignment research, the institute would enhance their collaboration, bridging AI alignment theory, experiment, and policy. The institute could also offer alignment fellowships optimized to speed up the onboarding of bright young students into alignment research. For example, we would fund stipends and mentorships competitive with doctoral programs or entry-level jobs in industry. The institute would be located in a place safe from global catastrophic risks and would facilitate access to high-quality healthcare, food, housing, and transportation to optimize researchers' well-being and productivity.

The definition of existential risk as ‘humanity losing its long-term potential’ in Toby Ord's The Precipice could be specified further. Without (perhaps) loss of generality, assuming finite total value in our universe, one could divide existential risks into two broad categories (a rough formalisation is sketched after the list):

  • Extinction risks (X-risks): The human share of total value goes to zero. Examples could be extinction from pandemics, extreme climate change, or some natural event.
  • Agential risks (A-risks): The human share of total value could be greater than in the X-risk scenarios but remains strictly dominated by the share of total value held by misaligned agents. Examples could be misaligned institutions, AIs, or loud aliens controlling most of the value in the universe, with whom there would be little hope of gains from trade.
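As a rough sketch of this split, under the finite-total-value assumption and with notation introduced purely for illustration (V, V_H, and V_M are my labels, not taken from The Precipice), one could write:

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Illustrative notation (not from the original comment or The Precipice):
% V   : finite total value in the universe
% V_H : share of total value held by humanity
% V_M : share of total value held by misaligned agents
\[
\text{X-risk:}\quad \frac{V_H}{V} \to 0
\qquad\qquad
\text{A-risk:}\quad \frac{V_H}{V} > 0 \ \text{ but } \ V_H < V_M
\]
\end{document}
```

This is only meant to make the "strictly dominated" condition explicit; it does not say anything about how value shares would actually be measured.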

I am a 3rd-year PhD student in consciousness neuroscience. After three years studying this field, I tend to think that better understanding consciousness looks less important than standard EA cause areas.

Understanding consciousness is probably not very neglected. Indeed, although the field of consciousness science is relatively young and probably still small relative to other academic fields, it is a growing field with established labs such as the Sackler Centre for Consciousness Science, the tlab, Stan Dehaene's lab, Giulio Tononi's lab, and more. Consciousness is a fascinating problem that attracts many intellectuals. There is an annual conference on the science of consciousness that probably gathers hundreds of academics: https://assc24.forms-wizard.co.il/ (I am unsure about the number of participants).

Although I appreciate the enthusiasm of QRI and the original ideas they discuss, I am personally concerned by the potential general lack of scientific rigor that might be induced by the structure of QRI, though I would need to engage more with QRI's content. Consciousness (abbreviated C below) is a difficult problem that quite likely requires collaboration between a good number of academics with solid norms of scientific rigor (i.e. doing better than the current replication crisis).

In terms of the importance of the cause, it is plausible that there is a lot of variation in the architecture and phenomenology of conscious processing, so it is unclear how easily results in current, mostly human-centric consciousness science would transfer to other species or AIs. On the other hand, this suggests that understanding consciousness in specific species might be more neglected (though maybe having reliable behavioral markers of C would already go a long way toward understanding moral patienthood). In any case, I have a difficult time making the case for why understanding consciousness is a particularly important problem relative to other standard EA causes.

Some potential points of interest, to be specified further, that could make the case for studying consciousness stronger:

  • If C is necessary for general intelligence, then better understanding C might help us better understand general AI and suggest interesting new directions for AI safety.
  • Building conscious AI (in the form of brain emulations or other architectures) could possibly help us create a large number of valuable artificial beings. Wildly speculative indulgence: being able to simulate humans and their descendants could be a great way to make the human species more robust to most current existential risks (if it is easy to create artificial humans that can live in simulations, then humanity could become much more resilient).

Overall, I am quite skeptical that, on the margin, consciousness science is the best field for an undergrad in informatics compared to AI safety or other priority cause areas.
