Alignment tax

An alignment tax (sometimes called a safety tax) is the additional cost of making AI aligned, relative to unaligned AI.
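Stated as a formula, the tax is a simple difference in costs. As a minimal formalization (the cost function C is an illustrative placeholder rather than notation from the sources cited below, and "cost" may be measured in money, compute, development time, or forgone performance):

\[
\text{alignment tax} = C(\text{aligned AI}) - C(\text{unaligned AI})
\]

A tax of zero would mean alignment comes at no extra cost; the larger the tax, the stronger the incentive for actors to cut corners on alignment.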

Approaches to the alignment tax

Paul Christiano distinguishes two main approaches for dealing with the alignment tax (Christiano 2020; for a summary, see Shah 2020). One approach seeks to find ways to pay the tax, such as persuading individual actors to pay it or facilitating coordination of the sort that would allow groups to pay it. The other approach tries to reduce the tax, by differentially advancing existing alignable algorithms or by making existing algorithms more alignable.

Bibliography

Askell, Amanda et al. (2021) A general language assistant as a laboratory for alignment, arXiv:2112.00861 [cs].

Christiano, Paul (2020) Current work in AI alignment, Effective Altruism Global, April 3.

Shah, Rohin (2020) A framework for thinking about how to make AI go well, LessWrong, April 15.

Xu, Mark & Carl Shulman (2021) Rogue AGI embodies valuable intellectual property, LessWrong, June 3.

Related entries

AI alignment
