Actually, I'm wondering why the "pay the alignment tax" branch is attached to intent alignment and not to the root, where "making AI go well" is. The alignment tax is the difference between aligned AI and competent AI, but aligned AI here is meant in the sense of good outcomes, not in the sense of "AI tries to do what we want", because it seems to include robustness, reliability and so on, right? I mean, agreements, coordination and so on, which sit under "pay the alignment tax", depend on AI actually being robust and reliable, i.e. it won't, for example, insert a backdoor into code generated by an AI assistant.
There is a nice post about this at 80k: https://80000hours.org/career-reviews/information-security/
It took me full time, 40+ hours per week, in ARENA v2.
That's a great resource for navigating my self-study in ML! Thank you for compiling this list.
I wonder if a pull request to some popular machine learning library or tool counts as a step towards AI Safety Research. Say a PR implements some safety feature for PyTorch, e.g. in interpretability, adversarial training, or something else. Would it go to Level 6, as it is reimplementing papers? Making a PR arguably takes more effort than just reimplementing a paper, as it needs to fit into an existing tool.
Thanks for publishing this and your research! A few discussion points:
1. > We suggest trying to achieve safety through evolution, rather than only trying to arrive at safety through intelligent design.
But how do we evolve an unaligned AGI into one that is aligned and deployed in the real world? It seems unlikely that we can align such a system without the real-world environment. And once it is in the real world, it is likely to result in goal and/or capability misgeneralization. As an example, how can we be sure that once a CEO system is deployed, it won’t disempower stakeholders to ensure that it is not shut down and can continue optimizing for its goals? I mean, we can’t emulate everything that lies in the future of such a company run by an AI CEO.
2. > Counteracting forces
So we then have another system of several AIs that watch each other, with offence and defence balanced. But that is yet another AI system to align, right? Hence, all the same arguments hold for this system. How do we align it with human values? How do we make sure it indeed pursues the goals that were given to it at the time of a distribution shift? Not to forget, this is only one bet we make, only one try. A real-world example is government and business: both were created by human societies, yet we see cases where they become misaligned.
3. > Avoid capabilities externalities
(Repeated) It is unclear how we can apply this in our current competitive environment (Google vs Facebook, China vs USA). What concretely should the incentives or policies for adopting a safety culture be? And who enforces them? If one actor adopts it, another will get a competitive advantage, as they will spend more on capabilities and then ‘kill you’ (Yudkowsky, AGI Ruin).
4. > Pursue tail impacts,
How does this work together with avoiding capabilities externalities? If we make less capable systems, then they will have less impact, right? Won’t some other reckless AI team that drives the research to the edge gather all the fruits?
5. > For example, it is well-known that moral virtues are distinct from intellectual virtues. An agent that is knowledgeable, inquisitive, quick-witted, and rigorous is not necessarily honest, just, power-averse, or kind.
Why is that true? I mean, I agree it is well-known that, for humans, moral virtues don’t come with intellectual ones. But is it necessarily always true for all agents?
1. It is unclear how we can apply a safety culture in our current highly competitive environment (Google vs Facebook, China vs USA). What concretely should the incentives or policies for adopting a safety culture be? And who enforces them? If one actor adopts it, another will get a competitive advantage, as they will spend more on capabilities and then 'kill you' (Yudkowsky, AGI Ruin).
2. Extremely high stakes, i.e. x-risk. While systems theory was developed for dangerous, mission-critical systems, it didn’t deal with systems that might disempower all of humanity forever. We don’t have a second try. So is systems theory of no use here? It should be an iterative process, but wouldn’t a misaligned AI kill us on the first wrong try?
3. Systems Theory was developed for systems built by humans for humans, and humans have a certain limited level of intelligence. Why is it likely that Systems Theory is applicable to an intelligence above the human level?
4. Systems Theory implies control of a worse entity by a better one: a government issues policies to control organizations, an AI lab stops research on a dangerous path, an electrician complies with instructions, etc. Now, isn’t AGI a better entity to give control to? Does that imply humanity's disempowerment? In particular, when we introduce a moral parliament (discussed in PAIS #4), won’t it mean that this parliament is now in power, not humanity?
Location: Istanbul, Turkey (previously Saint-Petersburg, Russia)
Remote: Yes
Willing to relocate: Yes
Skills: Machine learning (MLSS, online course, self-study). Software development (15+ years professionally): C#, C, Python, SQL, Azure, JS/HTML/etc., algorithms, design, etc.
Résumé/CV/LinkedIn: linkedin.com/in/artkpv/
Email: artyomkarpov at gmail dot com
Notes: Interested in AI Safety, GPR, cybersecurity, earning to give. Participated in an EA Intro group. Took career coaching from 80K hours.
Thanks Holden! I found this article useful because it shows a way we can organize the learning process in our information-abundant world, where I frequently find myself in the self-deceptive state of believing I have some knowledge just because I read something. But I think it is important to understand that knowledge is not a feeling but something we can recall, or a correct opinion we can describe to ourselves or others. I found it strange, though, that you seem to contrast such learning with just reading without writing. I think it is already common sense that passive reading is not learning.
Probably we can add details here on how to do such writing. A great guide for this is the Zettelkasten note-taking system as used by Niklas Luhmann. It points out, among other things, that we should keep separate permanent and literature notes. It also makes the point that we should organize our reading around writing, specifically for science papers or articles, use our own words, etc.
Here is a quickly compiled PDF of this book. It is just an HTML-to-PDF conversion. I've added extra articles along with the essential ones. The order should be correct, as should the images. https://disk.yandex.ru/d/n_WlfyTutyj7QA