Hi there! I'm an EA from Madrid. I am currently finishing my Ph.D. in quantum algorithms and would like to focus my career on AI Safety. Send me a message if you think I can help :)
Separately and independently, I believe that by the time an AI has fully completed the transition to hard superintelligence, it will have ironed out a bunch of the wrinkles and will be oriented around a particular goal (at least behaviorally, cf. efficiency—though I would also guess that the mental architecture ultimately ends up cleanly-factored (albeit not in a way that creates a single point of failure, goalwise)).
I’d be curious to understand why you believe this happens. Humans (the only general intelligence we have so far) seem to preserve some uncertainty over goal distributions. So it is unclear to me that generality will necessarily clarify goals.
To be a bit more concrete: I find it plausible that the AGI will encounter several possible fine-grained (concrete) goals that map onto the same high-level representation of its goal, whatever that may be. It then has to refine what the goal representation was meant to mean. After all, a representation of the goal is not necessarily the goal itself. I believe this is what humans face, and why human goals are often a bit of a mess.
Regarding the last question, I think it is perhaps a bit unfair. I think they have clearly stated they unconditionally condemn racism, and I have a strong prior that they mean it. Why wouldn’t they, after all?
But if we were to eliminate the EA community, an AI safety community would quickly replace it, as people are often attached to what they do. And this is even more likely if you add any moral connotation. People working at a charity, for example, are drawn to build an identity around it.
The HuggingFace RL course might be an alternative in the Deep Learning - RL discussion above: https://github.com/huggingface/deep-rl-class
Yeah, perhaps I was being too harsh. However, the baseline scenario should be that current trends will go on for some time, and they predict at least cheap batteries and increasingly cheap H2.
I mostly focussed on these two because the current problem with green energy sources has more to do with energy storage than with production: photovoltaics is currently the cheapest source in most places.
I think I quite disagree with this post because batteries are improving quite a lot, and if we can also improve hydrogen production and usage, things should work pretty well. Finally, nuclear fusion no longer seems so far away. Of course, I agree with the author that this transition will take quite a long time, especially in developing countries, but I expect it to work out well anyway. One key argument of the author is that we are limited by the amount of the various metals available, but Li is very common on Earth, even if not super cheap, so I am not totally convinced by this. Similar thoughts apply to land usage.
In the Spanish community we often have conversations in English, and I think at least 80% of the members are comfortable with both languages.
Point 1 is correct, but there is a difference: when you do research, you often need to live near a research group. Distillation is more open to remote and asynchronous work.
You understood me correctly. To be specific, I was considering the third case, in which the agent has uncertainty about its preferred state of the world. It may thus refrain from taking irreversible actions that have a small upside in one scenario (protium water) but a large negative value in the other (deuterium), due to e.g. diminishing returns, or because it thinks there’s a chance to get more information on what the objectives are supposed to mean.
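As a toy illustration of this point, here is a minimal sketch in Python. Everything in it is an assumption made up for the example: the two goal readings "A" and "B", the 50/50 belief, and the payoffs (a small upside under one reading, a large downside under the other). It only shows that, under such numbers, waiting beats the irreversible action, and that learning the intended reading before acting is worth something.

```python
# Hypothetical belief over two readings of the same high-level goal.
P_READING = {"A": 0.5, "B": 0.5}

# Hypothetical payoffs: the irreversible action helps a little under
# reading A and hurts a lot under reading B; waiting changes nothing.
PAYOFF = {
    "irreversible": {"A": 1.0, "B": -10.0},
    "wait": {"A": 0.0, "B": 0.0},
}


def expected_value(action, belief=P_READING):
    """Expected payoff of an action under the agent's goal uncertainty."""
    return sum(belief[r] * PAYOFF[action][r] for r in belief)


def best_action(belief=P_READING):
    """Action with the highest expected payoff under the current belief."""
    return max(PAYOFF, key=lambda a: expected_value(a, belief))


def value_if_reading_is_learned_first(belief=P_READING):
    """Expected payoff if the agent learns the true reading, then acts."""
    return sum(belief[r] * max(PAYOFF[a][r] for a in PAYOFF) for r in belief)


if __name__ == "__main__":
    for action in PAYOFF:
        print(action, expected_value(action))
    print("best under uncertainty:", best_action())
    print("if reading is learned first:", value_if_reading_is_learned_first())
```

With these made-up numbers the irreversible action has negative expected value, so the agent waits, and resolving the uncertainty first does strictly better than either action taken blindly. Different payoffs or beliefs could of course flip the conclusion.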
I understand your point that this distinction may look arbitrary, but goals are not necessarily defined at the physical level, but rather over abstractions. For example, is a human with a high level of dopamine happier? What exactly is a human? Can a larger human brain be happier? My belief is that since these objectives are built over (possibly changing) abstractions, it is unclear whether a single agent might iron out its goal. In fact, if “what the representation of the goal was meant to mean” makes reference to what some human wanted to represent, you’ll probably never have a clear-cut, unchanging goal.
Though I believe an important problem in this case is how to train an agent able to distinguish between the goal and its representation, and seek to optimise the former. I find it a bit confusing when I think about it.