I often see people talking past each other when discussing x-risks because the
definition[1] covers outcomes that are distinct in some worldviews. For some,
humanity failing to reach its full potential and humanity going extinct are
joint concerns, but for others they are separate outcomes. Is there a good
solution to this?
1. "An existential risk is one that threatens the premature extinction of
Earth-originating intelligent life or the permanent and drastic destruction
of its potential for desirable future development." (source
[https://existential-risk.org/])
Is there a safe way to ask dumb or potentially infohazardous questions?
If you're not an expert in a field and are unsure whether your question could be
hazardous or not, where do you go?
Perhaps I am being too security-minded, but typing questions into a search
engine or chatbot, or posting them publicly, does not always seem like a good
idea.
I often sit on questions or ideas related to potential threats. I have no idea
whether my questions or thoughts are really dumb or whether some might be
useful to existing efforts. I wonder if others experience this as well. Some of
my thoughts or questions have been in relation to AI, supply chains, computer
science, energy utilization, infrastructure (by location and by industry),
intersectional risk (e.g. climate, nuclear, and others), blockchain, telecommunication,
hardware, natural resources, security (infosecurity, cybersecurity, hardware
security, etc.), electronics (big and small), public messaging/communication/PR.
I would really appreciate advice. If others have similar concerns, please upvote
or chime in, as this may be a problem that gets in the way of working on
important problems and could potentially be turned into an opportunity.
I got access to Bing Chat. It seems:
- It only searches archived versions of websites (it doesn't retrieve
today's news articles, and it accessed an older version of my Wikipedia user page)
- When archiving, it only downloads the content one can see without any
interaction with the website (tested on Reddit "see spoiler" buttons, which reveal
new content in the code; it could retrieve info from posts that gained less
attention but weren't hidden behind the spoiler button)
I.e. it's still in a box of sorts, unless it's much more intelligent than it
pretends.
Edit: A recent ACX post [https://astralcodexten.substack.com/p/janus-simulators]
argues that text-predicting oracles might be safer, as their ability to form goals
is very limited, but it offers two models of how even they could be dangerous: by
simulating an agent, or via a human who decides to take bad advice like "run the
paperclip maximizer code". Scott implies that expecting it to spontaneously form
goals is an extreme position, linking a post by Veedrac
[https://www.alignmentforum.org/posts/kpPnReyBC54KESiSn/optimality-is-the-tiger-and-agents-are-its-teeth].
The best argument there seems to be that it only has memory equivalent to 10 human
seconds. I find this convincing for current models, but the limit also seems to
constrain the intelligence of these systems, so I'm afraid that for future models
the incentives will favor weakening this safety valve.
This topic covers discussions of risks which threaten the destruction of the long-term potential of life, including natural risks like supervolcanoes as well as human-made risks like nuclear war or advanced technologies.