Another benefit of our product-driven approach is that we aim to provide a positive contribution to the alignment community. By which I mean:
Thanks to amazing prior work in direct alignment research, we already have some idea of the anti-patterns and risks that we all want to avoid. What we're still lacking are safety attractors: alternative approaches that are both competitive with and safer than the current paradigm.
We want Elicit to be an existence proof that there is a better way to solve certain complex tasks, and for our approach to go on to be adopted by others – because it's in their self-interest, not because it's safe.
In a research assistant setting, you could imagine the top-level task being something like "Was this a double-blind study?", which we might factor into subquestions such as:
In this example, by the time we get to the "Does this paragraph state there was a placebo?" level, a submodel is given a fairly tractable question-answering task over a single paragraph. A typical response at this level might be a confidence level plus text spans pointing to the most relevant phrases.
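As a rough illustration of this factoring, here is a minimal sketch in Python. Everything in it is hypothetical: the `submodel` is a keyword-matching stub standing in for a real language-model call, and the function names, aggregation rule (take the most confident paragraph-level answer), and data shapes are illustrative, not Elicit's actual implementation.

```python
# Hypothetical sketch of task factorization: a top-level question is
# decomposed into paragraph-level subquestions, each handled by a
# submodel that returns a confidence and supporting text spans.
from dataclasses import dataclass


@dataclass
class SubAnswer:
    confidence: float  # submodel's confidence in a "yes" answer
    spans: list[str]   # text spans supporting the answer


def submodel(question: str, paragraph: str) -> SubAnswer:
    """Stub submodel: answers a tractable question over one paragraph.

    A real system would call a language model here; this stub just
    looks for a keyword implied by the subquestion.
    """
    keyword = "placebo" if "placebo" in question.lower() else "blind"
    spans = [s.strip() for s in paragraph.split(".")
             if keyword in s.lower()]
    return SubAnswer(confidence=0.9 if spans else 0.1, spans=spans)


def was_double_blind(paragraphs: list[str]) -> SubAnswer:
    """Factor the top-level task into per-paragraph subquestions,
    then aggregate by taking the most confident answer."""
    subquestion = "Does this paragraph state there was a placebo?"
    answers = [submodel(subquestion, p) for p in paragraphs]
    return max(answers, key=lambda a: a.confidence)


paper = [
    "We recruited 120 participants for a 12-week trial.",
    "Participants received either the drug or a matching placebo, "
    "and neither participants nor clinicians knew the assignment.",
]
best = was_double_blind(paper)
print(best.confidence)  # high: the second paragraph mentions a placebo
print(best.spans)
```

The point of the sketch is the shape of the computation, not the stub itself: each submodel only ever sees one paragraph and one narrow question, which is what makes its task tractable and its output (confidence plus spans) easy to check.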
Great question! Yes, this is definitely on our minds as a potential harm of Elicit.
Of the people who end up with one-sided evidence right now, we can probably form two loose groups:
For the first group – the accidental ones – we're aiming to make good reasoning as easy as (and ideally easier than) finding one-sided evidence. Work we've done so far:
For the second group – the intentional ones – we expect that Elicit might have a slight advantage right now over alternative tools, but longer-term it probably won't be more useful than other search tools that use language models with retrieval (e.g. this chatbot). And as Elicit and other tools that care about good epistemics improve, it will become easier to expose misleading arguments from this second group.