Can GPT-3 produce new ideas? Partially automating Robin Hanson and others

NunoSempere

Brief description of the experiment

I asked a language model to replicate a few patterns of generating insight that humanity hasn't really exploited much yet, such as:

Variations on "if you never miss a plane, you've been spending too much time at the airport".
Variations on the Robin Hanson argument of "for common human behaviour X, its usual purported justification is Y, but it usually results in more Z than Y. If we cared about Y, we might do A instead".
Variations on the genealogical argument: that the results of historical accidents are most likely not moral necessities or optimal systems.

Motivation behind this experiment

One of the reasons to be afraid of artificial intelligence is that, if you think in the abstract about how a system might behave as it becomes extremely intelligent, you might conclude that it could completely outmanoeuvre us because of its superior ability to grasp the true structure of the world.

This possibility is scary in the same sense that a modern chemist is scary to a historical alchemist. Our current chemist can completely outmanoeuvre previous alchemists by using their superior understanding of natural laws to produce better explosions, more subtle poisons, or more addictive and mind-blowing drugs.

I do buy this fear in the limit for a being of God-like intelligence. But it's not clear to me whether it also applies to current systems or whether it will apply to their close descendants. In particular, language models seem like powerful remixers and predictors, but perhaps limited to drawing from the conceptual toolkit which humans already have. On the other hand, because they have access to so much information, they might be able to be prompted so as to reveal new relationships, connections, and insights.

Some conceptual insights which have been historically important are:

Explaining natural phenomena not in terms of Greek or Roman anthropomorphic gods, but with reference to naturalistic, physical explanations
Understanding acceleration as distinct from motion
Science as an experimental methodology
The is/ought distinction
Bayesian reasoning
Ceasing to accept the divine right of kings as a justification for monarchical governance
Randomized trials as a more robust way of generating generalizable knowledge
The genealogical argument: understanding that systems (such as the details of the current prison system, our monetary system, the lack of color in men's clothes, or our attitudes towards gender and sex) are the result of historical accidents which could have gone differently. But often these systems are rationalized as being particularly adequate, or even morally necessary.

But I don't think that language models are currently able to come up with original insights like the above from scratch (this would be very scary).

Instead, I probe GPT-3's ability to come up with original variations of these three argumentative patterns:

Variations on "if you never miss a plane, you've been spending too much time at the airport".
Variations on the Robin Hanson argument of "for common human behaviour X, its usual purported justification is Y, but it usually results in more Z than Y. If we cared about Y, we might do A instead".
Variations on the genealogical argument: that the results of historical accidents are most likely not moral necessities or optimal systems.

The first pattern is known as an Umeshism. I associate the second pattern with Robin Hanson, who has spent part of a fruitful career exploring some of its variations, though he is also known for other ideas, e.g., prediction markets and grabby aliens. I associate the third pattern with Nietzsche (who used it to criticize Christian morals, which cherished meekness, by contrasting them with Greek morals, which cherished strength) and with Foucault (who used it to meaningfully criticize many aspects of contemporary culture, e.g., treatment of prisoners, our conception of knowledge, our relationship to sex, etc.).

Pause for the reader to make predictions about GPT-3's current capabilities.

Do you think that GPT-3 can produce thoughtful, original or insightful variations of the three argumentative patterns above? Consider writing down your predictions before reading further.

To occupy space so that you can make your predictions without reading the results below, here are two depictions of a cat, one from DALLE-2 and one from GPT-3.

A photo of a cat, generated by DALLE-2

 /\_/\
( o.o )
  >^<

If you never miss a plane...

Input to ChatGPT:

Consider the refrain "if you've never missed a plane, you are spending too much time at the airport". An argument with the same structure might be "if you haven't committed a typo, you are typing too slowly". Give me 20 other insightful examples with a similar argument structure.

Output:

If you've never overcooked a meal, you're not experimenting enough in the kitchen.
If you've never gotten lost while traveling, you're not exploring enough.
If you've never had a disagreement with a friend, you're not expressing your opinions honestly.
If you've never had a flat tire, you're not driving enough.
If you've never made a mistake at work, you're not taking enough risks.
If you've never had a bad hair day, you're not trying new hairstyles.
If you've never received a speeding ticket, you're not driving fast enough.
If you've never forgotten an important date, you're not committing enough information to memory.
If you've never stubbed your toe, you're not moving fast enough.
If you've never had a nightmare, you're not dreaming enough.
If you've never gotten a sunburn, you're not spending enough time outside.
If you've never lost something, you're not misplacing things often enough.
If you've never had a hangover, you're not drinking enough.
If you've never had a cold, you're not exposing yourself to enough germs.
If you've never had a broken bone, you're not participating in enough physical activity.
If you've never been rejected, you're not putting yourself out there enough.
If you've never failed a test, you're not pushing yourself academically.
If you've never had a disagreement with a family member, you're not expressing your thoughts and feelings openly.
If you've never gotten lost while hiking, you're not exploring new trails.
If you've never received a parking ticket, you're not driving in enough unfamiliar places.

We pretend that X is about Y, but it's more about Z. If we cared about Y, we would do A instead.

Input to ChatGPT:

Consider the argument "We pretend that healthcare is giving people better and longer lives. But in practice we spend too much in palliative rather than preventative medicine, and we use healthcare spending as a way to show that we care. If we really cared about giving people better lives, we would focus more on preventative measures, and we would have more large-scale randomized trials." The structure of the argument is "We pretend that X is about Y, but it's more about Z. If we cared about Y, we would do A instead." An argument with the same structure might be "We pretend that having experts is about having accurate information. But in practice it results in a self-perpetuating elite that doesn't hold itself accountable. If we cared about accuracy, we would use more prediction markets." Give me 10 more examples with that same structure.

Output:

We pretend that social media is about connecting with friends and family, but it's more about making money for the companies that own it. If we cared about meaningful connections, we would use social media differently.
We pretend that fast food is about convenience, but it's more about profit for the companies that sell it. If we cared about our health, we would eat more whole foods.
We pretend that standardized testing is about measuring student achievement, but it's more about ranking schools and teachers. If we cared about student learning, we would use a more comprehensive evaluation system.
We pretend that the criminal justice system is about justice, but it's more about punishment and control. If we cared about justice, we would focus more on rehabilitation and restorative justice.
We pretend that advertising is about informing consumers, but it's more about manipulating people into buying things they don't need. If we cared about informed consumers, we would have stricter regulations on advertising.
We pretend that the news is about informing the public, but it's more about generating clicks and views. If we cared about accurate information, we would support news organizations that prioritize accuracy and objectivity.
We pretend that the education system is about preparing students for the future, but it's more about preparing them for standardized tests. If we cared about student success, we would focus more on experiential learning and critical thinking skills.
We pretend that the fashion industry is about personal style, but it's more about making money for the companies that sell clothing. If we cared about personal expression, we would support more sustainable and ethically-produced fashion.
We pretend that the beauty industry is about helping people feel good about themselves, but it's more about promoting unrealistic beauty standards. If we cared about self-esteem, we would focus more on inner beauty and self-acceptance.
We pretend that the economy is about providing for people's needs, but it's more about maximizing profits for corporations. If we cared about people's well-being, we would prioritize a more equitable distribution of wealth and resources.

Input to text-davinci-003:

Consider the argument "We pretend that healthcare is giving people better and longer lives. But in practice we spend too much in palliative rather than preventative medicine, and we use healthcare spending as a way to show that we care. If we really cared about giving people better lives, we would focus more on preventative measures, and we would have more large-scale randomized trials." The structure of the argument is "We pretend that X is about Y, but it's more about Z. If we cared about Y, we would do A instead." An argument with the same structure might be "We pretend that having experts is about having accurate information. But in practice it results in a self-perpetuating elite that doesn't hold itself accountable. If we cared about accuracy, we would use more prediction markets." Give me ten examples of arguments with the same structure.

Output:

We pretend that having a degree is about having skills. But in practice it's more about having access to resources. If we cared about skills, we would focus more on experience-based learning.
We pretend that having a strong military is about protecting citizens. But in practice it's more about projecting power. If we cared about protection, we would invest more in diplomacy.
We pretend that having a high GDP is about having a strong economy. But in practice it's more about having a high population. If we cared about a strong economy, we would focus more on productivity.
We pretend that having a good education system is about having knowledgeable citizens. But in practice it's more about having a good economy. If we cared about knowledgeable citizens, we would invest more in lifelong learning.
We pretend that having a high rate of consumerism is about having a successful economy. But in practice it's more about having an unequal distribution of wealth. If we cared about a successful economy, we would focus more on consumer savings.
We pretend that having a lot of international trade is about having a vibrant economy. But in practice it's more about having a large number of consumers. If we cared about a vibrant economy, we would focus more on domestic trade.
We pretend that having a lot of regulations is about having a safe and orderly society. But in practice it's more about having a strong government. If we cared about a safe and orderly society, we would focus more on personal responsibility.
We pretend that having a lot of police officers is about having a safe community. But in practice it's more about having a strong law enforcement presence. If we cared about a safe community, we would focus more on prevention and education.
We pretend that having a lot of taxes is about having a functioning government. But in practice it's more about having a redistribution of wealth. If we cared about a functioning government, we would focus more on spending cuts.
We pretend that having a lot of surveillance is about keeping citizens safe. But in practice it's more about having control over citizens. If we cared about keeping citizens safe, we would focus more on transparency and accountability.

Note that text-davinci-003 seems "sharper" here.
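The two prompts in this section share a shape: a worked example, the abstract pattern, a second example, and a request for more. A minimal sketch of assembling such a prompt is below. The function name `build_pattern_prompt` and its parameters are hypothetical and were not part of the original experiment; the sketch only builds the prompt string and does not call any model.

```python
def build_pattern_prompt(examples, pattern, n_requested):
    """Assemble a prompt: first example, abstract pattern, further examples, request.

    Hypothetical helper for illustration; it only concatenates strings.
    Each example is expected to end with its own full stop.
    """
    if not examples:
        raise ValueError("at least one worked example is needed")
    parts = [f'Consider the argument "{examples[0]}"']
    parts.append(f'The structure of the argument is "{pattern}"')
    for extra in examples[1:]:
        parts.append(f'An argument with the same structure might be "{extra}"')
    parts.append(f"Give me {n_requested} more examples with that same structure.")
    return " ".join(parts)


PATTERN = (
    "We pretend that X is about Y, but it's more about Z. "
    "If we cared about Y, we would do A instead."
)

EXAMPLES = [
    "We pretend that healthcare is giving people better and longer lives. "
    "But in practice we spend too much in palliative rather than preventative "
    "medicine, and we use healthcare spending as a way to show that we care. "
    "If we really cared about giving people better lives, we would focus more "
    "on preventative measures, and we would have more large-scale randomized trials.",
    "We pretend that having experts is about having accurate information. "
    "But in practice it results in a self-perpetuating elite that doesn't hold "
    "itself accountable. If we cared about accuracy, we would use more "
    "prediction markets.",
]

prompt = build_pattern_prompt(EXAMPLES, PATTERN, 10)
print(prompt)
```

Keeping the pattern and the examples as separate arguments makes it easy to try the same pattern with different seed examples, or to ask for a different number of outputs, when comparing models.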

If X was a historical accident, then this is a hint that this isn't the best or moral choice.

Input to text-davinci-003:

Consider the argument "The fact that men don't wear clothing with adornments is a historical accident. We could easily imagine a world in which this wasn't the case, look to other societies elsewhere or in the past, or read the wikipedia page about the great male renunciation. Therefore the fact that men don't wear elaborate clothing isn't a moral necessity, but rather a historical accident."
Another argument of this form might be "Prisons as currently structured were influenced by Bentham's panopticon and other historical accidents. Other societies have different forms of prevention, punishment and remediation. So we don't have to have the current prison system, but instead could choose to have other alternatives".
A third example might be "Our current conception of morality is heavily influenced by Christianity. In particular, we can look about how we spend great efforts helping the weak, and how we view the powerful with suspicion and resentment. But the Greeks, for example, had a different morality, where they cherished strength. Makes you think."
The general pattern is "X is the result of a historical accident, as we can see because of Y. Z is a society which doesn't have this. From this, we can start to suspect that X is not the best or most moral choice. We could have other choices, such as A."
Give me a bullet point list with ten other examples.

Output:

Our current educational system is heavily influenced by the industrial revolution, but in other societies, such as the Amish, education is focused on vocational skills. This suggests that our current system is not the only way to teach children.
The way in which we view mental health is heavily influenced by the medical model, which has been the dominant paradigm for the past century. Other societies have different ways of thinking about mental health and wellness, suggesting that the medical model is not the only way to think about mental health.
The way in which we view gender roles is heavily influenced by the Judeo-Christian tradition. In some societies, gender roles are much more fluid or non-existent. This suggests that our current gender roles are not the only way to structure society.
The way in which we make decisions is heavily influenced by our reliance on data and algorithms. In some societies, decisions are made more intuitively or by consensus. This suggests that data-driven decision making is not the only way to make decisions.
The way in which we view success is heavily influenced by our views on status and social capital. In other societies, success is viewed much more holistically and with an eye towards sustainability.
The way in which we view relationships is heavily influenced by our culture of individualism. In other societies, relationships are seen as interdependent, with a strong emphasis on mutual support.
The way in which we view work is heavily influenced by our culture of productivity. In other societies, work is seen as an opportunity for self-expression and creativity.
The way in which we view aging is heavily influenced by our culture of youth. In other societies, aging is seen as an opportunity for growth, learning and wisdom.
The way in which we view death is heavily influenced by our culture of life. In other societies, death is seen as an inevitable part of life and is not feared but accepted.
The way in which we view money is heavily influenced by our culture of materialism. In other societies, money is seen as a tool for social good and is not a goal in and of itself.

Conclusion and my personal perspective

In conclusion, I think the above shows that current GPT-3 can generate somewhat original outputs by exploiting fertile argumentative patterns. This is still far from, e.g., inventing new types of science, or generating much better concepts. At the same time, it does feel like one could use this kind of system to generate thought-provoking blog posts using Robin Hanson's signature argumentative structure. I was disappointed that the model wasn't really able to produce fleshed-out instances of the genealogical argument after a while, but it's possible that this is an artifact of how I prompted it, or that it will be achievable with the next generation of models.

I also noticed that both davinci-003 and ChatGPT have tendencies to steer towards milquetoast, politically correct outputs, whereas for my purpose I want to generate more assumption-challenging stuff. This was a source of personal irritation, and because davinci-003 seemed to be less censored, I've been recommending that people use it instead of ChatGPT. Here is a similar recommendation by nostalgebraist.

Acknowledgements

This is a project of the Quantified Uncertainty Research Institute. Thanks to Ozzie Gooen, Nics Olayres, Michael Dickens and Misha Yagudin for various comments.

Jonathan Claybrough, Jan 17 2023, 15

I overall feel I learnt nothing new from the generated answers and could recognize existing inspiration. ChatGPT is valuable at coming up with a bunch of stuff fast, but I'm not impressed by the quality itself.
Specifically, in the first 20 examples, I'd say over half of them are mostly false (i.e., I would not follow the advice, and I think there are good reasons not to follow it): 2 3 4 6 7 8 10 11 13 16
Others are uninteresting.
Just 2 I found both mostly true and mostly novel ("If you've never had a cold, you're not exposing yourself to enough germs" and "If you've never received a parking ticket, you're not driving in enough unfamiliar places.")

I found the Robin Hanson versions mostly uninteresting.
I found the historical accident versions mostly uninteresting.

janus, Jan 17 2023, 7

If you want uncensored and creative outputs I recommend using code-davinci-002, the GPT-3.5 base model. It has helped me develop many original ideas. Because it's a base model you'll have to be more creative with prompting and curation, though.

NunoSempere, Jan 17 2023, 4

Thanks. Did you arrive at this independently, or are you mirroring nostalgebraist's recommendation? I'm guessing independently, but wanted to confirm.

Yonatan Cale, Jan 18 2023, 4

I expected 5% of the answers to be newish and useful, which I'd interpret as "yeah it could be mildly useful to ask for such insights and spend the time to scan through them. not crazy-low nor crazy-high productivity".

In practice, this one was new and maybe useful for me:

If you've never had a disagreement with a family member, you're not expressing your thoughts and feelings openly.

(I remembered it as "if you've never had a fight", funny to quote it now and see my memory wasn't accurate)

I also started reading "The Elephant in the Brain" following this post :)

Jared Winslow, Jan 17 2023, 3

Coming into this, I expected GPT to write clear, grammatically-correct responses that were neither conceptually cohesive nor logically consistent.

Following this idea, the first thing I analyzed was the logical connections between the topics chosen in GPT's responses.

Overall, I think GPT3 performs best at the Robin-Hanson-esque task 2, followed by task 1, then the historical accident task 3. While most of the responses for task 2 were logically inconsistent or nonsensical, a few examples were consistent and even a little insightful. The three I think best, in ascending order, are:

We pretend that the education system is about preparing students for the future, but it's more about preparing them for standardized tests. If we cared about student success, we would focus more on experiential learning and critical thinking skills.
We pretend that the beauty industry is about helping people feel good about themselves, but it's more about promoting unrealistic beauty standards. If we cared about self-esteem, we would focus more on inner beauty and self-acceptance.
We pretend that the economy is about providing for people's needs, but it's more about maximizing profits for corporations. If we cared about people's well-being, we would prioritize a more equitable distribution of wealth and resources.

Each of these responses seems to have a logical base point, but the lack of further specificity leads to un-insightful answers. The response on the education system, while common knowledge and not novel, seems somewhat insightful in mentioning critical thinking. The rest of the responses were quite poor, with the notable reasons being: choosing an alternative path (A) that didn't relate to Y (e.g., providing rehabilitation as a way of finding justice), mismatching the subject of the clause experiencing Y and the subject of the clause experiencing Z (e.g., mentioning consumers' goals, then mentioning companies' goals), and general vagueness (e.g., "use social media differently").

Task 1 seems hit and miss.

If you've never had a disagreement with a friend, you're not expressing your opinions honestly.
If you've never had a flat tire, you're not driving enough.

Some are good but obvious, some novel but useless, and others nonsensical.

For task 3, the main issue seems to be lack of specificity. So while the connections between chosen topics are existent, the broader point made in a response is weak. The distinction between culture and non-culture seems lost upon GPT3. Making "data-driven decisions" isn't a unicultural phenomenon.

Overall, I was surprised by GPT3's logically consistent responses. Then again, given enough tries, monkeys on typewriters... GPT3 still doesn't have any conceptual understanding of words, exemplified in the incohesive content behind its reasonably clear grammatical form.

These are impressive responses from GPT3 in the sense that many people would find a number of the responses insightful on first reading. It's unclear whether it could come up with completely original ideas, but with further advancements, that could soon be a possibility.

Effective Altruism Forum