
Today I'm releasing ExoPriors Scry, a new research tool over the EA Forum, arXiv, HackerNews, community-archive.org, and more. Try it here.

TLDR

  • You and your agent can now query this rich dataset with the full expressive power of SQL (Structured Query Language) + embedding vector algebra. You DON'T have to be familiar with either: Claude writes the search queries for you. Just ask Claude whatever questions you have about the tool.
  • It's best driven via Claude Code (with Opus 4.5). I highly recommend getting acquainted with it. It can be driven by Claude in the browser--you just have to change one setting and paste in the prompt--but it's significantly slower and less agentic.
  • Even with all the fanciness of AI so far, including Deep Research agents, this is a novel search/research experience that lets you do things you can't do anywhere else.
  • Exploration is encouraged! Text embeddings are kind of magical. This is an easy way to get more familiar with them, for example by composing "vibes" like @FTX_controversy - (@guilt_tone - @guilt_topic) to find writing that is about the FTX crisis and distinctly free of guilty tones, but that can still mention "guilt".
  • There's an Alerts functionality. You can be emailed when some new post, comment, arXiv paper, etc., is ingested and meets your very nuanced criteria.
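To make the "vibe" arithmetic concrete, here is a minimal sketch in plain Python. It is not how Scry implements @vector_handle; the 3-dimensional vectors, the axis meanings, and the document names are all made up for illustration (real embeddings have far more dimensions and no labeled axes). It only shows why subtracting (tone - topic) from a topic vector pushes guilt-toned writing down the ranking without penalizing the word "guilt" itself.

```python
import math

def sub(a, b):
    """Element-wise vector subtraction."""
    return [x - y for x, y in zip(a, b)]

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings. Assumed axes, for illustration only:
# (about FTX, mentions guilt, sounds guilty)
ftx_controversy = [1.0, 0.0, 0.0]
guilt_tone = [0.0, 1.0, 1.0]   # guilt-ridden writing: mentions guilt and sounds guilty
guilt_topic = [0.0, 1.0, 0.0]  # writing about guilt in a neutral voice

# @FTX_controversy - (@guilt_tone - @guilt_topic)
# The inner difference isolates the "sounds guilty" direction.
query = sub(ftx_controversy, sub(guilt_tone, guilt_topic))

# Hypothetical documents to rank against the composed query.
docs = {
    "ftx_neutral_mentions_guilt": [1.0, 0.5, 0.0],
    "ftx_guilt_ridden": [1.0, 0.5, 1.0],
    "unrelated": [0.0, 0.0, 0.2],
}

ranked = sorted(docs, key=lambda name: cosine(query, docs[name]), reverse=True)
for name in ranked:
    print(f"{name}: {cosine(query, docs[name]):.3f}")
```

In this toy setup the FTX document that mentions guilt neutrally ranks first, and the guilt-ridden one falls behind it even though both mention guilt equally.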

Some example usage:

> find the 4 posts over 100 karma that are most distant from each other in every way (not the average of them). we want to create 4 quadrants.

 > what are the most whimsical and uplifting EA Forum posts?

 > find me the most serious documents about civilizational refuges, and anyone who could be able to be rapidly mobilized to help with that.

> who flip flopped multiple times on FTX crisis, being against -> for sam -> against, or for sam -> against -> for sam, in time

> Which EA Forum posts predicted things about AI that turned out to be wrong, and what were the common failure modes? 

 > exhaustively explore who has tended to coin the most new terms 

> the people who have had very uplifting and whimsical EA posts, how are they doing. does anyone seem like they could use support. followed up with: search for all of lesswrong, twitter, etc., of people who plausibly feel abandoned by EA. do sophisticated search.

^ I got a very meaningful response, but will leave out results for privacy.


Some cool aspects of this project:

  • hardening up a SQL database enough to let the public run queries. There's so much collective trauma about SQL injection attacks that most people have forgotten that this is possible.
  • I've built in syntactic sugar for using custom vectors. Agents can embed arbitrary queries and refer to them with @vector_handle syntax. This compactness helps agents reason efficiently, and lets us avoid passing around 8kb vectors.
  • product-as-a-prompt, agent-copilot-targeted UX as a paradigm. It was pretty cool realizing I could e.g. just describe my /feedback API endpoint in the prompt, to open up the easy communication channel with users and help me iterate on the project better.
  • There are currently over 23M embeddings and 600 GB of indexes for query performance, and long query time limits. You can learn a lot about the actual shape of the world with exhaustive search, like finding that no single person across Hacker News mentions a combination of particular topics across their comments.
  • there's an Alerts feature. The system ingests thousands of papers, posts, articles, and comments a day. You can specify an arbitrary SQL query that will run multiple times a day, and get an email when the output changes. If you find a query you like, you can just tell Claude to submit that query as an Alert (and it'll use our API appropriately).
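The "email when the output changes" idea behind Alerts can be sketched as a fingerprint comparison. This is an assumption about one plausible approach, not Scry's actual implementation: `fingerprint`, `check_alert`, and the `run_query` callable are hypothetical names, and the rows here are hard-coded stand-ins for real query results.

```python
import hashlib
import json

def fingerprint(rows):
    """Hash a query result so two runs can be compared cheaply.

    Rows are sorted first, so a change in row order alone does not
    count as a change in output.
    """
    canonical = json.dumps(sorted(rows), separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def check_alert(run_query, last_fingerprint):
    """Run the saved query; return (changed, new_fingerprint).

    `run_query` is a hypothetical zero-argument callable that returns
    the query result as a list of rows.
    """
    current = fingerprint(run_query())
    return current != last_fingerprint, current

# Illustration with stand-in results instead of a real database.
stored = fingerprint([[1, "first post"], [2, "second post"]])

# Same rows in a different order: no alert.
changed, stored = check_alert(
    lambda: [[2, "second post"], [1, "first post"]], stored
)
print(changed)

# A newly ingested row matches the query: alert.
changed, stored = check_alert(
    lambda: [[1, "first post"], [2, "second post"], [3, "new paper"]], stored
)
print(changed)
```

A scheduler would call `check_alert` a few times a day and send the email whenever `changed` is true, storing the new fingerprint for the next run.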

Bounty Offer

When I'm in the mindset of building this, I tend to struggle to come up with interesting research questions. I will pay $12/$8/$4 for the three most interesting queries (rough prompt + SQL) provided in the comments, on January 1st.

 

Happy to take any feedback!

P.S. If anyone wants to sponsor this, I could ingest a large fraction of all academic papers, Bluesky, and dozens of other sources that you point out.

Comments (1)

I'm curious about how representative the EA Forum is of the EA community, particularly with regard to worldview. One thing you could try is to take the EA community surveys done by RP, and estimate how a random representative of "the forum" would answer, where the random representative might be a random user or a random comment/post weighted by karma or amount of text.
