From job posting to hire: templates, sourcing campaign, and LLM-resistant tasks

Romain Barbe🔸

This document explains how Mieux Donner ran its 2026 hiring round: how we decided what to hire for, how we built the offer and the process, the results we got, and what we would tell another organisation doing roughly the same. It is meant to be reused. We also recommend reading the chapter on hiring in “How to Launch a High-Impact Nonprofit”.

Mieux Donner is the French effective giving initiative, incubated through Ambitious Impact (AIM) and Giving What We Can in 2024. We were roughly 2 FTE, have directed over €1M to high-impact charities at a giving multiplier of 5–6x, and are now looking to expand the team.

I used AI to help analyse the applications (without sharing applicant data) and to correct my speech-to-text transcription.

A note for applicants: this document is written for people running a hiring process, not for people applying to one. Reading it will probably not help you, and we do not really advise it. Knowing how a process is designed could be useful if you are applying to a government body or a highly paid position, but our process is unlikely to resemble those. And if you are applying to an EA-inspired organisation, there is no point in trying to game the process: you would be taking the role from someone who could have a bigger impact in it than you.

On confidentiality. The text of this document is not confidential and you are welcome to reuse it. The underlying materials, namely the exact questions, the practical tasks, the response emails and the weighted factor model (WFM) we score candidates with, live in a separate folder we keep confidential to protect the integrity of the process. You can request access; we only grant it to people in a managing or hiring position writing from an organisation email.

2026 at a glance

Positions planned	2
Roles opened	4 (see why below)
Applications received	424
People hired	3 (2 planned + 1 part-time)
Process	4 steps, about 6 hours per candidate who goes the distance
Our time invested	about 160h: 20h design, 40h outreach, 100h reviewing

1. Deciding what to hire for

Budget and contract

We had secured funding for another year of operations and were confident that growing the team was the right next step. We hire on permanent contracts with a trial period.

Why we opened four roles to hire two people

We knew we wanted to grow, but not where extra effort would pay off most: more SEO content, podcast outreach, an ambassador network, relationships with high-net-worth individuals, corporate partnerships, website optimisation, social media, newsletter. We had cost-effectiveness models and had compared notes with other effective giving initiatives, but real uncertainty remained. So rather than pre-defining two narrow roles and hoping the best candidates would fit them, we opened four broader roles and optimised for finding the best possible person, then shaped a role around them. We set out the full rationale in an earlier post on the EA Forum (“Why we opened 4 roles to fill 2 positions”).

What opening four roles did to the applicant pool. Opening more roles increased the total number of applications, but volume was shaped more by the nature of each role than by the count. What mattered most was whether the role title matched terms people actually searched for. People browse job boards by function: "communications", "operations", "fundraising". A role that maps onto a familiar function attracts strong volume; a role described in terms that do not match how candidates describe themselves, whether because the title is too specialist, too vague (“generalist”), or simply uncommon, will attract far fewer applicants regardless of how well the underlying work is described.

Role	Applications	Passed step 1
Communications & Partnerships (easy to attract)	161	42%
Operations (broad, generalist scope)	124	46%
Director of Philanthropy (we actively pushed sourcing)	102	49%
Growth Hacker (specific skill set)	33	39%
Applied to several roles and did not pass step 1	4	0%

We received no negative feedback from applicants about opening more roles than we intended to fill. Some applicants did not even realise the other roles existed, and it did not seem to change their experience. We are unsure whether stating that we were looking for “exceptional people” deterred anyone, since we cannot observe those who chose not to apply, but we know this framing attracted some people.

Despite stating clearly in the FAQ that candidates could only apply for one role, a meaningful number at step 1 applied for several. We asked them to pick the one where they felt they had the strongest chance of performing well in the exercises, and told them we would have time later in the process to build a role that combined their skills if needed.

Choosing what to test for

We define the tasks and questions around how we expect the person to actually spend their time on the job: the biggest chunks of real work become the biggest parts of the evaluation.

2. The offer

Open or closed round?

This section is mainly based on a discussion during a knowledge-sharing session on hiring at EAG London 2026.

Before writing anything, decide whether to run an open round (a public posting) or a closed one (targeted outreach only). A few questions to ask yourself:

How likely is it that the person you want is only one or two degrees of separation away from you?
How big is the pool you already know?
How specific is the role? Could a capable outsider do it well, or does it need rare prior knowledge?
How much time can you spend on the process?
How fast do you need someone?

We would still open a round publicly at least once a year, to give strong people outside your network a chance, and because a public round has communication benefits of its own (see the landing-page section below).

An argument we are sceptical of. “If you are looking for a more senior profile, a closed round is better” came up several times. We are dubious: years of experience are not well correlated with performance. Seniority may matter only where the role involves counterparts who would judge a candidate on visible age or seniority.

A public offer, and why

We publish the full offer on our website as a landing page with a very detailed FAQ, then share it widely: EA job boards, our newsletter, social media, and essentially every free job platform in France, plus a one-day free LinkedIn boost. Publishing the full offer publicly has a side benefit: it generates backlinks and SEO value, which builds search-engine confidence in the site.

The landing page is worth real effort. We worked hard to make ours attractive and received many compliments; a few people who were not job-hunting told us they reconsidered after reading it. We also used the page to make the case for effective giving: many readers of the Director of Philanthropy page already worked in foundations or philanthropy advising, so the page introduced effective giving to thousands of people well beyond our hiring target. Hosting the offer on our own domain also drove strong, qualified traffic: the landing page drew more than 3,000 views, and each role page averaged over 2,000 views at roughly two minutes each, a strong engagement signal to Google that helps SEO.

A webinar was worth it. We ran a live Q&A webinar and found it well worth doing: 120+ people registered and about 70 attended live, and many candidates later watched the recording and were grateful for it. A version we only advertised on LinkedIn got little organic traction. Between the webinar and the detailed FAQ, candidates had almost no remaining questions. Some people still asked for a call, ostensibly to ask us questions; we replied that everything was already documented, but invited them to send any questions in writing and promised to fold the answers into the FAQ.

Sourcing

Keep a running target list. If your round is open, and even more if it is closed, this matters. In the six months before opening, I noted everyone I found impressive or whose profile matched a topic we work on. By launch I had about 50 people I invited to apply and about 50 more I invited to share the offer with their networks. A newsletter willing to share your offer with the right audience would be valuable, but I did not find one that accepted.

Make the referral ask actionable. I used Happenstance (which runs boosted search across your LinkedIn network) to find about 30 more people. One sourcing tip worked particularly well: instead of asking “can you think of one or two people who would fit?” (to which few people come up with names), ask “can you run your LinkedIn contacts through Happenstance and send me the list?” It is a bigger ask, but almost everyone said yes and gave far more names.

High Impact Directory. We used the High Impact Directory by High Impact Professionals: we looked for people who matched our criteria and were interested in the roles (speaking French was the most limiting criterion) and emailed them the offer, though many of those emails landed in spam.

Salary

The salary philosophy we state publicly in the offer:

Our approach to remuneration rests on two principles: fairness to our team and responsibility to our beneficiaries. Pay should be enough to avoid frustration or financial stress, so everyone feels fairly valued. As a non-profit we balance that against directing as much funding as possible to our beneficiaries: we do not try to match private-sector pay, but to keep salaries fair and consistent with our mission. Needs vary, so we ask candidates to be open about their expectations, and we agree a fair package with the person we hire.

Practical choices:

Range. €30,000–45,000 net per year, up to €60,000 net per year for exceptional cases: high, but not very high, for salaries in France.
Framing. We tell candidates upfront that salary is not a negotiation but a balance between their needs and our ability to help more, and we share an excerpt from Charity Entrepreneurship's book on salaries before the interview.
Weight. In our WFM, salary expectations carry a 5% weight: modest, but enough to matter at the margin, since at the low end of the range we could fund roughly three hires for the cost of one at the top.

What we ask candidates to reflect on, for the last round of the process. Publishing a range anchors expectations: most of our top candidates land near the top of the range we publish. We want them to answer not “what salary would make me happiest?” but “what salary is high enough that I will not be financially stressed?”; the two can differ. We share the director's (Romain's) salary, which is low and the lowest in the organisation. People often revise their ask down, sometimes by thousands of euros. We also tell candidates we will not negotiate, and that a high ask takes budget from other campaigns and makes them less competitive: we might hire someone else simply because they would cost less.

Referrals

We run a referral bonus: €300 to you, or €500 donated to the charity of your choice, paid €150 on hire and €150 after six months. Referrals must reach us (by direct message or email, naming someone who has agreed to be contacted) before the person applies. In practice we saw barely any signal that it changed outcomes, and no pushback about offering it.

3. The process

Heavily adapted from Charity Entrepreneurship's current processes and their book “How to Launch a High-Impact Nonprofit”. Four steps, about 6 hours total for a candidate who goes all the way:

Written application (about 30–60 min). Three questions on motivation, thinking and fit, plus a LinkedIn profile or CV. Acceptance here is relatively high (around 45%) but requires actually understanding what Mieux Donner does.
Practical exercises and tests (about 1.5h). Three exercises assessing adaptability and reasoning, plus a personality test (looking only at personality traits correlated with performance) and a problem-solving test (run on TestGorilla).
Coworking session (about 1.5h). After 30 minutes of preparation, one hour working on a real project with the director. A “working with me” document is shared before the session. The director acts as an executor and knowledge base: he answers questions about Mieux Donner and carries out tasks the candidate assigns to him, but he does not hint at what he would do in their place.
Final interview (about 1h). A conversation with the cofounders on behaviour, work preferences and view of impact, in French and English.

About TestGorilla: We ran both the Big Five and the problem-solving test on TestGorilla. At the time we used it, the free tier allowed unlimited responses on both tests; their pricing policy appears to have changed since, so check current terms before relying on the same approach. We were genuinely happy with the platform. TestGorilla has strong built-in mechanisms for detecting suspicious behaviour, and across roughly 200 candidates who completed the tests, almost none triggered any flag. We also saw no strong evidence that candidates gamed the personality test: the distribution of Big Five scores across our pool followed a roughly normal curve centred near 50%, consistent with honest responses rather than strategic self-presentation.

Defining the weights

We weight each step roughly by the share of real job-time it represents, with deliberate exceptions. The written application gets only 15%, because we don't think it reflects candidate quality well; it is mainly a gate to see who clears the bar. The breakdown:

Written application: 15%
Autonomous long task: 19%
Personality and problem-solving tests: 6%
Coworking session: 20%
Final structured interview: 25%
A final 15% covers global criteria:
- Conscientiousness in the process: responsiveness to follow-ups and respect for task instructions.
- Keenness on the organisation: genuine engagement with the mission.
- Conflict potential: whether the person seems unlikely to create drama.
- Reference call.
- Personal fit.
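Here is a minimal Python sketch of how a weighted factor model can combine the stage scores above into one total. It assumes every criterion is scored on a 0-to-10 scale and that the candidate has been scored on every stage. The criterion keys are hypothetical names. The sketch does not model how candidates who stopped at an earlier stage are compared.

```python
# Hypothetical sketch: the weights are the ones listed above; the keys are
# illustrative names, not the labels used in Mieux Donner's actual WFM.
WEIGHTS = {
    "written_application": 0.15,
    "autonomous_long_task": 0.19,
    "tests": 0.06,  # personality + problem-solving tests
    "coworking_session": 0.20,
    "final_interview": 0.25,
    "global_criteria": 0.15,  # conscientiousness, keenness, conflict, reference, fit
}


def wfm_total(scores: dict[str, float]) -> float:
    """Combine 0-10 criterion scores into a single weighted 0-10 total."""
    if abs(sum(WEIGHTS.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise KeyError(f"missing scores for: {sorted(missing)}")
    for name, value in scores.items():
        if not 0 <= value <= 10:
            raise ValueError(f"{name} score {value} is outside 0-10")
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())


# Hypothetical candidate.
candidate = {
    "written_application": 6,
    "autonomous_long_task": 8,
    "tests": 5,
    "coworking_session": 7,
    "final_interview": 8,
    "global_criteria": 7,
}
print(round(wfm_total(candidate), 2))  # 7.17
```

The weights sum to 1, so the total is expressed on the same 0-10 scale as each criterion.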

Rating scale and calibration

Each criterion has a predefined 0 to 10 scale with explicit anchors (what a 6 or an 8 looks like). The first time we scored without anchors, ratings drifted; anchoring made them repeatable even a month apart. We noise-tested this: about a month after completing the application stage, we re-scored five applications without first looking at our original scores. After this noise test, several applicants also turned out to have applied through two different channels. Because we read the answers before the CV, we did not recognise them until we reached the CV, which forced a genuinely fresh evaluation. When we compared the two sets of scores, none of them changed significantly. We consider this a reasonable signal of consistency. Defining anchors before you start is what made this possible.
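A noise test like this one can be summarised with two numbers:

- the average absolute change between the first and second scoring;
- the list of applications that moved by more than some tolerance.

The sketch below assumes 0-to-10 scores and a one-point tolerance as the cut-off for a "significant" change. Both the tolerance and the example scores are hypothetical.

```python
from statistics import mean


def noise_test(first: dict[str, float], second: dict[str, float],
               tolerance: float = 1.0) -> tuple[float, list[str]]:
    """Compare two blind scorings of the same applications (0-10 scale).

    Returns the mean absolute difference and the applications whose score
    moved by more than `tolerance` points (an assumed "significant" cut-off).
    """
    shared = sorted(first.keys() & second.keys())
    if not shared:
        raise ValueError("no application was scored twice")
    diffs = {app: second[app] - first[app] for app in shared}
    mean_abs_diff = mean(abs(d) for d in diffs.values())
    moved = [app for app, d in diffs.items() if abs(d) > tolerance]
    return mean_abs_diff, moved


# Hypothetical scores for five re-scored applications.
original = {"A": 6.5, "B": 7.0, "C": 4.0, "D": 8.0, "E": 5.5}
rescored = {"A": 6.0, "B": 7.5, "C": 4.0, "D": 7.5, "E": 6.0}
print(noise_test(original, rescored))  # (0.4, [])
```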

All candidates across all four roles were scored in the same weighted factor model. The exercises and their anchors differed by role, but we designed them to produce comparable scores, so that a 7 on the philanthropy task and a 7 on the growth hacker task reflected a similar level of performance relative to what that role required. This made it possible to compare candidates across roles and to apply a single passing bar across the whole round.

Designing questions LLMs fail, and benchmarking against them

For each question and task we first run the top LLMs and read their answers, to find where they underperform (for example, they often claim all donations are equally good, which is not an effective-giving stance). We then set the WFM so that a typical ChatGPT answer scores about 5 out of 10. We judge answer quality independently of AI use, except where an answer keeps obviously unnatural AI phrasing: we want people who can produce high-quality, Mieux-Donner-style work, with or without AI. Some candidates used AI well (feeding it our site, using precise prompts) to get past generic answers, which we are fine with.

Making the tasks LLM-resistant

We built 16 tasks across the four roles under one constraint: the best exercises should be hard to do well with AI alone. What we found actually works:

Strategic self-analysis. Reflecting on your own trajectory, identifying where you genuinely excel and where you would struggle, requires real self-knowledge that AI cannot supply.
Outreach requiring a light touch. For a message to a specific organisation or an email for a targeted publication, AI tends to produce output that is technically correct but slightly off: too direct, too promotional, missing the implicit codes of the relationship. Proposing a message you would truly be ready to send is still hard for AI.
Prioritisation where the obvious answer is wrong. Given a list of event attendees or potential ambassadors, AI picks the most famous names (ministers, the biggest YouTubers, highest status). Strong candidates identify less obvious but more strategically relevant profiles.
Tasks that reward using AI well. We include exercises where candidates are expected to use AI efficiently, such as generating a functional plugin from a brief or producing a publishable draft article with images. This tests whether they can harness AI's full capacity. These questions produced the widest spread of scores: even with AI, plenty of people got 3/10, while strategic thinking and good prompting led to a 9 or 10.
Niche legal or fiscal questions. Specific questions on niche law or fiscal edge cases stay genuinely hard for LLMs. A useful tell: ask the same question framed as “is this legal?” and then as “is this illegal?”, and you will often get yes to both.
Multimedia inputs. Sending a long video clip and asking for feedback on a specific two-minute section adds friction that prevents lazy AI use. Only genuinely tech-savvy candidates process it efficiently with AI, and if they do, they have earned the score.

The honest conclusion. It will keep getting harder to find tasks that a talented human can do in 30 minutes but AI cannot. Treat your task list as a living document, not a one-time build, and re-benchmark against the newest models the day before you launch: between rounds, some questions we relied on were newly mastered by the latest models (I haven't tested Fable 5).

Emails, handling, tracking

We write parts of every response email, positive and negative, for every stage, in advance, so processing is fast. Applications are handled every day or two; each candidate gets a half-personalised rejection (with specific feedback on each exercise) or an invitation to the next step. For scheduling, we ask candidates for their availability and schedule the email launching the next step accordingly. Later steps use booking links: the director's calendar for the coworking session, which needs a 30-minute prep, and the co-founders' calendar for the interview. We send a single chase to non-responders.

How we communicate rejections. Up to and including step 3 (the coworking session), rejections are sent by email with per-exercise feedback. Candidates who reach the final interview are called directly: we explain the specific reasons for the decision.

One person ran the entire process from start to finish. The advantage is no noise in the evaluation: one consistent set of standards throughout. The disadvantage is obvious: the process is entirely dependent on that person's availability. In our case that person was also the only member of the organisation, which is not ideal.

Value personalised feedback

We received a large number of positive responses to our rejection emails: roughly half of them got a reply, only two of which were negative. Several candidates proposed a follow-up call about effective giving. More than 20 people connected with Romain on LinkedIn after being rejected, and several commented positively on his LinkedIn posts in the weeks that followed.

Because we use clear metrics to evaluate each exercise, we already know exactly what a candidate did well and where they fell short. Writing a personalised rejection takes up to one minute more per applicant than sending a generic email, and perhaps two minutes more than not replying at all. Across the full round, that is roughly four to six extra hours compared to ghosting. We do this mainly for deontological reasons. I also believe it makes me feel better about the process, has positive effects, and gives people a better image of Mieux Donner and effective giving.

Minimum scores for progression

We cap the expensive late stages: we wanted roughly 10 final interviews and 30 coworking sessions. We set each stage's passing score once a batch of candidates had reached it, aiming for about a 30% acceptance rate per stage, looser at step 1, tighter at step 2.
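One way to set a stage's passing score from a batch, as described above, is to rank the batch and take the score of the candidate at the target acceptance rate. This sketch rests on three assumptions:

- the batch scores are hypothetical;
- ties at the threshold all pass, so the realised rate can exceed the target;
- manual exceptions like those described next are not modelled.

```python
def passing_threshold(batch_scores: list[float], target_rate: float) -> float:
    """Score a candidate must reach so that roughly `target_rate` of the
    batch passes (everyone tied at the threshold passes too)."""
    if not batch_scores:
        raise ValueError("empty batch")
    if not 0 < target_rate <= 1:
        raise ValueError("target_rate must be in (0, 1]")
    ranked = sorted(batch_scores, reverse=True)
    n_pass = max(1, round(target_rate * len(ranked)))
    return ranked[n_pass - 1]


# Hypothetical step-2 totals for one batch of ten candidates.
batch = [8.2, 7.9, 7.4, 7.1, 6.8, 6.5, 6.0, 5.5, 5.1, 4.0]
bar = passing_threshold(batch, 0.30)
print(bar, sum(score >= bar for score in batch))  # 7.4 3
```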

We did make exceptions. A volunteer whose qualities we already knew from prior collaboration passed a step despite being 0.1 points below the threshold. A candidate who had produced exceptionally high-quality content as part of their application was passed through despite being 0.4 points short. In a third case, a candidate whose task scores were almost all 9/10 did not formally pass because their personality and problem-solving test scores were close to zero, leaving them 0.1 or 0.2 points below the bar overall. We let these exceptions through, a decision we are still not certain was right.

We told candidates explicitly where they stood, including where they ranked among applicants (for example, "top 10% after step 2"). At the final interview invitation, we told candidates directly that we estimated their odds of receiving an offer at 20–30%. Our goal was to keep strong candidates engaged through a long process, make them feel genuinely valued, and be transparent in a way that respected their time and helped them calibrate their own decisions.
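Labels such as "top 10% after step 2" can be derived from the funnel counts. The sketch below assumes the share is computed against all applicants. It rounds up to a fixed set of reporting buckets so the label never overstates a candidate's position; the bucket values are hypothetical. With the 2026 funnel figures, the 27 candidates who passed step 2 were about 6.4% of 424 applicants.

```python
import math

# Hypothetical reporting buckets, in percent.
BUCKETS = (1, 5, 10, 25, 50)


def top_bucket(at_or_above: int, total_applicants: int) -> str:
    """Turn 'n applicants reached this stage or better' into a 'top X%' label,
    rounding up to the next bucket so the claim is never overstated."""
    if not 0 < at_or_above <= total_applicants:
        raise ValueError("need 0 < at_or_above <= total_applicants")
    share = 100 * at_or_above / total_applicants
    for bucket in BUCKETS:
        if share <= bucket:
            return f"top {bucket}%"
    return f"top {math.ceil(share)}%"


print(top_bucket(27, 424))  # top 10%
```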

Biggest gap: delay to process applications and opening length

The biggest failure in our process was speed. Romain was simultaneously running the organisation, managing all communications about the offer, and reviewing applications (more than 200 applications in the final week). At some points in the process we took close to three weeks to reply, when we had told candidates to expect one to two weeks between steps. This was by far the most negative part of the round, and we are not proud of it.

Part of this was caused by an unexpected leave that left Romain alone to run the organisation. But part of it was structural, and avoidable. The lesson: do not open a round before everything is ready. By "everything", we mean having already written the outreach list, the messages to those people, the newsletters to contact, the landing pages, and the announcement for every platform you plan to publish on. With all of that prepared in advance, you can launch the full communications wave within a few days and keep the offer open for only ten to fourteen days.

The after-process email

After completing the process, we sent a final email to all applicants summarising the outcome and sharing resources they might find useful, regardless of how far they progressed. For candidates who reached the top 10% of the process, we sent a more personalised message with specific opportunities that might be a good fit for their profile. Our strongest candidates who were not hired were added to the Top Candidates High Impact Professionals directory, with their consent.

4. The 2026 results, and what we learned

The funnel

Stage	Enter	Pass	Rate
Written application	424	189	45%
Long task (the real filter)	163	27	17%
Coworking session	27	9	33%
Final interview	9	4	44%
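The pass rates in the table can be recomputed from the counts. Doing so also makes the drop-off between stages explicit: 189 candidates passed the written application, but only 163 entered the long task. A short sketch using only the figures in the table:

```python
# (stage, candidates entering, candidates passing), from the table above.
FUNNEL = [
    ("Written application", 424, 189),
    ("Long task", 163, 27),
    ("Coworking session", 27, 9),
    ("Final interview", 9, 4),
]


def funnel_report(funnel):
    """Yield (stage, pass rate in %, candidates lost before the next stage)."""
    for i, (stage, entered, passed) in enumerate(funnel):
        next_entered = funnel[i + 1][1] if i + 1 < len(funnel) else passed
        yield stage, round(100 * passed / entered), passed - next_entered


for row in funnel_report(FUNNEL):
    print(row)
# ('Written application', 45, 26)
# ('Long task', 17, 0)
# ('Coworking session', 33, 0)
# ('Final interview', 44, 0)
print(f"overall: {100 * FUNNEL[-1][2] / FUNNEL[0][1]:.1f}%")  # overall: 0.9%
```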

Finalists vs. offers, and who we hired. We treat “success” as passing the bar to be hired: four candidates passed it (our finalists). We had planned to hire two, but because four cleared the bar, and with some evolution in current team members' responsibilities, we hired three, including one part-time. So: four finalists, three hires.

The long-task stage is where the funnel really narrows and where most of the signal is. We also lost a meaningful number of qualified candidates to drop-off, most of them before the long task; those who withdrew did so mainly because they found another job during the process.

What actually predicted who advanced

The written application barely predicted who would pass the practical exercises. Strong application scores and strong task performance were almost uncorrelated. The real work sample, a substantial and role-representative task, was the true filter.

Biographical data: an open question

We barely evaluate biographical data. We look at it only at the application stage, and mostly as a coarse signal: whether someone was admitted to a competitive university or role. We are not sure this is the right approach and would like a better way to use this signal.

Where the good candidates came from

We sourced through several channels and tracked how far each one's candidates went, not just who we hired (depth = furthest stage reached). Numbers are small beyond the coworking stage, so read this as directional.

Channel	n	Reached long task	Reached coworking	Reached interview	Finalists
Jobs that make sense (niche FR board)	60	51.7%	6.7%	3.3%	2
HIP Talent Directory	51	47.1%	3.9%	0.0%	0
LinkedIn (job offer)	76	15.8%	1.3%	1.3%	0
Outreach / referrals (non identified)	237	45.6%	8.4%	2.5%	2

A niche, values-aligned job board gave the best yield. The French platform Jobs that make sense produced the most candidates reaching the late stages per applicant, and two of our four finalists.
Outreach and referrals produced our two strongest candidates overall, mixed across our newsletter, direct LinkedIn outreach, and personal sourcing.
A public LinkedIn post produced a lot of volume but very low conversion and zero finalists.
The most EA-aware pool performed only at the average. One set of candidates came from the HIP Talent Directory (run by High Impact Professionals), where people mostly self-register. They were noticeably more familiar with effective altruism principles than the average applicant, yet they advanced at roughly the average rate and produced no finalist this round. Greater prior alignment with our principles did not, by itself, predict stronger performance on the tasks.

Takeaway. Favour niche, values-aligned boards and referrals. A mass public posting brought volume but no finalists, and even the most EA-aware pool performed only at the average. The people who do well on your real tasks are not reliably the ones who look best on paper, nor the ones who arrive most aligned with your principles.

We changed the relative weight of TestGorilla vs tasks

For step 2 (the practical exercises and tests), we initially planned to give ⅔ of the weight to the task and ⅓ to the personality and problem-solving tests. But the test scores varied far more than the task scores, so we had to move to a 75%–25% split for the tests to account for about 33% of the variance.
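The weights had to move this far because a component's contribution to the variance of the combined score scales with the square of its weight times its own variance. The minimal sketch below assumes task and test scores are uncorrelated. The variances are hypothetical, since they are not reported here. The 4.5:1 ratio is simply the one under which a 75/25 split gives the tests a third of the variance.

```python
import math


def variance_share(w_tests: float, var_tasks: float, var_tests: float) -> float:
    """Share of the combined score's variance coming from the tests,
    assuming task and test scores are uncorrelated."""
    from_tests = w_tests**2 * var_tests
    return from_tests / ((1 - w_tests) ** 2 * var_tasks + from_tests)


def weight_for_share(var_tasks: float, var_tests: float, target_share: float) -> float:
    """Test weight w (tasks get 1 - w) giving the tests `target_share` of the variance.

    Solves s * ((1-w)^2 * Vt + w^2 * Vg) = w^2 * Vg for w in (0, 1).
    """
    if var_tasks <= 0 or var_tests <= 0:
        raise ValueError("variances must be positive")
    if not 0 < target_share < 1:
        raise ValueError("target_share must be in (0, 1)")
    a = math.sqrt(target_share * var_tasks)
    b = math.sqrt((1 - target_share) * var_tests)
    return a / (a + b)


# Hypothetical variances: tests 4.5x as variable as the task.
var_tasks, var_tests = 1.0, 4.5
print(round(variance_share(1 / 3, var_tasks, var_tests), 3))  # 0.529
print(round(weight_for_share(var_tasks, var_tests, 1 / 3), 3))  # 0.25
```

With this hypothetical ratio, the planned ⅔–⅓ split would have let the tests drive about 53% of the variance. Moving to 75–25 brings their share back to a third.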

What we learned about the interview

When we analysed our questions, the open, judgement-based ones spread candidates widely and tracked the final decision. Rubber-stamp questions, where everyone answers well, added little but are probably good safeguards to keep. We are pretty happy with the questions we asked and feel confident in what they revealed about who the best applicants were.

AI in the application

About one in six applications showed signs of unedited AI generation and less than 2% were purely AI. Because we benchmark every question against the top models, simply using AI without adding value did not clear the bar. We apply a tiered penalty rather than an outright ban: a light, isolated AI-sounding phrase is a minor and recoverable deduction; a structure clearly copied from ChatGPT is a heavier penalty; a raw copy-paste with no added substance is practically disqualifying.

Penalty	Meaning	n	Passed the first screening
−10	Direct copy-paste, with no added value	7	0
−5	Structure very similar to the original	16	2
−2	Contains at least one unnatural or awkward expression	43	16
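Here is a minimal sketch of the tiered penalty. It assumes the penalty is subtracted from a 0-to-10 answer score and floored at zero; exactly which score the penalty applies to is not specified here. The sketch also recomputes two figures from the table:

- the pass rate of each tier;
- the share of flagged applications, which matches the "about one in six" above.

```python
from enum import Enum


class AiSignal(Enum):
    """Penalty tiers from the table above (points deducted)."""
    NONE = 0
    AWKWARD_PHRASE = -2
    COPIED_STRUCTURE = -5
    RAW_COPY_PASTE = -10


def apply_ai_penalty(answer_score: float, signal: AiSignal) -> float:
    # Assumption: the penalty is subtracted from a 0-10 score and floored at 0.
    return max(0.0, answer_score + signal.value)


# (applications flagged, passed the first screening), from the table above.
TIERS = {
    AiSignal.RAW_COPY_PASTE: (7, 0),
    AiSignal.COPIED_STRUCTURE: (16, 2),
    AiSignal.AWKWARD_PHRASE: (43, 16),
}
for tier, (n, passed) in TIERS.items():
    print(f"{tier.name}: {passed}/{n} passed ({100 * passed / n:.0f}%)")
flagged = sum(n for n, _ in TIERS.values())
print(f"flagged: {flagged}/424 = {100 * flagged / 424:.1f}%")  # flagged: 66/424 = 15.6%
```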

5. The time it takes

Across the whole round we spent about 160 hours: roughly 20h designing the process and tasks, 40h on outreach and communication, and 100h reviewing candidates. Going from one role to four roughly doubled the design effort rather than quadrupling it, thanks to shared exercises and a benchmark-and-anchors system.

Step	Candidates	Time per candidate
1. Written application	424	~5 min
2. Practical exercises	~175	~7 min
3. Coworking session	27	~70 min
4. Final interview	9	~70 min x2 (two interviewers)
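The per-step figures can be turned into a rough reviewing total. The sketch treats interview time as person-hours (two interviewers) and uses only the numbers in the table:

```python
# (step, candidates, minutes per candidate, reviewers), from the table above.
STEPS = [
    ("Written application", 424, 5, 1),
    ("Practical exercises", 175, 7, 1),
    ("Coworking session", 27, 70, 1),
    ("Final interview", 9, 70, 2),
]


def reviewing_hours(steps):
    """Person-hours per step and in total."""
    per_step = {name: n * minutes * reviewers / 60
                for name, n, minutes, reviewers in steps}
    return per_step, sum(per_step.values())


per_step, total = reviewing_hours(STEPS)
for name, hours in per_step.items():
    print(f"{name}: {hours:.1f} h")
print(f"total: {total:.0f} h")  # total: 108 h
```

This comes to about 108 person-hours. That is close to the roughly 100 hours of reviewing quoted above, given that the per-candidate times are approximations.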

6. What we would improve

Rethink the written application's role. It is a decent good/bad filter but barely correlates with later success, so we will try to find other questions or lean even harder on the real work sample.
Automate the feedback emails. Connect the response-email templates to the scoring sheet so each candidate's per-exercise feedback is generated from their ratings almost automatically. For example, if a candidate scored 5/10 on exercise X, the email could explain what was missing for a 10/10.
Find a better way to use biographical data, which we currently barely evaluate (see above).
Points for expertise and network. We considered adding points for the network a person is willing to use, for their AI skills when the role did not explicitly require them, or even for their own audience, for example a large social media following they have not yet put to use.
State “no cover letter” plainly in the offer. More than 10% of applications included a cover letter we had not asked for, and in most cases we did not read it. The offer should have said clearly: “We do not ask for a cover letter, and we will not read one.”
Build a self-service scheduling page for the practical exercises. We scheduled the task-launch email manually for each candidate. A simple form or booking link where candidates trigger the email themselves when they are ready would have been worth building.
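Automating the feedback emails could look like the sketch below. Each exercise's anchor bands carry a pre-written sentence, and the email picks the sentence that matches the candidate's score. Everything here is hypothetical, including the exercise names, band boundaries and wording. The wording is loosely modelled on the failure modes described in the LLM-resistant tasks section.

```python
# Hypothetical feedback bands per exercise: (minimum score, text), highest band first.
FEEDBACK_BANDS = {
    "outreach_message": [
        (8, "The tone matched the relationship and the message was ready to send."),
        (5, "Correct but too promotional; a 10/10 answer adapted to the recipient's implicit codes."),
        (0, "Generic message; a 10/10 answer was specific to this organisation."),
    ],
    "prioritisation": [
        (8, "You identified less obvious but strategically relevant profiles."),
        (5, "Reasonable list, but it favoured the most famous names over the most relevant ones."),
        (0, "The selection did not reflect our strategy; a 10/10 answer justified each choice."),
    ],
}


def feedback_lines(scores: dict[str, float]) -> list[str]:
    """For each exercise, pick the text of the highest band the score reaches."""
    lines = []
    for exercise, score in scores.items():
        for min_score, text in FEEDBACK_BANDS[exercise]:
            if score >= min_score:
                lines.append(f"{exercise} ({score:g}/10): {text}")
                break
    return lines


print("\n".join(feedback_lines({"outreach_message": 5, "prioritisation": 9})))
```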

What we are unsure about

Reference calls. We only made reference calls for the strongest profiles, and used them more as validation than as a way to compare candidates. The few reference calls we did were astonishingly positive, but also nuanced, and seemed quite honest. Perhaps we should have conducted reference calls for all finalists. On the other hand, the highest-rated candidates also received excellent references, so it is unclear whether more reference checks would have materially changed any decisions.
Live simulation. Consider whether the Director of Philanthropy role needs additional steps or real-case exercises. Relationships and judgment in high-stakes donor conversations are hard to evaluate in a standard 1h15 task; a more extended trial or a live scenario may give a better signal.
Problem-solving test. We are genuinely uncertain whether the 8-minute TestGorilla problem-solving test is strongly enough linked to job performance to justify using it as a factor in close decisions.
AI for screening. We did not use AI to screen candidates during the process, as we were not confident its ratings would stay consistent over time. A partially autonomous process, where AI produces a first-pass score that a human reviews and can override, seems more tractable and worth testing. A fully autonomous one raises legal questions in several countries.

Reusing this

You are welcome to adapt this process for your own organisation. If it is unclear where an additional hire would create the most value, we would tentatively recommend opening broader roles and optimising for the quality of the person over the precision of the role. And whatever you do, get a realistic work sample in front of candidates early: in our experience it is the best predictor you have.

Templates (questions, tasks, response emails, and the weighted-factor model) may be requested only by individuals in hiring or management roles using an email address from the hiring organisation, and must be kept strictly confidential: Template-Process-Hiring_Mieux-Donner.

Effective Altruism Forum