An AI App That Became a Litmus Test
What the data revealed about AI hiring

Introduction — A Tool Built Out of Necessity
In early 2025 I found myself laid off and facing a problem. The previous eight years of genuinely cutting-edge prototype work (incubation, experiments, and AI agents) were almost completely covered by NDAs. In most cases I couldn't even name the client. I needed to job hunt with an impressive resume but an empty recent portfolio. If you're unfamiliar with coding interviews, they typically involve walking through how you pulled something off on a past project. I could describe the kinds of problems I'd solved, but the evidence lived behind walls I wasn't allowed to open.
What I could do was build something new. Agentic AI was everywhere, and I'd spent the better part of two years building AI agents I was legally forbidden to talk about. So I built one I could.
The concept: a web application that takes a job posting URL, scrapes the description, feeds it alongside a detailed version of my background to Claude, and returns a structured analysis: a match score from 0 to 100, a recruiter-perspective paragraph, specific strengths and gaps, and a selection of my most relevant experience stories. Built in six days. Production-ready. The demo showed what I could build while making the case for why I was the right hire for a specific role. Win-win.
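As a rough sketch of that flow (not the code from the repository), the analysis step could look like the following. The `Analysis` fields mirror the outputs listed above; the function names, the JSON reply format, and the `ask_model` callable are assumptions made for illustration. `ask_model` stands in for whatever function sends a prompt to Claude and returns its text reply.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class Analysis:
    score: int             # match score, 0 to 100
    recruiter_view: str    # recruiter-perspective paragraph
    strengths: list[str]
    gaps: list[str]
    stories: list[str]     # most relevant experience stories


def build_prompt(job_description: str, background: str) -> str:
    """Combine the scraped posting and the candidate background into one request."""
    return (
        "Compare the candidate background to the job description.\n"
        "Reply with JSON only, using the keys: score (integer 0-100), "
        "recruiter_view, strengths, gaps, stories.\n\n"
        f"JOB DESCRIPTION:\n{job_description}\n\n"
        f"CANDIDATE BACKGROUND:\n{background}"
    )


def parse_analysis(raw: str) -> Analysis:
    """Parse the model's JSON reply and reject out-of-range scores."""
    data = json.loads(raw)
    score = int(data["score"])
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    return Analysis(
        score=score,
        recruiter_view=str(data["recruiter_view"]),
        strengths=list(data["strengths"]),
        gaps=list(data["gaps"]),
        stories=list(data["stories"]),
    )


def analyze(job_description: str, background: str,
            ask_model: Callable[[str], str]) -> Analysis:
    """Hypothetical entry point: prompt the model, then validate its reply."""
    return parse_analysis(ask_model(build_prompt(job_description, background)))
```

Passing the model call in as a function keeps the parsing and validation testable without spending a single API call.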
I also made a full video walkthrough for the Firefly YouTube channel linked at the top of this page — the first half shows the tool in action, the second half gets into the code, and specifically the prompt driving the analysis. That second half matters more than you might expect, and we'll come back to it. I'd recommend watching the full video before reading on — it covers how the tool works, and this article picks up where that leaves off: what I learned from actually using it, and what that reveals about how AI is being applied at scale in the real world.
What It Was Supposed to Be — A Cool Skills Demo
The original goal was straightforward: an impressive demo that proved I could build AI-integrated applications under real pressure, with real stakes.
It worked on that level. The hashed URL feature — which let me pre-load a specific job description and send a personalized link to a recruiter — got genuine reactions in interviews. Consistent outputs. Clean UI. Built fast. Box checked.
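One way a hashed URL feature like this could work, as a sketch under assumptions: derive a short, stable token from the posting URL, remember which posting it stands for, and put the token in the link sent to the recruiter. The base URL, the `job` query parameter, and both function names are hypothetical; the real implementation may differ.

```python
import hashlib

BASE_URL = "https://example.com/analyze"  # placeholder, not the real app URL


def job_token(job_url: str, length: int = 12) -> str:
    """Derive a short, stable token from a job posting URL."""
    digest = hashlib.sha256(job_url.strip().encode("utf-8")).hexdigest()
    return digest[:length]


def share_link(job_url: str, registry: dict[str, str]) -> str:
    """Record which posting the token stands for and return a link to send."""
    token = job_token(job_url)
    registry[token] = job_url
    return f"{BASE_URL}?job={token}"
```

Because the token is a hash of the URL, the same posting always produces the same link, and the link itself does not expose the posting address.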
But within a few weeks of actually using it, something more interesting started happening.
What It Became — The Litmus Test
I started noticing a pattern in my callbacks. Every role where I got an interview had scored between 88 and 92. Not below. Not above — in four months of active searching, I never received a callback on anything that scored higher than 92. Normalize that range and you're essentially looking at "top 5% match."
At first I assumed this was confirmation bias. Then I kept tracking. The pattern held.
What this revealed wasn't really about my scores; it was about the other side of the process. Recruiters at major tech companies are almost certainly running AI analysis on incoming applications. I can't see inside their pipelines, but it's the only model that scales. A role at Microsoft or Google can attract thousands of direct applications and several times that many through LinkedIn. No human team is reading all of those. AI sorts the stack and surfaces a shortlist for human review.
And AI analysis has consistent behavior: given a specific, well-structured question, it gives consistent answers. My application, run through whatever system they were using, was apparently landing in the same quality band as my own tool was predicting. The correlation was too clean to be noise.
So I changed how I used the app. Instead of applying broadly and scoring afterward, I ran postings through the analyzer before applying and only submitted for roles that hit the threshold. My interview rate improved further.
I even built a batch mode: feed it a full job listing page, analyze dozens of postings at once, and surface only the ones scoring 88 or above. The cost-per-analysis made it impractical to run at scale, but the logic was sound: find what works, automate it, filter. Most likely the same thing the recruiters were running on the other end.
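The filtering step itself is small. Here is a minimal sketch, assuming a `score_posting` function (hypothetical) that runs one analysis per URL and returns the 0 to 100 match score:

```python
from typing import Callable, Iterable

THRESHOLD = 88  # lower edge of the callback band described above


def shortlist(posting_urls: Iterable[str],
              score_posting: Callable[[str], int],
              threshold: int = THRESHOLD) -> list[tuple[str, int]]:
    """Score every posting and keep those at or above the threshold, best first."""
    scored = [(url, score_posting(url)) for url in posting_urls]
    keep = [(url, score) for url, score in scored if score >= threshold]
    return sorted(keep, key=lambda pair: pair[1], reverse=True)
```

Every posting costs one model call whether or not it makes the cut, which is why the bill grows with the size of the listing page rather than with the size of the shortlist.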
The Pattern — What Four Months of Data Suggested
Here's where it gets interesting. Over four months of active searching, a clear pattern emerged — and while this is one person's experience, not a research study, the consistency was hard to ignore.
The roles where my score was high were exact matches: AI prototype engineer using React, TypeScript, Python — which is literally what I'd been doing. The roles where I scored lower were logical lateral moves. Roles where my experience clearly qualified me, but where the path wasn't a straight line.
Dev Relations is a good example. Developer relations is the work of understanding developers deeply enough to represent them internally, communicate with them externally, and build trust between a company and its technical community. My background — years of prototype work, UX research, technical communication, building things that required explaining complex systems to non-technical stakeholders — maps onto that well. I'd argue I'd be excellent at it.
My scores on Dev Relations roles were consistently below the threshold. After two months of finding and applying to Dev Relations roles that were a good skill fit, every single application had drawn an automated "not selected" email within three days, across multiple companies. The math was simple: applications take time, and I had a tool giving me reliable signal on where that time was worth spending. Chasing roles I'd likely be filtered out of wasn't a strategy; it was just noise.
[PHOTO]
Now look at the prompt, which is visible in the GitHub repository. Mine asks Claude to analyze fit from a recruiter's perspective, weigh strengths against gaps, surface relevant stories, and write a nuanced recommendation. That recruiter framing isn't decorative: assigning a role is a core prompting technique. Claude draws on everything it's learned about how recruiters evaluate candidates and responds through that lens. The result is contextualized output, not just a keyword match.
With an additional layer of contextual prompting, candidates who are qualified but not direct-path fits might surface more often. The prompt could ask: what would this person bring that a direct-path candidate wouldn't?
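To make the role framing and the extra question concrete, here is an illustrative system prompt builder. The wording is a paraphrase for demonstration, not the prompt from the repository, and the function name and flag are hypothetical.

```python
def recruiter_system_prompt(include_lateral_question: bool = False) -> str:
    """Build a system prompt that assigns the model a recruiter role."""
    lines = [
        "You are an experienced technical recruiter screening applicants.",
        "Weigh the candidate's strengths against their gaps for this role.",
        "Select the experience stories most relevant to the posting.",
        "Finish with a nuanced hiring recommendation.",
    ]
    if include_lateral_question:
        # The additional contextual layer: ask what a non-linear path adds.
        lines.append(
            "Also answer: what would this person bring that a "
            "direct-path candidate wouldn't?"
        )
    return "\n".join(lines)
```

With the Anthropic Python SDK, a string like this would typically be passed as the `system` parameter of `client.messages.create`, with the posting and background in the user message.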
But there's probably a deliberate reason it works the way it does. Technical hiring has long accepted a similar tradeoff — whiteboarding exercises have little to do with how engineers actually work day to day, and most people in the industry know it. They persist because they reduce false positives, even at the cost of more false negatives. Candidates who would have been great but test poorly get filtered out, and that's considered an acceptable loss when you're choosing twelve people from five thousand applications. Exact-match AI filtering is likely operating on the same logic. It's not a flaw in the thinking — it's a conscious dial, turned toward precision over recall.
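The dial can be shown with a little arithmetic. Precision is the share of shortlisted candidates who are actually good hires; recall is the share of good hires who make the shortlist. The pool below is invented purely for illustration and is not data from my search.

```python
def precision_recall(candidates: list[tuple[int, bool]],
                     threshold: int) -> tuple[float, float]:
    """candidates holds (match_score, would_be_a_good_hire) pairs."""
    tp = sum(1 for score, good in candidates if score >= threshold and good)
    fp = sum(1 for score, good in candidates if score >= threshold and not good)
    fn = sum(1 for score, good in candidates if score < threshold and good)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Invented toy pool: (match score, whether the person would be a good hire).
TOY_POOL = [(95, True), (90, True), (89, False), (82, True),
            (80, False), (75, True), (70, False), (60, False)]

loose = precision_recall(TOY_POOL, 70)   # precision 4/7, recall 1.0
strict = precision_recall(TOY_POOL, 90)  # precision 1.0, recall 0.5
```

Raising the threshold from 70 to 90 removes every false positive from the shortlist, and it also drops the two good candidates who scored 82 and 75. Those two are the qualified lateral movers in this toy version.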
Conclusion — The Right Centaur
When people talk about AI integration, they tend to focus on the technology: which model, which platform, which API. How it is used matters more. AI agents are built with an intended use in mind; building one is not just a matter of shipping a model and letting people use it however they decide.
Centaur Theory is a framework in artificial intelligence and automation that explores the symbiotic collaboration between humans and machines. Named for the mythical creature that is half-human and half-horse, the term symbolizes combining the unique strengths of human intelligence (intuition, ethics, and strategic thinking) with the computational power of technology.
Two versions of that relationship are worth holding in your head. One is a centaur where the AI is the brain: setting the direction, making the calls, with the human carrying out instructions. The human becomes an executor. Capable, maybe, but not really thinking. Vincent Cassel's character, Engerraund Serac, in Westworld Season 3 is the fictional version: intelligent, accomplished, and (spoilers!) ultimately revealed to be a puppet of an AI quantum computer system. Serac realized that by following the AI's instructions without question he would be led down a path of success and wealth he could never achieve as a mere human. Full submission to the AI god he had created.
An extreme example, granted, but with LLMs we do need to be cognizant of when we are using this technology as a crutch.
The other end of that spectrum looks very different: Tony Stark and Jarvis from Iron Man. The AI runs the calculations, manages the systems, extends what Stark can perceive and do by an order of magnitude. But Stark sets the goals. Stark makes the calls. When Jarvis suggests something, Stark pushes back — sometimes sarcastically, always deliberately. The human is still the brain. The AI is the most powerful set of tools he's ever had.
If there's one thing I'd ask of anyone building with AI right now, it's this: build like Stark. Build these tools so that they are structured to be used as an extension of the human brain, not a replacement for it. Resist the temptation to lean so much into this technology that you become nothing more than a passenger. You can't stop every user who wants to take Serac's path to success and ride the AI's coattails, but you can choose to build intentionally for the most positive impact on your users.