Engineering · Remote (US) · Full-time (future)
Agent Logic Engineer
Design the prompts, tools, and self-evaluation loops that make our agents reliably useful.
About the role
ClaimIt runs on a 4-agent system that monitors purchases, drafts claim materials, and prepares them for user approval. The hard work isn't training models — it's prompt design, tool selection, output validation, and recovery from agent mistakes. You'll own that craft. We use Gemini (via Google's Agent Development Kit) with tool routing, output validators, and self-evaluation loops. You'll keep the agents honest, fast, and trustworthy.
What you'll do
- Design and iterate on system prompts, tool descriptions, and few-shot patterns across our 4 sub-agents
- Build validators (structural + semantic) that catch agent mistakes before they reach users
- Implement self-evaluation loops where agents check their own outputs against ground truth
- Design tool interfaces — what tools to expose, when to call them, how to recover from failures
- Run evals end-to-end: build the dataset, score outputs, ship improvements with confidence
What you'll bring
- 2+ years of practical LLM application work (not research; shipped systems serving real users)
- Strong intuition for prompt design and agent debugging — you've spent hours staring at trace logs
- Comfort with eval design: synthetic datasets, regression suites, scoring rubrics
- Fluent Python, including infrastructure-adjacent code you can ship (FastAPI, async, MongoDB)
- A bias for measurement — 'feels better' doesn't count; show the eval delta
Nice to have
- Experience with Google ADK or similar agent frameworks (LangChain, CrewAI, etc.)
- Background in structured generation (Pydantic models as output schemas, function calling)
- Past work on agents that take real actions in the world (not just chat)
Submit your interest
We'll reach out when we open this role.
Other roles: Frontend Engineer · Backend & Infrastructure Engineer
About ClaimIt
ClaimIt is building practical agents for tedious consumer workflows — starting with post-purchase price drops. We're early, technically opinionated, and focused on shipping useful systems rather than benchmarks.