Engineering · Remote (US) · Full-time (future opening)

Agent Logic Engineer

Design the prompts, tools, and self-evaluation loops that make our agents reliably useful.

About the role

ClaimIt runs on a 4-agent system that monitors purchases, drafts claim materials, and prepares them for user approval. The hard work isn't training models — it's prompt design, tool selection, output validation, and recovery from agent mistakes. You'll own that craft. We use Gemini (via Google's Agent Development Kit) with tool routing, output validators, and self-evaluation loops. You'll keep the agents honest, fast, and trustworthy.

What you'll do

  • Design and iterate on system prompts, tool descriptions, and few-shot patterns across our 4 sub-agents
  • Build validators (structural + semantic) that catch agent mistakes before they reach users
  • Implement self-evaluation loops where agents check their own outputs against ground truth
  • Design tool interfaces: which tools to expose, when to call them, and how to recover from failures
  • Run evals end-to-end: build the dataset, score outputs, ship improvements with confidence

What you'll bring

  • 2+ years of practical LLM application work (not research; shipped systems serving real users)
  • Strong intuition for prompt design and agent debugging — you've spent hours staring at trace logs
  • Comfort with eval design: synthetic datasets, regression suites, scoring rubrics
  • Fluent Python; able to ship infrastructure-adjacent code (FastAPI, async, MongoDB)
  • A bias for measurement — 'feels better' doesn't count; show the eval delta

Nice to have

  • Experience with Google ADK or similar agent frameworks (LangChain, CrewAI, etc.)
  • Background in structured generation (Pydantic models as output schemas, function calling)
  • Past work on agents that take real actions in the world (not just chat)

Submit your interest

This role isn't open yet. We'll reach out when it opens.

About ClaimIt

ClaimIt is building practical agents for tedious consumer workflows — starting with post-purchase price drops. We're early, technically opinionated, and focused on shipping useful systems rather than benchmarks.