Cybersecurity Benchmark Engineer

Pilotcrew AI

📍 Mumbai, Maharashtra, India

Full-time | Computer Occupations | Posted June 06, 2026

Job Description

Location: Remote
Company: Pilotcrew AI
Type: Contract (monthly basis)
Experience: 3+ yrs

About Pilotcrew AI
Pilotcrew AI builds infrastructure for AI Agent Evaluation. We benchmark large language models, run automated agent evaluations, power human-in-the-loop assessments, and host AI arenas for competitive testing. Our mission is to make AI agents measurable, reliable, and production-ready through structured, scalable evaluation systems.

Role Overview
We are building a large-scale benchmark for evaluating the cybersecurity capabilities of frontier LLMs. To grow this benchmark, we need hands-on security engineers who can craft real-world vulnerability tasks that are genuinely difficult for state-of-the-art LLMs and agentic systems.
Your core output: carefully designed benchmark instances, meaning real software vulnerabilities paired with well-formed task specifications and validated evaluation oracles that expos...