Perplexity Search as Code: Agents Write Search — 85% Fewer Tokens
Quick summary
Perplexity replaced fixed search API calls with sandboxed Python pipelines agents write themselves — 288K tokens down to 43K on a CVE hunt benchmark.
Perplexity launched Search as Code (SaC) in early June 2026: an architecture where AI agents write Python search pipelines in a sandbox instead of chaining fixed API tool calls. In a case study covering 200 high-severity CVEs, Perplexity reports 100% accuracy at 42,900 tokens versus 288,700 tokens for its old pipeline, a reduction of about 85%, while non-Perplexity systems scored below 25% accuracy.
It is live in the Agent API and default in Perplexity Computer.
How Search as Code Works
Traditional agent search:
- Model calls a search tool
- Raw results flood the context window
- Model filters in token space — expensive and error-prone
SaC flips the loop:
- Model generates Python using Perplexity's search SDK primitives (retrieve, filter, dedupe, rerank)
- Code runs in a secure sandbox against Perplexity's search backend
- Only final structured results return to the model context
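The loop above can be sketched in a few lines. This is a minimal illustration, not Perplexity's SDK: the four functions mirror the primitive names mentioned in this post (retrieve, filter, dedupe, rerank), but their signatures, the `Hit` type, the placeholder CVE ID, and the in-memory `CORPUS` standing in for a search backend are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One search result. Field names are assumptions for this sketch."""
    url: str
    title: str
    score: float


# Hypothetical stand-in for a search backend; a real pipeline would call an SDK here.
CORPUS = [
    Hit("https://vendor-a.example/advisory/1", "Vendor A advisory for CVE-0000-0001", 0.62),
    Hit("https://vendor-a.example/advisory/1", "Vendor A advisory for CVE-0000-0001", 0.62),
    Hit("https://blog.example/post", "Opinion piece mentioning CVE-0000-0001", 0.91),
    Hit("https://vendor-b.example/advisory/7", "Vendor B advisory for CVE-0000-0001", 0.88),
]


def retrieve(query: str) -> list[Hit]:
    """Return every corpus entry whose title mentions the query."""
    return [h for h in CORPUS if query.lower() in h.title.lower()]


def filter_hits(hits: list[Hit], keep) -> list[Hit]:
    """Keep only hits for which the predicate returns True."""
    return [h for h in hits if keep(h)]


def dedupe(hits: list[Hit]) -> list[Hit]:
    """Drop repeated URLs, keeping the first occurrence."""
    seen: set[str] = set()
    unique = []
    for h in hits:
        if h.url not in seen:
            seen.add(h.url)
            unique.append(h)
    return unique


def rerank(hits: list[Hit]) -> list[Hit]:
    """Order hits by score, highest first."""
    return sorted(hits, key=lambda h: h.score, reverse=True)


def pipeline(query: str, top_k: int = 2) -> list[dict]:
    """Run all steps in code; only this compact list would reach the model."""
    hits = retrieve(query)
    hits = filter_hits(hits, lambda h: "advisory" in h.url)
    hits = dedupe(hits)
    hits = rerank(hits)
    return [{"url": h.url, "title": h.title} for h in hits[:top_k]]


result = pipeline("CVE-0000-0001")
```

Four raw hits go in and two structured records come out. The duplicate and the off-topic blog post are discarded in code and never enter the model's context.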
Perplexity's research article "Rethinking Search as Code Generation" (June 2026) argues function calling and MCP force serial round trips that bloat prompts with intermediate junk.
Benchmark Claims (Company-Reported — Verify in Prod)
| Metric | SaC (Perplexity) | Prior pipeline / rivals |
|---|---|---|
| CVE vendor advisory task accuracy | 100% | <25% (non-Perplexity systems cited) |
| Tokens on same task | 42.9K | 288.7K (~85% drop) |
| DeepSearchQA score | 0.871 | Anthropic managed agents 0.815 (per Perplexity) |
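The headline percentage follows directly from the two token counts in the table. A quick check, using only the company-reported figures above:

```python
sac_tokens = 42_900        # Search as Code, company-reported
baseline_tokens = 288_700  # prior pipeline, company-reported

reduction = 1 - sac_tokens / baseline_tokens
ratio = baseline_tokens / sac_tokens

print(f"reduction: {reduction:.1%}")            # reduction: 85.1%
print(f"baseline uses {ratio:.1f}x the tokens")  # baseline uses 6.7x the tokens
```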
WANDR, a new in-house benchmark, is due to ship in the coming weeks, according to company blog posts.
Our Analysis: FinOps Lesson for Every Agent Builder
This lands the same week GitHub Copilot switched to token billing and Sam Altman said enterprise AI budgets exploded.
1. Filter in code, not in prompts
If your agent reads 500 search snippets into Claude/GPT context, you pay for all 500 snippets again on every retry. SaC's lesson: push dedupe and ranking into deterministic Python so that only the survivors reach the model. It is the same philosophy as running a SQL filter before the LLM step in a RAG pipeline.
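A rough sketch of that retry arithmetic follows. Only the 500-snippet figure comes from the text; the tokens per snippet, the number of attempts, and the post-filter count of 10 are illustrative assumptions, not measurements.

```python
def context_tokens(snippets: int, tokens_per_snippet: int, attempts: int) -> int:
    """Input tokens billed when every attempt re-sends the same snippets."""
    return snippets * tokens_per_snippet * attempts


TOKENS_PER_SNIPPET = 150  # assumed average snippet length
ATTEMPTS = 3              # assumed: first try plus two retries

raw = context_tokens(500, TOKENS_PER_SNIPPET, ATTEMPTS)      # all snippets in context
filtered = context_tokens(10, TOKENS_PER_SNIPPET, ATTEMPTS)  # assumed top 10 after filtering in code

print(raw, filtered)  # 225000 4500
```

The exact numbers depend entirely on the assumptions, but the shape holds: the cost of raw results scales with retries, so trimming them before they reach the prompt pays off on every attempt.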
2. SDK primitives > monolithic tools
Expose composable retrieval functions so the model can write vendor-specific CVE query templates once and then fan them out as parallel queries, which is the pattern in Perplexity's own CVE example.
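The template-then-fan-out idea might look like the sketch below. It uses only the Python standard library; the `search` stub, the vendor names, the template wording, and the placeholder CVE IDs are hypothetical and do not come from Perplexity's SDK.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-vendor query templates, written once and reused for every CVE.
TEMPLATES = {
    "vendor-a": "site:vendor-a.example security advisory {cve}",
    "vendor-b": "site:vendor-b.example bulletin {cve}",
}


def search(query: str) -> dict:
    """Stub standing in for one retrieval call; a real pipeline would hit a search backend."""
    return {"query": query, "hits": []}


def fan_out(cves: list[str], templates: dict[str, str], max_workers: int = 8) -> list[dict]:
    """Expand every (CVE, template) pair into a query and run the queries concurrently."""
    queries = [
        template.format(cve=cve)
        for cve in cves
        for template in templates.values()
    ]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input queries.
        return list(pool.map(search, queries))


results = fan_out(["CVE-0000-0001", "CVE-0000-0002"], TEMPLATES)
```

Two CVEs and two templates produce four queries. Threads suit this case because the work is I/O-bound once `search` makes real network calls.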
3. Skepticism budget
All benchmarks are Perplexity-reported until third parties reproduce them. Treat the 85% figure as directional. It is still consistent with the cost pressure behind Uber's token caps.
4. Python-only runtime (for now)
Enterprise teams running TypeScript agents will need wrappers or will have to wait for SDK ports. Factor that into stack choices.
5. GEO + citation play
Perplexity traffic already hits abhs.in via AI referrals. Posts with structured CVE numbers and FAQ blocks are exactly what SaC-style agents hunt — double down on Key Takeaways + definition-first H2s.
Key Takeaways
- June 2026: Perplexity launches Search as Code, where agents write Python search pipelines in a sandbox
- CVE case study: 100% accuracy, 42.9K vs 288.7K tokens (~85% savings), company-reported
- Live in: the Agent API, and the default in Perplexity Computer
- Problem solved: MCP/function-calling bloat and serial tool round trips
- For developers: move filter/rank out of LLM context into code; design composable search SDKs
- What to watch: WANDR benchmark release, independent replication, SDK beyond Python
Frequently Asked Questions
What is Perplexity Search as Code?
Search as Code is a Perplexity architecture announced in June 2026 where AI agents write Python scripts that define custom search workflows executed in a secure sandbox, instead of calling a fixed search API and stuffing raw results into the model context window.
How much does Search as Code reduce token usage?
On a Perplexity case study tracking 200 high-severity CVEs with vendor-specific advisories, the company reported about 42,900 tokens with Search as Code versus 288,700 tokens for its standard pipeline, roughly an 85 percent reduction, alongside 100 percent accuracy on that task.
Where is Perplexity Search as Code available?
Perplexity rolled out Search as Code in the Perplexity Agent API and made it the default architecture in Perplexity Computer as of early June 2026. The SDK runtime is Python-only initially.
Why does Search as Code matter for developer agent costs?
It demonstrates that filtering, deduplication, and ranking in deterministic code instead of LLM context can dramatically cut token bills — a pattern teams should copy as Copilot and frontier APIs shift to usage-based pricing.
How does Search as Code compare to MCP and function calling?
Perplexity argues that traditional function calling and MCP force serial tool calls that pollute context with intermediate results. Search as Code lets a single model turn compose thousands of retrieval operations in Python before returning a compact final answer.
Written by
Software Engineer based in Delhi, India. Writes about AI models, semiconductor supply chains, and tech geopolitics — covering the intersection of infrastructure and global events. 836+ posts cited by ChatGPT, Perplexity, and Gemini. Read in 164 countries.
