Perplexity Search as Code: Agents Write Search — 85% Fewer Tokens

Abhishek Gautam · June 8, 2026 · 11 min read

Quick summary

Perplexity has replaced fixed search API calls with sandboxed Python pipelines that agents write themselves, cutting token use from 288.7K to 42.9K on a CVE-tracking case study.

How Search as Code Works

Traditional agent search works like this:

1. The model calls a search tool.
2. Raw results flood the context window.
3. The model filters those results in token space, which is expensive and error-prone.
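Why do serial round trips get expensive so quickly? Here is a simplified cost model of the traditional loop. It assumes there is no prompt caching and that every model call re-reads the whole conversation. The token counts in the usage lines are made-up illustration values, not Perplexity figures.

```python
# Simplified cost model of the traditional tool-calling loop.
# Assumptions: no prompt caching; each model call re-reads the full history;
# every tool call appends the same number of raw-result tokens.


def serial_tool_loop_input_tokens(num_calls: int, system_prompt_tokens: int,
                                  tokens_per_result: int) -> int:
    """Total input tokens billed across a loop of `num_calls` search tool calls."""
    total = 0
    context = system_prompt_tokens
    for _ in range(num_calls):
        total += context               # model reads the full context to pick the next call
        context += tokens_per_result   # raw results are appended to the context window
    total += context                   # final model call reads everything to answer
    return total


if __name__ == "__main__":
    # Illustrative values only: 2K-token system prompt, 3K tokens of raw results per call.
    for n in (1, 5, 10, 20):
        print(n, serial_tool_loop_input_tokens(n, system_prompt_tokens=2_000,
                                               tokens_per_result=3_000))
```

Under these assumptions the billed input grows roughly quadratically with the number of calls, because each call pays again for every earlier result.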

Search as Code (SaC) flips the loop:

1. The model generates Python using Perplexity's search SDK primitives (retrieve, filter, dedupe, rerank).
2. The code runs in a secure sandbox against Perplexity's search backend.
3. Only the final structured results return to the model's context.
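The SaC loop can be sketched end to end, with one caveat. Perplexity's SDK signatures are not public in this article, so `retrieve`, `filter_docs`, `dedupe`, and `rerank` below are hypothetical local stand-ins named after the primitives the article lists. They run against an invented in-memory index. The point is the shape: several retrieval steps happen in code, and only a compact JSON string is handed back to the model.

```python
import json
from dataclasses import dataclass

# Hypothetical stand-ins for SaC primitives; NOT Perplexity's real SDK.


@dataclass(frozen=True)
class Doc:
    url: str
    title: str
    text: str


# Invented in-memory "index"; the CVE ID and example.* URLs are placeholders.
FAKE_INDEX = [
    Doc("https://vendor-a.example/advisory/1", "CVE-2026-0001 advisory", "Critical RCE, fixed in 4.2"),
    Doc("https://mirror.example/cve-2026-0001", "CVE-2026-0001 advisory", "Critical RCE, fixed in 4.2"),
    Doc("https://blog.example/roundup", "Weekly security roundup", "Assorted news, no vendor fix"),
]


def _haystack(d: Doc) -> str:
    return f"{d.title} {d.text}".lower()


def retrieve(query: str) -> list[Doc]:
    # Stand-in for a backend search call: keyword match against the fake index.
    terms = query.lower().split()
    return [d for d in FAKE_INDEX if any(t in _haystack(d) for t in terms)]


def filter_docs(docs: list[Doc], predicate) -> list[Doc]:
    # Named filter_docs to avoid shadowing Python's built-in filter().
    return [d for d in docs if predicate(d)]


def dedupe(docs: list[Doc]) -> list[Doc]:
    # Keep the first copy of each (title, text) pair, dropping mirrors.
    seen, out = set(), []
    for d in docs:
        key = (d.title, d.text)
        if key not in seen:
            seen.add(key)
            out.append(d)
    return out


def rerank(docs: list[Doc], query: str) -> list[Doc]:
    # Order by number of matching query terms; sorted() is stable, so ties keep input order.
    terms = query.lower().split()
    return sorted(docs, key=lambda d: -sum(t in _haystack(d) for t in terms))


def generated_pipeline(query: str, top_k: int = 5) -> str:
    # The kind of script a model might write; under SaC it would run inside the sandbox.
    docs = retrieve(query)
    docs = filter_docs(docs, lambda d: "advisory" in d.title.lower())
    docs = rerank(dedupe(docs), query)
    # Only this compact JSON string is returned to the model's context.
    return json.dumps([{"url": d.url, "title": d.title} for d in docs[:top_k]])


print(generated_pipeline("CVE-2026-0001 advisory fixed"))
```

The mirror copy and the unrelated roundup never reach the model. In the traditional loop, both would have been pasted into context for the model to discard.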

Perplexity's research article "Rethinking Search as Code Generation" (June 2026) argues that function calling and MCP (Model Context Protocol) force serial round trips that bloat prompts with intermediate results.

Benchmark Claims (Company-Reported — Verify in Prod)

Metric	Search as Code (Perplexity)	Comparison (prior pipeline or rivals)
CVE vendor-advisory task accuracy	100%	<25% (non-Perplexity systems cited by Perplexity)
Tokens on the same task	42.9K	288.7K, Perplexity's standard pipeline (~85% reduction)
DeepSearchQA score	0.871	0.815, Anthropic managed agents (per Perplexity)
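The headline percentage follows directly from the two company-reported token counts in the table. This quick check recomputes it and also expresses the gap as a multiplier.

```python
# Recompute the reported reduction from the two Perplexity-reported token counts.
sac_tokens = 42_900        # Search as Code, CVE case study
baseline_tokens = 288_700  # Perplexity's standard pipeline, same task

reduction = 1 - sac_tokens / baseline_tokens   # fraction of tokens saved
ratio = baseline_tokens / sac_tokens           # how many times more the baseline used
print(f"reduction: {reduction:.1%}, baseline uses {ratio:.1f}x more tokens")
```

That works out to about 85.1%, or roughly 6.7 times fewer tokens on this one task.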

Perplexity says WANDR, a new in-house benchmark, will ship in the coming weeks.

Our Analysis: FinOps Lesson for Every Agent Builder

The announcement lands the same week GitHub Copilot switched to token-based billing and Sam Altman said enterprise AI budgets had exploded.

1. Filter in code, not in prompts

If your agent reads 500 search snippets into a Claude or GPT context window, you pay for all 500 snippets on every retry. SaC's lesson is to push deduplication and ranking into deterministic Python. This is the same philosophy as filtering with SQL before the LLM sees anything in a RAG pipeline.
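As a sketch of "filter in code", the example below deduplicates and ranks snippets deterministically and keeps only the top-k before anything reaches the model. Several details are assumptions for illustration: the snippet dict shape (`url`, `text`, `score`), the URL-normalization rule, and the rough 4-characters-per-token estimate. Use your provider's tokenizer for real numbers.

```python
from urllib.parse import urlsplit


def approx_tokens(text: str) -> int:
    # Rough heuristic (~4 characters per token for English); not a real tokenizer.
    return max(1, len(text) // 4)


def normalize_url(url: str) -> str:
    # Treat scheme, case, "www.", query strings and trailing slashes as the same page.
    parts = urlsplit(url.lower())
    host = parts.netloc.removeprefix("www.")
    return host + parts.path.rstrip("/")


def select_snippets(snippets: list[dict], must_contain: str, k: int) -> list[dict]:
    # 1) Dedupe by normalized URL, keeping the first occurrence.
    seen, unique = set(), []
    for s in snippets:
        key = normalize_url(s["url"])
        if key not in seen:
            seen.add(key)
            unique.append(s)
    # 2) Keep only snippets that mention the required term.
    relevant = [s for s in unique if must_contain.lower() in s["text"].lower()]
    # 3) Deterministic ranking: retrieval score descending, ties broken by URL.
    relevant.sort(key=lambda s: (-s["score"], s["url"]))
    return relevant[:k]


def context_cost(snippets: list[dict], retries: int) -> int:
    # Every retry re-sends the same snippets, so cost scales with the retry count.
    return retries * sum(approx_tokens(s["text"]) for s in snippets)
```

Because the selection is plain code, it gives the same answer on every retry and costs no model tokens. Only the k survivors are billed, once per retry.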

2. SDK primitives > monolithic tools

Expose composable retrieval functions so the model can write a vendor-specific CVE query template once and then fan out parallel queries. That is the pattern in Perplexity's CVE example.
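The "template once, fan out" idea can be sketched with the standard library. The vendor templates and the `search` function are hypothetical placeholders; `search` returns fake results so the sketch runs offline. Swap in whatever retrieval primitive your own search SDK exposes.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-vendor query templates, written once.
VENDOR_TEMPLATES = {
    "vendor-a": "{cve} vendor-a security advisory",
    "vendor-b": "{cve} vendor-b bulletin fixed version",
}


def search(query: str) -> list[str]:
    # Placeholder retrieval primitive: returns a fake result instead of calling a backend.
    return [f"result for: {query}"]


def build_queries(cves_by_vendor: dict[str, list[str]]) -> list[str]:
    # Expand each vendor's template once per CVE.
    return [VENDOR_TEMPLATES[vendor].format(cve=cve)
            for vendor, cves in cves_by_vendor.items() for cve in cves]


def fan_out(queries: list[str], max_workers: int = 16) -> dict[str, list[str]]:
    # Threads suit I/O-bound search calls; pool.map preserves input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(queries, pool.map(search, queries)))
```

For example, `fan_out(build_queries({"vendor-a": ["CVE-2026-0001"]}))` issues one query per CVE without any extra model turns. The CVE ID is a placeholder.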

3. Skepticism budget

All benchmarks are Perplexity-reported until third parties reproduce them. Treat the 85% figure as directional. Even so, it lines up with the token-cost pain behind Uber's recent caps on AI tool spending.

4. Python-only runtime (for now)

Enterprise teams building agents in TypeScript will need wrappers or will have to wait for SDK ports, so factor that into stack choices.

5. GEO (generative engine optimization) + citation play

Perplexity already sends AI-referral traffic to abhs.in. Posts with structured CVE numbers and FAQ blocks are exactly what SaC-style agents hunt for, so double down on Key Takeaways sections and definition-first H2s.

Key Takeaways

June 2026: Perplexity launches Search as Code; agents write Python search pipelines that run in a sandbox
CVE case study: 100% accuracy, 42.9K vs 288.7K tokens (~85% savings), company-reported
Live in: Perplexity Agent API; default architecture in Perplexity Computer
Problem solved: MCP/function-calling bloat and serial tool round trips
For developers: move filter/rank out of LLM context into code; design composable search SDKs
What to watch: WANDR benchmark release, independent replication, SDK beyond Python

FAQ

What is Perplexity Search as Code?

Search as Code is a Perplexity architecture announced in June 2026 where AI agents write Python scripts that define custom search workflows executed in a secure sandbox, instead of calling a fixed search API and stuffing raw results into the model context window.

How much does Search as Code reduce token usage?

In a Perplexity case study tracking 200 high-severity CVEs with vendor-specific advisories, the company reported about 42,900 tokens with Search as Code versus 288,700 tokens for its standard pipeline. That is roughly an 85 percent reduction, and it came alongside 100 percent accuracy on that task.

Where is Perplexity Search as Code available?

Perplexity rolled out Search as Code in the Perplexity Agent API and made it the default architecture in Perplexity Computer as of early June 2026. The SDK runtime is Python-only initially.

Why does Search as Code matter for developer agent costs?

It shows, at least in Perplexity's own numbers, that doing filtering, deduplication, and ranking in deterministic code instead of in LLM context can cut token bills dramatically. Teams should copy the pattern as Copilot and frontier APIs shift to usage-based pricing.

How does Search as Code compare to MCP and function calling?

Perplexity argues traditional function calling and MCP force serial tool calls that pollute context with intermediate results. Search as Code lets one model turn compose thousands of retrieval operations in Python before returning a compact final answer.

Written by

Abhishek Gautam

Software Engineer based in Delhi, India. Writes about AI models, semiconductor supply chains, and tech geopolitics — covering the intersection of infrastructure and global events. 836+ posts cited by ChatGPT, Perplexity, and Gemini. Read in 164 countries.