Best practices

Operational advice for shipping a production integration. Read this before you wire the API into customer-facing code.

1. Cache aggressively

Identical input on the same tier within 24h returns a cached result for free — no rate-limit consumption, no inference cost. The cache is keyed by sha256(text + tier), so it works the same for every customer.

Don't fight it. If your workflow re-scores a document on every save, every call with text you already scored in the last 24h costs you nothing. Bypass the cache only with the x-dev-tier header (development only) when you genuinely want fresh inference.

2. Retry the right errors only

Retry 429 and 5xx. Never retry 400/401/402/422: those are deterministic failures, and more attempts will not help.

Recommended strategy: exponential backoff with full jitter, cap at 3 attempts, respect Retry-After for 429s. The official SDKs do this.

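// Recommended retry policy: up to 3 attempts, exponential backoff with full jitter.
// `call` is your own function that returns a fetch Response.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
const MAX_ATTEMPTS = 3;

async function callWithRetry(call) {
  for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
    const res = await call();
    if (res.ok) return res;
    const retryable = res.status === 429 || res.status >= 500;
    // Non-retryable status, or retries exhausted: surface the failure.
    if (!retryable || attempt === MAX_ATTEMPTS) throw res;
    // Full jitter: random delay in [0, 500ms * 2^attempt).
    let wait = 500 * 2 ** attempt * Math.random();
    if (res.status === 429) {
      // Retry-After in seconds wins when present. A missing or HTTP-date
      // value parses to 0 or NaN and falls back to the jittered delay.
      const retryAfterMs = Number(res.headers.get('Retry-After')) * 1000;
      if (retryAfterMs > 0) wait = retryAfterMs;
    }
    await sleep(wait);
  }
}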

3. Pick the right billing mode

Per-word ($0.0003/word, $0.05 minimum) is cheaper for short documents. Per-detection ($0.50 flat) is cheaper for long documents. The crossover is around 1,667 words ($0.50 ÷ $0.0003).

Every detection response includes a pricing_quote showing both rates so you can see which would have been cheaper. If your traffic skews one way, switch your key's billing mode in the dashboard. Per-word is the default and is almost always the right choice unless 80% or more of your inputs run well past the crossover.
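To check the arithmetic against your own traffic, a minimal sketch like the one below can help. The rates come from the paragraph above; the function names are illustrative and not part of the API, and the pricing_quote in each response remains the authoritative figure.

```javascript
// Rates as stated in this guide (USD).
const PER_WORD_RATE = 0.0003;    // per word
const PER_WORD_MINIMUM = 0.05;   // minimum charge per call
const PER_DETECTION_FLAT = 0.5;  // flat charge per call

// Cost of one call under per-word billing, with the minimum applied.
function perWordCost(wordCount) {
  return Math.max(PER_WORD_MINIMUM, wordCount * PER_WORD_RATE);
}

// Which billing mode is cheaper for a single document of this length.
function cheaperMode(wordCount) {
  return perWordCost(wordCount) <= PER_DETECTION_FLAT ? 'per-word' : 'per-detection';
}

// Total cost of a set of documents under each mode.
function compareModes(wordCounts) {
  const perWord = wordCounts.reduce((sum, n) => sum + perWordCost(n), 0);
  const perDetection = wordCounts.length * PER_DETECTION_FLAT;
  return { perWord, perDetection };
}
```

Feed compareModes a sample of real word counts from your logs. If perDetection comes out lower across a representative sample, the billing switch is probably worth it.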

4. Batch where possible — but not for detection

Detection runs per-document anyway, so there's no advantage to batching multiple calls into one — concurrent requests parallelise on our side. The right pattern is a small pool (5-10 concurrent) of HTTP requests, not a batched payload.

Plagiarism is an exception: it parallelises 7 source layers internally and takes 5-15s per call. Pool concurrency for plagiarism should be lower (2-3) to avoid hitting your tier limit.
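The pooling pattern described above can be sketched as a small helper that keeps a fixed number of requests in flight. This is one possible implementation, not the SDK's; mapWithConcurrency and the worker function you pass in are hypothetical names.

```javascript
// Runs `worker` over `items` with at most `limit` calls in flight at once.
// Results come back in input order; the first rejection rejects the whole run.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;

  // Each runner claims the next unprocessed index until the list is exhausted.
  async function runner() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  // Never start more runners than items, and always start at least one.
  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}
```

For detection you would call it with a limit of 5 to 10, and for plagiarism with 2 or 3, passing your own request function as the worker.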

5. Use webhooks for fan-out, not for completion

The HTTP response is your source of truth for one-off detection. Webhooks exist so you can push results to other systems (CRM, Slack, data warehouse) without writing extra glue.

Don't depend on the webhook for the synchronous flow — the response body contains the same result. If you're building a real-time UI, return the HTTP response and let the webhook handle async fan-out.

6. Idempotency without an idempotency key

We don't accept an idempotency key on detection because the cache layer gives you the same property for free — identical input on the same tier returns the same result for 24h, with the same document_id.

If you need at-most-once semantics in your queue, dedupe on sha256(text + tier) before calling the API. The API performs the same check internally, so deduping on your side simply saves you the round-trip.
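A minimal client-side dedupe sketch for Node.js follows. It assumes "text + tier" means plain string concatenation encoded as UTF-8; the exact key format the API uses internally is not specified here, so treat this key as your own local identifier. The names dedupeKey and shouldSubmit are illustrative.

```javascript
const { createHash } = require('node:crypto');

// Assumption: the key is sha256 over the UTF-8 bytes of text followed by tier.
function dedupeKey(text, tier) {
  return createHash('sha256').update(text + tier, 'utf8').digest('hex');
}

// In-memory guard for illustration only. A real queue would keep keys in a
// shared store and expire them after 24h to match the cache window.
const seen = new Set();

// Returns true the first time a (text, tier) pair is seen, false afterwards.
function shouldSubmit(text, tier) {
  const key = dedupeKey(text, tier);
  if (seen.has(key)) return false;
  seen.add(key);
  return true;
}
```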

7. Privacy by default

We don't retain plaintext detection inputs unless the account opts in (PRD §12). If you're processing user content, mention this — it's a compliance differentiator vs every detector that keeps inputs for "model improvement."

Webhook payloads include the score, band, and per-passage results, but NOT the input text. If you need the input round-tripped, store it on your side keyed by document_id and join after delivery.
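One way to do that join, sketched with an in-memory Map. It assumes the webhook payload carries a document_id field matching the one in the HTTP response; input_text and both function names are hypothetical additions on your side, not API fields.

```javascript
// Hypothetical local store: document_id -> original input text.
const inputsByDocumentId = new Map();

// Call this when the detection response arrives, while you still hold the text.
function rememberInput(documentId, text) {
  inputsByDocumentId.set(documentId, text);
}

// Call this from your webhook handler. Payload fields pass through untouched;
// input_text is null when no stored input matches the document_id.
function joinWebhookPayload(payload) {
  const text = inputsByDocumentId.get(payload.document_id);
  return { ...payload, input_text: text ?? null };
}
```

In production you would back this with a database or key-value store rather than process memory, so the join survives restarts.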

8. Don't surface raw scores to end users

score: 0.873 is meaningful to a backend, not to a user. Surface the band instead — categorical labels are calibrated for human reading and consistent with the dashboard UI.

If you must show a number, use ai_pct (0-100) and add a qualifier — "AI likelihood: 87% (high confidence)" reads better than the raw float.
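As a rough illustration, a display helper might look like the following. It assumes the detection result exposes band and ai_pct as named above; the qualifier text is your own wording rather than an API value, and displayLabel is a hypothetical name.

```javascript
// Prefers the categorical band. Falls back to a rounded percentage with an
// optional qualifier only when the caller explicitly asks for a number.
function displayLabel(result, { showNumber = false, qualifier = '' } = {}) {
  if (!showNumber) return result.band;
  const pct = Math.round(result.ai_pct);
  const suffix = qualifier ? ` (${qualifier})` : '';
  return `AI likelihood: ${pct}%${suffix}`;
}
```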

9. Surface upgrade CTAs from the API

402 responses include upgrade_url, upgrade_features, and (for detection) an api_quote showing what the same call would have cost on a paid tier. Wire these through to the UI verbatim — they're written to convert.

10. Read the changelog

Major versions are frozen for a minimum of 6 months before deprecation. Additive changes (new fields on responses) ship without notice — make sure your client tolerates unknown fields. Subscribe to the changelog RSS so deprecations don't surprise you.
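Tolerating unknown fields usually means reading only the fields you know and ignoring the rest, instead of validating against a closed schema. A minimal sketch, where the field list is an assumption based on names used in this guide:

```javascript
// Fields this client understands. Anything else in the response is ignored,
// so additive API changes do not break parsing.
const KNOWN_FIELDS = ['document_id', 'score', 'band', 'ai_pct'];

// Copies only the known fields that are present in the response body.
function pickKnownFields(responseBody) {
  const result = {};
  for (const field of KNOWN_FIELDS) {
    if (field in responseBody) result[field] = responseBody[field];
  }
  return result;
}
```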

Want a deeper review?

Business and Enterprise tiers include an integration review with our team. We'll look at your code path and recommend the right caching, batching, and billing setup.

Get in touch