Rate limits - Neural Draft

The v1 API enforces a per-project sliding window. Defaults are generous for typical builds and scale up automatically on paid plans; if you have a known heavy workload, contact us before launch.

Defaults

Plan	Requests / minute	Burst	Notes
Free	30	60	Tightest. Plenty for development.
Hobby	60	120	The “default project” budget.
Build	240	480	Sized for production sites.
Scale	1,000	2,000	Sized for high-traffic backends.
Enterprise	custom	custom	Negotiated per workload.

The window is a sliding 60 seconds, not a calendar minute. The Burst column is the maximum number of requests we’ll let you make at once before the window kicks in; that allowance refills steadily as time passes. For most callers the limits are invisible; for heavy callers the headers below tell you exactly where you stand.
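
If you would rather stay under the limit than react to 429s, a client-side throttle can help. The sketch below models the burst allowance as a token bucket sized to the Burst column and refilled at the per-minute rate. This is an assumption about how to approximate the behaviour described above, not a description of the server's implementation, and the class name `ClientThrottle` is hypothetical.

```typescript
// Hypothetical client-side throttle. It approximates the documented limits
// with a token bucket; the server's actual algorithm may differ.
class ClientThrottle {
  private readonly perMinute: number; // "Requests / minute" column
  private readonly burst: number;     // "Burst" column
  private tokens: number;
  private last: number;               // timestamp of the last refill, in ms

  constructor(perMinute: number, burst: number, now: number = Date.now()) {
    this.perMinute = perMinute;
    this.burst = burst;
    this.tokens = burst; // start with a full burst allowance
    this.last = now;
  }

  // Returns true and spends one token if a request may be sent at `now` (ms).
  tryAcquire(now: number = Date.now()): boolean {
    const elapsedMs = Math.max(0, now - this.last);
    this.last = Math.max(this.last, now);
    // Refill steadily at perMinute tokens per 60 s, never above the burst size.
    const refill = (elapsedMs * this.perMinute) / 60_000;
    this.tokens = Math.min(this.burst, this.tokens + refill);
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}
```

For a Free-plan project you would construct it as `new ClientThrottle(30, 60)` and call `tryAcquire()` before each request, queueing or delaying the call when it returns `false`. The response headers remain the source of truth; treat the throttle as a courtesy, not a guarantee.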

Read-only GETs and writes count equally toward the limit. Async POST /jobs requests (blog generation, image generation, etc.) count once at submit; the resulting job runs server-side and doesn’t add to your request budget.

Response headers

Every response — including 429 errors — carries the limit headers.

Header	Meaning
`X-RateLimit-Limit`	Window ceiling for this project.
`X-RateLimit-Remaining`	Requests left in the current sliding window.
`X-RateLimit-Reset`	Unix timestamp, in seconds, at which the window will fully refill.
`Retry-After`	Seconds to wait. Only present on `429` responses.

Example:

HTTP/2 200 OK
content-type: application/json
x-ratelimit-limit: 60
x-ratelimit-remaining: 47
x-ratelimit-reset: 1745007660
x-request-id: req_2Nh4PqRsTuVw
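
To act on these headers in code, it helps to read them into a typed object once per response. The following is a minimal sketch using the standard Fetch `Headers` interface; the names `RateLimitInfo` and `readRateLimit` are hypothetical and not part of any Neural Draft SDK.

```typescript
// Hypothetical helper: collects the documented rate-limit headers.
interface RateLimitInfo {
  limit: number | null;             // X-RateLimit-Limit
  remaining: number | null;         // X-RateLimit-Remaining
  resetAt: Date | null;             // X-RateLimit-Reset (Unix seconds) as a Date
  retryAfterSeconds: number | null; // Retry-After, only present on 429 responses
}

function readRateLimit(headers: Headers): RateLimitInfo {
  // Header lookups are case-insensitive; missing or malformed values become null.
  const num = (name: string): number | null => {
    const raw = headers.get(name);
    if (raw === null || raw.trim() === "") return null;
    const n = Number(raw);
    return Number.isFinite(n) ? n : null;
  };

  const reset = num("x-ratelimit-reset");
  return {
    limit: num("x-ratelimit-limit"),
    remaining: num("x-ratelimit-remaining"),
    resetAt: reset === null ? null : new Date(reset * 1000),
    retryAfterSeconds: num("retry-after"),
  };
}
```

A caller might check `readRateLimit(res.headers).remaining` after each response and slow down as it approaches zero, rather than waiting for a 429.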

Hitting the ceiling

When the window is exhausted, the API returns 429 Too Many Requests with code: "rate_limited" and a precise Retry-After:

{
  "type": "https://api.neuraldraft.io/errors/rate_limited",
  "title": "Too many requests",
  "status": 429,
  "code": "rate_limited",
  "detail": "Rate limit exceeded for project prj_2NgcaXxFqLPo. Retry after 14 seconds.",
  "instance": "req_2Nh4PqRsTuVw"
}

retry-after: 14
x-ratelimit-limit: 60
x-ratelimit-remaining: 0
x-ratelimit-reset: 1745007674
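
The error body above can be recognised with a small type guard before deciding to retry. This sketch assumes the fields shown in the example are always present on a rate-limit error; the names `RateLimitProblem` and `isRateLimitProblem` are hypothetical.

```typescript
// Shape of the 429 body as shown in the example above (assumed stable).
interface RateLimitProblem {
  type?: string;
  title?: string;
  status: 429;
  code: "rate_limited";
  detail: string;
  instance: string;
}

// Narrows an already-parsed JSON body to a rate-limit error.
function isRateLimitProblem(body: unknown): body is RateLimitProblem {
  if (typeof body !== "object" || body === null) return false;
  const b = body as Record<string, unknown>;
  return (
    b.status === 429 &&
    b.code === "rate_limited" &&
    typeof b.detail === "string" &&
    typeof b.instance === "string"
  );
}
```

Checking `code` as well as the HTTP status keeps the retry path from swallowing other errors. Prefer the `Retry-After` header over parsing the number of seconds out of `detail`, which is written for humans.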

Backoff strategy

The recipe is the same one we recommend for errors: exponential backoff with full jitter, capped, and always honour Retry-After.

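async function callWithLimitAware<T>(
  fn: () => Promise<Response>,
  parse: (r: Response) => Promise<T>
): Promise<T> {
  const maxAttempts = 5;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fn();
    if (res.status !== 429) return parse(res);
    if (attempt === maxAttempts - 1) break; // no point sleeping before giving up

    // Retry-After is in seconds; fall back to 1 s if it is missing or malformed.
    const seconds = Number(res.headers.get("retry-after"));
    const retryAfterMs = Number.isFinite(seconds) && seconds > 0 ? seconds * 1000 : 1000;
    // Full jitter over an exponentially growing range, capped at 10 s.
    const jitterMs = Math.random() * Math.min(10_000, 250 * 2 ** attempt);
    await new Promise((r) => setTimeout(r, retryAfterMs + jitterMs));
  }
  throw new Error("Rate-limited 5x in a row, giving up.");
}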

When to use bulk endpoints

If you’re hitting the limit doing many single-key reads, switch to the bulk endpoints — same rate-limit cost (one request) for a much larger payload:

Avoid	Prefer
`GET /v1/content/hero.headline?lang=en` `GET /v1/content/hero.subhead?lang=en` `GET /v1/content/cta.label?lang=en`	`GET /v1/content/bulk?keys=hero.headline,hero.subhead,cta.label&lang=en`
`PUT /v1/content/hero.headline` (×N)	`PUT /v1/content` with a `{ updates: [...] }` body
`GET /v1/blog-posts/1` `GET /v1/blog-posts/2`	`GET /v1/blog-posts?ids=1,2,3,4`

A static-site builder that fetches every key on next build should issue exactly one bulk content call per locale.
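
As an illustration, a build script can collapse its key list into a single bulk request path per locale. The helper below is a sketch: the function name `bulkContentPath` is hypothetical, and it assumes the comma-separated `keys` format shown in the table above.

```typescript
// Hypothetical helper: builds the bulk content path for one locale.
function bulkContentPath(keys: readonly string[], lang: string): string {
  if (keys.length === 0) throw new Error("at least one key is required");
  // Drop duplicates so repeated keys do not inflate the query string.
  const unique = Array.from(new Set(keys));
  // Encode each key separately so the separating commas stay literal.
  const list = unique.map((k) => encodeURIComponent(k)).join(",");
  return `/v1/content/bulk?keys=${list}&lang=${encodeURIComponent(lang)}`;
}
```

Calling it once per locale yields one request per locale, regardless of how many keys the site uses. Whether the bulk endpoint caps the number of keys per call is not stated here, so very large key sets may still need to be split.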

Asking for higher limits

Higher limits are part of the Build, Scale and Enterprise plans, and we’ll also raise a per-project ceiling for one-off events (a launch, a Black Friday) without changing your tier. Email info@neuraldraft.io with:

Your project id (visible in the dashboard or GET /v1/projects/me).
Expected peak RPS and total daily volume.
The window of high traffic, if known.

We’ll bump the limit and confirm within one business day.

Abuse, fairness, and quiet limits

In addition to the per-project window, we enforce a few global guardrails to keep the platform fair for everyone:

Per-IP soft cap for unauthenticated public endpoints (/v1/public/*) — the limit is generous and adapts to traffic; a single abusive IP can’t degrade your project. If you proxy public traffic behind your own backend, set the X-Forwarded-For header — we’ll honour it for attribution.
AI cost ceiling per project per minute — separate from the request ceiling, this prevents a runaway script from spending your monthly credits in an hour. Hitting it returns 429 with code: "rate_limited" and a longer Retry-After. If you need to do legitimately heavy AI work, talk to us in advance.
Webhook delivery rate is capped per endpoint at 100 deliveries/sec to avoid stampeding your receiver.
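
For the proxy case in the first guardrail, your backend needs to pass the visitor's address along. A minimal sketch follows, assuming the conventional comma-separated X-Forwarded-For format; the function name `withForwardedFor` is hypothetical, and which entry in the chain is used for attribution is not specified here.

```typescript
// Hypothetical helper: builds the X-Forwarded-For header for a proxied request.
function withForwardedFor(
  clientIp: string,
  incoming?: string | null // X-Forwarded-For value your backend received, if any
): Record<string, string> {
  const prior = (incoming ?? "").trim();
  // Append the client address to any existing chain, per the usual convention.
  const chain = prior === "" ? clientIp : `${prior}, ${clientIp}`;
  return { "X-Forwarded-For": chain };
}
```

Spread the result into the headers of the outgoing request to a `/v1/public/*` endpoint. Only forward addresses your own infrastructure observed, since a client can send any X-Forwarded-For value it likes.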

What `429` is not

Not an account-wide “you’re paying too much” signal — that’s insufficient_credits.
Not a permanent block — keys auto-recover the moment the window resets.
Not a sign your key is compromised — but your audit log is the right place to look if 429s happen unexpectedly. If a key you don’t control is being used, revoke it immediately.
