Every authorized user gets 200 requests per 60 seconds. The limit is tracked per user per app, so each user who authorizes your app has their own bucket and one user’s traffic never eats into another’s.
The default limit was raised from 100 to 200 requests per 60 seconds. A few high-volume endpoints have their own limits — see Per-endpoint limits.

How the limit works

The limit is a token bucket: you start with 200 tokens, each request spends one, and the bucket refills to full every 60 seconds. With OAuth 2.0 (Bearer tokens) the bucket is keyed per authorizing user, so if several of your services or processes use the same user’s token, they share that user’s bucket.
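To make the refill behaviour concrete, here is a minimal local model of the bucket as described above. It is a sketch for reasoning about your own client, not Fanvue's server code: the class name `WindowBucket` is hypothetical, and the assumption that the window is anchored at a fixed start time is ours.

```python
from dataclasses import dataclass, field


@dataclass
class WindowBucket:
    """Local model of the documented limit (illustrative only)."""
    capacity: int = 200            # tokens available per window
    window_seconds: float = 60.0   # the bucket refills to full after this long
    window_start: float = 0.0      # assumed anchor for the current window
    tokens: int = field(init=False)

    def __post_init__(self) -> None:
        self.tokens = self.capacity

    def try_spend(self, now: float) -> bool:
        """Spend one token; return False where the API would answer 429."""
        # Refill to full once a whole window has elapsed.
        if now - self.window_start >= self.window_seconds:
            self.tokens = self.capacity
            self.window_start = now
        if self.tokens == 0:
            return False
        self.tokens -= 1
        return True
```

In this model, 200 calls inside one window succeed, the 201st is rejected, and the first call after the 60-second mark succeeds again.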

How the bucket is keyed

The counter key is built from your OAuth client, the authorizing user, and — on creator-scoped endpoints — the creator:
clientId : userUuid [ : creatorUserUuid ]
This has two practical consequences:
  • Creator-scoped endpoints get a separate bucket per creator. Any endpoint under /creators/{creatorUserUuid}/* keys on the creator UUID as well, so each creator you act on behalf of has its own independent 200/60s budget. Your total throughput therefore scales with the number of creators you operate.
  • Everything else shares one bucket per principal. Endpoints that are not creator-scoped — including the aggregate /agencies/* endpoints and your own /chats/*, /posts/*, etc. — all key on just clientId:userUuid, so they draw from a single shared 200/60s bucket.
If you run an agency, prefer the per-creator /creators/{creatorUserUuid}/* routes for high-volume, creator-specific work. They spread load across many buckets instead of contending for the single aggregate one. See Capacity planning.
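The keying rule can be sketched as a small function that predicts which bucket a request will draw from. This is an illustration of the documented key format only: the function name `bucket_key` and the regular expression are assumptions, and the server's actual matching logic may differ.

```python
import re

# Assumed pattern for creator-scoped paths: /creators/{creatorUserUuid}/...
_CREATOR_PATH = re.compile(r"^/creators/([^/]+)/")


def bucket_key(client_id: str, user_uuid: str, path: str) -> str:
    """Return clientId:userUuid, plus :creatorUserUuid on creator-scoped paths."""
    match = _CREATOR_PATH.match(path)
    if match:
        return f"{client_id}:{user_uuid}:{match.group(1)}"
    return f"{client_id}:{user_uuid}"
```

Grouping your outgoing requests by this key gives a rough picture of which calls compete for the same 200/60s budget and which run in independent buckets.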

Per-token, not per-IP

The counter is global per identity. The count is the same no matter which of your servers makes the request, so distributing traffic across multiple servers, processes, or IP addresses does not multiply your budget — they all decrement the same clientId:userUuid counter. The API also applies separate protections against abusive traffic patterns by source IP. These are distinct from the rate limit documented here and won’t affect well-behaved clients that respect the headers below.

Headers on every response

Each response tells you where you stand:
Header                  Meaning
X-RateLimit-Limit       Requests allowed in the current window for this endpoint.
X-RateLimit-Remaining   Requests left in the current window.
X-RateLimit-Reset       Unix timestamp (seconds) when the bucket refills.
Retry-After             On a 429 only: seconds to wait before retrying.
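One way to work with these headers is to parse them into a small typed record once per response. The sketch below assumes all three X-RateLimit-* headers are present and hold integers, as the table suggests; the names `RateLimitState` and `parse_rate_limit` are hypothetical.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RateLimitState:
    limit: int                   # X-RateLimit-Limit
    remaining: int               # X-RateLimit-Remaining
    reset_at: int                # X-RateLimit-Reset (Unix seconds)
    retry_after: Optional[int]   # Retry-After, present on a 429 only


def parse_rate_limit(headers: Mapping[str, str]) -> RateLimitState:
    """Build a RateLimitState from response headers."""
    retry_after = headers.get("Retry-After")
    return RateLimitState(
        limit=int(headers["X-RateLimit-Limit"]),
        remaining=int(headers["X-RateLimit-Remaining"]),
        reset_at=int(headers["X-RateLimit-Reset"]),
        retry_after=int(retry_after) if retry_after is not None else None,
    )
```

A plain dict is case-sensitive, so pass the header mapping from your HTTP client (for example `r.headers` in requests, which is case-insensitive) rather than a hand-built dict with different casing.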

When you hit the limit

Exceed the limit and you get 429 Too Many Requests:
HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 200
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1672531200
Retry-After: 45

{ "error": "Too many requests" }
Wait the number of seconds in Retry-After, then retry.
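A defensive client may want a fallback for the unlikely case where Retry-After is missing. The sketch below prefers Retry-After and otherwise derives the wait from X-RateLimit-Reset. The fallback order and the 1-second default are our assumptions, not documented behaviour, and `seconds_to_wait` is a hypothetical helper name.

```python
import time
from typing import Mapping, Optional


def seconds_to_wait(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Return how long to pause after a 429, in seconds."""
    now = time.time() if now is None else now
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        return max(0.0, float(retry_after))
    # Assumed fallback: wait until the bucket's stated refill time.
    reset = headers.get("X-RateLimit-Reset")
    if reset is not None:
        return max(0.0, float(reset) - now)
    return 1.0  # assumed last-resort default
```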

Per-endpoint limits

Most endpoints use the default 200/60s. A few have their own limits:
Endpoint                                              Limit
POST /chats/statuses (batch online status)            80 / 60s
GET /creators/{creatorUserUuid}/subscribers/online    80 / 60s
Batch online-status (creator-scoped equivalent)       80 / 60s
Always read X-RateLimit-Limit on the response rather than assuming a value — it reflects the actual limit for the endpoint you called.

Handle it in code

Respect Retry-After and back off. A minimal retry wrapper:
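import time

import requests

def call(url, token, max_retries=5):
    """GET url, retrying on 429 after the delay given in Retry-After."""
    headers = {
        "Authorization": f"Bearer {token}",
        "X-Fanvue-API-Version": "2025-06-26",
    }
    for _ in range(max_retries):
        r = requests.get(url, headers=headers, timeout=30)
        if r.status_code != 429:
            return r
        # Retry-After is in seconds; fall back to 1 if it is missing.
        time.sleep(int(r.headers.get("Retry-After", 1)))
    return r  # still 429 after max_retries attempts; the caller decides what to do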
Stay ahead of the limit: read X-RateLimit-Remaining and slow down before you hit zero, rather than waiting for a 429. For the full set of API conventions, see API Conventions.
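One simple way to slow down before reaching zero is to spread the remaining requests evenly over the time left in the window. The function below is a sketch of that idea; `pacing_delay` and its `reserve` parameter (a safety margin of requests you never spend) are hypothetical, and the even-spacing policy is a design choice rather than something the API requires.

```python
def pacing_delay(remaining: int, reset_at: float, now: float, reserve: int = 10) -> float:
    """Seconds to sleep before the next request.

    remaining and reset_at come from X-RateLimit-Remaining and
    X-RateLimit-Reset; now is the current Unix time in seconds.
    """
    seconds_left = max(0.0, reset_at - now)
    usable = remaining - reserve
    if usable <= 0:
        # Budget (minus the reserve) is spent: wait for the refill.
        return seconds_left
    return seconds_left / usable
```

For example, with 110 requests remaining, 60 seconds until reset, and a reserve of 10, this spaces requests about 0.6 seconds apart. Call it after each response and sleep for the returned value before sending the next request.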

Capacity planning

The most common cause of throttling is re-polling chats for new messages. Polling every chat on a fixed interval scales with creators × chats × poll frequency and burns your budget on requests that almost always return nothing new. It is also unnecessary: Fanvue offers two cheaper paths.
  1. Stop polling — use webhooks. Register a message.received (and message.read) destination and let Fanvue push events to you in real time. Zero steady-state request cost.
  2. When you must pull, pull deltas, not the world. Use the bulk POST /chats/messages/batch endpoint with a sinceMessageUuid cursor to fetch only what changed across up to 50 chats in a single request.
Both are covered end to end, with a request-budget worked example, in Efficient chat sync. When estimating headroom for a growing creator count, remember the keying model: creator-scoped routes each get their own 200/60s bucket, so per-creator work scales naturally, while aggregate /agencies/* calls all share one. Reserve the shared bucket for genuinely cross-creator queries.
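As a rough back-of-the-envelope check, the two pull strategies can be compared with a few lines of arithmetic. The function names and the example numbers are illustrative assumptions; the only figures taken from this page are the 50-chat batch size and the 200/60s default budget.

```python
import math


def per_chat_polling_rpm(chats: int, polls_per_minute: float) -> float:
    """Requests per minute when every chat is polled individually."""
    return chats * polls_per_minute


def batch_polling_rpm(chats: int, polls_per_minute: float, chats_per_batch: int = 50) -> float:
    """Requests per minute when chats are fetched in batches of up to 50."""
    return math.ceil(chats / chats_per_batch) * polls_per_minute


# Hypothetical workload: 1,000 chats, each checked once per minute.
individual = per_chat_polling_rpm(1000, 1)   # 1000 requests/min, 5x the 200/60s budget
batched = batch_polling_rpm(1000, 1)         # 20 requests/min, 10% of the budget
```

Under these assumed numbers, per-chat polling cannot fit in a single shared bucket at all, while batching leaves most of the budget free. Webhooks remove the steady-state cost entirely.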