Rate Limits
Cursor meters rate limits based on underlying compute usage, and limits reset every few hours.
How do rate limits work?
By default, all individual plans are subject to rate limits on Agent. There are two types of limits: burst rate limits and local rate limits. Burst rate limits can be dipped into at any time for particularly bursty sessions but are slow to refill. Local rate limits refill fully every few hours.
Rate limits are based on the total compute you use during a session. This varies with the model you select, the length of your messages, the length of any files you attach, and the length of the current conversation.
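As a rough mental model only (this is not Cursor's actual implementation, and the numbers are made up), you can picture two compute buckets that each request draws from: a local bucket that refills quickly and a larger burst bucket that refills slowly. The sketch below uses hypothetical bucket sizes, refill rates, and per-request compute costs purely for illustration.

```python
import time


class Bucket:
    """A pool of compute (in dollars of inference) that refills over time."""

    def __init__(self, capacity: float, refill_per_sec: float) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.level = capacity
        self.last_refill = time.monotonic()

    def refill(self) -> None:
        now = time.monotonic()
        elapsed = now - self.last_refill
        self.level = min(self.capacity, self.level + elapsed * self.refill_per_sec)
        self.last_refill = now


class TwoBucketLimiter:
    """Illustrative limiter: requests draw from a fast-refilling 'local'
    bucket first, then dip into a larger, slow-refilling 'burst' bucket."""

    def __init__(self) -> None:
        # Hypothetical numbers: the local bucket refills fully over a few
        # hours, the burst bucket is larger but takes far longer to refill.
        self.local = Bucket(capacity=5.0, refill_per_sec=5.0 / (3 * 3600))
        self.burst = Bucket(capacity=20.0, refill_per_sec=20.0 / (7 * 24 * 3600))

    def try_request(self, compute_cost: float) -> bool:
        """Deduct a request's compute cost if the two buckets can cover it;
        otherwise reject (the user sees an explicit rate-limit message)."""
        self.local.refill()
        self.burst.refill()
        if self.local.level + self.burst.level < compute_cost:
            return False
        from_local = min(self.local.level, compute_cost)
        self.local.level -= from_local
        self.burst.level -= compute_cost - from_local
        return True


limiter = TwoBucketLimiter()
print(limiter.try_request(0.12))  # True while either bucket still has compute
```

In a model like this, an occasional heavy session can outpace the local bucket and fall back on the burst bucket, while sustained heavy usage eventually exhausts both and hits a limit.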
Can I use Max Mode as part of my rate limits?
Yes, Max Mode usage is included as part of the rate limits for paid plans, and is available for no extra cost when under those rate limits.
How much compute do I get?
Burst limits will always be greater than the price of your plan! If you've bought Pro, you'll always get to use over $20 of model inference at API prices before seeing any rate limits.
Users get unlimited agent requests when using the ‘auto’ model selector, which routes to a frontier model that has capacity at that time.
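For a rough sense of what $20 of inference covers, here is a back-of-the-envelope estimate. The per-million-token prices and token counts below are hypothetical placeholders, not actual provider pricing.

```python
# Hypothetical prices in dollars per million tokens; real prices vary by
# model and provider and change over time.
INPUT_PRICE_PER_M = 3.00
OUTPUT_PRICE_PER_M = 15.00


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate one request's cost at the hypothetical API prices above."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M + \
           (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M


# A request with a sizeable attached context and a moderate reply:
cost = request_cost(input_tokens=40_000, output_tokens=2_000)
print(f"~${cost:.2f} per request")                             # ~$0.15
print(f"~{20 / cost:.0f} such requests for $20 of inference")  # ~133
```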
How do you offer this much compute?
To launch Pro and Ultra, we’ve worked closely with the model providers to offer significantly more compute than you’d get at list prices.
What if I hit a limit?
If a user exhausts both their local and burst limits, they’ll be notified explicitly and shown a message that gives them three options:
- Switch to models with higher rate limits (e.g., Sonnet has higher limits than Opus)
- Switch on usage-based pricing to pay for requests that exceed the rate limits
- Upgrade to a higher tier: either Pro+ ($60/month for 3x the rate limits of Pro), or Ultra ($200/month for 20x the rate limits of Pro)
Requests are never downgraded in quality or speed when hitting rate limits. Explicit error messages are always shown when users hit rate limits.
If I flip on usage-based pricing for requests that exceed rate limits, how is cost per request calculated?
Usage-based pricing is calculated from the underlying compute used for each request and is charged at model API prices.
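Concretely, an overage charge is the sum of each excess request's compute at API prices. The token counts and prices below are made-up numbers used only to show the arithmetic.

```python
# (input_tokens, output_tokens) for three hypothetical requests made after
# the rate limits were exhausted, with usage-based pricing switched on.
overage_requests = [(12_000, 1_500), (55_000, 3_000), (8_000, 900)]

PRICE_IN, PRICE_OUT = 3.00, 15.00  # hypothetical $ per million tokens

bill = sum(
    inp / 1_000_000 * PRICE_IN + out / 1_000_000 * PRICE_OUT
    for inp, out in overage_requests
)
print(f"Usage-based charge for these requests: ${bill:.2f}")  # $0.31
```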
What if I’d like just a lump sum of requests?
You’re free to stick with the legacy Pro plan if you’d like! You can control this setting at Dashboard > Settings > Advanced. The new Pro should be preferable for most users.
What if I’m on a Team Plan?
Users on Team Plans use a flat per-request fee instead of a rate-limited, compute-based system. You can find more details under Understanding Request-based Usage (Team and Legacy Pro Plans).