Models & Pricing
Available models in Cursor and their pricing
Cursor offers a wide range of models, including the latest state-of-the-art options.
Pricing
All model usage is counted and billed in requests. With the Pro plan, you get 500 requests per month. Cursor offers two modes of usage:
| Mode | Billing unit | Best for |
|---|---|---|
| Normal | Requests per model/message | Everyday coding tasks; recommended for most users |
| Max | Requests per 1M tokens (MTok) | Complex reasoning, hard bugs, and agentic tasks |
Request
A request represents a single message sent to the model, which includes your message, any relevant context from your codebase, and the model’s response.
One request costs $0.04.
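To make the dollar math concrete, here is a minimal sketch (the helper name is made up for illustration) converting a request count to dollars at that rate:

```python
# Hypothetical helper: convert a request count to dollars at $0.04/request.
REQUEST_PRICE_USD = 0.04

def requests_to_usd(requests: float) -> float:
    return requests * REQUEST_PRICE_USD

print(requests_to_usd(1))     # 0.04   - one normal-mode message
print(requests_to_usd(7.86))  # 0.3144 - the Max mode example total below
```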
Slow requests
Slow requests activate automatically when you run out of normal requests. They are processed at a lower priority, so responses may take noticeably longer than normal requests.
Normal Mode
In normal mode, each message costs a fixed number of requests based solely on the model you’re using, regardless of context. We optimize context management without it affecting your request count.
For example, let’s look at a conversation using Claude 3.5 Sonnet, where each user message costs 1 request (a short cost sketch follows the table):
| Role | Message | Cost per message |
|---|---|---|
| User | Create a plan for this change (using a more expensive model) | 1 |
| Cursor | I’ll analyze the requirements and create a detailed implementation plan… | 0 |
| User | Implement the changes with TypeScript and add error handling | 1 |
| Cursor | Here’s the implementation with type safety and error handling… | 0 |
| **Total** | | **2 requests** |
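Because normal mode bills a fixed number of requests per user message, totaling a conversation is simple multiplication. A minimal sketch, using per-message rates copied from the model list below (the model ID strings are illustrative):

```python
# Requests per user message in normal mode, copied from the model list below.
NORMAL_MODE_RATES = {
    "claude-3.5-sonnet": 1,
    "claude-3.7-sonnet-thinking": 2,
    "o3": 7.5,
}

def conversation_cost(model: str, user_messages: int) -> float:
    # Only user messages are billed; the model's responses cost 0 requests.
    return NORMAL_MODE_RATES[model] * user_messages

print(conversation_cost("claude-3.5-sonnet", 2))  # 2 requests, matching the table above
```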
Max Mode
In Max mode, pricing is calculated based on tokens, with Cursor charging the model provider’s API price plus a 20% margin. This includes all tokens from your messages, code files, folders, tool calls, and any other context provided to the model.
We use the same tokenizers as the model providers (e.g. OpenAI’s tokenizer for GPT models, Anthropic’s for Claude models) to ensure accurate token counting. You can see an example using OpenAI’s tokenizer demo.
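For instance, you can reproduce a GPT-model token count locally with OpenAI’s open-source tiktoken library (the message string is taken from the example below):

```python
import tiktoken

# OpenAI's tokenizer for GPT models; Claude models use Anthropic's tokenizer instead.
enc = tiktoken.encoding_for_model("gpt-4o")
tokens = enc.encode("Implement the changes with TypeScript and add error handling")
print(len(tokens))  # input tokens this message alone contributes
```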
Here’s an example of how pricing works in Max mode (a sketch of the arithmetic follows the table):
| Role | Message | Tokens | Note | Cost per message |
|---|---|---|---|---|
| User | Create a plan for this change (using a more expensive model) | 135k | No cached input tokens | 2.7 requests |
| Cursor | I’ll analyze the requirements and create a detailed implementation plan… | 82k | | 1.23 requests |
| User | Implement the changes with TypeScript and add error handling | 135k | Most input tokens are cached | 2.7 requests |
| Cursor | Here’s the implementation with type safety and error handling… | 82k | | 1.23 requests |
| **Total** | | **434k** | | **7.86 requests** |
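Expressed as a formula, a message’s Max mode cost is the sum over each token class of (tokens ÷ 1,000,000) × that class’s per-MTok rate. A minimal sketch using Claude 3.5 Sonnet’s Max mode rates from the model list below; the token splits passed in are illustrative:

```python
# Max mode rates for Claude 3.5 Sonnet, in requests per million tokens (MTok),
# copied from the model list below.
INPUT_RATE, CACHED_INPUT_RATE, OUTPUT_RATE = 3, 0.3, 15

def max_mode_requests(input_tok: int, cached_tok: int, output_tok: int) -> float:
    return (
        input_tok / 1e6 * INPUT_RATE
        + cached_tok / 1e6 * CACHED_INPUT_RATE
        + output_tok / 1e6 * OUTPUT_RATE
    )

# A 135k-token uncached prompt plus an 82k-token response at these rates:
print(max_mode_requests(135_000, 0, 82_000))  # ≈ 1.635 requests
```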
Models
Model List
Claude 3.7 Sonnet
Normal Mode
- Provider: Anthropic
- Link: Claude 3.7 Sonnet
- Context Window: 120k
- Capabilities: Agent (can use tools), Thinking (uses reasoning tokens)
- Trait: Powerful but eager to make changes
- Cost: 1 request/message
Variants
- Thinking:
- Cost: 2 requests/message
- Notes: Costs more requests because thinking consumes additional tokens
Max Mode
- Provider: Anthropic
- Link: Claude 3.7 Sonnet
- Context Window: 200k
- Capabilities: Agent (can use tools), Thinking (uses reasoning tokens)
- Trait: Powerful but eager to make changes
- Input Cost: 3 requests/MTok
- Cached Input Cost: 0.3 requests/MTok
- Output Cost: 15 requests/MTok
Claude 3.5 Sonnet
Normal Mode
- Provider: Anthropic
- Link: Claude 3.5 Sonnet
- Context Window: 75k
- Capabilities: Agent (can use tools)
- Trait: Great all-rounder for most tasks
- Cost: 1 request/message
Max Mode
- Provider: Anthropic
- Link: Claude 3.5 Sonnet
- Context Window: 200k
- Capabilities: Agent (can use tools)
- Trait: Great all-rounder for most tasks
- Input Cost: 3 requests/MTok
- Cached Input Cost: 0.3 requests/MTok
- Output Cost: 15 requests/MTok
Gemini 2.5 Pro
Normal Mode
- Provider: Google
- Link: Gemini 2.5 Pro
- Context Window: 120k
- Capabilities: Agent (can use tools), Thinking (uses reasoning tokens)
- Trait: Careful and precise
- Cost: 1 request/message
- Notes: Variable pricing depending on token count (112.5 requests/MTok)
Max Mode
- Provider: Google
- Link: Gemini 2.5 Pro
- Context Window: 1M
- Capabilities: Agent (can use tools), Thinking (uses reasoning tokens)
- Trait: Careful and precise
- Input Cost: 1.25 requests/MTok
- Cached Input Cost: 0.31 requests/MTok
- Output Cost: 10 requests/MTok
Variants
- Long Context (>200k):
- Input Cost: 2.5 requests/MTok
- Cached Input Cost: 0.625 requests/MTok
- Output Cost: 15 requests/MTok
Gemini 2.5 Flash
Normal Mode
- Provider: Google
- Link: Gemini 2.5 Flash
- Context Window: 128k
- Capabilities: Agent (can use tools)
- Cost: 0 requests/message
GPT-4o
Normal Mode
- Provider: OpenAI
- Link: GPT-4o
- Context Window: 60k
- Capabilities: Agent (can use tools)
- Cost: 1 request/message
Max Mode
- Provider: OpenAI
- Link: GPT-4o
- Context Window: 128k
- Capabilities: Agent (can use tools)
- Input Cost: 2.5 requests/MTok
- Cached Input Cost: 1.25 requests/MTok
- Output Cost: 10 requests/MTok
GPT-4.1
Normal Mode
- Provider: OpenAI
- Link: GPT-4.1
- Context Window: 128k
- Capabilities: Agent (can use tools)
- Cost: 1 request/message
Max Mode
- Provider: OpenAI
- Link: GPT-4.1
- Context Window: 1M
- Capabilities: Agent (can use tools)
- Input Cost: 2 requests/MTok
- Cached Input Cost: 0.5 requests/MTok
- Output Cost: 8 requests/MTok
o3
Normal Mode
- Provider: OpenAI
- Link: o3
- Context Window: 128k
- Capabilities: Agent (can use tools), Thinking (uses reasoning tokens)
- Cost: 7.5 requests/message
- Notes: High reasoning effort
Max Mode
- Provider: OpenAI
- Link: o3
- Context Window: 200k
- Capabilities: Agent (can use tools), Thinking (uses reasoning tokens)
- Input Cost: 10 requests/MTok
- Cached Input Cost: 2.5 requests/MTok
- Output Cost: 40 requests/MTok
o4-mini
Normal Mode
- Provider: OpenAI
- Link: o4-mini
- Context Window: 128k
- Capabilities: Agent (can use tools), Thinking (uses reasoning tokens)
- Cost: 1 request/message
- Notes: High reasoning effort
Max Mode
- Provider: OpenAI
- Link: o4-mini
- Context Window: 200k
- Capabilities: Agent (can use tools), Thinking (uses reasoning tokens)
- Input Cost: 1.1 requests/MTok
- Cached Input Cost: 0.275 requests/MTok
- Output Cost: 4.4 requests/MTok
Grok 3 Beta
Normal Mode
- Provider: xAI
- Link: Grok 3 Beta
- Context Window: 60k
- Capabilities: Agent (can use tools)
- Cost: 1 request/message
Max Mode
- Provider: xAI
- Link: Grok 3 Beta
- Context Window: 132k
- Capabilities: Agent (can use tools)
- Input Cost: 3 requests/MTok
- Output Cost: 15 requests/MTok
Grok 3 Mini Beta
Normal Mode
- Provider: xAI
- Link: Grok 3 Mini Beta
- Context Window: 60k
- Capabilities: Agent (can use tools)
- Cost: 0 requests/message
Max Mode
- Provider: xAI
- Link: Grok 3 Mini Beta
- Context Window: 132k
- Capabilities: Agent (can use tools)
- Input Cost: 0.3 requests/MTok
- Cached Input Cost: 0.3 requests/MTok
- Output Cost: 1 requests/MTok
Auto-select
Enabling Auto-select lets Cursor pick the premium model best suited to the immediate task, favoring whichever is most reliable under current demand. This feature can detect degraded output and automatically switch models to resolve it.
Capabilities
Thinking
Enabling Thinking filters the list to reasoning models, which think through problems step by step and are better able to examine their own reasoning and correct errors.
These models often perform better on complex reasoning tasks, though they may require more time to generate their responses.
Agentic
Agentic models can be used with Chat’s Agent mode. These models are highly capable at making tool calls and perform best with Agent.
Submitting an Agent prompt with up to 25 tool calls consumes one request. If your request extends beyond 25 tool calls, Cursor will ask if you’d like to continue, which consumes a second request.
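In other words, Agent runs are billed in blocks of 25 tool calls, with a confirmation before each additional block. A rough sketch of that arithmetic, assuming the pattern repeats every 25 calls (the function is hypothetical, for illustration only):

```python
import math

def agent_requests(tool_calls: int) -> int:
    # Each block of up to 25 tool calls consumes one request;
    # Cursor asks for confirmation before continuing into the next block.
    return max(1, math.ceil(tool_calls / 25))

print(agent_requests(10))  # 1
print(agent_requests(26))  # 2 - continuing past 25 consumes a second request
```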
Max Mode
Some models support Max Mode, which is designed for the most complex and challenging tasks. Learn more about Max Mode.
Context windows
A context window is the maximum span of tokens (text and code) an LLM can consider at once, including both the input prompt and output generated by the model.
Each chat in Cursor maintains its own context window. The more prompts, attached files, and responses included in a session, the larger the context window grows.
Cursor actively optimizes the context window as the chat session progresses, intelligently pruning non-essential content while preserving critical code and conversation elements.
For best results, it’s recommended you take a purpose-based approach to chat management, starting a new session for each unique task.
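As a rough way to reason about window limits, you can compare a session’s running token count against the model’s window. A minimal sketch, assuming tiktoken for counting, the 128k window quoted for several models above, and an arbitrary illustrative output reserve:

```python
import tiktoken

CONTEXT_WINDOW = 128_000  # e.g. GPT-4.1 in normal mode, per the model list above
enc = tiktoken.encoding_for_model("gpt-4o")  # illustrative tokenizer choice

def fits_in_window(session_messages: list[str], output_reserve: int = 8_000) -> bool:
    # The window must hold the whole session plus room for the model's next reply.
    used = sum(len(enc.encode(m)) for m in session_messages)
    return used + output_reserve <= CONTEXT_WINDOW

print(fits_in_window(["Create a plan for this change"]))  # True
```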
Hosting
Models are hosted on US-based infrastructure by the model’s provider, a trusted partner, or Cursor itself.
When Privacy Mode is enabled from Settings, neither Cursor nor the model providers will store your data, with all data deleted after each request is processed. For further details see our Privacy, Privacy Policy, and Security pages.
FAQ
What is a request?
A request is the message you send to the model.
What is a token?
A token is the smallest unit of text a model can process. As a rough rule of thumb, one token is about four characters of English text, so 1,000 tokens is roughly 750 words.