Cursor offers a wide range of models, including the latest state-of-the-art options.

Pricing

All model usage is counted and billed in requests. With the Pro plan, you get 500 requests per month. Cursor offers two modes of usage:

Normal

Requests per model/message

Ideal for everyday coding tasks, recommended for most users.

Max

Requests per 1M tokens (MTok)

Best for complex reasoning, hard bugs, and agentic tasks.

Request

A request represents a single message sent to the model, which includes your message, any relevant context from your codebase, and the model’s response.

One request costs $0.04.
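For back-of-the-envelope budgeting, the conversion between requests and dollars is simple arithmetic. A minimal sketch, assuming the $0.04-per-request rate above:

```python
# Convert between Cursor requests and dollars, assuming the
# stated rate of $0.04 per request.
PRICE_PER_REQUEST_USD = 0.04

def requests_to_usd(requests: float) -> float:
    return requests * PRICE_PER_REQUEST_USD

def usd_to_requests(usd: float) -> float:
    return usd / PRICE_PER_REQUEST_USD

# The Pro plan's 500 monthly requests correspond to $20 of usage.
assert round(requests_to_usd(500), 2) == 20.0
assert round(usd_to_requests(1.00), 0) == 25
```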

Slow requests

Slow requests automatically activate when you run out of normal requests. These requests are processed at a lower priority, meaning they are slower and you may experience longer delays compared to normal requests.

Slow requests are not available in Max mode.

Normal mode

In normal mode, each message costs a fixed number of requests based solely on the model you’re using, regardless of context. We optimize context management without it affecting your request count.

For example, let’s look at a conversation using Claude 3.5 Sonnet, where each message costs 1 request:

| Role | Message | Cost per message |
| --- | --- | --- |
| User | Create a plan for this change (using a more expensive model) | 1 |
| Cursor | I’ll analyze the requirements and create a detailed implementation plan… | 0 |
| User | Implement the changes with TypeScript and add error handling | 1 |
| Cursor | Here’s the implementation with type safety and error handling… | 0 |
| **Total** | | **2 requests** |
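The fixed-rate accounting above can be sketched as: only user messages are billed, at the model's per-message rate, and model responses add nothing. The rates here are copied from the model list below; treat them as illustrative:

```python
# Normal mode: each user message costs a fixed number of requests for
# the chosen model, regardless of context size; responses cost 0.
# Per-message rates below are illustrative, taken from the model list.
COST_PER_MESSAGE = {"claude-3.5-sonnet": 1, "o3": 7.5}

def conversation_cost(model: str, user_messages: int) -> float:
    return user_messages * COST_PER_MESSAGE[model]

assert conversation_cost("claude-3.5-sonnet", 2) == 2   # the example above
assert conversation_cost("o3", 2) == 15.0
```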

Max Mode

In Max mode, pricing is calculated based on tokens, with Cursor charging the model provider’s API price plus a 20% margin. This includes all tokens from your messages, code files, folders, tool calls, and any other context provided to the model.

We use the same tokenizers as the model providers (e.g. OpenAI’s tokenizer for GPT models, Anthropic’s for Claude models) to ensure accurate token counting. You can see an example using OpenAI’s tokenizer demo.

Here’s an example of how pricing works in Max mode:

| Role | Message | Tokens | Note | Cost per message |
| --- | --- | --- | --- | --- |
| User | Create a plan for this change (using a more expensive model) | 135k | No cached input tokens | 2.7 requests |
| Cursor | I’ll analyze the requirements and create a detailed implementation plan… | 82k | | 1.23 requests |
| User | Implement the changes with TypeScript and add error handling | 135k | Most input tokens are cached | 2.7 requests |
| Cursor | Here’s the implementation with type safety and error handling… | 82k | | 1.23 requests |
| **Total** | | 434k | | **7.86 requests** |
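The token-based charge can be sketched as tokens multiplied by a per-MTok rate, summed over uncached input, cached input, and output. The input rate of 20 requests/MTok and output rate of 15 requests/MTok below are back-computed from the worked example (135k → 2.7 requests; 82k → 1.23 requests), not official prices:

```python
# Sketch of Max-mode billing: requests = tokens x rate (requests/MTok),
# summed across uncached input, cached input, and output tokens.
def max_mode_requests(input_tokens: int, cached_tokens: int,
                      output_tokens: int, in_rate: float,
                      cached_rate: float, out_rate: float) -> float:
    """All rates are in requests per million tokens (MTok)."""
    return (input_tokens * in_rate
            + cached_tokens * cached_rate
            + output_tokens * out_rate) / 1_000_000

# Rates inferred from the example above (illustrative only).
assert round(max_mode_requests(135_000, 0, 0, 20, 0, 15), 2) == 2.7
assert round(max_mode_requests(0, 0, 82_000, 20, 0, 15), 2) == 1.23
```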

Models

Model List

Claude 3.7 Sonnet

Normal Mode

  • Provider: Anthropic
  • Link: Claude 3.7 Sonnet
  • Context Window: 120k
  • Capabilities: Agent (can use tools), Thinking (uses reasoning tokens)
  • Trait: Powerful but eager to make changes
  • Cost: 1 request/message

Variants

  • Thinking:
    • Cost: 2 requests/message
    • Notes: Costs more requests because thinking consumes additional tokens

Max Mode

  • Provider: Anthropic
  • Link: Claude 3.7 Sonnet
  • Context Window: 200k
  • Capabilities: Agent (can use tools), Thinking (uses reasoning tokens)
  • Trait: Powerful but eager to make changes
  • Input Cost: 3 requests/MTok
  • Cached Input Cost: 0.3 requests/MTok
  • Output Cost: 15 requests/MTok

Claude 3.5 Sonnet

Normal Mode

  • Provider: Anthropic
  • Link: Claude 3.5 Sonnet
  • Context Window: 75k
  • Capabilities: Agent (can use tools), Thinking (uses reasoning tokens)
  • Trait: Great all rounder for most tasks
  • Cost: 1 request/message

Max Mode

  • Provider: Anthropic
  • Link: Claude 3.5 Sonnet
  • Context Window: 200k
  • Capabilities: Agent (can use tools), Thinking (uses reasoning tokens)
  • Trait: Great all rounder for most tasks
  • Input Cost: 3 requests/MTok
  • Cached Input Cost: 0.3 requests/MTok
  • Output Cost: 15 requests/MTok

Gemini 2.5 Pro

Normal Mode

  • Provider: Google
  • Link: Gemini 2.5 Pro
  • Context Window: 120k
  • Capabilities: Agent (can use tools), Thinking (uses reasoning tokens)
  • Trait: Careful and precise
  • Cost: 1 request/message
  • Notes: Variable pricing depending on token count (112.5 requests/MTok)

Max Mode

  • Provider: Google
  • Link: Gemini 2.5 Pro
  • Context Window: 1M
  • Capabilities: Agent (can use tools), Thinking (uses reasoning tokens)
  • Trait: Careful and precise
  • Input Cost: 1.25 requests/MTok
  • Cached Input Cost: 0.31 requests/MTok
  • Output Cost: 10 requests/MTok

Variants

  • Long Context (>200k):
    • Input Cost: 2.5 requests/MTok
    • Cached Input Cost: 0.625 requests/MTok
    • Output Cost: 15 requests/MTok
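The long-context variant can be sketched as a rate switch once the prompt passes 200k tokens, using the rates listed above. The exact switchover behavior (per prompt, strict inequality) is an assumption:

```python
# Sketch: Gemini 2.5 Pro in Max mode moves to a pricier tier once the
# prompt exceeds 200k tokens. Rates are the listed requests/MTok values;
# the switchover rule itself is an assumption for illustration.
LONG_CONTEXT_THRESHOLD = 200_000

def gemini_25_pro_rates(prompt_tokens: int) -> dict:
    if prompt_tokens > LONG_CONTEXT_THRESHOLD:
        return {"input": 2.5, "cached_input": 0.625, "output": 15}
    return {"input": 1.25, "cached_input": 0.31, "output": 10}

assert gemini_25_pro_rates(150_000)["input"] == 1.25
assert gemini_25_pro_rates(400_000)["input"] == 2.5
```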

Gemini 2.5 Flash

Normal Mode

  • Provider: Google
  • Link: Gemini 2.5 Flash
  • Context Window: 128k
  • Capabilities: Agent (can use tools)
  • Cost: 0 requests/message

GPT-4o

Normal Mode

  • Provider: OpenAI
  • Link: GPT-4o
  • Context Window: 60k
  • Capabilities: Agent (can use tools), Thinking (uses reasoning tokens)
  • Cost: 1 request/message

Max Mode

  • Provider: OpenAI
  • Link: GPT-4o
  • Context Window: 128k
  • Capabilities: Agent (can use tools), Thinking (uses reasoning tokens)
  • Input Cost: 2.5 requests/MTok
  • Cached Input Cost: 1.25 requests/MTok
  • Output Cost: 10 requests/MTok

GPT 4.1

Normal Mode

  • Provider: OpenAI
  • Link: GPT 4.1
  • Context Window: 128k
  • Capabilities: Agent (can use tools), Thinking (uses reasoning tokens)
  • Cost: 1 request/message

Max Mode

  • Provider: OpenAI
  • Link: GPT 4.1
  • Context Window: 1M
  • Capabilities: Agent (can use tools), Thinking (uses reasoning tokens)
  • Input Cost: 2 requests/MTok
  • Cached Input Cost: 0.5 requests/MTok
  • Output Cost: 8 requests/MTok

o3

Normal Mode

  • Provider: OpenAI
  • Link: o3
  • Context Window: 128k
  • Capabilities: Agent (can use tools), Thinking (uses reasoning tokens)
  • Cost: 7.5 requests/message
  • Notes: High reasoning effort

Max Mode

  • Provider: OpenAI
  • Link: o3
  • Context Window: 200k
  • Capabilities: Agent (can use tools), Thinking (uses reasoning tokens)
  • Input Cost: 10 requests/MTok
  • Cached Input Cost: 2.5 requests/MTok
  • Output Cost: 40 requests/MTok

o4-mini

Normal Mode

  • Provider: OpenAI
  • Link: o4-mini
  • Context Window: 128k
  • Capabilities: Agent (can use tools), Thinking (uses reasoning tokens)
  • Cost: 1 request/message
  • Notes: High reasoning effort

Max Mode

  • Provider: OpenAI
  • Link: o4-mini
  • Context Window: 200k
  • Capabilities: Agent (can use tools), Thinking (uses reasoning tokens)
  • Input Cost: 1.1 requests/MTok
  • Cached Input Cost: 0.275 requests/MTok
  • Output Cost: 4.4 requests/MTok

Grok 3 Beta

Normal Mode

  • Provider: xAI
  • Link: Grok 3 Beta
  • Context Window: 60k
  • Capabilities: Agent (can use tools), Thinking (uses reasoning tokens)
  • Cost: 1 request/message

Max Mode

  • Provider: xAI
  • Link: Grok 3 Beta
  • Context Window: 132k
  • Capabilities: Agent (can use tools), Thinking (uses reasoning tokens)
  • Input Cost: 3 requests/MTok
  • Output Cost: 15 requests/MTok

Grok 3 Mini Beta

Normal Mode

  • Provider: xAI
  • Link: Grok 3 Mini Beta
  • Context Window: 60k
  • Capabilities: Agent (can use tools)
  • Cost: 0 requests/message

Max Mode

  • Provider: xAI
  • Link: Grok 3 Mini Beta
  • Context Window: 132k
  • Capabilities: Agent (can use tools)
  • Input Cost: 0.3 requests/MTok
  • Cached Input Cost: 0.3 requests/MTok
  • Output Cost: 1 request/MTok

Auto-select

Enabling Auto-select lets Cursor pick the premium model best suited to the immediate task, favoring whichever model currently has the highest reliability. This feature can detect degraded output performance and automatically switch models to resolve it.

Recommended for most users

Capabilities

Thinking

Enabling Thinking limits the list of models to reasoning models which think through problems step-by-step and have deeper capacity to examine their own reasoning and correct errors.

These models often perform better on complex reasoning tasks, though they may require more time to generate their responses.

Agentic

Agentic models can be used with Chat’s Agent mode. These models are highly capable at making tool calls and perform best with Agent.

Submitting an Agent prompt with up to 25 tool calls consumes one request. If your prompt extends beyond 25 tool calls, Cursor will ask if you’d like to continue, which consumes a second request.
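The rule above can be sketched as one request per block of up to 25 tool calls, assuming each "continue?" confirmation starts a new block (that generalization beyond the second block is an assumption):

```python
import math

# Sketch, assuming the 25-tool-call rule: each block of up to 25 tool
# calls in one Agent prompt consumes one request; continuing past a
# block consumes another. A prompt with no tool calls still costs one.
TOOL_CALLS_PER_REQUEST = 25

def agent_requests(tool_calls: int) -> int:
    return max(1, math.ceil(tool_calls / TOOL_CALLS_PER_REQUEST))

assert agent_requests(0) == 1
assert agent_requests(25) == 1
assert agent_requests(26) == 2
```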

Max Mode

Some models support Max Mode, which is designed for the most complex and challenging tasks. Learn more about Max Mode.

Context windows

A context window is the maximum span of tokens (text and code) an LLM can consider at once, including both the input prompt and output generated by the model.

Each chat in Cursor maintains its own context window. The more prompts, attached files, and responses included in a session, the larger the context window grows.
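The definition above implies a simple constraint: the prompt plus the model's output must fit within the window. A minimal sketch, using two normal-mode window sizes from the model list as illustrative values:

```python
# Sketch: input prompt + generated output must fit in the context
# window. Window sizes mirror the normal-mode figures in the model
# list and are illustrative.
CONTEXT_WINDOWS = {"claude-3.7-sonnet": 120_000, "gpt-4.1": 128_000}

def fits(model: str, prompt_tokens: int, max_output_tokens: int) -> bool:
    return prompt_tokens + max_output_tokens <= CONTEXT_WINDOWS[model]

assert fits("claude-3.7-sonnet", 100_000, 10_000)
assert not fits("gpt-4.1", 125_000, 8_000)
```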

Cursor actively optimizes the context window as the chat session progresses, intelligently pruning non-essential content while preserving critical code and conversation elements.

For best results, it’s recommended you take a purpose-based approach to chat management, starting a new session for each unique task.

Hosting

Models are hosted on US-based infrastructure by the model’s provider, a trusted partner, or Cursor itself.

When Privacy Mode is enabled from Settings, neither Cursor nor the model providers will store your data, with all data deleted after each request is processed. For further details see our Privacy, Privacy Policy, and Security pages.

FAQ

What is a request?

A request is the message you send to the model.

What is a token?

A token is the smallest unit of text a model can process; as a rule of thumb, one token is roughly four characters of English text.