API Status Page

Get instant visibility into service health and any ongoing incidents for various AI Model APIs in one place.

Click any card with a link icon in its status indicator to visit the provider's status page.
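
For programmatic checks, many providers also expose a public status feed. Below is a minimal sketch of polling one, assuming a Statuspage-style JSON endpoint; the URL is illustrative and should be verified per provider.

```python
import requests

# Minimal sketch: poll a provider's public status feed. Assumes a
# Statuspage-style "/api/v2/status.json" endpoint; the URL below is
# an illustrative assumption, not a guaranteed endpoint.
STATUS_URL = "https://status.anthropic.com/api/v2/status.json"

def fetch_status(url: str) -> str:
    """Return the overall status description, e.g. 'All Systems Operational'."""
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()
    data = resp.json()
    # Statuspage responses nest the summary under a top-level "status" key.
    return data["status"]["description"]

if __name__ == "__main__":
    print(fetch_status(STATUS_URL))
```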

Qwen 2.5 32B

OpenRouter

Operational

The other really good open source model from China. Alibaba's Qwen is very similar to Llama. Good on its own, but strongest when used as a distillation target for other models and datasets.

Qwen 3 32B

Groq

Operational

Qwen 3 32B is a dense 32.8B parameter causal language model from the Qwen 3 series, optimized for both complex reasoning and efficient dialogue. It supports seamless switching between a "thinking" mode for tasks like math, coding, and logical inference, and a "non-thinking" mode for faster, general-purpose conversation. The model demonstrates strong performance in instruction-following, agent tool use, creative writing, and multilingual tasks across 100+ languages and dialects. Hosted on Groq for speed.
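
As an illustration of that mode switch, here is a minimal sketch against Groq's OpenAI-compatible endpoint. The "/no_think" soft switch follows Qwen's documented prompt convention; the model id is an assumption to check against Groq's catalog.

```python
from openai import OpenAI

# Groq serves an OpenAI-compatible API, so the standard SDK works with a
# swapped base URL. The model id below is an assumption to verify.
client = OpenAI(
    base_url="https://api.groq.com/openai/v1",
    api_key="YOUR_GROQ_API_KEY",
)

# Appending Qwen's "/no_think" soft switch requests the fast,
# non-thinking mode; omit it (or use "/think") for deliberate reasoning.
chat = client.chat.completions.create(
    model="qwen/qwen3-32b",  # assumed model id
    messages=[{"role": "user", "content": "Summarize MoE in one line. /no_think"}],
)
print(chat.choices[0].message.content)
```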

Qwen 3 235B (Thinking)

OpenRouter

Operational

Qwen3-235B-A22B-Thinking-2507 is a high-performance, open-weight Mixture-of-Experts (MoE) language model optimized for complex reasoning tasks. It activates 22B of its 235B parameters per forward pass and natively supports up to 262,144 tokens of context. This "thinking-only" variant enhances structured logical reasoning, mathematics, science, and long-form generation, showing strong benchmark performance across AIME, SuperGPQA, LiveCodeBench, and MMLU-Redux. It enforces reasoning mode (the chat template opens a thinking block automatically, so responses always include a closing </think> tag) and is designed for high-token outputs (up to 81,920 tokens) in challenging domains. The model is instruction-tuned and excels at step-by-step reasoning, tool use, agentic workflows, and multilingual tasks. This release represents the most capable open source variant in the Qwen 3 series, surpassing many closed models in structured reasoning use cases.
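
A quick bit of budget arithmetic from the numbers above: reserving the full 81,920-token output leaves just over 180K tokens of context for the prompt, and 22B active out of 235B total works out to under 10% of weights used per pass.

```python
# Worked arithmetic using only the figures quoted in the card above.
CONTEXT_WINDOW = 262_144  # native context length (tokens)
MAX_OUTPUT = 81_920       # maximum output budget (tokens)

prompt_budget = CONTEXT_WINDOW - MAX_OUTPUT
active_fraction = 22 / 235  # active vs. total parameters (billions)

print(f"Prompt room when reserving max output: {prompt_budget:,} tokens")  # 180,224
print(f"Weights active per forward pass: {active_fraction:.1%}")           # 9.4%
```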

Qwen 3 235B

OpenRouter

Operational

Qwen3-235B-A22B-Instruct-2507 is a multilingual, instruction-tuned mixture-of-experts language model based on the Qwen3-235B architecture, with 22B active parameters per forward pass. It is optimized for general-purpose text generation, including instruction following, logical reasoning, math, code, and tool usage. The model supports a native 262K context length and does not implement "thinking mode" (<think> blocks). Compared to its base variant, this version delivers significant gains in knowledge coverage, long-context reasoning, coding benchmarks, and alignment with open-ended tasks. It is particularly strong on multilingual understanding, math reasoning (e.g., AIME, HMMT), and alignment evaluations like Arena-Hard and WritingBench.

Qwen 3 Coder

OpenRouter

Operational

Qwen 3 Coder is a cutting-edge model from Alibaba that rivals o3, DeepSeek, and even Kimi K2. It's the current SOTA in agentic tool use and coding, excelling at complex programming tasks such as code generation, multi-step reasoning, and tool use. It's designed to be a versatile tool for developers, researchers, and students, offering a powerful combination of reasoning and code generation capabilities.

Claude 3.7 Sonnet

Anthropic

Operational

The last gen model from Anthropic. Better at code, math, and more. Also kind of slow and expensive.

Claude 3.7 Sonnet (Reasoning)

Anthropic

Operational

The last gen model from Anthropic (but you can make it think). Better at code, math, and more. Also kind of slow and expensive.

Claude 4 Sonnet

Anthropic

Operational

The previous generation model from Anthropic. Claude Sonnet 4 is a significant upgrade to Claude Sonnet 3.7, delivering superior coding and reasoning while responding more precisely to your instructions.

Claude 4 Sonnet (Reasoning)

Anthropic

Operational

The previous generation model from Anthropic (but you can make it think). Claude Sonnet 4 is a significant upgrade to Claude Sonnet 3.7, delivering superior coding and reasoning while responding more precisely to your instructions.

Claude 4.5 Sonnet

Anthropic

Operational

The latest model from Anthropic. Claude Sonnet 4.5 is Anthropic's most advanced Sonnet model to date, optimized for real-world agents and coding workflows. It delivers state-of-the-art performance on coding benchmarks such as SWE-bench Verified, with improvements across system design, code security, and specification adherence. The model is designed for extended autonomous operation, maintaining task continuity across sessions and providing fact-based progress tracking. Sonnet 4.5 also introduces stronger agentic capabilities, including improved tool orchestration, speculative parallel execution, and more efficient context and memory management. With enhanced context tracking and awareness of token usage across tool calls, it is particularly well-suited for multi-context and long-running workflows. Use cases span software engineering, cybersecurity, financial analysis, research agents, and other domains requiring sustained reasoning and tool use.

Claude 4.5 Sonnet (Reasoning)

Anthropic

Operational

The latest model from Anthropic (but you can make it think). Claude Sonnet 4.5 is Anthropic's most advanced Sonnet model to date, optimized for real-world agents and coding workflows. It delivers state-of-the-art performance on coding benchmarks such as SWE-bench Verified, with improvements across system design, code security, and specification adherence. The model is designed for extended autonomous operation, maintaining task continuity across sessions and providing fact-based progress tracking. Sonnet 4.5 also introduces stronger agentic capabilities, including improved tool orchestration, speculative parallel execution, and more efficient context and memory management. With enhanced context tracking and awareness of token usage across tool calls, it is particularly well-suited for multi-context and long-running workflows. Use cases span software engineering, cybersecurity, financial analysis, research agents, and other domains requiring sustained reasoning and tool use.

Claude 4.1 Opus

Anthropic

Operational

The latest and greatest from Anthropic. Very powerful, but with a cost to match.

Claude 4.5 Haiku

Anthropic

Operational

Claude Haiku 4.5 is Anthropic's fastest and most efficient model, delivering near-frontier intelligence at a fraction of the cost and latency of larger Claude models. Matching Claude Sonnet 4's performance across reasoning, coding, and computer-use tasks, Haiku 4.5 brings frontier-level capability to real-time and high-volume applications. It introduces extended thinking to the Haiku line, enabling controllable reasoning depth, summarized or interleaved thought output, and tool-assisted workflows with full support for coding, bash, web search, and computer-use tools. Scoring >73% on SWE-bench Verified, Haiku 4.5 ranks among the world's best coding models while maintaining exceptional responsiveness for sub-agents, parallelized execution, and scaled deployment.
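
For the controllable reasoning depth mentioned above, Anthropic's Messages API accepts a thinking block with a token budget. A minimal sketch; the model id is an assumption to confirm against Anthropic's current model list.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-haiku-4-5",  # assumed model id
    max_tokens=2048,           # must exceed the thinking budget
    thinking={"type": "enabled", "budget_tokens": 1024},  # controllable depth
    messages=[{"role": "user", "content": "Plan a three-step bug triage."}],
)
# Responses interleave "thinking" and "text" content blocks; print the text.
for block in message.content:
    if block.type == "text":
        print(block.text)
```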

Claude 4.5 Haiku (Reasoning)

Anthropic

Operational

Claude Haiku 4.5 is Anthropic's fastest and most efficient model, delivering near-frontier intelligence at a fraction of the cost and latency of larger Claude models. Matching Claude Sonnet 4's performance across reasoning, coding, and computer-use tasks, Haiku 4.5 brings frontier-level capability to real-time and high-volume applications. It introduces extended thinking to the Haiku line, enabling controllable reasoning depth, summarized or interleaved thought output, and tool-assisted workflows with full support for coding, bash, web search, and computer-use tools. Scoring >73% on SWE-bench Verified, Haiku 4.5 ranks among the world's best coding models while maintaining exceptional responsiveness for sub-agents, parallelized execution, and scaled deployment. This is the variant with extended thinking enabled.

DeepSeek v3 (0324)

OpenRouter

Operational

DeepSeek V3, a 685B-parameter mixture-of-experts model, is the latest iteration of the flagship chat model family from the DeepSeek team. It succeeds the original DeepSeek V3 model and performs really well on a variety of tasks.

DeepSeek v3.1

OpenRouter

Operational

DeepSeek-V3.1 is a large hybrid reasoning model (671B parameters, 37B active). It extends the DeepSeek-V3 base with a two-phase long-context training process, reaching up to 128K tokens, and uses FP8 microscaling for efficient inference. The model improves tool use, code generation, and reasoning efficiency, achieving performance comparable to DeepSeek-R1 on difficult benchmarks while responding more quickly. It succeeds the DeepSeek V3-0324 model and performs well on a variety of tasks.

DeepSeek v3.1 (Thinking)

OpenRouter

Operational

DeepSeek-V3.1 is a large hybrid reasoning model (671B parameters, 37B active). It extends the DeepSeek-V3 base with a two-phase long-context training process, reaching up to 128K tokens, and uses FP8 microscaling for efficient inference. The model improves tool use, code generation, and reasoning efficiency, achieving performance comparable to DeepSeek-R1 on difficult benchmarks while responding more quickly. It succeeds the DeepSeek V3-0324 model and performs well on a variety of tasks. This version has reasoning capabilities enabled, allowing it to provide responses with greater accuracy and nuanced context handling.
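
Because V3.1 is a hybrid model, the same weights serve both modes; through OpenRouter the switch is a request-level field. A minimal sketch, where the reasoning field shape and the model id are assumptions to verify against OpenRouter's documentation.

```python
import requests

# Toggle DeepSeek V3.1's thinking mode via OpenRouter's chat completions
# endpoint. The "reasoning" field and the model id are assumptions to verify.
resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_OPENROUTER_KEY"},
    json={
        "model": "deepseek/deepseek-chat-v3.1",  # assumed model id
        "reasoning": {"enabled": True},          # set False for non-thinking mode
        "messages": [{"role": "user", "content": "Prove that 0.999... equals 1."}],
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```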

DeepSeek v3.1 Terminus

OpenRouter

Operational

DeepSeek-V3.1 Terminus is an update to DeepSeek V3.1 that maintains the model's original capabilities while addressing issues reported by users, including language consistency and agent capabilities, further optimizing the model's performance in coding and search agents. It is a large hybrid reasoning model (671B parameters, 37B active) that supports both thinking and non-thinking modes. It extends the DeepSeek-V3 base with a two-phase long-context training process, reaching up to 128K tokens, and uses FP8 microscaling for efficient inference. The model improves tool use, code generation, and reasoning efficiency, achieving performance comparable to DeepSeek-R1 on difficult benchmarks while responding more quickly. It supports structured tool calling, code agents, and search agents, making it suitable for research, coding, and agentic workflows.

DeepSeek v3.1 Terminus (Thinking)

OpenRouter

Operational

DeepSeek-V3.1 Terminus is an update to DeepSeek V3.1 that maintains the model's original capabilities while addressing issues reported by users, including language consistency and agent capabilities, further optimizing the model's performance in coding and search agents. It is a large hybrid reasoning model (671B parameters, 37B active) that supports both thinking and non-thinking modes. It extends the DeepSeek-V3 base with a two-phase long-context training process, reaching up to 128K tokens, and uses FP8 microscaling for efficient inference. The model improves tool use, code generation, and reasoning efficiency, achieving performance comparable to DeepSeek-R1 on difficult benchmarks while responding more quickly. It supports structured tool calling, code agents, and search agents, making it suitable for research, coding, and agentic workflows. This version has thinking mode enabled, allowing it to provide responses with greater accuracy and nuanced context handling.

DeepSeek v3.2 Exp

OpenRouter

Operational

DeepSeek-V3.2-Exp is an experimental large language model released by DeepSeek as an intermediate step between V3.1 and future architectures. It introduces DeepSeek Sparse Attention (DSA), a fine-grained sparse attention mechanism designed to improve training and inference efficiency in long-context scenarios while maintaining output quality. The model was trained under conditions aligned with V3.1-Terminus to enable direct comparison. Benchmarking shows performance roughly on par with V3.1 across reasoning, coding, and agentic tool-use tasks, with minor tradeoffs and gains depending on the domain. This release focuses on validating architectural optimizations for extended context lengths rather than advancing raw task accuracy, making it primarily a research-oriented model for exploring efficient transformer designs.
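
To make the sparse-attention idea concrete, here is a toy top-k sketch. It is a generic illustration of sparse attention, not DeepSeek's actual DSA; a real efficiency-oriented kernel would also avoid materializing the full score matrix, which this toy version still does.

```python
import numpy as np

def topk_sparse_attention(q, k, v, top_k=4):
    """Each query attends only to its top_k highest-scoring keys.

    q: (n, d) queries; k, v: (m, d) keys and values.
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (n, m) full scores (toy only)
    # Per-row threshold: the value of each row's k-th largest score.
    kth = np.partition(scores, -top_k, axis=-1)[:, -top_k][:, None]
    masked = np.where(scores >= kth, scores, -np.inf)  # drop non-top-k entries
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v  # (n, d)

rng = np.random.default_rng(0)
out = topk_sparse_attention(rng.normal(size=(8, 16)),
                            rng.normal(size=(32, 16)),
                            rng.normal(size=(32, 16)))
print(out.shape)  # (8, 16)
```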

DeepSeek v3.2 Exp (Thinking)

OpenRouter

Operational

DeepSeek-V3.2-Exp is an experimental large language model released by DeepSeek as an intermediate step between V3.1 and future architectures. It introduces DeepSeek Sparse Attention (DSA), a fine-grained sparse attention mechanism designed to improve training and inference efficiency in long-context scenarios while maintaining output quality. The model was trained under conditions aligned with V3.1-Terminus to enable direct comparison. Benchmarking shows performance roughly on par with V3.1 across reasoning, coding, and agentic tool-use tasks, with minor tradeoffs and gains depending on the domain. This release focuses on validating architectural optimizations for extended context lengths rather than advancing raw task accuracy, making it primarily a research-oriented model for exploring efficient transformer designs. This version has thinking mode enabled, allowing it to provide responses with greater accuracy and nuanced context handling.

DeepSeek R1 (Original)

OpenRouter

Operational

The open source reasoning model that shook the whole industry. Very smart. Shows all of its thinking. Not the fastest.

DeepSeek R1 (0528)

OpenRouter

Operational

The open source reasoning model that shook the whole industry. Very smart. Shows all of its thinking. Not the fastest. New weights released on 5/28/2025.

DeepSeek R1 (Qwen Distilled)

OpenRouter

Operational

Similar to the Llama distilled model, but distilled on Qwen 32B instead. Slightly better at code, slightly more likely to fall into thought loops.

Gemini 2.0 Flash

Google

Operational

Google's flagship model, known for speed and accuracy (and also web search!). Not quite as smart as Claude 3.5 Sonnet, but WAY faster and cheaper. Also has an insanely large context window (it can handle a lot of data).

Gemini 2.5 Flash

Google

Operational

Google's state of the art fast model, known for speed and accuracy (and also web search!). Not quite as smart as Claude Sonnet, but WAY faster and cheaper. Also has an insanely large context window (it can handle a lot of data).

Gemini 2.5 Flash (Thinking)

Google

Operational

Google's state of the art fast model, known for speed and accuracy, now with support for "thinking". These "thinking" capabilities enable it to provide responses with greater accuracy and nuanced context handling.

Gemini 2.5 Flash Lite

Google

Operational

Gemini 2.5 Flash-Lite is a member of the Gemini 2.5 series of models, a suite of highly-capable, natively multimodal models. Gemini 2.5 Flash-Lite is Google’s most cost-efficient model, striking a balance between efficiency and quality.

Gemini 2.5 Flash Lite (Thinking)

Google

Operational

Gemini 2.5 Flash-Lite is a member of the Gemini 2.5 series of models, a suite of highly-capable, natively multimodal models. Gemini 2.5 Flash-Lite is Google’s most cost-efficient model, striking a balance between efficiency and quality. This version has "thinking" capabilities that enable it to provide responses with greater accuracy and nuanced context handling.

Nano Banana

Google

Operational

Gemini 2.5 Flash Image Preview is a state-of-the-art image generation model with contextual understanding. It is capable of image generation, editing, and multi-turn conversations.

Gemini 2.0 Flash Lite

Google

Operational

Similar to 2.0 Flash, but even faster. Not as smart, but still good at most things.

Gemini 2.5 Pro

Google

Operational

Google's most advanced model, excelling at complex reasoning and problem-solving. Particularly strong at tackling difficult code challenges, mathematical proofs, and STEM problems. With its massive context window, it can deeply analyze large codebases, datasets and technical documents to provide comprehensive solutions.

Gemini Imagen 4

Google

Operational

Google's Imagen 4 is a powerful image generation model that creates high-quality, photorealistic images from text prompts. Built on advanced diffusion techniques and trained on diverse datasets. 2 images per prompt.

Gemini Imagen 4 Ultra

Google

Operational

Google's Imagen 4 Ultra is a powerful image generation model that creates high-quality, photorealistic images from text prompts. Built on advanced diffusion techniques and trained on diverse datasets. 1 image per prompt.

Gemini 3 Pro

Google

Operational

Google's latest flagship model excels at advanced reasoning and problem-solving. It's especially strong with complex code challenges, mathematical proofs, and STEM topics. Thanks to its vast context window, it can deeply analyze large codebases, datasets, and technical documents to deliver thorough, high-quality solutions.

Nano Banana Pro

Google

Operational

Nano Banana Pro is a state-of-the-art image generation model with contextual understanding. It is capable of image generation, editing, and multi-turn conversations.

Llama 3.3 70B

Groq

Operational

Industry-leading speed in an open source model. Not the smartest, but unbelievably fast.

Llama 4 Scout

Groq

Operational

Llama 4 Scout 17B Instruct (16E) is a mixture-of-experts (MoE) language model developed by Meta, activating 17 billion parameters out of a total of 109B. It supports native multimodal input (text and image) and multilingual output (text and code) across 12 supported languages. Designed for assistant-style interaction and visual reasoning, Scout uses 16 experts per forward pass and features a context length of up to 10 million tokens, with a training corpus of ~40 trillion tokens. Built for high efficiency and local or commercial deployment, Llama 4 Scout incorporates early fusion for seamless modality integration. It is instruction-tuned for use in multilingual chat, captioning, and image understanding tasks. Released under the Llama 4 Community License, it was last trained on data up to August 2024 and launched publicly on April 5, 2025.

Llama 4 Maverick

OpenRouter

Operational

Llama 4 Maverick 17B Instruct (128E) is a high-capacity multimodal language model from Meta, built on a mixture-of-experts (MoE) architecture with 128 experts and 17 billion active parameters per forward pass (400B total). It supports multilingual text and image input, and produces multilingual text and code output across 12 supported languages. Optimized for vision-language tasks, Maverick is instruction-tuned for assistant-like behavior, image reasoning, and general-purpose multimodal interaction. Maverick features early fusion for native multimodality and a 1 million token context window. It was trained on a curated mixture of public, licensed, and Meta-platform data, covering ~22 trillion tokens, with a knowledge cutoff in August 2024. Released on April 5, 2025 under the Llama 4 Community License, Maverick is suited for research and commercial applications requiring advanced multimodal understanding and high model throughput.

MiniMax M2

OpenRouter

Operational

MiniMax-M2 is a compact, high-efficiency large language model optimized for end-to-end coding and agentic workflows. With 10 billion activated parameters (230 billion total), it delivers near-frontier intelligence across general reasoning, tool use, and multi-step task execution while maintaining low latency and deployment efficiency. The model excels in code generation, multi-file editing, compile-run-fix loops, and test-validated repair, showing strong results on SWE-Bench Verified, Multi-SWE-Bench, and Terminal-Bench. It also performs competitively in agentic evaluations such as BrowseComp and GAIA, effectively handling long-horizon planning, retrieval, and recovery from execution errors.

Kimi K2 (0711)

OpenRouter

Operational

Kimi K2 is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32 billion active per forward pass. It is optimized for agentic capabilities, including advanced tool use, reasoning, and code synthesis. Kimi K2 excels across a broad range of benchmarks, particularly in coding (LiveCodeBench, SWE-bench), reasoning (ZebraLogic, GPQA), and tool-use (Tau2, AceBench) tasks. It supports long-context inference up to 128K tokens and is designed with a novel training stack that includes the MuonClip optimizer for stable large-scale MoE training.

Kimi K2 (0905)

OpenRouter

Operational

Kimi K2 0905 is the September update of Kimi K2 0711. It is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32 billion active per forward pass. It supports long-context inference up to 256k tokens, extended from the previous 128k. Kimi K2 is optimized for agentic capabilities, including advanced tool use and code synthesis. It excels across coding (LiveCodeBench, SWE-bench), and tool-use (Tau2, AceBench) benchmarks. The model is trained with a novel stack incorporating the MuonClip optimizer for stable large-scale MoE training.

Kimi K2 (Thinking)

OpenRouter

Operational

Kimi K2 Thinking is an update to Kimi K2 that is trained for reasoning and agentic tasks. It is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32 billion active per forward pass. It supports long-context inference up to 256k tokens. Kimi K2 is optimized for agentic capabilities, including advanced tool use, reasoning, and code synthesis. It rivals GPT-5 and Claude 4.5 Sonnet across several benchmarks including HLE, BrowseComp, SWE-Multilingual, and LiveCodeBench.

Kimi K2 Turbo (Thinking)

OpenRouter

Operational

Kimi K2 Turbo is a faster-serving version of Kimi K2 Thinking, the Kimi K2 update trained for reasoning and agentic tasks. It is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32 billion active per forward pass. It supports long-context inference up to 256k tokens. Kimi K2 is optimized for agentic capabilities, including advanced tool use, reasoning, and code synthesis. It rivals GPT-5 and Claude 4.5 Sonnet across several benchmarks including HLE, BrowseComp, SWE-Multilingual, and LiveCodeBench.

GPT OSS 20B

OpenRouter

Operational

A medium-sized open-weight model from OpenAI suitable for general-purpose tasks. gpt-oss-20b is an open-weight 21B parameter model released by OpenAI under the Apache 2.0 license. It uses a Mixture-of-Experts (MoE) architecture with 3.6B active parameters per forward pass, optimized for lower-latency inference and deployability on consumer or single-GPU hardware. The model is trained in OpenAI’s Harmony response format and supports reasoning level configuration, fine-tuning, and agentic capabilities including function calling, tool use, and structured outputs.
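
Since the card mentions reasoning level configuration, here is a minimal sketch of setting it in the system prompt, following the Harmony-style convention from OpenAI's gpt-oss documentation; the hosting base URL and model id are assumptions.

```python
from openai import OpenAI

# Any OpenAI-compatible host of gpt-oss should accept this shape; the
# base URL and model id below are assumptions to verify.
client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="YOUR_KEY")

resp = client.chat.completions.create(
    model="openai/gpt-oss-20b",  # assumed model id
    messages=[
        # Harmony-style reasoning level: low, medium, or high.
        {"role": "system", "content": "Reasoning: high"},
        {"role": "user", "content": "Why is the sky blue?"},
    ],
)
print(resp.choices[0].message.content)
```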

GPT OSS 120B

OpenRouter

Operational

A large open-weight model from OpenAI. gpt-oss-120b is an open-weight, 117B-parameter Mixture-of-Experts (MoE) language model from OpenAI designed for high-reasoning, agentic, and general-purpose production use cases. It activates 5.1B parameters per forward pass and is optimized to run on a single H100 GPU with native MXFP4 quantization. The model supports configurable reasoning depth, full chain-of-thought access, and native tool use, including function calling, browsing, and structured output generation.

GPT-4o-mini

OpenAI

Operational

Like GPT-4o, but faster. This model sacrifices some of the original GPT-4o's precision for significantly reduced latency. It accepts both text and image inputs.

GPT-4o

OpenAI

Operational

OpenAI's flagship non-reasoning model. Works with text and images. Relatively smart. Good at most things.

GPT-4.1

OpenAI

Operational

GPT-4.1 is a flagship large language model optimized for advanced instruction following, real-world software engineering, and long-context reasoning. It outperforms GPT-4o and GPT-4.5 across coding (54.6% SWE-bench Verified), instruction compliance (87.4% IFEval), and multimodal understanding benchmarks.

GPT-4.1 Mini

OpenAI

Operational

GPT-4.1 Mini is a mid-sized model delivering performance competitive with GPT-4o at substantially lower latency. It has a very large context window and scores 45.1% on hard instruction evals, 35.8% on MultiChallenge, and 84.1% on IFEval. Mini also shows strong coding ability (e.g., 31.6% on Aider's polyglot diff benchmark) and vision understanding.

GPT-4.1 Nano

OpenAI

Operational

For tasks that demand low latency, GPT-4.1 nano is the fastest model in the GPT-4.1 series. It delivers exceptional performance at a small size with its 1 million token context window, and scores 80.1% on MMLU, 50.3% on GPQA, and 9.8% on Aider polyglot coding, even higher than GPT-4o mini. It's ideal for tasks like classification or autocompletion.

GPT-5

OpenAI

Operational

OpenAI's latest flagship model. PhD-level intelligence at most things. This is the version best suited to general chat.

GPT-5 (Reasoning)

OpenAI

Operational

OpenAI's latest flagship model. PhD-level intelligence at most things. This version has reasoning capabilities; it's less suited to general chat but great for complex reasoning tasks.
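
For reference, reasoning depth on this family is requested through an effort setting in OpenAI's Responses API. A minimal sketch; treat the exact model id as an assumption to verify.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

resp = client.responses.create(
    model="gpt-5",                 # assumed model id
    reasoning={"effort": "high"},  # trade latency for more deliberate answers
    input="Find the flaw in the classic proof that all triangles are isosceles.",
)
print(resp.output_text)
```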

GPT-5 mini

OpenAI

Operational

A lighter-weight GPT-5 variant optimized for speed while retaining strong reasoning and tool use.

GPT-5 nano

OpenAI

Operational

An ultra-fast GPT-5 variant tuned for low-latency tasks with reasoning and tool use.

GPT-5.1 (Instant)

OpenAI

Operational

GPT-5.1 Instant delivers a major step forward in conversational fluency, recall, and instruction following, with noticeably lower latency than prior models. Ideal for real-time chat and rapid exchanges, it demonstrates greatly improved contextual understanding, memory, and adherence to complex instructions. GPT-5.1 Instant also benefits from upgraded image understanding, web browsing, and code reasoning abilities.

GPT-5.1 (Reasoning)

OpenAI

Operational

GPT-5.1 Thinking is the flagship GPT-5.1 reasoning model, excelling at deep recall, tool use, and complex multi-step reasoning, especially over longer conversations or documents. It leverages the expanded context window, improved factual accuracy, and new instruction-following capabilities of the 5.1 release. This model shines on tasks that demand in-depth problem solving, retrieval, code, and multimodal reasoning.

o3-mini

OpenAI

Operational

A small, fast, super smart reasoning model. OpenAI clearly didn't want DeepSeek to be getting all the attention. Good at science, math, and coding, even if it's not as good at CSS.

o4-mini

OpenAI

Operational

A small, fast, even smarter reasoning model. o3-mini was great, this is even better. Good at science, math, and coding, even if it's not as good at CSS.

o3

OpenAI

Operational

o3 is a well-rounded and powerful model across domains. It sets a new standard for math, science, coding, and visual reasoning tasks. It also excels at technical writing and instruction-following. Use it to think through multi-step problems that involve analysis across text, code, and images.

o3 Pro

OpenAI

Operational

The o3 series of models are trained with reinforcement learning to think before they answer and perform complex reasoning. The o3-pro model uses more compute to think harder and provide consistently better answers.

GPT ImageGen

OpenAI

Operational

OpenAI's latest and greatest image generation model, using lots of crazy tech like custom tools for text and reflections. Best image gen available today by a mile.

GLM 4.5

OpenRouter

Operational

GLM-4.5 is an open-weight MoE model that competes with o3 and Claude 4 while being smaller and stronger than DeepSeek-R1 and Kimi K2. It excels at reasoning, coding, and agentic tasks and is trained with the Muon optimizer, the same optimizer family used to train Kimi K2.

GLM 4.5 (Thinking)

OpenRouter

Operational

GLM-4.5 is an open-weight MoE model that competes with o3 and Claude 4 while being smaller and stronger than DeepSeek-R1 and Kimi K2. It excels at reasoning, coding, and agentic tasks and is trained with the Muon optimizer, the same optimizer family used to train Kimi K2. This variant has reasoning mode enabled for step-by-step thinking.

GLM 4.5V

OpenRouter

Operational

GLM-4.5V is the vision-language variant of GLM-4.5, an open-weight MoE model that competes with o3 and Claude 4 while being smaller and stronger than DeepSeek-R1 and Kimi K2. It excels at reasoning, coding, and agentic tasks and is trained with the Muon optimizer, the same optimizer family used to train Kimi K2.

GLM 4.5V (Thinking)

OpenRouter

Operational

GLM-4.5V is the vision-language variant of GLM-4.5, an open-weight MoE model that competes with o3 and Claude 4 while being smaller and stronger than DeepSeek-R1 and Kimi K2. It excels at reasoning, coding, and agentic tasks and is trained with the Muon optimizer, the same optimizer family used to train Kimi K2. This variant has reasoning mode enabled for step-by-step thinking.

GLM 4.5 Air

OpenRouter

Operational

GLM-4.5-Air is the lightweight variant of GLM-4.5, an open-weight MoE model that competes with o3 and Claude 4 while being smaller and stronger than DeepSeek-R1 and Kimi K2. It excels at reasoning, coding, and agentic tasks and is trained with the Muon optimizer, the same optimizer family used to train Kimi K2.

GLM 4.5 Air (Thinking)

OpenRouter

Operational

GLM-4.5-Air is the lightweight variant of GLM-4.5, an open-weight MoE model that competes with o3 and Claude 4 while being smaller and stronger than DeepSeek-R1 and Kimi K2. It excels at reasoning, coding, and agentic tasks and is trained with the Muon optimizer, the same optimizer family used to train Kimi K2. This variant has reasoning mode enabled for step-by-step thinking.

GLM 4.6

OpenRouter

Operational

Compared with GLM-4.5, this generation brings several key improvements. The context window has been expanded from 128K to 200K tokens, enabling the model to handle more complex agentic tasks. Coding performance is superior, with higher scores on code benchmarks and better real-world results in applications such as Claude Code, Cline, Roo Code, and Kilo Code, including more visually polished front-end pages. Reasoning shows a clear improvement, with support for tool use during inference. Agents are more capable, with stronger tool use and search performance and more effective integration into agent frameworks. Writing is refined, aligning more closely with human preferences in style and readability and performing more naturally in role-playing scenarios.

GLM 4.6 (Thinking)

OpenRouter

Operational

Compared with GLM-4.5, this generation brings several key improvements. The context window has been expanded from 128K to 200K tokens, enabling the model to handle more complex agentic tasks. Coding performance is superior, with higher scores on code benchmarks and better real-world results in applications such as Claude Code, Cline, Roo Code, and Kilo Code, including more visually polished front-end pages. Reasoning shows a clear improvement, with support for tool use during inference. Agents are more capable, with stronger tool use and search performance and more effective integration into agent frameworks. Writing is refined, aligning more closely with human preferences in style and readability and performing more naturally in role-playing scenarios. This variant has reasoning mode enabled for step-by-step thinking.

Grok 4

OpenRouter

Operational

xAI's flagship model that breaks records on lots of benchmarks (allegedly). Possesses deep domain knowledge in finance, healthcare, law, and science.

Grok 3

OpenRouter

Operational

xAI's flagship model that excels at data extraction, coding, and text summarization. Possesses deep domain knowledge in finance, healthcare, law, and science.

Grok 3 Mini

OpenRouter

Operational

A lightweight model that thinks before responding. Great for simple or logic-based tasks that do not require deep domain knowledge.

Grok 4 Fast

OpenRouter

Operational

Grok 4 Fast is xAI's speed-optimized multimodal model with SOTA cost-efficiency and a 2M token context window.

Grok 4 Fast (Reasoning)

OpenRouter

Operational

Grok 4 Fast is xAI's speed-optimized multimodal model with SOTA cost-efficiency and a 2M token context window. This is the variant with reasoning enabled.

Grok 4.1 Fast

OpenRouter

Operational

Grok 4.1 Fast is xAI's latest multimodal model with SOTA cost-efficiency and a 2M token context window.

Grok 4.1 Fast (Reasoning)

OpenRouter

Operational

Grok 4.1 Fast is xAI's latest multimodal model with SOTA cost-efficiency and a 2M token context window. This is the variant with reasoning enabled.

Powered by T3 Chat