API Status Page

Get instant visibility into service health and any ongoing incidents for various AI Model APIs in one place.

Click any card with a link icon in its status indicator to visit the provider's status page.
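
For programmatic checks, many providers also expose a public status feed. Below is a minimal sketch of polling one, assuming a Statuspage-style JSON endpoint; the URL is illustrative and should be verified per provider.

```python
import requests

# Minimal sketch: poll a provider's public status feed. Assumes a
# Statuspage-style "/api/v2/status.json" endpoint; the URL below is
# an illustrative assumption, not a guaranteed endpoint.
STATUS_URL = "https://status.anthropic.com/api/v2/status.json"

def fetch_status(url: str) -> str:
    """Return the overall status description, e.g. 'All Systems Operational'."""
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()
    data = resp.json()
    # Statuspage responses nest the summary under a top-level "status" key.
    return data["status"]["description"]

if __name__ == "__main__":
    print(fetch_status(STATUS_URL))
```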

Qwen 2.5 32B

OpenRouter

Operational

The other really good open source model from China. Alibaba's Qwen is very similar to Llama. Good on its own, but strongest when used as a distillation target for other models and datasets.

Qwen 3 32B

Groq

Operational

Qwen 3 32B is a dense 32.8B parameter causal language model from the Qwen 3 series, optimized for both complex reasoning and efficient dialogue. It supports seamless switching between a "thinking" mode for tasks like math, coding, and logical inference, and a "non-thinking" mode for faster, general-purpose conversation. The model demonstrates strong performance in instruction-following, agent tool use, creative writing, and multilingual tasks across 100+ languages and dialects. Hosted on Groq for speed.
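
As an illustration of that mode switch, here is a minimal sketch against Groq's OpenAI-compatible endpoint. The "/no_think" soft switch follows Qwen's documented prompt convention; the model id is an assumption to check against Groq's catalog.

```python
from openai import OpenAI

# Groq serves an OpenAI-compatible API, so the standard SDK works with a
# swapped base URL. The model id below is an assumption to verify.
client = OpenAI(
    base_url="https://api.groq.com/openai/v1",
    api_key="YOUR_GROQ_API_KEY",
)

# Appending Qwen's "/no_think" soft switch requests the fast,
# non-thinking mode; omit it (or use "/think") for deliberate reasoning.
chat = client.chat.completions.create(
    model="qwen/qwen3-32b",  # assumed model id
    messages=[{"role": "user", "content": "Summarize MoE in one line. /no_think"}],
)
print(chat.choices[0].message.content)
```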

Qwen 3 235B (Thinking)

OpenRouter

Operational

Qwen3-235B-A22B-Thinking-2507 is a high-performance, open-weight Mixture-of-Experts (MoE) language model optimized for complex reasoning tasks. It activates 22B of its 235B parameters per forward pass and natively supports up to 262,144 tokens of context. This "thinking-only" variant enhances structured logical reasoning, mathematics, science, and long-form generation, showing strong benchmark performance across AIME, SuperGPQA, LiveCodeBench, and MMLU-Redux. It enforces reasoning mode (the chat template opens a thinking block automatically, so responses always include a closing </think> tag) and is designed for high-token outputs (up to 81,920 tokens) in challenging domains. The model is instruction-tuned and excels at step-by-step reasoning, tool use, agentic workflows, and multilingual tasks. This release represents the most capable open source variant in the Qwen 3 series, surpassing many closed models in structured reasoning use cases.
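
A quick bit of budget arithmetic from the numbers above: reserving the full 81,920-token output leaves just over 180K tokens of context for the prompt, and 22B active out of 235B total works out to under 10% of weights used per pass.

```python
# Worked arithmetic using only the figures quoted in the card above.
CONTEXT_WINDOW = 262_144  # native context length (tokens)
MAX_OUTPUT = 81_920       # maximum output budget (tokens)

prompt_budget = CONTEXT_WINDOW - MAX_OUTPUT
active_fraction = 22 / 235  # active vs. total parameters (billions)

print(f"Prompt room when reserving max output: {prompt_budget:,} tokens")  # 180,224
print(f"Weights active per forward pass: {active_fraction:.1%}")           # 9.4%
```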

Qwen 3 235B

OpenRouter

Operational

Qwen3-235B-A22B-Instruct-2507 is a multilingual, instruction-tuned mixture-of-experts language model based on the Qwen3-235B architecture, with 22B active parameters per forward pass. It is optimized for general-purpose text generation, including instruction following, logical reasoning, math, code, and tool usage. The model supports a native 262K context length and does not implement "thinking mode" (<think> blocks). Compared to its base variant, this version delivers significant gains in knowledge coverage, long-context reasoning, coding benchmarks, and alignment with open-ended tasks. It is particularly strong on multilingual understanding, math reasoning (e.g., AIME, HMMT), and alignment evaluations like Arena-Hard and WritingBench.

Qwen 3 Coder

OpenRouter

Operational

Qwen 3 Coder is a cutting-edge model from Alibaba that rivals o3, DeepSeek, and even Kimi K2. It's the current SOTA in agentic tool use and coding, excelling at complex programming tasks such as code generation, multi-step reasoning, and tool use. It's designed to be a versatile tool for developers, researchers, and students, offering a powerful combination of reasoning and code generation capabilities.

Claude 3.7 Sonnet

Anthropic

Operational

The last gen model from Anthropic. Better at code, math, and more. Also kind of slow and expensive.

Claude 3.7 Sonnet (Reasoning)

Anthropic

Operational

The last gen model from Anthropic (but you can make it think). Better at code, math, and more. Also kind of slow and expensive.

Claude 4 Sonnet

Anthropic

Operational

The previous generation model from Anthropic. Claude Sonnet 4 is a significant upgrade to Claude Sonnet 3.7, delivering superior coding and reasoning while responding more precisely to your instructions.

Claude 4 Sonnet (Reasoning)

Anthropic

Operational

The previous generation model from Anthropic (but you can make it think). Claude Sonnet 4 is a significant upgrade to Claude Sonnet 3.7, delivering superior coding and reasoning while responding more precisely to your instructions.

Claude 4.5 Sonnet

Anthropic

Operational

The latest model from Anthropic. Claude Sonnet 4.5 is Anthropic's most advanced Sonnet model to date, optimized for real-world agents and coding workflows. It delivers state-of-the-art performance on coding benchmarks such as SWE-bench Verified, with improvements across system design, code security, and specification adherence. The model is designed for extended autonomous operation, maintaining task continuity across sessions and providing fact-based progress tracking. Sonnet 4.5 also introduces stronger agentic capabilities, including improved tool orchestration, speculative parallel execution, and more efficient context and memory management. With enhanced context tracking and awareness of token usage across tool calls, it is particularly well-suited for multi-context and long-running workflows. Use cases span software engineering, cybersecurity, financial analysis, research agents, and other domains requiring sustained reasoning and tool use.

Claude 4.5 Sonnet (Reasoning)

Anthropic

Operational

The latest model from Anthropic (but you can make it think). Claude Sonnet 4.5 is Anthropic's most advanced Sonnet model to date, optimized for real-world agents and coding workflows. It delivers state-of-the-art performance on coding benchmarks such as SWE-bench Verified, with improvements across system design, code security, and specification adherence. The model is designed for extended autonomous operation, maintaining task continuity across sessions and providing fact-based progress tracking. Sonnet 4.5 also introduces stronger agentic capabilities, including improved tool orchestration, speculative parallel execution, and more efficient context and memory management. With enhanced context tracking and awareness of token usage across tool calls, it is particularly well-suited for multi-context and long-running workflows. Use cases span software engineering, cybersecurity, financial analysis, research agents, and other domains requiring sustained reasoning and tool use.

Claude 4.1 Opus

Anthropic

Operational

The latest and greatest from Anthropic. Very powerful, but with a cost to match.

Claude 4.5 Haiku

Anthropic

Operational

Claude Haiku 4.5 is Anthropic's fastest and most efficient model, delivering near-frontier intelligence at a fraction of the cost and latency of larger Claude models. Matching Claude Sonnet 4's performance across reasoning, coding, and computer-use tasks, Haiku 4.5 brings frontier-level capability to real-time and high-volume applications. It introduces extended thinking to the Haiku line, enabling controllable reasoning depth, summarized or interleaved thought output, and tool-assisted workflows with full support for coding, bash, web search, and computer-use tools. Scoring >73% on SWE-bench Verified, Haiku 4.5 ranks among the world's best coding models while maintaining exceptional responsiveness for sub-agents, parallelized execution, and scaled deployment.
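
For the controllable reasoning depth mentioned above, Anthropic's Messages API accepts a thinking block with a token budget. A minimal sketch; the model id is an assumption to confirm against Anthropic's current model list.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-haiku-4-5",  # assumed model id
    max_tokens=2048,           # must exceed the thinking budget
    thinking={"type": "enabled", "budget_tokens": 1024},  # controllable depth
    messages=[{"role": "user", "content": "Plan a three-step bug triage."}],
)
# Responses interleave "thinking" and "text" content blocks; print the text.
for block in message.content:
    if block.type == "text":
        print(block.text)
```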

Claude 4.5 Haiku (Reasoning)

Anthropic

Operational

Claude Haiku 4.5 is Anthropic's fastest and most efficient model, delivering near-frontier intelligence at a fraction of the cost and latency of larger Claude models. Matching Claude Sonnet 4's performance across reasoning, coding, and computer-use tasks, Haiku 4.5 brings frontier-level capability to real-time and high-volume applications. It introduces extended thinking to the Haiku line, enabling controllable reasoning depth, summarized or interleaved thought output, and tool-assisted workflows with full support for coding, bash, web search, and computer-use tools. Scoring >73% on SWE-bench Verified, Haiku 4.5 ranks among the world's best coding models while maintaining exceptional responsiveness for sub-agents, parallelized execution, and scaled deployment. This is the variant with extended thinking enabled.

DeepSeek v3 (0324)

OpenRouter

Operational

DeepSeek V3, a 685B-parameter mixture-of-experts model, is the latest iteration of the flagship chat model family from the DeepSeek team. It succeeds the original DeepSeek V3 model and performs really well on a variety of tasks.

DeepSeek v3.1

OpenRouter

Operational

DeepSeek-V3.1 is a large hybrid reasoning model (671B parameters, 37B active). It extends the DeepSeek-V3 base with a two-phase long-context training process, reaching up to 128K tokens, and uses FP8 microscaling for efficient inference. The model improves tool use, code generation, and reasoning efficiency, achieving performance comparable to DeepSeek-R1 on difficult benchmarks while responding more quickly. It succeeds the DeepSeek V3-0324 model and performs well on a variety of tasks.

DeepSeek v3.1 (Thinking)

OpenRouter

Operational

DeepSeek-V3.1 is a large hybrid reasoning model (671B parameters, 37B active). It extends the DeepSeek-V3 base with a two-phase long-context training process, reaching up to 128K tokens, and uses FP8 microscaling for efficient inference. The model improves tool use, code generation, and reasoning efficiency, achieving performance comparable to DeepSeek-R1 on difficult benchmarks while responding more quickly. It succeeds the DeepSeek V3-0324 model and performs well on a variety of tasks. This version has reasoning capabilities enabled, allowing it to provide responses with greater accuracy and nuanced context handling.
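
Because V3.1 is a hybrid model, the same weights serve both modes; through OpenRouter the switch is a request-level field. A minimal sketch, where the reasoning field shape and the model id are assumptions to verify against OpenRouter's documentation.

```python
import requests

# Toggle DeepSeek V3.1's thinking mode via OpenRouter's chat completions
# endpoint. The "reasoning" field and the model id are assumptions to verify.
resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_OPENROUTER_KEY"},
    json={
        "model": "deepseek/deepseek-chat-v3.1",  # assumed model id
        "reasoning": {"enabled": True},          # set False for non-thinking mode
        "messages": [{"role": "user", "content": "Prove that 0.999... equals 1."}],
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```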

DeepSeek v3.1 Terminus

OpenRouter

Operational

DeepSeek-V3.1 Terminus is an update to DeepSeek V3.1 that maintains the model's original capabilities while addressing issues reported by users, including language consistency and agent capabilities, further optimizing the model's performance in coding and search agents. It is a large hybrid reasoning model (671B parameters, 37B active) that supports both thinking and non-thinking modes. It extends the DeepSeek-V3 base with a two-phase long-context training process, reaching up to 128K tokens, and uses FP8 microscaling for efficient inference. The model improves tool use, code generation, and reasoning efficiency, achieving performance comparable to DeepSeek-R1 on difficult benchmarks while responding more quickly. It supports structured tool calling, code agents, and search agents, making it suitable for research, coding, and agentic workflows.

DeepSeek v3.1 Terminus (Thinking)

OpenRouter

Operational

DeepSeek-V3.1 Terminus is an update to DeepSeek V3.1 that maintains the model's original capabilities while addressing issues reported by users, including language consistency and agent capabilities, further optimizing the model's performance in coding and search agents. It is a large hybrid reasoning model (671B parameters, 37B active) that supports both thinking and non-thinking modes. It extends the DeepSeek-V3 base with a two-phase long-context training process, reaching up to 128K tokens, and uses FP8 microscaling for efficient inference. The model improves tool use, code generation, and reasoning efficiency, achieving performance comparable to DeepSeek-R1 on difficult benchmarks while responding more quickly. It supports structured tool calling, code agents, and search agents, making it suitable for research, coding, and agentic workflows. This version has thinking mode enabled, allowing it to provide responses with greater accuracy and nuanced context handling.

DeepSeek v3.2 Exp

OpenRouter

Operational

DeepSeek-V3.2-Exp is an experimental large language model released by DeepSeek as an intermediate step between V3.1 and future architectures. It introduces DeepSeek Sparse Attention (DSA), a fine-grained sparse attention mechanism designed to improve training and inference efficiency in long-context scenarios while maintaining output quality. The model was trained under conditions aligned with V3.1-Terminus to enable direct comparison. Benchmarking shows performance roughly on par with V3.1 across reasoning, coding, and agentic tool-use tasks, with minor tradeoffs and gains depending on the domain. This release focuses on validating architectural optimizations for extended context lengths rather than advancing raw task accuracy, making it primarily a research-oriented model for exploring efficient transformer designs.
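
To make the sparse-attention idea concrete, here is a toy top-k sketch. It is a generic illustration of sparse attention, not DeepSeek's actual DSA; a real efficiency-oriented kernel would also avoid materializing the full score matrix, which this toy version still does.

```python
import numpy as np

def topk_sparse_attention(q, k, v, top_k=4):
    """Each query attends only to its top_k highest-scoring keys.

    q: (n, d) queries; k, v: (m, d) keys and values.
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (n, m) full scores (toy only)
    # Per-row threshold: the value of each row's k-th largest score.
    kth = np.partition(scores, -top_k, axis=-1)[:, -top_k][:, None]
    masked = np.where(scores >= kth, scores, -np.inf)  # drop non-top-k entries
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v  # (n, d)

rng = np.random.default_rng(0)
out = topk_sparse_attention(rng.normal(size=(8, 16)),
                            rng.normal(size=(32, 16)),
                            rng.normal(size=(32, 16)))
print(out.shape)  # (8, 16)
```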

DeepSeek v3.2 Exp (Thinking)

OpenRouter

Operational

DeepSeek-V3.2-Exp is an experimental large language model released by DeepSeek as an intermediate step between V3.1 and future architectures. It introduces DeepSeek Sparse Attention (DSA), a fine-grained sparse attention mechanism designed to improve training and inference efficiency in long-context scenarios while maintaining output quality. The model was trained under conditions aligned with V3.1-Terminus to enable direct comparison. Benchmarking shows performance roughly on par with V3.1 across reasoning, coding, and agentic tool-use tasks, with minor tradeoffs and gains depending on the domain. This release focuses on validating architectural optimizations for extended context lengths rather than advancing raw task accuracy, making it primarily a research-oriented model for exploring efficient transformer designs. This version has thinking mode enabled, allowing it to provide responses with greater accuracy and nuanced context handling.

DeepSeek R1 (Original)

OpenRouter

Operational

The open source reasoning model that shook the whole industry. Very smart. Shows all of its thinking. Not the fastest.

DeepSeek R1 (0528)

OpenRouter

Operational

The open source reasoning model that shook the whole industry. Very smart. Shows all of its thinking. Not the fastest. New weights released on 5/28/2025.

DeepSeek R1 (Qwen Distilled)

OpenRouter

Operational

Similar to the Llama distilled model, but distilled on Qwen 32B instead. Slightly better at code, slightly more likely to fall into thought loops.

Gemini 2.0 Flash

Google

Operational

Google's flagship model, known for speed and accuracy (and also web search!). Not quite as smart as Claude 3.5 Sonnet, but WAY faster and cheaper. Also has an insanely large context window (it can handle a lot of data).

Gemini 2.5 Flash

Google

Operational

Google's state of the art fast model, known for speed and accuracy (and also web search!). Not quite as smart as Claude Sonnet, but WAY faster and cheaper. Also has an insanely large context window (it can handle a lot of data).

Gemini 2.5 Flash (Thinking)

Google

Operational

Google's state of the art fast model, known for speed and accuracy, now with support for "thinking". These "thinking" capabilities enable it to provide responses with greater accuracy and nuanced context handling.

Gemini 2.5 Flash Lite

Google

Operational

Gemini 2.5 Flash-Lite is a member of the Gemini 2.5 series of models, a suite of highly-capable, natively multimodal models. Gemini 2.5 Flash-Lite is Google’s most cost-efficient model, striking a balance between efficiency and quality.

Gemini 2.5 Flash Lite (Thinking)

Google

Operational

Gemini 2.5 Flash-Lite is a member of the Gemini 2.5 series of models, a suite of highly-capable, natively multimodal models. Gemini 2.5 Flash-Lite is Google’s most cost-efficient model, striking a balance between efficiency and quality. This version has "thinking" capabilities that enable it to provide responses with greater accuracy and nuanced context handling.

Nano Banana

Google

Operational

Gemini 2.5 Flash Image Preview is a state-of-the-art image generation model with contextual understanding. It is capable of image generation, editing, and multi-turn conversations.

Gemini 2.0 Flash Lite

Google

Operational

Similar to 2.0 Flash, but even faster. Not as smart, but still good at most things.

Gemini 2.5 Pro

Google

Operational

Google's most advanced model, excelling at complex reasoning and problem-solving. Particularly strong at tackling difficult code challenges, mathematical proofs, and STEM problems. With its massive context window, it can deeply analyze large codebases, datasets and technical documents to provide comprehensive solutions.

Gemini Imagen 4

Google

Operational

Google's Imagen 4 is a powerful image generation model that creates high-quality, photorealistic images from text prompts. Built on advanced diffusion techniques and trained on diverse datasets. 2 images per prompt.

Gemini Imagen 4 Ultra

Google

Operational

Google's Imagen 4 Ultra is a powerful image generation model that creates high-quality, photorealistic images from text prompts. Built on advanced diffusion techniques and trained on diverse datasets. 1 image per prompt.

Gemini 3 Pro

Google

Operational

Google's latest flagship model excels at advanced reasoning and problem-solving. It's especially strong with complex code challenges, mathematical proofs, and STEM topics. Thanks to its vast context window, it can deeply analyze large codebases, datasets, and technical documents to deliver thorough, high-quality solutions.

Nano Banana Pro

Google

Operational

Nano Banana Pro is a state-of-the-art image generation model with contextual understanding. It is capable of image generation, editing, and multi-turn conversations.

Llama 3.3 70B

Groq

Operational

Industry-leading speed in an open source model. Not the smartest, but unbelievably fast.

Llama 4 Scout

Groq

Operational

Llama 4 Scout 17B Instruct (16E) is a mixture-of-experts (MoE) language model developed by Meta, activating 17 billion parameters out of a total of 109B. It supports native multimodal input (text and image) and multilingual output (text and code) across 12 supported languages. Designed for assistant-style interaction and visual reasoning, Scout uses 16 experts per forward pass and features a context length of up to 10 million tokens, with a training corpus of ~40 trillion tokens. Built for high efficiency and local or commercial deployment, Llama 4 Scout incorporates early fusion for seamless modality integration. It is instruction-tuned for use in multilingual chat, captioning, and image understanding tasks. Released under the Llama 4 Community License, it was last trained on data up to August 2024 and launched publicly on April 5, 2025.

Llama 4 Maverick

OpenRouter

Operational

Llama 4 Maverick 17B Instruct (128E) is a high-capacity multimodal language model from Meta, built on a mixture-of-experts (MoE) architecture with 128 experts and 17 billion active parameters per forward pass (400B total). It supports multilingual text and image input, and produces multilingual text and code output across 12 supported languages. Optimized for vision-language tasks, Maverick is instruction-tuned for assistant-like behavior, image reasoning, and general-purpose multimodal interaction. Maverick features early fusion for native multimodality and a 1 million token context window. It was trained on a curated mixture of public, licensed, and Meta-platform data, covering ~22 trillion tokens, with a knowledge cutoff in August 2024. Released on April 5, 2025 under the Llama 4 Community License, Maverick is suited for research and commercial applications requiring advanced multimodal understanding and high model throughput.

MiniMax M2

OpenRouter

Operational

MiniMax-M2 is a compact, high-efficiency large language model optimized for end-to-end coding and agentic workflows. With 10 billion activated parameters (230 billion total), it delivers near-frontier intelligence across general reasoning, tool use, and multi-step task execution while maintaining low latency and deployment efficiency. The model excels in code generation, multi-file editing, compile-run-fix loops, and test-validated repair, showing strong results on SWE-Bench Verified, Multi-SWE-Bench, and Terminal-Bench. It also performs competitively in agentic evaluations such as BrowseComp and GAIA, effectively handling long-horizon planning, retrieval, and recovery from execution errors.

Kimi K2 (0711)

OpenRouter

Operational

Kimi K2 is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32 billion active per forward pass. It is optimized for agentic capabilities, including advanced tool use, reasoning, and code synthesis. Kimi K2 excels across a broad range of benchmarks, particularly in coding (LiveCodeBench, SWE-bench), reasoning (ZebraLogic, GPQA), and tool-use (Tau2, AceBench) tasks. It supports long-context inference up to 128K tokens and is designed with a novel training stack that includes the MuonClip optimizer for stable large-scale MoE training.

Kimi K2 (0905)

OpenRouter

Operational

Kimi K2 0905 is the September update of Kimi K2 0711. It is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32 billion active per forward pass. It supports long-context inference up to 256k tokens, extended from the previous 128k. Kimi K2 is optimized for agentic capabilities, including advanced tool use and code synthesis. It excels across coding (LiveCodeBench, SWE-bench), and tool-use (Tau2, AceBench) benchmarks. The model is trained with a novel stack incorporating the MuonClip optimizer for stable large-scale MoE training.

Kimi K2 (Thinking)

OpenRouter

Operational

Kimi K2 Thinking is an update to Kimi K2 that is trained for reasoning and agentic tasks. It is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32 billion active per forward pass. It supports long-context inference up to 256k tokens. Kimi K2 is optimized for agentic capabilities, including advanced tool use, reasoning, and code synthesis. It rivals GPT-5 and Claude 4.5 Sonnet across several benchmarks including HLE, BrowseComp, SWE-Multilingual, and LiveCodeBench.

Kimi K2 Turbo (Thinking)

OpenRouter

Operational

Kimi K2 Turbo is a faster-serving version of Kimi K2 Thinking, the Kimi K2 update trained for reasoning and agentic tasks. It is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32 billion active per forward pass. It supports long-context inference up to 256k tokens. Kimi K2 is optimized for agentic capabilities, including advanced tool use, reasoning, and code synthesis. It rivals GPT-5 and Claude 4.5 Sonnet across several benchmarks including HLE, BrowseComp, SWE-Multilingual, and LiveCodeBench.

GPT OSS 20B

OpenRouter

Operational

A medium-sized open-weight model from OpenAI suitable for general-purpose tasks. gpt-oss-20b is an open-weight 21B parameter model released by OpenAI under the Apache 2.0 license. It uses a Mixture-of-Experts (MoE) architecture with 3.6B active parameters per forward pass, optimized for lower-latency inference and deployability on consumer or single-GPU hardware. The model is trained in OpenAI’s Harmony response format and supports reasoning level configuration, fine-tuning, and agentic capabilities including function calling, tool use, and structured outputs.
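
Since the card mentions reasoning level configuration, here is a minimal sketch of setting it in the system prompt, following the Harmony-style convention from OpenAI's gpt-oss documentation; the hosting base URL and model id are assumptions.

```python
from openai import OpenAI

# Any OpenAI-compatible host of gpt-oss should accept this shape; the
# base URL and model id below are assumptions to verify.
client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="YOUR_KEY")

resp = client.chat.completions.create(
    model="openai/gpt-oss-20b",  # assumed model id
    messages=[
        # Harmony-style reasoning level: low, medium, or high.
        {"role": "system", "content": "Reasoning: high"},
        {"role": "user", "content": "Why is the sky blue?"},
    ],
)
print(resp.choices[0].message.content)
```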

GPT OSS 120B

OpenRouter

Operational

A large open-weight model from OpenAI. gpt-oss-120b is an open-weight, 117B-parameter Mixture-of-Experts (MoE) language model from OpenAI designed for high-reasoning, agentic, and general-purpose production use cases. It activates 5.1B parameters per forward pass and is optimized to run on a single H100 GPU with native MXFP4 quantization. The model supports configurable reasoning depth, full chain-of-thought access, and native tool use, including function calling, browsing, and structured output generation.

GPT-4o-mini

OpenAI

Operational

Like GPT-4o, but faster. This model sacrifices some of the original GPT-4o's precision for significantly reduced latency. It accepts both text and image inputs.

GPT-4o

OpenAI

Operational

OpenAI's flagship non-reasoning model. Works with text and images. Relatively smart. Good at most things.

GPT-4.1

OpenAI

Operational

GPT-4.1 is a flagship large language model optimized for advanced instruction following, real-world software engineering, and long-context reasoning. It outperforms GPT-4o and GPT-4.5 across coding (54.6% SWE-bench Verified), instruction compliance (87.4% IFEval), and multimodal understanding benchmarks.

GPT-4.1 Mini

OpenAI

Operational

GPT-4.1 Mini is a mid-sized model delivering performance competitive with GPT-4o at substantially lower latency. It has a very large context window and scores 45.1% on hard instruction evals, 35.8% on MultiChallenge, and 84.1% on IFEval. Mini also shows strong coding ability (e.g., 31.6% on Aider's polyglot diff benchmark) and vision understanding.

GPT-4.1 Nano

OpenAI

Operational

For tasks that demand low latency, GPT-4.1 nano is the fastest model in the GPT-4.1 series. It delivers exceptional performance at a small size with its 1 million token context window, and scores 80.1% on MMLU, 50.3% on GPQA, and 9.8% on Aider polyglot coding, even higher than GPT-4o mini. It's ideal for tasks like classification or autocompletion.

GPT-5

OpenAI

Operational

OpenAI's latest flagship model. PhD-level intelligence at most things. This is the version best suited to general chat.

GPT-5 (Reasoning)

OpenAI

Operational

OpenAI's latest flagship model. PhD-level intelligence at most things. This version has reasoning capabilities; it's less suited to general chat but great for complex reasoning tasks.
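
For reference, reasoning depth on this family is requested through an effort setting in OpenAI's Responses API. A minimal sketch; treat the exact model id as an assumption to verify.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

resp = client.responses.create(
    model="gpt-5",                 # assumed model id
    reasoning={"effort": "high"},  # trade latency for more deliberate answers
    input="Find the flaw in the classic proof that all triangles are isosceles.",
)
print(resp.output_text)
```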

GPT-5 mini

OpenAI

Operational

A lighter-weight GPT-5 variant optimized for speed while retaining strong reasoning and tool use.

GPT-5 nano

OpenAI

Operational

An ultra-fast GPT-5 variant tuned for low-latency tasks with reasoning and tool use.

GPT-5.1 (Instant)

OpenAI

Operational

GPT-5.1 Instant delivers a major step forward in conversational fluency, recall, and instruction following, with noticeably lower latency than prior models. Ideal for real-time chat and rapid exchanges, it demonstrates greatly improved contextual understanding, memory, and adherence to complex instructions. GPT-5.1 Instant also benefits from upgraded image understanding, web browsing, and code reasoning abilities.

GPT-5.1 (Reasoning)

OpenAI

Operational

GPT-5.1 Thinking is the flagship GPT-5.1 reasoning model, excelling at deep recall, tool use, and complex multi-step reasoning, especially over longer conversations or documents. It leverages the expanded context window, improved factual accuracy, and new instruction-following capabilities of the 5.1 release. This model shines on tasks that demand in-depth problem solving, retrieval, code, and multimodal reasoning.

o3-mini

OpenAI

Operational

A small, fast, super smart reasoning model. OpenAI clearly didn't want DeepSeek to be getting all the attention. Good at science, math, and coding, even if it's not as good at CSS.

o4-mini

OpenAI

Operational

A small, fast, even smarter reasoning model. o3-mini was great, this is even better. Good at science, math, and coding, even if it's not as good at CSS.

o3

OpenAI

Operational

o3 is a well-rounded and powerful model across domains. It sets a new standard for math, science, coding, and visual reasoning tasks. It also excels at technical writing and instruction-following. Use it to think through multi-step problems that involve analysis across text, code, and images.

o3 Pro

OpenAI

Operational

The o3 series of models are trained with reinforcement learning to think before they answer and perform complex reasoning. The o3-pro model uses more compute to think harder and provide consistently better answers.

GPT ImageGen

OpenAI

Operational

OpenAI's latest and greatest image generation model, using lots of crazy tech like custom tools for text and reflections. Best image gen available today by a mile.

GLM 4.5

OpenRouter

Operational

GLM-4.5 is an open-weight MoE model that competes with o3 and Claude 4 while being smaller and stronger than DeepSeek-R1 and Kimi K2. It excels at reasoning, coding, and agentic tasks and is trained with the Muon optimizer, the same optimizer family used to train Kimi K2.

GLM 4.5 (Thinking)

OpenRouter

Operational

GLM-4.5 is an open-weight MoE model that competes with o3 and Claude 4 while being smaller and stronger than DeepSeek-R1 and Kimi K2. It excels at reasoning, coding, and agentic tasks and is trained with the Muon optimizer, the same optimizer family used to train Kimi K2. This variant has reasoning mode enabled for step-by-step thinking.

GLM 4.5V

OpenRouter

Operational

GLM-4.5V is the vision-language variant of GLM-4.5, an open-weight MoE model that competes with o3 and Claude 4 while being smaller and stronger than DeepSeek-R1 and Kimi K2. It excels at reasoning, coding, and agentic tasks and is trained with the Muon optimizer, the same optimizer family used to train Kimi K2.

GLM 4.5V (Thinking)

OpenRouter

Operational

GLM-4.5V is the vision-language variant of GLM-4.5, an open-weight MoE model that competes with o3 and Claude 4 while being smaller and stronger than DeepSeek-R1 and Kimi K2. It excels at reasoning, coding, and agentic tasks and is trained with the Muon optimizer, the same optimizer family used to train Kimi K2. This variant has reasoning mode enabled for step-by-step thinking.

GLM 4.5 Air

OpenRouter

Operational

GLM-4.5-Air is the lightweight variant of GLM-4.5, an open-weight MoE model that competes with o3 and Claude 4 while being smaller and stronger than DeepSeek-R1 and Kimi K2. It excels at reasoning, coding, and agentic tasks and is trained with the Muon optimizer, the same optimizer family used to train Kimi K2.

GLM 4.5 Air (Thinking)

OpenRouter

Operational

GLM-4.5-Air is the lightweight variant of GLM-4.5, an open-weight MoE model that competes with o3 and Claude 4 while being smaller and stronger than DeepSeek-R1 and Kimi K2. It excels at reasoning, coding, and agentic tasks and is trained with the Muon optimizer, the same optimizer family used to train Kimi K2. This variant has reasoning mode enabled for step-by-step thinking.

GLM 4.6

OpenRouter

Operational

Compared with GLM-4.5, this generation brings several key improvements. The context window has been expanded from 128K to 200K tokens, enabling the model to handle more complex agentic tasks. Coding performance is superior, with higher scores on code benchmarks and better real-world results in applications such as Claude Code, Cline, Roo Code, and Kilo Code, including more visually polished front-end pages. Reasoning shows a clear improvement, with support for tool use during inference. Agents are more capable, with stronger tool use and search performance and more effective integration into agent frameworks. Writing is refined, aligning more closely with human preferences in style and readability and performing more naturally in role-playing scenarios.

GLM 4.6 (Thinking)

OpenRouter

Operational

Compared with GLM-4.5, this generation brings several key improvements. The context window has been expanded from 128K to 200K tokens, enabling the model to handle more complex agentic tasks. Coding performance is superior, with higher scores on code benchmarks and better real-world results in applications such as Claude Code, Cline, Roo Code, and Kilo Code, including more visually polished front-end pages. Reasoning shows a clear improvement, with support for tool use during inference. Agents are more capable, with stronger tool use and search performance and more effective integration into agent frameworks. Writing is refined, aligning more closely with human preferences in style and readability and performing more naturally in role-playing scenarios. This variant has reasoning mode enabled for step-by-step thinking.

Grok 4

OpenRouter

Operational

xAI's flagship model that breaks records on lots of benchmarks (allegedly). Possesses deep domain knowledge in finance, healthcare, law, and science.

Grok 3

OpenRouter

Operational

xAI's flagship model that excels at data extraction, coding, and text summarization. Possesses deep domain knowledge in finance, healthcare, law, and science.

Grok 3 Mini

OpenRouter

Operational

A lightweight model that thinks before responding. Great for simple or logic-based tasks that do not require deep domain knowledge.

Grok 4 Fast

OpenRouter

Operational

Grok 4 Fast is xAI's speed-optimized multimodal model with SOTA cost-efficiency and a 2M token context window.

Grok 4 Fast (Reasoning)

OpenRouter

Operational

Grok 4 Fast is xAI's speed-optimized multimodal model with SOTA cost-efficiency and a 2M token context window. This is the variant with reasoning enabled.

Grok 4.1 Fast

OpenRouter

Operational

Grok 4.1 Fast is xAI's latest multimodal model with SOTA cost-efficiency and a 2M token context window.

Grok 4.1 Fast (Reasoning)

OpenRouter

Operational

Grok 4.1 Fast is xAI's latest multimodal model with SOTA cost-efficiency and a 2M token context window. This is the variant with reasoning enabled.

Powered by T3 Chat