Model Leaderboard

Compare AI models by capability and cost-effectiveness

Popular Comparisons

Programming & Development

59/271 models

LiveCodeBench: Real-world coding tasks

Use Cases: Code completion, debugging, code review, script generation

Logical Reasoning

61/271 models

HLE (Humanity's Last Exam): Complex reasoning and problem-solving

Use Cases: Complex decision-making, multi-step analysis, logical reasoning

Knowledge Q&A

63/271 models

MMLU-Pro: Broad knowledge assessment

Use Cases: Expert Q&A, fact-checking, educational tutoring

Scientific Research

67/271 models

GPQA: Graduate-level science questions

Use Cases: Academic research, scientific writing, experiment design

Mathematical Computation

51/271 models

AIME (American Invitational Mathematics Examination): Competition-level math problems

Use Cases: Financial analysis, data computation, statistical reasoning

AI Agent

46/271 models

Tau2-Bench: Autonomous task completion

Use Cases: Automated workflows, multi-tool invocation, complex task decomposition

Scientific Coding

58/271 models

SciCode: Scientific coding challenges

Use Cases: Scientific computing, research code, data analysis scripts

Command-Line Operations

47/271 models

Terminal-Bench: Command-line operations

Use Cases: Shell scripting, system administration, DevOps automation

Instruction Following

46/271 models

IFEval: Instruction following accuracy

Use Cases: Precise task execution, format compliance, constraint adherence

Disclaimer: Rankings are for reference only. They are not precise test results and do not constitute purchase or usage advice. We do not guarantee the accuracy, completeness, or timeliness of the data.

Data Sources: Rankings are based on official technical reports published by model providers and on public evaluations.

Model Benchmarks | OhMyGPT