Top Open Source LLMs — AI Transparency Hub

Open Weights vs. Open Source

Not all "open" AI models are equal. Open-weights models release the trained model weights, allowing anyone to download and run them — but may have restrictions on commercial use or modification.

Truly open-source models release weights, training code, training data, and evaluation infrastructure under permissive licences (e.g. Apache 2.0 or MIT) that place no restrictions on use. Very few frontier models meet this bar; Allen AI's OLMo 2 is one of the clearest examples.

Why Hugging Face?

Hugging Face is the de facto home of open model distribution. It hosts model cards (structured documentation covering intended use, training data, evaluation, limitations, and licence) for virtually every major open-weights model.

Model cards on Hugging Face follow the standard format proposed by Mitchell et al. (2019) and are the primary transparency artefact for open-weights AI models — playing the same role as system cards do for closed models.
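As a rough illustration of the areas a model card is expected to cover, the sketch below checks a list of card headings against the five areas named above. The `REQUIRED_SECTIONS` tuple and the `missing_sections` helper are hypothetical names invented for this example and are not part of any Hugging Face API. Real cards are free-form Markdown, so a check like this only approximates a manual review.

```python
# Hypothetical checklist built from the five areas named in this section.
# Note: it matches the British spelling "licence"; a real check would
# probably also accept "license".
REQUIRED_SECTIONS = (
    "intended use",
    "training data",
    "evaluation",
    "limitations",
    "licence",
)


def missing_sections(card_headings):
    """Return the required areas that no heading in the card mentions."""
    normalised = [heading.strip().lower() for heading in card_headings]
    return [
        section
        for section in REQUIRED_SECTIONS
        if not any(section in heading for heading in normalised)
    ]


# Illustrative headings, not taken from any real model card.
example_headings = ["Intended Use", "Training Data", "Evaluation Results"]
print(missing_sections(example_headings))
```

A card that passes this check can still be thin on detail, so the result is best treated as a first filter rather than a quality score.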

Top 12 Open Source LLMs

Ranked by community adoption, Hugging Face downloads, and benchmark performance. Licence key:

MIT · Apache 2.0 · Llama Community Licence · Gemma Terms of Use · Custom / Other

Meta AI · #1

Llama Community Licence

Llama 4 (Scout & Maverick)

Sizes: 17B active / 109B total (Scout), 17B active / 400B total (Maverick) · Architecture: Mixture-of-Experts · Context: 10M tokens (Scout)

Meta's fourth-generation open-weights model family, introducing a Mixture-of-Experts architecture and native multimodal capability (text + image). Llama 4 Scout offers a 10 million token context window — the largest of any open-weights model at release. Maverick is optimised for instruction-following and reasoning. Both are available on Hugging Face under the Llama 4 Community Licence.

Model Card (Scout) ↗ Model Card (Maverick) ↗

Meta AI · #2

Llama Community Licence

Llama 3.3 70B Instruct

Size: 70B parameters · Context: 128K tokens · Languages: 8 (incl. German, French, Spanish, Hindi)

Llama 3.3 70B is among the most-downloaded open-weights models on Hugging Face. It delivers performance comparable to much larger models, making it a common default for open-weights deployments. Quantised versions (GGUF, AWQ) bring its memory footprint within reach of high-end consumer hardware. The Llama 3 family collectively has hundreds of millions of downloads.

Model Card ↗ Responsible Use Guide ↗
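To see why quantisation matters for a 70B model, the sketch below estimates the memory needed for the weights alone at different precisions. It assumes memory is simply parameters times bits per weight; the `weight_memory_gb` helper is a hypothetical name. It ignores the KV cache, activations, and format overhead, so real GGUF or AWQ files will differ somewhat.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate memory for the weights alone, in gigabytes (10**9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


PARAMS_70B = 70e9  # parameter count stated for Llama 3.3 70B

# Precisions chosen for illustration; actual quantisation schemes vary.
for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gb(PARAMS_70B, bits):.0f} GB")
```

Under these assumptions a 4-bit build needs roughly a quarter of the memory of the 16-bit weights, which is what moves a 70B model from multi-GPU servers towards a single well-equipped workstation.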

DeepSeek AI · #3

MIT

DeepSeek-R1

Sizes: 671B (full model, MoE); distilled variants at 1.5B, 7B, 8B, 14B, 32B, 70B

DeepSeek-R1 is a reasoning-specialised model trained with large-scale reinforcement learning to produce chain-of-thought reasoning. It reports performance comparable to OpenAI o1 on maths, science, and coding benchmarks at a reported fraction of the training cost. It is released under the permissive MIT licence, making it one of the most commercially usable frontier-capable open models. Distilled versions run on consumer hardware.

Model Card ↗ 70B Distil Card ↗

Alibaba Cloud (Qwen Team) · #4

Apache 2.0

Qwen3

Sizes: 0.6B – 235B (MoE) · Thinking mode: Switchable · Languages: 119

Alibaba's Qwen3 series significantly outperforms earlier Qwen generations and competes with leading frontier models on coding and reasoning tasks. A switchable "thinking mode" lets the model move between fast responses and extended chain-of-thought reasoning. It is available under Apache 2.0, one of the most permissive licences for a model at this performance level. The 235B Mixture-of-Experts variant ranks at or near the top of many open-model leaderboards.

Model Card (235B MoE) ↗ Model Card (72B) ↗

Mistral AI · #5

Apache 2.0

Mixtral 8x22B & Mistral 7B

Mixtral 8x22B active params: ~39B of 141B · Context: 64K tokens

Mistral AI pioneered efficient open-weights models, releasing Mistral 7B in 2023; it is still one of the most-downloaded base models on Hugging Face. Mixtral 8x22B uses a Sparse Mixture-of-Experts architecture, activating only ~39B of its 141B parameters for each token while achieving quality comparable to much larger dense models. Both are fully Apache 2.0 licensed with no usage restrictions.

Mixtral 8x22B Card ↗ Mistral 7B Card ↗
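The idea behind sparse Mixture-of-Experts routing can be sketched in a few lines. The example assumes a router that scores 8 experts per token and keeps the top 2, which matches the commonly described Mixtral design, but the function and the scores below are simplified and hypothetical rather than Mistral's implementation.

```python
import math


def top_k_route(router_logits, k=2):
    """Pick the k highest-scoring experts and softmax-normalise their weights."""
    ranked = sorted(
        range(len(router_logits)),
        key=lambda i: router_logits[i],
        reverse=True,
    )
    chosen = ranked[:k]
    # Subtract the max logit before exponentiating for numerical stability.
    max_logit = max(router_logits[i] for i in chosen)
    exps = [math.exp(router_logits[i] - max_logit) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


# Hypothetical router scores for one token across 8 experts.
logits = [0.1, 2.0, -1.0, 0.5, 2.0, 0.0, -0.3, 1.2]
routes = top_k_route(logits, k=2)
print(routes)

# Share of parameters active per token, using the figures quoted above.
print(f"Active share: {39 / 141:.0%}")
```

The active share is higher than 2/8 because attention layers and embeddings are shared by every token; only the expert feed-forward blocks are routed.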

Microsoft Research · #6

MIT

Microsoft Phi-4

Sizes: 3.8B (mini), 14B · Focus: Reasoning & STEM

Microsoft's Phi series proves that carefully curated training data can produce models that punch well above their weight. Phi-4 (14B) surpasses much larger models on reasoning and STEM tasks, while Phi-4-mini (3.8B) runs efficiently on edge devices and mobile hardware. Both are released under the MIT licence. Phi models include a detailed model card covering training data composition, safety evaluations, and intended use.

Phi-4 Model Card ↗ Phi-4-mini Card ↗

Google DeepMind · #7

Gemma Terms of Use

Gemma 3

Sizes: 1B, 4B, 12B, 27B · Context: 128K tokens · Multimodal: Yes (4B+)

Google's Gemma 3 family spans 1B to 27B parameters and introduces native multimodal capability (image understanding) to the Gemma line. The 27B model ranks near the top of open-model leaderboards for its size class. Gemma is released under Google's Gemma Terms of Use, which are broadly permissive for commercial use but carry usage restrictions that Apache 2.0 and MIT do not. Detailed model cards cover safety evaluations, including red-teaming results.

Gemma 3 27B Card ↗ Gemma 3 12B Card ↗

Technology Innovation Institute (TII) · #8

Falcon Licence

Falcon 180B

Size: 180B parameters · Training: 3.5 trillion tokens · Origin: UAE / Abu Dhabi

Developed by the Technology Innovation Institute in Abu Dhabi, Falcon 180B was the largest openly available model when released and achieved state-of-the-art results among open models at the time. Trained on 3.5 trillion tokens drawn largely from the RefinedWeb dataset, it remains one of the more thoroughly documented open-weights models. The Falcon Licence permits commercial use for most organisations, with restrictions on offering the model as a hosted service.

Model Card ↗ Chat Variant ↗

IBM Research · #9

Apache 2.0

IBM Granite 3.1

Sizes: 2B, 8B · Focus: Enterprise, code, RAG · Context: 128K tokens

IBM Granite models are purpose-built for enterprise use cases — financial analysis, legal document processing, code generation, and Retrieval-Augmented Generation (RAG). A standout feature is IBM's commitment to transparency: Granite model cards document training data sources at the dataset level, making them among the most transparent models available. This positions them well for EU AI Act technical documentation requirements. Apache 2.0 licence with no commercial restrictions.

Granite 3.1 8B Card ↗ All Granite Models ↗

Allen Institute for AI (AI2) · #10

Apache 2.0

OLMo 2

Sizes: 7B, 13B, 32B · Open: Weights + data + training code

OLMo 2 (Open Language Model) from the Allen Institute for AI is arguably the most genuinely open frontier-class model available. Unlike most "open" models, OLMo 2 releases model weights, training code, training data (including the Dolmino Mix), evaluation infrastructure, and training logs, all under Apache 2.0. This makes it uniquely valuable for AI safety research, transparency auditing, and organisations that require complete supply chain visibility. AI2 reports that OLMo 2 32B is competitive with Llama 3.1 70B on key benchmarks.

OLMo 2 13B Card ↗ OLMo 2 32B Card ↗

Mistral AI · #11

Apache 2.0

Mistral NeMo 12B

Size: 12B · Context: 128K tokens · Architecture: Dense transformer

Developed jointly by Mistral AI and NVIDIA, Mistral NeMo is a 12B dense model designed as a practical, production-ready alternative to much larger models. It offers a 128K token context window, strong multilingual performance across 12+ languages, and state-of-the-art results for its size class on coding and instruction-following benchmarks. It is released under Apache 2.0 with no usage restrictions, and is widely used as a local inference model and as a fine-tuning base for enterprise deployments. It is one of the most-downloaded models on Hugging Face in the 10–20B parameter range.

Model Card ↗ Announcement ↗

Cohere · #12

CC-BY-NC

Command R+

Size: 104B · Context: 128K tokens · Architecture: Dense transformer

Cohere's Command R+ is a 104B model purpose-built for enterprise RAG (Retrieval-Augmented Generation), multi-step tool use, and long-document reasoning. It supports 10 languages and is optimised for grounded generation: producing answers that cite their source documents, a key requirement in regulated industries such as legal, finance, and healthcare. Command R+ achieves performance comparable to GPT-4-class models on RAG benchmarks. It is available on Hugging Face under a CC-BY-NC licence (free for research and non-commercial use; production deployments go through Cohere's cloud API). Also covered in the closed model cards section.

Model Card ↗ Cohere Docs ↗

Comparison at a Glance

Key facts for the top 12 open-weights LLMs.

Model	Provider	Largest size	Architecture	Licence	Commercial use	Training data disclosed
Llama 4 Scout/Maverick	Meta AI	400B total, 17B active (MoE)	Mixture-of-Experts	Llama Community	✓ (with restrictions)	Partial
Llama 3.3 70B	Meta AI	70B	Dense transformer	Llama Community	✓ (with restrictions)	Partial
DeepSeek-R1	DeepSeek AI	671B (MoE)	Mixture-of-Experts	MIT	✓ Unrestricted	Partial
Qwen3	Alibaba Cloud	235B (MoE)	Mixture-of-Experts	Apache 2.0	✓ Unrestricted	Partial
Mixtral 8x22B	Mistral AI	141B (active 39B)	Sparse MoE	Apache 2.0	✓ Unrestricted	Minimal
Phi-4	Microsoft	14B	Dense transformer	MIT	✓ Unrestricted	Detailed
Gemma 3	Google DeepMind	27B	Dense transformer	Gemma ToU	✓ (Gemma ToU)	Partial
Falcon 180B	TII	180B	Dense transformer	Falcon Licence	✓ (hosting restrictions)	Dataset-level
IBM Granite 3.1	IBM Research	8B	Dense transformer	Apache 2.0	✓ Unrestricted	Full dataset list
OLMo 2	Allen AI (AI2)	32B	Dense transformer	Apache 2.0	✓ Unrestricted	Full (data + code)
Mistral NeMo 12B	Mistral AI / NVIDIA	12B	Dense transformer	Apache 2.0	✓ Unrestricted	Minimal
Command R+	Cohere	104B	Dense transformer	CC-BY-NC	Research / non-commercial only	Minimal
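For readers shortlisting models by licence, the comparison can be expressed as a small filter. The sketch below encodes the Licence and Commercial use columns of the table; the `ModelEntry` class, the three category labels, and the `unrestricted` helper are hypothetical simplifications made for this example, and the licence texts themselves remain the authority.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelEntry:
    name: str
    licence: str
    commercial_use: str  # "unrestricted", "restricted", or "non-commercial"


# Rows transcribed from the comparison table above.
MODELS = [
    ModelEntry("Llama 4 Scout/Maverick", "Llama Community", "restricted"),
    ModelEntry("Llama 3.3 70B", "Llama Community", "restricted"),
    ModelEntry("DeepSeek-R1", "MIT", "unrestricted"),
    ModelEntry("Qwen3", "Apache 2.0", "unrestricted"),
    ModelEntry("Mixtral 8x22B", "Apache 2.0", "unrestricted"),
    ModelEntry("Phi-4", "MIT", "unrestricted"),
    ModelEntry("Gemma 3", "Gemma ToU", "restricted"),
    ModelEntry("Falcon 180B", "Falcon Licence", "restricted"),
    ModelEntry("IBM Granite 3.1", "Apache 2.0", "unrestricted"),
    ModelEntry("OLMo 2", "Apache 2.0", "unrestricted"),
    ModelEntry("Mistral NeMo 12B", "Apache 2.0", "unrestricted"),
    ModelEntry("Command R+", "CC-BY-NC", "non-commercial"),
]


def unrestricted(models):
    """Names of models whose licence allows unrestricted commercial use."""
    return [m.name for m in models if m.commercial_use == "unrestricted"]


print(unrestricted(MODELS))
```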

EU AI Act & open-source models

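The EU AI Act includes an open-source exception (Article 53(2)) that exempts providers of GPAI models released under a free and open-source licence from the technical documentation obligations in Article 53(1)(a) and (b). It does not exempt them from the copyright policy and training-content summary obligations in Article 53(1)(c) and (d), nor from the systemic risk obligations (Article 55) if the 10²⁵ FLOP training threshold is met. Llama 3.1 405B likely crosses this threshold; DeepSeek-R1 671B and Llama 4 may do so, depending on how their training compute is counted. See the EU AI Act page for more detail.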
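A back-of-the-envelope way to compare a model against the 10²⁵ FLOP threshold is the common rule of thumb of about 6 FLOP per parameter per training token. The Act does not prescribe this formula, and the parameter and token counts below are hypothetical inputs chosen for illustration rather than figures from this page. For Mixture-of-Experts models the sketch assumes only active parameters count.

```python
SYSTEMIC_RISK_THRESHOLD_FLOP = 1e25  # threshold cited above


def estimated_training_flop(active_params, training_tokens):
    """Rule-of-thumb estimate: about 6 FLOP per parameter per training token."""
    return 6 * active_params * training_tokens


def crosses_threshold(active_params, training_tokens):
    """True if the estimate exceeds the systemic-risk compute threshold."""
    flop = estimated_training_flop(active_params, training_tokens)
    return flop > SYSTEMIC_RISK_THRESHOLD_FLOP


# Hypothetical inputs: a 405B dense model and an 8B dense model,
# each assumed to be trained on 15 trillion tokens.
print(crosses_threshold(405e9, 15e12))
print(crosses_threshold(8e9, 15e12))
```

Because the estimate depends heavily on the assumed token count and on whether fine-tuning or distillation compute is included, it is a screening aid rather than a compliance determination.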

Model cards & transparency

The depth of model cards varies significantly across open-source models. Of the models listed here, IBM Granite and OLMo 2 set the highest standard, documenting training datasets at source level, while Mistral AI provides the least training data transparency. When selecting an open model for a regulated use case, card quality is as important as benchmark performance.
