Open Weights vs. Open Source
Not all "open" AI models are equal. Open-weights models release the trained model weights, allowing anyone to download and run them — but may have restrictions on commercial use or modification.
Truly open-source models release weights, training code, training data, and evaluation infrastructure under permissive licences (e.g. Apache 2.0 or MIT) with no restrictions. Very few frontier models meet this bar — Allen AI's OLMo 2 is one of the clearest examples.
Why Hugging Face?
Hugging Face is the de facto home of open model distribution. It hosts model cards (structured documentation covering intended use, training data, evaluation, limitations, and licence) for virtually every major open-weights model.
Model cards on Hugging Face build on the model card framework proposed by Mitchell et al. (2019) and are the primary transparency artefact for open-weights AI models, playing the same role that system cards play for closed models.
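As a rough illustration of how the documentation areas listed above could be checked in practice, the sketch below scans a card's markdown headings and metadata keys for each area. The keyword patterns, the `card_coverage` helper, and the sample card are all hypothetical; Hugging Face does not prescribe this check.

```python
import re

# The five documentation areas named above. The keyword patterns are
# illustrative assumptions, not a Hugging Face specification.
REQUIRED_AREAS = {
    "intended use": r"intended use|uses",
    "training data": r"training data|training details",
    "evaluation": r"evaluation",
    "limitations": r"limitations|bias",
    "licence": r"licen[cs]e",
}

def card_coverage(card_text: str) -> dict[str, bool]:
    """Report which documentation areas appear in a card's headings or metadata keys."""
    lines = [ln.strip().lower() for ln in card_text.splitlines()]
    # Keep markdown headings ("# ...") and metadata lines ("key: value").
    candidates = [ln for ln in lines if ln.startswith("#") or re.match(r"^[\w-]+:", ln)]
    return {
        area: any(re.search(pattern, ln) for ln in candidates)
        for area, pattern in REQUIRED_AREAS.items()
    }

# Hypothetical card text, made up for this example.
sample_card = """---
license: apache-2.0
---
# Example Model
## Intended Use
## Training Data
## Evaluation
"""

coverage = card_coverage(sample_card)
missing = [area for area, present in coverage.items() if not present]
print(missing)
```

A heading match only shows that a section exists, not that it is informative, so a check like this is a first filter rather than a quality score.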
Top 12 Open Source LLMs
Ranked by community adoption, Hugging Face downloads, and benchmark performance.
Llama 4 (Scout & Maverick)
Meta's fourth-generation open-weights model family, introducing a Mixture-of-Experts architecture and native multimodal capability (text + image). Llama 4 Scout offers a 10 million token context window — the largest of any open-weights model at release. Maverick is optimised for instruction-following and reasoning. Both are available on Hugging Face under the Llama 4 Community Licence.
Llama 3.3 70B Instruct
Llama 3.3 70B Instruct is the most-downloaded open-weights model on Hugging Face. Meta describes its performance as comparable to the much larger Llama 3.1 405B, which makes it the default choice for many open-source deployments. Quantised versions (GGUF, AWQ) reduce memory requirements enough to run on high-end consumer hardware. The Llama 3 family collectively has hundreds of millions of downloads.
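To see why quantisation matters for a 70B model, a back-of-the-envelope estimate of weight memory helps. The sketch below counts the weights only and ignores the KV cache, activations, and runtime overhead; the `weight_memory_gb` helper is illustrative, and real GGUF or AWQ files use somewhat more than their nominal bit width because they also store scaling factors.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in gigabytes (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

N_PARAMS_70B = 70e9  # parameter count of Llama 3.3 70B

for label, bits in [("FP16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: {weight_memory_gb(N_PARAMS_70B, bits):.0f} GB")
```

At 16 bits the weights alone need about 140 GB, while 4-bit quantisation brings that to roughly 35 GB, which is within reach of a workstation with one or two large GPUs or a machine with ample unified memory.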
DeepSeek-R1
DeepSeek-R1 is a reasoning-specialised model trained with large-scale reinforcement learning to produce extended chain-of-thought reasoning. DeepSeek reports performance on par with OpenAI o1 on maths, science, and coding benchmarks at a fraction of the training cost. It is released under the permissive MIT licence, making it one of the most commercially usable frontier-capable open models. Distilled versions run on consumer hardware.
Qwen3
Alibaba's Qwen3 series significantly outperforms earlier generations and competes with leading frontier models on coding and reasoning tasks. A unique "thinking mode" toggle allows the model to switch between fast inference and extended chain-of-thought reasoning. Available under Apache 2.0 — one of the most permissive licences for a model at this performance level. The 235B Mixture-of-Experts variant tops many open-source leaderboards.
Mixtral 8x22B & Mistral 7B
Mistral AI pioneered efficient open-weights models, releasing Mistral 7B in 2023 — still one of the most-downloaded base models on Hugging Face. Mixtral 8x22B uses a Sparse Mixture-of-Experts architecture, activating only ~39B parameters per forward pass while achieving quality comparable to much larger dense models. Both are fully Apache 2.0 licensed with no usage restrictions.
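The idea behind sparse activation can be sketched with a toy router: for each token, a gating layer scores every expert, and only the top-scoring few are run. The sketch below assumes top-2 routing over 8 experts, which matches Mistral's published description of Mixtral but is not stated above. The router scores are invented and `top_k_route` is a hypothetical helper, not Mistral's implementation.

```python
import math

def top_k_route(router_logits: list[float], k: int = 2) -> dict[int, float]:
    """Pick the k highest-scoring experts and softmax-normalise their weights."""
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    peak = max(router_logits[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(router_logits[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}

# Hypothetical router scores for one token across 8 experts.
logits = [0.1, 2.0, -1.0, 0.5, 1.0, -0.3, 0.0, 1.2]
routing = top_k_route(logits, k=2)
print(routing)  # expert index -> mixing weight; the other 6 experts are skipped

# Share of parameters touched per token, using the figures quoted above.
active_fraction = 39 / 141
print(f"{active_fraction:.0%} of parameters active per token")
```

The active share is higher than 2/8 because attention layers and embeddings are shared by all tokens; only the feed-forward experts are sparsely selected.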
Microsoft Phi-4
Microsoft's Phi series proves that carefully curated training data can produce models that punch well above their weight. Phi-4 (14B) surpasses much larger models on reasoning and STEM tasks, while Phi-4-mini (3.8B) runs efficiently on edge devices and mobile hardware. Both are released under the MIT licence. Phi models include a detailed model card covering training data composition, safety evaluations, and intended use.
Gemma 3
Google's Gemma 3 family spans 1B to 27B parameters and introduces native multimodal capability (image understanding) to the Gemma line. The 27B model leads most open-source leaderboards for its size class. Gemma is released under Google's Gemma Terms of Use — broadly permissive for commercial use but not fully Apache/MIT. Detailed model cards cover safety evaluations including red-teaming results.
Falcon 180B
Developed by the Technology Innovation Institute in Abu Dhabi, Falcon 180B was the largest openly available model when released and achieved state-of-the-art results among open models at the time. Trained on 3.5 trillion tokens of the RefinedWeb dataset, it remains one of the most thoroughly documented open-weights models. The Falcon Licence permits commercial use for most organisations (restrictions apply above a threshold of monthly active users).
IBM Granite 3.1
IBM Granite models are purpose-built for enterprise use cases — financial analysis, legal document processing, code generation, and Retrieval-Augmented Generation (RAG). A standout feature is IBM's commitment to transparency: Granite model cards document training data sources at the dataset level, making them among the most transparent models available. This positions them well for EU AI Act technical documentation requirements. Apache 2.0 licence with no commercial restrictions.
Mistral NeMo 12B
Developed jointly by Mistral AI and NVIDIA, Mistral NeMo is a 12B dense model designed as a practical, production-ready alternative to much larger models. It offers a 128k token context window, strong multilingual performance across 12+ languages, and state-of-the-art results for its size class on coding and instruction-following benchmarks. Released under Apache 2.0 with no usage restrictions. Widely used as a local inference model and as a fine-tuning base for enterprise deployments. One of the most-downloaded models on Hugging Face in the 10–20B parameter range.
Command R+
Cohere's Command R+ is a 104B model purpose-built for enterprise RAG (Retrieval-Augmented Generation), multi-step tool use, and long-document reasoning. It supports 10 languages and is optimised for grounded generation — producing answers that reliably cite source documents, a key requirement in regulated industries such as legal, finance, and healthcare. Command R+ achieves performance comparable to GPT-4-class models on RAG benchmarks. Available on Hugging Face under a CC-BY-NC licence (free for research and non-commercial use; Cohere's cloud API is used for production deployments). Also covered in the closed model cards section.
OLMo 2
OLMo 2 (Open Language Model) from the Allen Institute for AI is arguably the most genuinely open frontier-class model available. Unlike most "open" models, OLMo 2 releases model weights, training code, training data (OLMo-Mix for pretraining and Dolmino-Mix for mid-training), evaluation infrastructure, and training logs, all under Apache 2.0. This makes it uniquely valuable for AI safety research, transparency auditing, and organisations that require complete supply chain visibility. OLMo 2 32B matches Llama 3.1 70B on key benchmarks.
Comparison at a Glance
Key facts for the top 12 open-weights LLMs.
| Model | Provider | Largest size | Architecture | Licence | Commercial use | Training data disclosed |
|---|---|---|---|---|---|---|
| Llama 4 Scout/Maverick | Meta AI | 400B (active 17B) | Mixture-of-Experts | Llama Community | ✓ (with restrictions) | Partial |
| Llama 3.3 70B | Meta AI | 70B | Dense transformer | Llama Community | ✓ (with restrictions) | Partial |
| DeepSeek-R1 | DeepSeek AI | 671B (MoE) | Mixture-of-Experts | MIT | ✓ Unrestricted | Partial |
| Qwen3 | Alibaba Cloud | 235B (MoE) | Mixture-of-Experts | Apache 2.0 | ✓ Unrestricted | Partial |
| Mixtral 8x22B | Mistral AI | 141B (active 39B) | Sparse MoE | Apache 2.0 | ✓ Unrestricted | Minimal |
| Phi-4 | Microsoft | 14B | Dense transformer | MIT | ✓ Unrestricted | Detailed |
| Gemma 3 | Google DeepMind | 27B | Dense transformer | Gemma ToU | ✓ (Gemma ToU) | Partial |
| Falcon 180B | TII | 180B | Dense transformer | Falcon Licence | ✓ (MAU threshold) | Dataset-level |
| IBM Granite 3.1 | IBM Research | 8B | Dense transformer | Apache 2.0 | ✓ Unrestricted | Full dataset list |
| OLMo 2 | Allen AI (AI2) | 32B | Dense transformer | Apache 2.0 | ✓ Unrestricted | Full (data + code) |
| Mistral NeMo 12B | Mistral AI / NVIDIA | 12B | Dense transformer | Apache 2.0 | ✓ Unrestricted | Minimal |
| Command R+ | Cohere | 104B | Dense transformer | CC-BY-NC | Research / non-commercial only | Minimal |
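For teams shortlisting models by licence, the table can be encoded as data and filtered. The sketch below transcribes the Licence and Commercial use columns; the `ModelEntry` class, the three category labels, and the `with_commercial_use` helper are illustrative choices rather than an established schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelEntry:
    name: str
    licence: str
    commercial_use: str  # "unrestricted", "restricted", or "non-commercial"

# Transcribed from the comparison table above.
CATALOGUE = [
    ModelEntry("Llama 4 Scout/Maverick", "Llama Community", "restricted"),
    ModelEntry("Llama 3.3 70B", "Llama Community", "restricted"),
    ModelEntry("DeepSeek-R1", "MIT", "unrestricted"),
    ModelEntry("Qwen3", "Apache 2.0", "unrestricted"),
    ModelEntry("Mixtral 8x22B", "Apache 2.0", "unrestricted"),
    ModelEntry("Phi-4", "MIT", "unrestricted"),
    ModelEntry("Gemma 3", "Gemma ToU", "restricted"),
    ModelEntry("Falcon 180B", "Falcon Licence", "restricted"),
    ModelEntry("IBM Granite 3.1", "Apache 2.0", "unrestricted"),
    ModelEntry("OLMo 2", "Apache 2.0", "unrestricted"),
    ModelEntry("Mistral NeMo 12B", "Apache 2.0", "unrestricted"),
    ModelEntry("Command R+", "CC-BY-NC", "non-commercial"),
]

def with_commercial_use(entries: list[ModelEntry], allowed: set[str]) -> list[str]:
    """Return the names of models whose commercial-use category is in `allowed`."""
    return [e.name for e in entries if e.commercial_use in allowed]

unrestricted = with_commercial_use(CATALOGUE, {"unrestricted"})
print(unrestricted)
```

A filter like this is only a starting point: the actual licence text should still be reviewed before deployment.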
EU AI Act & open-source models
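The EU AI Act includes an open-source exception (Article 53(2)) that exempts providers of GPAI models released under a free and open-source licence, with publicly available weights, from the technical documentation and downstream-provider information obligations in Article 53(1)(a) and (b). It does not lift the copyright policy and training-content summary obligations in Article 53(1)(c) and (d), and it does not apply at all to models with systemic risk. Systemic risk is presumed once training compute exceeds 10²⁵ FLOPs (Article 51(2)), and such models carry the additional obligations of Article 55. Llama 3.1 405B exceeds this threshold by Meta's own reported training compute; DeepSeek-R1 671B and Llama 4 may also approach or cross it, depending on the estimate used. See the EU AI Act page for more detail.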
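Whether a model is near the 10²⁵ FLOP threshold can be roughly gauged with the common heuristic that training a dense transformer costs about 6 FLOPs per parameter per token. This heuristic is an assumption and not the Act's own methodology. The sketch applies it to Falcon 180B using the parameter and token counts given earlier on this page; `estimate_training_flops` is an illustrative helper.

```python
SYSTEMIC_RISK_THRESHOLD_FLOPS = 1e25  # threshold cited above

def estimate_training_flops(n_params: float, n_tokens: float) -> float:
    """Heuristic estimate: about 6 FLOPs per parameter per training token.

    For Mixture-of-Experts models, the active parameter count is the
    relevant input, since inactive experts do no work for a given token.
    """
    return 6 * n_params * n_tokens

# Falcon 180B: 180B parameters, 3.5 trillion training tokens.
falcon_flops = estimate_training_flops(180e9, 3.5e12)
exceeds = falcon_flops > SYSTEMIC_RISK_THRESHOLD_FLOPS

print(f"Estimated compute: {falcon_flops:.2e} FLOPs")
print(f"Share of threshold: {falcon_flops / SYSTEMIC_RISK_THRESHOLD_FLOPS:.0%}")
print(f"Exceeds threshold: {exceeds}")
```

On this estimate Falcon 180B sits at a little under 40% of the threshold. An estimate like this is only indicative, and any regulatory assessment would need the provider's actual compute figures.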
Model cards & transparency
The depth of model cards varies significantly across open-source models. IBM Granite and OLMo 2 set the highest standard — documenting training datasets at source level. Mistral AI provides the least training data transparency. When selecting an open model for a regulated use case, card quality is as important as benchmark performance.