{"id":78797,"date":"2026-03-31T11:22:36","date_gmt":"2026-03-31T05:52:36","guid":{"rendered":"https:\/\/www.tothenew.com\/blog\/?p=78797"},"modified":"2026-04-06T10:55:11","modified_gmt":"2026-04-06T05:25:11","slug":"naming-conventions-of-llm-models","status":"publish","type":"post","link":"https:\/\/www.tothenew.com\/blog\/naming-conventions-of-llm-models\/","title":{"rendered":"Naming Conventions of LLM Models"},"content":{"rendered":"<p><strong>Introduction<\/strong><br \/>\nWhen we see any LLM model names like <strong>GPT-4o<\/strong>, <strong>Claude 3 Sonnet<\/strong>, or <strong>LLaMA-2-7B-chat<\/strong> we wonder why companies give such weird names to their models. But let me tell you, these names have lots of meanings inside it. They provide lots of information about that model.<\/p>\n<ul>\n<li><strong>Common Patterns:<\/strong><br \/>\n<strong>Suffix\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 Meaning<\/strong><br \/>\nTurbo \u2014&gt; Optimised for speed + cost<br \/>\nMini \u2014&gt; Smaller + cheaper<br \/>\nPro \u2014&gt; High capability<br \/>\nFlash \u2014&gt; Ultra-fast<br \/>\nInstruct \u2014&gt; Fine-tuned to follow instructions<br \/>\nChat \u2014&gt; Optimised for conversations<br \/>\nrlhf \u2192 trained with human feedback<\/li>\n<li><strong>Size Hierarchy:<\/strong><br \/>\nxxl &gt; xl &gt; large &gt; base &gt; small<\/li>\n<li><strong>Size Indicators:<\/strong><br \/>\n7B, 13B, 70B \u2192 parameters<\/li>\n<li><strong>Versioning:<\/strong><br \/>\nv0.1, v1, v2 \u2192 iteration of fine-tuning<\/li>\n<\/ul>\n<p>There are mainly two types of LLMs, lets understand their naming convention one by one.<\/p>\n<ol>\n<li>\n<h2>Paid Models<\/h2>\n<p>Paid models are mainly business or customer oriented. 
Their naming therefore prioritises simplicity, branding, and market positioning.</p>
<p>General pattern in paid models:<br />
[Model Family] + [Version] + [Variant / Capability Tier]</p>
<p><em>Let's look at some examples:</em></p>
<p><em><strong>Example 1:</strong></em> GPT-4o<br />
Breakdown:<br />
GPT → model family<br />
4 → generation (an improvement over GPT-3.5)<br />
o (omni) → multimodal capability</p>
<p><em><strong>Meaning:</strong></em><br />
A fourth-generation model capable of handling text, images, audio, and more.</p>
<p><em><strong>Example 2:</strong></em> Gemini 1.5 Pro<br />
Breakdown:<br />
Gemini → model family<br />
1.5 → incremental upgrade<br />
Pro → high capability</p>
<p>Other variants:<br />
Flash → faster and cheaper<br />
Ultra → most powerful</p>
<p>Paid-model naming is designed with non-technical users in mind: it supports easy understanding, marketing tiers, and product differentiation.</p>
</li>
<li>
<h2>Open-Source Models</h2>
<p>Open-source model naming is more technical and architecture-oriented.</p>
<p>General pattern in open-source models:<br />
[organization]/[model-family]-[version]-[size]-[variant]-[format]</p>
<p><em><strong>Example 1:</strong></em> meta-llama/Llama-2-7b-chat-hf<br />
Breakdown:<br />
meta-llama → organization<br />
Llama-2 → model family + version<br />
7b → 7 billion parameters<br />
chat → fine-tuned for conversation<br />
hf → Hugging Face format</p>
<p><strong><em>Meaning:</em></strong><br />
A 7B-parameter, chat-optimised LLaMA v2 model.</p>
<p><em><strong>Example 2:</strong></em> mistralai/Mistral-7B-Instruct-v0.1<br />
Breakdown:<br />
Mistral-7B → base model<br />
Instruct → instruction-following fine-tune<br />
v0.1 → fine-tuning version</p>
<p><em><strong>Meaning:</strong></em><br />
An instruction-tuned version of Mistral 7B (an early release).</p>
</li>
</ol>
<p><strong>Final
Thoughts:</strong><br />
&#8211; Paid models are named like products.<br />
&#8211; Open-source models are named like engineering artifacts.</p>
<p>Understanding this difference helps us choose the right model for our specific requirements.</p>
<p>To read more technical blogs like this one, <strong>please follow us on social media</strong>. Thanks.</p>
taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/tags?post=78797"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}