A comprehensive guide covering the 2023 open LLM wave, including MPT, Falcon, Mistral, and other open models. Learn how these models created a competitive ecosystem, accelerated innovation, reduced dependence on proprietary systems, and democratized access to state-of-the-art language model capabilities through architectural innovations and improved training data curation.

This article is part of the free-to-read History of Language AI
2023: Open LLM Wave
The year 2023 marked a transformative period in the landscape of large language models, as a wave of high-quality open-source models emerged to challenge the dominance of proprietary systems. Following Meta's release of LLaMA in February 2023, which demonstrated that efficient, competitive models could be developed and released openly, numerous organizations began developing and releasing their own open language models throughout the year. Models such as MosaicML's MPT (Mosaic Pretrained Transformer), Technology Innovation Institute's Falcon, and Mistral AI's Mistral models created a competitive ecosystem of open alternatives that accelerated innovation, reduced dependence on proprietary APIs, and democratized access to state-of-the-art language model capabilities.
This proliferation of open models represented a fundamental shift in how language AI technology was developed and distributed. Rather than a field dominated by closed, proprietary systems accessible only through controlled APIs, 2023 saw the emergence of a vibrant open-source ecosystem where researchers, developers, and organizations could download, modify, and deploy powerful language models locally. This shift enabled new use cases, particularly in domains requiring data privacy, offline deployment, or cost-effective scaling, while also fostering rapid innovation through collaborative development and transparent evaluation.
The open model wave also reflected broader trends in the AI community toward transparency and accessibility. While earlier proprietary models like GPT-3 and GPT-4 had demonstrated remarkable capabilities, their closed nature limited research, customization, and independent evaluation. The open models of 2023, by contrast, provided full model weights, training procedures, and in many cases training data, enabling researchers to understand, audit, and improve upon these systems. This transparency accelerated research in areas such as model optimization, fine-tuning techniques, and safety alignment, while also enabling independent verification of model capabilities and limitations.
The competitive dynamics of 2023 also highlighted the importance of efficient model architectures and training procedures. Organizations releasing open models sought to achieve competitive performance while minimizing computational requirements, leading to innovations in model architecture, training data curation, and optimization techniques. These developments demonstrated that with careful engineering, smaller models trained more efficiently could achieve performance comparable to larger proprietary systems, challenging assumptions about the relationship between model size and capability.
The open model wave demonstrated that thoughtful architectural choices could enable smaller models to match or exceed the capabilities of larger systems. Innovations like ALiBi, grouped-query attention, and sliding window attention showed that model efficiency was not just about parameter count but about intelligent design. This emphasis on efficiency made powerful language models accessible to organizations with limited computational resources, democratizing access to state-of-the-art capabilities.
The Problem
By early 2023, despite the release of LLaMA, the language AI ecosystem faced significant barriers that limited widespread adoption and innovation. Proprietary models from OpenAI, Google, and Anthropic offered impressive capabilities but came with critical limitations: access was restricted through APIs with usage costs and rate limits, users had no visibility into model internals or training data, and customization options were limited to prompt engineering and fine-tuning through proprietary interfaces. These constraints made it difficult for researchers to conduct independent evaluations, for organizations to deploy models in privacy-sensitive environments, and for developers to build applications that required predictable costs or offline capabilities.
The closed nature of proprietary models also created concerns about dependency and lock-in. Organizations building applications on proprietary APIs faced risks of service disruptions, policy changes, or cost increases that could jeopardize their projects. The inability to inspect or modify model behavior created challenges for applications requiring specific safety guarantees, domain adaptation, or compliance with regulations governing AI deployment. Researchers interested in understanding how these models worked, improving their capabilities, or addressing their limitations found themselves working with black-box systems where the only interface was the API.
Additionally, the proprietary model ecosystem had created an environment where innovation was concentrated among a small number of large technology companies. Academic researchers, startups, and organizations without massive computational resources struggled to contribute meaningfully to language model development. While transfer learning and fine-tuning techniques enabled some customization, the fundamental models themselves remained inaccessible, limiting the scope of research and development that could occur outside major tech companies.
The field also lacked competitive alternatives that matched proprietary model quality while offering open access. BLOOM, released in 2022, had demonstrated the feasibility of open large language models, but its performance lagged behind state-of-the-art proprietary systems, and at 176 billion parameters it was difficult to deploy and use in practice. LLaMA had shown that efficient architectures could achieve competitive performance, but as a single model family from a single organization, it represented only the beginning of what an open ecosystem could offer. The community needed a diverse ecosystem of open models from multiple organizations, each exploring different architectural choices, training approaches, and optimization strategies.
The absence of competitive open alternatives also limited the development of downstream tooling and infrastructure. With most developers focused on proprietary APIs, the ecosystem for model serving, optimization, quantization, and deployment tools was primarily oriented toward API-based workflows. This created a gap for organizations wanting to deploy models locally or customize them extensively, as the tooling ecosystem was less mature than that for proprietary APIs.
The Solution
The open model wave of 2023 addressed these challenges through a coordinated effort by multiple organizations to develop and release high-quality open language models. Each organization brought different technical approaches, architectural innovations, and training strategies, creating a diverse ecosystem that offered alternatives to proprietary systems while enabling new use cases and research directions.
MosaicML's MPT (Mosaic Pretrained Transformer) models, released starting in May 2023, demonstrated that open models could achieve competitive performance through careful architectural design and training efficiency. The MPT models adopted innovations such as ALiBi (Attention with Linear Biases) for better handling of variable context lengths, and they were trained on carefully curated datasets including code, mathematical text, and web content. MPT-7B and MPT-30B provided strong performance while remaining accessible to organizations with modest computational resources, and MosaicML released the base models under the permissive Apache 2.0 license, enabling commercial use and modification.
The Technology Innovation Institute's Falcon models, released in May 2023, took a different approach to achieving competitive performance. Falcon-40B and Falcon-7B were trained on a massive, carefully filtered dataset called RefinedWeb, built from trillions of tokens of deduplicated, high-quality web text. The Falcon models emphasized the importance of data quality over quantity, demonstrating that with better data curation, smaller models could achieve performance comparable to larger systems trained on less carefully filtered datasets. Falcon models also featured efficient architectures, including multi-query attention, that reduced inference memory and latency, making them practical for deployment in resource-constrained environments.
Mistral AI's Mistral models, starting with Mistral 7B in September 2023, represented another significant contribution to the open ecosystem. Mistral 7B demonstrated that careful architectural choices and training techniques could produce a model that outperformed much larger open models, surpassing Llama 2 13B across the benchmarks its developers reported and Llama 1 34B on many of them. The model used innovations such as grouped-query attention and sliding window attention to improve efficiency, and it showed strong performance on reasoning, mathematics, and code generation tasks. Mistral AI's approach emphasized both model quality and deployment practicality, with models designed to run efficiently on consumer hardware.
Beyond these major releases, 2023 saw numerous other open models that explored different aspects of language model development. Organizations experimented with various architectures, training data compositions, optimization techniques, and licensing approaches. This diversity created a rich ecosystem where developers and researchers could choose models that best fit their specific needs, whether prioritizing performance, efficiency, licensing flexibility, or specific capabilities like code generation or multilingual support.
The open models of 2023 also fostered the development of complementary tooling and infrastructure. As models became available for local deployment, the ecosystem responded with improved frameworks for model serving, quantization techniques for reducing memory requirements, and fine-tuning libraries optimized for open models. Projects like Hugging Face's Transformers library expanded support for these new models, while communities developed evaluation frameworks, deployment guides, and optimization techniques specific to open model deployment.
The licensing choices of these open models also addressed different use cases and requirements. Some models, like MPT, used permissive Apache 2.0 licenses that enabled unrestricted commercial use. Others, like LLaMA-based models, used licenses that allowed commercial use but required adherence to acceptable use policies. This variety in licensing gave organizations flexibility to choose models that aligned with their legal requirements and business needs.
Architectural Innovations
The open models of 2023 introduced several architectural innovations that improved efficiency and performance. These innovations demonstrated that thoughtful design choices could enable smaller models to achieve capabilities previously associated with much larger systems.
ALiBi (Attention with Linear Biases), used in MPT models, addressed the challenge of handling variable-length sequences by replacing positional embeddings with fixed, head-specific linear penalties applied to attention scores in proportion to the distance between query and key. This approach allowed models to generalize to sequences longer than those seen during training, an important capability for applications requiring long context windows. Because the biases encode relative position directly, ALiBi also proved simpler and more efficient than traditional absolute positional encodings while providing better length generalization.
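The bias computation above is simple enough to sketch directly. The snippet below is a minimal illustration of ALiBi's two ingredients, head-specific slopes and a distance-proportional penalty; the function names are ours, and real implementations apply the bias to batched attention-score tensors rather than Python lists.

```python
def alibi_slopes(num_heads):
    # Head-specific slopes form a fixed geometric sequence: 2^(-8i/num_heads)
    # for i = 1..num_heads. For 8 heads: 1/2, 1/4, ..., 1/256. Nothing is learned.
    return [2 ** (-8 * i / num_heads) for i in range(1, num_heads + 1)]

def alibi_bias(slope, seq_len):
    # Bias added to pre-softmax attention scores: a linear penalty proportional
    # to the query-key distance (q - k), with future positions masked out.
    return [[-slope * (q - k) if k <= q else float("-inf")
             for k in range(seq_len)]
            for q in range(seq_len)]
```

Because the penalty depends only on relative distance, the same bias pattern extends naturally to sequence lengths never seen in training, which is the source of ALiBi's length generalization.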
Grouped-query attention, used in models like Mistral 7B, improved inference efficiency by reducing the number of key-value cache entries needed during generation. Instead of maintaining separate key-value pairs for each attention head, grouped-query attention shared key-value pairs across multiple query heads, significantly reducing memory requirements during inference while maintaining model quality. This optimization became particularly important as models were deployed in production environments with memory constraints.
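The memory saving from grouped-query attention follows from simple arithmetic over the KV cache. The sketch below uses Mistral 7B's published configuration (32 query heads, 8 KV heads, head dimension 128, 32 layers) to show the effect; the helper function is illustrative, not a real library API.

```python
def kv_cache_bytes(num_kv_heads, head_dim, num_layers, seq_len, bytes_per_elem=2):
    # Each layer caches one key and one value vector per KV head per token;
    # bytes_per_elem=2 assumes fp16 storage.
    return 2 * num_kv_heads * head_dim * num_layers * seq_len * bytes_per_elem

# Standard multi-head attention: every one of the 32 query heads has its own KV head.
mha = kv_cache_bytes(num_kv_heads=32, head_dim=128, num_layers=32, seq_len=4096)

# Grouped-query attention as in Mistral 7B: 32 query heads share 8 KV heads.
gqa = kv_cache_bytes(num_kv_heads=8, head_dim=128, num_layers=32, seq_len=4096)

ratio = mha // gqa  # the cache shrinks by the grouping factor, 32 / 8 = 4
```

A 4x smaller KV cache directly translates into longer contexts or larger batch sizes on the same GPU, which is why GQA mattered so much for production serving.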
Sliding window attention mechanisms allowed models to process long sequences efficiently by limiting each token's attention to a fixed-size local window, so that computation grows linearly rather than quadratically with sequence length. Because information still propagates from layer to layer, stacked attention layers extend the effective receptive field well beyond a single window: in Mistral 7B, a 4,096-token window across 32 layers yields a theoretical attention span of over 100,000 tokens. This kept computational costs manageable for use cases involving long documents or extended conversation history.
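The core of sliding window attention is just a banded causal mask. The toy function below (an illustrative sketch, not production code) builds that mask and makes the linear-cost property visible: each row allows at most `window` attended positions regardless of sequence length.

```python
def sliding_window_mask(seq_len, window):
    # Token q may attend to token k only if k is in the past (k <= q)
    # and within the local window (q - k < window).
    return [[k <= q and q - k < window for k in range(seq_len)]
            for q in range(seq_len)]

mask = sliding_window_mask(seq_len=6, window=3)
# Every row has at most 3 True entries, so per-token attention cost is
# O(window) rather than O(seq_len). Stacking L such layers lets information
# flow roughly L * window tokens, which is how a small window still yields
# a long effective receptive field.
```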
The open models also explored different architectural choices for balancing model capacity with computational efficiency. Some models prioritized parameter count and model depth, while others focused on architectural efficiency and optimized training procedures. This diversity in approaches helped the community understand which design choices mattered most for different use cases and performance requirements.
Training Data and Curation
A critical aspect of the open model wave was the emphasis on training data quality and curation. Organizations releasing open models invested significant effort in collecting, filtering, and composing training datasets that would enable strong performance despite smaller model sizes or more limited computational budgets.
The Falcon models' RefinedWeb dataset demonstrated the importance of data quality. Rather than simply scraping the web indiscriminately, the Falcon team implemented sophisticated filtering and deduplication procedures that produced a high-quality dataset from trillions of tokens of raw web text. This careful curation enabled the Falcon models to achieve strong performance despite having fewer parameters than many proprietary systems trained on larger but less carefully filtered datasets.
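To make the curation idea concrete, here is a minimal sketch of one step such pipelines perform: exact deduplication by normalized content hash. This is a simplification; RefinedWeb's actual pipeline also included fuzzy (MinHash-style) deduplication, language identification, and quality filtering, and the function names here are ours.

```python
import hashlib
import re

def normalize(text):
    # Collapse whitespace and lowercase so trivially different copies of the
    # same document hash to the same value.
    return re.sub(r"\s+", " ", text.strip().lower())

def dedupe(docs):
    # Keep the first occurrence of each distinct (normalized) document.
    seen, kept = set(), []
    for doc in docs:
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(doc)
    return kept
```

Deduplication matters because repeated documents waste training compute and encourage memorization; removing them is one of the cheapest ways to raise effective data quality.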
The success of Falcon models highlighted a crucial insight: smaller models trained on carefully curated, high-quality data could match or exceed the performance of larger models trained on less filtered datasets. This principle would influence subsequent model development, shifting focus from simply scaling dataset size to improving data quality, diversity, and curation procedures.
MPT models emphasized the importance of diverse training data composition. In addition to web text, the MPT training datasets included substantial portions of code, mathematical content, and other specialized text types. This diversity helped the models develop capabilities across a wide range of domains, from general language understanding to code generation and mathematical reasoning.
The open nature of these models also enabled transparency about training data composition. Unlike proprietary models where training data remained confidential, open model developers often released information about their data sources, filtering procedures, and composition strategies. This transparency helped researchers understand the relationship between training data characteristics and model capabilities, informing future data curation efforts.
Data curation techniques developed for open models also advanced the field more broadly. Procedures for deduplication, quality filtering, toxicity detection, and content balancing that were refined for open model training found applications in proprietary model development and in downstream fine-tuning datasets. The open model wave thus contributed not only new models but also improved methodologies for training data preparation.
Applications and Impact
The open models of 2023 enabled a wide range of new applications and use cases that were difficult or impossible with proprietary API-based systems. These applications demonstrated the value of having local, customizable, and transparent language models available for deployment.
Privacy-sensitive applications became feasible with open models. Healthcare organizations, financial institutions, and government agencies that needed to process sensitive data could deploy open models locally without sending data to external APIs. This capability enabled language model applications in domains where data privacy regulations or security requirements had previously made AI deployment impractical.
Cost-sensitive applications also benefited from open models. Organizations processing large volumes of text could deploy open models locally and avoid per-request API costs, making language model capabilities economically viable for high-volume use cases. This cost structure also made it practical to use language models for batch processing, data augmentation, and other applications where API pricing would have been prohibitive.
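The economics can be sketched with back-of-the-envelope arithmetic. All numbers below are hypothetical assumptions for illustration, not real quotes: a high-volume batch workload priced per token through an API versus a single rented GPU running around the clock.

```python
def api_cost_usd(tokens, usd_per_million_tokens):
    # API pricing scales linearly with volume processed.
    return tokens / 1e6 * usd_per_million_tokens

def self_hosted_cost_usd(hours, gpu_usd_per_hour):
    # Self-hosting is roughly flat: you pay for the hardware's time,
    # not for the tokens it processes.
    return hours * gpu_usd_per_hour

# Assumed figures: 5 billion tokens per month at $1 per million tokens,
# versus one GPU at $2/hour for a 720-hour month.
api = api_cost_usd(5e9, 1.0)            # grows with volume
local = self_hosted_cost_usd(720, 2.0)  # flat
```

Under these assumptions the self-hosted option wins at high volume, and the crossover point shifts with actual prices, utilization, and engineering overhead; the structural point is that local deployment decouples cost from token count.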
Customization and fine-tuning became more accessible with open models. Researchers and developers could fine-tune open models on domain-specific datasets, creating specialized models optimized for particular applications. This capability enabled applications in legal document analysis, medical record processing, scientific literature review, and other domains requiring specialized knowledge and terminology.
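One reason this fine-tuning became practical on modest hardware was parameter-efficient methods such as LoRA, which train a small low-rank update instead of the full weight matrix. The toy sketch below shows only the core merge step, in plain Python on a 2x2 example; real implementations operate on large tensors inside the training loop, and all names here are illustrative.

```python
def matmul(X, Y):
    # Plain-Python matrix multiply, sufficient for a toy example.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_merge(W, A, B, alpha, r):
    # LoRA trains only the small factors B (d x r) and A (r x d), keeping the
    # base weight W frozen, then merges W' = W + (alpha / r) * B @ A.
    delta = matmul(B, A)
    scale = alpha / r
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

W = [[1.0, 0.0], [0.0, 1.0]]  # frozen base weight (toy 2x2, so d = 2)
B = [[1.0], [0.0]]            # d x r with rank r = 1
A = [[0.0, 2.0]]              # r x d
merged = lora_merge(W, A, B, alpha=2.0, r=1)
```

Because only `B` and `A` are trained, the number of trainable parameters scales with the rank `r` rather than with the full weight matrix, which is what puts domain adaptation of 7B-scale open models within reach of a single GPU.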
Offline and edge deployment became possible with optimized open models. Organizations needing language model capabilities in environments without reliable internet connectivity, or with strict network security requirements, could deploy open models locally. Edge deployment also enabled low-latency applications where API round-trip times would have been problematic.
The open models also accelerated research in model optimization, safety alignment, and evaluation. With full model weights available, researchers could conduct detailed analyses of model internals, develop new optimization techniques, and create more thorough evaluation frameworks. This research contributed to understanding how language models work, how to improve them, and how to make them safer and more reliable.
The competitive ecosystem of open models also drove innovation through healthy competition. As organizations released new models, they pushed each other to improve performance, efficiency, and capabilities. This competitive dynamic accelerated the pace of innovation and ensured that the open ecosystem continued to evolve and improve.
Limitations
Despite their significant contributions, the open models of 2023 had important limitations that shaped their adoption and development. Understanding these limitations helps contextualize both the achievements of the open model wave and the challenges that remained.
Performance gaps remained between open models and the largest proprietary systems. While open models achieved competitive performance on many benchmarks, the most capable proprietary systems like GPT-4 continued to outperform open alternatives on complex reasoning tasks, creative generation, and specialized capabilities. These performance gaps limited the applicability of open models for use cases requiring the highest levels of capability.
Computational requirements still posed barriers for many potential users. Even the smaller open models like Mistral 7B required significant computational resources for training and inference. While optimizations like quantization and efficient architectures helped, deploying and using open models effectively still required more technical expertise and infrastructure than using proprietary APIs. This complexity limited adoption among organizations without strong technical teams.
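The memory arithmetic behind quantization is worth making explicit. The sketch below (illustrative helper, approximate parameter count) shows why a "7B" model that needs a data-center GPU in fp16 can fit on consumer hardware at 4-bit precision; activations and the KV cache add further overhead on top of the weights.

```python
def weight_memory_gb(num_params, bits_per_param):
    # Memory to hold the weights alone: params * bits / 8 bytes, in GB.
    return num_params * bits_per_param / 8 / 1e9

params = 7.24e9  # approximate parameter count of a "7B" model

fp16 = weight_memory_gb(params, 16)  # ~14.5 GB: beyond most consumer GPUs
int4 = weight_memory_gb(params, 4)   # ~3.6 GB: fits widely available hardware
```

Halving the bit width halves weight memory, so 4-bit quantization yields a 4x reduction relative to fp16, typically at a modest quality cost.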
The open model ecosystem also faced challenges with model alignment and safety. While open models enabled independent evaluation and improvement of safety measures, they also made it easier for bad actors to deploy models without safety guardrails. The balance between openness and safety remained an ongoing challenge, with some arguing that overly permissive licensing or insufficient safety measures could enable harmful applications.
The openness that made these models valuable for research and legitimate applications also created risks. Unlike proprietary systems with built-in safety guardrails, open models could be modified or deployed without safety measures. This tension between accessibility and safety would become a central challenge for the open model ecosystem, requiring careful consideration of licensing, model release practices, and community governance.
Licensing and legal considerations created complexity for commercial users. Different open models used different licenses with varying restrictions on commercial use, modification, and redistribution. Navigating these licensing requirements and ensuring compliance could be challenging for organizations wanting to use multiple models or build commercial applications. Some licenses also created uncertainty about acceptable use cases or requirements for derivative works.
The pace of innovation in the open ecosystem, while generally positive, also created challenges. With new models and updates being released frequently, organizations faced decisions about which models to adopt and when to upgrade. The rapid pace made it difficult to establish stable tooling and infrastructure, as the ecosystem evolved quickly.
Data quality and transparency, while improved in open models, still had limitations. Even when developers released information about training data, complete transparency was often impractical due to dataset size or licensing constraints. Understanding the biases, limitations, and failure modes of models remained challenging despite increased transparency compared to proprietary systems.
Legacy and Looking Forward
The open model wave of 2023 established open-source language models as a permanent and essential component of the AI ecosystem. The proliferation of high-quality open models demonstrated that the field could support both proprietary and open approaches, each serving different needs and enabling different use cases. This dual ecosystem model has continued to evolve, with both proprietary and open systems advancing and influencing each other.
The architectural innovations introduced by open models have influenced subsequent model development across both open and proprietary systems. Techniques like ALiBi, grouped-query attention, and sliding window attention have been adopted more broadly, improving efficiency and capabilities across the field. The emphasis on training data quality and curation has also influenced how models are developed, with both open and proprietary efforts investing more in data engineering.
The open model ecosystem has also fostered the development of improved tooling and infrastructure for model deployment, optimization, and fine-tuning. Frameworks for quantization, serving, and deployment have matured, making it increasingly practical to deploy open models in production environments. These developments have benefited not only open model users but also the broader community interested in efficient model deployment.
The competitive dynamics established in 2023 have continued, with organizations regularly releasing new open models that push the boundaries of capability, efficiency, and accessibility. This healthy competition has accelerated innovation while ensuring that high-quality open alternatives remain available. The ecosystem has also seen consolidation around certain models and approaches that have proven particularly effective, while continuing to support diversity and experimentation.
The open model wave has also influenced policy and governance discussions around AI development. The success of open models has demonstrated that transparency and accessibility can be compatible with high performance, influencing debates about AI regulation, safety standards, and access policies. The existence of competitive open alternatives has also influenced discussions about AI concentration and the importance of maintaining diverse ecosystems.
Looking forward, the open model ecosystem continues to evolve with larger models, improved capabilities, and better efficiency. Organizations are exploring new architectures, training techniques, and optimization strategies while maintaining the commitment to openness and transparency that characterized the 2023 wave. The ecosystem is also seeing increased attention to safety alignment, evaluation frameworks, and responsible deployment practices that ensure open models can be used safely and effectively.
The legacy of the 2023 open model wave extends beyond the specific models released that year. It established patterns and practices for open AI development that continue to influence the field. The emphasis on efficiency, transparency, and practical deployment has shaped how subsequent open models are developed and how the open ecosystem evolves. The success of the 2023 wave has also inspired continued investment in open AI research and development, ensuring that open alternatives remain available as language model capabilities continue to advance.
The open model wave of 2023 represents a crucial moment in the democratization of language AI. By providing competitive open alternatives to proprietary systems, the wave enabled new applications, accelerated research, and fostered innovation through healthy competition. The architectural innovations, training methodologies, and ecosystem developments from this period continue to influence language AI development today, establishing open models as an essential component of the modern AI landscape.
About the author: Michael Brenndoerfer
All opinions expressed here are my own and do not reflect the views of my employer.
Michael currently works as an Associate Director of Data Science at EQT Partners in Singapore, leading AI and data initiatives across private capital investments.
With over a decade of experience spanning private equity, management consulting, and software engineering, he specializes in building and scaling analytics capabilities from the ground up. He has published research in leading AI conferences and holds expertise in machine learning, natural language processing, and value creation through data.