👋 Hi, I’m Melissa and welcome to my biweekly Note, Language Processor. Every other Tuesday, I dig deep into language & behavior, the limits of technologies, and the connection between what people say and do. Once a month, Signal Lab takes over and reports on these topics for startups and early-stage investors.
♥ mLM Enthusiasts
Since 2016, before the introduction of the groundbreaking Transformer architecture that made today's LLMs possible, I have been working with small and micro language models. They're still my favorite, even though that preference made me feel very uncool for a long time.
📝 What are mLMs?
mLMs are considerably smaller than SLMs (Small Language Models, like Google's Gemma, GPT-4o mini, or Microsoft's Phi-3), which are themselves much smaller than LLMs: SLM parameter counts sit in the single-digit billions rather than the hundreds of billions or trillions. Micro Language Models (mLMs) are smaller still, sometimes just millions of parameters.
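To make those scale differences concrete, here is a back-of-the-envelope sketch of the memory needed just to hold model weights. The specific model sizes and the 2-bytes-per-parameter (fp16) assumption are illustrative, not official figures:

```python
# Rough memory footprint of model weights at fp16 (2 bytes per parameter).
# The model sizes below are illustrative, not published specs.
def weight_memory_gb(num_params: int, bytes_per_param: int = 2) -> float:
    """Approximate GiB required just to store the weights."""
    return num_params * bytes_per_param / (1024 ** 3)

sizes = {
    "mLM  (~50M params)": 50_000_000,
    "SLM  (~3B params)":  3_000_000_000,
    "LLM  (~1T params)":  1_000_000_000_000,
}

for name, n in sizes.items():
    print(f"{name}: ~{weight_memory_gb(n):,.2f} GiB")
```

The gap is stark: a 50M-parameter mLM fits in well under a gigabyte, while a trillion-parameter model needs terabytes of memory for the weights alone — before any activations, KV caches, or serving overhead.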
mLMs are an alternative to the prevailing LLM orthodoxy, and their development runs directly counter to the logic of the AI arms race, with its obsession with more compute, more parameters, the bigger the better.
There are more than a few reasons mLMs are worth your time:
✅ Advantages
🥼 Specialization
mLMs are trained for specific industries, or even particular tasks, on far less data than an LLM. Where LLMs don't trim the fat, mLMs carry far less noise, which can make them more reliable on domain-specific and narrow tasks. They can also be adapted more easily (even quickly!) to new priorities.
🔒 Privacy & Security
In use cases involving financial data, patient records, IP, etc., sensitive and confidential information must be protected. Because mLMs can operate entirely in private cloud environments and/or on premises, within local servers and even edge devices (i.e., they don't send data to third-party clouds), they provide greater control, safety, and regulatory compliance. mLMs support data protection and mitigate cybersecurity threats.
This is why industries like healthcare, finance, and even defense are increasingly looking to mLMs. A raft of startups like Polygraf are deploying small language models in these verticals, where trust is more important than raw compute. The size and structure of their training data make mLMs easier to monitor and audit.
🪄 Innovation
mLMs are nimble. They are much faster to train and easier to deploy. These facts also make them ideal for experimentation and improvement — without big investment, infrastructure upgrades, or vendor support.
🏃‍♂️ Efficiency
LLMs are slow and expensive in terms of compute ($$$ cloud-based inference $$$), both to train and to operate. mLMs, by contrast, are cost-effective and can run in real time (latency in milliseconds, not seconds, which sometimes matters quite a lot: think autonomous vehicles or critical services).
SLMs can be trained and deployed with roughly 30% of the compute that LLMs require, and mLMs with far less still. This translates to savings not only in cost but also in sustainability. And in most cases it doesn't come at the expense of performance, either.1
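When milliseconds matter, it helps to actually measure them. The sketch below is a minimal timing harness; `mlm_infer` is a hypothetical stand-in for a real model call (the 5 ms sleep simulates a fast on-device micro model):

```python
import time

# Hypothetical stand-in for an on-device micro model call.
# Replace the body with your real inference; the sleep simulates ~5 ms latency.
def mlm_infer(text: str) -> str:
    time.sleep(0.005)
    return "route:billing"

def timed_ms(fn, text: str) -> float:
    """Wall-clock latency of a single call, in milliseconds."""
    start = time.perf_counter()
    fn(text)
    return (time.perf_counter() - start) * 1000

latency = timed_ms(mlm_infer, "please route this support ticket")
print(f"~{latency:.0f} ms per request")
```

In practice you would average over many warm calls (and report percentiles, not just the mean), but even this crude measurement makes the milliseconds-versus-seconds distinction tangible.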
❌ Drawbacks?
mLM development is faster, but it demands higher-quality, cleaner training data. Any model can pick up biases from its training data, and in an mLM each data point carries more weight, so curation requires real care.
And of course, you can’t expect an mLM to do just anything — for complex tasks requiring a wide knowledge base, you would do better to use an LLM. But for specific tasks requiring speed, precision, or security, an mLM can be a better tool.
❓ So you want an mLM?
Knowing the use case should be the starting point for any model selection and development.
Huge LLMs are not always the best choice. Often an mLM can do more with less.
And they are having a moment: NVIDIA researchers have pointed to SLMs as the future of agentic AI.
Since 2016 I’ve been building domain-specific, targeted mLMs for private and public sector applications where general-purpose LLMs will not serve.
During that time, I’ve found mLMs better suited to proprietary, domain-specific data. They can offer greater efficiency, privacy, and accessibility. They are faster, more energy-efficient, more agile, and easier to keep compliant than LLMs.
So join the revolution — small is mighty!
See you in two weeks.
1 For instance, GPT-4o mini outperformed GPT-3.5 Turbo in several benchmarks including language understanding, question answering, reasoning, and mathematical reasoning.

