By Josh Fjelstul, Ph.D.

Featured

Not Every NLP Problem Needs a Frontier Model

Frontier LLMs are capable. They're also expensive, slow, and prone to hallucination. For many NLP tasks, using a fine-tuned BERT model will be more accurate, easier to audit, better suited to your domain, and cheaper by orders of magnitude — and it'll keep your data off someone else's servers.

All posts

Making a Monolingual Model Bilingual with Domain Adaptation

You have an English BERT model that works well on legal text. But your corpus is bilingual. Here's how domain adaptation on a bilingual corpus can produce a model with strong masked language modeling performance in both languages — and why legal text makes this work better than you might expect.

Domain Adaptation or Fine-Tuning?

The terms "fine-tuning" and "domain adaptation" are often used interchangeably, but the two techniques solve different problems and require different approaches. Getting the distinction wrong is one of the more expensive mistakes in applied NLP.

Why General-Purpose Language Models Struggle with Legal Text

Legal language has structural and semantic properties that general-purpose models weren't trained to handle. For high-stakes legal NLP applications, the choice between fine-tuning, domain adaptation, and prompting is a consequential engineering decision.

Can You Tell If Something Was Written by an LLM?

The "em-dash debate" misses the point entirely. Detecting LLM-generated text is a challenging classification problem, and the "folk methods" people use to do it are somewhere between unreliable and useless.

Want a Good Model? Start with a Good Measurement Strategy

Measurement strategy is the most consequential modeling decision in a supervised learning project. Treat labeling as a low-skill data-cleaning task and you'll build a model that learns the wrong thing. You can't gloss over measurement and validity and expect to build a good model.

Adaptive Learning

Review. Learn. Practice.

Whether you want to stay fresh, expand your knowledge, or prepare for interviews, The Inferential can help you improve your mastery of key data science and machine learning concepts.