By Josh Fjelstul, Ph.D.
Featured

Not Every NLP Problem Needs a Frontier Model
Frontier LLMs are capable. They're also expensive, slow, and prone to hallucination. For many NLP tasks, using a fine-tuned BERT model will be more accurate, easier to audit, better suited to your domain, and cheaper by orders of magnitude — and it'll keep your data off someone else's servers.
Read
Not Every NLP Problem Needs a Frontier Model
Frontier LLMs are capable. They're also expensive, slow, and prone to hallucination. For many NLP tasks, using a fine-tuned BERT model will be more accurate, easier to audit, better suited to your domain, and cheaper by orders of magnitude — and it'll keep your data off someone else's servers.
Read

All posts

Making a Monolingual Model Bilingual with Domain Adaptation
You have an English BERT model that works well on legal text. But your corpus is bilingual. Here's how domain adaptation on a bilingual corpus can produce a model with strong masked language modeling performance in both languages — and why legal text makes this work better than you might expect.
Read

Making a Monolingual Model Bilingual with Domain Adaptation
You have an English BERT model that works well on legal text. But your corpus is bilingual. Here's how domain adaptation on a bilingual corpus can produce a model with strong masked language modeling performance in both languages — and why legal text makes this work better than you might expect.
Read

Can You Tell If Something Was Written by an LLM?
The "em-dash debate" misses the point entirely. Detecting LLM-generated text is a challenging classification problem, and the "folk methods" people use to do it are somewhere between unreliable and useless.
Read

Can You Tell If Something Was Written by an LLM?
The "em-dash debate" misses the point entirely. Detecting LLM-generated text is a challenging classification problem, and the "folk methods" people use to do it are somewhere between unreliable and useless.
Read

Want a Good Model? Start with a Good Measurement Strategy
Measurement strategy is the most consequential modeling decision in a supervised learning project. Treat labeling as a low-skill data-cleaning task and you'll build a model that learns the wrong thing. You can't gloss over measurement and validity and expect to build a good model.
Read

Want a Good Model? Start with a Good Measurement Strategy
Measurement strategy is the most consequential modeling decision in a supervised learning project. Treat labeling as a low-skill data-cleaning task and you'll build a model that learns the wrong thing. You can't gloss over measurement and validity and expect to build a good model.
Read
