The Confidence Layer Didn't Belong to BoardPath

BoardPath had a scoring engine before it had paying customers. Every answer the system gave a board came back with a 0–100 confidence score and a per-dimension breakdown — authority, grounding, retrieval quality, and the rest. It was the part of the product I was proudest of, because it was the part that made a skeptic willing to act on an AI answer.

Then I noticed I kept reaching for it in other projects. A forensic document tool wanted the same thing. A side experiment wanted it. Each time, I copy-pasted a slightly different version of the same logic — and each copy drifted a little further from the others.

That's the moment a feature is telling you it wants to be a library.

What It Actually Was

The scoring engine wasn't governance logic. It never had been. It didn't know what a CC&R was, didn't retrieve anything, didn't call a model. It took signals a RAG pipeline already produces — retrieval scores, document metadata, citation overlap, how much the retrieved chunks agreed with each other — and composed them into one auditable number with a reason attached to every point.

The governance lived in which documents got ranked how. The scoring was domain-agnostic the whole time. I just hadn't drawn the boundary.

A confidence number without a per-dimension breakdown and a recommended action isn't auditable. It's a vibe with a decimal point.

The tell

Three projects, three slightly different copies of the same scorer, none of them the source of truth. A bug fixed in one didn't reach the others. A new dimension added in one made the others quietly out of date. The logic wasn't specific to any of the three products it lived inside — which is exactly why it shouldn't have lived inside any of them.

The Design Response

I pulled the scorer out and published it as transparent-confidence — Apache-2.0, on npm. Three decisions defined what it became, and each one was a deliberate constraint, not a default.

npm install transparent-confidence

Zero runtime dependencies

If the job is scoring and policy — not retrieval, not inference — then it should run with what the caller already has. No ML stack, no server, no model calls. There is nothing to audit but the package itself. For a library whose entire purpose is trust, a dependency tree is a liability.

It runs at query time, not in an eval pipeline

This is the line that separates it from RAGAs, TruLens, and DeepEval. Those are evaluation frameworks — they run offline, after the fact, and call an LLM to judge answer quality. That's valuable, and this doesn't replace it. transparent-confidence runs inline, the moment your system answers, using signals already in hand. No extra round-trip. It sits next to the eval tools, not on top of them.

It returns an action, not just a number

Every scorecard comes back with a recommendation — answer, review, or abstain — and a reason string. A score you have to interpret is a score you'll ignore under load. The point is to gate on it: drop below your threshold, route the question to a human before the user ever sees the response. The most important thing one of these systems can say is the corpus does not address this — and the package is built so it can say that out loud.

What It Revealed

Extracting it forced an honesty I'd skipped while it was buried in a product. Inside BoardPath, I could lean on context the scorer technically shouldn't know about. As a standalone library, every assumption had to be stated in a type or a default — and a few of them turned out to be wrong, or at least too specific to governance.

The dimension set got cleaner. The weights got documented. The thing got a calibration story it didn't have before: the score is not a probability of correctness until you calibrate it against your own labeled outcomes — and now the README says exactly that, because a confidence number that overpromises is worse than no number at all.

The version that shipped is v0.3 — eight dimensions, 412 tests, dual ESM/CJS, zero dependencies. It scores authority, grounding, retrieval quality, corpus coverage, freshness, consistency, answer relevance, and index integrity. BoardPath still uses it. So does everything else I build that touches retrieval. The copy-paste drift is gone, because there's one source now, versioned in the open.

The lesson worth keeping

It wasn't about scoring. It was about boundaries. The most reusable thing in a product is usually the part that doesn't know which product it's in.

The package is open source under Apache-2.0. Install it, read the algorithm docs, or take the dimensions apart — it's all in the open.

npm install transparent-confidence

Repo: github.com/emtcmca/transparent-confidence

transparent-confidence RAG Confidence Design Open Source Library Design Zero Dependencies