Architecture Note · 01

Why I built a confidence scoring layer — and what it taught me about AI transparency

BoardPath · May 2026 · Eric Tetzlaff

There's a board meeting I think about often when I'm designing BoardPath's transparency layer. It involved a roof. Specifically, it involved two words — repair and replacement — that appear in most HOA governing documents without any definition attached to them, and which carry enormous financial consequences depending on how a board chooses to interpret them.

The association's documents were clear about one thing: repair of a unit's roof was the association's financial responsibility. Replacement was the unit owner's. What nobody had thought to define when the documents were drafted, decades earlier, was exactly where repair ended and replacement began.

The room

Every person at that table had an interpretation. Every interpretation was shaped — consciously or not — by their own financial stake in the outcome. The board wanted one answer. The homeowner wanted another. They looked to me, the association manager, the supposed impartial party, to validate whichever side they were on. When I presented the baseline legal definitions of both terms, outlined the applicable case law, and declined to simply agree with either party, I managed to please nobody. Doing my job correctly was held against me by both sides simultaneously.

What that room needed — what I desperately wished I had at that table — wasn't a smarter answer. It was a transparent one. Something that showed every person in the room, simultaneously, exactly what the documents said, where the language was ambiguous, what the legal authority was for each interpretation, and where the governing documents simply didn't provide a definitive answer. Not an answer that agreed with one side. A scorecard that made the complexity legible to everyone at once.

That's the origin of Transparent Confidence™.

"If they didn't immediately understand what was being presented, they almost never sought to learn more. They just got frustrated that it wasn't in the 'right' format — whatever that meant to them that day."

The client base I worked with was not technology-forward. Many of my board members were in their 60s and 70s. I once had a coworker accuse me of "always playing with that damn AI instead of getting real work done" — right before I showed her a bot I'd built that automated the compilation and submission of 1120-H tax returns for all of our clients in minutes, instead of the two to three weeks she'd been spending on the same task manually. The results were real. They didn't fully change her mind. For certain people, the medium was the message — and AI as a medium wasn't one they were prepared to trust.

That became a hard design constraint. If the people who most needed this tool were predisposed to distrust it, accuracy alone wasn't going to be enough. Being right wasn't sufficient. The system had to show its work in a way a skeptic could evaluate without needing to understand how AI works at all. Legible at a glance. No explanation required.

The five scoring dimensions didn't arrive fully formed. They emerged from a specific question I kept asking: if this output were ever challenged in a legal setting, what would need to be documented about how it was produced? I'd appeared in court on behalf of client associations. I knew what evidentiary standards looked like. I applied that framework directly to answer scoring:

Authority Rank
Which document type is being cited, and where does it sit in the legal hierarchy? A declaration supersedes bylaws. Bylaws supersede rules and regulations. Amendments modify specific provisions of the documents they amend. Every cited answer surfaces which authority level it draws from — explicitly, not implicitly.
Citation Directness
Is the answer drawn from explicit language, or does it require interpretation? "The declaration states X" is a fundamentally different confidence level than "the declaration implies X, which a reasonable reading suggests means Y." That distinction isn't cosmetic. In a legal context, it's the difference between a defensible position and an exposed one.
Ambiguity Detection
This is the hardest dimension to score — and the most important. Many HOA governing documents were drafted by attorneys unfamiliar with the association and without regard for the documents already in the corpus. They contain provisions that contradict existing language, use undefined terms, and leave room for multiple reasonable readings. The system flags this explicitly rather than papering over it with a confident-sounding answer the source material doesn't actually support.
Conflict Detection
Do multiple documents in the corpus address the same issue with different language? If the declaration says one thing and the rules say another, the answer must surface that conflict openly — not silently resolve it by weighting one source over another. The conflict is information. Hiding it is a design failure.
Statute Compliance Risk
Does the answer touch an area governed by state HOA law? A corpus-based answer that's fully supported by governing documents but conflicts with state statute isn't a correct answer — it's a liability. This dimension flags that risk rather than letting a confident corpus answer obscure a legal exposure the platform has no business ignoring.

A sixth dimension sits beneath all five: corpus completeness. If a client's governing document set is incomplete — missing recorded amendments, absent exhibits, no archived board resolutions — the system flags it. Confidence in any answer is bounded by the completeness of its source material. An incomplete corpus doesn't just limit the answer. It changes the meaning of any confidence score attached to it.
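
To make that concrete, here's a minimal sketch of what a scorecard along these lines could look like as a data structure. Every name, field, and type below is illustrative rather than BoardPath's actual schema; the point is the shape, not the specifics.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class AuthorityRank(IntEnum):
    """Legal hierarchy of governing documents: higher value, higher authority.
    An amendment inherits the rank of the document it amends."""
    RULES_AND_REGULATIONS = 1
    BYLAWS = 2
    DECLARATION = 3


@dataclass
class Scorecard:
    """Travels with every generated answer; no dimension can be omitted."""
    authority: AuthorityRank                  # which document level the answer draws from
    direct_citation: bool                     # explicit language vs. interpretation
    ambiguity_flags: list[str] = field(default_factory=list)  # undefined terms, multiple readings
    conflicts: list[str] = field(default_factory=list)        # cross-document contradictions
    statute_risk: bool = False                # touches an area governed by state HOA law
    corpus_complete: bool = True              # bounds the meaning of every field above

    def needs_review(self) -> bool:
        """Any flagged dimension routes the answer to a human instead of
        letting a confident-sounding reply go out the door."""
        return (bool(self.ambiguity_flags) or bool(self.conflicts)
                or self.statute_risk or not self.corpus_complete)
```

The design stance matters more than the field names: in a structure like this, the scorecard travels with the answer as part of its type, so a score-free answer has no representation in the pipeline at all.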

The guardrail architecture built around this scoring system is equally deliberate. The roof scenario represents the exact category of case that kept me up at night during design: a situation where the corpus is ambiguous, the legal definitions are unclear, and a confident-sounding answer would cross a line the system has no authority to cross. I needed the platform to understand its own limitations — not just as a system prompt instruction, but as a structural constraint at every layer of the architecture:

01
Explicit tool permissions & exclusions
Primary guardrail. Certain actions are structurally impossible — not discouraged. Impossible. Implemented at the tool definition layer, not in prompts.
02
PreToolUse / PostToolUse SDK hooks
Enforcement layer. Validates inputs before tool execution and outputs before they propagate upstream. Catches edge cases the permission layer doesn't fully address. (Layers 01 and 02 are sketched together after this list.)
03
Mandated analytical workflows
Structural layer. The system cannot skip corpus completeness checks, conflict detection, or statute flagging on its way to an answer. The sequence is required, not suggested; see the second sketch below.
04
System prompt guardrails
Last line of defense — explicitly designed not to carry the weight the structural layers are meant to bear. Behavioral guidance for edge cases the architecture above doesn't fully close.
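
Here's a rough sketch of how the first two layers interact. The tool names, the ALLOWED_TOOLS registry, and the hook's signature and return shape are all assumptions made for this example, not BoardPath's actual tool definitions or any real SDK's API.

```python
# Layers 01 and 02, sketched together. Tool names and the hook's return
# shape are assumptions for this example, not a real SDK's API.

ALLOWED_TOOLS = {
    "search_corpus",        # retrieval over the governing-document corpus
    "check_statute_index",  # flags overlap with state HOA law
}
# No tool in the set can render a legal opinion or modify a document.
# That's layer 01: the action isn't forbidden, it's unrepresentable.


def pre_tool_use(tool_name: str, tool_input: dict) -> dict:
    """Layer 02: runs before every tool call and rejects anything the
    permission layer can't express on its own."""
    if tool_name not in ALLOWED_TOOLS:
        return {"allow": False, "reason": f"tool {tool_name!r} is not permitted"}
    if tool_name == "search_corpus" and not tool_input.get("document_id"):
        # A corpus search must be scoped to a specific document so every
        # citation stays traceable back to its source.
        return {"allow": False, "reason": "search_corpus requires a document_id"}
    return {"allow": True}
```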
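Layer 03 can also be shown structurally rather than described. In the sketch below (every function name and stub is hypothetical), the answer generator only accepts a completed analysis record, so skipping a check isn't a policy violation; it's a type error.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnalysisRecord:
    """Produced only by running every mandated check, in order."""
    corpus_complete: bool
    conflicts: list[str]
    statute_risk: bool


def check_corpus(corpus: list[str]) -> bool:
    return bool(corpus)  # stub: a real check verifies amendments, exhibits, resolutions


def detect_conflicts(corpus: list[str]) -> list[str]:
    return []  # stub: a real check compares overlapping provisions across documents


def flag_statute_risk(question: str) -> bool:
    return "assessment" in question.lower()  # stub: a real check consults a state-law index


def generate_answer(question: str, record: AnalysisRecord) -> str:
    # The signature is the guardrail: without an AnalysisRecord, this
    # function cannot be called, so the checks cannot be skipped.
    return f"[answer to {question!r}, scored by {record}]"


def answer(question: str, corpus: list[str]) -> str:
    record = AnalysisRecord(
        corpus_complete=check_corpus(corpus),
        conflicts=detect_conflicts(corpus),
        statute_risk=flag_statute_risk(question),
    )
    return generate_answer(question, record)
```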

Zero room for hallucination. Zero room for analysis that meanders past legal boundaries the system has no business crossing. That wasn't a feature added later. It was a design requirement from the first line of code.

What this entire process taught me is that transparency in AI systems isn't a UX layer you bolt on at the end. It's an architectural commitment you make at the beginning — in the data structures, in the retrieval logic, in the answer generation pipeline, in the guardrail stack. If you wait until you have an answer to figure out how to explain it, you've already made decisions that make full explanation impossible.

The principle I carry into every system I build: an AI system that makes its uncertainty visible is more trustworthy than one that doesn't — not because it's less capable, but because it's more honest. The board member who can read a transparent scorecard and make her own judgment is better served than the one who receives a confident answer from a black box with no way to evaluate it.

That's not a limitation of the technology. That's a design philosophy. And it's the one I think actually matters as this technology becomes something people rely on for decisions that affect real lives and real money.
