How SpendCraft Thinks About AI

A note for people evaluating the product

Why this exists

A lot of procurement software vendors will tell you they are powered by AI. The claim has become common enough that it does not actually tell you much. Some products have AI as a core engineering investment. Others have a general purpose language model behind a single screen. The label does not distinguish them. The products are very different to use, very different to defend, and very different to live with for a year.

This page exists for buyers, IT teams, and AI governance reviewers who want a real answer to a specific question. What kind of AI is in SpendCraft, where does it sit, and why did we build it this way?

The short version is this. SpendCraft is a specialized ML product for procurement with a selective AI layer. The core, including classification, normalization, confidence scoring, and savings math, is built on traditional ML and rules based engineering. AI is used in a small number of specific places where it genuinely helps. That is the position. The rest of this page is the reasoning.

The user is in the middle

Before the architecture, the frame. SpendCraft is built around a person doing procurement work. A category manager. A sourcing lead. A CPO. That person sits in the middle of the product and the product works alongside them. They are not handing the product a job and waiting for an answer. They are working with the product, day to day, on data that flows in continuously.

That framing has consequences. The user has to be able to verify what the product is telling them. They have to be able to disagree with it and correct it, and have the correction stick. They have to be able to explain the product's output to finance, to audit, and to their own team without saying "the AI said so." They have to trust the product with decisions that have financial consequences, and that trust has to compound.

The architecture follows from this. Most of the choices below come back to the user, and what they need from the product to actually do the work.

What runs when you use SpendCraft

A typical customer dataset is hundreds of thousands of records, often into the millions, and it is reclassified continuously as new data flows in. This is the working assumption, not the stress test. Most of what happens in this loop involves no language model at all.

Vendor normalization. "C.H. Robinson Worldwide, Inc.," "CH Robinson," and "C H Robinson Worldwide" are the same vendor, shown three different ways. SpendCraft reconciles these into a canonical record using deterministic rules, fuzzy matching, and embedding similarity. Mature ML handles the messy cases that matter in practice.
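To make the first two techniques concrete, here is a minimal Python sketch of rule based normalization followed by fuzzy matching. It is an illustration, not SpendCraft's implementation. The suffix list, the helper names normalize_name and same_vendor, and the 0.65 threshold are assumptions chosen for the example, and the embedding similarity step is left out entirely.

```python
import re
from difflib import SequenceMatcher

# Hypothetical list of legal suffixes to ignore; a real list would be longer.
LEGAL_SUFFIXES = {"inc", "llc", "ltd", "corp", "co", "corporation", "incorporated"}

def normalize_name(raw):
    """Deterministic rules: lowercase, strip punctuation, drop legal suffixes,
    and merge runs of single letters ("c h" -> "ch")."""
    text = re.sub(r"[^a-z0-9\s]", "", raw.lower())
    tokens = [t for t in text.split() if t not in LEGAL_SUFFIXES]
    merged = []  # list of (token, is_run_of_initials)
    for token in tokens:
        if len(token) == 1 and merged and merged[-1][1]:
            merged[-1] = (merged[-1][0] + token, True)
        else:
            merged.append((token, len(token) == 1))
    return " ".join(token for token, _ in merged)

def same_vendor(a, b, threshold=0.65):
    """Exact match on the normalized form first, fuzzy match as a fallback.
    The threshold is an assumption for this example, not a tuned value."""
    na, nb = normalize_name(a), normalize_name(b)
    if na == nb:
        return True
    return SequenceMatcher(None, na, nb).ratio() >= threshold

names = ["C.H. Robinson Worldwide, Inc.,", "CH Robinson", "C H Robinson Worldwide"]
normalized = [normalize_name(n) for n in names]
```

In this sketch the first and third spellings normalize to the same string, and the second is caught by the fuzzy fallback. A production system would also need to guard against false merges, which is where the embedding similarity and review steps come in.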

Classification. The classifier is a specialized ML model. It is based on embeddings, tuned on your seed labels, and supported by calibrated confidence scoring. It is not a language model. At our scale, an LLM as the classifier is the wrong tool. It is too expensive, too slow, and not auditable enough for the user to defend the output. A specialized classifier is the right tool, and it is what makes feature attribution possible. When you ask why a record was classified a particular way, the product can point at the specific input fields that drove the decision.
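The idea of pointing at the input fields that drove a decision can be sketched in a few lines of Python. Everything here is a stand-in chosen for illustration: word counts take the place of real embeddings, a nearest centroid rule takes the place of the actual classifier, and leave one field out scoring takes the place of whatever attribution method the product uses. The field names and seed labels are hypothetical.

```python
import math
from collections import Counter

def embed(text):
    # Stand-in for an embedding model: a bag of lowercase word counts.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_centroids(seed_labels):
    # One summed vector per label, built from that label's seed examples.
    centroids = {}
    for label, examples in seed_labels.items():
        total = Counter()
        for text in examples:
            total.update(embed(text))
        centroids[label] = total
    return centroids

def record_vector(record, skip=None):
    # Combine all fields into one vector, optionally leaving one field out.
    vec = Counter()
    for field, value in record.items():
        if field != skip:
            vec.update(embed(value))
    return vec

def classify(record, centroids):
    vec = record_vector(record)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))

def field_attribution(record, centroids, label):
    # How much the score for `label` falls when each field is removed.
    full = cosine(record_vector(record), centroids[label])
    return {
        field: full - cosine(record_vector(record, skip=field), centroids[label])
        for field in record
    }

seeds = {
    "Freight": ["truckload freight shipping", "ltl freight carrier"],
    "Software": ["software license subscription", "saas subscription renewal"],
}
centroids = build_centroids(seeds)
record = {
    "vendor": "ch robinson",
    "description": "freight shipping invoice",
    "gl_account": "6100 logistics",
}
label = classify(record, centroids)
attribution = field_attribution(record, centroids, label)
top_field = max(attribution, key=attribution.get)
```

Here the record lands in Freight and the description field carries the decision, which is the kind of statement a reviewer can check against the evidence view.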

Confidence scoring. Calibrated buckets, high, medium, and low, are derived from the classifier and validated against held out data. When the product says "high confidence," it corresponds to a specific empirical accuracy rate. It does not refer to a language model's token probability, which is a different thing dressed in the same word.
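A small Python sketch shows what it means for a bucket to correspond to an empirical accuracy rate. The cut points of 0.9 and 0.7, the function names, and the held out sample are all assumptions made for the example, not SpendCraft's actual thresholds or data.

```python
from collections import Counter

def bucket(score, high=0.9, medium=0.7):
    # Hypothetical cut points; real ones would be chosen during calibration.
    if score >= high:
        return "high"
    if score >= medium:
        return "medium"
    return "low"

def empirical_accuracy(held_out):
    """held_out: iterable of (classifier_score, was_correct) pairs.
    Returns the observed accuracy within each confidence bucket."""
    totals, hits = Counter(), Counter()
    for score, was_correct in held_out:
        name = bucket(score)
        totals[name] += 1
        hits[name] += int(was_correct)
    return {name: hits[name] / totals[name] for name in totals}

# Made up held out sample, for illustration only.
sample = [
    (0.95, True), (0.92, True), (0.91, True), (0.93, False),
    (0.80, True), (0.75, False),
    (0.50, False), (0.40, True),
]
accuracy = empirical_accuracy(sample)
```

With real data, a bucket label is only meaningful if its observed accuracy on held out records is stable and clearly separated from the bucket below it.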

Drop off analysis. This shows where classification confidence collapses in the taxonomy tree, including the feature that flags a vendor as "drops at L3." It uses deterministic statistics on classifier output. This is what makes battle cards work. They direct your attention to the hardest classification problems first, in a way you can verify.
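One simple way to compute a flag like "drops at L3" is to average classifier confidence per taxonomy level across a vendor's records and report the first level that falls below a floor. The Python sketch below does that. The 0.7 floor, the function name, and the input shape are assumptions for illustration, and the product's actual statistic may differ.

```python
from statistics import mean

def drop_off_level(level_confidences, floor=0.7):
    """level_confidences: one list per record, holding the classifier's
    confidence at L1, L2, L3, and so on. Returns the first level whose
    average confidence is below `floor`, or None if none is."""
    if not level_confidences:
        return None
    depth = min(len(row) for row in level_confidences)
    for level in range(depth):
        average = mean(row[level] for row in level_confidences)
        if average < floor:
            return f"L{level + 1}"
    return None

# Made up confidences for two records from the same vendor.
vendor_records = [
    [0.98, 0.91, 0.55, 0.40],
    [0.95, 0.88, 0.60, 0.42],
]
flag = drop_off_level(vendor_records)
```

Because the calculation is plain statistics on classifier output, the same inputs always produce the same flag, and a reviewer can recompute it by hand.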

Savings calculations. Every figure in the Savings Opportunity Scans is deterministic arithmetic on structured data. Addressable spend. Minimum and maximum ranges. Baseline comparisons. Every number traces to a specific calculation on specific records. The principle that does not bend is simple: a procurement leader has to defend savings numbers to finance, and a number that came out of a language model is not defensible.
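As an illustration of what traceable arithmetic can look like, the Python sketch below computes addressable spend and a savings range for one category and keeps the record IDs that fed the figure. The Record type, the field names, and the 3 and 8 percent rates are hypothetical and chosen only for the example. They are not SpendCraft's data model or benchmarks.

```python
from dataclasses import dataclass
from decimal import Decimal

@dataclass(frozen=True)
class Record:
    record_id: str
    category: str
    amount: Decimal

def savings_range(records, category, min_rate, max_rate):
    """Deterministic arithmetic: sum the in-scope spend, apply the rates,
    and return the record IDs so every figure can be traced."""
    in_scope = [r for r in records if r.category == category]
    addressable = sum((r.amount for r in in_scope), Decimal("0"))
    return {
        "addressable_spend": addressable,
        "min_savings": addressable * min_rate,
        "max_savings": addressable * max_rate,
        "record_ids": [r.record_id for r in in_scope],
    }

records = [
    Record("r1", "Freight", Decimal("1000.00")),
    Record("r2", "Freight", Decimal("2500.00")),
    Record("r3", "Software", Decimal("400.00")),
]
result = savings_range(records, "Freight", Decimal("0.03"), Decimal("0.08"))
```

Decimal is used instead of floating point so that the figures a reviewer recomputes by hand match the output exactly.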

Each of these is designed so the user can see the output, verify it, and defend it. That is what working alongside the product actually requires.

Where AI helps you work

AI is in the product. It is used where it genuinely helps the user do the work. No further.

Explainability in the review surface. When you drill into a record, you see a single line rationale citing the specific reference fields that drove the classifier's decision, with those fields highlighted in the evidence view. Most of this is templated text constructed from the classifier's feature attribution. You see the claim and the evidence side by side. Where the templated rationale falls short, a language model may assist on demand.

Chat. Natural language queries against your data. This is where a language model is most visibly part of the product, and most defensibly so. Understanding what you are asking is a task language models are genuinely good at. Chat interprets your question and narrates the result. It does not invent facts about your data. The answer is grounded in the same analytical engine that powers the rest of the product.

Narrative in savings scans. Short descriptions of what was detected. The numbers come from the deterministic engine. The language explaining them is assisted by AI, with strict guardrails that prevent the model from making claims the math does not support. The numbers and the narrative appear in the same view. They have to agree, or the narrative fails.
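This page does not specify how the guardrails are implemented. One possible check, sketched in Python below, is to pull every number out of the generated narrative and reject the narrative if any of them is missing from the engine's figures. The function name, the regular expression, and the sample figures are assumptions for illustration.

```python
import re

# Matches numbers such as 3,500 or 105.25 inside free text.
NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")

def narrative_agrees(narrative, figures):
    """Return True only if every number cited in the narrative also
    appears among the deterministic engine's figures."""
    allowed = {round(float(value), 2) for value in figures.values()}
    cited = {
        round(float(match.replace(",", "")), 2)
        for match in NUMBER.findall(narrative)
    }
    return cited <= allowed

# Made up figures standing in for the engine's output.
figures = {"addressable_spend": 3500, "min_savings": 105, "max_savings": 280}

good = "Freight has $3,500 in addressable spend, with savings between $105 and $280."
bad = "Freight has $3,500 in addressable spend, with savings of up to $900."
```

A check this simple would also reject legitimate numbers that are not in the figures, such as a percentage or a record count, so a real guardrail would need a richer set of allowed values. The point of the sketch is the direction of the dependency: the narrative is validated against the math, never the other way around.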

Group level summaries on the review queue. When records cluster around the same vendor, same proposed classification, and same confidence, a single sentence description of why the cluster formed can help you act faster. This is low volume, optional, and available on demand.
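The clustering step described here can be sketched as a plain grouping on three keys. In the Python example below, the record fields, function names, and the templated sentence are hypothetical. In the product, the wording of the description may come from a language model rather than a template.

```python
from collections import defaultdict

def cluster_review_queue(records):
    """Group record IDs by vendor, proposed classification, and confidence."""
    clusters = defaultdict(list)
    for record in records:
        key = (record["vendor"], record["proposed_class"], record["confidence"])
        clusters[key].append(record["record_id"])
    return dict(clusters)

def describe(key, record_ids):
    # A templated stand-in for the single sentence group summary.
    vendor, proposed_class, confidence = key
    return (
        f"{len(record_ids)} records from {vendor} are proposed as "
        f"{proposed_class} at {confidence} confidence."
    )

# Made up review queue, for illustration only.
queue = [
    {"record_id": "r1", "vendor": "CH Robinson", "proposed_class": "Freight", "confidence": "medium"},
    {"record_id": "r2", "vendor": "CH Robinson", "proposed_class": "Freight", "confidence": "medium"},
    {"record_id": "r3", "vendor": "Acme", "proposed_class": "Software", "confidence": "low"},
]
clusters = cluster_review_queue(queue)
freight_key = ("CH Robinson", "Freight", "medium")
summary = describe(freight_key, clusters[freight_key])
```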

That is the whole list. When we consider adding a capability powered by AI, the burden of proof is on the AI path.

What we deliberately do not do

A short list of patterns that are common in this category and absent from SpendCraft. Each absence is a deliberate choice.

A language model is not the classifier. Wrong tool at our scale. We evaluated this carefully and decided against it.

Savings estimates are not generated by a language model. Every number is the output of deterministic calculation on your data. If you cannot trace it, you cannot defend it.

There are no autonomous AI agents taking action on your behalf. SpendCraft keeps humans in the loop by design. AI surfaces signals. You make decisions and take actions. The closed loop tracking in Scout is a workflow tool, not an agent.

Vendor master resolution is not done by a language model. Mature ML outperforms LLMs here, and the cost of getting vendor identity wrong cascades through classifications, savings, and vendor profiles. Wrong tool for too important a job.

Battle cards do not include business context generated by AI when the model cannot verify it. A language model can produce plausible sounding strategic narrative about why a card matters to your business. It cannot actually know your business. We declined to put that in front of users.

Why this matters

Four reasons, stated plainly.

Scale. Hundreds of thousands to millions of records per customer. A specialized classifier operates here at reasonable cost and latency. An LLM classifier does not. Not at a price that works. Not at a speed that supports a working loop.

Auditability. You have to explain the product's output to finance, to audit, and to your team. A deterministic engine can be explained by pointing at the inputs, the logic, and the calculation. A language model's output sometimes cannot be explained, at least not in a way that holds up to a specific question about a specific number.

Governance. Enterprise AI governance committees in 2026 are scrutinizing where and how language models are used in vendor products. A product where the classifier is specialized ML and language models appear only in chat and narrative is a shorter, cleaner review.

Trust that compounds. Every time you verify that a number traces to a calculation, or that a classification points at input fields you can see, trust accrues to the product. That trust is what makes the surfaces powered by AI credible. A user who has spent a month watching the deterministic engine be right is a user who will trust the AI layer. A product whose core is a language model has to earn that trust the hard way, and often does not.

What this is not

This is not a position against AI. We use AI in the product. We will add AI capabilities over time as specific use cases earn them. Foundation models will keep improving, and some things that are better as specialized ML today may shift in two or three years. The architecture will be revisited.

What is durable is the principle. A user in the middle, working alongside the product, with the product built to be worthy of that work. Deterministic where it matters. AI where it helps.

In short

SpendCraft uses AI where it earns its keep, and uses proper engineering everywhere else.

If that describes the product you want to evaluate, the sections above are the long form explanation. If you want to go deeper on any specific part, including the classifier, the confidence calibration, the explainability layer, or the savings math, we are happy to walk you through it.