Production-Grade Is a Threshold, Not a Spectrum
"In production" is not the same as "production-grade." Many AI features run in customer-facing settings without meeting any reasonable definition of production-grade engineering. They work most of the time. They fail in ways that the team has not characterized. They cost more than the spreadsheets predicted. They cannot be modified safely.
KPMG's 2024 enterprise AI survey found that 50 percent of AI features running in production had been redesigned at least once from their initial architecture within 18 months (KPMG, "AI in Control 2024 Survey," 2024). The rework was usually triggered by a production incident that exposed a structural weakness. Production-grade work is the work that avoids that rework cycle.
If your AI feature is heading into production this year, the question worth asking is not whether it works. The question is whether it meets the threshold.
The Six Things a Production-Grade Feature Has
Six characteristics define production-grade AI in 2026. A feature that lacks any one of them is not production-grade regardless of how well the model performs.
It has a measurable quality bar that runs continuously. Not "the team reviewed it before launch." A specific eval set, with specific accuracy targets, running against production samples on a schedule, with alerts when results regress.
It has bounded failure modes. The team has documented what the feature does when the model is wrong, when the model is unavailable, when the input is malformed, when the cost ceiling is exceeded. Each failure path has a designed response, not an emergent one.
It has explicit cost economics per unit of work. The team can answer the question "what does one customer interaction cost" in cents, with current numbers, derived from real telemetry rather than estimation.
It has versioned artifacts. Prompts, models, retrieval configurations, and eval sets are all under version control with documented changes. The team can answer "what was running on March 14" without forensics.
It has an audit trail for high-impact outputs. Decisions, actions, and outputs that have consequences are logged with sufficient context to reconstruct them after the fact. This is required by the EU AI Act for high-risk systems and by good practice for everything else.
It has documented operational ownership. A named team or individual owns the feature in production. The on-call rotation includes coverage for AI-specific incidents. A runbook exists and has been tested.
A feature with all six characteristics is production-grade. A feature with four or five is approaching the bar and needs the remaining work. A feature with three or fewer is in production by accident.
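The scoring rule above is simple enough to encode directly. The following is a minimal sketch, assuming one boolean per characteristic; the class and field names are illustrative and not part of any published framework.

```python
from dataclasses import dataclass, fields


@dataclass
class ProductionGradeAudit:
    """One yes/no answer per characteristic. Field names are illustrative."""
    continuous_quality_bar: bool
    bounded_failure_modes: bool
    unit_cost_economics: bool
    versioned_artifacts: bool
    audit_trail: bool
    operational_ownership: bool

    def score(self) -> int:
        # Count how many of the six characteristics are present.
        return sum(bool(getattr(self, f.name)) for f in fields(self))

    def missing(self) -> list[str]:
        # The punch list: characteristics that still need work.
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def verdict(self) -> str:
        n = self.score()
        if n == 6:
            return "production-grade"
        if n >= 4:
            return "approaching the bar"
        return "in production by accident"


audit = ProductionGradeAudit(
    continuous_quality_bar=True,
    bounded_failure_modes=False,
    unit_cost_economics=False,
    versioned_artifacts=True,
    audit_trail=True,
    operational_ownership=True,
)
print(audit.verdict(), audit.missing())
```

A feature scored this way has four of six characteristics, so it lands in the "approaching the bar" band, and the two missing names become the start of the punch list.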
Where Most Implementations Fall Short
Three of the six are commonly missing.
The measurable quality bar is the most common gap. Teams often have manual quality review during development and abandon it once the feature is live. Production drift goes unnoticed until customer complaints accumulate.
Bounded failure modes are the second common gap. The feature works in the happy path. When the happy path breaks, the team finds out through customer reports rather than through designed degradation.
Cost economics per unit of work is the third gap. The aggregate spend is visible in the cloud bill. The per-interaction unit cost is not, which means the team cannot reason about price changes, traffic changes, or feature changes economically.
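Per-interaction cost can usually be derived from token counts that are already in the telemetry. The sketch below assumes each model call is logged with an interaction identifier and token counts; the record shape and the two price constants are hypothetical placeholders, not any provider's actual rates.

```python
from dataclasses import dataclass


@dataclass
class ModelCall:
    """One logged model call. The shape is an assumption about your telemetry."""
    interaction_id: str
    input_tokens: int
    output_tokens: int


# Hypothetical prices in US dollars per million tokens.
# Substitute your provider's current rates.
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00


def cost_cents(call: ModelCall) -> float:
    """Cost of a single model call, in cents."""
    dollars = (
        call.input_tokens * INPUT_PRICE_PER_MTOK
        + call.output_tokens * OUTPUT_PRICE_PER_MTOK
    ) / 1_000_000
    return dollars * 100


def cost_per_interaction(calls: list[ModelCall]) -> dict[str, float]:
    """Sum call costs by interaction, since one interaction may span several calls."""
    totals: dict[str, float] = {}
    for call in calls:
        totals[call.interaction_id] = totals.get(call.interaction_id, 0.0) + cost_cents(call)
    return totals


def mean_cost_cents(calls: list[ModelCall]) -> float:
    """Average cost of one customer interaction, in cents."""
    totals = cost_per_interaction(calls)
    return sum(totals.values()) / len(totals) if totals else 0.0
```

Grouping by interaction rather than by call matters because retries, tool calls, and multi-step chains all belong to the same unit of work.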
Teams that close these three gaps cover most of the difference between accidental-production and production-grade.
The Six-Week Crossing
Moving an existing AI feature from accidental-production to production-grade typically takes six weeks of focused engineering work. The work has a recognizable shape.
Week one runs a feature audit against the six characteristics. The team documents what exists and what is missing. The audit produces a punch list.
Weeks two and three build the eval framework. Eval set construction, harness setup, baseline measurement, alerting integration. By the end of week three, the team has a continuous measurement of quality.
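The core of that harness is small. The sketch below assumes exact-match scoring against an expected answer, which suits classification-style tasks and is only a stand-in for whatever scorer fits the feature; the names, the tolerance default, and the alert rule are all illustrative.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    prompt: str
    expected: str


@dataclass
class EvalResult:
    accuracy: float
    passed_target: bool
    regressed: bool


def run_eval(
    model: Callable[[str], str],
    cases: list[EvalCase],
    target: float,
    baseline: float,
    tolerance: float = 0.02,
) -> EvalResult:
    """Score the model on the eval set and compare against target and baseline."""
    if not cases:
        raise ValueError("eval set is empty")
    # Exact-match scoring is an assumption; swap in a task-appropriate scorer.
    correct = sum(model(c.prompt).strip() == c.expected for c in cases)
    accuracy = correct / len(cases)
    return EvalResult(
        accuracy=accuracy,
        passed_target=accuracy >= target,
        # A drop larger than the tolerance below baseline counts as a regression.
        regressed=accuracy < baseline - tolerance,
    )


def should_alert(result: EvalResult) -> bool:
    """Alert when the run misses the target or regresses from baseline."""
    return result.regressed or not result.passed_target
```

Running `run_eval` on a schedule against sampled production inputs, and routing `should_alert` into the existing paging or chat channel, is what turns a one-time review into a continuous quality bar.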
Weeks four and five design and implement bounded failure modes. Fallback paths, circuit breakers, rate limit handling, cost ceiling enforcement, graceful degradation when the model is wrong or unavailable.
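One common shape for this work is a circuit breaker in front of the model call, with a designed fallback behind it. The sketch below is a simplified version under stated assumptions: the thresholds are arbitrary, the breaker closes fully once the cool-down elapses rather than using a true half-open state, and `model` and `fallback` are hypothetical callables supplied by the feature.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Stops calling a failing model for a cool-down period."""

    def __init__(
        self,
        max_failures: int = 3,
        reset_after_s: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self.max_failures = max_failures
        self.reset_after_s = reset_after_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def is_open(self) -> bool:
        if self.opened_at is None:
            return False
        if self.clock() - self.opened_at >= self.reset_after_s:
            # Cool-down elapsed: close the breaker and let calls through again.
            self.opened_at = None
            self.failures = 0
            return False
        return True

    def record_success(self) -> None:
        self.failures = 0

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = self.clock()


def answer(
    prompt: str,
    model: Callable[[str], str],
    fallback: Callable[[str], str],
    breaker: CircuitBreaker,
) -> str:
    """Call the model unless the breaker is open; degrade to the fallback on failure."""
    if breaker.is_open():
        return fallback(prompt)
    try:
        result = model(prompt)
    except Exception:
        breaker.record_failure()
        return fallback(prompt)
    breaker.record_success()
    return result
```

The fallback is where the design decision lives: a cached answer, a rules-based response, or a handoff to a human are all valid, as long as the choice is made on purpose. The same wrapper is a natural place to enforce a cost ceiling by treating "budget exceeded" as another reason to take the fallback path.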
Week six establishes the audit trail, versioning discipline, and ownership documentation. Most of this is process work rather than code work.
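The code portion of week six is mostly a matter of stamping every high-impact output with the versions that produced it. The sketch below shows one possible record shape; the field names and version strings are hypothetical, and a real system would also need a redaction policy for sensitive inputs before logging them.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditRecord:
    """Enough context to reconstruct an output later. Fields are illustrative."""
    timestamp: str
    feature: str
    model_version: str
    prompt_version: str
    retrieval_config_version: str
    input_text: str
    output_text: str


def make_record(
    feature: str,
    model_version: str,
    prompt_version: str,
    retrieval_config_version: str,
    input_text: str,
    output_text: str,
) -> AuditRecord:
    return AuditRecord(
        # UTC timestamps keep records comparable across regions.
        timestamp=datetime.now(timezone.utc).isoformat(),
        feature=feature,
        model_version=model_version,
        prompt_version=prompt_version,
        retrieval_config_version=retrieval_config_version,
        input_text=input_text,
        output_text=output_text,
    )


def to_log_line(record: AuditRecord) -> str:
    """Serialize to one JSON line for an append-only log."""
    return json.dumps(asdict(record), sort_keys=True)
```

With records like these, "what was running on March 14" becomes a log query on the version fields rather than an investigation.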
This sequence produces a production-grade feature without changing what the feature does. It does change what happens when something goes wrong, which is most of what production-grade actually means.
What Distinguishes 2026 From 2024
Two things have shifted what production-grade requires.
Eval infrastructure has matured to the point where the measurable quality bar is no longer optional. Tools like Langfuse, Braintrust, Arize, and Galileo make continuous eval straightforward enough that teams without it are choosing not to have it. In 2023, eval infrastructure was a build. In 2026, it is a buy decision.
Regulatory expectations have shifted, with the EU AI Act's obligations for most high-risk systems scheduled to apply from August 2026 (European Commission, AI Act timeline). High-risk systems require risk management, data governance, transparency, and human oversight. Production-grade engineering practices generate much of the required regulatory documentation as a byproduct. The two disciplines are converging.
What This Costs
Crossing the threshold for a typical AI feature in a mid-market enterprise costs 4 to 8 engineering weeks plus tooling investment in the range of $30K to $150K annually depending on traffic. The investment is small relative to the cost of the rework cycle that production incidents otherwise trigger.
For features where AI is load-bearing on revenue or customer experience, the math has been settled for several quarters. Production-grade engineering pays back through reduced incident cost and faster iteration. The teams that have not done the work are usually deferring rather than rejecting it.
What Logiciel Does Here
Logiciel works with engineering teams whose AI features are running in production but have not been engineered to the production-grade threshold. The work is typically structured around the six-characteristic audit followed by a six-week crossing program for the highest-priority feature.
The AI Reliability framework covers the observability surfaces that production-grade requires. The Pilot to Production Path framework covers the 12-week sequence for features that have not yet reached production but should be engineered to the threshold from the start.
A 30-minute working session is enough to audit your current production AI features against the six characteristics.
Frequently Asked Questions
How do I justify production-grade investment to product leadership?
Through the cost of the rework cycle that production incidents trigger. Most teams have at least one example of a feature that had to be redesigned because of an incident. The cost of that incident plus the redesign is usually larger than the cost of production-grade engineering up front.
Can I skip the eval framework if my use case is low-stakes?
For internal tools or non-customer-facing features, the eval framework can be lighter. For customer-facing features, the eval framework is the difference between catching quality regression early and finding out from customer complaints. Skip it only at low traffic levels.
What is the right team to own production-grade engineering?
The team that owns the feature in production. Production-grade is not a separate concern that another team can add later. It is engineering discipline that the feature-owning team has to internalize.
How do I measure whether a feature has crossed the threshold?
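The six-characteristic checklist. Each characteristic has a binary answer. All six means the feature is production-grade. Four or five means it is approaching the bar and needs the remaining work. Three or fewer means it is in production by accident.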
How often do I need to revisit the threshold?
Quarterly for high-stakes features. Annually for lower-stakes ones. The threshold drifts as the feature evolves; characteristics that were present at launch can erode through subsequent changes.
Sources:
- KPMG, "AI in Control 2024 Survey"
- European Commission, AI Act timeline