A leading Workday consulting firm
A leading Workday consulting firm was incurring rising costs from large language model (LLM) API usage, with no intelligent routing in place to match prompt complexity to the right model. High-cost APIs were being applied indiscriminately, responses lacked organizational context, and the firm had no scalable framework for governing AI usage as demand grew. Marlabs solved this by designing and deploying PromptRouter, its proprietary AI routing platform, which dynamically evaluates each prompt and directs it to the most appropriate and cost-effective model while drawing on the firm's knowledge base for contextual accuracy. The result was a reduction in LLM API costs of more than 50% and a transformation of the organization's AI operations from reactive to strategic.
As internal demand for AI-assisted support grew, a leading Workday consulting firm found itself incurring rising LLM API costs with little control over how those costs were being generated. Without a routing mechanism in place, high-cost models were being applied to simple queries that could have been handled far more economically, and complex queries were not always matched to the models best suited to answer them. The result was a pattern of unnecessary spend, inconsistent output quality, and AI responses that lacked the organizational context needed to be genuinely useful to the firm's staff.
The firm needed a solution that could address three interconnected problems at once: reduce costs by matching prompt complexity to the right LLM; improve response relevance by drawing on internal knowledge; and create a scalable, repeatable framework for AI usage as demand continued to grow. Meeting all three goals required not just a new tool but a fundamental rethinking of how the organization approached its AI operations.
Marlabs designed and deployed PromptRouter, its proprietary web-based AI routing platform, as the core of the solution. PromptRouter evaluates each incoming AI request in real time, assessing prompt complexity and business context to determine the most appropriate and cost-effective model to use. The platform also integrates with the client's knowledge base, ensuring that responses draw on internal documentation and proprietary content to deliver organizationally relevant, high-quality answers. The solution was executed across four phases.
The first phase established the intelligence layer that powers PromptRouter's routing decisions. The Marlabs team analyzed the firm's existing AI usage patterns and defined complexity tiers, categorizing prompts by the depth of reasoning, domain knowledge, and compute resources each type of query required. Each tier was mapped to an appropriate LLM provider and pricing model, creating a capability matrix that aligned business requirements with the available model landscape. Governance rules were embedded into the routing logic to ensure alignment with cost thresholds, quality expectations, and business priorities, giving the organization a principled foundation on which to scale.
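As a rough illustration of how a capability matrix and a cost-threshold governance rule might fit together, consider the minimal sketch below. The tier names, model labels, prices, and the `apply_governance` helper are all hypothetical placeholders; they do not describe PromptRouter's actual configuration or any provider's real pricing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    model: str                # hypothetical model label, not a real product name
    usd_per_1k_tokens: float  # illustrative price only


# Hypothetical capability matrix: complexity tier -> model and price.
CAPABILITY_MATRIX = {
    "simple": Tier("simple", "small-model", 0.0005),
    "moderate": Tier("moderate", "mid-model", 0.003),
    "complex": Tier("complex", "large-model", 0.03),
}

# Tiers ordered from cheapest to most expensive.
TIER_ORDER = ["simple", "moderate", "complex"]


def apply_governance(tier_name: str, est_tokens: int, max_cost_usd: float) -> Tier:
    """Step down to a cheaper tier until the estimated cost fits the threshold.

    Falls back to the cheapest tier if nothing else fits.
    """
    idx = TIER_ORDER.index(tier_name)
    while idx > 0:
        tier = CAPABILITY_MATRIX[TIER_ORDER[idx]]
        estimated_cost = tier.usd_per_1k_tokens * est_tokens / 1000
        if estimated_cost <= max_cost_usd:
            return tier
        idx -= 1
    return CAPABILITY_MATRIX[TIER_ORDER[0]]


if __name__ == "__main__":
    # A "complex" request that exceeds a tight budget is stepped down one tier.
    print(apply_governance("complex", est_tokens=2000, max_cost_usd=0.01).model)
```

A real governance layer would also weigh quality expectations and business priorities, as described above; this sketch models only the cost threshold.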
With the routing framework defined, the team moved to the second phase: designing and building the PromptRouter platform as a scalable, web-based application for real-world enterprise use. The architecture separates front-end and back-end concerns cleanly, with Python and FastAPI services powering the routing engine and a Jinja2 and JavaScript interface providing a responsive, accessible user experience. The routing engine itself was implemented as a configurable system capable of evaluating each prompt against the defined complexity tiers and directing it to the appropriate model in real time, with no manual intervention required. The platform was built to accommodate evolving model options and changing business rules without requiring a rebuild of the core infrastructure.
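To make the idea of a configurable routing engine concrete, here is a simplified sketch in plain Python. The keyword list, word-count threshold, and model labels in `ROUTING_CONFIG` are assumptions chosen for illustration, and the keyword heuristic is a deliberately crude stand-in for whatever complexity assessment the production engine performs.

```python
import re

# Hypothetical routing configuration. Changing these values alters routing
# behavior without touching the routing code itself.
ROUTING_CONFIG = {
    "complex_keywords": {"analyze", "compare", "design", "troubleshoot"},
    "moderate_min_words": 25,
    "models": {
        "simple": "small-model",
        "moderate": "mid-model",
        "complex": "large-model",
    },
}


def classify(prompt: str, config: dict = ROUTING_CONFIG) -> str:
    """Assign a complexity tier using simple, configurable heuristics."""
    text = prompt.lower()
    words = re.findall(r"\w+", text)
    # Substring match on keywords that hint at multi-step reasoning.
    if any(keyword in text for keyword in config["complex_keywords"]):
        return "complex"
    # Longer prompts without those keywords are treated as moderate.
    if len(words) >= config["moderate_min_words"]:
        return "moderate"
    return "simple"


def route(prompt: str, config: dict = ROUTING_CONFIG) -> dict:
    """Return the tier and the model label the prompt would be sent to."""
    tier = classify(prompt, config)
    return {"tier": tier, "model": config["models"][tier]}


if __name__ == "__main__":
    print(route("What is the PTO policy?"))
    print(route("Compare two integration approaches and analyze the trade-offs."))
```

In a FastAPI service, a function like `route` could sit behind an endpoint. Keeping the rules in a configuration object is one way to accommodate new models or business rules without rebuilding the core.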
The third phase addressed the quality gap created by AI responses that lacked organizational context. Marlabs connected the firm's internal documentation and knowledge sources to the platform, indexing content and building a context retrieval layer that enriches each prompt with relevant proprietary information before it is routed to the selected model. API connectors were developed to keep the knowledge base current, ensuring the system draws on up-to-date content rather than relying solely on the general knowledge embedded in the underlying LLMs. This integration was the key to producing responses that felt genuinely tailored to the organization rather than generic, and it elevated PromptRouter from a simple cost-optimization tool to a platform for contextually intelligent AI.
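The enrichment step described above can be pictured with a small sketch. It assumes an in-memory list of documents and a naive word-overlap score. The sample documents, the `retrieve` and `enrich` helpers, and the prompt template are all hypothetical, and a production retrieval layer would more likely rely on a proper search index.

```python
import re

# Hypothetical knowledge base entries; the contents are placeholders.
KNOWLEDGE_BASE = [
    {"id": "kb-1", "text": "Payroll integrations are reviewed every quarter."},
    {"id": "kb-2", "text": "Security role changes require administrator approval."},
    {"id": "kb-3", "text": "Report performance issues are logged in the tracker."},
]


def tokenize(text: str) -> set:
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(prompt: str, docs: list, top_k: int = 2) -> list:
    """Rank documents by word overlap with the prompt and keep the best matches."""
    query = tokenize(prompt)
    scored = [(len(query & tokenize(doc["text"])), doc) for doc in docs]
    scored = [item for item in scored if item[0] > 0]  # drop non-matching docs
    scored.sort(key=lambda item: item[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


def enrich(prompt: str, docs: list = KNOWLEDGE_BASE) -> str:
    """Prepend retrieved context to the prompt before it is routed to a model."""
    context = retrieve(prompt, docs)
    if not context:
        return prompt
    block = "\n".join(f"[{doc['id']}] {doc['text']}" for doc in context)
    return f"Context:\n{block}\n\nQuestion: {prompt}"


if __name__ == "__main__":
    print(enrich("Who approves security role changes?"))
```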
The final phase brought the platform into production and established the operational infrastructure needed to sustain and optimize it over time. Marlabs containerized the application using Docker for portable, consistent deployment and integrated ELK and Splunk for real-time logging and observability across all routing activity. A cost analytics dashboard was built to give stakeholders clear visibility into API spend by model, prompt category, and department, enabling continuous optimization as usage patterns evolved. This monitoring layer transformed cost management from a periodic review exercise into an ongoing, data-driven practice that allows the organization to respond quickly to shifts in demand or pricing.
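The kind of breakdown such a dashboard presents can be sketched as a simple aggregation over routing logs. The record fields, sample values, and the `spend_by` helper below are hypothetical; they illustrate the grouping idea only and do not reflect the firm's actual data or log schema.

```python
from collections import defaultdict

# Hypothetical routing log records; the values are illustrative only.
ROUTING_LOG = [
    {"model": "small-model", "category": "faq", "department": "support", "cost_usd": 0.25},
    {"model": "small-model", "category": "faq", "department": "consulting", "cost_usd": 0.5},
    {"model": "large-model", "category": "analysis", "department": "support", "cost_usd": 2.0},
]


def spend_by(records: list, dimension: str) -> dict:
    """Sum API cost per value of the chosen dimension (model, category, department)."""
    totals = defaultdict(float)
    for record in records:
        totals[record[dimension]] += record["cost_usd"]
    return dict(totals)


if __name__ == "__main__":
    for dimension in ("model", "category", "department"):
        print(dimension, spend_by(ROUTING_LOG, dimension))
```

Grouping the same records along different dimensions is what lets stakeholders see, for example, whether spend is concentrated in one model or one department.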
The deployment of PromptRouter delivered immediate and measurable improvements across cost, quality, and operational efficiency. LLM API costs fell by more than 50% as simple prompts were redirected to lower-cost models and high-cost models were reserved for queries that genuinely required their capabilities. Response quality improved in parallel, with knowledge base integration producing answers that were richer, more contextually accurate, and more directly useful to the firm's Workday support staff.
Beyond the headline cost figure, the platform also reduced the time staff spent waiting for AI-generated outputs, lowered the burden of manual prompt engineering, and established a consistent standard for AI response quality across departments. The firm entered the engagement with a reactive approach to LLM usage, where cost and quality outcomes were largely unpredictable. It emerged with a strategic AI operating model built on a repeatable, scalable framework that can accommodate growing demand without growing costs proportionally.