The Cost Curve That Started Going Wrong
A director of engineering at a logistics company opened her cloud bill in late 2024 and noticed something her predecessor had not flagged. The cloud spend had been growing 15 percent year over year for three years without corresponding business growth. The workloads that had been lift-and-shifted from on-premises in 2021 were now consuming a disproportionate share of cloud cost. The workloads that had been rebuilt cloud-native during the same period were not.
She investigated the pattern. The lift-and-shift workloads ran on EC2 instances that were sized for peak load and sat at 30 percent utilization most of the time. They made little use of managed services because the original architecture predated them. They could not scale down because the architecture assumed always-on availability. They could not scale up because capacity had been fixed years earlier.
The lift-and-shift had saved migration cost in 2021. By 2024, the accumulated running cost exceeded what cloud-native rebuilds would have cost. The two cumulative cost curves had crossed.
Her experience is common. Lift-and-shift is a reasonable starting point for cloud migration. The economics work for the migration itself. The architecture that results does not capture the cost benefits that justify cloud over time. The refactoring trigger arrives eventually for most lift-and-shift workloads.
What Lift-and-Shift Actually Costs Over Time
Lift-and-shift migration costs the initial migration effort plus an ongoing premium relative to cloud-native equivalents. The premium has specific components.
The first component is utilization waste. Lift-and-shift typically sizes instances for peak load because the original architecture cannot scale dynamically. The instances run at low utilization most of the time. Cloud-native architectures scale with demand and run at higher average utilization.
The second component is managed service avoidance. Lift-and-shift continues to operate self-managed databases, queues, caches, and other infrastructure. The operational overhead is real. Cloud-native architectures use managed equivalents that reduce operational burden and often cost less in aggregate.
The third component is architectural rigidity. Lift-and-shift workloads cannot easily adopt new cloud capabilities. AI services, serverless components, and advanced networking all require some rearchitecting to integrate. The opportunity cost is the value of capabilities the workload cannot use.
The fourth component is talent friction. Engineers who understand the original architecture are increasingly rare and expensive. New engineers want to work on modern architectures. The team operating lift-and-shift workloads gets older and smaller over time.
These components are not visible in monthly bills directly. They accumulate as the gap between what the workload could achieve in cloud-native form and what it does achieve in lift-and-shift form.
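The utilization component is the easiest of the four to estimate roughly. The sketch below is an illustration, not a billing model. It assumes instance cost scales linearly with provisioned capacity, and the function names, the $10,000 monthly figure, and the 60 percent target utilization are hypothetical. Only the 30 percent utilization figure echoes the example earlier in this article.

```python
def _check_fraction(value: float, name: str) -> None:
    """Reject utilization values outside the (0, 1] range."""
    if not 0.0 < value <= 1.0:
        raise ValueError(f"{name} must be greater than 0 and at most 1")


def idle_spend(monthly_cost: float, avg_utilization: float) -> float:
    """Share of the monthly bill that pays for capacity sitting idle."""
    _check_fraction(avg_utilization, "avg_utilization")
    return monthly_cost * (1.0 - avg_utilization)


def rightsized_cost(monthly_cost: float,
                    avg_utilization: float,
                    target_utilization: float) -> float:
    """Estimated bill if the same work ran at target_utilization.

    Assumption: cost is proportional to provisioned capacity.
    """
    _check_fraction(avg_utilization, "avg_utilization")
    _check_fraction(target_utilization, "target_utilization")
    return monthly_cost * avg_utilization / target_utilization


# Hypothetical fleet: $10,000 per month at 30 percent average utilization.
fleet_cost = 10_000
print(f"Idle spend per month: {idle_spend(fleet_cost, 0.30):,.0f}")
print(f"Bill at 60% utilization: {rightsized_cost(fleet_cost, 0.30, 0.60):,.0f}")
```

The other three components (managed service avoidance, rigidity, talent friction) do not reduce to a single formula this cleanly, which is part of why they stay invisible in the monthly bill.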
The Refactoring Trigger
Three signals indicate the refactoring trigger has arrived for a lift-and-shift workload.
The first signal is cloud cost growing faster than business value. The workload's cloud spend increases over time without corresponding business benefit. The cost trajectory does not match the value trajectory.
The second signal is operational friction increasing. The team operating the workload spends more time on routine maintenance, security patching, capacity adjustments, and incident response. The work is unglamorous and growing.
The third signal is capability constraints affecting the business. The workload cannot adopt features the business needs. AI integration, real-time analytics, and mobile responsiveness all sit blocked behind architectural limitations.
When two of three signals are present, the refactoring trigger has arrived. The exact timing varies; most lift-and-shift workloads hit the trigger between three and six years after initial migration.
Workloads that hit the trigger and refactor capture the cloud-native benefits. Workloads that hit the trigger and continue lift-and-shift accumulate technical debt that becomes more expensive to address as it grows.
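The two-of-three rule described above can be written down as a small check. This is a minimal sketch: the class and function names are hypothetical, and how each signal is measured (for example, comparing cost growth with a revenue or volume growth figure) is an assumption left to the team doing the assessment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriggerSignals:
    """The three refactoring-trigger signals for one workload."""
    cost_outpacing_value: bool
    operational_friction_rising: bool
    capability_constrained: bool

    def present(self) -> int:
        """Number of signals currently present (0 to 3)."""
        return sum((self.cost_outpacing_value,
                    self.operational_friction_rising,
                    self.capability_constrained))


def cost_outpaces_value(cost_growth: float, value_growth: float) -> bool:
    """Assumed test for the first signal: spend grows faster than value."""
    return cost_growth > value_growth


def trigger_arrived(signals: TriggerSignals) -> bool:
    """Two or more of the three signals mean the trigger has arrived."""
    return signals.present() >= 2


# Hypothetical workload: spend up 15 percent a year against flat business
# value, rising maintenance load, no capability constraint reported yet.
workload = TriggerSignals(
    cost_outpacing_value=cost_outpaces_value(0.15, 0.0),
    operational_friction_rising=True,
    capability_constrained=False,
)
print(trigger_arrived(workload))
```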
The Refactoring Strategies That Work
Three strategies handle the refactoring depending on workload characteristics and team capacity.
The first strategy is incremental modernization within the existing architecture. The workload's monolithic structure stays. Specific components get replaced with managed services. The application server moves to ECS. The database moves to RDS. The cache moves to ElastiCache. The architecture looks similar but the operational characteristics improve.
This strategy fits workloads where the underlying architecture works but the operational cost is too high. The refactoring is targeted at the operational pain points. The investment is moderate. The benefits accumulate as more components get modernized.
The second strategy is decomposition into cloud-native services. The monolithic workload gets split into services that can scale independently. Each service uses cloud-native patterns (managed databases, serverless components, container orchestration). The architecture changes substantially.
This strategy fits workloads where the architecture itself is limiting the business. The investment is large. The benefits are correspondingly large.
The third strategy is replacement with a different solution. The lift-and-shift workload gets replaced by a different application entirely. The replacement may be a SaaS product that handles the same business function or a custom application built on a modern architecture. The original workload gets decommissioned.
This strategy fits workloads where the original code is more liability than asset. The investment depends on the replacement choice. The benefits include shedding the maintenance burden of the original code.
The right strategy depends on the workload. Most large organizations have a portfolio of lift-and-shift workloads that fit different strategies. The portfolio-level decision is about which workloads get which strategy and in what order.
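The fit criteria for the three strategies can be summarized as a simple decision function. The sketch below is an assumption-laden illustration: the names are hypothetical, and the order in which the conditions are checked (replacement first, then decomposition, then incremental modernization) is one plausible precedence, not something this article prescribes. The fallback of leaving a workload as it is reflects the point made in the FAQ that some lift-and-shift workloads are appropriate to maintain.

```python
from enum import Enum


class Strategy(Enum):
    MAINTAIN = "maintain as lift-and-shift"
    INCREMENTAL = "incremental modernization"
    DECOMPOSE = "decomposition into cloud-native services"
    REPLACE = "replacement with a different solution"


def choose_strategy(*,
                    code_is_liability: bool,
                    architecture_limits_business: bool,
                    operational_cost_too_high: bool) -> Strategy:
    """Map workload characteristics to a refactoring strategy.

    The precedence below is an assumption made for illustration.
    """
    if code_is_liability:
        # Original code is more liability than asset.
        return Strategy.REPLACE
    if architecture_limits_business:
        # The architecture itself is limiting the business.
        return Strategy.DECOMPOSE
    if operational_cost_too_high:
        # Architecture works, but it costs too much to operate.
        return Strategy.INCREMENTAL
    return Strategy.MAINTAIN


# Hypothetical workload: sound architecture, expensive to run.
choice = choose_strategy(code_is_liability=False,
                         architecture_limits_business=False,
                         operational_cost_too_high=True)
print(choice.value)
```

In practice each of the three flags is a judgment call that deserves its own evidence, so a function like this is better used to record a portfolio decision than to make it.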
The Sequence That Works
Refactoring all lift-and-shift workloads simultaneously rarely works. The engineering capacity is limited. The organizational disruption is large. The sequence matters.
The pattern that works for large portfolios is to prioritize by the refactoring trigger signals. Workloads with the strongest signals refactor first. The work demonstrates the benefits and builds organizational capability. Subsequent waves refactor lower-priority workloads with the patterns established.
Within each wave, the refactoring approach varies by workload. Incremental modernization for workloads that fit. Decomposition for workloads that need it. Replacement for workloads that should not survive.
The sequence usually runs over two to four years for a substantial portfolio. The pace is bounded by engineering capacity and organizational change tolerance, not by the technical work itself.
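The prioritization described in this section can be sketched as a sort followed by a split into waves. The example is a minimal illustration: the `Workload` fields, the workload names, and the `wave_size` parameter (standing in for engineering capacity per wave) are all hypothetical. Ties on signal count are broken in favor of workloads that block business capability, which is an assumption consistent with the sequence given in the FAQ.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Workload:
    name: str
    signals_present: int              # 0 to 3 trigger signals
    blocks_business_capability: bool


def plan_waves(workloads: List[Workload], wave_size: int) -> List[List[Workload]]:
    """Order workloads by trigger strength and split them into waves."""
    if wave_size < 1:
        raise ValueError("wave_size must be at least 1")
    ordered = sorted(
        workloads,
        # Strongest signals first; capability blockers win ties.
        key=lambda w: (-w.signals_present, not w.blocks_business_capability),
    )
    return [ordered[i:i + wave_size] for i in range(0, len(ordered), wave_size)]


# Hypothetical portfolio with capacity for two workloads per wave.
portfolio = [
    Workload("reports", 1, False),
    Workload("tracking", 2, False),
    Workload("billing", 3, False),
    Workload("routing", 2, True),
]
waves = plan_waves(portfolio, wave_size=2)
for number, wave in enumerate(waves, start=1):
    print(number, [w.name for w in wave])
```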
What Goes Wrong With Refactoring Programs
Three patterns of refactoring failure are common.
The first pattern is scope creep. The refactoring starts as incremental modernization and grows into decomposition that the team did not plan for. The work takes longer than estimated. Business stakeholders lose patience. The program gets paused with workloads in intermediate states.
The second pattern is the parallel-system trap. The refactoring builds the new architecture alongside the old. Cutover is deferred indefinitely. The team operates both systems for years. Instead of moving from one cost base to the other, the organization pays for both.
The third pattern is under-investment in operational maturity for the new architecture. The new cloud-native workloads ship without the operational practices they need. Incidents emerge. Trust in cloud-native erodes. Future refactoring proposals face resistance.
These patterns are preventable through deliberate program design. Scope discipline. Explicit cutover commitments. Operational investment alongside technical refactoring.
What This Costs
Refactoring investment varies substantially by workload. Incremental modernization typically costs 20-40 percent of the original migration effort. Decomposition typically costs 100-200 percent of the original migration effort. Replacement costs depend on the replacement choice.
The return varies similarly. Incremental modernization typically reduces operational cost 20-40 percent. Decomposition typically reduces operational cost 40-60 percent and unlocks capability. Replacement returns depend on the new solution.
For most portfolios, the cumulative return justifies the investment over three to five years. The math depends on workload-specific characteristics. Honest analysis at the portfolio level is what produces good prioritization.
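A simple payback calculation shows how the ranges in this section combine. This is a rough sketch under stated assumptions: the dollar figures are hypothetical, the percentages are picked from inside the ranges quoted above, and the calculation ignores discounting and the value of unlocked capability, so it understates the case for decomposition.

```python
def payback_years(investment: float,
                  annual_operational_cost: float,
                  cost_reduction: float) -> float:
    """Years until cumulative savings equal the refactoring investment."""
    annual_saving = annual_operational_cost * cost_reduction
    if annual_saving <= 0:
        raise ValueError("the refactoring must produce a positive annual saving")
    return investment / annual_saving


# Hypothetical workload: the original migration cost $500,000 and the
# workload costs $400,000 per year to operate.
migration_effort = 500_000
annual_cost = 400_000

# Incremental modernization: 30 percent of the migration effort,
# 30 percent lower operational cost.
incremental = payback_years(0.30 * migration_effort, annual_cost, 0.30)

# Decomposition: 150 percent of the migration effort,
# 50 percent lower operational cost.
decomposition = payback_years(1.50 * migration_effort, annual_cost, 0.50)

print(f"Incremental payback: {incremental:.2f} years")
print(f"Decomposition payback: {decomposition:.2f} years")
```

Running the same calculation per workload, with real figures, is one way to ground the portfolio-level analysis in numbers rather than intuition.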
What Logiciel Does Here
Logiciel works with engineering and IT leadership running cloud refactoring programs or evaluating lift-and-shift workloads against the refactoring trigger. The work is typically structured around portfolio assessment, strategy selection per workload, and sequenced execution.
The Cloud Infrastructure Modernization framework covers the broader modernization approach that refactoring fits within. The Cloud Migration Patterns That Deliver ROI framework covers the Six R framework that informs refactoring decisions.
A 30-minute working session is enough to assess your workload portfolio against the refactoring trigger.
Frequently Asked Questions
How do I know if my workload has hit the refactoring trigger?
Check the three signals: cost growth, operational friction, and capability constraints. If two or three are present, the trigger has arrived. The signals are easier to assess against historical data than against intuition.
Should I refactor all lift-and-shift workloads?
No. Some lift-and-shift workloads are appropriate to maintain. Examples include workloads scheduled for retirement, workloads that meet business needs without showing trigger signals, and workloads where the cost-benefit of refactoring is unfavorable. The portfolio decision is selective.
What is the right team for refactoring?
A mix of original-architecture knowledge and cloud-native expertise. Pure cloud-native teams without legacy knowledge often miss workload-specific constraints. Pure legacy teams without cloud-native expertise often miss modernization opportunities. The combination works.
How does AI workload integration affect refactoring decisions?
AI integration often becomes the refactoring trigger. Workloads that the business wants AI-enhanced typically cannot integrate AI without architectural changes. The refactoring serves two purposes (modernization plus AI integration) and the combined business case is stronger than either alone.
What is the right sequence for a large portfolio?
High-trigger-signal workloads first. Workloads that block business capability second. Lower-priority workloads in subsequent waves. The sequence builds capability and demonstrates value, which helps fund the longer program.
Sources:
- McKinsey, "Cloud's trillion-dollar prize is up for grabs," 2024
- Flexera, "2024 State of the Cloud Report"