Monolith, Microservices, or Something In Between
I have built layered monoliths, modular monoliths, service-oriented systems and event-driven architectures over the course of twenty-five years. The question I get asked most often is which one is the right choice. The honest answer is that the question itself is usually the problem. You do not start with the style. You start with the constraints.
There is a version of this conversation that plays out at conferences, in blog posts, and in architecture review meetings where someone draws a diagram full of boxes and arrows and declares that the system needs microservices. Sometimes they are right. More often, they have skipped the part where you work out whether the problem actually requires that level of distribution. I have been on both sides of that conversation, and the mistakes I have made have almost always come from choosing a style too early, before the shape of the problem was properly understood.
This post is not a defence of any single approach. It is an attempt to lay out how I think about the decision, based on building real systems that serve real users at scale.
The monolith is not the enemy
Somewhere along the way, "monolith" became a dirty word. That is unfortunate, because a well-structured monolith is a genuinely good architecture for a lot of problems.
When I say well-structured, I mean something specific. Clean separation of concerns between presentation, business logic and data access. Dependency injection and inversion of control throughout. Repository patterns for data access. Clear layering that a new developer can understand within their first week. A single deployable unit with a single deployment pipeline and a single set of logs to read when something goes wrong.
For a focused product with a small team, this approach has enormous advantages. You can run the entire system on a developer's machine. Debugging is straightforward because everything happens in one process. Deployment is one pipeline, one artefact, one thing to monitor. There is no network boundary between components, which means no latency surprises, no serialisation overhead, no distributed transaction headaches.
The argument against monoliths usually centres on scaling and team independence. Those are valid concerns. But they are concerns that apply at a certain scale and team size, not universally. A team of three or four developers building a focused product does not need the operational overhead of a distributed system. They need clean code, good tests, a fast deployment pipeline, and the ability to ship changes confidently.
I have built products as monoliths that have served reliably for years, handling significant load, because the architecture was clean and the deployment was simple. The style suited the problem. That is all that matters.
When distribution earns its cost
The reasons to move beyond a monolith are real, but they are specific. In my experience, the legitimate triggers fall into a small number of categories.
The first is avoiding single points of failure. When a system serves public-facing infrastructure and availability matters enormously, having a single process that handles everything becomes a liability. If one component fails, it takes everything with it. Separating critical paths from non-critical ones, with proper isolation between them, means a failure in background processing does not bring down the API that users depend on.
The second is scalability. When different parts of the system have fundamentally different load profiles, scaling them independently makes sense. A read-heavy journey planning API and a write-heavy data ingestion pipeline do not need to scale together. Running them as separate services means you can allocate resources where they are actually needed rather than scaling everything to accommodate the most demanding component.
The third is resilience through decoupling. Message brokers like RabbitMQ allow components to communicate asynchronously, which means a temporary failure in one part of the system does not cascade into others. The message sits in the queue until the downstream service recovers. This is genuinely useful when you are integrating with external data sources that may be unreliable or when processing steps are naturally asynchronous.
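The mechanics of that decoupling can be sketched without a real broker. The following is a minimal illustration only, with an in-memory queue standing in for RabbitMQ and a hypothetical `FlakyConsumer` playing the part of an unreliable downstream service; the point is that a failed delivery leaves the message in the queue rather than losing it.

```python
from collections import deque

class MessageQueue:
    """Minimal in-memory stand-in for a broker queue.

    A message the consumer fails to process is re-queued rather
    than lost, so a temporary downstream failure does not cascade
    back to the publisher.
    """
    def __init__(self):
        self._messages = deque()

    def publish(self, message):
        self._messages.append(message)

    def drain(self, handler):
        """Deliver each message; re-queue on failure and stop."""
        while self._messages:
            message = self._messages.popleft()
            try:
                handler(message)
            except RuntimeError:
                self._messages.appendleft(message)  # keep it for retry
                break

class FlakyConsumer:
    """Hypothetical downstream service that fails its first call."""
    def __init__(self):
        self.calls = 0
        self.received = []

    def handle(self, message):
        self.calls += 1
        if self.calls == 1:
            raise RuntimeError("downstream temporarily unavailable")
        self.received.append(message)

queue = MessageQueue()
consumer = FlakyConsumer()
queue.publish("stop-update-1")
queue.publish("stop-update-2")

queue.drain(consumer.handle)   # first attempt fails, message re-queued
queue.drain(consumer.handle)   # downstream has "recovered"
print(consumer.received)       # ['stop-update-1', 'stop-update-2']
```

A real broker adds acknowledgements, persistence and dead-letter handling on top, but the shape of the guarantee is the same: the publisher's success does not depend on the consumer being up at that moment.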
CQRS follows a similar logic. When the read model and the write model have fundamentally different performance characteristics, separating them allows each to be optimised independently. In journey planning, the queries that users make and the data ingestion that feeds the system are very different workloads. Treating them as a single concern forces compromises on both sides.
API gateway patterns, implemented through something like Azure API Management, add another layer of useful separation. External consumers get a stable interface. Internal services can evolve independently. Rate limiting, security policies and monitoring happen at the boundary rather than being scattered through the codebase.
All of this is genuinely valuable. But every single one of these patterns comes with operational cost, and that cost is easy to underestimate.
The cost nobody talks about
Here is the thing about distributed systems that does not make it into the conference talks. They are harder to run. Not slightly harder. Meaningfully harder, in ways that compound over time.
Start with the developer experience. With a monolith, a developer clones the repository, runs the application, and they are working. With a distributed system, they need multiple services running simultaneously. Docker and containerisation help, but they do not eliminate the complexity. You still need to manage service dependencies, shared configuration, inter-service communication, and the local equivalent of whatever message broker or API gateway you use in production. The gap between the developer's laptop and the production environment grows, and that gap is where bugs hide.
Then there is debugging. In a monolith, a stack trace tells you roughly what happened and where. In a distributed system, a request might touch three services, two queues and a gateway before it produces a result. Tracing that path requires distributed tracing infrastructure, correlated logging, and the discipline to actually instrument everything properly. Most teams underestimate how much work this is.
Deployment pipelines multiply. Instead of one pipeline, you now have several, each with its own build, test and release process. Coordinating releases across services requires either very careful API versioning or acceptance that things will occasionally break at the integration boundary.
Monitoring and alerting become more complex. You need to know not just whether each service is healthy, but whether the interactions between services are healthy. A service can be returning 200 responses while silently producing wrong results because a downstream dependency changed its behaviour.
None of these problems are unsolvable. But they require investment, tooling, discipline, and a team that is large enough and experienced enough to manage the overhead. For a small team, that overhead can easily consume more engineering time than the problem it was meant to solve.
The Strangler Fig: modernising without the gamble
One of the patterns I come back to most often is the Strangler Fig. The idea is simple. Rather than rewriting a legacy system from scratch, you wrap it. New functionality goes through new code. Old functionality is progressively migrated. Over time, the legacy component shrinks until it can be retired.
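In code, the wrapping step usually amounts to a routing facade in front of the legacy system. Here is a deliberately simplified sketch; `LegacySystem`, `NewSystem` and the route names are all hypothetical, and a real facade would sit at an HTTP gateway rather than in-process.

```python
class LegacySystem:
    """Stand-in for the old system, still serving unmigrated routes."""
    def handle(self, route, request):
        return f"legacy:{route}"

class NewSystem:
    """Stand-in for the new code being built alongside it."""
    def handle(self, route, request):
        return f"new:{route}"

class StranglerFacade:
    """Routes migrated paths to new code, everything else to legacy.

    Migrating a piece of functionality is just adding its route to
    `migrated`; the legacy system keeps serving whatever has not
    moved yet, and shrinks until it can be retired.
    """
    def __init__(self, legacy, modern):
        self.legacy = legacy
        self.modern = modern
        self.migrated = set()

    def migrate(self, route):
        self.migrated.add(route)

    def handle(self, route, request=None):
        target = self.modern if route in self.migrated else self.legacy
        return target.handle(route, request)

facade = StranglerFacade(LegacySystem(), NewSystem())
print(facade.handle("/journeys"))   # legacy:/journeys
facade.migrate("/journeys")
print(facade.handle("/journeys"))   # new:/journeys
```

The facade is the only component that knows the migration is happening, which is precisely what keeps the risk contained.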
I have used this pattern repeatedly, and it has served me well for a few reasons.
First, it manages risk. A full rewrite is a bet. You are wagering that the new system will handle everything the old system handled, including the edge cases nobody documented and the behaviours that only exist because of bugs that users now depend on. Rewrites take longer than expected, cost more than budgeted, and frequently fail to reproduce the subtle behaviours of the system they are replacing. The Strangler Fig avoids this by keeping the old system running while you build the new one alongside it.
Second, it delivers value throughout the migration. Every phase produces something usable. You are not asking the business to wait eighteen months for a big-bang switch. You are shipping improvements continuously while the migration happens in the background.
Third, it pairs well with resilience patterns. Circuit breakers between the new components and the legacy system mean that failures in the old code do not bring down the new code, and vice versa. You can set timeouts, fallback behaviours, and monitoring at the boundary so that the migration itself does not introduce instability.
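A circuit breaker at that boundary can be sketched in a few lines. This is a simplified count-based version for illustration; a production breaker would also half-open after a timeout to probe for recovery, which is omitted here, and `broken_legacy_call` is a hypothetical stand-in for a call into the old system.

```python
class CircuitBreaker:
    """Count-based circuit breaker for calls into a legacy component.

    After `threshold` consecutive failures the circuit opens and the
    fallback is returned immediately, so a failing legacy system
    cannot drag the new code down with it.
    """
    def __init__(self, threshold=3, fallback=None):
        self.threshold = threshold
        self.fallback = fallback
        self.failures = 0

    @property
    def is_open(self):
        return self.failures >= self.threshold

    def call(self, fn, *args):
        if self.is_open:
            return self.fallback  # skip the legacy call entirely
        try:
            result = fn(*args)
        except RuntimeError:
            self.failures += 1
            return self.fallback
        self.failures = 0  # any success resets the count
        return result

def broken_legacy_call():
    raise RuntimeError("legacy system down")

breaker = CircuitBreaker(threshold=2, fallback="cached-response")
breaker.call(broken_legacy_call)         # failure 1, fallback returned
breaker.call(broken_legacy_call)         # failure 2, circuit opens
print(breaker.is_open)                   # True
print(breaker.call(broken_legacy_call))  # cached-response, legacy not called
```

The fallback here is a stale cached response, which for many read paths is far better than an error page while the old system recovers.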
The Strangler Fig is not glamorous. It does not look as clean on a whiteboard as a fresh greenfield architecture. But it works, and in my experience it works more reliably than the alternative.
Event-driven does not mean microservices
This is a distinction that gets lost surprisingly often. Event-driven architecture and microservices are separate decisions. You can have one without the other.
You can introduce a message broker into a monolith. A well-structured monolith might publish events to a RabbitMQ queue for background processing while remaining a single deployable unit. The event handling happens within the same codebase, or in a small number of worker processes that share the same domain model. You get the benefits of asynchronous decoupling without the operational overhead of a fully distributed system.
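To make that concrete, here is a minimal sketch with an in-process bus standing in for RabbitMQ; the event name and handler are hypothetical. The publishing code and the background concern live in the same codebase and deploy as one unit, but neither knows about the other directly.

```python
from collections import defaultdict

class EventBus:
    """In-process publish/subscribe. The monolith remains a single
    deployable unit; the decoupling happens inside the process."""
    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        for handler in self._handlers[event_type]:
            handler(payload)

bus = EventBus()
rebuilt_caches = []

# A background concern subscribes; the publisher never knows about it.
bus.subscribe("timetable.updated",
              lambda event: rebuilt_caches.append(event["route"]))

# The ingestion code publishes the event and moves on.
bus.publish("timetable.updated", {"route": "42"})
print(rebuilt_caches)  # ['42']
```

Swapping the in-process bus for a real broker later changes the transport, not the shape of the code, which is exactly why the decoupling decision can be made before the distribution decision.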
Similarly, CQRS can exist at the module level. Separating your read model from your write model does not require separating them into different services. It requires separating the concerns cleanly in the code. The deployment boundary is a separate question.
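A module-level version of that separation might look like the sketch below. The class names and the journey-planning flavour are illustrative, not a prescribed design; the point is that the write side and the denormalised read side are separate concerns in one process, each optimised for its own workload.

```python
class TimetableReadModel:
    """Denormalised lookup structure, optimised for the query path."""
    def __init__(self):
        self._departures = {}

    def project(self, stop, time):
        """Apply a change projected from the write side."""
        self._departures.setdefault(stop, []).append(time)

    def next_departures(self, stop):
        return sorted(self._departures.get(stop, []))

class TimetableWriteModel:
    """Owns ingestion and validation, optimised for writes."""
    def __init__(self, read_model):
        self._read_model = read_model

    def ingest_departure(self, stop, time):
        if not stop:
            raise ValueError("stop is required")
        # After a successful write, project the change into the
        # read model. In a larger system this projection could be
        # asynchronous; here it is a simple method call.
        self._read_model.project(stop, time)

reads = TimetableReadModel()
writes = TimetableWriteModel(reads)
writes.ingest_departure("Kings Cross", "10:30")
writes.ingest_departure("Kings Cross", "09:15")
print(reads.next_departures("Kings Cross"))  # ['09:15', '10:30']
```

Nothing here requires a network boundary. If the read side later needs to scale independently, the seam for extracting it already exists.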
The decision to decouple is about managing complexity in the domain. The decision to distribute is about managing complexity in operations. They should be made independently, based on different criteria. Conflating them leads to systems that are distributed for the sake of being distributed, which is one of the more expensive mistakes an architect can make.
How I actually choose
When I am making an architectural style decision, I work through a set of questions. They are not particularly original, but they have proven reliable over many years.
How large is the team? A small team building a single product almost always benefits from a simpler architecture. The coordination overhead of distributed services only makes sense when you have enough people to own them independently.
How often do different parts of the system need to change independently? If the answer is rarely, a monolith with clean module boundaries is fine. If different components genuinely need to evolve and deploy on different schedules, separation becomes more valuable.
What are the scaling characteristics? Does the whole system need to scale together, or do specific components have materially different load profiles? Uniform scaling points towards a simpler architecture. Divergent scaling profiles point towards separation.
What is the team's operational maturity? Running a distributed system well requires solid CI/CD, monitoring, distributed tracing, and incident response processes. If those are not in place, adding distribution adds risk rather than reducing it.
What does the data consistency model look like? If strong consistency across the system is important, a monolith with a shared database is simpler and more reliable. If eventual consistency is acceptable for certain flows, asynchronous patterns become viable.
And finally, what is the cost? Not just the build cost, but the ongoing operational cost. Cloud infrastructure, monitoring tooling, developer time spent managing the platform rather than building features. An architecture that the team cannot afford to run properly is an architecture that will degrade over time.
These questions do not produce a formula. They produce a conversation, and that conversation is where the real architectural decision happens.
The modular monolith: the answer nobody finds exciting
If I had to pick the architectural style that I think is underrated, it would be the modular monolith. It is the approach that gets the least attention at conferences and in blog posts, probably because it sounds boring. But boring and effective is a combination I have learned to value.
The idea is straightforward. You build a monolith, but you enforce clear boundaries between modules. Each module owns its own domain logic, its own data access, and exposes a defined interface to the rest of the system. Communication between modules happens through those interfaces, not by reaching directly into each other's internals.
This gives you many of the benefits of microservices without the operational overhead. Modules can be developed somewhat independently. Domain boundaries are explicit. The codebase is organised around the business rather than around technical layers. And if a module genuinely needs to become a separate service later, the extraction is much cleaner because the boundaries already exist.
The discipline required is real. Without enforcement, module boundaries erode. Developers take shortcuts, reach across boundaries, and the clean separation dissolves into a traditional monolith with extra folders. Enforcing boundaries through packaging conventions, access controls in the code, and architectural fitness functions in the build pipeline takes deliberate effort.
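A fitness function for module boundaries can be surprisingly small. The sketch below assumes a hypothetical convention where each module exposes a public `api` package and keeps everything else under `internal`; the rule, module names and sample source are all invented for illustration. In a real pipeline this would walk the source tree and fail the build on any violation.

```python
import re

# Hypothetical rule: code may import another module's public `api`
# package, but never reach into its `internal` package.
FORBIDDEN = re.compile(r"from\s+(\w+)\.internal\b")

def boundary_violations(source, own_module):
    """Return (line number, line) pairs where one module's code
    imports another module's internals."""
    violations = []
    for number, line in enumerate(source.splitlines(), start=1):
        match = FORBIDDEN.search(line)
        if match and match.group(1) != own_module:
            violations.append((number, line.strip()))
    return violations

# Sample file belonging to the (hypothetical) `ordering` module.
sample = """\
from billing.api import create_invoice
from billing.internal.ledger import Ledger
from ordering.internal.db import session
"""
print(boundary_violations(sample, own_module="ordering"))
# Only the billing.internal import is flagged: ordering may use
# its own internals, and billing.api is a public interface.
```

Run as a test in the build pipeline, a check like this turns boundary erosion from a gradual, invisible process into an immediate build failure.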
But that effort is significantly less than the effort required to run a distributed system. And for many products, particularly those with small to medium teams, a modular monolith with the option to extract later is the pragmatic sweet spot.
What I have learned
After twenty-five years of building systems in different styles, a few things have become clear to me.
The best architecture is the simplest one that handles the actual requirements. Not the requirements you might have in two years. Not the scale you hope to reach. The requirements you have now, with the team you have now, and the budget you have now.
Distributed systems should be a conclusion, not a starting point. You should be able to articulate exactly why distribution is necessary for your specific problem, and the answer should not be "because that is what modern architecture looks like."
The decision to decouple and the decision to distribute are separate. You can have clean domain boundaries in a monolith. You can have event-driven patterns without microservices. Separating these decisions gives you more options and fewer regrets.
The Strangler Fig is almost always better than a rewrite. It is slower, less dramatic, and produces worse conference talks. It also works.
And the architectures that age best, in my experience, are the ones where the style was chosen for the problem rather than for the architect's preferences. That sounds obvious, but it is surprisingly easy to get wrong when the industry is telling you that one particular approach is the future.
The future, as it turns out, is whatever actually works for your team, your product, and your users. Everything else is just fashion.