Architecture Category for Udi Dahan's Blog

Udi Dahan – The Software Simplist
Enterprise Development Expert & SOA Specialist

Archive for the ‘Architecture’ Category

High Availability Presentation

Monday, June 21st, 2010

OK – this is the last one, I promise. Well, for now, anyway.

Earlier this month at TechEd North America I gave a fairly new presentation that was only delivered once before (at the Connected Systems User Group in London) and I’m happy to say is now online for your viewing pleasure.

High Availability – A Contrarian View

Comments? Thoughts? Let me know.

Posted in Architecture, Availability, Presentations, Reliability, The Team | 4 Comments »

Server Naming and Configuration Conflicts

Saturday, June 5th, 2010

In my work with clients the topic of how to handle the movement of software from one environment to another inevitably comes up. Sometimes this is in the context of NServiceBus but the problem is more generic. The faster that an organization is able to get software out the door, the more agile they can be.

Unfortunately, there is one tiny little mistake that I see almost everywhere that gets in the way, and that’s going to be the topic of this post.

The Problem

Let’s say you have a standard web app environment – some web servers, application servers, and a database server. Your web servers need to send messages to the application servers. So far, so good.

In your test environment, you have an application server called AS_01_Test, and your web servers are configured to send it messages. However, in your staging environment the application server fulfilling that same role is called AS_01_Stage. This creates a configuration problem – you need to change the config of your web servers as you move the web app from Test to Staging.

I’ve seen companies doing all sorts of creative things to get around this problem – some of them involve putting all configuration settings in a database so that they can be centrally managed and visualized. I’d like to suggest an alternative approach.

What if…

What if server names were the same across all environments?

Well, you wouldn’t need to change configuration as you moved the system between environments. That’s a good thing.

But how can that be? Wouldn’t there be a conflict if there were two machines with the same name?

The answer is that there wouldn’t be a conflict if the machines were on different networks. Not all machines have to be on the same network. We can set up as many networks / virtual networks as we like. And it is clear that we don’t need machines in one environment / network to talk to machines in another environment. I mean, under no circumstances would we want web servers in our test environment to talk to application servers in the production environment.

These separate networks provide much needed isolation, beyond solving the server naming problem.
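To make the idea concrete, here is a minimal sketch of it. The host name AS_01, the addresses, and the lookup table standing in for each network's name resolution are all made up for illustration; the point is only that the web server's configuration never changes between environments.

```python
# Hypothetical per-network name tables: each environment lives on its own
# (virtual) network, so the same host name can exist in every one of them.
NETWORK_NAMES = {
    "test": {"AS_01": "10.1.0.20"},
    "staging": {"AS_01": "10.2.0.20"},
    "production": {"AS_01": "10.3.0.20"},
}

# The web server's configuration is identical in every environment.
WEB_SERVER_CONFIG = {"app_server_host": "AS_01"}


def resolve_app_server(network: str, config: dict) -> str:
    """Resolve the configured host name using only the given network's names."""
    return NETWORK_NAMES[network][config["app_server_host"]]


for network in NETWORK_NAMES:
    # Same config, different machine, because the network differs.
    print(network, "->", resolve_app_server(network, WEB_SERVER_CONFIG))
```

The only thing that varies is which network the web server happens to be plugged into, and that is not something the application's configuration needs to know about.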

In closing

It’s really a tiny thing when you think about it – multiple networks. But that’s exactly why software developers overlook it so often – because it’s not a “software solution” to the configuration problem we perceive as a “software problem”.

I wrote about related multi-environment configuration issues in this earlier post: Convention over Configuration – The Next Generation

I’m happy to say that this functionality is now in NServiceBus called “profiles” and you can read more about how they work here.

How are you handling the flow of moving software through to production? Leave your comments below.

Posted in Architecture, Development, Management, Simplicity | 9 Comments »

CQRS isn’t the answer – it’s just one of the questions

Friday, May 7th, 2010

With the growing interest in Command/Query Responsibility Segregation (CQRS), more people are starting to ask questions about how to apply it to their applications. CQRS is actually in danger of reaching “best practice” status, at which point people will apply it indiscriminately with truly terrible results.

One of the things that I’ve been trying to do with my presentations around the world on CQRS was to explain the why behind it, just as much as the what. The problem with the format of these presentations is that they’re designed to communicate a fairly closed message: here’s the problem, here’s how that problem manifests itself, here’s a solution.

In this post, I’m going to try to go deeper.

The hitchhiker’s guide to the galaxy

In this most excellent book, one of the things that struck me was the theme that made its way through the whole book – starting with the answer to life, the universe, and everything: 42. By the time you get to the end of the book, you find out that the real question to life, the universe, and everything is “what do you get when you multiply 6 by 9”. And that’s how the book leaves it.

As engineers, we can’t just accept the fact that the book would say that 6*9 = 42 when we know it’s 54. After bashing our heads on the rigid rules of math, we realize that not all math problems are necessarily in base 10, and that if we switch to base 13, the number 42 is 4*13 + 2 = 54. So, the book was right – but that’s not the point.

What’s the point?

The hitchhiker’s guide is an example of a teaching technique which presents an apparent paradox, leaving the student to dig up unspoken and unthought assumptions in order to resolve it. Key to this technique are rigid rules which do not allow any compromise or shortcuts on the student’s part.

The purpose of this technique is not for the student to learn the answer, but to gain deeper understanding, which in turn changes the way they go about thinking about problems in the future.

So, when given the problem 4*5, we do not just immediately answer 20, instead we clarify in which numeric base the question is being phrased, and only then go to solve the problem. In base 13, the answer would be 17. In hex, the answer would be 14.
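If you want to check the arithmetic yourself, a small sketch like the following will do. The helper name to_base is mine, not anything from the book; it just writes a decimal number out in another base.

```python
def to_base(n: int, base: int) -> str:
    """Render a non-negative integer in the given base (2 to 16)."""
    digits = "0123456789ABCDEF"
    if not 2 <= base <= len(digits):
        raise ValueError("base must be between 2 and 16")
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return "0"
    out = []
    while n > 0:
        # Peel off the least significant digit each time around.
        n, remainder = divmod(n, base)
        out.append(digits[remainder])
    return "".join(reversed(out))


print(to_base(6 * 9, 13))  # 42
print(to_base(4 * 5, 13))  # 17
print(to_base(4 * 5, 16))  # 14
print(int("42", 13))       # 54
```

The multiplication itself never changes; only the notation used to write down the result does.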

The externally visible change is that we know which questions to ask in order to arrive at the right answer – not that we know the answer ahead of time.

Making an “ass” out of “u” and “me”

Let’s start at the end – one of the unspoken assumptions that has been causing problems:

All businesses can be treated the same from the perspective of software.

In our previous example, we assumed that all math problems use base 10. It turns out that different bases are useful for different domains (like base 2 for computers). We can say similar things about degrees and radians in geometry. The more we look at the real world, the more we see this repeating itself. There’s no reason that software should be any different.

Base 10 is not a ubiquitous best practice. We shouldn’t be surprised that there really aren’t best practices for software either.

Here’s another problematic assumption:

“The business” can (and does) tell us what it needs in a way we can understand.

So many software fads have been built on the quicksand of this assumption. OOAD – on verbs and nouns. 4GL and other visual tools that “the business” will use directly. SOA – on IT business alignment. I expect we haven’t seen the end of this.

Some of you may be wondering why this is false, others are sagely nodding their heads in agreement.

The myth of “the business”

Unless you have a single user, who is also the CEO paying for the development, there is no “the”. It’s an amalgam of people with different backgrounds, skills, and goals – there is no homogeneity. Even if no software was involved, many business organizations are dysfunctional with conflicting goals, policies, and politics.

To some extent, we technical people have hidden ourselves away in IT to avoid the scary world of business whose rules we don’t understand. With the rise in importance of information to the world, we’ve been pulled back – being forced to talk to people, and not just computers. Luckily, we’ve been able to create a buffer to insulate ourselves – we’ve taken the less successful technical people from our herd and nominated them “business analysts”. No, not all companies do it this way, but we do need to take a minute to reflect on how information flows between the business Mars into and out of the IT Venus.

On human communication

Even if we made this insulation layer more permeable, allowing and encouraging more technical people and business people to cross its boundary, we still need to deal with the problem of two humans communicating with each other. There are enough books that have been written on this topic, so I won’t go into that beyond recommending (strongly) to technical people to read (some of) them.

Rather, I’d like to focus on the environment in which these discussions take place. IT has been around long enough, and users have used computers long enough, that a certain amount of tainting has taken place. If the world was a trial, the evidence would have been thrown out as untrustworthy.

When users tell you what they want, they’re usually framing that with respect to the current system that they’re using. “Like the old system – but faster, and with better search, and more information on that screen, and…”

At this point, business analysts write down and formalize these “requirements” into some IT-sanctioned structure (use cases, user stories, whatever), at which point developers are told to build it. Users only know what they didn’t want when developers deliver exactly what was asked.

How can that be?

These are not the “requirements” you are looking for

Users ultimately dictate solutions to us, as a delta from the previous set of solutions we’ve delivered them. That’s just human psychology – writer’s block when looking at a blank page, as compared to the ease with which we provide “constructive criticism” on somebody else’s work.

We need to get the real requirements. We need to probe beyond the veneer:

Why do you need this additional screen?
What real-world trigger will cause you to open it?
Is there more than one trigger?
How are they different?
etc, etc, etc…

This is real work – different work than programming. It requires different skills. And that’s not even getting into the political navigation between competing organizational forces.

But let’s say that you don’t have (enough) people with these skills in your organization. What then?

Enter CQRS

CQRS gives us a set of questions to ask, and some rigid rules that our answers must conform to. If our answers don’t fit, we need to go back to the drawing board and move things around and/or go back to “the business” and seek deeper understanding there.

For each screen/task/piece of data:

Will multiple users be collaborating on data related to this task?
Look at every shred of raw data, not just at the entity level.
Are there business consistency requirements around groups of raw data?

If “the business” answers no – ask them if they see that answer changing, and if so, in what time frame, and why. What changing conditions in the business environment would cause that to change? What other parts of the system would need to be re-examined under those conditions?

If, after understanding all that, you find a true single-user-only-thing, then you can use standard “CRUD” techniques and technologies. There are no inherent time-propagation problems in a single-user environment – so eventual consistency is beyond pointless, it actually makes matters worse.

On the other hand, if the business-data-space is collaborative, the inherent time-propagation of information between actors means they will be making decisions on data that isn’t up-to-the-millisecond-accurate anyway. This is physics, gravity – you can’t fight it (and win).

The rule for collaboration

Actors must be able to submit one-way commands that will fail only under exceptional business circumstances.

The challenge we have is how to achieve the real business objectives uncovered in our previous “requirements excavation” activities and follow this rule at the same time. This will likely involve a different user-system interaction than those implemented in the past. UI design is part of the solution domain – it shouldn’t be dictated by the business (otherwise it’s like someone asking you to run a marathon, but also dictating how you do so, like by tying your shoelaces together).

Many of the technical patterns I described in my previous blog post describe the tools involved. BTW, hackers can be considered “exceptional actors” – the business actually wants their commands to fail.
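Here is a rough sketch of what the rule can look like in code. Everything in it is hypothetical: the seat-reservation domain, the ReserveSeat command, and the decision to put a customer on a waitlist when a seat is already taken are my assumptions for the example, not part of CQRS itself. What it tries to show is that sending returns nothing, that a collision between two collaborating users is treated as a normal business outcome, and that only a command no well-behaved UI could have produced actually fails.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class ReserveSeat:
    """Hypothetical command: states the actor's intent, expects no reply."""
    customer_id: str
    seat: str


class SeatReservations:
    def __init__(self, seats):
        self._seats = set(seats)
        self._queue = deque()
        self.reserved = {}   # seat -> customer_id
        self.waitlist = []   # commands that lost the race for a seat

    def send(self, command: ReserveSeat) -> None:
        """One-way: the command is queued and nothing is returned."""
        self._queue.append(command)

    def process_pending(self) -> None:
        while self._queue:
            self._handle(self._queue.popleft())

    def _handle(self, command: ReserveSeat) -> None:
        if command.seat not in self._seats:
            # Exceptional: the UI only offers real seats, so this command
            # was tampered with. The business wants it to fail.
            raise ValueError(f"unknown seat: {command.seat}")
        if command.seat in self.reserved:
            # Someone else got there first. Assumed business rule: this is
            # an expected outcome of collaboration, not a failure.
            self.waitlist.append(command)
        else:
            self.reserved[command.seat] = command.customer_id


reservations = SeatReservations(["1A", "1B"])
reservations.send(ReserveSeat("alice", "1A"))
reservations.send(ReserveSeat("bob", "1A"))
reservations.process_pending()
print(reservations.reserved)       # {'1A': 'alice'}
print(len(reservations.waitlist))  # 1
```

Whether a waitlist is the right answer is exactly the kind of thing you would have to dig out of the business; the code only has room for it once the question has been asked.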

In Summary

The hard and fast rule of CQRS about one-way commands is relevant for collaborative domains only. This domain has inherent eventual consistency – in the real world. Taking that and baking it into our solution domain is how we align with the business.

The process we go through, until ultimately arriving at one-way-almost-always-successful-commands is business analysis. Rejecting pre-formulated solutions, truly understanding the business drivers, and then representing those as directly as possible in our solution domain – that’s our job.

After doing this enough times and/or in more than one business domain, we may gain the insight that there is no cookie-cutter, one-size-fits-all, best-practice solution architecture for everything. Each problem domain is distinct and different – and we need to understand the details, because they should shape the resulting software structure.

The next time the business tells us to implement 42, we’ll use CQRS along with other questioning techniques until we can get “6 x 9” out of them, learning from the exercise what are the significant and stable parts of the business – ultimately helping us to “build the right system, and to build the system right”.

Don’t Panic 🙂

Posted in Architecture, CQRS, Development, Management, Simplicity | 24 Comments »

On Design for Testability

Sunday, April 18th, 2010

Almost at every conference, event, training, or consulting engagement someone asks for my opinion on the whole design for testability thing. I’m not quite sure why I haven’t blogged on this topic, especially at the time that a lot of the other bloggers were weighing in, but better late than never.

Before getting into that, I want to start with a slightly broader scope of discussion.

You see, I get asked about “best practices” on all sorts of things. And I try not to be the kind of consultant that responds with “it depends”, but the context of the question often makes the answer irrelevant. And the unspoken context of a best-practice question is:

Given infinite time and budget

The biggest problem that I see with well-intentioned, best-practices-following developers and architects is that they don’t ask the question “is this the right thing for us to be focusing on right now?” Understandably, that is a difficult question to answer – but it needs to be asked, since you don’t have infinite time or budget to do everything according to best practices (assuming those even exist).

About testing

The biggest issue I have with the “design for testability” topic is the extremely narrow view it takes of the word “testability”, usually in the form of more code written by a developer which invokes the production code of the system, also known as “unit tests”.

There are many different kinds of testing – unit, integration, functional, load, performance, exploratory, etc… where some may be automated and others not. Should we not discuss what “design for testability” means for not-just-unit-testing?

And what’s the point of testing anyway?

It’s not to find bugs.

Research has shown that testing (of all kinds) is not the most effective way of finding bugs. I don’t have the reference handy but I’m pretty sure that it’s from Alistair Cockburn’s work. Code reviews are (on average) about 60% more effective.

Don’t get me wrong – testing can provide indications that the software has bugs in it, but not necessarily where in the code those bugs are.

The purpose of testing is to provide quantitative and qualitative information about the system that can help various stakeholders in their decision-making processes. The relevance of that information indicates the quality of the testing. Here are some examples:

The system supports 100 concurrent users, with the expected user-type distribution (X% role A, Y% role B, etc), performing expected use-case distributions, and collaboration scenarios.
Time to proficiency for new users in role A is expected to be 3 days
Alternate #2 of use case #12 fails on step #3

As you can see, the relevance of the above information is dependent on what decisions the various stakeholders need to make. The bullet on load can help us decide if more machines are needed or if developers need to tune the performance of the systems. The bullet on time to proficiency can help us decide if larger investment in usability is required. Information like the last bullet can be used in conjunction with the first two to decide on the timing and type of a release.

The timeliness of this relevant information is critical to the success of a project.

Choosing which and how much of the various testing activities to perform when is something that needs to be revisited several times throughout the lifetime of a project, taking into account the current risks (threats and probabilities) and time and resource investment to mitigate them.

Let me reiterate – we’re not going to have enough time to do everything.

On iterations

If the only part of your organization that is doing iterations is your developers, you’re not agile.

In order to capitalize on the information that testers are providing, you need them in your iterations.

The same goes for the other roles involved in the project – business analysts, DBAs, sysadmins, etc.

I know that 99% of organizations aren’t structured in a way to do this.

I never said doing this would be easy.

On design

Figuring out what kind of design and how much to do when is just as important, and just as hard. Design for testability is one part of that, but not the only one, or necessarily the most important one at any point in time.

Within that design for testability topic is the “design for unit-testing” sub-topic which seems to be the popular one. Before getting into the design aspects of it, let’s take a closer look at the unit-testing side of things.

On unit-testing

The assumption is that having more unit tests will lead to a code-base with fewer bugs, thus requiring shorter time to get the system into production, which will pay back the time it took to write those unit tests to begin with.

In practice, what tends to happen is that as development progresses, testing code breaks as the structure of the production code changes. Now one of two things happens – either the testing code is removed or rewritten. In either case, we didn’t get the return on investment we expected on the first bit of testing code. Unfortunately, rare is the case where the relevant people in the organization understand why, resulting in the same situation repeating itself over and over again.

Those projects would have been better off without unit testing, though the organization as a whole might have used those experiences to learn and improve. It’s been my experience that if the organization wasn’t conscious enough in the context of the project to notice the situation, it is unlikely to do so at higher levels.

On fragile unit tests

The reason that a unit test ends up being rewritten (or removed) is that its code was coupled to the production code in such a way that it broke when the production code changed. This tendency to break (fragility) is a critical property of a unit test. A fragile unit test will slow down a developer doing work on some existing code – it actually makes the system less maintainable.

For unit test code to be stable (not fragile) it needs to be coupled to stable properties of the production code. The question of whether the production code is designed in such a way that it has stable properties is a design question. Is it a unit? If not, you will not be able to write a unit-test against it.

And anyway, who said that every class is a unit, or should be a unit? Domain models (when done right) are good examples of a unit, yet the classes that make them up may not be units. Unit-testing should only be attempted with things which are units.
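A small illustration of the difference, using a made-up Order class (the class, its fields, and the prices are assumptions for the example only). The first test is coupled to how the class happens to store its lines; the second is coupled to what the unit promises to do.

```python
import unittest


class Order:
    """Hypothetical production code: an order made up of priced lines."""

    def __init__(self):
        self._lines = []  # internal structure, free to change

    def add_line(self, sku: str, unit_price: int, quantity: int) -> None:
        self._lines.append((sku, unit_price, quantity))

    def total(self) -> int:
        return sum(price * qty for _, price, qty in self._lines)


class FragileTest(unittest.TestCase):
    def test_add_line_appends_tuple(self):
        order = Order()
        order.add_line("widget", 500, 2)
        # Coupled to the private list-of-tuples layout: this breaks if lines
        # become objects or a dict, even though behaviour is unchanged.
        self.assertEqual(order._lines, [("widget", 500, 2)])


class StableTest(unittest.TestCase):
    def test_total_reflects_added_lines(self):
        order = Order()
        order.add_line("widget", 500, 2)
        order.add_line("gadget", 250, 1)
        # Coupled only to the externally visible behaviour of the unit.
        self.assertEqual(order.total(), 1250)
```

Both tests pass today (run them with python -m unittest). The difference only shows up on the day somebody restructures the inside of Order: the first test has to be rewritten or deleted, the second keeps doing its job.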

I think too much weight is put on whether a dependency of a class is a concrete or interface type, and not nearly enough on the nature of the dependency. I wouldn’t blame the hammer for pounding my thumb, and by the same token I think that blame should not be directed towards tools like those from TypeMock.

On tools

There is so much more depth to both design and testability that needs to be more broadly understood. No tool has yet been created to handle either design or testing in such a way that humans can give up responsibility for the outcome.

Over the years I’ve noticed that tools are most significant when used by skilled practitioners, which makes sense in retrospect. Giving a novice carpenter a laser-guided saw probably won’t significantly change the outcome of their work. Ultimately, the skilled practitioners are the ones that create tools – not the novices. And no tool, no matter how advanced, will make a novice perform at levels like the skilled practitioner.

In the case of a project too big for a single skilled practitioner to complete in the time required (or at all), the balance of importance shifts away from tools to the project management topics described above.

In summary

I hope that this post has shed some light on the context in which decisions with respect to testing need to be made. Design is one activity that can support certain kinds of testing, but not the only one, or even the most important one for the given type of testing necessary at that time in the project.

Design is hard. Project management is hard. Testing is hard.

Getting the right mix of people that together have enough experience and skills in these activities isn’t easy.

Don’t expect that sprinkling some interfaces in your code base will be enough.
That doesn’t count for much in the way of design, just as writing code in a testing namespace doesn’t count for much in the way of testability.

Looking forward to hearing your comments.

Posted in Agile, Architecture, Development, Management, Projects, TDD, Testing, The Team | 20 Comments »

On Small Applications

Sunday, March 7th, 2010

I hear this too often: “X sounds like a great pattern, but it’s overkill for small applications”. Many patterns have been subjected to this including (but not limited to): SOA, DDD, CQRS, ORM, etc. Often the statement is made by a person without experience in the given pattern (though possibly experienced in other patterns). Let’s take a look at the second part – the “small application”, and ask:

What makes an app small?

Or inversely, what makes an app warrant the “enterprise” moniker?

If there’s one thing that the history of our industry has shown repeatedly, it’s that developers aren’t particularly accurate with their estimates. Like, orders-of-magnitude inaccurate. Knowing this, it’s surprising that the “small app” argument seems to win so many arguments. The same goes for justifications in the form of “we’ve got to have an X, this is a BIG project”.

So, what makes an app small?

Is it a small number of lines of code? Well, what if those lines of code are keeping planes in the air?

Is it a small number of developers? Same as above. Actually, history has shown that some of the most valuable bits of code written were done by small numbers of developers.

Is it that it will only be installed on a single machine?

Is it…

What could it be?

The real issue

The small app argument is a diversionary tactic.

Loosely translated, it means “I’m comfortable where I am and I don’t want to change”.

Moving on…

The real story of size

Once we actually look at the specific context of an app, we tend to see that someone cares a great deal about it, enough to finance its custom development – rather than buying an off-the-shelf alternative. The expected lifetime of business use is easily 3-5 years, if not 7-10, during which many enhancements will likely be requested. Thus, some non-functional properties of the code matter – at the very least maintainability.

In which case, if the given pattern or approach does significantly improve the desired non-functional properties of the app, it only makes sense to use it.

There is one class of software that might possibly be treated as “small” – the one-off script that’s written to automate some IT task. And even then, so many of these scripts end up living longer than the apps themselves that they should be engineered at the same level of quality.

In closing

Don’t counter a “small app” argument with psychology.
It will only make matters worse.

Instead, rephrase the issue around the lifetime of business use.

I’ve found that there are precious few cases where the harsh light of reality doesn’t help the appropriate decisions be made. If indeed this is a small-lifetime-app, just drag-and-drop until you’re done. Otherwise, the time it takes to understand and evaluate the applicability of the given patterns will definitely pay itself back many times over the life of the app.

And managers, keep your ears open for it. The technical risks behind that statement are icebergs waiting to sink your project.

* with thanks to Mike Nichols for pushing my buttons.

Posted in Architecture, Development, Management, The Team | 20 Comments »

CQRS Video Online

Friday, February 26th, 2010

A couple of weeks ago I gave a talk on Command/Query Responsibility Segregation in London.

The recording of the talk is online here.

There is one important thing that I didn’t have enough time to cover, but I want you to keep in mind as you’re watching this. It is that CQRS is applicable only *within* the context of a single service/BC – NOT across or between them.

Let me know what you think.

Posted in Architecture, Community, CQRS, Messaging, Presentations, Pub/Sub, Scalability, Validation | 19 Comments »

Non-functional Architectural Woes

Tuesday, January 12th, 2010

As I sit here in the lounge at Bogota airport waiting for my delayed flight, I remembered something interesting that came up in my 2-week training/consulting in Cali. It’s not a question that came up, or anything like that. It was that I suddenly noticed a pattern in many of my consulting and training clients over the past years. And as I thought about it, I realized that it was prevalent in our industry as a whole – in the literature, on the web, everywhere.

It’s how people think about functional and non-functional requirements.

The problem with categorization

There’s nothing wrong with categorizing requirements as either functional or non-functional.

The problem is that people project that categorization from the problem domain to the solution domain as well.

There is an impression that architecture and technology choices are, to a large extent, based on non-functional requirements, and only on non-functional requirements.

Here are some examples:

Extensibility: Workflow/BPM engine, DSL, Plug-in framework, etc
Scalability: Message/Service Bus, Database (NoSQL camp, I’m looking at you too), etc
High Availability: See scalability

Too many times have I noticed architects so focused on these issues that they all but ignore the functional requirements and the business objectives of the stakeholders.

Not to place an unfair portion of the blame there, the vendors have been perpetuating the above fallacies to sell more and/or new products. Given the enormous influence of the big vendors (conferences, training, etc) it isn’t any wonder that architects in the field use the vendors’ “best practices”.

The problem that arises from this kind of thinking is a shoe-horning of functional requirements into the architecture decided upon entirely in the context of non-functional requirements.

Functional Earthquakes

Stable architecture cannot be created based entirely on non-functional requirements. There are functional requirements that can shake the foundation of a technically-oriented architecture.

Let’s take the canonical layered architecture with its normalized database. Now we get a new requirement in the form of:

“As a supplier, when I log-in, I want to see on my home page my most recent purchase orders, grouped by highest value retailer (total historical purchase order value), sorted according to requested time of delivery.”

In order for a developer to implement this according to the architectural guidelines of normalization and layering here’s what they do: join retailers who have an agreement with the supplier (join between supplier, retailer, and agreement tables), join the purchase orders and their lines, (ignoring tax for now), sum the line value and group by retailer, and use as an input to the purchase order table joining again, filter in last 24 hours, sort by time of delivery.

So, in our normalized database, we have many millions of purchase orders, hundreds of millions of lines, that we’re joining against each other as well as several other tables.
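To give a feel for the shape of that query, here is a sketch against a tiny in-memory SQLite database. The schema, table and column names, and sample rows are all invented for the example and heavily simplified (there is no supplier table, no tax, and a fixed cut-off date stands in for the last 24 hours). It is not the query from any real system.

```python
import sqlite3

# Hypothetical, heavily simplified schema; amounts are whole currency units.
SCHEMA = """
CREATE TABLE retailer (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE agreement (supplier_id INTEGER, retailer_id INTEGER);
CREATE TABLE purchase_order (
    id INTEGER PRIMARY KEY, supplier_id INTEGER, retailer_id INTEGER,
    placed_at TEXT, deliver_by TEXT);
CREATE TABLE order_line (order_id INTEGER, quantity INTEGER, unit_price INTEGER);
"""

HOME_PAGE_QUERY = """
WITH retailer_value AS (
    -- Total historical order value per retailer that has an agreement.
    SELECT po.retailer_id, SUM(ol.quantity * ol.unit_price) AS total_value
    FROM agreement a
    JOIN purchase_order po
      ON po.supplier_id = a.supplier_id AND po.retailer_id = a.retailer_id
    JOIN order_line ol ON ol.order_id = po.id
    WHERE a.supplier_id = :supplier_id
    GROUP BY po.retailer_id
)
-- Join back to the purchase orders, keep only the recent ones, and sort.
SELECT r.name, po.id, po.deliver_by
FROM purchase_order po
JOIN retailer_value rv ON rv.retailer_id = po.retailer_id
JOIN retailer r ON r.id = po.retailer_id
WHERE po.supplier_id = :supplier_id AND po.placed_at >= :since
ORDER BY rv.total_value DESC, r.id, po.deliver_by
"""


def build_sample_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO retailer VALUES (?, ?)", [(1, "Acme"), (2, "Bolt")])
    conn.executemany("INSERT INTO agreement VALUES (?, ?)", [(1, 1), (1, 2)])
    conn.executemany(
        "INSERT INTO purchase_order VALUES (?, ?, ?, ?, ?)",
        [
            (1, 1, 1, "2010-01-01 09:00", "2010-01-05"),
            (2, 1, 2, "2010-01-11 10:00", "2010-01-13"),
            (3, 1, 1, "2010-01-11 12:00", "2010-01-14"),
            (4, 1, 1, "2010-01-11 13:00", "2010-01-12"),
        ],
    )
    conn.executemany(
        "INSERT INTO order_line VALUES (?, ?, ?)",
        [(1, 10, 100), (2, 1, 50), (3, 2, 10), (4, 1, 10)],
    )
    return conn


def home_page_orders(conn, supplier_id: int, since: str):
    params = {"supplier_id": supplier_id, "since": since}
    return conn.execute(HOME_PAGE_QUERY, params).fetchall()


conn = build_sample_db()
for row in home_page_orders(conn, 1, "2010-01-11 00:00"):
    print(row)
```

With four orders this is instant. Notice, though, that the first half of the query has to walk every historical purchase order and every line for the supplier before a single recent order can be ranked.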

After this new feature has been implemented, any time a supplier sees their home page, the system stops accepting purchase orders until the home page has been rendered several minutes later.

Can we really say that our architecture is stable if a single functional requirement can undo all of its non-functional properties?

Obviously not – but the question the architect (and his boss) are asking is, how did this happen? And if it happened once, can it happen again?

Lessons Learned – sorta

So a reporting database is introduced so that all complex queries like those performed above won’t prevent the system from accepting new purchase orders. A nightly batch moves data from the normalized DB to the reporting DB. Sounds good – non-functionally.

And then a new requirement comes in for handling rush-orders.

So we do an hourly batch. But now these batches start to cause hiccups to our transactional system, which gets backed up, and when released, allocates many more threads to deal with the pending load, and the increased concurrency in the databases sharply increases the number of deadlocks, further backing up the system, until the system becomes effectively unavailable.

Can this be happening again? A functional requirement undoing all of our technically elegant non-functional architectural decisions?

Now what?

The technology is blamed. We should have never counted on SQL Server to handle our kind of *enterprise* requirements. Let’s move to Oracle – it’s *unbreakable*. (Several months and functionalities later) We should have never counted on a database to handle our kind of enterprise requirements. Let’s introduce an *Enterprise Service Bus*. (Several months and functionalities later) We should have never counted on our internal IT to host this. Let’s move to *The Cloud*. (Several months and functionalities later, looking at the bill from our cloud provider) We should have never used .NET for our application because it requires the more expensive Windows cloud. Let’s rewrite on Linux to reduce our cloud costs.

By this time, all the people who originally worked on the project aren’t there anymore. And the cycle continues with limited memory of where we started and how we got here.

Soon, soon, we’ll find that non-functional silver bullet that will make all of our problems go away.

These are not the droids you are looking for

They really aren’t.

If we want our architecture to be stable, we need to base it on stable abstractions. The only thing is that there aren’t any inherently stable abstractions in the solution domain (as we’ve had the chance to witness). That really only leaves one other place to look for them – in the problem domain, also known as the functional requirements.

But functional requirements change all the time! Wasn’t that what got us into this mess to begin with?

Indeed, but in between the functional requirements and behind them is something that is quite stable: the stakeholders’ business objectives.

The supply chain will continue to strive to optimize itself. To shorten the time for an order to be fulfilled. To decrease the amount of inventory that a retailer holds. To choose the best set of suppliers for our product catalog. To recognize which retailers give me the most business and serve them better. To identify high potential retailers – big retailers (like Walmart) who aren’t buying as much from me as other retailers.

This is how the business has been done for decades and will continue to be done for decades more.

If we could find a way to capture those stable elements and represent them as core elements in our architectural structure, and then balance the non-functional requirements within those functional contexts, maybe, just maybe, our architecture will stand the test of time.

More to come…

Posted in Architecture | 27 Comments »

Scalability Podcast on Herding Code

Monday, January 11th, 2010

The great folks over at Herding Code were nice enough to interview me back in November as I was over in Paris giving my 5-day SOA course. We talked about quite a lot of topics related to scalability.

Click here for the full list of topics and to download the podcast.

Let me know what you think or any questions you may have in the comments.

Posted in Architecture, Community, Podcast, Scalability | No Comments »

Clarified CQRS

Wednesday, December 9th, 2009

After listening to how the community has interpreted Command-Query Responsibility Segregation I think that the time has come for some clarification. Some have been tying it to Event Sourcing. Most have been overlaying their previous layered architecture assumptions on it. Here I hope to identify CQRS itself, and describe in which places it can connect to other patterns.

Download as PDF – this is quite a long post.

Why CQRS

Before describing the details of CQRS we need to understand the two main driving forces behind it: collaboration and staleness.

Collaboration refers to circumstances under which multiple actors will be using/modifying the same set of data – whether or not the intention of the actors is actually to collaborate with each other. There are often rules which indicate which user can perform which kind of modification and modifications that may have been acceptable in one case may not be acceptable in others. We’ll give some examples shortly. Actors can be human like normal users, or automated like software.

Staleness refers to the fact that in a collaborative environment, once data has been shown to a user, that same data may have been changed by another actor – it is stale. Almost any system which makes use of a cache is serving stale data – often for performance reasons. What this means is that we cannot entirely trust our users’ decisions, as they could have been made based on out-of-date information.

Standard layered architectures don’t explicitly deal with either of these issues. While putting everything in the same database may be one step in the direction of handling collaboration, staleness is usually exacerbated in those architectures by the use of caches as a performance-improving afterthought.

A picture for reference

I’ve given some talks about CQRS using this diagram to explain it:

[Diagram: CQRS]

The boxes named AC are Autonomous Components. We’ll describe what makes them autonomous when discussing commands. But before we go into the complicated parts, let’s start with queries:

Queries

If the data we’re going to be showing users is stale anyway, is it really necessary to go to the master database and get it from there? Why transform those 3rd normal form structures to domain objects if we just want data – not any rule-preserving behaviors? Why transform those domain objects to DTOs to transfer them across a wire, and who said that wire has to be exactly there? Why transform those DTOs to view model objects?

In short, it looks like we’re doing a heck of a lot of unnecessary work based on the assumption that reusing code that has already been written will be easier than just solving the problem at hand. Let’s try a different approach:

How about we create an additional data store whose data can be a bit out of sync with the master database – I mean, the data we’re showing the user is stale anyway, so why not reflect that in the data store itself? We’ll come up with an approach later to keep this data store more or less in sync.

Now, what would be the correct structure for this data store? How about just like the view model? One table for each view. Then our client could simply SELECT * FROM MyViewTable (or possibly pass in an ID in a where clause), and bind the result to the screen. That would be just as simple as can be. You could wrap that up with a thin facade if you feel the need, or with stored procedures, or using AutoMapper which can simply map from a data reader to your view model class. The thing is that the view model structures are already wire-friendly, so you don’t need to transform them to anything else.
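As a rough illustration of the one-table-per-view idea, here is a minimal sketch in Python using an in-memory SQLite database. The table name CustomerSummaryView, its columns, and the load_customer_summary helper are all made up for this example; the point is only that the row already has the shape of the screen.

```python
import sqlite3

# Hypothetical view table: one row per customer, shaped exactly like the screen.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows can be read by column name
conn.execute(
    "CREATE TABLE CustomerSummaryView ("
    "customer_id INTEGER PRIMARY KEY, display_name TEXT, "
    "status TEXT, open_orders INTEGER)"
)
conn.execute(
    "INSERT INTO CustomerSummaryView VALUES (?, ?, ?, ?)",
    (42, "Ms Jane Doe", "Regular", 3),
)


def load_customer_summary(connection, customer_id):
    """Fetch the view model for one screen: no joins, no domain objects, no DTOs."""
    row = connection.execute(
        "SELECT * FROM CustomerSummaryView WHERE customer_id = ?",
        (customer_id,),
    ).fetchone()
    return dict(row) if row is not None else None


view_model = load_customer_summary(conn, 42)
print(view_model)
```

The dictionary that comes back can be bound to the screen as-is, which is the whole appeal: there is nothing left to transform.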

You could even consider taking that data store and putting it in your web tier. It’s just as secure as an in-memory cache in your web tier. Give your web servers SELECT only permissions on those tables and you should be fine.

Query Data Storage

While you can use a regular database as your query data store it isn’t the only option. Consider that the query schema is in essence identical to your view model. You don’t have any relationships between your various view model classes, so you shouldn’t need any relationships between the tables in the query data store.

So do you actually need a relational database?

The answer is no, but for all practical purposes and due to organizational inertia, it is probably your best choice (for now).

Scaling Queries

Since your queries are now being performed off of a separate data store than your master database, and there is no assumption that the data that’s being served is 100% up to date, you can easily add more instances of these stores without worrying that they don’t contain the exact same data. The same mechanism that updates one instance can be used for many instances, as we’ll see later.

This gives you cheap horizontal scaling for your queries. Also, since you’re not doing nearly as much transformation, the latency per query goes down as well. Simple code is fast code.

Data modifications

Since our users are making decisions based on stale data, we need to be more discerning about which things we let through. Here’s a scenario explaining why:

Let’s say we have a customer service representative who is on the phone with a customer. This user is looking at the customer’s details on the screen and wants to make them a ‘preferred’ customer, as well as modifying their address, changing their title from Ms to Mrs, changing their last name, and indicating that they’re now married. What the user doesn’t know is that after opening the screen, an event arrived from the billing department indicating that this same customer doesn’t pay their bills – they’re delinquent. At this point, our user submits their changes.

Should we accept their changes?

Well, we should accept some of them, but not the change to ‘preferred’, since the customer is delinquent. But writing those kinds of checks is a pain – we need to do a diff on the data, infer what the changes mean, which ones are related to each other (name change, title change) and which are separate, identify which data to check against – not just compared to the data the user retrieved, but compared to the current state in the database, and then reject or accept.

Unfortunately for our users, we tend to reject the whole thing if any part of it is off. At that point, our users have to refresh their screen to get the up-to-date data, and retype in all the previous changes, hoping that this time we won’t yell at them because of an optimistic concurrency conflict.

As we get larger entities with more fields on them, we also get more actors working with those same entities, and the higher the likelihood that something will touch some attribute of them at any given time, increasing the number of concurrency conflicts.

If only there was some way for our users to provide us with the right level of granularity and intent when modifying data. That’s what commands are all about.

Commands

A core element of CQRS is rethinking the design of the user interface to enable us to capture our users’ intent such that making a customer preferred is a different unit of work for the user than indicating that the customer has moved or that they’ve gotten married. Using an Excel-like UI for data changes doesn’t capture intent, as we saw above.
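To make the difference from a single "save customer" call concrete, here is a minimal Python sketch of the customer service scenario expressed as separate commands. The class names, fields, and the handle function are assumptions made up for illustration, not part of any framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MakeCustomerPreferred:
    customer_id: int


@dataclass(frozen=True)
class CustomerMoved:
    customer_id: int
    new_address: str


@dataclass(frozen=True)
class CustomerGotMarried:
    customer_id: int
    new_title: str
    new_last_name: str


# Hypothetical server-side state: billing has flagged customer 42.
delinquent_customers = {42}


def handle(command):
    """Each command carries one intent, so it can be accepted or rejected alone."""
    if (
        isinstance(command, MakeCustomerPreferred)
        and command.customer_id in delinquent_customers
    ):
        return "rejected"
    return "accepted"


submitted = [
    CustomerMoved(42, "1 Main St"),
    CustomerGotMarried(42, "Mrs", "Smith"),
    MakeCustomerPreferred(42),
]
results = {type(command).__name__: handle(command) for command in submitted}
print(results)
```

No diffing or inference is needed here: the address and name changes go through, and only the ‘preferred’ request is turned down.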

We could even consider allowing our users to submit a new command even before they’ve received confirmation on the previous one. We could have a little widget on the side showing the user their pending commands, checking them off asynchronously as we receive confirmation from the server, or marking them with an X if they fail. The user could then double-click that failed task to find information about what happened.

Note that the client sends commands to the server – it doesn’t publish them. Publishing is reserved for events which state a fact – that something has happened, and that the publisher has no concern about what receivers of that event do with it.

Commands and Validation

In thinking through what could make a command fail, one topic that comes up is validation. Validation is different from business rules in that it states a context-independent fact about a command. Either a command is valid, or it isn’t. Business rules on the other hand are context dependent.

In the example we saw before, the data our customer service rep submitted was valid; it was only the earlier arrival of the billing event that required the command to be rejected. Had that billing event not arrived, the data would have been accepted.

Even though a command may be valid, there still may be reasons to reject it.

As such, validation can be performed on the client, checking that all fields required for that command are there, number and date ranges are OK, that kind of thing. The server would still validate all commands that arrive, not trusting clients to do the validation.
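The distinction can be sketched in a few lines of Python. Everything here is hypothetical, including the discount_percent field, which was invented only to have a range to check. Note that validate looks at nothing but the command, while can_accept needs to know the current state of the system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MakeCustomerPreferred:
    customer_id: int
    discount_percent: int  # hypothetical field, used only to show a range check


def validate(command):
    """Validation: context-independent, so client and server can both run it."""
    errors = []
    if command.customer_id <= 0:
        errors.append("customer_id must be positive")
    if not 0 <= command.discount_percent <= 100:
        errors.append("discount_percent must be between 0 and 100")
    return errors


def can_accept(command, delinquent_customer_ids):
    """Business rule: the answer depends on state that other actors can change."""
    return command.customer_id not in delinquent_customer_ids


command = MakeCustomerPreferred(customer_id=42, discount_percent=10)
validation_errors = validate(command)
accepted_before_billing_event = can_accept(command, set())
accepted_after_billing_event = can_accept(command, {42})
```

The same valid command is accepted or rejected depending only on whether the billing event has been seen yet.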

Rethinking UIs and commands in light of validation

The client can make use of the query data store when validating commands. For example, before submitting a command that the customer has moved, we can check that the street name exists in the query data store.

At that point, we may rethink the UI and have an auto-completing text box for the street name, thus ensuring that the street name we’ll pass in the command will be valid. But why not take things a step further? Why not pass in the street ID instead of its name? Have the command represent the street not as a string, but as an ID (int, guid, whatever).

On the server side, the only reason that such a command would fail would be due to concurrency – that someone had deleted that street and the deletion hadn’t been reflected in the query store yet; a fairly exceptional set of circumstances.

Reasons valid commands fail and what to do about it

So we’ve got a well-behaved client that is sending valid commands, yet the server still decides to reject them. Often the circumstances for the rejection are related to other actors changing state relevant to the processing of that command.

In the CRM example above, it is only because the billing event arrived first. But “first” could be a millisecond before our command. What if our user pressed the button a millisecond earlier? Should that actually change the business outcome? Shouldn’t we expect our system to behave the same when observed from the outside?

So, if the billing event arrived second, shouldn’t that revert preferred customers to regular ones? Not only that, but shouldn’t the customer be notified of this, like by sending them an email? In which case, why not have this be the behavior for the case where the billing event arrives first? And if we’ve already got a notification model set up, do we really need to return an error to the customer service rep? I mean, it’s not like they can do anything about it other than notifying the customer.

So, if we’re not returning errors to the client (who is already sending us valid commands), maybe all we need to do on the client when sending a command is to tell the user “thank you, you will receive confirmation via email shortly”. We don’t even need the UI widget showing pending commands.
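One way to picture this order-independent behavior is the following Python sketch. The Customer class, its field names, and the outbox list standing in for an email sender are all assumptions for the sake of the example.

```python
from dataclasses import dataclass, field


@dataclass
class Customer:
    email: str
    preferred: bool = False
    delinquent: bool = False
    outbox: list = field(default_factory=list)  # stand-in for sending emails

    def _settle(self):
        # A delinquent customer cannot stay preferred, whichever message came first.
        if self.preferred and self.delinquent:
            self.preferred = False
            self.outbox.append(
                f"To {self.email}: preferred status withdrawn, account is delinquent"
            )

    def make_preferred(self):
        self.preferred = True
        self._settle()

    def mark_delinquent(self):
        self.delinquent = True
        self._settle()


# Case 1: the command arrives first, the billing event second.
command_first = Customer("jane@example.com")
command_first.make_preferred()
command_first.mark_delinquent()

# Case 2: the billing event arrives first, the command second.
event_first = Customer("jane@example.com")
event_first.mark_delinquent()
event_first.make_preferred()
```

Both customers end up in the same state with the same single notification, so there is no error to hand back to the customer service rep in either ordering.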

Commands and Autonomy

What we see is that in this model, commands don’t need to be processed immediately – they can be queued. How fast they get processed is a question of Service-Level Agreement (SLA) and not architecturally significant. This is one of the things that makes the node that processes commands autonomous from a runtime perspective – we don’t require an always-on connection to the client.

Also, we shouldn’t need to access the query store to process commands – any state that is needed should be managed by the autonomous component – that’s part of the meaning of autonomy.

Another part is the issue of failed message processing due to the database being down or hitting a deadlock. There is no reason that such errors should be returned to the client – we can just roll back and try again. When an administrator brings the database back up, all the messages waiting in the queue will then be processed successfully and our users will receive confirmation.
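The roll-back-and-retry behavior can be sketched with an in-memory queue in Python. This is not how a transactional queue such as MSMQ is implemented; FlakyStore, DatabaseDown, and drain are made-up names, and "rollback" here simply means the message is left at the head of the queue.

```python
from collections import deque


class DatabaseDown(Exception):
    pass


class FlakyStore:
    """Hypothetical store that is unavailable until an administrator brings it up."""

    def __init__(self):
        self.available = False
        self.rows = []

    def save(self, message):
        if not self.available:
            raise DatabaseDown()
        self.rows.append(message)


def drain(queue, store):
    """Process messages in order; on failure, leave the message queued and stop."""
    processed = 0
    while queue:
        message = queue[0]  # peek, do not remove yet
        try:
            store.save(message)
        except DatabaseDown:
            break  # "roll back": the message stays at the head of the queue
        queue.popleft()  # "commit": remove only after success
        processed += 1
    return processed


queue = deque(["MakeCustomerPreferred:42", "CustomerMoved:42"])
store = FlakyStore()
first_attempt = drain(queue, store)  # database is down, nothing is lost
store.available = True
second_attempt = drain(queue, store)  # the same messages succeed on retry
```

The client never sees the outage; it only sees its confirmation arrive later than usual.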

The system as a whole is quite a bit more robust to any error conditions.

Also, since we don’t have queries going through this database any more, the database itself is able to keep more rows/pages in memory which serve commands, improving performance. When both commands and queries were being served off of the same tables, the database server was always juggling rows between the two.

Autonomous Components

While in the picture above we see all commands going to the same AC, we could logically have each command processed by a different AC, each with its own queue. That would give us visibility into which queue was the longest, letting us see very easily which part of the system was the bottleneck. While this is interesting for developers, it is critical for system administrators.

Since commands wait in queues, we can now add more processing nodes behind those queues (using the distributor with NServiceBus) so that we’re only scaling the part of the system that’s slow. No need to waste servers on any other requests.
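A toy Python sketch of the queue-per-command idea follows. The queue names and the bottleneck helper are hypothetical, and real queue depths would come from your messaging infrastructure rather than an in-process dictionary.

```python
from collections import defaultdict, deque

queues = defaultdict(deque)  # one queue per command type (hypothetical names)


def send(command_type, payload):
    queues[command_type].append(payload)


def bottleneck():
    """Return the command type with the most pending messages, or None if idle."""
    pending = {name: len(q) for name, q in queues.items() if q}
    return max(pending, key=pending.get) if pending else None


send("CustomerMoved", {"customer_id": 1})
for customer_id in range(5):
    send("MakeCustomerPreferred", {"customer_id": customer_id})

slowest = bottleneck()
print(slowest)
```

With that one number per queue, an administrator knows exactly which command needs more processing nodes behind it.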

Service Layers

Our command processing objects in the various autonomous components actually make up our service layer. The reason you don’t see this layer explicitly represented in CQRS is that it isn’t really there, at least not as an identifiable logical collection of related objects – here’s why:

In the layered architecture (AKA 3-Tier) approach, there is no statement about dependencies between objects within a layer, or rather it is implied to be allowed. However, when taking a command-oriented view on the service layer, what we see are objects handling different types of commands. Each command is independent of the other, so why should we allow the objects which handle them to depend on each other?

Dependencies are things which should be avoided, unless there is good reason for them.

Keeping the command handling objects independent of each other will allow us to more easily version our system, one command at a time, not needing even to bring down the entire system, given that the new version is backwards compatible with the previous one.

Therefore, keep each command handler in its own VS project, or possibly even in its own solution, thus guiding developers away from introducing dependencies in the name of reuse (it’s a fallacy). If you do decide, as a deployment concern, that you want to put them all in the same process feeding off of the same queue, you can ILMerge those assemblies and host them together, but understand that you will be undoing many of the benefits of your autonomous components.

Whither the domain model?

Although in the diagram above you can see the domain model beside the command-processing autonomous components, it’s actually an implementation detail. There is nothing that states that all commands must be processed by the same domain model. Arguably, you could have some commands be processed by transaction script, others using table module (AKA active record), as well as those using the domain model. Event-sourcing is another possible implementation.

Another thing to understand about the domain model is that it now isn’t used to serve queries. So the question is, why do you need to have so many relationships between entities in your domain model?

(You may want to take a second to let that sink in.)

Do we really need a collection of orders on the customer entity? In what command would we need to navigate that collection? In fact, what kind of command would need any one-to-many relationship? And if that’s the case for one-to-many, many-to-many would definitely be out as well. I mean, most commands only contain one or two IDs in them anyway.

Any aggregate operations that may have been calculated by looping over child entities could be pre-calculated and stored as properties on the parent entity. Following this process across all the entities in our domain would result in isolated entities needing nothing more than a couple of properties for the IDs of their related entities – “children” holding the parent ID, like in databases.

In this form, commands could be entirely processed by a single entity – voilà, an aggregate root that is a consistency boundary.
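Here is a small Python sketch of what such isolated entities might look like. The Customer and Order classes and the record_order method are assumptions for illustration; how the running totals get updated (for example, in reaction to an event) is deliberately left open.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Customer:
    customer_id: int
    # Pre-calculated aggregates instead of a collection of orders to loop over.
    order_count: int = 0
    total_ordered: Decimal = Decimal("0")

    def record_order(self, amount):
        self.order_count += 1
        self.total_ordered += amount


@dataclass
class Order:
    order_id: int
    customer_id: int  # the "child" holds the parent's ID, like in databases
    amount: Decimal


customer = Customer(customer_id=42)
for amount in (Decimal("10.50"), Decimal("4.50")):
    customer.record_order(amount)
```

Any rule that needs "how much has this customer ordered" can now be answered from the customer alone, without navigating to its orders.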

Persistence for command processing

Given that the database used for command processing is not used for querying, and that most (if not all) commands contain the IDs of the rows they’re going to affect, do we really need to have a column for every single domain object property? What if we just serialized the domain entity and put it into a single column, and had another column containing the ID? This sounds quite similar to key-value storage that is available in the various cloud providers. In which case, would you really need an object-relational mapper to persist to this kind of storage?

You could also pull out an additional property per piece of data where you’d want the “database” to enforce uniqueness.
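A minimal sketch of that storage shape, using Python with SQLite and JSON purely for illustration: the Entities table, the choice of email as the uniqueness column, and the save and load helpers are all hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# One column for the ID, one for the serialized entity, and one extra column
# pulled out only because we want the database to enforce uniqueness on it.
conn.execute(
    "CREATE TABLE Entities (id TEXT PRIMARY KEY, email TEXT UNIQUE, body TEXT NOT NULL)"
)


def save(entity):
    params = (entity["email"], json.dumps(entity), entity["id"])
    updated = conn.execute(
        "UPDATE Entities SET email = ?, body = ? WHERE id = ?", params
    )
    if updated.rowcount == 0:
        conn.execute(
            "INSERT INTO Entities (email, body, id) VALUES (?, ?, ?)", params
        )


def load(entity_id):
    row = conn.execute(
        "SELECT body FROM Entities WHERE id = ?", (entity_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None


customer = {"id": "c-1", "email": "jane@example.com", "preferred": False}
save(customer)
customer["preferred"] = True
save(customer)
loaded = load("c-1")
```

Saving a second entity with the same email raises sqlite3.IntegrityError, which is the only thing the extra column is there for.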

I’m not suggesting that you do this in all cases – rather just trying to get you to rethink some basic assumptions.

Let me reiterate

How you process the commands is an implementation detail of CQRS.

Keeping the query store in sync

After the command-processing autonomous component has decided to accept a command, modifying its persistent store as needed, it publishes an event notifying the world about it. This event often is the “past tense” of the command submitted:

MakeCustomerPreferredCommand -> CustomerHasBeenMadePreferredEvent

The publishing of the event is done transactionally together with the processing of the command and the changes to its database. That way, any kind of failure on commit will result in the event not being sent. This is something that should be handled by default by your message bus, and if you’re using MSMQ as your underlying transport, requires the use of transactional queues.

The autonomous component which processes those events and updates the query data store is fairly simple, translating from the event structure to the persistent view model structure. I suggest having an event handler per view model class (AKA per table).
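To show how little that event-handling component has to do, here is a Python sketch with a tiny in-memory bus. The Bus class is a stand-in written for this example and is not the NServiceBus API; the view table is a plain dictionary, and transactional delivery is not modeled at all.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerHasBeenMadePreferredEvent:
    customer_id: int


class Bus:
    """Tiny in-memory stand-in for a message bus, for illustration only."""

    def __init__(self):
        self.subscribers = {}

    def subscribe(self, event_type, handler):
        self.subscribers.setdefault(event_type, []).append(handler)

    def publish(self, event):
        for handler in self.subscribers.get(type(event), []):
            handler(event)


# Query store: one "table" per view model, keyed by customer ID.
customer_summary_view = {42: {"display_name": "Ms Jane Doe", "status": "Regular"}}


def update_customer_summary_view(event):
    """One handler per view model: translate the event into the view's shape."""
    customer_summary_view[event.customer_id]["status"] = "Preferred"


bus = Bus()
bus.subscribe(CustomerHasBeenMadePreferredEvent, update_customer_summary_view)
bus.publish(CustomerHasBeenMadePreferredEvent(customer_id=42))
```

Adding a second copy of the query store would just mean subscribing another handler to the same event.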

Here’s the picture of all the pieces again:

[Diagram: CQRS]

Bounded Contexts

While CQRS touches on many pieces of software architecture, it is still not at the top of the food chain. CQRS, if used, is employed within a bounded context (DDD) or a business component (SOA) – a cohesive piece of the problem domain. The events published by one BC are subscribed to by other BCs, each updating their query and command data stores as needed.

UIs from the CQRS found in each BC can be “mashed up” in a single application, providing users a single composite view on all parts of the problem domain. Composite UI frameworks are very useful for these cases.

Summary

CQRS is about coming up with an appropriate architecture for multi-user collaborative applications. It explicitly takes into account factors like data staleness and volatility and exploits those characteristics for creating simpler and more scalable constructs.

One cannot truly enjoy the benefits of CQRS without considering the user-interface, making it capture user intent explicitly. When taking into account client-side validation, command structures may be somewhat adjusted. Thinking through the order in which commands and events are processed can lead to notification patterns which make returning errors unnecessary.

While the result of applying CQRS to a given project is a more maintainable and performant code base, this simplicity and scalability require understanding the detailed business requirements and are not the result of any technical “best practice”. If anything, we can see a plethora of approaches to apparently similar problems being used together – data readers and domain models, one-way messaging and synchronous calls.

Although this blog post is over 3000 words (a record for this blog), I know that it doesn’t go into enough depth on the topic (it takes about 3 days out of the 5 of my Advanced Distributed Systems Design course to cover everything in enough depth). Still, I hope it has given you the understanding of why CQRS is the way it is and possibly opened your eyes to other ways of looking at the design of distributed systems.

Questions and comments are most welcome.

Posted in Architecture, Autonomous Services, Business Rules, CQRS, Messaging, Pub/Sub, Scalability, Validation | 139 Comments »

Search and Messaging

Sunday, November 1st, 2009

One question that I get asked about quite a bit with relation to messaging is about search. Isn’t search inherently request/response? Doesn’t it have to return immediately? Wouldn’t messaging in this case hurt our performance?

While I tend to put search in the query camp when keeping the responsibility of commands and queries separate, and often recommend that those queries be done without messaging, there are certain types of search where messaging does make sense.

In this post, I’ll describe certain properties of the problem domain that make messaging a good candidate for a solution.

Searching is beside the point – Finding is what it’s all about

Remember that search is only a means to an end in the eyes of the user – they want to find something. One of the difficulties we users have is expressing what we want to find in ways that machines can understand.

In thinking about how we build systems to interact with users, we need to take this fuzziness into account. The more data that we have, and the less homogeneous it is, the harder this problem becomes.

When talking about speed, while users are sensitive to the technical interactivity, the thing that matters most is the total time it takes for them to find what they want. If the result of each search screen pops up in 100ms, but the user hasn’t found what they’re looking for after clicking through 20 screens, the search function is ultimately broken.

Notice that the finding process isn’t perceived as “immediate” in the eyes of the user – the evaluation they do in their heads of the search results is as much a part of finding as the search itself.

Also, if the user needs to refine their search terms in order to find what they want, we’re now talking about a multi-request/multi-response process. There is nothing in the problem domain which indicates that finding is inherently request/response.

Relationships in the data

When bringing back data as the result of a search, what we’re saying is that there is a property which is the same across the result elements. But there may be more than one such property. For example, if we search for “blue” on Google Images, we get back pictures of the sky, birds, flowers, and more. Obvious so far – but let’s exploit the obvious a bit.

When the user sees that too many irrelevant results come back, they’ll want to refine their search. One way they can do that is to perform a new search and put in a more specific search phrase – like “blue sky”. Another way is for them to indicate this by selecting an image and saying “not like this” or “more of these”. Then we can use the additional properties we know about those images to further refine the result group – either adding more images of one kind, or removing images of another.

Here’s something else that’s obvious:

Users often click or change their search before the entire result screen is shown.

It’s beginning to sound like users are already interacting with search in an asynchronous manner. What if we actually designed a system that played to that kind of interaction model?

Data-space partitioning

Once we accept the fact that the user is willing to have more results appear in increments, we can talk about having multiple servers processing the search in parallel. For large data spaces, it is unlikely for us to be able to store all the required metadata for search on one server anyway.

All we really need is a way to index these independent result-sets so that the user can access them. This can be done simply by allocating a GUID/UUID for the search request and storing the result-sets along with that ID.

Browser interaction

When the browser calls a server with the search request the first time, that server allocates an ID to that request, returns a URL containing that ID to the browser, and publishes an event containing the search term and the ID. Each of our processing nodes is subscribed to that event, performs the search on its part of the data-space, and writes its results (likely to a distributed cache) along with that ID.

The browser polls the above URL, which queries the cache (give me everything with this ID), and the browser sees which resources have been added since the last time it polled, and shows them to the user.
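The flow above can be approximated in a single Python process. In this sketch a thread pool stands in for the subscribed processing nodes, a dictionary stands in for the distributed cache, and the PARTITIONS data and all function names are invented for illustration.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data-space partitions; each would live on its own node.
PARTITIONS = [
    ["blue sky", "red rose"],
    ["blue bird", "green field"],
    ["blue flower", "grey cloud"],
]
result_cache = {}  # search ID -> results so far; stand-in for a distributed cache


def search_partition(search_id, term, partition):
    """One processing node: search its slice and store hits under the search ID."""
    hits = [item for item in partition if term in item]
    result_cache[search_id].extend(hits)


def start_search(term, executor):
    """Allocate an ID, fan the request out, and return without waiting."""
    search_id = str(uuid.uuid4())
    result_cache[search_id] = []
    futures = [
        executor.submit(search_partition, search_id, term, partition)
        for partition in PARTITIONS
    ]
    return search_id, futures


def poll(search_id, already_seen):
    """Return only the results added since the last poll."""
    return result_cache[search_id][already_seen:]


with ThreadPoolExecutor(max_workers=3) as executor:
    search_id, futures = start_search("blue", executor)
    for future in futures:
        future.result()  # waiting only keeps the demo deterministic

new_results = poll(search_id, already_seen=0)
print(sorted(new_results))
```

A real browser would not wait for the nodes to finish; it would call poll repeatedly and show whatever has arrived, in whatever order the partitions happened to respond.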

If the user clicks “more of these”, that initiates a new search request to the server, which follows the same pattern as before, just that the system is able to pull more relevant information. “Not like this” performs a similar search but, instead of adding to the list of items shown, removes items from that list based on the response from the server.

In this kind of user-system interaction model, having the user page through the result set doesn’t make very much sense as we’re not capturing the intent of the user, which is “you’re not showing me what I want”. By making it easy for the user to fine tune the result set, we get them closer to finding what they want. By performing work in parallel in a non-blocking manner on smaller sets of data, we greatly decrease the “time to first byte” as well as the time when the user can refine their search.

But Google doesn’t work like that

I know that this isn’t like the search UI we’ve all grown used to.

But then again, the search that you’re providing your users is more specific – not just pages on the web. If you’re a retailer allowing your users to search for a gift, this kind of “more like this, less like that” model is how users would interact with a real sales-person when shopping in a store. Why not model your system after the ways that people behave in the real world?

In closing

If we were to try to make use of messaging underneath “classical” search interaction models, it probably wouldn’t be the greatest fit. If all we’re doing at a logical level is blocking RPC, then messaging would probably make the system slower. The real power that you get from messaging is being able to technically do things in parallel – that’s how it makes things faster. If you can find ways to see that parallelism in your problem domain, not only will messaging make sense technically – it will really be the only way to build that kind of system.

Learning how to disconnect from seeing the world through the RPC-tinted glasses of our technical past takes time. Focusing on the problem domain, seeing it from the user’s perspective without any technical constraints – that’s the key to finding elegant solutions. More often than not, you’ll see that the real world is non-blocking and parallel, and then you’ll be able to make the best use of messaging and other related patterns.

What are your thoughts? Post a comment and let me know.

Posted in Architecture, Caching, EDA, ESB, Messaging, Usability | 8 Comments »


Recommendations

Bryan Wheeler, Director Platform Development at msnbc.com
“Udi Dahan is the real deal.

We brought him on site to give our development staff the 5-day “Advanced Distributed System Design” training. The course profoundly changed our understanding and approach to SOA and distributed systems.

Consider some of the evidence: 1. Months later, developers still make allusions to concepts learned in the course nearly every day 2. One of our developers went home and made her husband (a developer at another company) sign up for the course at a subsequent date/venue 3. Based on what we learned, we’ve made constant improvements to our architecture that have helped us to adapt to our ever changing business domain at scale and speed If you have the opportunity to receive the training, you will make a substantial paradigm shift.

If I were to do the whole thing over again, I’d start the week by playing the clip from the Matrix where Morpheus offers Neo the choice between the red and blue pills. Once you make the intellectual leap, you’ll never look at distributed systems the same way.

Beyond the training, we were able to spend some time with Udi discussing issues unique to our business domain. Because Udi is a rare combination of a big picture thinker and a low level doer, he can quickly hone in on various issues and quickly make good (if not startling) recommendations to help solve tough technical issues.” November 11, 2010

Sam Gentile, Independent WCF & SOA Expert
“Udi, one of the great minds in this area.
A man I respect immensely.”

Ian Robinson, Principal Consultant at ThoughtWorks
"Your blog and articles have been enormously useful in shaping, testing and refining my own approach to delivering on SOA initiatives over the last few years. Over and against a certain 3-layer-application-architecture-blown-out-to- distributed-proportions school of SOA, your writing, steers a far more valuable course."

Shy Cohen, Senior Program Manager at Microsoft
“Udi is a world renowned software architect and speaker. I met Udi at a conference that we were both speaking at, and immediately recognized his keen insight and razor-sharp intellect. Our shared passion for SOA and the advancement of its practice launched a discussion that lasted into the small hours of the night.
It was evident through that discussion that Udi is one of the most knowledgeable people in the SOA space. It was also clear why – Udi does not settle for mediocrity, and seeks to fully understand (or define) the logic and principles behind things.
Humble yet uncompromising, Udi is a pleasure to interact with.”

Glenn Block, Senior Program Manager - WCF at Microsoft
“I have known Udi for many years having attended his workshops and having several personal interactions including working with him when we were building our Composite Application Guidance in patterns & practices. What impresses me about Udi is his deep insight into how to address business problems through sound architecture. Backed by many years of building mission critical real world distributed systems it is no wonder that Udi is the best at what he does. When customers have deep issues with their system design, I point them Udi's way.”

Karl Wannenmacher, Senior Lead Expert at Frequentis AG
“I have been following Udi’s blog and podcasts since 2007. I’m convinced that he is one of the most knowledgeable and experienced people in the field of SOA, EDA and large scale systems.
Udi helped Frequentis to design a major subsystem of a large mission critical system with a nationwide deployment based on NServiceBus. It was impressive to see how he took the initial architecture and turned it upside down leading to a very flexible and scalable yet simple system without knowing the details of the business domain. I highly recommend consulting with Udi when it comes to large scale mission critical systems in any domain.”

Simon Segal, Independent Consultant
“Udi is one of the outstanding software development minds in the world today, his vast insights into Service Oriented Architectures and Smart Clients in particular are indeed a rare commodity. Udi is also an exceptional teacher and can help lead teams to fall into the pit of success. I would recommend Udi to anyone considering some Architectural guidance and support in their next project.”

Ohad Israeli, Chief Architect at Hewlett-Packard, Indigo Division
“When you need a man to do the job Udi is your man! No matter if you are facing near deadline deadlock or at the early stages of your development, if you have a problem Udi is the one who will probably be able to solve it. With his extensive experience in the industry and his wide horizons of thinking, he is always full of great, well-placed architectural ideas.
I am honored to have Udi as a colleague and a friend (plus having his cell phone on my speed dial).”

Ward Bell, VP Product Development at IdeaBlade
“Everyone will tell you how smart and knowledgeable Udi is ... and they are oh-so-right. Let me add that Udi is a smart LISTENER. He's always calibrating what he has to offer with your needs and your experience ... looking for the fit. He has strongly held views ... and the ability to temper them with the nuances of the situation.
I trust Udi to tell me what I need to hear, even if I don't want to hear it, ... in a way that I can hear it. That's a rare skill to go along with his command and intelligence.”

Eli Brin, Program Manager at RISCO Group
“We hired Udi as a SOA specialist for a large scale project. The development is outsourced to India. SOA is a buzzword used for almost anything today. We wanted to understand what SOA really is, and what it means, in practice, to develop a SOA based system.
We identified Udi as the one that can put some sense and order in our minds. We started with a private customized SOA training for the entire team in Israel. After that I had several focused sessions regarding our architecture and design.
I will summarize it simply (as he is the software simplist): We are very happy to have Udi in our project. It has a great benefit. We feel good and assured with the knowledge and practice he brings. He doesn’t talk over our heads. We assimilated nServiceBus as the ESB of the project. I highly recommend that you bring Udi into your project.”

Catherine Hole, Senior Project Manager at the Norwegian Health Network
“My colleagues and I have spent five interesting days with Udi - diving into the many aspects of SOA. Udi has shown impressive abilities of understanding organizational challenges, and has brought the business perspective into our way of looking at services. He has an excellent understanding of the many layers from business at the top to the technical infrastructure at the bottom. He is a great listener, and manages to simplify challenges in a way that is understandable both for developers and CEOs, and all the specialists in between.”

Yoel Arnon, MSMQ Expert
“Udi has a unique, in depth understanding of service oriented architecture and how it should be used in the real world, combined with excellent presentation skills. I think Udi should be a premier choice for a consultant or architect of distributed systems.”

Vadim Mesonzhnik, Development Project Lead at Polycom
“When we were faced with a task of creating a high performance server for a video-tele conferencing domain we decided to opt for a stateless cluster with SQL server approach. In order to confirm our decision we invited Udi.

After carefully listening for 2 hours he said: "With your kind of high availability and performance requirements you don’t want to go with stateless architecture."

One simple sentence saved us from implementing a wrong product and finding that out after years of development. No matter whether our former decisions were confirmed or altered, it gave us great confidence to move forward relying on the experience, industry best-practices and time-proven techniques that Udi shared with us.
It was a distinct pleasure and a unique opportunity to learn from someone who is among the best at what he does.”

Jack Van Hoof, Enterprise Integration Architect at Dutch Railways
“Udi is a respected visionary on SOA and EDA, whose opinion I most of the time (if not always) highly agree with. The nice thing about Udi is that he is able to explain architectural concepts in terms of practical code-level examples.”

Neil Robbins, Applications Architect at Brit Insurance
“Having followed Udi's blog and other writings for a number of years I attended Udi's two day course on 'Loosely Coupled Messaging with NServiceBus' at SkillsMatter, London.

I would strongly recommend this course to anyone with an interest in how to develop IT systems which provide immediate and future fitness for purpose. An influential and innovative thought leader and practitioner in his field, Udi demonstrates and shares a phenomenally in depth knowledge that proves his position as one of the premier experts in his field globally.

The course has enhanced my knowledge and skills in ways that I am able to immediately apply to provide benefits to my employer. Additionally though I will be able to build upon what I learned in my 2 days with Udi and have no doubt that it will only enhance my future career.

I cannot recommend Udi, and his courses, highly enough.”

Nick Malik, Enterprise Architect at Microsoft Corporation
“You are an excellent speaker and trainer, Udi, and I've had the fortunate experience of having attended one of your presentations. I believe that you are a knowledgeable and intelligent man.”

Sean Farmar, Chief Technical Architect at Candidate Manager Ltd
“Udi has provided us with guidance in system architecture and supports our implementation of NServiceBus in our core business application.

He accompanied us in all stages of our development cycle and helped us put vision into real life distributed scalable software. He brought fresh thinking, great depth of understanding of software, and ongoing support that proved valuable and cost effective.

Udi has the unique ability to analyze the business problem and come up with a simple and elegant solution for the code and the business alike.
With Udi's attention to detail and knowledge, we avoided pitfalls that would have cost us dearly.”

Børge Hansen, Architect Advisor at Microsoft
“Udi delivered a 5 hour long workshop on SOA for aspiring architects in Norway. While keeping everyone awake and excited Udi gave us some great insights and really delivered on making complex software challenges simple. Truly the software simplist.”

Motty Cohen, SW Manager at KorenTec Technologies
“I know Udi very well from our mutual work at KorenTec. During the analysis and design of a complex, distributed C4I system - where the basic concepts of NServiceBus started to emerge - I gained a lot of "Udi's hours" so I can surely say that he is a professional, skilled architect with fresh ideas and a unique perspective for solving complex architecture challenges. His ideas, concepts and parts of the artifacts are the basis of several state-of-the-art C4I systems whose architecture design I was involved in.”

Aaron Jensen, VP of Engineering at Eleutian Technology
“Awesome. Just awesome.

We’d been meaning to delve into messaging at Eleutian after multiple discussions with and blog posts from Greg Young and Udi Dahan in the past. We weren’t entirely sure where to start, how to start, what tools to use, how to use them, etc. Being able to sit in a room with Udi for an entire week while he described exactly how, why and what he does to tackle a massive enterprise system was invaluable to say the least.

We now have a much better direction and, more importantly, have the confidence we need to start introducing these powerful concepts into production at Eleutian.”

Gad Rosenthal, Department Manager at Retalix
“A thinking person. Brought fresh and valuable ideas that helped us in architecting our product. When recommending a solution he supports it with evidence and detail so you can successfully act based on it. Udi's support "comes on all levels" - As the solution architect through to the detailed class design. Trustworthy!”

Chris Bilson, Developer at Russell Investment Group
“I had the pleasure of attending a workshop Udi led at the Seattle ALT.NET conference in February 2009. I have been reading Udi's articles and listening to his podcasts for a long time and have always looked to him as a source of advice on software architecture.
When I actually met him and talked to him I was even more impressed. Not only is Udi an extremely likable person, he's got that rare gift of being able to explain complex concepts and ideas in a way that is easy to understand.
All the attendees of the workshop greatly appreciate the time he spent with us and the amazing insights into service oriented architecture he shared with us.”

Alexey Shestialtynov, Senior .Net Developer at Candidate Manager
“I met Udi at Candidate Manager where he was brought in part-time as a consultant to help the company make its flagship product more scalable. For me, even after 30 years in software development, working with Udi was a great learning experience. I simply love his fresh ideas and architecture insights.
As we all know, it is not enough to be armed with the best tools and technologies to be successful in software - there is still a human factor involved. When, as it happens, the project got in trouble, management asked Udi to step into a leadership role and bring it back on track. This he did in the span of a month. I can only wish that things had been done this way from the very beginning.
I look forward to working with Udi again in the future.”

Christopher Bennage, President at Blue Spire Consulting, Inc.
“My company was hired to be the primary development team for a large scale and highly distributed application. Since these are not necessarily everyday requirements, we wanted to bring in some additional expertise. We chose Udi because of his blogging, podcasting, and speaking. We asked him to review our architectural strategy as well as the overall viability of the project.
I was very impressed, as Udi demonstrated a broad understanding of the sorts of problems we would face. His advice was honest and unbiased and very pragmatic. Whenever I questioned him on particular points, he was able to back up his opinion with real life examples. I was also impressed with his clarity and precision. He was very careful to untangle the meaning of words that might be overloaded or otherwise confusing. While Udi's hourly rate may not be the cheapest, the ROI is undoubtedly a deal. I would highly recommend consulting with Udi.”

Robert Lewkovich, Product / Development Manager at Eggs Overnight
“Udi's advice and consulting were a huge time saver for the project I'm responsible for. The $ spent were well worth it and provided me with a more complete understanding of nServiceBus and most importantly in helping make the correct architectural decisions earlier thereby reducing later, and more expensive, rework.”

Ray Houston, Director of Development at TOPAZ Technologies
“Udi's SOA class made me smart - it was awesome.

The class was very well put together. The materials were clear and concise and Udi did a fantastic job presenting it. It was a good mixture of lecture, coding, and question and answer. I fully expected that I would be taking notes like crazy, but it was so well laid out that the only thing I wrote down the entire course was what I wanted for lunch. Udi provided us with all the lecture materials and everyone has access to all of the samples which are in the nServiceBus trunk.

Now I know why Udi is the "Software Simplist." I was amazed to find that all the code and solutions were indeed very simple. The patterns that Udi presented keep things simple by isolating complexity so that it doesn't creep into your day to day code. The domain code looks the same if it's running in a single process or if it's running in 100 processes.”

Ian Cooper, Team Lead at Beazley
“Udi is one of the leaders in the .Net development community, one of the truly smart guys who do not just get best architectural practice well enough to educate others but drive innovation. Udi consistently challenges my thinking in ways that make me better at what I do.”

Liron Levy, Team Leader at Rafael
“I met Udi when I worked as a team leader in Rafael. One of the most senior managers there knew Udi because he was doing a superb architecture job in another Rafael project and he recommended bringing him on board to help the project I was leading.
Udi brought with him fresh solutions and invaluable deep architecture insights. He is an authority on SOA (service oriented architecture) and this was a tremendous help in our project.
On the personal level - Udi is a great communicator and can persuade even the most difficult audiences (I was part of such an audience myself..) by bringing sound explanations that draw on his extensive knowledge in the software business. Working with Udi was a great learning experience for me, and I'll be happy to work with him again in the future.”

Adam Dymitruk, Director of IT at Apara Systems
“I met Udi for the first time at DevTeach in Montreal back in early 2007. While Udi is usually involved in SOA subjects, his knowledge spans all of a software development company's concerns. I would not hesitate to recommend Udi for any company that needs excellent leadership, mentoring, problem solving, application of patterns, implementation of methodologies and straight out solution development.
There are very few people in the world that are as dedicated to their craft as Udi is to his. At ALT.NET Seattle, Udi explained many core ideas about SOA. The team that I brought with me found his workshop and other talks the highlight of the event and provided the most value to us and our organization. I am thrilled to have the opportunity to recommend him.”

Eytan Michaeli, CTO Korentec
“Udi was responsible for a major project in the company, and as a chief architect designed a complex multi server C4I system with many innovations and excellent performance.”

Carl Kenne, .Net Consultant at Dotway AB
“Udi's session "DDD in Enterprise apps" was truly an eye opener. Udi has a great ability to explain complex enterprise designs in a very comprehensive and inspiring way. I've seen several sessions on both DDD and SOA in the past, but Udi puts it in a completely new perspective and makes us understand what it's all really about. If you ever have a chance to see any of Udi's sessions in the future, take it!”

Avi Nehama, R&D Project Manager at Retalix
“Not only is Udi a brilliant software architecture consultant, he also has remarkable abilities to present complex ideas in a simple and concise manner, and... always with a smile. Udi is indeed a top-league professional!”

Ben Scheirman, Lead Developer at CenterPoint Energy
“Udi is one of those rare people who not only deeply understands SOA and domain driven design, but also eloquently conveys that in an easy to grasp way. He is patient, polite, and easy to talk to. I'm extremely glad I came to his workshop on SOA.”

Scott C. Reynolds, Director of Software Engineering at CBLPath
“Udi is consistently advancing the state of thought in software architecture, service orientation, and domain modeling.
His mastery of the technologies and techniques is second to none, but he pairs that with a singular ability to listen and communicate effectively with all parties, technical and non, to help people arrive at context-appropriate solutions. Every time I have worked with Udi, or attended a talk of his, or just had a conversation with him I have come away from it enriched with new understanding about the ideas discussed.”

Evgeny-Hen Osipow, Head of R&D at PCLine
“Udi has helped PCLine on projects by implementing architectural blueprints demonstrating the value of simple design and code.”

Rhys Campbell, Owner at Artemis West
“For many years I have been following the works of Udi. His explanations of often complex design and architectural concepts are so cleanly broken down that even the most junior of architects can begin to understand these concepts. These concepts however tend to typify the "real world" problems we face daily so even the most experienced software expert will find himself in an "Aha!" moment when following Udi's teachings.
It was a pleasure to finally meet Udi in Seattle Alt.Net OpenSpaces 2008, where I was pleasantly surprised at how down-to-earth and approachable he was. His depth and breadth of software knowledge also became apparent when discussion with his peers quickly dove deep into the problems we currently face. If given the opportunity to work with or recommend Udi I would quickly take that chance. When I think .Net Architecture, I think Udi.”

Sverre Hundeide, Senior Consultant at Objectware
“Udi had been hired to present the third LEAP master class in Oslo. He is a well-known international expert on enterprise software architecture and design, and is the author of the open source messaging framework nServiceBus. The entire class was based on discussion and interaction with the audience, and the only Power Point slide used was the one showing the agenda.
He started out with sketching a naive traditional n-tier application (big ball of mud), and based on suggestions from the audience we explored different solutions which might improve the solution. Whatever suggestions we threw at him, he always had a thoroughly considered answer describing pros and cons with the suggested solution. He obviously has a lot of experience with real world enterprise SOA applications.”

Raphaël Wouters, Owner/Managing Partner at Medinternals
“I attended Udi's excellent course 'Advanced Distributed System Design with SOA and DDD' at Skillsmatter. Few people can truly claim such a high skill and expertise level, present it using a pragmatic, concrete no-nonsense approach and still stay reachable.”

Nimrod Peleg, Lab Engineer at Technion IIT
“One of the best programmers and software engineers I've ever met: creative, knows how to design and implement, very collaborative, and finally - the applications he designed and implemented have worked for many years without any problems!”

Jose Manuel Beas
“When I attended Udi's SOA Workshop, it suddenly changed my view of what Service Oriented Architectures were all about. Udi explained complex concepts very clearly and created a very productive discussion environment where all the attendees could learn a lot. I strongly recommend hiring Udi.”

Daniel Jin, Senior Lead Developer at PJM Interconnection
“Udi is one of the top SOA gurus in the .NET space. He is always eager to help others by sharing his knowledge and experiences. His blog articles often offer deep insights and are an invaluable resource. I highly recommend him.”

Pasi Taive, Chief Architect at Tieto
“I attended both of Udi's "UI Composition Key to SOA Success" and "DDD in Enterprise Apps" sessions and they were exceptionally good. I will definitely participate in his sessions again. Udi is a great presenter and has the ability to explain complex issues in a manner that everyone understands.”

Eran Sagi, Software Architect at HP
“Until now, I had heard about Service Oriented Architecture everywhere. Everyone mentions it – the big buzzword. But when I actually asked someone what it really means, no one managed to give me a completely satisfying answer. Finally, in his excellent course “Advanced Distributed Systems”, I got the answers I was looking for. Udi went over the different motivations (principles) of Service Orientation, explained them well one by one, and showed how each one could be technically addressed using NServiceBus. In his course, Udi also explains the way of thinking needed when designing a Service Oriented system: the questions you need to ask yourself in order to shape your system and place the logic in the right places for the best Service Oriented system.

I would recommend this course to any architect or developer who deals with distributed systems, and not only to them. In my work we do not have a real distributed system, but one PC which hosts both the UI application and the different services inside, all communicating via WCF. I found that many of the architecture principles and motivations of SOA apply to our system as well. It is enough to have software partitioned into components for most of the principles to become relevant to you as well. Bottom line – an excellent course recommended to any SW Architect, or any developer dealing with distributed systems.”
