Udi Dahan – The Software Simplist
Enterprise Development Expert & SOA Specialist

Archive for the ‘Messaging’ Category

Saga Persistence and Event-Driven Architectures

Monday, April 20th, 2009

When working with clients, I run into more than a couple of people who have difficulty with event-driven architecture (EDA). Even more people have difficulty understanding what sagas really are, let alone why they need to use them. I’d go so far as to say that many people don’t realize the importance of how sagas are persisted in making it all work (including the Workflow Foundation team).

The common e-commerce example

We accept orders, bill the customer, and then ship them the product.

Fairly straight-forward.

Since each part of that process can be quite complex, let’s have each step be handled by a service:

Sales, Billing, and Shipping. Each of these services will publish an event when it’s done its part. Sales will publish OrderAccepted containing all the order information – order Id, customer Id, products, quantities, etc. Billing will publish CustomerBilledForOrder containing the customer Id, order Id, etc. And Shipping will publish OrderShippedToCustomer with its data.
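As a sketch, those event contracts might look something like the following – the exact shape is up to each service, and any details beyond the properties mentioned above are illustrative, not the canonical contracts:

```csharp
// Illustrative event definitions only - the real contracts would be
// owned by Sales and Billing respectively.
public class OrderAccepted : IMessage
{
    public Guid OrderId { get; set; }
    public Guid CustomerId { get; set; }
    public List<Guid> ProductIdsInOrder { get; set; }
}

public class CustomerBilledForOrder : IMessage
{
    public Guid OrderId { get; set; }
    public Guid CustomerId { get; set; }
}
```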

So far, so good. EDA and SOA seem to be providing us some value.

Where’s the saga?

Well, let’s consider the behavior of the Shipping service. It shouldn’t ship the order to the customer until it has received the CustomerBilledForOrder event as well as the OrderAccepted event. In other words, Shipping needs to hold on to the state that came in the first event until the second event comes in. And this is exactly what sagas are for.

Let’s take a look at the saga code that implements this. In order to simplify the sample a bit, I’ll be omitting the product quantities.

    public class ShippingSaga : Saga<ShippingSagaData>,
        ISagaStartedBy<OrderAccepted>,
        ISagaStartedBy<CustomerBilledForOrder>
    {
        public void Handle(OrderAccepted message)
        {
            this.Data.ProductIdsInOrder = message.ProductIdsInOrder;
        }

        public void Handle(CustomerBilledForOrder message)
        {
            this.Bus.Send<ShipOrderToCustomer>(m =>
            {
                m.CustomerId = message.CustomerId;
                m.OrderId = message.OrderId;
                m.ProductIdsInOrder = this.Data.ProductIdsInOrder;
            });

            this.MarkAsComplete();
        }

        public override void Timeout(object state)
        {
        }
    }

First of all, this looks fairly simple and straightforward, which is good.
It’s also wrong, which is not so good.

One problem we have here is that events may arrive out of order – first CustomerBilledForOrder, and only then OrderAccepted. What would happen in the above saga in that case? Well, we wouldn’t end up shipping the products to the customer, and customers tend not to like that (for some reason).

There’s also another problem here. See if you can spot it as I go through the explanation of ISagaStartedBy<T>.

Saga start up and correlation

The “ISagaStartedBy<T>” that is implemented for both messages indicates to the infrastructure (NServiceBus) that when a message of that type arrives, if an existing saga instance cannot be found, that a new instance should be started up. Makes sense, doesn’t it? For a given order, when the OrderAccepted event arrives first, Shipping doesn’t currently have any sagas handling it, so it starts up a new one. After that, when the CustomerBilledForOrder event arrives for that same order, the event should be handled by the saga instance that handled the first event – not by a new one.

I’ll repeat the important part: “the event should be handled by the saga instance that handled the first event”.

Since the only information we stored in the saga was the list of products, how would we be able to look up that saga instance when the next event came in containing an order Id, but no saga Id?

OK, so we need to store the order Id from the first event so that when the second event comes along we’ll be able to find the saga based on that order Id. Not too complicated, but something to keep in mind.

Let’s look at the updated code:

    public class ShippingSaga : Saga<ShippingSagaData>,
        ISagaStartedBy<OrderAccepted>,
        ISagaStartedBy<CustomerBilledForOrder>
    {
        public void Handle(CustomerBilledForOrder message)
        {
            this.Data.CustomerHasBeenBilled = true;
            this.Data.CustomerId = message.CustomerId;
            this.Data.OrderId = message.OrderId;

            this.CompleteIfPossible();
        }

        public void Handle(OrderAccepted message)
        {
            this.Data.ProductIdsInOrder = message.ProductIdsInOrder;
            this.Data.CustomerId = message.CustomerId;
            this.Data.OrderId = message.OrderId;

            this.CompleteIfPossible();
        }

        private void CompleteIfPossible()
        {
            if (this.Data.ProductIdsInOrder != null && this.Data.CustomerHasBeenBilled)
            {
                this.Bus.Send<ShipOrderToCustomer>(m =>
                {
                    m.CustomerId = this.Data.CustomerId;
                    m.OrderId = this.Data.OrderId;
                    m.ProductIdsInOrder = this.Data.ProductIdsInOrder;
                });

                this.MarkAsComplete();
            }
        }
    }

And that brings us to…

Saga persistence

We already saw why Shipping needs to be able to look up its internal sagas using data from the events, but what that means is that simple blob-type persistence of those sagas is out. NServiceBus comes with an NHibernate-based saga persister for exactly this reason, though any persistence mechanism which allows you to query on something other than saga Id would work just as well.

Let’s take a quick look at the saga data that we’ll be storing and see how simple it is:

    public class ShippingSagaData : ISagaEntity
    {
        public virtual Guid Id { get; set; }
        public virtual string Originator { get; set; }
        public virtual Guid OrderId { get; set; }
        public virtual Guid CustomerId { get; set; }
        public virtual List<Guid> ProductIdsInOrder { get; set; }
        public virtual bool CustomerHasBeenBilled { get; set; }
    }

You might have noticed the “Originator” property in there and wondered what it is for. First of all, the ISagaEntity interface requires the two properties Id and Originator. Originator is used to store the return address of the message that started the saga. Id is for what you think it’s for. In this saga, we don’t need to send any messages back to whoever started the saga, but in many others we do. In those cases, we’ll often be handling a message from some other endpoint when we want to possibly report some status back to the client that started the process. By storing that client’s address the first time, we can then “ReplyToOriginator” at any point in the process.
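In a saga that does report back, the usage looks roughly like this – a sketch only, where OrderShippedToCustomer arriving at the saga and the OrderStatusChanged message type are hypothetical additions for illustration:

```csharp
public void Handle(OrderShippedToCustomer message)
{
    // Originator was captured automatically when the saga started,
    // so we can reach the original requester at any later point.
    this.ReplyToOriginator(new OrderStatusChanged
    {
        OrderId = this.Data.OrderId,
        Status = "Shipped" // hypothetical payload
    });
}
```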

The manufacturing sample that comes with NServiceBus shows how this works.

Saga Lookup

Earlier, we saw the need to search for sagas based on order Id. The way to hook into the infrastructure and perform these lookups is by implementing “IFindSagas<T>.Using<M>” where T is the type of the saga data and M is the type of message. In our example, doing this using NHibernate would look like this:

    public class ShippingSagaFinder :
        IFindSagas<ShippingSagaData>.Using<OrderAccepted>,
        IFindSagas<ShippingSagaData>.Using<CustomerBilledForOrder>
    {
        public ShippingSagaData FindBy(CustomerBilledForOrder message)
        {
            return FindBy(message.OrderId);
        }

        public ShippingSagaData FindBy(OrderAccepted message)
        {
            return FindBy(message.OrderId);
        }

        private ShippingSagaData FindBy(Guid orderId)
        {
            return sessionFactory.GetCurrentSession()
                .CreateCriteria(typeof(ShippingSagaData))
                .Add(Expression.Eq("OrderId", orderId))
                .UniqueResult<ShippingSagaData>();
        }

        private ISessionFactory sessionFactory;

        public virtual ISessionFactory SessionFactory
        {
            get { return sessionFactory; }
            set { sessionFactory = value; }
        }
    }

For a performance boost, we’d probably index our saga data by order Id.
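With NHibernate’s XML mappings, that could look something like this fragment – the table and index names are assumptions, and the ProductIdsInOrder collection mapping is omitted for brevity:

```xml
<!-- Sketch of a mapping with an index on OrderId for fast saga lookup -->
<class name="ShippingSagaData" table="ShippingSagaData">
  <id name="Id" />
  <property name="Originator" />
  <property name="OrderId" index="IX_ShippingSagaData_OrderId" />
  <property name="CustomerHasBeenBilled" />
</class>
```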

On concurrency

Another important note is that for this saga, if both messages were handled in parallel on different machines, the saga could get stuck. The persistence mechanism here needs to prevent this. When using NHibernate over a database with the appropriate isolation level (Repeatable Read – the default in NServiceBus), this “just works”. If/When implementing your own saga persistence mechanism, it is important to understand the kind of concurrency your business logic can live with.

Take a look at Ayende’s example for mobile phone billing to get a feeling for what that’s like.


In almost any event-driven architecture, you’ll have services correlating multiple events in order to make decisions. The saga pattern is a great fit there, and not at all difficult to implement. You do need to take into account that events may arrive out of order and implement the saga logic accordingly, but it’s really not that big a deal. Do take the time to think through what data will need to be stored in order for the saga to be fault-tolerant, as well as a persistence mechanism that will allow you to look up that data based on event data.

If you feel like giving this approach a try, but don’t have an environment handy for this, download NServiceBus and take a look at the samples. It’s really quick and easy to get set up.

Messaging ROI

Sunday, February 22nd, 2009

There’s been some recent discussion as to the “cost” of messaging:

Greg Young asserts:

“I believe that this shows there to be a rather negligible cost associated with the use of such a model. There is however a small cost, this cost however I believe only exists when one looks at the system in isolation.”

Ayende adds his perspective:

“The cost of messaging, and a very real one, comes when you need to understand the system. In a system where message exchange is the form of communication, it can be significantly harder to understand what is going on.”

Of course, both these intelligent fellows are right. The reason for the apparent disparity in viewpoints has to do with which part of the following graph you look at. Ayende zooms in on the left side:

left graph

As systems get larger, though, the only way to understand them is by working at higher levels of abstraction. That’s where messaging really shines, as the incremental complexity remains the same by maintaining the same modularity as before:

full graph

In Ayende’s post, he follows the design I described a while back on using messaging for user management and login in a high-scale web scenario. In his comments, he agrees with the above, stating:

“I certainly think that a similar solution using RPC would be much more complex and likely more brittle.”

I feel quite conservative in saying that most enterprise solutions fall on the right side of the intersection in the graph.

That being said, don’t underestimate the learning curve developers go through with messaging. While the mechanics are similar, the mindset is very different. Think about it like this:

You’ve driven a car for years in the US. It’s practically second nature. Then you fly to the UK, rent a car, and all of a sudden, your brain is in meltdown. (or vice versa for those going from the UK to the US)


If you are going down the messaging route, please be aware that there are shades of gray there as well. You don’t have to implement your user management and login the way I outlined in my post if you don’t require such high levels of scalability, but even lower levels of scalability can benefit from messaging.

Just as there isn’t a single correct design for non-messaging solutions, the same is true for those using messaging. Finding the right balance is tricky, and critical.

When the code is simple in every part of the system, and the asynchronous interactions are what provide for the necessary complexity the problem domain requires, that’s when you know you’ve got it just right.

Lost Notifications? No Problem.

Sunday, December 7th, 2008

One of the most common questions I get on the topic of pub/sub messaging is what happens if a notification is lost. Interestingly enough, there are some who almost entirely write-off this pattern because of this issue, preferring the control of request/response-exception. So, what should be done about lost messages? The short answer is durable messaging. The long answer is design.

Durable Messaging

In order to prevent a message from being lost when it is sent from a publisher to a subscriber, the message is written to disk on the publisher side, and then forwarded to the subscriber, where it is also written to disk. This store-and-forward mechanism enables our systems to gracefully recover from either side being temporarily unavailable.

In my MSDN article on this topic, I outlined some problems with this approach. These problems are exacerbated for publishers. Imagine a publisher with 40 subscribers, publishing 10 messages a second, each containing 1MB of XML. If 10 of the subscribers are unavailable, that’s 100MB of data being written to the publisher’s disk every second, 6GB every minute. That’s liable to bring down a publisher before an administrator brews a cup of coffee.

Publishers have no choice but to throw away messages after a certain period of time.

Publisher Contracts

The whole issue of contracts and schema is considered one of the better understood parts of SOA. Unfortunately, the operational aspects of service contracts are hardly ever taken into account.

On top of the schema of the messages a service publishes, additional information is needed in the contract:

  1. How big will this message be?
  2. How often will it be published?
  3. How long will this message be stored if a subscriber is unavailable?

The first two pieces of information are important for subscribers to do load and capacity planning. The last one is the most important, as it dictates the required availability and fault-tolerance characteristics of subscribers.
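There’s no standard place to put this operational information; purely as an illustration, it could be published alongside the schema as a simple document (all element names here are made up):

```xml
<!-- Hypothetical operational contract - element names are illustrative -->
<publisherContract message="OrderAccepted">
  <maxMessageSize>1MB</maxMessageSize>
  <publishRate>up to 10 per second</publishRate>
  <retentionIfSubscriberUnavailable>48 hours</retentionIfSubscriberUnavailable>
</publisherContract>
```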

For Example

In the canonical retail scenario, when our sales service accepts an order, it publishes an order accepted event. Other services subscribed to this event include shipping, billing, and business intelligence.

While shipping and billing are highly available and able to keep up with the rate at which orders are accepted, the business intelligence service is not. BI has two main parts to it – a nightly batch that does the number crunching, and a UI for reporting off of the results of that number crunching. Some even do the reporting in a semi-offline fashion, emailing reports back to the user when they’re ready.

Furthermore, nobody’s going to invest in servers for making BI highly available.

And wasn’t the whole point of this publish/subscribe messaging to keep our services autonomous? That not all services have to have the same level of uptime?

Houston, do we have a problem?

Data Freshness

There is a glimmer of light in all this doom and gloom.

Not all services have the same data freshness requirements.

The business intelligence service above doesn’t need to know about orders the second they’re accepted. A daily roll-up would be fine, and an hourly roll-up brings us that much closer to “real time business intelligence”.

So, while BI is ready to accept the sales message schema, it would like a slightly different contract around it – fewer messages per unit of time, more data in each message.

From the operational perspective of the sales service, it would be cost effective to have fewer “online” subscribers. It could even take things a few steps further. Instead of using the regular messaging backbone for transmitting these hourly messages, it could use FTP. The data could even be zipped to take up even less space. Since the total data size is smaller than that of the corresponding online stream, can live on cheaper, larger storage, and serves a fairly small number of subscribers, these zipped, hourly updates can be kept around far longer.

If you’ve heard about consumer-driven contracts, this is it.

Note that we’re still talking about the same logical message schema.


It’s not that lost notifications aren’t a problem.

It’s that they feed the design process so that the resulting service ecosystem is set up in such a way that notifications won’t get lost. I know that sounds kind of recursive, but that’s how it works. Either subscribers take care of their SLA, allowing them to process the online stream of events, or they should subscribe to a different pipe (which will have different SLA requirements, but maybe they can deal with those).

It makes sense to have multiple pipes for the same logical schema.

It’s practically a necessity to make pub/sub a feasible solution.


Related Content

MSDN article on messaging and lost messages

Durable messaging dilemmas

Additional logic required for service autonomy

More in depth example on events and pub/sub between services

Consumer-Driven Contracts

Command Query Separation and SOA

Monday, August 11th, 2008

One of the common questions I receive from people starting to use nServiceBus is how one-way messaging fits with showing the user a grid (or list) of data. Thinking about publish/subscribe usually just gets them even more confused. Trying to resolve all this with Service Oriented Architecture leaves them wondering – why bother?

client server

In regular client-server development, the server is responsible for providing the client with all CRUD (create, read, update, and delete) capabilities. However, when users look at data, they often do not require it to be up to date to the second (given that they look at the same screen for several seconds to minutes at a time). As such, retrieving data from the same tables as those used for highly consistent transaction processing creates contention, resulting in poor performance for all CRUD actions under higher load.

A Scalable Solution

One of the common answers to this question is for the server/service to publish a message when data changes (say, as the result of processing a message) and for clients to subscribe to these messages. When such a notification arrives at a client, the client would cache the data it needs. Then, when the user wants to see a grid of data, that data is already on the client. Of course, this solution doesn’t work so well for older client machines (like some point of service devices) or if there are millions of rows of data.

The thing is that this solution is one implementation of a more general pattern – command query separation (CQS).

Command Query Separation

Wikipedia describes CQS as a pattern where "… every method should either be a command that performs an action, or a query that returns data to the caller, but not both. More formally, methods should return a value only if they are referentially transparent and hence possess no side effects."

Martin Fowler is less strict about the use of CQS allowing for exceptions: "Popping a stack is a good example of a modifier that modifies state. Meyer correctly says that you can avoid having this method, but it is a useful idiom. So I prefer to follow this principle when I can, but I’m prepared to break it to get my pop."

So, how does separating commands from queries and SOA help at all in getting data to and from a UI? The answer is based on Pat Helland’s thinking as described in his article Data on the Inside vs. Data on the Outside.

Services Cross Boxes

The biggest lie around SOA is that services run.

Let that sink in a second.

Sure services have runnable components, but that’s not why they’re important.

I’ll skip the books of background and cut to the chase:

Services communicate with each other using publish/subscribe and one-way messaging. Services have components inside them. Inside a service, these components can communicate with each other using synchronous RPC, or any other mechanism. Also, these components can reside on different machines.

This is broader than just scaling out a service. There can be service components running on the client as well as the server.


Combining these two concepts together, here’s what comes out:

In this solution there are two services that span both client and server – one in charge of commands (create, update, delete), the other in charge of queries (read). These services communicate only via messages – one cannot access the database of the other.

The command service publishes messages about changes to data, to which the query service subscribes. When the query service receives such notifications, it saves the data in its own data store which may well have a different schema (optimized for queries like a star schema).

The client component which is in charge of showing grids of data to the user behaves the same as it would in a regular layered/tiered architecture, using synchronous blocking request/response to get its data – SOA doesn’t change that.
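On the server side of the query service, the subscriber is essentially a denormalizer. A sketch of what that might look like – CustomerChanged, IQueryStore, and CustomerListItem are hypothetical types, not part of any library:

```csharp
// Sketch only - the store and message types here are assumptions.
public class CustomerChangedHandler : IMessageHandler<CustomerChanged>
{
    public IQueryStore Store { get; set; } // injected query-side store

    public void Handle(CustomerChanged message)
    {
        // Flatten the event into the shape the grids actually read -
        // possibly a star schema, possibly just a wide table.
        Store.Upsert(new CustomerListItem
        {
            CustomerId = message.CustomerId,
            DisplayName = message.FirstName + " " + message.LastName
        });
    }
}
```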

Composite Applications

Although the client side components of both the command and query services are hosted in the same process, they are very much independent of each other. That being said, from an interoperability perspective (the one that most people attribute to SOA), all of the client-side components will likely be developed using the same technology – although there are already ways to host Java code in .NET and vice-versa.

Of course, once we talk about web UIs, things are a bit different – but still similar. While there may be a level of independence on the web server side, for browser-side inter-component communication we’re still likely to target JavaScript. There, I’ve managed to say something technical supporting mashups and SOA without lying through my teeth.

On the Microsoft side with the recent release of the Composite Application Guidance & Library (pronounced "prism") I hope that more of these principles will be reaching the "smart client". The command pattern is especially critical in maintaining the separation while enabling communication to still occur so I’m glad that, as one of the Prism advisors, I was able to simplify that part (Glenn still has nightmares about that rooftop conversation).

Publish / Subscribe

In the "scalable solution" section up top I mentioned how publish/subscribe to the smart client is really just one implementation of CQS and SOA. So, how different is it really?

smart client pub/sub

Well, there will probably be a different technology mapping. Instead of a star-schema OLAP product, we might simply store the published data in memory on the client. That is, if you designed your components to be technology agnostic.

In terms of the use of nServiceBus, the same component is going to be subscribing to the same type of message – all that’s different is that now every client will have data pushed to it rather than this occurring server-side only.

You could have the same code deployed differently in the same system – stronger clients subscribing themselves, weaker ones using a remote server. Web servers would probably be considered stronger clients. This kind of flexible deployment has proven to be extremely valuable for my larger clients. The added benefit of enabling users to work (view data) even while offline (somewhere there’s no WIFI) is just icing on the cake.

A Word of Warning

Once the client starts receiving notifications, and handling those on a background thread (as it should) the code becomes susceptible to deadlocks and data races. Juval does a good job of outlining some of those with respect to the use of WCF. Prism doesn’t provide any assurances in this area either.


NServiceBus is not designed to be used for any and all types of communication in a given architecture. In the examples above, nServiceBus handles the publish/subscribe but leaves the synchronous RPC to existing solutions like WCF. Not only that, but synchronous RPC does have its place in architecture, just not across service boundaries. In all cases, data is served to users from a store different from that which transaction processing logic uses.

Command Query Separation is not only a good idea at the method/class level but has advantages at the SOA/System level as well – yet another good idea from 20 years ago that services build upon. Making use of CQS requires understanding your data and its uses – SOA builds on that by looking into data volatility and the freshness business requirements around it.

Finally, designing the components of your services in such a way that their dependency on technology is limited buys a lot of flexibility in terms of deployment and, consequently, significant performance and scalability gains.

Simple, it is. Easy, it is not.

Scaling Long Running Web Services

Wednesday, July 30th, 2008

While I was at TechEd USA I had an attendee, Will, come up and ask me an interesting question about how to handle web service calls that can take a long time to complete. He has a number of these kinds of requests ranging from computationally intensive tasks to those requiring sifting through large amounts of data. What Will was having problems with was preventing too many of these resource-intensive tasks from running concurrently (causing increased memory usage, paging, and eventually the server becoming unavailable).

For comparison later, here’s a diagram showing the trivial interaction:


One solution that he’d tried was to set up the web server to throttle those requests and keep a much smaller maximum thread-pool size for that application pool. The unfortunate side effect of that solution was that clients would get “turned away” by a not-so-pleasant Connection Refused exception.

Will had been to my web scalability talk and was curious about how I was using queues behind my web services. I’ve also heard this question from people just getting started with nServiceBus when looking at the Web Services Bridge sample. Here’s the code that’s in the sample and in just a second I’ll tell you why you shouldn’t do this:

public ErrorCodes Process(Command request)
{
    object result = ErrorCodes.None;

    IAsyncResult sync = Global.Bus.Send(request).Register(
        delegate(IAsyncResult asyncResult)
        {
            CompletionResult completionResult =
                asyncResult.AsyncState as CompletionResult;
            if (completionResult != null)
                result = (ErrorCodes)completionResult.ErrorCode;
        },
        null);

    sync.AsyncWaitHandle.WaitOne(); // block until the response arrives

    return (ErrorCodes)result;
}

Let me repeat, this is demo-ware. Do not use this in production.

What’s happening is that in this web service call we’re putting a message in a queue for some other process/machine to process. When that processing is complete, we’ll get a message back in our local queue (which you don’t see) which is correlated to our original request, firing off the callback. We block the web method from completing (using the WaitOne call) thus keeping the HTTP connection to the client open.

The problem here is that we’re wasting resources (the HTTP connection and the thread) while waiting for a response which, as already mentioned, can take a long time. In B2B or other server to server integration environments there are all sorts of middleware solutions that help us solve these problems, however in Will’s case browsers needed to interact with this web service. All he had was HTTP.

HTTP Solutions

Another attendee who was listening in (sorry I don’t remember your name) said that he was solving similar problems using polling but that he was having scalability problems as well.

What often surprises my clients when we deal with these same issues is that I do suggest a polling based solution, but one that still uses messaging, and this is what I described to Will:

Since we can’t actually push a message to a browser over HTTP from our server when processing is complete, the browser itself will be responsible for pulling the response. We still don’t want to leave costly resources like HTTP connections open for a long time; however, if the browser is going to poll for a response, we’ll need some way to correlate those follow-up requests with the original one. What we’re going to do is use the Asynchronous Completion Token pattern, and later I’ll show how to optimize it for web server technology.

Basic Polling


When the browser calls the web service, the web service will generate a Guid, put it in the message that it sends for processing, and return that guid to the browser. When the processing of the message is complete, the result will be written to some kind of database, indexed by that guid. The browser will periodically call another web method, passing in the guid it previously received as a parameter. That web method will check the database for a response using the guid, returning null if no response is there. If the browser receives a null response, it will “sleep” a bit and then retry.
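A sketch of the two web methods involved – the response store is hypothetical, and hanging the guid on the outgoing message is shown in the simplest possible way:

```csharp
[WebMethod]
public Guid BeginProcess(Command request)
{
    Guid correlationId = Guid.NewGuid();
    request.CorrelationId = correlationId; // assumed property on the message

    Global.Bus.Send(request);              // fire and forget
    return correlationId;                  // the browser polls with this
}

[WebMethod]
public Response GetResult(Guid correlationId)
{
    // Returns null while processing is still in flight;
    // the browser sleeps a bit and retries on null.
    return ResponseStore.TryGet(correlationId); // hypothetical store
}
```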

One of the problems with this solution is that polling uses up server resources – both on the web server and our DB; threads, memory, DB connections. A better solution would decrease the resource cost of the polling. Let’s use the fundamental building blocks of the web to our advantage – HTTP GET and resources:

REST-full Polling

Instead of using a guid to represent the id of the response, let’s consider the REST principle of “everything’s a resource”. That would mean that the response itself would be a resource. And since every resource has a URI, we might as well use that URI in lieu of the guid. So, instead of our web service returning a guid, let’s return a URI – something like:

http://yourserver/responses/c9a646d3-9c61-4cb7-bfcd-ee2522c8f633
As you can see, the guid is still there. So, what’s different?


What’s different is that instead of having the processing code write the response to the database, it writes it to a resource. This can be done by writing some XML to a file on the SAN, in the case of a web farm. Also, the browser wouldn’t need to call a web service to get the response; it would just do an HTTP GET on the URI. If it gets an HTTP 404, it would sleep and retry as before. The reason the SAN is needed is that, as the browser polls, its requests may arrive at various web servers, so the response needs to be accessible from any one of them.

Just as an aside, it would be better to free the processing node as quickly as possible and have something else write the response to the SAN. That could be done simply by sending a message from the processing node to a different node whose only job is to write responses to disk.

The reason that the URI makes a difference is that serving “static” resources is something that web servers do extremely efficiently without requiring any managed resources (like ASP.NET threads). That’s a big deal.

We’re still using HTTP connections for the polling but that’s something whose effect can be mitigated to a certain degree.

Timed REST-full Polling

Since various requests can take varying amounts of time to process, it’s difficult to know at what rate the browser should poll. So, why don’t we have the web service tell it? As part of the response to the original web service call, instead of just returning a URI, we could also return the polling interval – 1 second, 5 seconds, whatever is appropriate for the type of request. This value could easily be made configurable per request type ([RequestType, PollingInterval]).
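As a sketch of that handshake (the request types and intervals here are made up for illustration, not from the post), the initial call could return both the resource URI and the polling interval:

```python
# Illustrative per-request-type polling intervals - [RequestType, PollingInterval].
POLLING_INTERVALS = {
    "quote": 1.0,     # quick requests: poll every second
    "report": 5.0,    # heavy requests: poll less aggressively
}

def accept_request(request_type, response_id):
    """Accept the work (enqueueing omitted) and tell the browser where
    to poll and how often."""
    return {
        "uri": "/responses/%s" % response_id,
        "poll_interval_seconds": POLLING_INTERVALS.get(request_type, 2.0),
    }
```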

An even more advanced solution would allow you to change these values dynamically, so that your operations team could better manage the load on your servers. When a large number of users are hitting your system, you could decrease the rate at which your servers are polled, leaving more HTTP connections for other users.

Scaling and Adaptive Polling

You’d probably also want to scale out the number of processing nodes behind your queue. The nice thing is that, as you scale the processing nodes per request type, you can also change the polling interval per request type, providing better responsiveness for the more critical requests. Once we add virtualization, things get really fun:

We had separate queues per request type, so that we could easily see the load we were under for each type of request. That way, we could scale out the processing nodes per request type as well as change the polling interval. By virtualizing our processing nodes, and writing scripts to monitor queue sizes, we had those scripts automatically provisioning (and de-provisioning) nodes as well as changing the polling interval of the browsers.

This had the enormous benefit of the system automatically shifting resources to maintain the appropriate relative allocation as the macroscopic make-up of the current load changed.


Will was well-pleased with the solution which, although more complicated than what he had originally tried, was flexible enough to meet his needs. As opposed to pure server-based solutions, here we make more use of the browser (writing our own Javascript) instead of putting our faith in some Ajax-y library. That’s not to say that you couldn’t wrap this up into a library – in essence, it is a kind of messaging transport for browser to server communication allowing duplex conversations.

In fact, this could be used to return multiple responses to the browser over a long period of time. Each response that comes back to the browser could include an additional URI where the next response will be found. This can be used for reporting the status of a long-running process, paging results, and many other scenarios.
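A small sketch of that chaining idea, assuming each response resource carries the URI of the next one. The `store` dict stands in for HTTP GETs against the server (which would poll on 404 as before):

```python
def follow_responses(store, first_uri):
    """Walk the chain of response resources, collecting each payload.
    In real life each lookup would be an HTTP GET (polling on 404);
    here `store` is a plain dict standing in for the server."""
    results, uri = [], first_uri
    while uri is not None:
        resource = store[uri]
        results.append(resource["payload"])
        uri = resource.get("next")   # absent on the final response
    return results
```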

And, one parting thought, could this not be used for all browser to web service communication?

Durable Messaging Dilemmas

Thursday, July 17th, 2008

I’ve received some great feedback on my MSDN article and some really great questions that I think more people are wondering about, so I think I’ll try to do a post per question and see how that goes.

Libor asks:

“Would you recommend using durable messaging for systems where there are similar requirements with respect to data reliability as you had – ie. not losing any messages? If so, then why didn’t the final version of your solution use it? If not, can you explain why?”

The answer is, as always, it depends – but here’s what it depends on:

When designing a system, we need to take a good, hard look at how we manage state, and what properties that state has. In a system of reasonable size we can expect various families of state with respect to their business value, data volatility, and fault-tolerance window. Each family needs to be treated differently. While durable messaging may be suitable for one, it may be overkill or underkill for another.

So, here’s what we’re going to be looking at:

  1. Business Value
  2. Data Volatility
  3. Fault-Tolerance Window

Business Value

When talking about business value, I want to examine what it means “not losing any messages”. The question is under what conditions the messages will not be lost – or rather, what are the threshold conditions where messages may start getting lost. If all our datacenters are nuked, we will lose data. It’s likely the business is OK with that (as much as can be expected under those circumstances). If a single server goes down, it’s likely the business would not be OK with losing messages containing financial data. However, if a message requesting the health of a server were to get lost under those same conditions, that would probably be alright. In other words, what does that message represent in business terms?

Data Volatility

Data volatility also has an impact. Let’s say that we’re building a financial trading system. The time between receiving an event (message) that the price of a certain financial instrument has changed and sending the message requesting to buy that security is critical – let’s say it has to be under 10ms. Now, some failure has occurred preventing our message from reaching its destination for 20ms. What should we do with that message? Should we keep it around, making sure it doesn’t get lost? Not in this domain. On the contrary, that message should be thrown away, as its “business lifetime” has been exceeded. Furthermore, even during that original period of 10ms, the use of durable messaging may make it close to impossible to maintain our response times.
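The “business lifetime” check can be sketched in a few lines – the numbers here are the illustrative ones from the paragraph above:

```python
def should_process(sent_at_ms, now_ms, business_lifetime_ms=10):
    """A message delayed past its business lifetime is worthless in this
    domain - discard it rather than working to guarantee its delivery."""
    return (now_ms - sent_at_ms) <= business_lifetime_ms
```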

Fault-Tolerance Window

These two topics feed into the third, more architectural one – the fault-tolerance window: for what period of time do we require fault tolerance, and with respect to how many (and what kind of) faults? This leads us into an analysis of how many machines we need to copy a message to before we release the calling thread. We’d also look at which datacenters those machines reside in. This will also impact (or be impacted by) the kinds of links we have to these datacenters if we want to maintain response times. These numbers will need to change when the system identifies a disaster – degrading itself to a lower level of fault-tolerance after a hurricane knocks out a datacenter, and returning to normal once it comes back up.

Re-Evaluating Durable Messaging

Durable messaging may be used at various points in each part of the solution, but we need to look at message size, the rate at which those messages are being written to disk, how fast the disk is, how much available disk we have (so we don’t make things worse in the case of degraded service), etc. Companies like Amazon also take into account disk failure rates, replacement rates (disks aren’t replaced immediately, you know), and many other factors when making these decisions.


Our job as architects when designing the system is to find that cost-benefit balance for the various parts of the system according to these very applicative parameters. No, it’s not easy. No, cloud computing will not magically solve all of this for us. But we are getting more technical tools to work with, operations staff is getting better at working with us in the design phase, and our thought processes are becoming more rigorous in dealing with the scary conditions of the real world.

To your question, Libor, as to why we didn’t eventually use durable messaging in our solution, the answer is that we solved the overall state management problem by setting up an applicative protocol with our partners which was resilient in the face of faults by using idempotent messages that could be resent as many times as necessary. You can read more about it here. This solution isn’t viable for other kinds of interactions but was just what we needed to get the job done.

Hope that helps.

[Podcast] Highly Scalable Web Architectures

Thursday, June 19th, 2008

For those people who couldn’t come to TechEd USA and didn’t see my talks on how to build highly scalable web architectures, you’re in luck – Craig, the man behind the Polymorphic Podcast, sat down with me and we chatted about the problems, common solutions, and effective tactics in this space. For those of you who were at TechEd and still didn’t come to my talk – what were you thinking?!


Check it out.

Some of this stuff is a bit counter-intuitive (and not readily supported by the tools available in Visual Studio) so please, do feel free to ask questions (in the comments below).

NServiceBus Performance

Wednesday, May 21st, 2008

I’ve gotten this question several times already, but now companies are beginning to look for performance comparisons in making decisions around the use of nServiceBus. It’s often compared to straight WCF, BizTalk, and now Neuron ESB. In Sam’s recent post he points to a case study of Neuron doing 28 million messages an hour. That’s far more than I’ve ever heard quoted for BizTalk.


Before giving some numbers, please keep in mind that high performance of system infrastructure does not necessarily by itself mean that the system above it is running that fast. For instance, you may have server heartbeats running really quickly but the time it takes to save a purchase order borders on a minute. So, please, take all benchmarks with a grain of salt, or two, or a whole shaker-full.

While I’m not at liberty to say on which specific domain/company these numbers were measured, I can say that we had the full gamut of “stateless services”, stateful services (sagas), number crunching, large data sets, many users, complex visualization, etc. Also, this wasn’t the largest installation of nServiceBus that I’m aware of, but it’s the one I have the most specific numbers for.


OK, so running the default nServiceBus distribution on MSMQ, on servers where the queue files themselves were on separate SCSI RAID disks, we were pumping around 1000 durable, transactionally processed messages per second, per server. That means that, similar to the Neuron case, no messages would be lost in the case of a single fault per server per window (time to replace a failed disk set at 3 hours from failure, through detection, to replacement per site – but that’s more an operational staffing concern than the technology itself).

So, that’s 3.6 million messages per hour per server, at full load. We had a total of 98 servers doing these kinds of processing, not including web servers, databases, etc. Keep in mind that web servers would be communicating with other servers using nServiceBus, but that would maybe be an unfair comparison to the Neuron numbers.

Server Breakdown

Anyway, the 48 number-crunching servers (blade centers) we had were at full load, so we were pumping more than 170 million messages an hour there. Keep in mind that those servers had a really fast backbone so weren’t held up by IO. Your environment may be different.

Another 30 (regular pizza boxes) were doing our sagas. Saga state was stored in a distributed in-memory “cache”, so once again IO wasn’t an issue for processing those messages. We were at about 70% utilization there, coming to just over 100 million messages an hour.

The last 20 were clustered boxes (fairly expensive) that handled the various nServiceBus distributor and timeout manager processes; they were at full load since they handled control messages for all the servers as well as dynamically routing the load. However, on those boxes we used much higher-performance disks for the messages, since they had to feed everything else – capable of doing, on average, around 5000 messages a second. That adds up to 360 million messages an hour.
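For those checking the arithmetic, the per-server and aggregate figures above line up like this:

```python
SECONDS_PER_HOUR = 60 * 60

per_server_hourly = 1000 * SECONDS_PER_HOUR        # 3.6 million durable msgs/hour/server
crunching_hourly = 48 * per_server_hourly          # 172.8M - "more than 170 million"
clustered_hourly = 20 * 5000 * SECONDS_PER_HOUR    # 360 million msgs/hour
```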

Unnecessary Durability

Later, we moved a bunch of messages that didn’t need all that durability and transactionality off the disks, pushing the total throughput over 1 billion messages an hour. That was about 100 million per hour durable, 900 million per hour non-durable. You can guess that we were left with plenty of IO to spare at that point while we weren’t yet pushing the limit of our memory.

One thing that’s important to understand is the size of the messages that didn’t require durability was less than 1MB, with most weighing in under 10KB. Also, since most of those messages were published, less state management was required around them, enabling us to further improve performance.


NServiceBus didn’t give us all that by itself. It was the result of skilled architects, developers, and operations staff working together for many iterations, deploying, monitoring, re-designing, etc. You need to understand your technology, your hardware, and your specific performance, availability, and fault-tolerance requirements if you want to get anywhere.

There’s no magic.

I didn’t see the number or kinds of servers involved in the Neuron case study, so this wasn’t ever really a comparison. Nor are we talking about the same system here.

So, please, don’t base your decisions on arbitrary numbers. Spend some time setting up a scaled down version of your target architecture with all the relevant technologies and measure. Be aware that you want high performance end to end, not just of the messaging part. At times, it makes sense to actively throw away messages (of the non-durable, published kind) to help a server come online faster especially after a restart.

Thus ends the tale of another “benchmark”.

Scalability Article up on InfoQ

Thursday, April 10th, 2008

I’ve published a new article on performance and scalability on InfoQ:

Spectacular Scalability with Smart Service Contracts

In this article, I attempt to debunk some of the myths around stateless-ness as the key to scalability.

Here’s how it starts:

It was a sunny day in June 2005 and our spirits were high as we watched the new ordering system we’d worked on for the past 2 years go live in our production environment. Our partners began sending us orders and our monitoring system showed us that everything looked good. After an hour or so, our COO sent out an email to our strategic partners letting them know that they should send their orders to the new system. 5 minutes later, one server went down. A minute after that, 2 more went down. Partners started calling in. We knew that we wouldn’t be seeing any of that sun for a while.

The system that was supposed to increase the profitability of orders from strategic partners crumbled. The then seething COO emailed the strategic partners again, this time to ask them to return to the old system. The weird thing was that although we had servers to spare, just a few orders from a strategic customer could bring a server to its knees. The system could scale to large numbers of regular partners, but couldn’t handle even a few strategic partners.

This is the story of what we did wrong, what we did to fix it, and how it all worked out.

Continue reading…

NServiceBus Explanations

Sunday, March 30th, 2008

Ayende’s been going over nServiceBus, seeing how it’s built, and raising various questions and concerns. I’ll begin by taking them from the outside, in – that is, first API questions, and then internal structure issues.


First of all, calling SendLocal on IBus takes all the logical messages passed in (params IMessage[] messages), wraps them in a single TransportMessage, and puts that physical message at the end of the local queue. This call is equivalent to calling “Send(TransportMessage m, string destination);” on ITransport, passing in transport.Address as the destination.

There are numerous advantages to having this method, but one is the most important.

When a client sends a service a set of messages using “void Send(params IMessage[] messages);”, the client is requesting that the server treat this batch of messages as a unit of work. Under certain conditions, the service may choose to ignore the client’s wishes – not least because the client has sent a ton of messages and the service doesn’t want ACID transactions that last a long time, as they hurt throughput. In this case the server would use an intercepting message handler to go over those messages and call SendLocal for each. In other words, the server can set up units of work as it sees fit – taking client preference into account as well.
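Here’s a rough sketch of that batch-splitting idea. It’s illustrative Python, not the actual nServiceBus API – the `FakeBus` and handler shapes are hypothetical stand-ins:

```python
class FakeBus:
    """Hypothetical stand-in for the bus: each send_local call wraps its
    messages in one transport message (one unit of work) on the local queue."""
    def __init__(self):
        self.local_queue = []

    def send_local(self, *messages):
        self.local_queue.append(list(messages))

class BatchSplittingHandler:
    """Intercepting handler: breaks one large incoming unit of work into one
    unit of work per message, so no single ACID transaction spans the batch."""
    def __init__(self, bus, threshold=10):
        self.bus = bus
        self.threshold = threshold

    def handle(self, messages):
        if len(messages) <= self.threshold:
            return True                    # small batch: let later handlers run
        for m in messages:
            self.bus.send_local(m)         # one transport message (and tx) each
        return False                       # stop the pipeline for this batch
```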

Other advantages include the ability to break apart complex or long-running logic into an “internal pipeline”. The Timeout Manager also makes use of this facility for “holding onto” messages until some condition occurs.


The reason that integers are used as error codes is just so that you can push enums through them. This is the simplest way to get errors back to the client. More importantly, we take into account who on the client would be interested in this data.

Clients are often built using MVC with an additional Service Agent layer. Service Agents deal with translating the intent of Controllers into messages. Controllers don’t know about messaging, nor should they. However, they need to know when something fails with calls they initiated. As such, they are the final consumer of these error-code-enums, and integers are used to express them; that way Controllers don’t need to take a dependency on nServiceBus.
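The enum-through-integer idea can be sketched like so (in Python rather than C#, with made-up error codes): the messaging layer defines the enum, but only the int crosses into the Controller.

```python
from enum import IntEnum

class OrderError(IntEnum):
    """Lives in the messaging/service-agent layer; illustrative codes."""
    NONE = 0
    INSUFFICIENT_FUNDS = 1
    OUT_OF_STOCK = 2

def controller_callback(error_code):
    """The Controller receives only the int, so it needs no reference to
    the messaging assembly; non-zero means the operation failed."""
    return error_code != 0
```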


This method on bus is used by intercepting message handlers in order to instruct the bus not to pass the current message on to subsequent handlers in the pipeline. This is often used by authentication and authorization handlers when those checks fail. This is what makes the message handling pipeline possible.


This method is defined on IBuilder and is used by the bus when dispatching messages to handlers. The reason that this exists instead of just having the bus ask the builder to create the handler and dispatch the call itself has to do with client-side threading. You can find the full explanation here – Object Builder, the place to fix system-wide threading bugs.


NServiceBus has grown over the years in environments where I’ve had the luxury of deciding most, if not all of the design of the systems involved. As such, it has taken on just the responsibilities needed from infrastructure in order to develop robust, flexible, and scalable systems. Check out the nServiceBus site.



Bryan Wheeler, Director Platform Development at msnbc.com
“Udi Dahan is the real deal.

We brought him on site to give our development staff the 5-day “Advanced Distributed System Design” training. The course profoundly changed our understanding and approach to SOA and distributed systems.

Consider some of the evidence: 1. Months later, developers still make allusions to concepts learned in the course nearly every day 2. One of our developers went home and made her husband (a developer at another company) sign up for the course at a subsequent date/venue 3. Based on what we learned, we’ve made constant improvements to our architecture that have helped us to adapt to our ever changing business domain at scale and speed If you have the opportunity to receive the training, you will make a substantial paradigm shift.

If I were to do the whole thing over again, I’d start the week by playing the clip from the Matrix where Morpheus offers Neo the choice between the red and blue pills. Once you make the intellectual leap, you’ll never look at distributed systems the same way.

Beyond the training, we were able to spend some time with Udi discussing issues unique to our business domain. Because Udi is a rare combination of a big picture thinker and a low level doer, he can quickly hone in on various issues and quickly make good (if not startling) recommendations to help solve tough technical issues.” November 11, 2010

Sam Gentile Sam Gentile, Independent WCF & SOA Expert
“Udi, one of the great minds in this area.
A man I respect immensely.”

Ian Robinson Ian Robinson, Principal Consultant at ThoughtWorks
"Your blog and articles have been enormously useful in shaping, testing and refining my own approach to delivering on SOA initiatives over the last few years. Over and against a certain 3-layer-application-architecture-blown-out-to- distributed-proportions school of SOA, your writing, steers a far more valuable course."

Shy Cohen Shy Cohen, Senior Program Manager at Microsoft
“Udi is a world renowned software architect and speaker. I met Udi at a conference that we were both speaking at, and immediately recognized his keen insight and razor-sharp intellect. Our shared passion for SOA and the advancement of its practice launched a discussion that lasted into the small hours of the night.
It was evident through that discussion that Udi is one of the most knowledgeable people in the SOA space. It was also clear why – Udi does not settle for mediocrity, and seeks to fully understand (or define) the logic and principles behind things.
Humble yet uncompromising, Udi is a pleasure to interact with.”

Glenn Block Glenn Block, Senior Program Manager - WCF at Microsoft
“I have known Udi for many years having attended his workshops and having several personal interactions including working with him when we were building our Composite Application Guidance in patterns & practices. What impresses me about Udi is his deep insight into how to address business problems through sound architecture. Backed by many years of building mission critical real world distributed systems it is no wonder that Udi is the best at what he does. When customers have deep issues with their system design, I point them Udi's way.”

Karl Wannenmacher Karl Wannenmacher, Senior Lead Expert at Frequentis AG
“I have been following Udi’s blog and podcasts since 2007. I’m convinced that he is one of the most knowledgeable and experienced people in the field of SOA, EDA and large scale systems.
Udi helped Frequentis to design a major subsystem of a large mission critical system with a nationwide deployment based on NServiceBus. It was impressive to see how he took the initial architecture and turned it upside down leading to a very flexible and scalable yet simple system without knowing the details of the business domain. I highly recommend consulting with Udi when it comes to large scale mission critical systems in any domain.”

Simon Segal Simon Segal, Independent Consultant
“Udi is one of the outstanding software development minds in the world today; his vast insights into Service Oriented Architectures and Smart Clients in particular are indeed a rare commodity. Udi is also an exceptional teacher and can help lead teams to fall into the pit of success. I would recommend Udi to anyone considering some Architectural guidance and support in their next project.”

Ohad Israeli Ohad Israeli, Chief Architect at Hewlett-Packard, Indigo Division
“When you need a man to do the job Udi is your man! No matter if you are facing near deadline deadlock or at the early stages of your development, if you have a problem Udi is the one who will probably be able to solve it, with his large experience at the industry and his widely horizons of thinking , he is always full of just in place great architectural ideas.
I am honored to have Udi as a colleague and a friend (plus having his cell phone on my speed dial).”

Ward Bell Ward Bell, VP Product Development at IdeaBlade
“Everyone will tell you how smart and knowledgeable Udi is ... and they are oh-so-right. Let me add that Udi is a smart LISTENER. He's always calibrating what he has to offer with your needs and your experience ... looking for the fit. He has strongly held views ... and the ability to temper them with the nuances of the situation.
I trust Udi to tell me what I need to hear, even if I don't want to hear it, ... in a way that I can hear it. That's a rare skill to go along with his command and intelligence.”

Eli Brin, Program Manager at RISCO Group
“We hired Udi as a SOA specialist for a large scale project. The development is outsourced to India. SOA is a buzzword used almost for anything today. We wanted to understand what SOA really is, and what is the meaning and practice to develop a SOA based system.
We identified Udi as the one that can put some sense and order in our minds. We started with a private customized SOA training for the entire team in Israel. After that I had several focused sessions regarding our architecture and design.
I will summarize it simply (as he is the software simplist): We are very happy to have Udi in our project. It has a great benefit. We feel good and assured with the knowledge and practice he brings. He doesn’t talk over our heads. We assimilated nServicebus as the ESB of the project. I highly recommend you to bring Udi into your project.”

Catherine Hole Catherine Hole, Senior Project Manager at the Norwegian Health Network
“My colleagues and I have spent five interesting days with Udi - diving into the many aspects of SOA. Udi has shown impressive abilities of understanding organizational challenges, and has brought the business perspective into our way of looking at services. He has an excellent understanding of the many layers from business at the top to the technical infrastructure at the bottom. He is a great listener, and manages to simplify challenges in a way that is understandable both for developers and CEOs, and all the specialists in between.”

Yoel Arnon Yoel Arnon, MSMQ Expert
“Udi has a unique, in depth understanding of service oriented architecture and how it should be used in the real world, combined with excellent presentation skills. I think Udi should be a premier choice for a consultant or architect of distributed systems.”

Vadim Mesonzhnik, Development Project Lead at Polycom
“When we were faced with a task of creating a high performance server for a video-tele conferencing domain we decided to opt for a stateless cluster with SQL server approach. In order to confirm our decision we invited Udi.

After carefully listening for 2 hours he said: "With your kind of high availability and performance requirements you don’t want to go with stateless architecture."

One simple sentence saved us from implementing a wrong product and finding that out after years of development. No matter whether our former decisions were confirmed or altered, it gave us great confidence to move forward relying on the experience, industry best-practices and time-proven techniques that Udi shared with us.
It was a distinct pleasure and a unique opportunity to learn from someone who is among the best at what he does.”

Jack Van Hoof Jack Van Hoof, Enterprise Integration Architect at Dutch Railways
“Udi is a respected visionary on SOA and EDA, whose opinion I most of the time (if not always) highly agree with. The nice thing about Udi is that he is able to explain architectural concepts in terms of practical code-level examples.”

Neil Robbins Neil Robbins, Applications Architect at Brit Insurance
“Having followed Udi's blog and other writings for a number of years I attended Udi's two day course on 'Loosely Coupled Messaging with NServiceBus' at SkillsMatter, London.

I would strongly recommend this course to anyone with an interest in how to develop IT systems which provide immediate and future fitness for purpose. An influential and innovative thought leader and practitioner in his field, Udi demonstrates and shares a phenomenally in depth knowledge that proves his position as one of the premier experts in his field globally.

The course has enhanced my knowledge and skills in ways that I am able to immediately apply to provide benefits to my employer. Additionally though I will be able to build upon what I learned in my 2 days with Udi and have no doubt that it will only enhance my future career.

I cannot recommend Udi, and his courses, highly enough.”

Nick Malik Nick Malik, Enterprise Architect at Microsoft Corporation
“You are an excellent speaker and trainer, Udi, and I've had the fortunate experience of having attended one of your presentations. I believe that you are a knowledgeable and intelligent man.”

Sean Farmar Sean Farmar, Chief Technical Architect at Candidate Manager Ltd
“Udi has provided us with guidance in system architecture and supports our implementation of NServiceBus in our core business application.

He accompanied us in all stages of our development cycle and helped us put vision into real life distributed scalable software. He brought fresh thinking, great in depth of understanding software, and ongoing support that proved as valuable and cost effective.

Udi has the unique ability to analyze the business problem and come up with a simple and elegant solution for the code and the business alike.
With Udi's attention to details, and knowledge we avoided pit falls that would cost us dearly.”

Børge Hansen Børge Hansen, Architect Advisor at Microsoft
“Udi delivered a 5 hour long workshop on SOA for aspiring architects in Norway. While keeping everyone awake and excited Udi gave us some great insights and really delivered on making complex software challenges simple. Truly the software simplist.”

Motty Cohen, SW Manager at KorenTec Technologies
“I know Udi very well from our mutual work at KorenTec. During the analysis and design of a complex, distributed C4I system - where the basic concepts of NServiceBus start to emerge - I gained a lot of "Udi's hours" so I can surely say that he is a professional, skilled architect with fresh ideas and unique perspective for solving complex architecture challenges. His ideas, concepts and parts of the artifacts are the basis of several state-of-the-art C4I systems that I was involved in their architecture design.”

Aaron Jensen Aaron Jensen, VP of Engineering at Eleutian Technology
“Awesome. Just awesome.

We’d been meaning to delve into messaging at Eleutian after multiple discussions with and blog posts from Greg Young and Udi Dahan in the past. We weren’t entirely sure where to start, how to start, what tools to use, how to use them, etc. Being able to sit in a room with Udi for an entire week while he described exactly how, why and what he does to tackle a massive enterprise system was invaluable to say the least.

We now have a much better direction and, more importantly, have the confidence we need to start introducing these powerful concepts into production at Eleutian.”

Gad Rosenthal Gad Rosenthal, Department Manager at Retalix
“A thinking person. Brought fresh and valuable ideas that helped us in architecting our product. When recommending a solution he supports it with evidence and detail so you can successfully act based on it. Udi's support "comes on all levels" - As the solution architect through to the detailed class design. Trustworthy!”

Chris Bilson Chris Bilson, Developer at Russell Investment Group
“I had the pleasure of attending a workshop Udi led at the Seattle ALT.NET conference in February 2009. I have been reading Udi's articles and listening to his podcasts for a long time and have always looked to him as a source of advice on software architecture.
When I actually met him and talked to him I was even more impressed. Not only is Udi an extremely likable person, he's got that rare gift of being able to explain complex concepts and ideas in a way that is easy to understand.
All the attendees of the workshop greatly appreciate the time he spent with us and the amazing insights into service oriented architecture he shared with us.”

Alexey Shestialtynov Alexey Shestialtynov, Senior .Net Developer at Candidate Manager
“I met Udi at Candidate Manager where he was brought in part-time as a consultant to help the company make its flagship product more scalable. For me, even after 30 years in software development, working with Udi was a great learning experience. I simply love his fresh ideas and architecture insights.
As we all know it is not enough to be armed with best tools and technologies to be successful in software - there is still human factor involved. When, as it happens, the project got in trouble, management asked Udi to step into a leadership role and bring it back on track. This he did in the span of a month. I can only wish that things had been done this way from the very beginning.
I look forward to working with Udi again in the future.”

Christopher Bennage Christopher Bennage, President at Blue Spire Consulting, Inc.
“My company was hired to be the primary development team for a large scale and highly distributed application. Since these are not necessarily everyday requirements, we wanted to bring in some additional expertise. We chose Udi because of his blogging, podcasting, and speaking. We asked him to review our architectural strategy as well as the overall viability of the project.
I was very impressed, as Udi demonstrated a broad understanding of the sorts of problems we would face. His advice was honest and unbiased and very pragmatic. Whenever I questioned him on particular points, he was able to back up his opinion with real life examples. I was also impressed with his clarity and precision. He was very careful to untangle the meaning of words that might be overloaded or otherwise confusing. While Udi's hourly rate may not be the cheapest, the ROI is undoubtedly a deal. I would highly recommend consulting with Udi.”

Robert Lewkovich, Product / Development Manager at Eggs Overnight
“Udi's advice and consulting were a huge time saver for the project I'm responsible for. The $ spent were well worth it and provided me with a more complete understanding of nServiceBus and most importantly in helping make the correct architectural decisions earlier thereby reducing later, and more expensive, rework.”

Ray Houston, Director of Development at TOPAZ Technologies
“Udi's SOA class made me smart - it was awesome.

The class was very well put together. The materials were clear and concise and Udi did a fantastic job presenting it. It was a good mixture of lecture, coding, and question and answer. I fully expected that I would be taking notes like crazy, but it was so well laid out that the only thing I wrote down the entire course was what I wanted for lunch. Udi provided us with all the lecture materials and everyone has access to all of the samples which are in the nServiceBus trunk.

Now I know why Udi is the "Software Simplist." I was amazed to find that all the code and solutions were indeed very simple. The patterns that Udi presented keep things simple by isolating complexity so that it doesn't creep into your day to day code. The domain code looks the same if it's running in a single process or if it's running in 100 processes.”

Ian Cooper, Team Lead at Beazley
“Udi is one of the leaders in the .Net development community, one of the truly smart guys who do not just get best architectural practice well enough to educate others but drives innovation. Udi consistently challenges my thinking in ways that make me better at what I do.”

Liron Levy, Team Leader at Rafael
“I met Udi when I worked as a team leader at Rafael. One of the most senior managers there knew Udi because he had done a superb architecture job on another Rafael project, and he recommended bringing him on board to help the project I was leading.
Udi brought with him fresh solutions and invaluable deep architecture insights. He is an authority on SOA (service oriented architecture) and this was a tremendous help in our project.
On the personal level - Udi is a great communicator and can persuade even the most difficult audiences (I was part of such an audience myself..) by bringing sound explanations that draw on his extensive knowledge in the software business. Working with Udi was a great learning experience for me, and I'll be happy to work with him again in the future.”

Adam Dymitruk, Director of IT at Apara Systems
“I met Udi for the first time at DevTeach in Montreal back in early 2007. While Udi is usually involved in SOA subjects, his knowledge spans all of a software development company's concerns. I would not hesitate to recommend Udi for any company that needs excellent leadership, mentoring, problem solving, application of patterns, implementation of methodologies and straight out solution development.
There are very few people in the world that are as dedicated to their craft as Udi is to his. At ALT.NET Seattle, Udi explained many core ideas about SOA. The team that I brought with me found his workshop and other talks the highlight of the event and provided the most value to us and our organization. I am thrilled to have the opportunity to recommend him.”

Eytan Michaeli, CTO at Korentec
“Udi was responsible for a major project in the company, and as a chief architect designed a complex multi server C4I system with many innovations and excellent performance.”

Carl Kenne, .Net Consultant at Dotway AB
“Udi's session "DDD in Enterprise apps" was truly an eye opener. Udi has a great ability to explain complex enterprise designs in a very comprehensive and inspiring way. I've seen several sessions on both DDD and SOA in the past, but Udi puts it in a completely new perspective and makes us understand what it's all really about. If you ever have a chance to see any of Udi's sessions in the future, take it!”

Avi Nehama, R&D Project Manager at Retalix
“Not only is Udi a brilliant software architecture consultant, he also has remarkable abilities to present complex ideas in a simple and concise manner, and...
always with a smile. Udi is indeed a top-league professional!”

Ben Scheirman, Lead Developer at CenterPoint Energy
“Udi is one of those rare people who not only deeply understands SOA and domain driven design, but also eloquently conveys that in an easy to grasp way. He is patient, polite, and easy to talk to. I'm extremely glad I came to his workshop on SOA.”

Scott C. Reynolds, Director of Software Engineering at CBLPath
“Udi is consistently advancing the state of thought in software architecture, service orientation, and domain modeling.
His mastery of the technologies and techniques is second to none, but he pairs that with a singular ability to listen and communicate effectively with all parties, technical and non, to help people arrive at context-appropriate solutions. Every time I have worked with Udi, or attended a talk of his, or just had a conversation with him I have come away from it enriched with new understanding about the ideas discussed.”

Evgeny-Hen Osipow, Head of R&D at PCLine
“Udi has helped PCLine on projects by implementing architectural blueprints demonstrating the value of simple design and code.”

Rhys Campbell, Owner at Artemis West
“For many years I have been following the works of Udi. His explanations of often complex design and architectural concepts are so cleanly broken down that even the most junior of architects can begin to understand them. These concepts, however, tend to typify the "real world" problems we face daily, so even the most experienced software expert will find himself in an "Aha!" moment when following Udi's teachings.
It was a pleasure to finally meet Udi at Seattle Alt.Net OpenSpaces 2008, where I was pleasantly surprised at how down-to-earth and approachable he was. His depth and breadth of software knowledge also became apparent when discussion with his peers quickly dove deep into the problems we currently face. If given the opportunity to work with or recommend Udi I would quickly take that chance. When I think .Net architecture, I think Udi.”

Sverre Hundeide, Senior Consultant at Objectware
“Udi had been hired to present the third LEAP master class in Oslo. He is a well-known international expert on enterprise software architecture and design, and is the author of the open source messaging framework nServiceBus. The entire class was based on discussion and interaction with the audience, and the only PowerPoint slide used was the one showing the agenda.
He started out by sketching a naive traditional n-tier application (big ball of mud), and based on suggestions from the audience we explored different approaches that might improve it. Whatever suggestions we threw at him, he always had a thoroughly considered answer describing the pros and cons of the suggested solution. He obviously has a lot of experience with real world enterprise SOA applications.”

Raphaël Wouters, Owner/Managing Partner at Medinternals
“I attended Udi's excellent course 'Advanced Distributed System Design with SOA and DDD' at Skillsmatter. Few people can truly claim such a high skill and expertise level, present it using a pragmatic, concrete no-nonsense approach and still stay reachable.”

Nimrod Peleg, Lab Engineer at Technion IIT
“One of the best programmers and software engineers I've ever met: creative, knows how to design and implement, very collaborative, and finally - the applications he designed and implemented have worked for many years without any problems!”

Jose Manuel Beas
“When I attended Udi's SOA Workshop, it suddenly changed my view of what Service Oriented Architectures are all about. Udi explained complex concepts very clearly and created a very productive discussion environment where all the attendees could learn a lot. I strongly recommend hiring Udi.”

Daniel Jin, Senior Lead Developer at PJM Interconnection
“Udi is one of the top SOA gurus in the .NET space. He is always eager to help others by sharing his knowledge and experiences. His blog articles often offer deep insights and are an invaluable resource. I highly recommend him.”

Pasi Taive, Chief Architect at Tieto
“I attended both of Udi's "UI Composition Key to SOA Success" and "DDD in Enterprise Apps" sessions and they were exceptionally good. I will definitely participate in his sessions again. Udi is a great presenter and has the ability to explain complex issues in a manner that everyone understands.”

Eran Sagi, Software Architect at HP
“So far, I had heard about service-oriented architecture everywhere. Everyone mentions it – the big buzzword. But when I actually asked someone what it really means, no one managed to give me a completely satisfying answer. Finally, in his excellent course “Advanced Distributed Systems”, I got the answers I was looking for. Udi went over the different motivations (principles) of service orientation, explained them well one by one, and showed how each one could be technically addressed using NServiceBus. In his course, Udi also explained the way of thinking required when designing a service-oriented system: what questions you need to ask yourself in order to shape your system and place the logic in the right places for the best service-oriented design.

I would recommend this course to any architect or developer who deals with distributed systems, but not only to them. In my work we do not have a real distributed system, but one PC which hosts both the UI application and the different services inside, all communicating via WCF. I found that many of the architecture principles and motivations of SOA apply to our system as well. It is enough that your software is partitioned into components for most of the principles to become relevant. Bottom line – an excellent course, recommended to any software architect or any developer dealing with distributed systems.”

Creative Commons License  © Copyright 2005-2011, Udi Dahan. email@UdiDahan.com