Asynchronous, High-Performance Login for Web Farms
Saturday, November 10th, 2007.
Often during my consulting engagements I run into people who say, "some things just can’t be made asynchronous," even after they agree about the inherent scalability that asynchronous communication patterns bring. One often-cited example is user authentication: taking a username and password combo and authenticating it against some back-end store. For the purpose of this post, I’m going to assume a database. Also, I’m not going to be showing more advanced features like ETags to further improve the solution.
The Setup
Just so that the example is in itself secure, we’ll assume that the password is one-way hashed before being stored. Also, given a reasonable network infrastructure our web servers will be isolated in the DMZ and will have to access some application server which, in turn, will communicate with the DB. There’s also a good chance for something like round-robin load-balancing between web servers, especially for things like user login.
Before diving into the meat of it, I wanted to preface with a few words. One of the commonalities I’ve found when people dismiss asynchrony is that they don’t consider a real deployment environment, or scaling up a solution to multiple servers, farms, or datacenters.
The Synchronous Solution
In the synchronous solution, each one of our web servers will be contacting the app server for each user login request. In other words, the load on the app server and, consequently, on the database server will be proportional to the number of logins. One property of this load is its data locality, or rather, the lack of it. Given that user U logged in, the DB won’t necessarily gain any performance benefits by loading all username/password data into memory for the same page as user U. Another property is that this data is very non-volatile – it doesn’t change that often.
I won’t go too far into the synchronous solution since it’s been analysed numerous times before. The bottom line is that the database is the bottleneck. You could use sharding solutions. Many of the large sites have numerous read-only databases for this kind of data, with one master for updates replicating out to the read-only replicas. That’s great if you’re using a nice cheap database like MySQL (of LAMP), not so nice if you’re running Oracle or MS SQL Server.
Regardless of what you’re doing in your data tier, every login still ends up there. Wouldn’t it be nice to close the loop in the web servers? Even if you are using Apache, that’s going to be less iron, electricity, and cooling all around. That’s what the asynchronous solution is all about: capitalizing on the low cost of memory to save on other things.
The Asynchronous Solution
In the asynchronous solution, we cache username/hashed-password pairs in memory on our web servers, and authenticate against that. Let’s analyse how much memory that takes:
Usernames are usually 12 characters or less, but let’s take an average of 32 to be sure. Using Unicode (2 bytes per character) we get to 64 bytes for the username. Hashed passwords can run between 256 and 512 bits depending on the algorithm; divide by 8 and you have at most 64 bytes. That’s about 128 bytes altogether. So we can safely cache 8 million of these with 1GB of memory per web server. If you’ve got a million users, first of all, good for you 🙂 Second, that’s just 128 MB of memory, relatively nothing even for a cheap 2GB web server.
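As a rough check of that arithmetic, here is a minimal sketch. The class and member names are made up for illustration, and it counts raw payload bytes only; a real .NET cache would add per-object and dictionary overhead on top, so treat the results as a lower bound.

```csharp
using System;

// Hypothetical helper: reproduces the back-of-the-envelope numbers above.
public static class CacheSizeEstimate
{
    public const int UsernameChars = 32;  // generous average username length
    public const int BytesPerChar = 2;    // UTF-16, what .NET strings use
    public const int MaxHashBits = 512;   // upper end of the hash sizes mentioned

    // 32 * 2 + 512 / 8 = 64 + 64 = 128 bytes per username/hash pair.
    public static long BytesPerEntry()
    {
        return UsernameChars * BytesPerChar + MaxHashBits / 8;
    }

    // How many pairs fit in the given amount of memory.
    public static long EntriesThatFit(long memoryBytes)
    {
        return memoryBytes / BytesPerEntry();
    }

    // How much memory the given number of users needs.
    public static long BytesNeeded(long users)
    {
        return users * BytesPerEntry();
    }
}
```

With 1 GB taken as 2^30 bytes, EntriesThatFit gives a little over 8 million entries, and BytesNeeded for a million users gives 128,000,000 bytes.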
Also, consider the fact that when registering a new user we can check if such a username is already taken at the web server level. That doesn’t mean it won’t be checked again in the DB to account for concurrency issues, but that the load on the DB is further reduced. Other things to notice include no read-only replicas and no replication. Simple. Our web servers are the "replicas".
The Authentication Service
What makes it all work is the "Authentication Service" on the app server. This was always there in the synchronous solution. It is what used to field all the login requests from the web servers, and, of course, allowed them to register new users and all the regular stuff. The difference is that now it publishes a message when a new user is registered (or rather, is validated – all a part of the internal long-running workflow). It also allows subscribers to receive the list of all username/hashed-password pairs. It’s also quite likely that it would keep the same data in memory too.
When using NServiceBus, the same message can be used both to publish single updates and to return the full list. Let’s define the message:
[Serializable]
public class UsernameInUseMessage : IMessage
{
    private string username;
    public string Username
    {
        get { return username; }
        set { username = value; }
    }

    private byte[] hashedPassword;
    public byte[] HashedPassword
    {
        get { return hashedPassword; }
        set { hashedPassword = value; }
    }
}
And the message that the web server sends when it wants the full list:
[Serializable]
public class GetAllUsernamesMessage : IMessage
{
}
And the code that the web server runs on startup looks like this (assuming constructor injection):
public class UserAuthenticationServiceAgent
{
    private readonly IBus bus;

    public UserAuthenticationServiceAgent(IBus bus)
    {
        this.bus = bus;
        // Receive every future update, then ask for the current full list.
        bus.Subscribe(typeof(UsernameInUseMessage));
        bus.Send(new GetAllUsernamesMessage());
    }
}
And the code that runs in the Authentication Service when the GetAllUsernamesMessage is received:
public class GetAllUsernamesMessageHandler : BaseMessageHandler<GetAllUsernamesMessage>
{
    public override void Handle(GetAllUsernamesMessage message)
    {
        this.Bus.Reply(Cache.GetAll<UsernameInUseMessage>());
    }
}
And the class on the web server that handles a UsernameInUseMessage when it arrives:
public class UsernameInUseMessageHandler : BaseMessageHandler<UsernameInUseMessage>
{
    public override void Handle(UsernameInUseMessage message)
    {
        WebCache.SaveOrUpdate(message.Username, message.HashedPassword);
    }
}
When the app server sends the full list, multiple objects of the type UsernameInUseMessage are sent in one physical message to that web server. However, the bus object that runs on the web server dispatches each of these logical messages one at a time to the message handler above.
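The WebCache used by the handler is not shown in this post. One way it could look is sketched below, assuming a plain in-memory dictionary guarded by a lock, since the bus delivers updates on a different thread from the ones serving page requests. The class name UserCredentialCache and the IsTaken method are hypothetical; WebCache would then be a static property or field holding one instance of it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the web-server cache; not part of NServiceBus.
public class UserCredentialCache
{
    private readonly object sync = new object();
    private readonly Dictionary<string, byte[]> hashesByUsername =
        new Dictionary<string, byte[]>();

    // Called by the message handler for each UsernameInUseMessage.
    public void SaveOrUpdate(string username, byte[] hashedPassword)
    {
        lock (sync)
        {
            hashesByUsername[username] = hashedPassword;
        }
    }

    // Returns the cached hash, or null when the username is unknown.
    public byte[] this[string username]
    {
        get
        {
            lock (sync)
            {
                byte[] hash;
                return hashesByUsername.TryGetValue(username, out hash) ? hash : null;
            }
        }
    }

    // First-level check during registration; the DB remains the authority.
    public bool IsTaken(string username)
    {
        lock (sync)
        {
            return hashesByUsername.ContainsKey(username);
        }
    }
}
```

A single lock keeps the sketch simple; a reader/writer lock would suit the read-heavy login traffic better if contention ever showed up.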
So, when it comes time to actually authenticate a user, this is what the web page (or controller, if you’re doing MVC) would call:
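public class UserAuthenticationServiceAgent
{
    public bool Authenticate(string username, string password)
    {
        byte[] existingHashedPassword = WebCache[username];
        if (existingHashedPassword == null)
            return false; // unknown username

        // Compare contents: == on byte[] only compares references.
        // Hash (not shown) must be the same one-way hash used at registration.
        byte[] candidate = this.Hash(password);
        if (candidate.Length != existingHashedPassword.Length)
            return false;
        int diff = 0;
        for (int i = 0; i < candidate.Length; i++)
            diff |= candidate[i] ^ existingHashedPassword[i];
        return diff == 0;
    }
}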
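The Hash method is not shown in this post. The sketch below is one possible shape for it, assuming unsalted SHA-512 purely to match the 512-bit figure used in the memory estimate; the class and method names are hypothetical. It also shows a byte-by-byte comparison, because comparing two byte[] values with == in C# compares references, not contents.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helpers; the hashing scheme is an assumption, not the post's.
public static class PasswordHashing
{
    // One-way hash producing 64 bytes (512 bits).
    public static byte[] Hash(string password)
    {
        using (SHA512 sha = SHA512.Create())
        {
            return sha.ComputeHash(Encoding.UTF8.GetBytes(password));
        }
    }

    // Compares contents without stopping at the first differing byte,
    // so the time taken does not reveal how many leading bytes matched.
    public static bool SameBytes(byte[] a, byte[] b)
    {
        if (a == null || b == null || a.Length != b.Length)
            return false;
        int diff = 0;
        for (int i = 0; i < a.Length; i++)
            diff |= a[i] ^ b[i];
        return diff == 0;
    }
}
```

A production system would more likely use a salted, deliberately slow scheme such as PBKDF2 (Rfc2898DeriveBytes in .NET). In that case the salt would have to be published to the web servers alongside the hash, which adds a few bytes to each cached entry.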
When registering a new user, the web server would of course first check its cache, and then send a RegisterUserMessage that contained the username and the hashed password.
[Serializable]
[StartsWorkflow]
public class RegisterUserMessage : IMessage
{
    private string username;
    public string Username
    {
        get { return username; }
        set { username = value; }
    }

    private string email;
    public string Email
    {
        get { return email; }
        set { email = value; }
    }

    private byte[] hashedPassword;
    public byte[] HashedPassword
    {
        get { return hashedPassword; }
        set { hashedPassword = value; }
    }
}
When the RegisterUserMessage arrives at the app server, a new long-running workflow is kicked off to handle the process:
public class RegisterUserWorkflow :
    BaseWorkflow<RegisterUserMessage>, IMessageHandler<UserValidatedMessage>
{
    public void Handle(RegisterUserMessage message)
    {
        // send validation request to message.Email containing this.Id (a guid)
        // as a part of the URL
    }

    /// <summary>
    /// When a user clicks the validation link in the email, the web server
    /// sends this message (containing the workflow Id)
    /// </summary>
    /// <param name="message"></param>
    public void Handle(UserValidatedMessage message)
    {
        // write user to the DB

        // UsernameInUseMessage only has a default constructor, so set its properties.
        UsernameInUseMessage inUse = new UsernameInUseMessage();
        inUse.Username = message.Username;
        inUse.HashedPassword = message.HashedPassword;
        this.Bus.Publish(inUse);
    }
}
That UsernameInUseMessage would eventually arrive at all the web servers subscribed.
Performance/Security Trade-Offs
When looking deeper into this workflow we realize that it could be implemented as two separate message handlers, and have the email address take the place of the workflow Id. The problem with this alternate, better performing solution has to do with security. By removing the dependence on the workflow Id, we’ve in essence stated that we’re willing to receive a UserValidatedMessage without having previously received the RegisterUserMessage.
Since processing the UserValidatedMessage is relatively expensive (writing to the DB and publishing messages to all web servers), a malicious user could perform a denial-of-service (DoS) attack without that many messages, thus flying under the radar of many detection systems. Spoofing a GUID that would result in a valid workflow instance is much more difficult. Also, since workflow instances would probably be stored in some in-memory, replicated data grid, the relative cost of a lookup would be quite small, small enough to avoid a DoS until a detection system picked it up.
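To make the cheap-rejection argument concrete, here is a minimal sketch of the lookup, assuming an in-process dictionary stands in for the replicated data grid. The class and method names are hypothetical. A validation message carrying an unknown workflow Id is rejected after a single dictionary lookup, before any DB write or publish happens.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the store of running registration workflows.
public class PendingRegistrations
{
    private readonly object sync = new object();
    private readonly Dictionary<Guid, string> usernameByWorkflowId =
        new Dictionary<Guid, string>();

    // Called when a RegisterUserMessage starts a workflow; the returned
    // Id is what goes into the validation link sent by email.
    public Guid Start(string username)
    {
        Guid id = Guid.NewGuid();
        lock (sync)
        {
            usernameByWorkflowId[id] = username;
        }
        return id;
    }

    // Succeeds once per Id handed out by Start; any other Id is
    // rejected without touching the DB or the bus.
    public bool TryComplete(Guid workflowId, out string username)
    {
        lock (sync)
        {
            if (!usernameByWorkflowId.TryGetValue(workflowId, out username))
                return false;
            usernameByWorkflowId.Remove(workflowId);
            return true;
        }
    }
}
```

Removing the entry on completion also means a replayed validation link cannot trigger the expensive path a second time.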
Improved Bandwidth & Latency
The bottom line is that you’re getting much more out of your web tier this way, rather than hammering your data tier and having to scale it out much sooner. Also, notice that there is much less network traffic this way. Not such a big deal for usernames and passwords, but other scenarios built in the same way may need more data. Of course, the time it takes us to log a user in is much shorter as well since we don’t have to cross back and forth from the web server (in the DMZ) to the app server, to the db server.
The important thing to remember about this solution is that it is built on pub/sub. NServiceBus merely provides a simple API for designing the system around pub/sub. And publishing is where you get the serious scalability. As you get more users, you’ll obviously need to get more web servers. The thing is that you probably won’t need more database servers just to handle logins. In this case, you also get lower latency per request since all the work that needs to be done can be done locally on the server that received the request.
ETags make it even better
For the more advanced crowd, I’ll wrap it up with the ETags. Since web servers do go down, and the cache will be cleared, what we can do is to write that cache to disk (probably in a background thread), and "tag" it with something that the server gave us along with the last UsernameInUseMessage we received. That way, when the web server comes back up, it can send that ETag along with its GetAllUsernamesMessage so that the app server will only send the changes that occurred since. This drives down network usage even more at the insignificant cost of some disk space on the web servers.
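One way the app-server side of this could work is sketched below, assuming the "tag" is simply a counter that increases by one with every published change. All names here are hypothetical, and this is only an illustration of the idea, not how NServiceBus does it. A web server that restarts sends the last tag it saved to disk and gets back only the later changes.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical record of one published username/hash update.
public class UsernameChange
{
    public long Version;
    public string Username;
    public byte[] HashedPassword;
}

// Hypothetical app-server log of changes, in publication order.
public class UsernameChangeLog
{
    private readonly object sync = new object();
    private readonly List<UsernameChange> changes = new List<UsernameChange>();

    // Appends a change and returns its version, which doubles as the tag
    // the web server stores next to its on-disk copy of the cache.
    public long Record(string username, byte[] hashedPassword)
    {
        lock (sync)
        {
            UsernameChange change = new UsernameChange();
            change.Version = changes.Count + 1;
            change.Username = username;
            change.HashedPassword = hashedPassword;
            changes.Add(change);
            return change.Version;
        }
    }

    // Returns the changes made after the given tag. A tag of 0, or one
    // this log does not recognise, yields the full list.
    public List<UsernameChange> GetChangesSince(long tag)
    {
        lock (sync)
        {
            if (tag < 0 || tag > changes.Count)
                tag = 0;
            return changes.GetRange((int)tag, changes.Count - (int)tag);
        }
    }
}
```

Falling back to the full list for an unrecognised tag keeps a web server with a stale or corrupted disk copy from ending up with a partial cache.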
And in closing…
Even if you don’t have anything more than a single physical server today, and it acts as your web server and database server, this solution won’t slow things down. If anything, it’ll speed it up. Regardless, you’re much better prepared to scale out than before – no need to rip and replace your entire architecture just as you get 8 million Facebook users banging down your front door.
So, go check out NServiceBus and get the most out of your iron.
13 Comments
May 31st, 2008 at 4:56 am
[…] performance and scalable web applications based on the principles I outlined in my previous post Asynchronous, High Performance Login for Web Farms. I’ll also be giving a more interactive session on How to Avoid a Failed SOA, and coming in […]
June 27th, 2008 at 8:33 am
What if you need to lock the account after three invalid attempts to login? You can still refuse invalid usernames on web server, but you’ll need to synchronize the number of invalid logins (valid username, invalid password) between all web servers after each invalid login, and/or write it to the database. I think it’s quite common feature.
June 29th, 2008 at 1:59 am
Daniel,
Here’s how to do it:
On each invalid login, send an InvalidLoginMessage.
The “service-side” handles that message and writes the number of attempts. If the number of attempts is greater than N, publish an AccountLockedMessage.
When web servers get that notification, they mark the account as locked preventing further attempts to login.
This solution leaves a window of time open (probably less than a second) where a user may be able to attempt to login more than N times. On the other hand, the solution is much more scalable than regular locking based solutions.
The tradeoff should be made at the system level, and not the feature level. I think that it’s more than acceptable.
October 28th, 2008 at 9:14 am
I’m a different Daniel from #2, but have a related question. I’m still trying to get my head around this concept and have two questions:
-How are foreign keys and unique constraints handled in this setup? For example, if there is a unique constraint on email in the database, but the web server allowed a user to register the same email twice. Is it expected that all such constraints should be enforced in the service in addition to the database?
-Are there, in fact, cases where “things can’t be made asynchronous”? When designing a system, do you have any guidelines for when to/not to use nServiceBus and event-driven design?
October 29th, 2008 at 7:27 am
Daniel #4,
Foreign keys and unique constraints are still enforced in the DB. The web server, by having a copy of emails locally, can avoid calling through to the DB for user registrations which are clearly invalid. Still, the web server doesn’t add users on its own volition, but submits a request through to the DB for that.
The point of this web tier caching is to reduce load on the DB, not to replace it entirely.
There are cases where “things can’t be made asynchronous”, but less than you might think. I do have guidelines around them, but they fill up a 5-day course and are very much tied to understanding the specifics of your business domain.
Hope that helps.
December 19th, 2008 at 12:33 am
Udi,
Do you have any such guidelines on when to, and when not to, make things asynchronous?
Regards,
Jai
December 19th, 2008 at 2:07 pm
Jai,
Here’s some:
If you can ensure that the call will be local, you can often make it synchronous.
If you suspect that a call will be remote, you should prefer to make it asynchronous / non-blocking.
February 22nd, 2009 at 4:13 am
[…] post, he follows the design I described a while back on using messaging for user management and login for a high-scale web scenario. In his comments, he agrees with the above stating: “I certainly think that a similar […]
March 3rd, 2009 at 3:17 am
[…] DDD) adds practical (not just design) arguments and illustrates them with the user logon scenario – Asynchronous, High-Performance Login for Web Farms. Udi then wraps up the discussion between Ayende and Greg Young about the purpose of messaging in applications […]
May 4th, 2009 at 1:54 am
Back to the question about registering twice (unique constraints on email for instance): how would the GUI handle this situation?
Scenario:
1) User registers new name/email. This account does not exist on the webserver, so the user gets the immediate answer “Ok”.
2) The register-new-user message arrives in the app server where a clash is detected.
How is the user informed of this cancellation of his account?
Thanks, Jørn
May 4th, 2009 at 2:15 am
Jørn,
Since the app server actually handles the process of registering the user, the user doesn’t receive an OK until they go through the app server (and they confirm their email address).
The purpose of the web server cache is primarily to offload the reads from the database giving us near linear scalability in adding web servers for the login function. While it also serves us for first-level conflict detection, it isn’t the authoritative source of information.
Does that answer your question?
May 11th, 2009 at 12:01 pm
I am evaluating nServiceBus for an upcoming project, and I need the functionality described in this article of the BaseMessageHandler. Unfortunately, I cannot locate this reference anywhere in the 1.9 RTM I have installed. My events are currently inheriting IMessageHandler, but I need to use this.Bus.Reply(), and I can’t. Have I missed something?
May 12th, 2009 at 2:44 pm
Scott,
All you need to do is define a public property on your message handler of the type IBus, and it’ll work.
Feel free to ask these questions on the discussion group as well:
http://tech.groups.yahoo.com/group/nservicebus/messages