Thursday, 17 August 2017

Product Review - Wago Connectors

I was re-wiring my garage recently when I got fed up with screwing wires into choc blocks. I figured someone must have come up with a better way of connecting wires together, got Googling, and found these guys - Wago Connectors:

Wago make lots of different kinds of connectors, some of which are re-usable. Those are the ones I went for. For historical reasons, there are two kinds of re-usable connectors. The 222s:

And newer 221s:

Both kinds come in 2-way, 3-way and 5-way forms. (For connecting the respective number of wires together.) The 221s are slightly more expensive, and take up about 40% less space. But they do the same job of letting you join wires together. Potentially wires of different gauges (such as when connecting twin and earth solid-core to multi-core flex cable used by most appliances in the UK.)

I can highly recommend these useful little guys. They sped up the job considerably, and have proven very reliable in use.

I don’t have a fidget spinner, so I kept a few of these connectors on my desk over the next month or so to footer with whilst coding, opening and closing the levers repeatedly. From that unscientific "test", I can say that the 222s are quite a bit more robust than the 221s. After a few hundred “opening and closing” operations on their levers, the more expensive 221 wouldn’t stay open fully any more. It is still usable, and I could hold it open whilst inserting a wire if I really needed to. But then it becomes just as fiddly to use as a choc block. So if you're going to be installing/uninstalling and re-building a lot, I'd say go for the 222s. If weight is a primary concern (e.g., building a drone) then use the 221s, or just solder and accept that greater build time and reduced ability to disassemble is the price you pay for less weight.

On the upside, the levers on the 221s are considerably easier to open. Though neither is particularly difficult. There is a dedicated tool for opening them that costs over £100, but really it's a ridiculously over-engineered solution that I can't imagine anybody needing. Even people who are installing these all day would have no difficulty opening them with just their fingers.

The first time you open one of the 222s, you’ll be unsure if it’s broken. Because its jaws initially open to about half way quite easily, then you need to use substantially more force to open the lever all the way. It can also give you a nasty “mouse trap” snap on your fingers if you’re not careful whilst you close the lever to clamp your wire in place.

Over all, I think I’ll be using the cheaper 222s where space isn’t a consideration. To that end, I bought a box of the 3-way and 2-way 222s, and a box of the 5-way 221s. (Since when I need to connect 5 wires together, that’s usually when space is tightest.)

With regard to their ratings, I'm honestly not quite sure what amperage / voltage they can take. The problem is there are two ratings on each model. (Presumably to satisfy more than one set of tests for different markets.) 

The 222s are rated at "20A 300V" on one side and "600V" on the other side. The 221s have labels showing they are variously rated at "450V 32A" or "20A 300V". Confused? You will be. Here is a YouTube video of someone actually burning the things out to test their limits.

In practical use, I've had no problems having about 10 of these things in the same switch. I've also used three in series on the same circuit.

2-way 222 connectors: £13.23 for a pack of 50 @ Screwfix 

3-way 222 connectors: £15.13 for a pack of 50 @ Screwfix

5-way 221 connectors: £13.80 for a pack of 25 @ Screwfix 

Addendum: Thelma quite enjoyed these little devices too. She reports that the 222s, being rounder, are 50% “more chasy” than the “boring” more square 221s. They therefore fly faster when she bats them with her paws to simulate spontaneous movement.

Friday, 25 December 2015

Building a Total Quality Software environment, with Continuous Integration, Unit Testing, and Dependency Injection. And Futurama.

Recently at work, I’ve been working with my colleagues to set up a Total Quality software environment. I’ve been learning a lot from my peers about topics such as VMware, RTI and Code-First EF. (I’d previously used Schema-First, but Code First brings its own advantages and challenges). What I brought to the party was some project experience in: 

  • Continuous Integration platforms (specifically in this case, TeamCity.)
  • Unit Testing and Test-Driven Development techniques.
  • Dependency Injection to support writing testable code.
  • NAnt scripting.
  • Futurama.

We’ll get to that last one in a minute. Let’s go through the others in order first.

Continuous Integration (CI)

Everygeek who’s anynerd is using it these days. But lots of development teams and companies still avoid it, imagining it to be too difficult, too time-consuming, or just not worth the hassle. (For that matter, those same fallacious criticisms can be levelled at every other item in the list above too. Except Futurama.) A decade ago people used to say the same things about Source Control; thankfully there aren’t too many teams I encounter these days that haven’t got their head around how important that is.

Some teams aren’t even sure what CI is, what it does, or what advantages it brings. They’ve always worked by developers just producing software on their own PCs. And they just deal with any time-consuming fallout when it comes to making that software work in the real world as part of the cost of doing business.

OK, so here’s the unique selling point if you’re trying to make the case for introducing this where you work. Are you ready? What CI adds to your team’s game is simply this: repeatable, verifiable deployment. 

Unit Testing and Test-Driven Development techniques 

Unit Testing has been around for a Very Long Time. I know a lot of people who are otherwise very good developers but who “don’t see the point” of unit testing. And I have been such a developer myself in the murky past. 

The misconception that unit testing is pointless generally comes down to a few fallacies:

  • They believe that their own code always works.
  • They work in environments where the wider team and stakeholders place more value on the quantity of new features than on the quality of existing ones.
  • They believe that they will always personally be around to ensure that their code doesn’t get broken in the future.

Like most good fallacies, there’s just enough truth in most of these to preserve the illusion that unit testing doesn’t provide enough advantages to the person that has to implement it. Not when compared to the opportunity costs of them learning how to do it, or the kudos of pushing out new features (that don’t work as intended.)

Part of the reason more developers don’t give it a go is that you have to change the way you write code. Most code I’ve seen in the wild is tightly-coupled. This is a phrase that many developers are familiar with, but in my experience vanishingly few know what it means. Basically, it means that if you are writing Class A, and your class depends upon Class B to do its job, your class will instantiate a new instance of Class B itself. This means that if Class B stops working, all you (and your Users) know is that your class “doesn’t work.” They won't care if your code is perfect, and it's just that damn Class B that let you down.
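A minimal sketch of what that looks like, using the placeholder "Class A" / "Class B" names from the text:

```csharp
// Tightly coupled: ClassA news up its own ClassB, so ClassA cannot be
// compiled, tested, or reasoned about in isolation from ClassB.
public class ClassB
{
    public int DoJob(int input) { return input * 2; }
}

public class ClassA
{
    public int DoWork(int input)
    {
        var helper = new ClassB(); // hard-wired dependency
        return helper.DoJob(input) + 1;
    }
}
```

If `ClassB` misbehaves, every test you write against `ClassA` fails with it, and there is no seam at which to substitute anything else.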

So, when doing test-driven development, developers need to add another couple of skills to their arsenal. Which brings us to… 

Dependency Injection (DI)

One type of Tight Coupling is defined above. Code is also tightly coupled when it is too closely tied to one UI. So, if you’re a developer who puts all their business logic in code-behind files or controller actions, your code won’t be testable, because it needs the UI in place before it can be verified.

Fortunately, there are frameworks and coding styles out there that help developers implement loose coupling, to make their code independently testable. 

The basic idea behind all of these is that instead of your Class A consuming Class B directly to perform some function, it consumes Interface B instead. That is, some object that Class A doesn’t instantiate itself satisfies an interface representing the job Class B was doing for Class A. Typically this is achieved by making the constructor of Class A look like this:
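A minimal sketch, with the placeholder "Class A" / "Interface B" names from the text:

```csharp
// The job "Class B" does for "Class A", expressed as an interface.
public interface IInterfaceB
{
    int DoJob(int input);
}

public class ClassA
{
    private readonly IInterfaceB _helper;

    // Constructor Injection: the dependency is handed in from outside,
    // rather than ClassA instantiating a concrete ClassB itself.
    public ClassA(IInterfaceB helper)
    {
        _helper = helper;
    }

    public int DoWork(int input)
    {
        return _helper.DoJob(input) + 1;
    }
}

// The "real" implementation used in live running.
public class ClassB : IInterfaceB
{
    public int DoJob(int input) { return input * 2; }
}
```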

 The above pattern is known as Constructor Injection. What it gives you is the ability to swap out whatever is implementing Interface B when it comes to unit testing Class A. So, instead of the object that really does implement Interface B in live use, you can use what is called a mock instance of Interface B. That is typically some object that always gives you anticipated responses. So you can concentrate on testing Class A. That way, any errors you see can be wholly attributed to Class A.
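A hand-rolled mock makes this concrete (placeholder names again; in real projects a mocking library such as Moq would typically generate this for you):

```csharp
public interface IInterfaceB
{
    int DoJob(int input);
}

public class ClassA
{
    private readonly IInterfaceB _helper;
    public ClassA(IInterfaceB helper) { _helper = helper; }
    public int DoWork(int input) { return _helper.DoJob(input) + 1; }
}

// A mock: always returns a canned, anticipated response, so any
// test failure can only be ClassA's fault.
public class MockB : IInterfaceB
{
    public int DoJob(int input) { return 42; }
}
```

A unit test then asserts that `new ClassA(new MockB()).DoWork(0)` returns 43, exercising Class A's own logic in complete isolation.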

When you write your classes using the Constructor Injection pattern demonstrated above, DI frameworks provide concrete implementations of objects that implement interfaces at runtime. So, you 'magically' find a usable implementation of Interface B available in Class A's constructor. As the developer of Class A, you don't care particularly about where that implementation of Interface B comes from; that is the responsibility and concern of the developer of Interface B and your chosen DI framework.
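The runtime wiring a DI framework performs can be sketched with a toy container (purely illustrative; real frameworks are far more capable):

```csharp
using System;
using System.Collections.Generic;

public interface IInterfaceB { int DoJob(int input); }
public class ClassB : IInterfaceB { public int DoJob(int input) { return input * 2; } }

public class ClassA
{
    private readonly IInterfaceB _helper;
    public ClassA(IInterfaceB helper) { _helper = helper; }
    public int DoWork(int input) { return _helper.DoJob(input) + 1; }
}

// A toy container: maps an interface type to a factory for a concrete type.
public class TinyContainer
{
    private readonly Dictionary<Type, Func<object>> _map
        = new Dictionary<Type, Func<object>>();

    public void Register<TInterface, TImpl>() where TImpl : TInterface, new()
    {
        _map[typeof(TInterface)] = () => new TImpl();
    }

    public T Resolve<T>()
    {
        return (T)_map[typeof(T)]();
    }
}
```

With a real framework the registration is similar in spirit - e.g., Castle Windsor's `container.Register(Component.For<IInterfaceB>().ImplementedBy<ClassB>())` - and the framework also constructs Class A for you, satisfying its constructor automatically.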

This is just one of the techniques that developers moving from code that "just works" need to learn if they want their code to be verifiable. It is difficult to embrace, because frankly writing code that "just works" is hard enough, and because using these techniques opens up the possibility of developers having to recognise errors in their own code. But unit testing also brings with it huge advantages: the ability to prove that a given piece of code works, not just at the time of writing but every single time you build; and protection for your work against being modified in adverse ways by subsequent developers.

Unit testing and Dependency Injection are whole topics on their own, so I won't say more about them here. (I'll perhaps save that for future blogs.) With regard to understanding tight and loose coupling, though, I'll leave you with an analogy. If a traveller wants to get to some destination, they don’t need to know what the bus driver’s name will be, the vehicle registration, what type of fuel the bus uses, etc. They just need to know what bus stop to be at, what time, and what is the correct bus number to get on. Similarly, Class A doesn’t need to know everything about Class B or where it comes from. It just needs to know that when it requires an object to do some job, one will be available at an agreed time. Class A instantiating Class B itself is analogous to a traveller trying to build their own bus.

Last time I checked, there were something like 22 DI frameworks that you can use with .Net. The one I implemented at work recently is called Castle Windsor, which I’ve been using for a few years. In benchmark tests it’s not the fastest. It’s not the simplest. And it’s not the most customisable/powerful. But it is the one that for my money strikes the right balance between those competing factors. And it integrates particularly well with ASP.Net MVC and Entity Framework. 

NAnt Scripting 

Continuous Integration platforms on their own give you a powerful way of automating builds and deployments. However, there are advantages to be gained from farming out some of that work to a more specialised tool. NAnt is one such tool.

For any system that gets developed, there are typically 10-25 individual “jobs” that are involved in setting up a copy of the system that Testers and ultimately Users can access. e.g., for a web app you might need to:

  • Create some Virtual Directories in IIS.
  • Copy the files that the website is made of into the folders those VDs point at.
  • Customise a web config that tells the site how to access the underlying database.
  • Create the underlying database in SQL Server.
  • Populate the database with data.
  • Create an App Pool in IIS under which the site will run.
  • Grant the relevant App Pool access to the database.

You’d also be well-advised to have steps that involve:

  • Running unit tests, so you don’t deploy broken code.
  • Updating Assembly Information so that each build has an identifying number. That way, bugs can be reported against specific builds.
  • Backing up any prior version so that you can rollback any of the above steps if the deployment fails.
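Put together, a skeletal NAnt build file covering a few of these steps might look like this (target names, paths and the test-runner are illustrative, not taken from any actual script):

```xml
<?xml version="1.0"?>
<project name="MyWebApp" default="deploy">
  <property name="build.dir" value="build" />
  <property name="deploy.dir" value="C:\inetpub\wwwroot\mywebapp" />

  <!-- Run unit tests so broken code is never deployed. -->
  <target name="test">
    <exec program="nunit3-console.exe">
      <arg value="${build.dir}\MyWebApp.Tests.dll" />
    </exec>
  </target>

  <!-- Keep the prior version around so a failed deployment can be rolled back. -->
  <target name="backup">
    <copy todir="${deploy.dir}.bak">
      <fileset basedir="${deploy.dir}">
        <include name="**/*" />
      </fileset>
    </copy>
  </target>

  <target name="deploy" depends="test, backup">
    <copy todir="${deploy.dir}">
      <fileset basedir="${build.dir}\website">
        <include name="**/*" />
      </fileset>
    </copy>
  </target>
</project>
```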

If you put these in a script that lives in your project instead of in build steps on your CI server, you can more easily mirror steps between different branches in your builds. 


Futurama

One of the things that motivates me is getting to have a bit of fun whilst I work. In the team I joined a few months ago, there has been one common theme tying all of the above threads together: Futurama.

Myself and my colleagues have set up about 10 Windows Server 2012 machines that perform various jobs. e.g., One of them is a Domain Controller. Another is our CI server. Several more act as paired web and sql servers that can be temporarily allocated to testing, by internal testers or by end users. Or they can be used by developers to test the deployment process.

Each of our VMs is named after a Futurama character and has its own distinct colour scheme. (NB: They have a fully-qualified name too, like DVL-SQLALPHA, that describes their actual role.) This helps developers stay oriented when RDP-ing around what would otherwise be nearly-identical machines. It’s also fun.  

You saw how TeamCity / Professor Farnsworth looked above. This is how one of our Web Servers, characterised after Zapp Brannigan, looks. As you can see, it's easy to tell which VM you're on, even from a distance:


There are Futurama-themed Easter Eggs hidden in other parts of our build process too. e.g., each CI build produces a log file, at the end of which a build gets reported as “Successful” or “Failed” for some detailed reason. One recent evening, in my own time, I wanted to test implementing custom NAnt functions. (NAnt is written in C#, and you can write functions in C# to augment what it does.) In order to test this with something non-critical, I augmented that custom “Success” or “Failure” method thus:
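Stripped of the NAnt plumbing, the idea was a method along these lines (the banners here are stand-ins, not the originals):

```csharp
using System;

public static class BuildBanner
{
    private static readonly string[] SuccessArt =
    {
        "Good news, everyone!",  // Professor Farnsworth
        "I'm back, baby!"        // Bender
    };

    private static readonly string[] FailureArt =
    {
        "Hooray, we're doomed!",
        "Why not Zoidberg?"
    };

    private static readonly Random Rng = new Random();

    // Picks a semi-random banner appropriate to the build outcome,
    // for appending to the end of the build log.
    public static string For(bool buildSucceeded)
    {
        var pool = buildSucceeded ? SuccessArt : FailureArt;
        return pool[Rng.Next(pool.Length)];
    }
}
```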

The exact piece of ASCII art that gets rendered reflects whether the build was successful or not, and is semi-random. So, you might get Hermes with a brain slug saying something dumb if the build is broken. Or you might get Professor Farnsworth announcing “Good news, everyone!” if all went as planned.

These 'features' are of course whimsical. But at worst they give developers a smile during some of the tougher moments of the job. And at best they give you a chance to test out new techniques on non-critical features. As well as giving your brain a rest between more intensive tasks.

The best teams  I’ve worked with all knew their onions on a technical level, but also knew when to have fun too. I'm glad to be working in such a team at present. e.g., I recently implemented the following function:

My colleague Ian made me chuckle when I discovered this in our code repository a few weeks later:

Saturday, 10 October 2015

Product Review - Wrappz Laptop Decals and Custom Phone Skins

Like many developers, I get my money's worth out of the laptops I buy. Sometimes it seems I use them every minute of the day. And, over the years, I've accumulated quite a collection of physical machines in addition to the various VMs I have carrying out miscellaneous tasks around the house.

I secretly love the obviously-marketed-at-women ones that have cases made of pink brushed aluminium and the like. But, also being a professional developer, I have to say that almost always that fancy case comes accompanied by last year's technology (or older.) It simply isn't a good business decision to buy them when you review the spec.

As a wise philosopher once said, I'm a Barbie Girl in a Barbie World.

So, I often end up buying machines that have phenomenally fast dual/quad processors with acres of RAM, capable of running lots of memory-intensive applications concurrently. And I switch out the standard platter drive for a 1TB SSD. (I usually also swap out the optical drive for a second 1TB SSD. 2TB SSDs have now become available too, though I've not yet had a chance to get my hands on one, so my next dev machine will have 4TB in total. But that's really a different review.)

Anyway, for some reason the most performant laptops always seem to come in boring black boxes. When you acquire a few of these over the years, it becomes difficult to tell them apart. So for a few years now I've been putting decals on the back and naming the machine according to which decal adorns it. I also make the login screen and desktop background of these machines match the decal. This all helps keep you oriented when you're navigating around, RDP-ing from one machine to another.

TaylorHe Decals

Up to now, my go-to decal manufacturer has been TaylorHe. They do some very nice pre-made patterns that suit almost every taste. With my new work laptop, however, I fancied doing something a bit more bespoke. I'm a huge Breaking Bad fan, so I wanted a machine that had a theme related to that. 

Since the device that I usually take pictures with is in this photo, my friend Ian O'Friel kindly helped me take this, which made it a much better photo than it would otherwise have been, as he has a real eye for photography. (You can see more of Ian's fab photos here. I particularly like the one of the old rusty gate and the South Side At Night.)

Looking around, I found a company called Wrappz that provides exactly this type of product. Not only do they produce decals with custom designs, but they also print them on a custom-sized sheet. So you don't have to trim them to fit your machine. This may seem like a small advantage, but it was nice just to be able to use them out of the box like that rather than messing around with a scalpel or scissors.

Like TaylorHe, Wrappz also do custom phone cases. So I got one of those to match the decal. (Not that I actually own a phone, incidentally - I'm one of the few people I know that doesn't use one, and doesn't miss it. I have a Samsung Galaxy S2 'phone', but it acts as a personal organiser rather than as a communication device. I only put a temporary SIM in it when I have a reason to, which is almost never.)

If you want to order some of these decals / phone cases for yourself, here are some discount codes you can use to get them more cheaply. NB: I've got no commercial relationship with Wrappz, and I haven't benefitted in any way from this review. Also, I won't know whether anyone has used these codes:
Wrappz discount codes

More Wrappz discount codes

Last small tip for those who, like me, have multiple laptops in their network to access. You can place the name of each machine in the Task Bar by creating a new Toolbar, and calling it "\\%computername%", as described here. It makes it amazingly easy to see which machine you're on, even if you have a full-screen program running, and even if you're accessing it from another physical device.

Computer Name on Task Bar

Sunday, 14 December 2014

Product Review - LED Lenser LED7299R H14R.2 Rechargeable Head Torch

I bought one of these for running during the Winter months, when you inevitably find yourself having to make some runs in the dark or twilight.

There are plenty of options out there - ranging from an offering at £5 from Tesco, right the way through to Hollis Canister diving head torches at £800. Obviously, there’s a trade-off between getting what you pay for, choosing a light that’s suitable to your purpose, and not spending more than you need to.

After checking out other reviews for several different options, I opted for the LED Lenser LED7299R H14R.2 Rechargeable Head Torch. You can spend anything from £90 to £130 depending on where and when you choose to buy this model. There’s also a similar-but-cheaper model in the same range that isn’t rechargeable. (No reason that you couldn’t buy separate rechargeable batteries of course.) However, I liked the convenience of having the recharging unit built in. It can alternatively take four conventional AA batteries, which you can use as a backup.

For running, it was important that the torch had enough light output to be able to see in pitch darkness on unlit trails with occasional tree cover that blocks ambient light. It was also important that it was comfortable to run with. A lot of runners recommended the Petzl range of head torches. I can see why. They’re a lot lighter than the one I chose (whilst at the same  time being a lot dimmer - typically about a third to a quarter of the light output.) My main criticism of the LED Lenser H14 R2 is that it can feel a bit hard and uncomfortable on your head, particularly the front torch holder. A softer, more padded material behind the lamp would have made it much more useable. As is, it’s more comfortable with a beanie hat underneath, but I wouldn’t fancy trying to run with it overnight in the Summer when a hat would make you overheat.

In terms of light output, it was difficult to find reliable information. The minimum light output was fairly consistently reported by various sources to be 60 Lumens. The product box and the site where I bought it both say the maximum output is 850 Lumens. Other sources quoted as low as 260 to 350 Lumens. There appears therefore to be some confusion about what is meant by "maximum". Namely, the torch has a 'boost' setting that increases brightness for 10 seconds at a time; there is also a second definition, which is the maximum brightness that the torch is able to consistently maintain. I suspect this distinction accounts for many of the differences reported by different sources.

60 Lumens is about as good as the majority of the Petzl range. The brightest setting of the H14 R2, whatever its real value in Lumens, is a very bright light that is uncomfortable to look at directly. The very highest setting (known as the "boost" setting) only stays on for 10 seconds at a time. Most of the rest of the time, I used it at the highest 'stable' setting.

On that highest constant-current setting, the light can be diffused over an area about 5m wide and 10m deep directly in front of you. You can also elect to have a narrower but more intense beam. The specs say it will project light up to about 260m. I found that not to be the case, though I did stick to the “wide and bright” setting throughout my run. Perhaps the boost setting combined with the narrowest beam would momentarily illuminate something at the quoted 260m for 10 seconds at a time; I didn't test that, because such a brief, narrow burst of brightness isn't relevant for my use case, or for many others I can imagine. I did test the range on the maximum consistent setting combined with a wide beam when I returned to my car. I found that whilst that setting is quite good enough for running or walking in the pitch dark, letting you see what's immediately in front of you, the light didn’t even make it across to the trees at the far end of the 100m or so car park I was in. I’ll try it again on the “narrow beam, temporary boost” setting during my next night run. However, whilst I suspect that the specs are technically correct and that objects can be illuminated at that distance, it will only be briefly, and with a beam that’s about 1m wide. It's for the reader to decide whether that performance meets their actual needs.

I found the light was good enough for my use case. I ran during astronomical twilight (the third darkest phase of the night; pretty much pitch black for the purposes of this test.) Without the torch, I would just about have been able to see my hand in front of my face in open ground, but not the path I was running on. On stretches covered by trees, it'd have been completely dark. As it was, I stumbled on a pothole in the same forested location (once on the way out, and once on the way back.) I couldn’t see how I’d done this at the time, as I felt I’d been seeing the path well enough to run at a normal pace. However, I stumbled at the exact same spot the very next day during daylight. So, it appears to have been a particularly well-camouflaged pothole, rather than a failing of the torch.

The final lighting feature of note in this torch is the rear red light that you can turn on to allow traffic and cyclists to see you more easily. I thought that was a nice little safety feature, although there's no real way to tell if it's on or off once you have the torch on, and the button is very sensitive. Other non-lighting features include a battery-power indicator (the rear LED glows red, amber or green for five seconds when you switch it on, to let you know how charged up the battery is.) I've used mine for less than an hour so far, and it's still in the green from its first charge. I'll update this review with how long a full charge lasts when I've gone through a full cycle. Lastly, you can detach the battery pack (and the front torch itself if you want) and wear them as a belt attachment. I personally prefer the light being cast wherever I'm looking, and didn't find the battery pack intrusive where it was, so haven't used this option.

The last point I want to note about this product isn't about the torch itself. It's about the user manual that comes with it. For a top-of-the-range piece of kit, the quality of the instruction manual translation leaves a lot to be desired. It's some of the worst Deutsch-glish I've ever seen. Take this excerpt for example:

It's so bad that at first I thought I might have been sent a fake item, since I couldn't imagine any self-respecting manufacturer allowing such a poorly-translated document to accompany their product. But the bona fides of the supplier I used checked out. And, checking LED Lenser's own website, it seems they've just done a very bad job of translating the user manual of an otherwise very good product. You can read the full manual (downloaded from LED Lenser's US site) for yourself here.

All-in-all, I’m glad I bought this piece of kit. It’s good enough for what I need it for. The head harness could be a little more comfortable, but it’s very usable for its intended purpose nonetheless. I feel a Petzl and other cheaper options would probably not have been bright enough for what I need. And other, more expensive options would have been brighter still, but aren’t designed to be worn out of the water.

Not a bad purchase : 7/10

Thursday, 20 February 2014

Scalability, Performance and Database Clustering.

What the Exxon Valdez and database clusters have in common

I was recently asked to comment on the proposed design for a project by a prospective new customer. The project involved a high number of simultaneous users, contributing small amounts of data each, and was to be hosted in the Cloud. The exact details were To Be Decided, but Amazon EC2 and MySQL were floated as likely candidates for the hosting and RDBMS components. (Although my ultimate recommendations would have at least considered using SQL Azure instead, given some of the time constraints and other technologies involved that would have dovetailed into the wider solution.)

The discussion got me thinking about the topic of database clustering, as it relates to performance and scalability concerns. During the course of the discussion of the above project with the client’s Technical Director, it transpired that, despite the organisation concerned having used clustering in an attempt to improve performance previously, that approach had failed.

The above discussion didn’t surprise me. It’s a misunderstanding I’ve witnessed a number of times, whereby people confuse the benefit that database clustering actually bestows. In short, people often believe that such a design aids scalability and performance. Unfortunately, this isn’t the case. What such an architecture actually provides is increased reliability, not performance: if one database goes down, another is in place to quickly take over and keep processing transactions until the failed server can be brought back online. (It’s actually less performant than a standalone database, since any CRUD operations need to be replicated out to the duplicate databases.)

The analogy I usually give people when discussing the benefits and limitations of clustering is that it’s a bit like the debate about double hulls on oil tankers. As you may know, after the Exxon Valdez disaster the US Government brought in legislation that stated every new oil tanker built for use in US ports was to be constructed with double hulls. The aim was admirable enough: to prevent such an ecological disaster from ever happening again. However, it was also a political knee-jerk reaction of the worst kind. Well intentioned, but not based on measurable facts.

Of perhaps most relevance to the topic was the small fact that those parts of the Exxon Valdez that were punctured were in fact double-hulled (the ship was punctured on its underside, and it was double-hulled on that surface). Added to this is the fact that a double hull design makes ships less stable, so they’ll be that little bit more likely to collide with obstacles that more manoeuvrable designs can avoid. And, just like in database clustering, the added complexity involved actually reduces capacity. (In the case of ships, the inner hull is smaller; in databases, the extra replication required means fewer transactions can be processed in the same amount of time with the same processing power.)

As with all things, the devil is in the details. You can design clustered solutions to minimise the impact of replication (e.g., if you make sure the clustered elements of your schema only ever do INSERTs, the performance hit will be almost negligible). But, many people just assume that because they are clustering that in itself will automagically increase performance, and it’s that misconception that leads to most failed designs.

I’ve been involved in a couple of projects that involved either large amounts of data in one transaction impacting on a replicated database, or large numbers of smaller individual transactions being conducted by simultaneous users. In neither case, in my experience, was clustering a good solution to the design challenges faced.

The first project I have as a point of reference was one I worked on back in 2007, that involved a business intelligence application that collected around a million items of data a month via a userbase of 400 or so. I was the lead developer on that 7-person team, and so had complete control over the design chosen. I also had the advantage of having at my disposal one of the finest technical teams I’ve ever worked with.

The system involved a SQL Server database that was used by around 30 back office staff, OLAP cubes being built overnight for BI analysis, and certain sub-sections of the schema being replicated out to users that accessed the system via PDAs over GPRS (which of course will have been replaced by 3G / 4G now). The PDA users represented the bulk of those 400 users of the system.

The design we settled upon was one that traded off normalisation and database size for the least impact on those parts of the schema that needed to be replicated out to the PDAs. So, CRUD updates made in the back office system were only transferred to near-identical, read-only tables used by the PDAs once an hour (this could be fine-controlled during actual use to aid performance or to speed up propagation of information as required). This approach meant that the affected tables had fewer sequential CRUD operations to be carried out whenever the remote users synched over their low-bandwidth connections. And if they were out of range of connectivity at all, their device still worked using on-board, read-only copies of the backoffice data required.

The second main consideration in the design involved a large data import task that happened once every six weeks. One of my developers produced a solution that was algorithmically sound, but that quickly reached the limitations of an ORM-driven approach. In short, it took several hours to run, grinding through thousands of individual DELETE, INSERT and UPDATE statements. And if any consistency errors were found in the data to be imported (not an uncommon occurrence), the whole process had to be repeated, again and again, until eventually it ran without hiccups. It wasn’t unusual for the task to take a skilled DBA 24 hours to cleanse the data and complete the import successfully. Meanwhile, the replicated parts of the schema used by the PDAs would be taking a battering. A better approach was needed.

In the end, I opted for SQL Server’s XML data type to pass the bulk upload data into a stored procedure in a single call. Inside the procedure, wrapped in a TRANSACTION so it could be rolled back, just those parts of the data that represented actual changes were applied. (E.g., it wasn’t uncommon for the imported data to contain a DELETE instruction followed by an INSERT of exactly the same data; the stored proc was smart enough to recognise that and only make the changes that affected the net state of the system.) I designed the stored proc so that any error caused the whole process to be rolled back, and the specific nature of the error to be reported via the UI. The improved process ran in under a second and no longer required the supervision of a DBA. Quite a difference from 24 hours.
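The net-change idea is easy to illustrate. The real implementation was a T-SQL stored procedure working over the XML payload; this Python sketch (with made-up instruction tuples) just shows the collapsing logic:

```python
def net_changes(current, instructions):
    """Collapse a stream of ("DELETE", key) / ("INSERT", key, payload) /
    ("UPDATE", key, payload) instructions into the minimal set of changes
    needed against the current state.

    Returns (to_delete, to_upsert): keys to remove, and key -> payload pairs
    that genuinely differ from what is already stored."""
    # Replay the instructions to find the final intended state.
    final = dict(current)
    for op in instructions:
        if op[0] == "DELETE":
            final.pop(op[1], None)
        else:  # INSERT or UPDATE
            final[op[1]] = op[2]
    # Diff the final state against the current state; a DELETE followed by
    # an identical INSERT cancels out and produces no work at all.
    to_delete = [k for k in current if k not in final]
    to_upsert = {k: v for k, v in final.items()
                 if k not in current or current[k] != v}
    return to_delete, to_upsert
```

Applying only the diff is what keeps the write volume (and hence the replication traffic) down to the handful of rows that actually changed.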

The second project that informs my views on clustered database designs was one I wasn’t the design authority on; I was just using the database(s) for some other purpose. Prior to my involvement, a SQL Server cluster of three instances had been set up and kept in sync. The solution was designed for a vendor of tickets for all sorts of events, including popular rock concerts. It wasn’t uncommon for tickets to go on sale and for an allocation of many thousands to sell out in literally ten seconds flat, as lots of fans (and, I’m sure, ticket touts too) sat feverishly pressing F5, waiting for the frenzy to start. (And sometimes, if the concert organiser got their price point wrong, you’d find that only a few tickets were sold for an over-priced event, but that’s another story!)

In the case of this design, I never did see the failover capabilities come into play. Which is to say that all three SQL Server instances replicating the same data for reliability reasons stayed up all of the time. I had a feeling that if one ever went down under load, however, it wouldn’t have been long before the others suffered the same fate. And since it was an on-premise deployment rather than cloud-based, something like a power cut would have stopped the show dead.

It’s not that common for hardware to fail just because a high number of requests are being made simultaneously. All that happens is that some users don’t get through (and you, as the site owner, will never know that was the case). It’s not as if the server shuts down in shock. Even the recent low-tech attacks on large online retailers like Amazon using amateur tools like LOIC didn’t damage any critical infrastructure. At best, such attacks saturate the network for a short while. And often they don’t achieve even that much.

As a final point, I’d note that there are far greater concerns when designing an authenticated, public-facing system, such as CSRF vulnerabilities. Any attempt to address performance by clustering will tend to work against those security concerns, because commonly-accepted mitigations (such as anti-CSRF tokens) rely on data being reliably saved and then retrieved within a short time frame, rather than becoming consistent eventually, as most clustering solutions allow for.
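To see why eventual consistency and CSRF tokens don’t mix, consider this toy token store. It is a deliberately simplified model; `TokenStore` and its lagging replica are my invention for illustration, not any real framework’s API:

```python
import secrets


class TokenStore:
    """Toy model of a CSRF token store split across two eventually-consistent
    nodes: writes land on the primary, reads may be served by a replica that
    hasn't caught up yet."""

    def __init__(self):
        self.primary = {}
        self.replica = {}  # lags behind until replicate() runs

    def issue(self, session_id):
        """Render the form: generate a token and store it (on the primary)."""
        token = secrets.token_hex(16)
        self.primary[session_id] = token
        return token

    def replicate(self):
        """The 'eventual' in eventual consistency."""
        self.replica.update(self.primary)

    def check(self, session_id, token):
        """Handle the form POST.  If this request is routed to a node that
        hasn't replicated yet, a perfectly valid token is rejected."""
        return self.replica.get(session_id) == token
```

A POST arriving before replication catches up fails the token check even though the user did nothing wrong, which is exactly the short-time-frame save-then-retrieve dependency described above.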

So, in summary, whilst there’s a place for database clustering for reasons of reliability, my earnest advice to anyone considering that design for reasons of performance or scalability is to reconsider. There are usually changes you can make to your database schema itself that will have the same or better impact on the amount of data you can cope with in a short timeframe, and on the knock-on effects that data will have on your wider design. Don’t end up like Fry from Futurama, lamenting how your design might have worked had you only used (n+1) hulls/servers rather than n: