Infrastructure Design: 'Every Time'


Planning

Planning IT infrastructure takes experience. Plan well and you can expect capacity growth limited only by your budget. It makes no difference whether your infrastructure is destined for the cloud or the ground: thorough planning is the difference between simply running a single command to spin up new servers and a total disaster of entangled legacy systems.

When planning, I start with the clients' requirements, then add 30 per cent to their capacity estimates. I do this because I commonly see servers implemented “just for now” become mission-critical boxes, patched together over time, hobbling along, just waiting for that last power cycle. I am upfront about this and am not surprised when the budget will not stretch to nearly 50 per cent idle capacity. Still, I feel it is better to over-design an infrastructure (though not at the cost of increased complexity) than to hope for the best. I also tend to over-build the networking layer: servers (especially virtual servers) are cheap; high-speed switches are not. Besides, upgrading switches, unless planned carefully, can be disruptive.
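As a minimal sketch of that headroom rule, the figures below are purely hypothetical; the point is only that the 30 per cent margin is applied uniformly to whatever the client estimates.

```python
# Illustrative only: apply a 30% headroom factor to a client's own
# capacity estimates (all figures here are hypothetical).
HEADROOM = 0.30

client_estimates = {
    "web_tier_rps": 2_000,   # requests per second
    "db_storage_tb": 12,     # terabytes
    "vm_count": 40,          # virtual machines
}

planned = {name: value * (1 + HEADROOM) for name, value in client_estimates.items()}

for name, value in planned.items():
    print(f"{name}: client estimate {client_estimates[name]}, plan for {value:.0f}")
```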

From here, depending on the client's needs, I will build high-capacity servers with a massive amount of fast storage, designed to run virtual machines such as kernel-based KVM guests, Linux 'containers', or BSD 'jails'. Of course, these servers will be fully capable of failing over to a secondary server in the event of a failure of any kind. I recommend testing that failover capability regularly, during off-hours, if your business model permits. I do this (or write complete, step-by-step documentation for your internal staff to do it) in such a way that the primary server may be reconnected at any time.
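The kind of check used during a failover drill can be quite small. The following is a minimal sketch, assuming the libvirt Python bindings are installed and that primary.example.com and secondary.example.com are hypothetical KVM hosts in the failover pair; it simply reports which guests are active on each host so the state of a drill can be verified at a glance.

```python
# Report active KVM guests on each host of a (hypothetical) failover pair.
import libvirt

HOSTS = ["primary.example.com", "secondary.example.com"]  # hypothetical names

def active_guests(host):
    """Return the names of running guests on a KVM host (read-only connection)."""
    conn = libvirt.openReadOnly(f"qemu+ssh://{host}/system")
    try:
        return sorted(dom.name() for dom in conn.listAllDomains() if dom.isActive())
    finally:
        conn.close()

if __name__ == "__main__":
    for host in HOSTS:
        guests = active_guests(host)
        print(f"{host}: {', '.join(guests) if guests else 'no active guests'}")
```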

We have more to do. Once we're finished with the hardware architecture and software design (which, of course, is only glossed over here), we still need to think about monitoring. What daemons are running? How do we know they are running correctly? Are the servers responding to clients in London and Tokyo? Are we sure? Non-invasive monitoring (that is, monitoring the server from the outside, without installing specialized scripts on it) is a critical component of any large-scale server installation; it cannot be neglected.
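A minimal sketch of such a non-invasive check follows: nothing is installed on the server itself; a probe (say, one in London and one in Tokyo) measures how long a TCP connection and a simple HTTP request take. The host and port here are hypothetical.

```python
# External, agentless check: time a TCP handshake and an HTTP round trip.
import socket
import time
import urllib.request

HOST = "www.example.com"   # hypothetical service under test
PORT = 443

def tcp_connect_time(host, port, timeout=5.0):
    """Time a bare TCP handshake to the service."""
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout):
        return time.monotonic() - start

def http_response_time(url, timeout=5.0):
    """Time a full HTTP round trip and return (seconds, status code)."""
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()
        return time.monotonic() - start, resp.status

if __name__ == "__main__":
    print(f"TCP connect: {tcp_connect_time(HOST, PORT) * 1000:.1f} ms")
    elapsed, status = http_response_time(f"https://{HOST}/")
    print(f"HTTP {status}: {elapsed * 1000:.1f} ms")
```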

Execution

This is where, as they say, “the rubber meets the road.” I take the necessary time to think through the ordering and physical racking process. I plan the implementation; I consider which components need to be configured, in what order, and so on. I design unit tests to ensure that each phase of the installation is proven complete and works as designed. Taking a modular approach to infrastructure implementation takes a bit more investment up front, but the gains, immediate and long term, add up to far more than the sum of their parts.
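Per-phase acceptance checks can be as simple as the sketch below (pytest style), assuming hypothetical management addresses for the gear racked in each phase: phase one proves the switches answer on their management interfaces, phase two proves the servers answer on SSH.

```python
# Per-phase acceptance checks; addresses are hypothetical.
import socket
import pytest

SWITCHES = ["10.0.0.1", "10.0.0.2"]    # hypothetical switch management IPs
SERVERS = ["10.0.1.10", "10.0.1.11"]   # hypothetical server addresses

def reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

@pytest.mark.parametrize("switch", SWITCHES)
def test_phase1_switch_management_reachable(switch):
    assert reachable(switch, 22), f"switch {switch} not answering"

@pytest.mark.parametrize("server", SERVERS)
def test_phase2_server_ssh_reachable(server):
    assert reachable(server, 22), f"server {server} not answering"
```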

For example, once the switches are installed, I will naturally perform speed and reliability tests. Once the servers are installed, I go further still: I simulate “slow and noisy” network conditions to see how the servers respond, and I carefully record the results. Testing under less-than-ideal conditions is especially important when dealing with high-availability services and real-time databases.
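One way to impose those conditions is Linux's netem queueing discipline. The following is a minimal sketch, assuming a Linux test client with iproute2 and root privileges; "eth0" is a hypothetical interface name. It adds latency, jitter and packet loss for the duration of a test run, then removes the qdisc again.

```python
# Temporarily degrade one interface with netem, then clean up.
import subprocess
from contextlib import contextmanager

@contextmanager
def degraded_network(dev="eth0", delay="100ms", jitter="20ms", loss="1%"):
    """Impose 'slow and noisy' conditions on an interface, then remove them."""
    subprocess.run(
        ["tc", "qdisc", "add", "dev", dev, "root", "netem",
         "delay", delay, jitter, "loss", loss],
        check=True,
    )
    try:
        yield
    finally:
        subprocess.run(["tc", "qdisc", "del", "dev", dev, "root"], check=True)

if __name__ == "__main__":
    with degraded_network():
        # Run the usual speed/reliability tests here and record the results;
        # a quick ping to a (hypothetical) server stands in for them.
        subprocess.run(["ping", "-c", "5", "10.0.1.10"], check=False)
```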

Once the system is fully implemented, I start the real, “hardline” tests, and I mean that I try to make the systems break. I will pull power cables and Ethernet cables and force servers to fail over in rapid succession; I test every aspect of failure with carefully triangulated unit tests to ensure speed and reliability.
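Recording the results of those break-it tests can also be automated. The sketch below assumes a hypothetical service address; it polls the service once a second and logs every transition between up and down while power and network failures are induced by hand, so outage and recovery times are captured precisely.

```python
# Poll a (hypothetical) service and log up/down transitions during failure tests.
import socket
import time

HOST, PORT = "10.0.1.10", 443   # hypothetical service under test

def is_up(host, port, timeout=1.0):
    """Return True if the service accepts a TCP connection."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    last = None
    while True:
        state = is_up(HOST, PORT)
        if state != last:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            print(f"{stamp} service is {'UP' if state else 'DOWN'}")
            last = state
        time.sleep(1)
```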