Our New Private Cloud Platform: Technical Thoughts on Hyper-V Clustering and Clouds

As promised in "Our New Private Cloud Platform", I'm about to divulge all our secrets. Or at least some of them. In vague detail. I should warn you that this is a blog post aimed at technical people who already have some knowledge of Hyper-V clusters, so if you're looking at this from a "user's" point of view you may get very lost, very fast. I'm not going to explain every little detail, because quite frankly we'd be here all day.

I feel I should start off by defining our "Private Cloud". "Cloud" is a term that's been thrown about a lot recently by marketing staff, and for that reason technical staff need to use it in front of boards and decision makers. To us techies that may be frustrating, however it's the world we live in. If you're uninitiated, it's a very broad term that covers:
  • Infrastructure as a Service (IaaS) - Server and networking hardware, possibly server OS, such as Amazon's AWS,
  • Software as a Service (SaaS) - Software provided by a remote system, such as Google Apps (Mail, Docs, etc.), or Dropbox,
  • Platform as a Service (PaaS) - Normally software infrastructure for developers to rapidly build software, such as force.com, Google's AppEngine, or parts of Azure.
Whilst we also provide SaaS products, in this instance we see our small "Private Cloud" as an IaaS offering. It's a small, 2-node Hyper-V cluster that we use to run our own and our customers' systems. As a rough outline, our cluster consists of:
  • 1x HP ProCurve 2810-24G as our main switch,
  • 1x Juniper SRX210 acts as our firewall and gateway device for some portions of our network,
  • 1x IBM x3250 M3 acts as our physical Active Directory Domain Controller, and also hosts our Data Protection Manager (DPM) 2010 and System Center Virtual Machine Manager (SCVMM) virtual machine under Hyper-V,
  • 2x IBM x3550 M3 act as our Hyper-V nodes,
  • 1x IBM DS3512 acts as our shared storage,
  • 1x QNAP TS-459U+ acts as our short-term backup storage,
  • Several USB hard drives for off-site backup that are routinely swapped.
We're aware that there are some issues with this design: a single switch, a single firewall and only 2 Hyper-V nodes. However, the important thing here is why we chose these things and why we're not too worried about them right now (this was a significant investment for our small company):
  1. Granted, all hardware dies eventually. In the event that the switch does, we can get a replacement on-site reasonably quickly if we need to; however, we're yet to have an HP ProCurve die on us since we started the business,
  2. The single firewall is something that we do worry about, but we've chosen Juniper because their devices are easily clustered,
  3. Provided that you don't oversubscribe, 2 Hyper-V nodes should be sufficient, and additional nodes can be introduced to the cluster easily in the future (see the sketch just after this list).
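As a rough illustration of that last point, joining an extra node to an existing Windows Server 2008 R2 failover cluster is only a couple of PowerShell commands from the FailoverClusters module. This is just a sketch; the node and cluster names are made up for the example:

  # Load the failover clustering cmdlets (Windows Server 2008 R2)
  Import-Module FailoverClusters

  # Re-run cluster validation with the new node included (hypothetical names)
  Test-Cluster -Node "HV-NODE1", "HV-NODE2", "HV-NODE3"

  # Join the new node to the existing cluster
  Add-ClusterNode -Cluster "HV-CLUSTER" -Name "HV-NODE3"

The real work, of course, is making sure the new node can see the shared storage and the right networks before you run any of that.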
So whilst we are aware of the problems, I believe that we've engineered the system in such a manner that we're able to introduce new hardware easily, upgrade the existing hardware, and add some additional redundancy later, including multiple switches with multi-chassis LACP links. We've not built this system to compete with Amazon's amazing AWS; instead, we built it with 3 goals in mind:
  1. Extensibility,
  2. To use as a small reference design,
  3. To virtualise our own systems more redundantly. The fact that we're able to host customers' systems as well is a nice perk.
I won't take you through the process of setting up your Hyper-V cluster, but I will cover a few bits and bobs that we feel a techie should be aware of before walking into a project like this, but might forget when looking at the big picture. Cluster Shared Volumes, or CSV, is the magic that makes the shared storage work. It's a clever file system layer that allows multiple nodes to share the same storage. We're yet to deploy a CSV using Fibre Channel (FC), so we're unsure if this is true for FC as well, however in the case of both DAS and iSCSI what happens is the following:
  1. One node (the coordinator, or "master") takes ownership of the storage volume (the sketch after this list shows how to check which node that is),
  2. All other nodes are notified of this, and send their metadata operations for the shared storage to that coordinator node over the network; in redirected access mode, all of their storage I/O is routed through the coordinator as well.
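If you want to watch this in practice, the failover clustering PowerShell module that ships with Windows Server 2008 R2 will tell you which node currently owns each CSV. A minimal sketch, run from any cluster node:

  # Load the failover clustering cmdlets
  Import-Module FailoverClusters

  # List each Cluster Shared Volume, its state, and the node that currently owns (coordinates) it
  Get-ClusterSharedVolume | Select-Object Name, State, OwnerNode

There's also a Move-ClusterSharedVolume cmdlet if you ever want to shift ownership to a different node.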
It should be clear from this that your choice of network card and switch is very important. CSVs are not supported by Microsoft for any use other than Hyper-V clusters, so don't go getting any ideas.

Jumbo frames on your networking gear are a must. Generally speaking, a jumbo frame is any Ethernet frame carrying more than the standard 1500-byte payload, however the term is commonly used for frames of 9000-9600 bytes (+/- 14 bytes for the header, depending on your switch(es)/NIC configuration language). If you don't remember how IP and Ethernet interact, I suggest you go and refresh your memory. You should then quickly recognise the importance of having jumbo frames enabled: they provide higher performance in situations where large payloads are being transmitted frequently, which is exactly what iSCSI and redirected CSV traffic look like.

At present we're using Microsoft's DPM 2010 for backups. The major gotcha that we didn't see coming was that DPM 2010 on a Domain Controller is basically a no-no:
For a DPM server that is installed on a domain controller, only protection of data sources local to the DPM server is supported. You cannot install agents on other computers to configure protection.
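Going back to jumbo frames for a moment: the win is simple arithmetic (a 64 KB storage write needs roughly 45 standard 1500-byte frames but only around 8 frames at a 9000-byte MTU), and it's worth proving that the whole path actually passes them rather than assuming it. A quick sanity check, assuming a 9000-byte MTU everywhere and using a made-up storage IP, is a don't-fragment ping sized just under the MTU:

  # 8972 bytes of ICMP payload + 8-byte ICMP header + 20-byte IP header = 9000 bytes on the wire
  # -f sets the Don't Fragment bit, so the ping fails if anything in the path can't carry a 9000-byte packet
  ping -f -l 8972 192.168.10.20

If you get a normal reply, every NIC and switch port between the two ends is passing jumbo frames; if you see "Packet needs to be fragmented but DF set", something in the path is still at the default MTU.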
SCVMM (System Center Virtual Machine Manager) 2008 R2 needs some polish. We've had to dive into its database once already; don't be afraid of it. Other than that, the project went exceedingly smoothly. There are a few features that I wish Hyper-V had in comparison to VMware and Xen, and I really do wish that there were more, cheaper graphics cards out there for RemoteFX. However, that's just something to plan for as a future project.
Hyper-V Snapshots and their uses