Let’s talk about something often overlooked in this space (or really, in general these days). Infrastructure!

Kaww.io was born on fully cloud-native infrastructure, which I do embrace, but it has its drawbacks. If you know what you’re doing as a server/network engineer, moving into a colocation facility can be much better in terms of scalability and reliability.

That’s not to say I’m leaving the cloud altogether. Building a fortified, highly available headquarters lets me branch out into global stratum servers in the cloud (which will be the next project once this build is deployed and migrated to!).

The Network Layer

For the edge layer of my network, I chose Cisco ASA 5545-Xs. These are next-generation firewalls, but more importantly they offer good throughput, solid VPN options, and strong redundancy features.

These are configured as an active/active stateful pair, meaning I can use the resources of both machines at once, while TCP state is replicated between them; if one firewall fails, my miners will not see a disconnection.
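For anyone curious, here’s roughly what the failover portion of an active/active ASA setup looks like. This is a minimal sketch, not my actual config: active/active requires multiple-context mode, and the interface names, context names, and addresses below are all placeholders.

```
! Active/Active failover sketch -- requires multiple-context mode.
! Interface names, context names, and IPs are placeholders; the
! allocate-interface / config-url lines are omitted for brevity.
mode multiple
!
failover lan unit primary
failover lan interface FOLINK GigabitEthernet0/2
failover link STATELINK GigabitEthernet0/3
failover interface ip FOLINK 192.168.100.1 255.255.255.252 standby 192.168.100.2
failover interface ip STATELINK 192.168.101.1 255.255.255.252 standby 192.168.101.2
!
failover group 1
  primary
  preempt
failover group 2
  secondary
  preempt
!
context POOL-A
  join-failover-group 1
context POOL-B
  join-failover-group 2
!
failover
```

Each failover group is active on a different physical unit, which is how both boxes carry traffic at once while still backing each other up.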

The firewalls also act as my internal Layer 3 device, routing between VLANs, for reasons I’ll get into below.
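In practice that just means each VLAN’s gateway lives on a dot1q subinterface on the ASAs. A minimal sketch, with example names, VLAN IDs, and addresses:

```
! Each VLAN terminates on an ASA subinterface that acts as its gateway.
! Names, VLAN IDs, and addresses are examples only.
interface Port-channel1.10
 vlan 10
 nameif miners
 security-level 50
 ip address 10.10.10.1 255.255.255.0 standby 10.10.10.2
!
interface Port-channel1.20
 vlan 20
 nameif backend
 security-level 80
 ip address 10.10.20.1 255.255.255.0 standby 10.10.20.2
```

Inter-VLAN traffic is then routed (and inspected) by the firewall according to the access policy between those interfaces.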

I’ll get into the compute next, but I wanted to call out the switching layer first. I’m not using external switches, because my compute is a converged system, meaning the switching is internal to the chassis. I mentioned above that I’m using the firewall layer for internal routing. The chassis’s Dell FN410S switches are perfectly capable of handling L3 in theory, but under the hood they’re Force10 switches, and I’ve had a number of problems with their routing over the years. As Layer 2 switches, however, they’re bulletproof.
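Keeping them at L2 also keeps the switch config trivially simple; something along these lines (an FTOS-style sketch with example ports and VLAN IDs, not my actual config):

```
! Layer 2 only: ports are switched and VLANs tagged; no IP interfaces
! or routing protocols configured. Port/VLAN numbers are examples.
interface TenGigabitEthernet 0/1
 switchport
 no shutdown
!
interface Vlan 10
 tagged TenGigabitEthernet 0/1
 no shutdown
```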

If I ever expand to more than a single chassis in this datacenter, I may go with some external Cisco Nexus switches and move L3 onto those. For now, this will do.

Each of these switches is line-rate 10Gbps per port, and the pair is fully redundant, with redundant uplinks to the Cisco firewalls.

The Compute

Rack space comes at a premium. I didn’t want to use more than 4U altogether for this deployment, but I still wanted full redundancy and plenty of performance headroom.

Enter the Dell FX2 Platform:

The Dell FX2 is built with exactly what I planned in mind. The only piece of this system that is not redundant or easily replaceable is the midplane, which is a copper pass-through to the rear of the chassis (meaning it’s extremely rare for it to fail). Everything else about this system is incredibly redundant.

It is only 2U (though a very long chassis). In my configuration, I chose two full-width blades. Each blade includes:

  1. 48 Xeon cores @ 3GHz (96 cores total in the chassis)
  2. 512GB RAM (1TB total in the chassis, upgradable to 3TB)
  3. 6.4TB of SSD storage (12.8TB total)
  4. 2TB of NVMe storage (4TB total)
  5. Expandability via 12Gbps SAS external enclosures.

I’m running VMware ESXi as the hypervisor. Each node runs HAProxy in a failover configuration to load-balance across the nodes, and each node also runs multiple RVN nodes, stratum servers, API servers, etc.
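To give a feel for what that looks like, here’s a minimal HAProxy frontend/backend for stratum traffic. Stratum is plain TCP, so it’s balanced in tcp mode; the port, server names, and addresses are placeholders, not my real config. (For the failover between the two HAProxy instances themselves, a floating IP via something like keepalived is a common approach; I’m glossing over that detail here.)

```
# haproxy.cfg sketch -- stratum is plain TCP, so we balance in tcp mode.
# Port, server names, and addresses are placeholders.
frontend stratum_in
    bind *:3333
    mode tcp
    default_backend stratum_nodes

backend stratum_nodes
    mode tcp
    balance leastconn
    option tcp-check
    server blade1-stratum 10.10.20.11:3333 check
    server blade2-stratum 10.10.20.12:3333 check
```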

If a blade, switch, firewall, power supply, etc. fails, everything keeps operating with zero downtime and no human interaction.

VMware’s vMotion allows me to migrate VMs from one node to the other if I have to take a node out of service for hardware maintenance, replacements, upgrades, etc.
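Evacuating a blade before maintenance can even be scripted with VMware’s PowerCLI; a quick sketch (the vCenter and host names below are placeholders):

```
# Live-migrate (vMotion) every VM off blade1 onto blade2.
# vCenter and ESXi host names are placeholders.
Connect-VIServer -Server vcenter.example.local

Get-VMHost "blade1.example.local" | Get-VM |
    Move-VM -Destination (Get-VMHost "blade2.example.local")
```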

Monitoring will of course alert me that something happened, but I can sleep soundly through the night and address the issue the next day.

Cost

Many of you will say, “Well, cloud hosting may be expensive to scale, but this is a larger up-front cost, plus colocation fees!” and you’d be right, for now at least. But between mainnet and testnet, I’m running about 26 VMs with 8GB RAM each (roughly 20% of the physical RAM in this chassis).

Even at the cheapest providers like Linode, that’s about $1,000/mo in compute (26 VMs at roughly $40/mo for an 8GB plan) before any add-ons, so the ROI on equipment like this is quite fast. Not to mention, I can add VMs without incurring any additional colocation cost; the limit is what the hardware can handle, and that’s very upgradable.
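If you want to sanity-check that, the back-of-envelope math looks like this. The per-VM price is an assumption roughly matching an 8GB cloud plan, and the hardware and colocation figures are purely hypothetical placeholders, not my actual costs:

```
# Back-of-envelope cloud-vs-colo comparison. All figures are assumptions:
# the per-VM rate roughly matches an 8GB cloud plan; the hardware and
# colo numbers are hypothetical placeholders, not real costs.
vm_count = 26
price_per_8gb_vm = 40                        # USD/month (assumed)
cloud_monthly = vm_count * price_per_8gb_vm  # ~= $1,040/month

hardware_cost = 15_000                       # USD up front (hypothetical)
colo_monthly = 500                           # USD/month (hypothetical)

# Months until the colo build beats staying in the cloud
break_even_months = hardware_cost / (cloud_monthly - colo_monthly)
print(f"Cloud: ~${cloud_monthly}/mo; break-even in ~{break_even_months:.0f} months")
```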

Thanks for reading!

Did I help? If you’re feeling generous, buy me a coffee by sending RVN to RHEH92NguBjaxXQPsM1bedkqqTXKr9EZcM

Follow me on social media: