Migration and Expansion
Ashes to ashes
Upon murdering the motherboard in the DL360p I went and searched for parts.
Boards were in the range of $120 and up, so I wasn’t exactly thrilled. I also didn’t fully know whether the motherboard or the hard drive backplane was dead, so I didn’t have a huge want to throw money at parts. So searching for bare minimum running servers I went…
I was expecting maybe 1 CPU and 32GB of RAM for something reasonable. I don’t know exactly how, but a complete Dell R720 popped up in my eBay feed with 128GB of RAM, 2 Xeon E5-2670s, and 8 blank drive trays for ~$200. Add some $20 or so for an HPE P410 RAID controller and another ~$55 for rails, and in theory I had a system I could throw my drives in and be up and running again. A few days later I had an R720 and rails at my door.
Reborn
I installed the HPE P410 and brought up the RAID array, but rather than keep running the risk of losing the array to a controller failure, I felt like moving to what is, from my perspective, a slightly safer filesystem-based RAID config with ZFS or BTRFS. So in the process of murdering a server motherboard, I was able to expand RAM by a healthy 128GB, and the processors by maybe not a lot, but I got a few extra cores.
Starting migration
Migrating to ZFS in Proxmox was time consuming at first, so I quickly looked into 2.5Gbps, 5Gbps and 10Gbps networking.
Once the backup started, I was not looking forward to 2+ weeks of rsync running to back up and restore everything. A quick search around eBay and about $100 later, 2 10Gbps Mellanox ConnectX-3 cards, some 10Gbps Brocade transceivers and some LC to LC multimode OM3 fiber were at my door at the end of the week. About 30-40 minutes of installation and configuration later… rsync was absolutely flying, taking my backup/restore time for 10TB down to a matter of a couple of days.
Rsync had been running pretty much nonstop for 6 days at 20-30MB/s, backing up VMs and containers to my desktop’s BTRFS storage pool. When the 10Gbps cards showed up, rsync was showing ~48 and would finish late Sunday.
I stopped rsync, shut down the server and desktop, installed the cards, ran the fiber, put everything back together and booted, set up a basic IP link between the 2, and restarted the rsync daemon and the transfer. Mind blown. The transfer was running at 120-150MB/s and finished in about another 7-8 hours.
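To put those rsync speeds in perspective, here is a rough back-of-the-envelope sketch of how long a full 10TB pass would take at each rate. The 10TB size and the MB/s figures come from the numbers above; the function name, the decimal units (1TB = 1,000,000MB) and the assumption of a perfectly sustained rate are mine, so treat the results as ballpark only.

```python
def transfer_hours(size_tb: float, rate_mb_s: float) -> float:
    """Estimate hours needed to move size_tb terabytes at rate_mb_s megabytes/second.

    Assumes decimal units (1 TB = 1,000,000 MB) and a constant transfer rate,
    which real rsync runs with many small files will not hold to.
    """
    if rate_mb_s <= 0:
        raise ValueError("rate_mb_s must be positive")
    size_mb = size_tb * 1_000_000
    seconds = size_mb / rate_mb_s
    return seconds / 3600


if __name__ == "__main__":
    # Rates quoted in the post: 20-30 MB/s before the upgrade, 120-150 MB/s after.
    for rate in (20, 30, 120, 150):
        hours = transfer_hours(10, rate)
        print(f"{rate:>4} MB/s -> {hours:6.1f} hours ({hours / 24:.1f} days)")
```

By this estimate a single 10TB pass drops from several days to roughly a day, and a backup plus a restore is two such passes.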
RAID to ZFS
By this point I had the P410 running the array. While waiting for parts, I had flashed the internal RAID controller to IT mode and played with ZFS on some old drives in the other 4 drive bays. So after the backup to the desktop finished, I was able to pretty quickly wipe the array drives and set them up in a ZFS array. Restoring VM and container data was equally mind blowing, as it finished some 24 hours later.
Moving to ZFS subvols
In order to get snapshots I had to have the containers and VMs “on a ZFS filesystem”, not just in a directory on a ZFS filesystem, so I went through the painfully slow process of migrating all my disks one by one (doing more than 1 at a time would eventually throw errors that I never got around to diagnosing, but my assumption was some kind of too-many-open-files problem). I still haven’t migrated the media server… perhaps over the network instead?
Expansion
A few weeks later, for some reason, I felt like looking up the motherboard for the DL360p on eBay again, and to my surprise several popped up that were under $100. The first time I had only found a handful, all close to or well over $100 before shipping and tax. I happened to catch one that was up for $50, and it even had a RAID cache card, CPU and RAM (most of the other boards I found were just the motherboard). I figured for that price I could chance it, and about a week later I was proved right and had a working DL360p again. Unfortunately it had a dead RAID controller, but throw in an old LSI card flashed to IT mode and she’s running like a champ.
HA and Quorum
In order to keep quorum for High Availability I set up a Raspberry Pi to run as a QDevice in the cluster. I enabled replication and HA on the vital service containers, and now if my server needs any updates everything just keeps on working as it reboots. Containers and VMs are migrated and brought up on the DL360p with seemingly no noticeable downtime, though there’s probably a minute or 2, which I don’t mind so much as long as it’s not something absolutely vital like the main gateway. Pretty cool though.
For something truly vital though, something like UCARP is needed, and in the future I’d like to mess with it and OPNsense.
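Going back to the quorum point, here is a tiny sketch of why the Raspberry Pi matters in a 2-node cluster. This is a simplified model of majority voting, not Proxmox or corosync code: the function names are made up for illustration, and it assumes each server and the QDevice contribute exactly one vote.

```python
def votes_needed(total_votes: int) -> int:
    """Smallest number of votes that is a strict majority of total_votes."""
    if total_votes < 1:
        raise ValueError("total_votes must be at least 1")
    return total_votes // 2 + 1


def has_quorum(total_votes: int, votes_online: int) -> bool:
    """True if the votes still online form a majority (simplified model)."""
    return votes_online >= votes_needed(total_votes)


if __name__ == "__main__":
    # Two servers, no QDevice: rebooting one leaves 1 of 2 votes online.
    print("2 nodes, 1 down:", has_quorum(total_votes=2, votes_online=1))

    # Two servers plus the Raspberry Pi QDevice (assumed 1 vote each):
    # rebooting one server leaves 2 of 3 votes online.
    print("2 nodes + qdevice, 1 node down:", has_quorum(total_votes=3, votes_online=2))
```

With only 2 votes, losing either server loses the majority, so the survivor can’t safely take over. The third vote is what lets the remaining server keep running HA services while the other one reboots.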