TL;DR: We will have a long, complete downtime at the beginning of January while we move to a new server. The exact date is not fixed yet, as it depends on personal availability.
Many of you have run into issues in recent months because our server became overloaded whenever more than 1,000 users were logged in. We spent dozens of hours trying to optimize and tweak the server and its services, with little to no success.
During our investigation of server performance we saw that both CPU and disk IO were at their limits. This is a very unusual pattern, as a server is usually bottlenecked by either CPU or IO. But FAF follows no pattern, as we have one machine running all kinds of services at once. Trying to relieve disk or CPU with larger memory caches did not help.
Our server was rented in late 2016 as an auctioned server at Hetzner. Even though it was fast enough, it was not the fastest machine available back then:
- Intel Xeon E3-1270 v3 @ 3.50GHz [4 cores + Hyperthreading = 8 logical cores], which was released in 2011
- 32GB of RAM
- 2x 2 TB HDDs in a RAID 1 (mirroring)
So as you can see, a CPU with 4 cores is somewhat slow; many gamers already have much faster machines. Also, an HDD in a server is no longer state of the art, especially for running databases (but it solved a particular issue, which I will explain later).
While we were investigating, something very positive happened over the year. The new AMD Ryzen 3000 processor family was launched and basically shook up the desktop PC segment with top performance at a very low price. Even though these Ryzen CPUs are not actually aimed at the server segment, our current hosting provider Hetzner recently launched a new product line based on them at a very fair price.
So for just a few bucks more per month (45€ per month total) we would get a new machine with the following specs:
- AMD Ryzen 5 3600 [6 cores + Hyperthreading = 12 logical cores]
- 64 GB of RAM
- 2x 512 GB SSDs in a RAID 1 (mirroring)
So that means 50% more logical cores plus the technical improvements of the last 8 years! We have no exact comparisons, but it should roughly double CPU performance. Moving from HDD to SSD will also give database performance a huge boost. The additional RAM is a nice bonus, but not really needed for now.
So all ready, set & go for the server move? Unfortunately there is one drawback. As you have seen, the disk size shrinks from 2 TB to 0.5 TB. Looking at our servers today, we will need slightly more than 0.5 TB. The biggest part of that is our replay vault, and part of it seems to be duplicated, so some additional investigation is required there; maybe we can reduce it. In the worst case we would drop some of the oldest replays, which are most probably broken anyway. Another part of the solution is adding a new compression to the replay format, which would reduce replay size by ~20% (more information here).
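To make the disk space question a bit more concrete, here is a rough sketch of the arithmetic in Python. Only the ~20% compression saving and the 512 GB mirrored capacity come from this post; the function names, the 10% free-space headroom and the example sizes are assumptions made up for illustration, not measurements from our server.

```python
def projected_size_gb(replay_gb: float, other_gb: float,
                      compression_saving: float = 0.20) -> float:
    """Estimate total disk usage after recompressing the replay vault.

    replay_gb and other_gb are hypothetical inputs; the 20% default is the
    approximate saving mentioned for the new replay compression.
    """
    if not 0.0 <= compression_saving < 1.0:
        raise ValueError("compression_saving must be in [0, 1)")
    return other_gb + replay_gb * (1.0 - compression_saving)


def fits_on_new_disk(replay_gb: float, other_gb: float,
                     capacity_gb: float = 512.0,
                     headroom: float = 0.10,
                     compression_saving: float = 0.20) -> bool:
    """Check whether the projected data fits while keeping some free space.

    A RAID 1 of two 512 GB SSDs offers 512 GB of usable space, because the
    second disk only mirrors the first. The 10% headroom is an assumption.
    """
    usable_gb = capacity_gb * (1.0 - headroom)
    return projected_size_gb(replay_gb, other_gb, compression_saving) <= usable_gb
```

With made-up numbers of 400 GB of replays and 150 GB of everything else, `projected_size_gb(400, 150)` comes out at roughly 470 GB, which would still not fit under the assumed headroom. That is why deduplicating or dropping old replays stays on the table in addition to compression.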
Some of you may ask why this has to be done in the holiday season, the favorite time of the year to play FAF. The answer is fairly simple: we can only do it when we have vacation ourselves.
A lot of preparation will be done beforehand so that we can keep the downtime as short as possible. Some tasks cannot be prepared, however: exporting the database on the old server and importing it on the new server, and updating the DNS records to point to the new servers. In theory the latter is done in seconds, but the way DNS works, it might take up to 24 hours until everybody is redirected to the new servers.
I’m aiming for the transition to happen sometime between the 3rd and 5th of January. This date is just an estimate.
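If you are curious whether your own machine already sees the new server after the move, a small check like the following Python sketch could help. The hostname and IP address you pass in are placeholders you would have to fill in yourself (the new address is not known yet), and the function names are made up for this example. It only reflects what your local resolver returns, so cached records may still point at the old server for a while.

```python
import socket
from typing import Callable, Set


def resolve_ipv4(hostname: str) -> Set[str]:
    """Return the IPv4 addresses the local resolver reports for hostname."""
    infos = socket.getaddrinfo(hostname, None,
                               family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv4, sockaddr is (address, port).
    return {info[4][0] for info in infos}


def points_to_new_server(hostname: str, new_ip: str,
                         resolver: Callable[[str], Set[str]] = resolve_ipv4) -> bool:
    """True if hostname currently resolves to new_ip on this machine.

    The resolver is injectable so the check can be tried without network
    access. Lookup failures are treated as "not switched over yet".
    """
    try:
        addresses = resolver(hostname)
    except OSError:  # socket.gaierror is a subclass of OSError
        return False
    return new_ip in addresses
```

Calling `points_to_new_server("<hostname>", "<new IP>")` repeatedly over the day would show when your resolver picks up the change; until then, clients keep being sent to the old address.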
Feel free to discuss this topic in the forums here. (Unfortunately the forum won’t be available during the move.)