System Administrator - What kind of job is that?
Sep 05 2011 CEST
In the featured article of June, we introduced you to several employees and departments of CipSoft, but left out one very important department. It is the department that ensures that Tibia and our other games are running, and that all CipSoft employees have secure and well-functioning computers on their desks. We are talking about our system administration. They are always in the background, but they provide the groundwork. Without them, nothing would work.
Our system administrators are a pretty cheerful crowd who gladly share the cake they receive every year for System Administrator Appreciation Day, which falls on the last Friday in July.
We community managers sat down for a chat with the lead system administrator, who was kind enough to share some internal information with us. The in-game character he uses for work is called Angarvazar. You might see him online on your game world once in a while. If you do, you can be certain that the system administrator himself is checking whether everything is still working as intended.
Read on to find out what he had to say...
Team Structure
First of all, we asked Angarvazar about his team. The system administration for CipSoft consists of four employees. Their daily work can be divided into two parts. One part concerns responsibilities inside the office: making sure that all employees have secure computers to work with, maintaining the internal network of CipSoft, updating software, and so on. This part also includes offering support to all employees. For example, if we community managers come across a problem with the computers on our desks, we call them for help. One of the four system administrators is mainly responsible for this part of the job, but all of the others can help out if needed. The other three are mainly responsible for working together with our data centres and for maintaining and configuring our servers there. Each of them specialises in different servers, but all of them are qualified in all fields. It is important to note that they are not assigned to one specific CipSoft product. Instead, they work equally for all three products. They are a bit like a small service provider within and for our company.
Tasks
The usual tasks of our system administrators can be divided into three categories. First of all, there are the daily tasks. These come up regularly, like making backups, getting an overview of the current server status, checking that all hardware is working as intended and monitoring everything that is going on with our servers. The second category consists of the most delicate tasks: dealing with urgent problems. The variety of problems that can occur is huge. There are hardware failures, software bugs, network troubles, etc. Everything that comes up and interferes with the operation of our services has to be dealt with immediately. There are some problems that our team can solve by themselves, but many problems are the responsibility of third parties, over which they have hardly any influence, for example problems at our service provider. Other network problems that happen completely outside of the data centre may also affect our services negatively, and there is little we can do about them. The third category consists of projects. Our system administrators are constantly trying to improve our setups, in terms of both software and hardware. For example, when it is time to deploy new game server hardware, they will begin a new project that is broken down into smaller work steps. It could look like this:
Step 1: Checking existing offers, comparing them and evaluating them to find out which product is best suited for our needs.
Step 2: Buying the wanted hardware.
Step 3: Testing the hardware and configuring the server.
Step 4: Deploying the server and getting it to the data centre.
Players usually do not notice anything about this work. It is done completely in the background. Only once the new server is finally in use might players see some effects of it.
Projects can only be worked on if there are no urgent problems that need to be dealt with first and all daily tasks are completed. So it is possible that projects are postponed. Maintaining our service is always more important than any upcoming changes.
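The priority rule described above can be illustrated with a minimal Python sketch. It is not CipSoft's actual tooling. The names `Category`, `Task`, `work_order` and `can_work_on_projects` are made up for this illustration, and the only assumption is the ordering given in the text: urgent problems first, then daily tasks, then projects.

```python
from dataclasses import dataclass
from enum import IntEnum


class Category(IntEnum):
    # Lower value = handled first, following the order given in the article.
    URGENT = 0
    DAILY = 1
    PROJECT = 2


@dataclass
class Task:
    name: str  # hypothetical task description
    category: Category


def work_order(tasks):
    """Return the tasks sorted so urgent problems come first and projects last."""
    # sorted() is stable, so tasks of the same category keep their original order.
    return sorted(tasks, key=lambda task: task.category)


def can_work_on_projects(tasks):
    """Projects are only worked on when no urgent or daily task is still open."""
    return all(task.category == Category.PROJECT for task in tasks)
```

With one open task of each category, `work_order` would put the urgent problem first and the project last, and `can_work_on_projects` would return `False` until only projects remain.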
A great difficulty with setting up a server is that you can only test your work within the given infrastructure. As you can probably imagine, we do not have our data centres cloned here in the office in Regensburg. So there is always a small risk when a new server goes live, for example. Even if it was tested in-house and ran well, it then has to function properly in the infrastructure of the data centre, too. Not only the infrastructure is different, though, but also the conditions. When a new server is tested in-house, you cannot simulate hundreds of players playing, players who are creative and who try out all sorts of things that nobody could anticipate. So errors can obviously occur, even though our team does its best to prevent them.
Like all employees of CipSoft, system administrators have regular working hours. However, part of their job is also an emergency service that is available 24/7. An automatic monitoring system notices when our service is disturbed or does not run properly. It then notifies one of the system administrators, who has to check the error report right away, be it night or day, work day, weekend or holiday. As you might already know, characters like Lokana Aldora are part of this automatic monitoring. They check every couple of minutes whether all game worlds are functioning properly. If the login of such a character does not work, it is very likely that players are experiencing problems as well. So if it is noticed that Lokana cannot log in correctly, a system administrator starts his search for the problem. The error needs to be clearly identified, since a login problem can have plenty of causes. Only when we know for sure what caused the problem and which consequences it had for players can we start informing the community. The four of them take turns being responsible for this emergency service. So if something goes massively wrong, one of our system administrators will be the first to know. It may be an interesting fact that until 2009, Guido, Stephan, Steve and Durin still took turns offering this emergency service themselves.
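The login check and the on-call rotation described above can be sketched roughly in Python. This is only an illustration under stated assumptions, not the monitoring CipSoft actually runs. The callables `try_login` and `notify` are hypothetical stand-ins for the real login attempt and the real alerting channel, and the simple round-robin rotation is an assumption, since the article only says that the administrators take turns.

```python
def failed_worlds(worlds, try_login):
    """Return the game worlds on which the monitoring character could not log in."""
    failed = []
    for world in worlds:
        try:
            ok = try_login(world)  # hypothetical callable: True if the login worked
        except Exception:
            ok = False  # an error during the attempt counts as a failed login
        if not ok:
            failed.append(world)
    return failed


def on_call(admins, turn):
    """Pick who is on call for a given turn, rotating through the team."""
    return admins[turn % len(admins)]


def run_check(worlds, try_login, admins, turn, notify):
    """Run one monitoring pass and alert the on-call administrator on failures."""
    failed = failed_worlds(worlds, try_login)
    if failed:
        # hypothetical callable: delivers the error report to one administrator
        notify(on_call(admins, turn), failed)
    return failed
```

In a real setup, `run_check` would be called every couple of minutes by a scheduler. The sketch leaves that part out so that the check itself stays easy to test.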
Technical Infrastructure of Tibia
All in all, we have about 150 servers that are under the care of our four system administrators. The largest part is made up of the 77 Tibia game worlds. Then there are approximately 12 servers used by TibiaME. Apart from the Tibia and TibiaME servers, there are also about 10 web servers, 2 mail servers, 9 database servers, 5 login servers for Tibia only, 5-6 testing servers for all products (for Tibia you know them as Testa and Testera), and a couple of replacement servers. We work with a failover philosophy. That means that we always have a couple of servers ready to use in case one server breaks and cannot be fixed immediately. This gives us enough time to fix the broken server. You need to know that fixing a server might take up to several days, for example if a needed replacement part cannot be delivered any sooner.
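The failover idea can be pictured with a small Python sketch. It only illustrates the principle of keeping replacement servers ready and is not a description of CipSoft's real setup. The class `SparePool` and its methods are invented for this example.

```python
class SparePool:
    """Keeps track of ready-to-use replacement servers and servers under repair."""

    def __init__(self, spares):
        self.spares = list(spares)  # servers that can take over immediately
        self.in_repair = []         # broken servers waiting to be fixed

    def fail_over(self, broken):
        """Swap a broken server for a spare and return the replacement."""
        if not self.spares:
            raise RuntimeError("no replacement server available")
        replacement = self.spares.pop(0)
        self.in_repair.append(broken)
        return replacement

    def repaired(self, server):
        """Return a fixed server to the pool of spares."""
        self.in_repair.remove(server)
        self.spares.append(server)
```

The point of this design is that the repair can take several days without affecting the service: the broken machine simply waits in `in_repair` while a spare does its job.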
The remaining hardware includes switches and firewalls, for example. Their configuration is also part of the job of our system administrators. Hardware firewalls filter network traffic and provide protection against many DDoS attacks. You should know that most DDoS attacks do not have any effect on players at all, mainly due to those firewalls. Sadly, some attacks do get through to our servers, though. In such a case, cooperation with our service providers is necessary. Such an attack needs to be analysed thoroughly in order to find countermeasures that prevent it in the future.
CipSoft uses data centres in Germany and in the United States. Several years ago, we tested a data centre in Brazil in the hope of providing a more lag-free service for our Brazilian players. Unfortunately, the network connection was not good enough and it did not improve the situation for our players from Brazil.
As you can probably imagine, our system administrators are not on a trip to the United States every other week to maintain the servers there. The work with the data centre abroad is done remotely. The closer a data centre is to our office, the higher the chance that our system administrators will go there themselves. So they can handle many tasks themselves in the data centre in Germany. Data centres and several server vendors also offer different services and support. So our system administrators do not always have to be on site when a hard drive stops functioning properly, for example. Somebody working at the data centre can replace it for us, too.
So after all this business talk with Angarvazar, we asked one last question that is often interesting for Tibia players. Do system administrators of CipSoft play Tibia? The answer is yes. Angarvazar proudly announced that the highest Tibia character in the system administrator team is level 56!
We community managers hope that you enjoyed all the information that Angarvazar so willingly shared with us. We certainly found it impressive to find out how all this works internally and what system administrators do all day long.