What do you think is your most valuable asset? Data, you say? You’re right! Storage costs, computing costs, and license costs have all dropped over the past decade, thanks to increased consumption and cloud technologies, among other factors, while the value of your data has increased.
Backing up Data
One way to protect data is to create backups that you can restore after incidents such as hardware or software failures, malware attacks, or natural disasters. To guard against data loss in such situations, memoQ server allows you to back up your data.
An average memoQ server holds large volumes of diverse data: hundreds of thousands or even millions of files, adding up to tens or hundreds of gigabytes. This is why backing up data takes time. Backing up a memoQ server requires stopping the server, even though the backup only creates a snapshot of your current data and makes a copy of it. If you were to run a backup on a fully functioning server, the backup’s disk and CPU operations would jeopardize server performance. In extreme cases, backing up a server might not finish until the start of the next session.
Many companies are not in the habit of backing up their data: some think it takes too much time, others simply do not do it regularly. We are aware that not all of our memoQ server customers use the backup feature, which means they might be exposed to data loss.
memoQ server Backup Improvements
To make backing up data easier and help you stay protected, we have reduced the time it takes to create a backup (let us call it backup time), and we are happy to announce that we have achieved significant results. I am sure you would like to have concrete figures or percentages showing how much faster it is to back up your server now! Well, it is not so straightforward, and the benefits you will experience depend on your setup. Let’s take a closer look at this:
- You will see up to an 80% decrease in backup time if your company has a memoQ server with data stored on classic SATA HDD disks or SAS disks, or if your server runs on virtual machines deployed on such disks. If backing up your server took up to 10 hours, you could now expect less than 5 hours.
- In SSD environments, the gain is smaller. You might experience up to a 10% drop in server backup time.
- As a general rule, smaller server instances with few projects and documents will see smaller gains than larger instances with many projects.
Behind the Scenes
Geek alert. Read on only if you are interested in technical details!
Let’s look at the changes that made backups faster. All in all, the gain is the result of three improvements.
Speeding up access to volume shadow copies
To minimize the impact of backing up your data, memoQ server uses Microsoft’s Volume Shadow Copy Service (VSS), which creates a snapshot of your disk. This snapshot allows you to keep working and making changes to the data on the disks, while the backup can still read and process the very same data as if it had remained unchanged. To put it simply: imagine an artist who is painting your portrait takes a picture of you, so you do not need to stand still for the rest of the day until the work of art is ready!
The files in the snapshot can be accessed via external code libraries, while the files in the “regular” file system can be accessed natively. To create a backup, memoQ needs to access all of the files in the snapshot and make a copy of them.
When memoQ requested access to a file in the snapshot, our old VSS library built up a catalog of all the files and passed it on to memoQ as a giant in-memory array. When a memoQ server had hundreds of thousands of files, building up this catalog could take hours, for two reasons. First, the catalog grew to many gigabytes. Second, all of that data had to be kept in memory, so Windows had to use the paging file heavily. This led to extreme memory usage, required a lot of computing power, and put further strain on the server’s performance.
Recently, we updated the library, and it now quickly returns files one by one instead of building up that gigantic catalog (it uses an iterator instead of the array). The updated library is used by memoQ server versions 8.4 and 7.8.13. On servers with HDD storage, this change reduced the backup time by about 50%. The performance improvement is much lower for servers with SSD drives, since they can handle paging files a lot faster. However, if you do not have much RAM in your server, the overall performance gain may be significant even for SSD-based instances, as memory consumption also drops significantly.
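To illustrate the difference between the two approaches, here is a minimal Python sketch. It is not memoQ’s code and does not touch VSS: the function names `iter_files` and `list_files` are made up for this example, and an ordinary folder stands in for the snapshot. The first function hands out one path at a time, the second builds the whole catalog in memory first.

```python
import os
from typing import Iterator, List


def iter_files(root: str) -> Iterator[str]:
    """Lazily yield file paths under root, one at a time.

    Only the stack of pending folders is held in memory,
    never the full list of files.
    """
    stack = [root]
    while stack:
        current = stack.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    yield entry.path


def list_files(root: str) -> List[str]:
    """Eager variant: materializes the complete catalog before returning."""
    return list(iter_files(root))
```

A caller that copies files can start working on the first path returned by `iter_files` right away, while `list_files` makes it wait until every folder has been walked and the full list fits in memory.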
From now on
Another factor in backup time is the progress bar for the backup task. Until now, during backups, memoQ server had to iterate over all of the files twice: the first time to calculate the total size of each task, and the second time to measure the progress of each of them. We observed that this double calculation accounted for 10% of the backup time.
Backup time also depends on external factors, such as the load other applications put on the system and possible network congestion. This is why the current progress status cannot reliably predict the time required to finish the task: progress may be at 87%, but the system may suddenly come under heavier load, slowing down the process.
We were at a crossroads. How could we improve backup time without affecting the progress status information? We certainly could not keep iterating over every single file; this was the reason why the process was so slow.
After some research, we found a solution. We are now able to create a useful map that compares folders, and instead of iterating over all of the files, we now only scan large files and observe their total number. This allows us to make a good estimation of progress status. Our measurements showed that using this kind of estimation produces a 10% drop in backup time.
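One possible reading of this idea, sketched in Python: progress is estimated from the number of files handled plus the bytes of large files only, so small files never need to be measured up front. This is an illustration under our own assumptions, not memoQ’s actual algorithm; the 10 MB threshold, the 50/50 weighting, and the `ProgressEstimator` name are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical threshold: files at or above this size count as "large".
LARGE_FILE_BYTES = 10 * 1024 * 1024


@dataclass
class ProgressEstimator:
    total_files: int          # number of files expected in the backup
    total_large_bytes: int    # combined size of the large files only
    large_weight: float = 0.5 # hypothetical share given to large-file bytes
    done_files: int = 0
    done_large_bytes: int = 0

    def record(self, size: int) -> None:
        """Register one copied file; only large files contribute bytes."""
        self.done_files += 1
        if size >= LARGE_FILE_BYTES:
            self.done_large_bytes += size

    def fraction(self) -> float:
        """Return estimated progress between 0.0 and 1.0."""
        count_part = self.done_files / self.total_files if self.total_files else 1.0
        if self.total_large_bytes == 0:
            return count_part
        bytes_part = self.done_large_bytes / self.total_large_bytes
        return (1 - self.large_weight) * count_part + self.large_weight * bytes_part
```

The result is an estimate, not an exact measurement: a run of many small files moves the bar only through the file count, which is the trade-off for not sizing every file twice.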
Omitting What’s Not Needed
While doing all of this research, we also found that a big chunk of the backed-up data consists of outdated log files and other dead data. Believe me, memoQ server logs accumulate over the years and can grow to tens of gigabytes.
Best practice says these logs should be deleted regularly, but not everyone follows the rule. And because we know this may keep going for a while, from now on, memoQ servers in version 8.4 only include relevant log files in the backups. Don’t panic! We always include those you may need to debug issues or to have Kilgray Support fix them for you. Specifically for the memoQ server log, only the two latest backup files are included.
Of course, the benefit of this improvement depends on how many old log files are sitting on your server. Those of you who clean up your disks on a regular basis will see little improvement, but those who keep forgetting to delete these files will get more benefits.
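If you want to check how much an old-log cleanup would save on your own disks, the "keep only the newest files" rule is easy to sketch in Python. This is a generic illustration, not memoQ’s implementation: the function name `latest_log_files`, the `*.log` pattern, and the use of the modification time to decide what is "latest" are assumptions made for this example.

```python
from pathlib import Path
from typing import List


def latest_log_files(log_dir: str, pattern: str = "*.log", keep: int = 2) -> List[Path]:
    """Return the `keep` most recently modified files matching pattern.

    Everything else in the folder would be left out of the backup.
    """
    files = [p for p in Path(log_dir).glob(pattern) if p.is_file()]
    # Newest first, based on the file's modification time.
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return files[:keep]
```

Comparing the total size of the folder with the size of the files this function returns gives a rough idea of how much dead log data your backups were carrying.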