My personal blog can be found here: https://www.catmonad.xyz/blog.

FerrisCraft 5 Backup Tool Devlog (August 27th)

I had an annoying epiphany today. It turns out that I don’t need to worry about delta encoding for the FC5 backup tool any time soon.

Let’s take a step back. This is the first devlog I’m publishing on my site, and I should explain what this backup tool is for, before we get into the weeds.

What Are You Making?

I’m glad you asked.

Historically, at FerrisCraft, we haven’t done the best job of keeping world backups. We don’t have any backups which survive from FC1, nor am I aware of any which survive from FC2 or FC3. I believe it is possible that there are surviving FC2 backups, but I don’t know who has them.

[Update: I have been told of three people who have copies of FC2 backups.]

[Update (August 28th): I have now acquired a copy of an FC2 backup from one of the aforementioned people.]

In FC4, I pushed hard to keep backups, and we do have many more surviving ones, but the tooling I built back then to cope with the hardware limits of my own mirror failed me pretty badly. Fortunately, Bool carried the day by throwing more hardware at the problem, and that is why we have copies of the world from the final days it was running.

This time, I’d like to do better.

I Asked What, Not Why.

Ah, right. Sorry, I got sidetracked.

I’m, essentially, building a version control system for Minecraft worlds. The initial idea for the database was to construct a content-addressed store of Minecraft chunks, and to perform delta encoding wherever a benefit could be detected. This would reduce storage use dramatically, bringing it down to just the records of mutations plus various fixed overheads, which would likely be unimportant compared to the gains.
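To make the content-addressed part concrete, here is a minimal Rust sketch, not the tool's actual code: each chunk's bytes are stored under a hash of those bytes, so an unchanged chunk that shows up in many snapshots is only stored once, and a snapshot becomes a list of addresses. The `ChunkStore` type and its methods are hypothetical, and std's `DefaultHasher` is only a stand-in for the cryptographic hash (SHA-256 or similar) a real store would want.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical content-addressed store: chunk bytes keyed by a hash of their contents.
/// DefaultHasher is a stand-in; a real store would use a cryptographic hash.
#[derive(Default)]
pub struct ChunkStore {
    objects: HashMap<u64, Vec<u8>>,
}

impl ChunkStore {
    /// Compute the content address of a chunk.
    fn address(bytes: &[u8]) -> u64 {
        let mut hasher = DefaultHasher::new();
        bytes.hash(&mut hasher);
        hasher.finish()
    }

    /// Store a chunk and return its address. Identical bytes are only stored once.
    pub fn put(&mut self, bytes: &[u8]) -> u64 {
        let addr = Self::address(bytes);
        let existing = self.objects.entry(addr).or_insert_with(|| bytes.to_vec());
        // A 64-bit non-cryptographic hash can collide, so check explicitly.
        assert_eq!(existing.as_slice(), bytes, "hash collision");
        addr
    }

    /// Fetch a chunk by its address.
    pub fn get(&self, addr: u64) -> Option<&[u8]> {
        self.objects.get(&addr).map(|v| v.as_slice())
    }

    /// Number of distinct chunk versions actually stored.
    pub fn object_count(&self) -> usize {
        self.objects.len()
    }
}

fn main() {
    let mut store = ChunkStore::default();
    // Two snapshots of a tiny 3-chunk world where only chunk 1 changed in between.
    let snapshot_a: [&[u8]; 3] = [b"chunk0-v1", b"chunk1-v1", b"chunk2-v1"];
    let snapshot_b: [&[u8]; 3] = [b"chunk0-v1", b"chunk1-v2", b"chunk2-v1"];
    // A snapshot is just the list of its chunks' addresses.
    let a: Vec<u64> = snapshot_a.iter().map(|c| store.put(c)).collect();
    let b: Vec<u64> = snapshot_b.iter().map(|c| store.put(c)).collect();
    // prints: 6 chunk references, 4 unique objects
    println!("{} chunk references, {} unique objects", a.len() + b.len(), store.object_count());
    assert_eq!(store.get(b[1]), Some(&b"chunk1-v2"[..]));
}
```

The point of the toy example is that the second snapshot only costs one new object; everything it shares with the first is a reference to bytes already in the store.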

Turns out, that would work, but I don’t need to go that far. Knowing what I now do about the structure of Minecraft worlds, along with the empirical data I gathered from using bup for FC4 backups (the problem was always me, bup, not you. You’re a great tool! I just had trouble bending you to my weird use case), I can make a decent guess that deduplicating identical chunk versions via the content-addressed store will get me all the space savings I need on this project, no per-object delta encoding necessary.

It’s important to me that I always solve problems I actually have, instead of building solutions in search of problems. If I don’t stick to this rule, my motivation dries up fast.

So, effective immediately, I am removing delta encoding as a blocker for the launch of FC5.

I may still explore delta encoding later, and will ensure the database is adaptable to future efforts, but this leaves me with a much more comfortable workload. Cheers!


For those curious, the bulk of the world data I’m concerned with is saved in what are known as “region files”. A detailed description of them can be found at wiki.vg, which is generally a great resource.

The salient fact is that a “region file” contains a list of chunks, each saved as a zlib-compressed byte array. Compressed data tends to look random (it has high “entropy”, or “information density”), so two slightly different versions of a chunk usually share very little at the byte level once compressed. Knowing this, plus a quick glance over some numbers to back it all up, has led me to guess that the delta encoding done by bup ends up achieving basically the same deduplication as saving chunks in a content-addressed store that actually understands the region format would.
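For a rough idea of what “understanding the region format” involves, here is a Rust sketch that pulls each chunk’s still-compressed bytes out of a region file, ready to be hashed without ever decompressing them. It assumes the layout wiki.vg describes: an 8 KiB header whose first 4 KiB holds 1024 four-byte location entries (a 3-byte big-endian offset and a 1-byte length, both counted in 4 KiB sectors) followed by 4 KiB of timestamps, and at each chunk’s offset a 4-byte big-endian length, a compression-type byte (2 for zlib), and the payload. `extract_chunks` and `RawChunk` are hypothetical names for illustration only.

```rust
const SECTOR: usize = 4096;

/// One chunk as stored in a region file, still compressed.
#[derive(Debug, PartialEq)]
pub struct RawChunk<'a> {
    /// Index 0..1024 in the location table.
    pub index: usize,
    /// Compression type byte (2 = zlib, per wiki.vg).
    pub compression: u8,
    /// The compressed payload, borrowed straight from the file.
    pub data: &'a [u8],
}

/// Hypothetical helper: list every present chunk without decompressing anything.
/// Chunks flagged as stored in external files are not followed; their payload comes back empty.
pub fn extract_chunks(region: &[u8]) -> Result<Vec<RawChunk<'_>>, String> {
    if region.len() < 2 * SECTOR {
        return Err("file shorter than the 8 KiB header".into());
    }
    let mut chunks = Vec::new();
    for index in 0..1024 {
        let entry = &region[index * 4..index * 4 + 4];
        // 3-byte big-endian sector offset, then a 1-byte sector count.
        let offset = u32::from_be_bytes([0, entry[0], entry[1], entry[2]]) as usize * SECTOR;
        let sectors = entry[3] as usize;
        if offset == 0 && sectors == 0 {
            continue; // chunk not generated yet
        }
        let header = region
            .get(offset..offset + 5)
            .ok_or_else(|| format!("chunk {index}: offset out of bounds"))?;
        // The length counts the compression byte plus the payload.
        let length = u32::from_be_bytes([header[0], header[1], header[2], header[3]]) as usize;
        if length == 0 || length + 4 > sectors * SECTOR {
            return Err(format!("chunk {index}: bad length {length}"));
        }
        let data = region
            .get(offset + 5..offset + 4 + length)
            .ok_or_else(|| format!("chunk {index}: data out of bounds"))?;
        chunks.push(RawChunk { index, compression: header[4], data });
    }
    Ok(chunks)
}

fn main() {
    // Build a tiny synthetic region file: header plus one data sector.
    let mut region = vec![0u8; 3 * SECTOR];
    region[0..4].copy_from_slice(&[0, 0, 2, 1]); // chunk 0 starts at sector 2, 1 sector long
    let payload: [u8; 3] = [0xAA, 0xBB, 0xCC]; // stand-in for real zlib bytes
    let off = 2 * SECTOR;
    region[off..off + 4].copy_from_slice(&(payload.len() as u32 + 1).to_be_bytes());
    region[off + 4] = 2; // zlib
    region[off + 5..off + 5 + payload.len()].copy_from_slice(&payload);

    let chunks = extract_chunks(&region).expect("valid region");
    // prints: chunk 0: 3 compressed bytes
    println!("chunk {}: {} compressed bytes", chunks[0].index, chunks[0].data.len());
}
```

Keying the store on the compressed bytes is what makes this cheap: an untouched chunk is re-saved byte for byte, so it hashes to the same address every time.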