Store Like an Engineer. Chapter 3: Blocks
Not everything should be stored as a holistic bundle. Small is beautiful. Modular is agile. With proper supervision, storing in blocks goes a long way. But read the fine print.
Last weekend’s issue was dedicated to the cornerstone storage type of Personal Data Lakes: files. Here it is, in case you’ve missed it:
Today, we are concluding the “Store Like an Engineer” trilogy with the last instalment, which is dedicated to blocks.
Block Storage
In the world of computers, blocks are where it all began. Files had to be saved on bulky, physically spinning magnetic hard drives as small fragments: blocks.
You might remember hard drive defragmentation in the old days of Windows hegemony. After a few deletions, edits, and moves, you’d end up with bits and pieces of different files in various places on the disk, much like your kid’s room always tends towards entropy no matter how much you clean it. Life does its thing.
However, gradual fragmentation (the result of the entropy) was predominantly a problem for content that needed to be retrieved in its entirety or to be read sequentially, such as a video or an audio file. In other words, block storage was not the right tool for this content type.
Text files, which can be edited with pinpoint precision, greatly benefited from chunking. If all you needed was to change a comma, there would be no need to load every portion of the text into memory and store everything back again. It’d also be easier to take incremental snapshots of your text, since you only need to back up the difference between two document versions. Essentially, you’d only care about the blocks that changed since you last looked.
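To make the "only back up what changed" idea concrete, here is a minimal sketch. It assumes fixed-size blocks and compares a fingerprint of each block between two versions of a text. The 16-character block size and the function names are made up for illustration; real storage systems use much larger blocks and smarter chunking.

```python
import hashlib

BLOCK_SIZE = 16  # hypothetical and tiny on purpose, so the example stays readable


def split_into_blocks(text: str, block_size: int = BLOCK_SIZE) -> list[str]:
    """Cut the text into fixed-size chunks; the last one may be shorter."""
    return [text[i:i + block_size] for i in range(0, len(text), block_size)]


def fingerprint(block: str) -> str:
    """A short, stable identifier for the content of one block."""
    return hashlib.sha256(block.encode("utf-8")).hexdigest()


def changed_blocks(old: str, new: str, block_size: int = BLOCK_SIZE) -> dict[int, str]:
    """Return {block index: new content} for blocks that differ or are new.

    Simplification: blocks that disappear when the text gets shorter
    are not reported.
    """
    old_prints = [fingerprint(b) for b in split_into_blocks(old, block_size)]
    delta = {}
    for index, block in enumerate(split_into_blocks(new, block_size)):
        if index >= len(old_prints) or fingerprint(block) != old_prints[index]:
            delta[index] = block
    return delta


old_text = "Small is beautiful; modular is agile. Store it in blocks."
new_text = "Small is beautiful, modular is agile. Store it in blocks."

# Only the block containing the edited punctuation mark needs backing up.
delta = changed_blocks(old_text, new_text)
```

One caveat of this naive version: inserting or deleting a character shifts every following block, so they would all count as changed. Swapping one character for another, as above, touches a single block.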
Many smaller atomic files could occupy just one block, which would be the most efficient way to store them. So, despite the aforementioned data erosion issues, block storage is a neat way of managing things and information.
Making any system more tightly interconnected makes it less resilient. Sometimes, you only need to replace a light bulb, not the entire chandelier. This is exactly why all light bulbs have standardised attachments. Keep things loosely coupled. Store them in smaller blocks.
We’ve previously discussed how subdivisions of documents that used to be bound to big, immutable ledgers led to tremendous gains in information management efficiency. You might find it “à propos”.
Atomic Notes
One of the holy grails of PKM is transforming your second brain into a web of highly interconnected atomic notes. One idea = one note. Instead of repeating an idea, you simply refer to its note from another note, much like you could refer to a specific unique atomic block on a hard drive from different places.
Software programmers are advised to follow the DRY principle, “Don’t Repeat Yourself.” This principle stipulates that code repetitions should be “refactored” into self-sufficient atomic black boxes. These boxes can then be used in different places in the program where the same functionality is required. All consumers of the black box automatically benefit from future improvements of the black box.
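For readers who don’t write code, a small sketch of what such a black box might look like. The `slugify` helper and the `[[...]]` link format are hypothetical, chosen only for illustration. Both consumers rely on the same helper instead of each carrying its own copy of the logic.

```python
def slugify(title: str) -> str:
    """The 'black box': turn a note title into a lowercase, hyphenated slug."""
    return "-".join(title.lower().split())


def note_filename(title: str) -> str:
    """First consumer: the file name a note would be saved under."""
    return slugify(title) + ".md"


def note_link(title: str) -> str:
    """Second consumer: a link to the same note (made-up link syntax)."""
    return "[[" + slugify(title) + "]]"


filename = note_filename("Block Storage")
link = note_link("Block Storage")
```

If `slugify` is later improved, say to strip punctuation, file names and links pick up the change together and stay consistent.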
Keeping your notes atomic is similar since you can either point to the same note from different places or dynamically transclude (include) them into other notes, avoiding repetitions.
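Transclusion can be sketched in a few lines. This assumes a vault held as a plain dictionary and a made-up `![[note-id]]` marker, loosely modelled on the embed syntax some note-taking tools use. The note IDs and contents are invented for the example.

```python
import re

# A hypothetical three-note vault: each note holds one idea.
notes = {
    "dry": "Don't Repeat Yourself.",
    "atomic-notes": "One idea = one note. Related principle: ![[dry]]",
    "blocks": "Store in small blocks. See also: ![[atomic-notes]]",
}

# Matches the made-up embed marker and captures the note ID inside it.
EMBED = re.compile(r"!\[\[([^\]]+)\]\]")


def render(note_id: str, vault: dict[str, str],
           _seen: frozenset[str] = frozenset()) -> str:
    """Return the note's text with every embed replaced by the embedded note."""
    if note_id in _seen:
        return f"(circular reference to {note_id})"
    if note_id not in vault:
        return f"(missing note: {note_id})"
    seen = _seen | {note_id}
    # Recursively expand each marker with the rendered text of its target.
    return EMBED.sub(lambda m: render(m.group(1), vault, seen), vault[note_id])


rendered = render("blocks", notes)
```

Because the text of "dry" lives in exactly one place, editing that one note changes every note that embeds it the next time they are rendered.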
The idea of neatly segregating all of your ideas is great, and it should be pursued. In my experience, however, it’s easier said than done. Part of it is because of personal style, and the other part is because identifying the core idea in a long text to “refactor” it into its own concise, self-sufficient entity is somewhat of an art form, a knife honed by experience. You must teach yourself to keep your notes styleless, unopinionated and dry. This way, they’ll fit into future contexts, most of which you can’t even predict. It’s a great challenge to put oneself through, though.
Taking atomic notes is the most active form of note-taking. As your knowledge vault grows, new notes make it in, other notes get pruned, and some notes get updated. As a result, many existing notes require connection updates. Dedicated software applications can handle some of these updates, but you must groom the rest yourself. This is especially the case if your notes vault is analogue. One of the previous issues of The Mechanics of Knowledge Management discussed the importance of regular grooming.
You need a controller that keeps things semantically connected. Although tons of tools for thought claim to help us do the heavy lifting, in most circumstances, the onus of keeping your personal knowledge graph up to date is still on you, and that’s how you want it. Strategic chunking requires conventions and energy, but it’s worth it.
Pros
You don’t have to retrieve everything to complete something—just the right piece at the right time.
The same block can be referenced from different places (the DRY principle).
It’s easier to modify the whole that the block is part of.
It’s easier to back up.
Cons
You need additional, regular supervision to “stitch” blocks together and “unstitch” them if necessary. It’s an active type of knowledge management.
You have an opinionated size limit per block. For example, if you choose classic paper index cards for your analogue note-taking system, writing just one word on a card leaves you with a lot of unused space. However, if your atomic idea overspills the card by one word, you’ll have to use two instead. The second card, in its turn, wastes some space, too. Some knowledge managers consider it a feature rather than a bug, claiming that it forces you to be unapologetic, “to the point,” and “trim the fat” when committing the note to your vault.
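The index card arithmetic can be sketched as follows, assuming a card that fits 100 words. That capacity is a hypothetical figure picked only to make the numbers easy to follow.

```python
import math

CARD_CAPACITY = 100  # hypothetical: words that fit on one index card


def cards_needed(words: int, capacity: int = CARD_CAPACITY) -> int:
    """Whole cards required for a note; even an empty note takes one card."""
    return max(1, math.ceil(words / capacity))


def unused_space(words: int, capacity: int = CARD_CAPACITY) -> int:
    """Word slots left blank across all the cards the note occupies."""
    return cards_needed(words, capacity) * capacity - words


one_word = (cards_needed(1), unused_space(1))        # a single word on a card
spill_over = (cards_needed(101), unused_space(101))  # one word too many
exact_fit = (cards_needed(100), unused_space(100))   # fills the card exactly
```

Under these assumptions, the one-word note and the 101-word note both leave 99 slots blank, and the second one costs an extra card. The same trade-off applies to fixed-size blocks on a disk.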
Conclusion
As we’ve seen in the issue dedicated to The Iron Triangle, there are no silver bullets.
Knowledge storage is no exception. In the intimacy of your second brain, you’re responsible for picking the right tool for the right job. To do so, you need to consider the nature of the information, your storage and retrieval patterns, the goals you’re trying to achieve, the optimisations you’re aiming for and external constraints you have to work around.
But being forewarned is being forearmed, and with the tried and true information storage paradigms from the data engineering world we’ve discussed in the last three issues of this newsletter, you’ve been given a critical mass of explanations to help you make the right choice every single time.
Recognising which tool to pick depends on the context, and knowledge engineers get better at it over time. It’s ok to make wrong choices and have to reorganise your vault. It can be a painful process, but it solidifies your understanding of storage strategies for the future. The future, where these patterns become second nature.