Store Like an Engineer. Chapter 1: Objects

Computer science and software engineering have been dealing with information storage for decades, inventing paradigms that are just as useful in a "household" PKM as they are in an enterprise setting.

Jun 02, 2024

black and silver usb flash drive in brown box — Photo by Lia Trevarthen on Unsplash

Although you’ll always carry thoughts and ideas in your brain, most will be externalised, resulting in consolidated tokens you’ll need to store and retrieve. This might sound easy and will work initially, but it will bog you down later if done incorrectly. The problem gets worse if you routinely collect large amounts of data, such as photographs or audio and video recordings.

Large files can pile up quickly, and navigating them becomes increasingly challenging. Source: liamjaydesigns.com

The good news is that engineers have been dealing with large silos of uncontrollably bloating data for decades and have several tricks up their sleeves. You can simply “lift and shift” them to your personal knowledge management setup. What works on an ultra-wide enterprise scale also works on a smaller “household” scale.

Storage is a full-time job in the corporate world. Many books have been written about it, plenty of businesses have cropped up to monetise data storage pains, and a lot of computer science brain power has been dedicated to inventing what I’m about to explain.

The Mechanics of Knowledge Management always promises to address Personal Knowledge Management (PKM) through the engineering lens without overwhelming lingo. Although that’s exactly what I’m about to do, the topic is too large to fit into a single newsletter issue. Therefore, this time, I’ll divide it into three logical chapters.

Three Main Storage Types

We can distinguish between three core storage classes or types: object, file and block.

This issue is dedicated to the first one on the list: object.

Note: Although the concept applies to the digital realm just as much as it does to the physical one, an analogy with tangible everyday items gets the point across better, so I’ll stick with that this time as well.

Object Storage

You can think of object storage as a big cardboard box into which you’re throwing everything you’d like to keep. The box is big enough and can expand. You’re not being selective, and you don’t segregate or classify anything. It’s a flat “organisational” structure with no hierarchy concept.

There are no restrictions on what items can be put in the box. It could be a magazine, a VHS cassette, an entire notebook, or an index card.

In engineering parlance, this is called multimodal data, and the box is often called a bucket. It is a foundational component for corporate Data Lakes. In our case, it’ll be the foundation of what I like to call a Personal Data Lake or PDL.

In the cloud, objects are thrown into virtual buckets. Source: g2.com
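If you prefer code to cardboard, here is a minimal sketch of the same idea. The `Bucket` class and its `put`/`get` methods are names I made up for illustration; they are not the API of any real storage product. The point is only that everything lands in one flat container and comes back out by ID.

```python
import uuid


class Bucket:
    """A toy flat container: no folders, no hierarchy, just objects keyed by ID."""

    def __init__(self):
        self._objects = {}  # object ID -> raw bytes

    def put(self, data: bytes) -> str:
        """Store any blob as-is and return the ID needed to fetch it later."""
        object_id = uuid.uuid4().hex  # generated for you, like in the digital realm
        self._objects[object_id] = data
        return object_id

    def get(self, object_id: str) -> bytes:
        """Return the whole object; there is no path to walk, only an ID."""
        return self._objects[object_id]

    def __len__(self) -> int:
        return len(self._objects)


box = Bucket()
magazine_id = box.put(b"scanned magazine pages")
index_card_id = box.put(b"a single handwritten index card")
```

Notice that `put` never asks what kind of thing it is receiving or where it should go. That indifference is what makes the box so easy to fill.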

Metadata

Although the abovementioned box will have no concept of compartmentalization per se, metadata associated with every item in it can assist with item retrieval and, optionally, future organisation.

Metadata means data about the data; all objects in the box will have it. It might be something inferred by default or your explicit annotation. An item’s metadata is an ensemble of key/value pairs, such as the date a picture was taken or its location.

Continuing the cardboard box example, implicit metadata could be the item’s type: a photograph, a tape cassette, or a ticket. Examples of explicitly provided metadata would be scribbles on the back of the photograph, a cassette label sticker with the album’s title, or the receipt attached to the flight ticket.
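In digital terms, the split between implicit and explicit metadata might look roughly like the sketch below. The function names, the file name and the annotation values are all invented for the example; only `mimetypes.guess_type` comes from Python's standard library.

```python
import mimetypes


def implicit_metadata(filename: str) -> dict:
    """Metadata that can be inferred without asking the owner anything."""
    guessed_type, _encoding = mimetypes.guess_type(filename)
    return {
        "name": filename,
        "type": guessed_type or "unknown",
    }


def describe(filename: str, **annotations) -> dict:
    """Merge inferred metadata with whatever the owner scribbles on the back."""
    metadata = implicit_metadata(filename)
    metadata.update(annotations)  # explicit key/value pairs
    return metadata


# A hypothetical photo with two hand-written annotations.
photo = describe("beach.jpg", taken_on="2019-08-14", location="Lisbon")
```

The result is a plain dictionary of key/value pairs: the type was inferred from the file name, while the date and location had to be supplied by a human.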

A crucial key/value pair is the object’s unique identifier, or ID. If possible, you will uniquely identify every item visually. If that isn’t possible, you can use a naming convention printed on labels. In the digital realm, a unique identifier will be generated for you and mostly abstracted away.
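Here is a rough sketch of three ways an ID could come about. All three function names and the label format are my own assumptions, not a standard: one mimics a printed label convention, one is randomly generated, and one is derived from the object's content.

```python
import hashlib
import uuid


def label_id(kind: str, date: str, sequence: int) -> str:
    """A human-readable naming convention, like a printed label."""
    return f"{date}_{kind}_{sequence:04d}"


def random_id() -> str:
    """An opaque ID generated for you; unique, but it says nothing about the item."""
    return uuid.uuid4().hex


def content_id(data: bytes) -> str:
    """An ID derived from the content itself: identical bytes give the same ID."""
    return hashlib.sha256(data).hexdigest()


ticket_label = label_id("ticket", "2024-06-02", 7)
```

Which flavour you pick is a trade-off: labels are readable but need discipline, random IDs need no thought but mean nothing to a human, and content-derived IDs will treat two identical copies as the same object.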

Perhaps one of the most important metadata types is a tag. Tagging allows for organisation even when the structure is flat, as in the object storage class. You could stick a Post-it with a specific colour or a special keyword to segregate objects. You could also use multiple Post-it notes to attach tags to the same object. Tagging is one of the most flexible organisation techniques you can imagine.
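To make the Post-it idea concrete, here is a small sketch in which every object carries a set of tags and retrieval means asking for the objects that carry all the requested ones. The object IDs, the tags and the `with_tags` helper are hypothetical.

```python
# Hypothetical box contents: object ID -> set of tags (the "Post-it notes").
tags_by_object = {
    "photo-001": {"family", "holiday"},
    "ticket-007": {"holiday", "receipts"},
    "drawing-042": {"family", "kids"},
}


def with_tags(tags_by_object: dict, *wanted: str) -> list:
    """Return the IDs of objects carrying every requested tag, sorted for stable output."""
    required = set(wanted)
    return sorted(
        object_id
        for object_id, tags in tags_by_object.items()
        if required <= tags  # subset test: all wanted tags are present
    )


holiday_items = with_tags(tags_by_object, "holiday")
family_holiday = with_tags(tags_by_object, "family", "holiday")
```

The box itself stays flat. The same photo shows up under "family" and under "holiday" without being copied or moved, which is something a folder hierarchy cannot do as easily.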

Use Cases

“The first activity is to search your physical environment for anything that doesn’t permanently belong where it is, the way it is, and put it into your in-tray. You’ll be gathering things that are incomplete, things that have some decision about potential action tied to them. They all go into “in,” so they’ll be available for later processing.”
Getting Things Done: The Art of Stress-Free Productivity
David Allen

“I took all the files they’d migrated over and moved them all to a new folder titled “Archive” plus the date (for example, “Archive 5-2-21”). There was always a moment of fear and hesitation at first. They didn’t want anything to get lost, but very quickly, as they saw that they would always be able to access anything from the past, I watched them come alive with a renewed sense of hope and possibility.”
Building a Second Brain
Tiago Forte

You could opt for an object storage class for different reasons:

You’re a “save now, sort later” type. In this case, your object storage solution acts as an inbox, the initial phase upon which all productivity and knowledge management systems are built. David Allen’s Getting Things Done, or GTD [1], Tiago Forte’s CODE [2] or Marie Kondo’s home-organisation techniques [3], to name a few, always begin with dumping everything into a single staging area. Object storage is great for this purpose.
You want to keep things just in case but don’t want to spend energy organising them. This is the reference/resource container of any productivity or organisation system. In the abovementioned GTD, it’d be called references or the “someday/maybe” jar, and the R in Tiago Forte’s PARA stands for Resources. You’ll have to decide for yourself whether the cost of a lost opportunity from discarding a potentially relevant file outweighs the overhead of storing it, also known in the business as TCO or Total Cost of Ownership.
The number of things to store and the rate at which they flow into the box make any organisation prohibitively expensive. Imagine you’re subscribed to many magazines and receive many letters. Your kids create a dozen drawings a day, and you keep recordings from your surveillance camera. You could throw it all in the box without organising any of it.

Pros

Straightforward.
Cheap.
Single source of truth, or a unique “inbox”.
Allows for metadata annotations.
Handles large, theoretically unlimited piles of stuff.

Cons

You can’t selectively protect or treat objects. If one has access to the box, all of its items are automatically accessible.
You can’t alter an object. Changing an object results in a new, modified copy.
You can’t extract a part of an object. You either retrieve the entire notebook or nothing at all.
Browsing and retrieving items in the box can be suboptimal (slow). There is a massive black hole of things to dig through, and you might be tempted to abandon the effort sooner rather than later.
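Some of these constraints are easier to feel in code. The sketch below is a toy model of the limitations listed above, not a description of how any particular storage service behaves; `put`, `edit` and `search` are names invented for the example.

```python
from itertools import count

_next_id = count(1)
box = {}  # object ID -> immutable bytes


def put(data: bytes) -> int:
    """Drop a complete object into the box and return its new ID."""
    object_id = next(_next_id)
    box[object_id] = bytes(data)
    return object_id


def edit(object_id: int, new_data: bytes) -> int:
    """There is no in-place update: an "edit" is a fresh object next to the old one."""
    return put(new_data)


def search(needle: bytes) -> list:
    """No index, so every object has to be pulled out whole and inspected."""
    return [object_id for object_id, data in box.items() if needle in data]


draft_id = put(b"notebook draft")
revised_id = edit(draft_id, b"notebook draft, revised")
matches = search(b"revised")
```

After the edit, the box holds both the draft and the revision under different IDs, and finding the revision meant looking at every object in turn. With two items that is trivial; with twenty thousand it is the black hole mentioned above.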

Next Up: Files

Our next stop is something most of us are familiar with: the file.

Next week, we’ll examine how this storage paradigm differs, weigh its pros and cons, and see how it builds on rudimentary object storage.

In the meantime, why don’t you do yourself a favour and transfer some stuff into your Personal Data Lake?

1. Allen, D. (2015). Getting Things Done: The Art of Stress-Free Productivity. Penguin.

2. Forte, T. (2023). The PARA Method: Simplify, Organize, and Master Your Digital Life. Simon and Schuster.

3. Kondō, M. (2014). The Life-changing Magic of Tidying Up: The Japanese Art of Decluttering and Organizing. Vermilion.
