After you have scanned all of your documents, your computer contains a large part of your life’s paper trail.
When it was paper, you thought very little about backing it up: making copies, storing them off site, etc. Now that it is digital data, you should reconsider that attitude.

Back up your data

Why? Well, there are a couple of different facets to this issue.

Paper is considered more robust than digital data

This might sound a little counter-intuitive at first glance, but think about it: how often does a piece of paper spontaneously disappear? It might be misplaced or forgotten, but as a rule, a piece of paper will be where you last put it, short of an actual disaster such as a fire, a flood, or an uncontrolled pet…
Hard disks, on the other hand…
Until recently, hard disk vendors gave a 3-year warranty on their products, and around 10 months on their OEM counterparts.
OEM hard disks are the ones that come inside a new computer. The same hard disks where your documents now lie…
Even if you did buy a hard disk straight from the vendor, be aware that vendors have lately cut the warranty on new products to… around 10 months…
Why? Apparently durability is not the first priority for hard disk consumers, and in turn, for vendors. Speed, capacity and price come first. Because these improve so rapidly (in the spirit of Moore's law), you buy a new hard disk every couple of years anyway…

We are now much more conscious of disaster recovery

After 9/11, the high-tech world became much more aware of ‘disaster recovery’. As a result, more of us know about periodic backups and off-site copies.
More recently, cloud backups have become widely available, making data redundancy a household name… of sorts…

Backing up data is a lot cheaper than backing up paper

All you need to do is press F5 (or Ctrl+C, Ctrl+V, depending on the software you use), and you have a copy of tons of data. Copying paper, by contrast, is tedious and hardly cost-effective, especially the kind of paper we hoard, whose value is not great to begin with.

Backup types

Simple redundancy

My first suggestion is to simply take all the scanned documents, as they are, and create another copy of them. You can compress and bundle them into a ZIP file.
Since processing and sorting our documents includes deleting files after they are processed, keeping a copy of the ‘Zero State’ ensures that we can recover any file we accidentally delete (‘fat fingers’ syndrome).
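For readers who prefer a script to a file manager, here is a minimal sketch of the ‘Zero State’ snapshot using Python's standard `zipfile` module. The folder and archive names (`scans`, `scans_zero_state.zip`) and the two helper functions are hypothetical. Scanned JPEGs barely shrink under ZIP compression, so the main benefit here is bundling everything into one file.

```python
import zipfile
from pathlib import Path


def snapshot_zero_state(scan_dir: str, archive_path: str) -> int:
    """Bundle every file under scan_dir into one compressed ZIP archive.

    Paths inside the archive are stored relative to scan_dir, so the folder
    layout can be restored as-is. Returns the number of files archived.
    """
    root = Path(scan_dir)
    archive = Path(archive_path).resolve()
    count = 0
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(root.rglob("*")):
            # Skip directories, and skip the archive itself in case it lives inside scan_dir.
            if path.is_file() and path.resolve() != archive:
                zf.write(path, arcname=path.relative_to(root).as_posix())
                count += 1
    return count


def restore_file(archive_path: str, name: str, target_dir: str) -> Path:
    """Recover a single accidentally deleted file from the snapshot."""
    with zipfile.ZipFile(archive_path) as zf:
        return Path(zf.extract(name, path=target_dir))
```

Typical use: call `snapshot_zero_state("scans", "scans_zero_state.zip")` once, before you start sorting. If you later delete `doc1253.jpg` by mistake, `restore_file("scans_zero_state.zip", "doc1253.jpg", "scans")` puts it back.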

Different media redundancy

To protect your data from hardware crashes, it is prudent to keep a copy of your data on a different medium. This could be another computer you have (your child's laptop, or your wife's machine), or an external drive, which you can now buy by the terabyte.
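A copy on another drive is only useful if it actually matches the original. Below is a minimal sketch, assuming Python 3.9+ and a hypothetical destination folder on the external drive, that mirrors a folder with the standard `shutil.copytree` and then compares SHA-256 checksums of every file. The function names are illustrative only, not part of any backup product.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large scans need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mirror_and_verify(src_dir: str, dest_dir: str) -> list[str]:
    """Copy src_dir onto dest_dir and return relative paths whose copies do not match.

    Existing files in dest_dir are overwritten; files deleted from src_dir are
    NOT removed from dest_dir, so old documents survive on the backup.
    """
    src, dest = Path(src_dir), Path(dest_dir)
    shutil.copytree(src, dest, dirs_exist_ok=True)
    mismatches = []
    for path in src.rglob("*"):
        if path.is_file():
            rel = path.relative_to(src)
            copy = dest / rel
            if not copy.is_file() or sha256_of(copy) != sha256_of(path):
                mismatches.append(rel.as_posix())
    return sorted(mismatches)
```

An empty list means every file on the second medium matches its original byte for byte.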

Geographical redundancy

Have a copy of your data periodically updated to a computer outside your house, for example your mom's computer. This might sound like overkill, and for most personal data, it is. But for important data, and for business data, it is very prudent. There are products that make exactly this easy and free (namely Crashplan).

Cloud redundancy

It seems that new personal cloud storage vendors appear daily, and they go to great lengths to win you as a customer. Dropbox, SkyDrive and Google Drive are just a few of these services. Almost all of them give you between 2 and 7 GB of cloud storage for free, and include desktop software that synchronizes the cloud storage with a local folder.

Security and privacy

If you are not paranoid about privacy – you can stop reading this post here, and move on to sorting your data – after all, no one will steal your laptop when you take it with you to the coffee shop, you trust your mother to keep your data safe, and you believe that the cloud vendors keep your password to themselves…
If, however, you think you should be a little more security conscious…

Cloud security issues

All cloud services provide password-protected accounts, so, on the surface, it seems that your data is safe with them. However…
Last August, Dropbox had a major password leak, which led to many users' passwords being reset. Most self-respecting vendors don't store users' passwords in clear text, but rather as some form of hash. An attacker can therefore only recover a password from a leaked hash by guessing it, and weak passwords are the first guesses, so such leaks compromise weak passwords far more than strong ones. I will not get into a discussion of what a strong password is (but if your password is ‘monkey’, it is not strong…). In general, though, I would suggest using a password manager with a password generation feature (such as LastPass or KeePass).
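To see why a hash leak hurts ‘monkey’ and not a generated password, here is a minimal sketch using only Python's standard library. PBKDF2-HMAC-SHA256 is a stand-in: this post does not say how any particular vendor hashes passwords. The iteration count and the tiny wordlist are illustrative assumptions. The generator only shows the idea behind what LastPass or KeePass do for you; it is not their implementation.

```python
import hashlib
import hmac
import secrets
import string
from typing import Optional

ITERATIONS = 100_000  # illustrative work factor only, not any vendor's setting


def hash_password(password: str, salt: bytes) -> bytes:
    """Salted, deliberately slow hash of the kind a careful vendor stores."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def dictionary_attack(leaked_hash: bytes, salt: bytes, wordlist: list[str]) -> Optional[str]:
    """Try each common password; a hash can only be 'cracked' by guessing."""
    for guess in wordlist:
        if hmac.compare_digest(hash_password(guess, salt), leaked_hash):
            return guess
    return None


def generate_password(length: int = 20) -> str:
    """Random password from letters, digits and punctuation, using a CSPRNG."""
    alphabet = string.ascii_letters + string.digits + string.punctuation
    return "".join(secrets.choice(alphabet) for _ in range(length))
```

Suppose a leaked hash of ‘monkey’ is attacked with a wordlist that contains ‘monkey’. It is found on the spot. A 20-character generated password will not appear in any wordlist, so the same leak reveals nothing about it.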
But having a strong password may not be enough. Most cloud vendors (all of the above included) use server-side encryption. Technically speaking, this means they hold the keys and could decrypt (and look into) your data without your password. Furthermore, an increasing number of vendors include a clause in their Terms of Service that allows them, in ‘extreme cases’ (such as court orders), to access end-user data without the user's consent.
How can you tell whether your service can read your data without your password? If it offers a ‘Reset Password’ or ‘Forgot Password’ feature, it can give you back access without knowing your old password. That means it does not need your password to get at your data in the first place…

Client-side encryption – TNO

So what can we do?
We need a system in which we can Trust No One (TNO). One way to do this is to make sure all data leaving our computer is already encrypted.
Some services offer this as part of their product. Crashplan (mentioned above) is one of them (this also takes care of those mommy trust issues…). Other such services are SpiderOak and Jungle Disk.
If you still prefer to use one of the major cloud vendors, there are free tools that only do encryption. They let you send their encrypted output to the cloud instead of the raw data.
Two of these tools are TrueCrypt* and BoxCryptor. TrueCrypt has some extra-paranoid features, such as ‘plausible deniability’, which allows you, in times of need, to deny that you even have encrypted data on your computer… On the downside, TrueCrypt's output is one gigantic container file, and by default its last-modified time is not updated. Neither property is cloud friendly: the whole file has to be re-uploaded after any change, and a sync client that relies on the timestamp may not notice the change at all.
BoxCryptor, on the other hand, encrypts each file in its folder individually. If you update a single file, only that file is resynchronized to the cloud, rather than the whole encrypted data set. This ‘leaks’ information such as file count, file sizes and file names (the paid version of BoxCryptor encrypts file names as well). For our purposes, though, this should be secure enough, especially since your files are most likely named doc1253.jpg, doc1254.jpg, etc.
Are there downsides to the TNO approach? Sure. For one, if you forget your password, you will not be able to restore your data…
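Per-file encryption syncs well because a sync client only re-uploads what it sees as changed. Here is a minimal sketch of that change detection, assuming a hypothetical manifest that maps each file to a hash of its contents. Real clients such as Dropbox use their own, more elaborate mechanisms. The folder is assumed to hold the already-encrypted per-file output, so the sketch itself encrypts nothing. Because it compares content hashes rather than modification times, it would also catch an edited TrueCrypt-style container whose timestamp never changed. In that case, though, the single changed file is the entire container.

```python
import hashlib
from pathlib import Path


def build_manifest(folder: str) -> dict[str, str]:
    """Map each file's relative path to the SHA-256 of its contents."""
    root = Path(folder)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def files_to_resync(old: dict[str, str], new: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return (changed_or_added, deleted) paths between two manifests."""
    changed = sorted(path for path, digest in new.items() if old.get(path) != digest)
    deleted = sorted(path for path in old if path not in new)
    return changed, deleted
```

Suppose one scan is edited, one is deleted and one is added. Only those three paths come back, however many other files the folder holds.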

So now your data is backed up and secure. All you need to do now is sort it.

Update 12/2016
Since I wrote this post, TrueCrypt has had some unusual developments: it was mysteriously discontinued… For the use case discussed in this post (encrypting a folder), it is still considered OK (as in, no weaknesses were found). However, it is no longer actively maintained, so if such a weakness is found, there will be no one to patch it.
If you still like the feature set this product offers, you can look here for a blog post describing TrueCrypt's current situation, which also lists a few alternatives you might want to switch to.
Thank you Sophie for pointing this out to me.

2 thoughts on “Going Paperless – backing up
