the cloud I’d like to see

Here's what I want for Christmas, Kwanzaa, Hanukkah, Eid, and Bloomsday: all my files in the cloud, encrypted by me, accessible from any program on any of my computing devices, cached locally as needed, and served by providers that I choose and can freely migrate between.

I've been copying and synchronizing my home directory between machines since the mid-1990s. I remember @nelson, in 1996, telling a story in Sherry Turkle's class about the dual effort of packing up both his physical stuff and his digital stuff for a cross-country move.

In many ways, we've come a long way since 1996. Today I store music, email and code in the cloud, thanks to iTunes, Gmail and Git. And I use Dropbox and Google Docs and iCloud to share files with other people and between my various devices.

But I still have a couple hundred gigabytes in my home directory that I pull across to every new computer I set up. I need at least that much storage on anything I buy to use as an everyday machine. And I don't have a good way to access the files in my home directory from phones, tablets, or machines that I'm using temporarily.

This seems fixable. In fact, between them, iTunes Match and Dropbox get enough things really right that, building on what they've done, a “perfect” personal cloud infrastructure is fairly easy to sketch.

In iTunes, the basic action of playing a song is always the same, but the program's interface makes it easy to see whether the song file is cached locally or whether it will have to be pulled across the network. There are options to fetch groups of songs you expect to want to play later. Timeouts and failures are handled reasonably well. Mostly, things just work, which is a significant and laudable achievement on the part of the iTunes developers.

Dropbox does a terrific job of integrating with the native filesystem on my Macintosh. I can use Dropbox to keep a directory backed up in the cloud and synchronized between machines. Files are versioned. Uploads and downloads happen in the background and there's good feedback about when transfers are likely to complete and how much data is moving around. I can set basic sharing permissions and generate public download links. Again, things just work; Dropbox ships excellent software.

But I don't want to put my whole home directory in Dropbox, for a couple of reasons.

First, I'm moderately paranoid about privacy and security. I have files that I'm not willing to let someone else store for me unless I encrypt them myself first.

But it's too much trouble to manually encrypt and decrypt everything I put into my Dropbox folders. Emacs is the only program I use regularly that understands how to encrypt and decrypt files automatically, on demand. I use Emacs for a lot of things but, sadly, not for everything I do every day. So I need the encryption and decryption to be built in at the filesystem layer, right between my local storage and the network.

Second, Dropbox doesn't have iTunes-like selective local caching on the Macintosh. My home directory these days is big enough that my files don't all fit on even a mid-range laptop hard drive. And they certainly don't all fit on my phone or tablet.

I want least-recently-used caching on by default, taking up a configurable amount of local storage, plus easy manual controls to pre-download files I know I'm going to need when I don't have a network connection available. (Or when I might not have a fast and cheap one.)
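Here is a minimal sketch of that caching policy in Python, just to make the idea concrete. Everything in it is hypothetical: the `LocalCache` class, the `fetch` callable standing in for "pull this file from my provider," and the choice to hold contents in memory rather than on disk. A real implementation would live at the filesystem layer.

```python
from collections import OrderedDict


class LocalCache:
    """Byte-budgeted LRU cache of file contents, with manual pinning."""

    def __init__(self, max_bytes, fetch):
        self.max_bytes = max_bytes      # configurable local storage budget
        self.fetch = fetch              # hypothetical callable: path -> bytes
        self.entries = OrderedDict()    # path -> bytes, least recently used first
        self.pinned = set()             # paths the user wants kept offline
        self.used = 0

    def read(self, path):
        if path in self.entries:
            self.entries.move_to_end(path)   # mark as most recently used
            return self.entries[path]
        data = self.fetch(path)              # cache miss: go to the network
        self.entries[path] = data
        self.used += len(data)
        self._evict()
        return data

    def pin(self, path):
        """Pre-download a file and exempt it from eviction."""
        self.pinned.add(path)
        self.read(path)

    def unpin(self, path):
        self.pinned.discard(path)
        self._evict()

    def _evict(self):
        # Drop least recently used, unpinned entries until we fit the budget.
        for path in list(self.entries):
            if self.used <= self.max_bytes:
                break
            if path in self.pinned:
                continue
            self.used -= len(self.entries.pop(path))
```

Note one design choice in this sketch: pinned files are never evicted, so pinning more than the budget allows simply lets the cache run over. A friendlier client would warn me instead.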

Caching like this requires filesystem-level integration, too, to hide the details from applications. This isn't too hard to do on Linux. It's probably fairly hard to do on the Macintosh, unless you work for Apple.

These are the basics: encryption and caching.

They both imply that we'll need to store all filesystem metadata locally (at least in a naive implementation). Encryption implies one thing more: we have to build on top of a standard protocol so that I can audit the source code of the client programs I use.

I want to be sure that no unencrypted data ever leaks off my machine. I might choose to trust a client provided by a reputable and well-run company, like Dropbox. But I shouldn't have to do so.

Defining a standard protocol is a good thing in other ways, too. Standards open the door to using multiple service providers, moving between providers, or running my personal cloud infrastructure myself.

And standards mean that individual applications can be extended to natively support cloud storage. Specific workflows can be decoupled from the semantics of the underlying filesystem. (As iTunes does, sort of, with music.)

In fact, we can use this opportunity to redefine our expectations for our filesystem beyond the basic requirements of network storage.

I'm a curmudgeon, and a long-time Unix hacker, so I do tend to think in terms of directory hierarchies. But that's not the only way I think. I'd love to have time built into my filesystem in a deep way, and keyword tagging, too. And versioning. We can stand on the shoulders of projects like Lifestreams and WinFS.
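To make that a little more concrete, here is one rough way the locally stored metadata might look if time, tags, and versions were all first-class. This is only a sketch under my own assumptions: the `Version`, `Entry`, and `search` names are invented for illustration, and `blob_id` is a hypothetical key for the encrypted content held by a provider.

```python
from dataclasses import dataclass, field


@dataclass
class Version:
    written_at: float   # seconds since the epoch
    blob_id: str        # hypothetical key of the encrypted blob at the provider


@dataclass
class Entry:
    path: str
    tags: set = field(default_factory=set)
    versions: list = field(default_factory=list)  # kept oldest first

    def as_of(self, when):
        """Newest version written at or before `when`, or None."""
        older = [v for v in self.versions if v.written_at <= when]
        return older[-1] if older else None


def search(entries, tag=None, changed_since=None):
    """Filter entries by tag and/or by time of most recent change."""
    hits = []
    for e in entries:
        if tag is not None and tag not in e.tags:
            continue
        if changed_since is not None:
            if not e.versions or e.versions[-1].written_at < changed_since:
                continue
        hits.append(e)
    return hits
```

With something like this, "show me everything tagged taxes that changed since January" and "give me this file as it was last Tuesday" become ordinary queries, and the directory path is just one more attribute.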

And, of course, we need full-text search, which requires plugins and is complicated by the requirement for encryption.

Finally, I'd like to be able to give out a token that lets anyone, from any device, securely share a file, a directory, or a tag. With encryption built into our cloud storage standard, we can take the next step and build a certificate-based capability system (with support for timeouts and additional authentication layers).
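As a rough illustration of the token idea, here is a sketch of an expiring capability in Python. To be clear about what it is not: it swaps the certificate-based design for a much simpler shared-secret HMAC signature, so whoever verifies a token must hold the same secret that issued it. The `issue_token` and `check_token` names, the claim fields, and the exact-match rule for resources are all assumptions of mine.

```python
import base64
import hashlib
import hmac
import json
import time


def issue_token(secret, resource, expires_at):
    """Return a token granting access to `resource` until `expires_at`."""
    claims = json.dumps({"res": resource, "exp": expires_at}, sort_keys=True).encode()
    sig = hmac.new(secret, claims, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(claims).decode()
            + "." + base64.urlsafe_b64encode(sig).decode())


def check_token(secret, token, resource, now=None):
    """True only if the token is authentic, unexpired, and names `resource`."""
    now = time.time() if now is None else now
    try:
        claims_b64, sig_b64 = token.split(".")
        claims = base64.urlsafe_b64decode(claims_b64)
        sig = base64.urlsafe_b64decode(sig_b64)
    except ValueError:                      # malformed token
        return False
    expected = hmac.new(secret, claims, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):   # constant-time comparison
        return False
    data = json.loads(claims)                # parsed only after the signature checks out
    return data["res"] == resource and now < data["exp"]
```

A real system would replace the HMAC with public-key signatures, so that anyone can verify a token without being able to mint one, and would give directories and tags their own matching rules rather than exact string equality.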

So that's my list for the personal cloud storage solution I'd love to use every day: filesystem integration, encryption, caching, a standard protocol, versioning, time as a first-class construct, tagging, secure sharing, and full-text search.

 