Insight: some thoughts

November 22, 2007

Apologies to anyone this confuses, but this post is just a collection of random thoughts about DBFS (now called Insight): what it might do, how it might be implemented… all sorts of things. Some of these are taken from The Book of the Project, and some have developed as I have done more research.

One of the critical ideas has really taken shape since I read Hans Reiser’s The Naming System Venture: Future Vision whitepaper. Essentially, I want to allow at least a cut-down form of ordering and grouping. In the whitepaper, Reiser points out that there are two types of data sets: ordered and unordered. The current way of identifying files, a path, can be viewed as an ordered set delimited with slashes. An unordered set, referred to as a “grouping”, provides a way of loosely associating terms with files. For example, a search for the terms Christmas, presents, chimney and man would look like:

[Christmas presents chimney man]

Perhaps a more real-world example would be searching for photographs from a visit to a Christmas market during a trip to Dublin:

[type/photo Dublin Christmas market]

This represents the ordering that the filetype should be a photo, as well as the grouping of the various terms. In groups, the terms can be re-ordered without affecting their meaning, so:

[market Christmas Dublin type/photo]

would be an identical query. At this point, I’ll introduce a more complex query and explain how I’ve chosen to simplify the system, at least for now. Let’s say that you have the query:

[market Christmas Dublin type/photo]/Dave

The intent is to find photos somehow related to me. I’m not sure exactly what this would mean semantically, so I will simplify the syntax to disallow it. Groupings may only be the final term in an ordering, but orderings may be many levels deep, e.g.:

[type/music/mp3 title/"We Will Rock You"]

would find only MP3 versions of the song, rather than (say) OGG or WMA. One thing that I want to add to this, however, is a way to organise the results of a query. It’s all very well saying:

type/music/wma

to find all WMA-format music (for conversion, perhaps), but this could return hundreds of files, if not more. What is really needed is some type of return format specification, based on metadata that will be associated with the file. I’m not sure exactly how this will work, but perhaps something like (broken to fit on page):

type/music/wma <(artist|"Unknown Artist")/(album|)/
(track&" - "|"")(title|"Untitled track").wma>

This represents the idea that the files should be organised further by creating sub-paths from the following, in order:

The value of the artist ordering (or the string "Unknown Artist" if none)
The value of the album ordering (or skip creating the subdirectory if none)
The value of the track ordering followed by the string " - ", or nothing if there is no track
The value of the title ordering (or the string "Untitled track" if none), followed by the .wma extension
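
To make those rules concrete, here is a rough Python sketch of how such a format string might be expanded for a single file. None of this is Insight code: the render function, the FIELD pattern and the metadata dictionary are all made up for illustration, and it only understands the three field shapes used in the example above.

```python
import re

# One field: (name|"fallback"), (name|) or (name&"suffix"|"fallback").
# This grammar is an assumption inferred from the example format string.
FIELD = re.compile(r'\((\w+)(?:&"([^"]*)")?\|(?:"([^"]*)")?\)')

FMT = ('(artist|"Unknown Artist")/(album|)/'
       '(track&" - "|"")(title|"Untitled track").wma')


def render(fmt, meta):
    """Expand a return-format string against one file's metadata dict."""
    def expand(match):
        name, suffix, fallback = match.groups()
        value = meta.get(name)
        if value not in (None, ""):
            # Field is present: use it, plus the optional "&" suffix.
            return str(value) + (suffix or "")
        # Field is missing: use the fallback, or nothing at all.
        return fallback or ""

    path = FIELD.sub(expand, fmt)
    # Drop empty segments so a missing album creates no subdirectory.
    return "/".join(segment for segment in path.split("/") if segment)


full = {"artist": "Queen", "album": "News of the World",
        "track": 1, "title": "We Will Rock You"}
print(render(FMT, full))
print(render(FMT, {"title": "We Will Rock You"}))
```

With full metadata this gives Queen/News of the World/1 - We Will Rock You.wma, and with only a title it gives Unknown Artist/We Will Rock You.wma. A metadata value that itself contains a slash would create an extra directory level, so a real implementation would need some form of escaping.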

Note that there would have to be a way to deal with duplicate filenames with this syntax, for backward compatibility. This would probably involve appending a number to the end of the filename.
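
As a sketch of the numbering idea, something like the following Python might do. The dedupe function is hypothetical, and placing the number before the extension, in brackets, is my assumption rather than a decision.

```python
import os


def dedupe(paths):
    """Return the paths in order, numbering any name that is already taken."""
    used = set()
    result = []
    for path in paths:
        stem, ext = os.path.splitext(path)
        candidate, n = path, 1
        # Keep counting up until the candidate name is free.
        while candidate in used:
            n += 1
            candidate = f"{stem} ({n}){ext}"
        used.add(candidate)
        result.append(candidate)
    return result


names = dedupe([
    "Queen/Untitled track.wma",
    "Queen/Untitled track.wma",
    "Queen/Other.wma",
    "Queen/Untitled track.wma",
])
print(names)
```

The catch is that the numbers depend on the order in which the files come back from the query, so the results would need a stable ordering for the generated names to stay the same between listings.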

It would be nice to then associate these queries with permanent legacy locations, e.g. /dbfs/music is actually the stored query (broken to fit on the page):

type/music <(artist|"Unknown Artist")/(album|)/
(track&" - "|)(title|"Untitled track").wma>

To be honest, though, the format syntax needs to become more powerful, and maybe less obscure. Then again, a nice graphical tool could be written to construct these filters…
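
Going back to the stored queries behind legacy locations, here is a minimal Python sketch of how a lookup might work. It treats a grouping as an unordered set of orderings and matches each ordering as a path prefix. Everything in it (parse, matches, the STORED table, the FILES index and the file names) is hypothetical, and it ignores both the return-format part of a stored query and the rule about where groupings may appear inside an ordering.

```python
import shlex


def parse(query):
    """Parse '[a b/c]' or 'a/b' into a frozenset of orderings (tuples)."""
    text = query.strip()
    if text.startswith("[") and text.endswith("]"):
        text = text[1:-1]
    # shlex keeps a quoted phrase such as title/"We Will Rock You" as one term.
    return frozenset(tuple(term.split("/")) for term in shlex.split(text))


def matches(query_terms, file_terms):
    """True if every query ordering is a prefix of some ordering on the file."""
    return all(
        any(f[:len(q)] == q for f in file_terms)
        for q in query_terms
    )


# Hypothetical table of legacy locations and the queries behind them.
STORED = {"/dbfs/music": "type/music"}

# Hypothetical index: each file is described by a set of orderings.
FILES = {
    "song-1": parse('[type/music/mp3 title/"We Will Rock You"]'),
    "song-2": parse("[type/music/wma]"),
    "photo-1": parse("[type/photo Dublin Christmas market]"),
}


def list_location(location):
    """Return the names of the files that a legacy location would show."""
    query = parse(STORED[location])
    return sorted(name for name, terms in FILES.items() if matches(query, terms))


print(list_location("/dbfs/music"))
```

Because a grouping is stored as a set, [market Christmas Dublin type/photo] and [type/photo Dublin Christmas market] parse to the same thing, which is the re-ordering property described earlier. Prefix matching is what lets type/music find both the MP3 and the WMA file.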
