<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule"
>

<channel>
	<title>A mind less ordinary &#187; University</title>
	<atom:link href="http://www.dmi.me.uk/blog/category/personal/uni/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dmi.me.uk/blog</link>
	<description></description>
	<lastBuildDate>Wed, 23 Nov 2011 16:01:36 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
<creativeCommons:license>http://creativecommons.org/licenses/by-nc-nd/3.0/</creativeCommons:license>
		<item>
		<title>A content-based file manager</title>
		<link>http://www.dmi.me.uk/blog/2009/06/04/a-content-based-file-manager/</link>
		<comments>http://www.dmi.me.uk/blog/2009/06/04/a-content-based-file-manager/#comments</comments>
		<pubDate>Thu, 04 Jun 2009 22:12:38 +0000</pubDate>
		<dc:creator>Dave</dc:creator>
				<category><![CDATA[Coding]]></category>
		<category><![CDATA[Insight (semantic filesystem)]]></category>
		<category><![CDATA[Projects]]></category>
		<category><![CDATA[Braindump]]></category>
		<category><![CDATA[file manager]]></category>
		<category><![CDATA[ideas]]></category>

		<guid isPermaLink="false">http://www.dmi.me.uk/blog/?p=124</guid>
		<description><![CDATA[I've been thinking recently about Insight again, and I've been considering part of the problem with naming and uniqueness. Names in a traditional file system are made unique based on a full path to the file, but most people think of a file name as just the final component. This would then cause a problem [...]]]></description>
			<content:encoded><![CDATA[<p>I've been thinking recently about Insight again, and I've been considering part of the problem with naming and uniqueness.</p>
<p>Names in a traditional file system are made unique based on a full path to the file, but most people think of a file name as just the final component. This would then cause a problem with the move to Insight, as a file could appear in multiple directories, and its only distinguishing feature would be the final component of its path. This is counter-intuitive and can cause all sorts of problems.</p>
<p>Consider makefiles, for example. They rely on a standard named file (<tt>Makefile</tt>) appearing at various levels in the hierarchy in order to work. Obviously, you would want different makefiles at different levels and in different projects, but Insight as it stands has no way to handle this.</p>
<p>I then started thinking about what makes a file unique. In the end, I came up with two things: name and content. This covers the makefile case (same name, different content) as well as the backup case (same content, different name). It then occurred to me that, in the general case, all you need to distinguish a file is its content, and then actually finding it can all be left up to metadata.</p>
<p><span id="more-124"></span>If files are then thought of as containers for data that happen to have a unique internal identifier (which never needs to be exposed to the user, although it can be accessed as, say, the file's inode number) then the idea of a content-based file manager comes into play. These examples work best with visual media, particularly images, but there is no reason in principle that this could not be extended.</p>
<p>Imagine searching for a file. You know it's a photo, but you have a large collection of them. With a digital camera and a large-capacity memory card, who needs to ever delete a photo? We'll assume that you've diligently tagged the photos with metadata as you've imported them from the camera, through some easy batch process.</p>
<p>On the tagging point: a lot can be taken from the metadata stored by the camera (date/time, resolution, orientation, black and white/colour, perhaps GPS co-ordinates) and with the right tools, more can be inferred (auto-tagging faces, buildings, perhaps recognising common events like football matches, converting GPS co-ordinates to places, ...). As time goes on, people will need to do less and less manual tagging.</p>
<p>Anyway, back to the file manager. You know you are after a picture or a set of pictures. Normal thought processes will probably follow a path similar to: "Yeah, I wanted to show dad those <strong>photos</strong> from that <strong>holiday</strong> in <strong>Paris</strong> that we had <strong>two months ago</strong>. I think he'd particularly like the ones we got of the <strong>Louvre</strong>, as well as the ones <strong>with me in</strong>, of course." I've highlighted various key words that can be translated directly to metadata searches. Notice how these all involve a narrowing down of the query.</p>
<p>To convert these to filters, we then have:</p>
<ul>
<li>type: <strong>photo</strong></li>
<li><strong>holiday</strong></li>
<li>location: <strong>Paris</strong></li>
<li>date: <strong>two months ago</strong></li>
<li>at least one of:
<ul>
<li>location: <strong>Louvre</strong></li>
<li>person: <strong>Me</strong></li>
</ul>
</li>
</ul>
<p>This could also be represented by a query:</p>
<blockquote>
<div class="codecolorer-container text twitlight" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">type:photo AND holiday AND location:Paris AND date:-2m<br />
AND (location:Louvre OR person:me)</div></div>
</blockquote>
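<p>As a rough sketch of how such a query could be evaluated against one file's metadata, here is a small C predicate. Everything in it is an assumption made for illustration: the <tt>struct photo</tt> layout, the <tt>has_tag()</tt> and <tt>matches_query()</tt> names, storing tags as plain strings, and reading <tt>date:-2m</tt> as "taken within the last two months". It is not how Insight does it.</p>

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical metadata record: a list of tag strings plus the photo's age. */
struct photo {
  const char **tags;   /* e.g. "type:photo", "holiday", "location:Paris" */
  size_t n_tags;
  int age_months;      /* how long ago the photo was taken */
};

/* Returns true if the photo carries exactly this tag string. */
static bool has_tag(const struct photo *f, const char *tag) {
  for (size_t i = 0; i < f->n_tags; i++) {
    if (strcmp(f->tags[i], tag) == 0)
      return true;
  }
  return false;
}

/* type:photo AND holiday AND location:Paris AND date:-2m
 * AND (location:Louvre OR person:me) */
bool matches_query(const struct photo *f) {
  return has_tag(f, "type:photo")
      && has_tag(f, "holiday")
      && has_tag(f, "location:Paris")
      && f->age_months <= 2   /* assumed meaning of date:-2m */
      && (has_tag(f, "location:Louvre") || has_tag(f, "person:me"));
}
```

<p>Each <tt>AND</tt> term narrows the result further, which mirrors the way the filters above are applied one after another.</p>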
<p>Breaking it down in this way feels fairly technical and wordy, however. I'd much prefer a visual view.</p>
<p>Imagine a black field, speckled with points of light representing your photos:</p>
<p><img class="aligncenter size-full wp-image-131" title="Content File Manager 1" src="http://www.dmi.me.uk/blog/wp-content/uploads/2009/06/content-file-manager-1.png" alt="Content File Manager 1" width="450" height="340" /></p>
<p>You filter by "holiday", and (because it learns based on previous searches) it then groups by location. The ones which have been filtered out fade into nothing, and the photos group into labelled blobs and enlarge slightly:</p>
<p><img class="aligncenter size-full wp-image-132" title="Content File Manager 2" src="http://www.dmi.me.uk/blog/wp-content/uploads/2009/06/content-file-manager-2.png" alt="Content File Manager 2" width="450" height="340" /></p>
<p>You filter by date, and as you drag the slider, irrelevant items fade away and relevant ones enlarge:</p>
<p><img class="aligncenter size-full wp-image-133" title="Content File Manager 3" src="http://www.dmi.me.uk/blog/wp-content/uploads/2009/06/content-file-manager-3.png" alt="Content File Manager 3" width="450" height="340" /></p>
<p>Then you add the final filters and set the photos up for viewing, perhaps as a slideshow... and you're done!</p>
<p>Pretty neat, I think.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dmi.me.uk/blog/2009/06/04/a-content-based-file-manager/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
	<creativeCommons:license>http://creativecommons.org/licenses/by-nc-nd/3.0/</creativeCommons:license>
	</item>
		<item>
		<title>In-place array uniq in C</title>
		<link>http://www.dmi.me.uk/blog/2008/07/10/in-place-uniq-in-c/</link>
		<comments>http://www.dmi.me.uk/blog/2008/07/10/in-place-uniq-in-c/#comments</comments>
		<pubDate>Thu, 10 Jul 2008 21:43:55 +0000</pubDate>
		<dc:creator>Dave</dc:creator>
				<category><![CDATA[Coding]]></category>
		<category><![CDATA[Facebook]]></category>
		<category><![CDATA[Hacks]]></category>
		<category><![CDATA[Insight (semantic filesystem)]]></category>
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.dmi.me.uk/blog/?p=64</guid>
		<description><![CDATA[I've been developing Insight even though the uni project has come to an end, because it's fun! I also want to make it more stable and eventually release it under an open-source licence of some kind. There will be an update coming soon, I promise! I now have Internet, so I can write up some [...]]]></description>
			<content:encoded><![CDATA[<p>I've been developing <strong>Insight</strong> even though the uni project has come to an end, because it's fun! I also want to make it more stable and eventually release it under an open-source licence of some kind. There will be an update coming soon, I promise! I now have Internet, so I can write up some things...</p>
<p>Anyway, one of the interesting things I wanted to do for Insight was an in-place form of <tt>uniq</tt> for an array, ideally without any additional memory allocation. It seems that this is something nobody else has done yet! So I set about doing it myself...</p>
<p>For those of you who are unfamiliar with the Linux/UNIX command <tt>uniq</tt>, it takes a sorted list and removes any duplicates. This is almost exactly what I'm trying to do, with one caveat: I need to keep the "discarded" duplicates.</p>
<p>What happens is that I have an array containing a number of strings, and these have all been dynamically allocated via <tt>malloc()</tt> or <tt>calloc()</tt>. If I just remove or overwrite their pointers, they'll vanish and cause a memory leak. While I've now fixed a large number of leaks thanks to <a title="Valgrind debugger/profiler" href="http://valgrind.org" target="_blank">Valgrind</a>, I'm trying to actively avoid any possibility of adding them.</p>
<p>Read on for the details... <span id="more-64"></span></p>
<h2>The Idea</h2>
<p>The basic idea is that we have a sorted list of items that may have some duplicates, say:</p>
<pre>{A<sub>0</sub>, A<sub>1</sub>, B<sub>0</sub>, C<sub>0</sub>, C<sub>1</sub>, E<sub>0</sub>, F<sub>0</sub>, F<sub>1</sub>, F<sub>2</sub>, G<sub>0</sub>, H<sub>0</sub>, H<sub>1</sub>}</pre>
<p>and we want to remove all of the duplicates, so we get two sets:</p>
<pre>{A<sub>0</sub>, B<sub>0</sub>, C<sub>0</sub>, E<sub>0</sub>, F<sub>0</sub>, G<sub>0</sub>, H<sub>0</sub>} {A<sub>1</sub>, C<sub>1</sub>, F<sub>1</sub>, F<sub>2</sub>, H<sub>1</sub>}</pre>
<p>Note that it doesn't matter which of the duplicates ends up in which set, just that the items in the first are all unique and that we don't lose any. Also, <tt>A<sub>0</sub></tt> and <tt>A<sub>1</sub></tt> have the same value but the subscript will help to distinguish exactly which of the <tt>A</tt>s I'm talking about.</p>
<p>Now that we have the problem, let's look at some solutions.</p>
<h3>Lossless solution - additional lists</h3>
<p>So the first solution uses two temporary lists, <tt>unique</tt> and <tt>duplicate</tt>. We start with two pointers into the original list: <tt>p</tt> and <tt>q</tt>. We set <tt>p</tt> to point to the start of the list, and <tt>q</tt> to point to the element after <tt>p</tt>. We then copy the item <tt>p</tt> points at to our <tt>unique</tt> list. Then, while we still have more items to examine:</p>
<ul>
<li>If the items pointed to by <tt>p</tt> and <tt>q</tt> have the same value, then we copy the item <tt>q</tt> points at to the <tt>duplicate</tt> list and advance <tt>p</tt> and <tt>q</tt>.</li>
<li>If they are different, we copy the item at <tt>q</tt> to our <tt>unique</tt> list, and advance <tt>p</tt> and <tt>q</tt>.</li>
</ul>
<p>We keep doing this until we get to the end of the array, and then copy the <tt>unique</tt> list to the start of the array, the <tt>duplicate</tt> list after that (both overwriting the previous contents) and set the number of unique items to the length of the <tt>unique</tt> list.</p>
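<p>A minimal sketch of the two-list approach, assuming an array of <tt>int</tt>s for brevity. The name <tt>uniq_two_lists()</tt> is made up for this example, and <tt>p</tt> is implicit because it always sits one element behind <tt>q</tt>.</p>

```c
#include <stdlib.h>
#include <string.h>

/* Rearranges a sorted array so the unique items come first, followed by the
 * duplicates. Returns the number of unique items, or -1 if allocation fails. */
int uniq_two_lists(int *set, size_t count) {
  if (count == 0)
    return 0;

  int *unique = malloc(count * sizeof *unique);
  int *duplicate = malloc(count * sizeof *duplicate);
  if (unique == NULL || duplicate == NULL) {
    free(unique);
    free(duplicate);
    return -1;
  }

  size_t n_unique = 0, n_dup = 0;
  unique[n_unique++] = set[0];            /* the first item is always unique */
  for (size_t q = 1; q < count; q++) {    /* p is q - 1 throughout */
    if (set[q] == set[q - 1])
      duplicate[n_dup++] = set[q];
    else
      unique[n_unique++] = set[q];
  }

  /* Copy both lists back over the original contents. */
  memcpy(set, unique, n_unique * sizeof *set);
  memcpy(set + n_unique, duplicate, n_dup * sizeof *set);

  free(unique);
  free(duplicate);
  return (int)n_unique;
}
```

<p>Nothing is lost, but it costs two temporary buffers as large as the input, which is exactly what I wanted to avoid.</p>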
<h3>Lossy solution - one list</h3>
<p>We can do a similar thing in-place, although this will result in data loss. The essential idea is:</p>
<ul>
<li>We start in the same way: <tt>p</tt> pointing to the first element, and <tt>q</tt> to the second</li>
<li>While <tt>q</tt> has the same value as <tt>p</tt>, advance <tt>q</tt> along the list</li>
<li>When they are different, advance <tt>p</tt>, copy <tt>q</tt> to <tt>p</tt>, and advance <tt>q</tt></li>
<li>Once <tt>q</tt> goes past the end of the list, we're done, and the number of unique items is the offset of <tt>p</tt> plus one (<tt>p</tt> points at the last unique item).</li>
</ul>
<p>Now this is quite simple, and only requires one pass through the list, but it results in the loss of information, which is unacceptable in this case.</p>
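<p>For comparison, the lossy version might look something like this for <tt>int</tt>s (a sketch only; <tt>uniq_lossy()</tt> is an illustrative name). The overwrite is harmless for plain integers, but with <tt>malloc()</tt>ed strings it would drop the only pointer to each duplicate.</p>

```c
#include <stddef.h>

/* Compacts a sorted array in place so that the unique items come first.
 * Returns the number of unique items. Whatever lies beyond that count is
 * leftover data: the duplicates are not preserved. */
size_t uniq_lossy(int *set, size_t count) {
  if (count == 0)
    return 0;

  size_t p = 0;                          /* index of the last unique item */
  for (size_t q = 1; q < count; q++) {
    if (set[q] != set[p]) {
      p++;
      set[p] = set[q];                   /* overwrites whatever was at p */
    }
  }
  return p + 1;
}
```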
<h3>Lossless solution - one list</h3>
<p>This solution also needs only one pass through the list and two pointers. The basic intuition is that we're going through the array, accumulating a rotating block of stuff that we don't want. At any given point:</p>
<ul>
<li>Everything from the start up to and including <tt>p</tt> is the unique list (so far)</li>
<li>Everything between <tt>p</tt> and <tt>q</tt> (not inclusive at either end) is an unwanted duplicate</li>
<li>Everything between <tt>q</tt> (inclusive) and the end is yet to be processed.</li>
</ul>
<p>At the end, we return the number of unique items at the start of the list; everything above that is a duplicate.</p>
<h2>For integers</h2>
<p>Here is some pseudo-Java that performs my algorithm for a set of integers.</p>
<pre>
<div class="codecolorer-container c twitlight" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;height:300px;"><div class="c codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #993333;">int</span> set_uniq<span style="color: #009900;">&#40;</span><span style="color: #993333;">int</span> set<span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span><br />
&nbsp; <span style="color: #993333;">int</span> count <span style="color: #339933;">=</span> set.<span style="color: #202020;">length</span><span style="color: #339933;">;</span><br />
&nbsp; <span style="color: #993333;">int</span> tmp<span style="color: #339933;">,</span> p<span style="color: #339933;">=</span><span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> q<span style="color: #339933;">=</span><span style="color: #0000dd;">1</span><span style="color: #339933;">;</span><br />
&nbsp; <span style="color: #b1b100;">while</span> <span style="color: #009900;">&#40;</span>p <span style="color: #339933;">&lt;</span> set.<span style="color: #202020;">length</span> <span style="color: #339933;">&amp;&amp;</span> q <span style="color: #339933;">&lt;</span> set.<span style="color: #202020;">length</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span><br />
&nbsp; &nbsp; <span style="color: #b1b100;">while</span> <span style="color: #009900;">&#40;</span>q <span style="color: #339933;">&lt;</span> set.<span style="color: #202020;">length</span> <span style="color: #339933;">&amp;&amp;</span> set<span style="color: #009900;">&#91;</span>p<span style="color: #009900;">&#93;</span> <span style="color: #339933;">!=</span> set<span style="color: #009900;">&#91;</span>q<span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span><br />
&nbsp; &nbsp; &nbsp; p <span style="color: #339933;">=</span> p <span style="color: #339933;">+</span> <span style="color: #0000dd;">1</span><span style="color: #339933;">;</span><br />
&nbsp; &nbsp; &nbsp; <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>q <span style="color: #339933;">&lt;</span> set.<span style="color: #202020;">length</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>p <span style="color: #339933;">&lt;</span> q<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; tmp <span style="color: #339933;">=</span> set<span style="color: #009900;">&#91;</span>p<span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; set<span style="color: #009900;">&#91;</span>p<span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> set<span style="color: #009900;">&#91;</span>q<span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; set<span style="color: #009900;">&#91;</span>q<span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> tmp<span style="color: #339933;">;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #009900;">&#125;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; q <span style="color: #339933;">=</span> q <span style="color: #339933;">+</span> <span style="color: #0000dd;">1</span><span style="color: #339933;">;</span><br />
&nbsp; &nbsp; &nbsp; <span style="color: #009900;">&#125;</span><br />
&nbsp; &nbsp; &nbsp; <span style="color: #b1b100;">else</span> <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>count <span style="color: #339933;">&gt;</span> p<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; count <span style="color: #339933;">=</span> p<span style="color: #339933;">;</span><br />
&nbsp; &nbsp; &nbsp; <span style="color: #009900;">&#125;</span><br />
&nbsp; &nbsp; <span style="color: #009900;">&#125;</span><br />
&nbsp; &nbsp; <span style="color: #b1b100;">while</span> <span style="color: #009900;">&#40;</span>q <span style="color: #339933;">&lt;</span> set.<span style="color: #202020;">length</span> <span style="color: #339933;">&amp;&amp;</span> set<span style="color: #009900;">&#91;</span>p<span style="color: #009900;">&#93;</span> <span style="color: #339933;">==</span> set<span style="color: #009900;">&#91;</span>q<span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span><br />
&nbsp; &nbsp; &nbsp; q <span style="color: #339933;">=</span> q <span style="color: #339933;">+</span> <span style="color: #0000dd;">1</span><span style="color: #339933;">;</span><br />
&nbsp; &nbsp; <span style="color: #009900;">&#125;</span><br />
&nbsp; <span style="color: #009900;">&#125;</span><br />
&nbsp; <span style="color: #666666; font-style: italic;">// q hit the end and is still the same as p</span><br />
&nbsp; <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>count <span style="color: #339933;">&gt;</span> p<span style="color: #009900;">&#41;</span><br />
&nbsp; &nbsp; count <span style="color: #339933;">=</span> p <span style="color: #339933;">+</span> <span style="color: #0000dd;">1</span><span style="color: #339933;">;</span><br />
<br />
&nbsp; <span style="color: #b1b100;">return</span> count<span style="color: #339933;">;</span><br />
<span style="color: #009900;">&#125;</span></div></div>
</pre>
<h2>The general code</h2>
<p>The arguments to the general <tt>set_uniq()</tt> function are very similar to the arguments to standard <tt>qsort()</tt>:</p>
<ul>
<li>The (sorted) array to work on</li>
<li>The number of items in the array</li>
<li>The size of each item in the array</li>
<li>A comparator function that returns a negative integer if the first argument is less than the second, zero if they are equal, and a positive integer if the first argument is greater than the second.</li>
</ul>
<p>So, without further ado, my code:</p>
<pre>
<div class="codecolorer-container c twitlight" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><div class="c codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">#include &lt;stdlib.h&gt;<br />
#include &lt;string.h&gt;<br />
<br />
int set_uniq(void *set,<br />
&nbsp; &nbsp; size_t count,<br />
&nbsp; &nbsp; size_t elem_size,<br />
&nbsp; &nbsp; int (*cmp)(const void*, const void*)) {<br />
&nbsp; /* char* rather than void*, so the pointer arithmetic is standard C */<br />
&nbsp; char *base=set, *p, *q, *max;<br />
&nbsp; void *tmp;<br />
&nbsp; if (count==0) return 0;<br />
&nbsp; tmp=calloc(1, elem_size);<br />
&nbsp; if (tmp==NULL) return -1; /* allocation failed */<br />
&nbsp; p=base;<br />
&nbsp; q=base+elem_size;<br />
&nbsp; max=base+(count*elem_size);<br />
&nbsp; while (q&lt;max) {<br />
&nbsp; &nbsp; /* q is a new value: move it to just after the last unique item */<br />
&nbsp; &nbsp; while (q&lt;max &amp;&amp; cmp(p, q)!=0) {<br />
&nbsp; &nbsp; &nbsp; p+=elem_size;<br />
&nbsp; &nbsp; &nbsp; if (p&lt;q) {<br />
&nbsp; &nbsp; &nbsp; &nbsp; memcpy(tmp, p, elem_size);<br />
&nbsp; &nbsp; &nbsp; &nbsp; memcpy(p, q, elem_size);<br />
&nbsp; &nbsp; &nbsp; &nbsp; memcpy(q, tmp, elem_size);<br />
&nbsp; &nbsp; &nbsp; }<br />
&nbsp; &nbsp; &nbsp; q+=elem_size;<br />
&nbsp; &nbsp; }<br />
&nbsp; &nbsp; /* skip over duplicates of the item at p */<br />
&nbsp; &nbsp; while (q&lt;max &amp;&amp; cmp(p, q)==0) {<br />
&nbsp; &nbsp; &nbsp; q+=elem_size;<br />
&nbsp; &nbsp; }<br />
&nbsp; }<br />
&nbsp; /* p points at the last unique item */<br />
&nbsp; count = 1 + (p-base)/elem_size;<br />
&nbsp; free(tmp);<br />
&nbsp; return (int)count;<br />
}</div></div>
</pre>
<h2>Conclusion</h2>
<p>This code is released under a Creative Commons licence (see below). A similar idea will work well for other operations, like set difference. I'll post those later if anyone really cares.</p>
<p>Also, there is a better way of swapping the items by using the fact that:</p>
<pre>
<div class="codecolorer-container c twitlight" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><div class="c codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">a <span style="color: #339933;">^=</span> b<span style="color: #339933;">;</span><br />
b <span style="color: #339933;">^=</span> a<span style="color: #339933;">;</span><br />
a <span style="color: #339933;">^=</span> b<span style="color: #339933;">;</span></div></div>
</pre>
<p>swaps a and b without needing a temporary variable. You can just go along the data in 32-bit chunks, performing those XOR operations to swap the data. This actually works out to be 30% or so faster (if I remember correctly).</p>
<p>Finally - general code like this is generally going to be slower than code that's been specialised for a particular purpose. Calling this on integers, for example, will be much slower than coding an integer-specific version. I should also point out that you can just overwrite integers, as you don't need to free them <img src='http://www.dmi.me.uk/blog/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' title="icon wink photo" /> </p>
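<p>For what it's worth, a chunked XOR swap could be sketched like this. The function name <tt>xor_swap_words()</tt> is invented for the example, and it assumes the element size is a multiple of four bytes and that both blocks are suitably aligned for <tt>uint32_t</tt> access. Whether it really beats <tt>memcpy()</tt> will depend on the compiler and the machine, so measure before relying on it.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Swaps two non-overlapping blocks of nwords 32-bit words using XOR,
 * without a temporary buffer. The blocks must be distinct: XOR-swapping
 * a word with itself would zero it instead of leaving it unchanged. */
void xor_swap_words(uint32_t *a, uint32_t *b, size_t nwords) {
  assert(a != b);
  for (size_t i = 0; i < nwords; i++) {
    a[i] ^= b[i];
    b[i] ^= a[i];
    a[i] ^= b[i];
  }
}
```

<p>In <tt>set_uniq()</tt> the swap only happens when <tt>p</tt> is strictly before <tt>q</tt>, so the two items are never the same block.</p>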
]]></content:encoded>
			<wfw:commentRss>http://www.dmi.me.uk/blog/2008/07/10/in-place-uniq-in-c/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	<creativeCommons:license>http://creativecommons.org/licenses/by-nc-nd/3.0/</creativeCommons:license>
	</item>
		<item>
		<title>Insight: An update</title>
		<link>http://www.dmi.me.uk/blog/2008/06/09/insight-an-update/</link>
		<comments>http://www.dmi.me.uk/blog/2008/06/09/insight-an-update/#comments</comments>
		<pubDate>Mon, 09 Jun 2008 16:29:55 +0000</pubDate>
		<dc:creator>Dave</dc:creator>
				<category><![CDATA[Insight (semantic filesystem)]]></category>
		<category><![CDATA[Personal]]></category>
		<category><![CDATA[University]]></category>

		<guid isPermaLink="false">http://www.dmi.me.uk/blog/?p=56</guid>
		<description><![CDATA[Just a very brief update for the last few days: Files can now be opened and read - writing and deletion coming soon! File import does tend to confuse ln, as it expects the new destination to be a symlink rather than a file. Nothing I can really do about this for now though! I've [...]]]></description>
			<content:encoded><![CDATA[<p>Just a very brief update for the last few days:</p>
<ul>
<li>Files can now be opened and read - writing and deletion coming soon!</li>
<li>File import does tend to confuse ln, as it expects the new destination to be a symlink rather than a file. Nothing I can really do about this for now though!</li>
<li>I've manually assigned some tags to files, and queries now work beautifully (apart from subcategory union, which I hope to tackle later tonight).</li>
<li>Having said that, only simple conjunctive queries work, as there's no support for negation or disjunction. Yet.</li>
<li>Inode set functions are now in place and working.</li>
</ul>
<p>One more update before I get back to coding... Insight now has an official logo!</p>
<p><img class="aligncenter size-full wp-image-57" title="Insight logo (blue)" src="http://www.dmi.me.uk/blog/wp-content/uploads/2008/06/insight_blue_letters.png" alt="Insight logo" width="300" height="100" /></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dmi.me.uk/blog/2008/06/09/insight-an-update/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	<creativeCommons:license>http://creativecommons.org/licenses/by-nc-nd/3.0/</creativeCommons:license>
	</item>
		<item>
		<title>Insight: Where am I now, and where next?</title>
		<link>http://www.dmi.me.uk/blog/2008/06/05/insight-where-am-i-now-and-where-next/</link>
		<comments>http://www.dmi.me.uk/blog/2008/06/05/insight-where-am-i-now-and-where-next/#comments</comments>
		<pubDate>Thu, 05 Jun 2008 23:18:29 +0000</pubDate>
		<dc:creator>Dave</dc:creator>
				<category><![CDATA[Insight (semantic filesystem)]]></category>
		<category><![CDATA[Personal]]></category>
		<category><![CDATA[University]]></category>

		<guid isPermaLink="false">http://www.dmi.me.uk/blog/?p=55</guid>
		<description><![CDATA[So I've been in Deep Coding Mode™ for quite a while. What have I got to show for it? Well, the short answer is that Insight is now a functioning file system... for a given definition of "functioning". As of this morning: It can successfully import files from the rest of the hierarchy Tags (i.e. [...]]]></description>
			<content:encoded><![CDATA[<p>So I've been in Deep Coding Mode™ for quite a while. What have I got to show for it?</p>
<p>Well, the short answer is that <strong>Insight</strong> is now a functioning file system... for a given definition of "functioning".</p>
<p><span id="more-55"></span></p>
<p>As of this morning:</p>
<ul>
<li>It can successfully import files from the rest of the hierarchy</li>
<li>Tags (i.e. directories) can be created and removed at any level.</li>
<li>Tags appear and disappear more or less as you expect (i.e. if you already have a tag in your path, it won't show up in listings again).</li>
<li>Tags that are synonyms will show up as symbolic links to the actual target tag (but currently do not obey the rule in the point above, i.e. if their target has been used in the path, they may still appear).</li>
</ul>
<p>So far, so good. Now for the limitations:</p>
<ul>
<li>Files cannot currently be opened, read from, written to, or deleted.</li>
<li>Files must be imported in a strange manner: as absolute symbolic links. They then show up as regular files, although they are actually just links to the originals elsewhere in the filesystem.</li>
<li>Tags cannot be assigned to files (or removed from them)</li>
<li>Files can therefore only be imported at the root level</li>
<li>Queries have no effect on file listing, and so listings just show files in limbo</li>
<li>Of course, there is no subcategory union either.</li>
</ul>
<p>But I am working on all of these things. At the moment, the main thing is sorting out the internal inode lists. Once those are done, then it should be quite straightforward to do tag assignment/removal and import directly into tags. Plan of action, therefore:</p>
<ol>
<li>Implement inode insertion/deletion</li>
<li>Implement inode set functions (intersection, union, difference)</li>
<li>Re-implement query tree builder from path. Currently only deals with building a basic conjunctive query tree and assumes that all components are tags. Should:
<ul>
<li> Take a path</li>
<li>Canonicalise it</li>
<li>Check path components (left-to-right) to ensure tags exist</li>
<li>If last part is an incomplete tag, treat appropriately</li>
<li>If last part is a complete tag, then fine</li>
<li>If last part does not resolve as a tag, then hash it and see if it translates to a known inode</li>
<li>If not, or if any tags in path do not exist, then path is invalid</li>
<li>If it is a valid inode, then add QUERY_IS_INODE node to tree</li>
<li>Otherwise return query tree</li>
</ul>
</li>
<li>Implement query processing:
<ul>
<li>Given input set of inodes, produce an output set at each node of the query tree.</li>
<li>In trivial case with top-level <tt>IS_ANY</tt> node, output set is the set of limbo inodes, with internal negation flag set to false</li>
<li>With an <tt>IS</tt> node, the output set is the recursive union of the inodes belonging to that tag and its subtags, with internal negation flag set to false</li>
<li>With an <tt>IS_NOSUB</tt> node, the output set is the set of inodes belonging to that tag, with internal negation flag set to false</li>
<li>With an <tt>IS_INODE</tt> node, the output set contains a single element: the inode.</li>
<li>With an <tt>IS_NOT</tt> node with a subquery, the output set is identical to the subquery resultset, with an internal negation flag inverted</li>
<li>With an <tt>IS_NOT</tt> node with a tag, the output set is the same as for an IS node, with an internal negation flag set to true</li>
<li>An <tt>AND</tt> node output depends on the negation flags of its subqueries:
<ul>
<li>Both false: output is the set intersection of its subqueries, with negation flag clear</li>
<li>Both true: output is union of subqueries, with negation flag set</li>
<li>Otherwise: output is set difference, with the negation-true set removed from the negation-false set, and the negation flag cleared</li>
</ul>
</li>
<li>An <tt>OR</tt> node output depends on the negation flags of its subqueries:
<ul>
<li>Both false: output is union of subquery results, with negation flag clear</li>
<li>Both true: output is intersection of subquery results, with negation flag set</li>
<li>Otherwise: output is <strong><span style="color: #ff0000;">???</span></strong></li>
</ul>
</li>
<li>It is very likely an error if the negation flag is found to be set at the top level.</li>
<li>Also have to think about how to build a tree from a bracketed expression. But later. Much later.</li>
</ul>
</li>
<li>Output of query processing is an inode set.</li>
<li>Maybe low-overhead query processing just to see if an inode would match the query?</li>
<li>Implement open/read/write as pass-through operations on the inode symlink targets.</li>
<li>Implement symlinking directories as creating synonyms.</li>
<li>Add <strong>LOTS</strong> of checks.</li>
<li>Note: also have to track inode reference count, so that when it gets to zero the inode is added to the limbo list. Once removed from there, it is removed from the filesystem completely.</li>
</ol>
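<p>The negation-flag rules in the plan can be sketched roughly as follows. This is only an illustration in Python rather than the project's C: the <tt>Node</tt> class, the <tt>TAG_INODES</tt>, <tt>SUBTAGS</tt> and <tt>LIMBO</tt> tables and the function names are all made up for the example. The mixed-flag <tt>OR</tt> case is left open in the plan; the sketch fills it with one possibility that follows from De Morgan's laws (A or not B equals not (B minus A)), which is an assumption and not a decision taken in the plan.</p>

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical in-memory stand-ins for the tag store; none of these names
# come from the Insight code base.
TAG_INODES = {"music": {1, 2}, "rock": {2, 3}, "photos": {4}}
SUBTAGS = {"music": ["rock"]}
LIMBO = {9}


@dataclass
class Node:
    kind: str  # IS_ANY, IS, IS_NOSUB, IS_INODE, IS_NOT, AND or OR
    tag: Optional[str] = None
    inode: Optional[int] = None
    children: list = field(default_factory=list)


def tag_inodes_recursive(tag):
    """Union of the inodes of a tag and all of its subtags."""
    result = set(TAG_INODES.get(tag, set()))
    for sub in SUBTAGS.get(tag, []):
        result = result.union(tag_inodes_recursive(sub))
    return result


def evaluate(node):
    """Return (inode_set, negated) for one node of the query tree."""
    if node.kind == "IS_ANY":
        return set(LIMBO), False
    if node.kind == "IS":
        return tag_inodes_recursive(node.tag), False
    if node.kind == "IS_NOSUB":
        return set(TAG_INODES.get(node.tag, set())), False
    if node.kind == "IS_INODE":
        return {node.inode}, False
    if node.kind == "IS_NOT":
        if node.children:  # negated subquery: same set, flag inverted
            inodes, negated = evaluate(node.children[0])
            return inodes, not negated
        return tag_inodes_recursive(node.tag), True
    if node.kind not in ("AND", "OR"):
        raise ValueError("unknown node kind: " + node.kind)

    left, left_neg = evaluate(node.children[0])
    right, right_neg = evaluate(node.children[1])
    if node.kind == "AND":
        if not left_neg and not right_neg:
            return left.intersection(right), False
        if left_neg and right_neg:
            return left.union(right), True
        # Mixed: remove the negation-true set from the negation-false set.
        if left_neg:
            return right.difference(left), False
        return left.difference(right), False

    # OR node
    if not left_neg and not right_neg:
        return left.union(right), False
    if left_neg and right_neg:
        return left.intersection(right), True
    # Mixed case (assumption, see above): A or not B == not (B minus A).
    if left_neg:
        return left.difference(right), True
    return right.difference(left), True
```

<p>With these made-up tables, "music and not rock" evaluates to the single inode 1 with the flag clear, while "photos or not rock" comes back as the set {2, 3} with the flag set, meaning "everything except 2 and 3".</p>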
<p>These should be quite straightforward to do (I hope), especially as I know more or less exactly what I'm doing. Deadlines are closing in, however, and I still have a report, a presentation and a demo to write. Hopefully I can get much of this done by Tuesday, and then spend the day doing bits of my report.</p>
<p>I must say that I do love developing this. It's just so amazing to be developing a file system and see it work!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dmi.me.uk/blog/2008/06/05/insight-where-am-i-now-and-where-next/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	<creativeCommons:license>http://creativecommons.org/licenses/by-nc-nd/3.0/</creativeCommons:license>
	</item>
		<item>
		<title>Insight: The current plan</title>
		<link>http://www.dmi.me.uk/blog/2008/05/19/insight-the-current-plan/</link>
		<comments>http://www.dmi.me.uk/blog/2008/05/19/insight-the-current-plan/#comments</comments>
		<pubDate>Mon, 19 May 2008 12:51:03 +0000</pubDate>
		<dc:creator>Dave</dc:creator>
				<category><![CDATA[Insight (semantic filesystem)]]></category>
		<category><![CDATA[University]]></category>

		<guid isPermaLink="false">http://www.dmi.me.uk/blog/?p=54</guid>
		<description><![CDATA[As it comes time to work on my project again, it's time to take stock and work out what my plan of action should be. Looking at the code I have already, I think it's clear that there is no way I will be able to write a kernel-level file system driver within the 2-3 [...]]]></description>
			<content:encoded><![CDATA[<p>As it comes time to work on my project again, it's time to take stock and work out what my plan of action should be.</p>
<p>Looking at the code I have already, I think it's clear that there is no way I will be able to write a kernel-level file system driver within the 2-3 weeks I have left. Fortunately, I had more or less expected this (as writing kernel code would be likely to take quite a while and be quite complex!) so I'm retreating to my fallback position: a FUSE-wrapped program that will interface with the metadata store.</p>
<p>Also on the cards is the use of <a href="http://check.sourceforge.net/" target="_blank">Check</a> as a C unit testing framework for my tree code - if I have time. At the moment, getting something to work is far more important than proving it is correct or works in all cases.</p>
<p>Finally, I need to come up with and write the demo programs, and I will shortly be posting about this and then asking the Twitterverse for ideas :-)</p>
<p>Time to enter Deep Coding Mode.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dmi.me.uk/blog/2008/05/19/insight-the-current-plan/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	<creativeCommons:license>http://creativecommons.org/licenses/by-nc-nd/3.0/</creativeCommons:license>
	</item>
		<item>
		<title>No more exams!</title>
		<link>http://www.dmi.me.uk/blog/2008/05/16/no-more-exams/</link>
		<comments>http://www.dmi.me.uk/blog/2008/05/16/no-more-exams/#comments</comments>
		<pubDate>Fri, 16 May 2008 10:00:09 +0000</pubDate>
		<dc:creator>Dave</dc:creator>
				<category><![CDATA[Life]]></category>
		<category><![CDATA[Personal]]></category>
		<category><![CDATA[University]]></category>

		<guid isPermaLink="false">http://www.dmi.me.uk/blog/?p=52</guid>
		<description><![CDATA[So my exams have finally finished... it's a very surreal feeling. Still got the project to go, but my time at university is even closer to being over. I don't think it's really sunk in yet. Once the exam was over, a group of us headed to get some drinks, and came away with four [...]]]></description>
			<content:encoded><![CDATA[<p>So my exams have finally finished... it's a very surreal feeling. Still got the project to go, but my time at university is even closer to being over. I don't think it's really sunk in yet.</p>
<p>Once the exam was over, a group of us headed to get some drinks, and came away with four bottles of cheap <span style="text-decoration: line-through;">champagne</span> sparkling white wine ("We've just finished our exams and want quantity over quality. What have you got?") and headed to the Union to consume it. Sadly things got slightly out of hand there, but I eventually headed back home after failing to work out how to get to Camden (because I managed to completely forget the Tube existed).</p>
<p>Came back after the post-exam celebrations to find these stashed in the kitchen:</p>
<p><a href="http://www.dmi.me.uk/blog/wp-content/uploads/2008/05/dsc01157.jpg"><img class="aligncenter size-medium wp-image-53" title="Old Rosie cider" src="http://www.dmi.me.uk/blog/wp-content/uploads/2008/05/dsc01157-300x225.jpg" alt="Six 2L bottles of Old Rosie scrumpy" width="300" height="225" /></a></p>
<p>My cider has arrived!</p>
<p>So, next stop: project. The due date now seems incredibly close!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dmi.me.uk/blog/2008/05/16/no-more-exams/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	<creativeCommons:license>http://creativecommons.org/licenses/by-nc-nd/3.0/</creativeCommons:license>
	</item>
		<item>
		<title>The End Is Nigh</title>
		<link>http://www.dmi.me.uk/blog/2008/03/07/the-end-is-nigh/</link>
		<comments>http://www.dmi.me.uk/blog/2008/03/07/the-end-is-nigh/#comments</comments>
		<pubDate>Fri, 07 Mar 2008 11:00:41 +0000</pubDate>
		<dc:creator>Dave</dc:creator>
				<category><![CDATA[Life]]></category>
		<category><![CDATA[Personal]]></category>
		<category><![CDATA[University]]></category>

		<guid isPermaLink="false">http://www.dmi.me.uk/blog/2008/03/07/the-end-is-nigh/</guid>
		<description><![CDATA[... and that's it. No more lectures. Ever. Apart from revision lectures, but they don't count in the same way. This feels like a bit of an anti-climax. It's hard to believe that four years at university are almost over! It's going to take some getting used to. Of course, it's not all over just [...]]]></description>
			<content:encoded><![CDATA[<p>... and that's it. No more lectures. Ever. Apart from revision lectures, but they don't count in the same way. This feels like a bit of an anti-climax. It's hard to believe that four years at university are almost over! It's going to take some getting used to.</p>
<p>Of course, it's not all over just yet - there are still exams to go.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dmi.me.uk/blog/2008/03/07/the-end-is-nigh/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	<creativeCommons:license>http://creativecommons.org/licenses/by-nc-nd/3.0/</creativeCommons:license>
	</item>
		<item>
		<title>Insight: FUSE</title>
		<link>http://www.dmi.me.uk/blog/2008/03/04/insight-fuse/</link>
		<comments>http://www.dmi.me.uk/blog/2008/03/04/insight-fuse/#comments</comments>
		<pubDate>Tue, 04 Mar 2008 11:27:11 +0000</pubDate>
		<dc:creator>Dave</dc:creator>
				<category><![CDATA[Insight (semantic filesystem)]]></category>
		<category><![CDATA[University]]></category>

		<guid isPermaLink="false">http://www.dmi.me.uk/blog/2008/03/04/insight-fuse/</guid>
		<description><![CDATA[Well, I'm now ready to start playing around with FUSE and see where it gets me. I'm getting quite concerned now by how little time is left... we got our exam timetable yesterday, and it doesn't seem too bad... although we do have two exams on the same day at one point. Still, I've got [...]]]></description>
			<content:encoded><![CDATA[<p>Well, I'm now ready to start playing around with FUSE and see where it gets me. I'm getting quite concerned now by how little time is left... we got our exam timetable yesterday, and it doesn't seem too bad... although we do have two exams on the same day at one point. Still, I've got a few weeks to go...</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dmi.me.uk/blog/2008/03/04/insight-fuse/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	<creativeCommons:license>http://creativecommons.org/licenses/by-nc-nd/3.0/</creativeCommons:license>
	</item>
		<item>
		<title>Insight: Work so far&#8230;</title>
		<link>http://www.dmi.me.uk/blog/2008/02/14/insight-work-so-far/</link>
		<comments>http://www.dmi.me.uk/blog/2008/02/14/insight-work-so-far/#comments</comments>
		<pubDate>Thu, 14 Feb 2008 18:07:48 +0000</pubDate>
		<dc:creator>Dave</dc:creator>
				<category><![CDATA[Insight (semantic filesystem)]]></category>
		<category><![CDATA[University]]></category>

		<guid isPermaLink="false">http://www.dmi.me.uk/blog/2008/02/14/insight-work-so-far/</guid>
		<description><![CDATA[Insight is progressing, but slowly. I'm working on the first prototype, which should essentially prove the concept of the indexing system. It turns out that writing a B+ tree is more awkward than I had thought, which is slightly depressing, as that is meant to be the easy bit! Then again, once the tree is [...]]]></description>
			<content:encoded><![CDATA[<p>Insight is progressing, but slowly. I'm working on the first prototype, which should essentially prove the concept of the indexing system. It turns out that writing a B+ tree is more awkward than I had thought, which is slightly depressing, as that is meant to be the easy bit! Then again, once the tree is done, it should be plain sailing for a little while.</p>
<p>I had my second marker meeting last Wednesday, which went well. I do need to find a way to explain Insight very quickly, however. People tend to look blankly at me and say "Why would I need this?" to begin with, then they might say something like "What about Spotlight/other indexing service?" or "Why in the file system?" but once I fully explain some of its potential, they get quite excited.</p>
<p><span id="more-39"></span>One of the many things I will need to think about is how to represent synonyms. Say, for example, that you wanted <kbd>course.dist-alg</kbd> to be equivalent to <kbd>course."Distributed Algorithms"</kbd>. You would need to have some way of stating that, and of choosing a primary representation. Again, you might want <kbd>type.photo</kbd> to be equivalent to <kbd>type.image</kbd> and <kbd>type.picture</kbd>. I think the way to do that would be using an alias flag in the index tree, and having the first few bytes in the inode area as a pointer to the primary representation. Whether I'll implement this in time or not is another matter :-)</p>
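<p>A rough sketch of how the alias flag might behave, written in Python for brevity. The <tt>INDEX</tt> dictionary, its field names and the choice of <kbd>type.image</kbd> as the primary representation are all assumptions for the example; the real index would be the on-disk tree, with the pointer stored in the inode area.</p>

```python
# Hypothetical index entries: every tag has an alias flag. An alias carries
# a pointer to its primary representation; a primary tag carries the inodes.
INDEX = {
    "type.image": {"alias": False, "inodes": {10, 11}},
    "type.photo": {"alias": True, "primary": "type.image"},
    "type.picture": {"alias": True, "primary": "type.image"},
}


def resolve(tag):
    """Follow alias pointers until a primary (non-alias) tag is reached."""
    seen = set()
    while INDEX[tag]["alias"]:
        if tag in seen:
            raise ValueError("alias cycle at " + tag)
        seen.add(tag)
        tag = INDEX[tag]["primary"]
    return tag


def inodes_for(tag):
    """Inodes of a tag, looked up through its primary representation."""
    return INDEX[resolve(tag)]["inodes"]
```

<p>Here <tt>resolve("type.photo")</tt> and <tt>resolve("type.picture")</tt> both lead to <kbd>type.image</kbd>, so all three names list the same files.</p>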
<p>Another thing I was thinking about: dynamic values in queries. Say, for example, you want to have a pretty complex query. For example, you're a lecturer and you want a certain folder to always display the slides for the current lecture time. You want to do something like:</p>
<blockquote>
<div class="codecolorer-container text twitlight" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">/lecture slides/course:{ SWITCH([now().day,now().hour], [[['Mon',11],'multi-agent'], [['Mon',12],'multi-agent'], [['Tue',14],'dist-alg'], [['Thu',10],'graphics'], [['Thu',11],'graphics'], [['Thu',16],'dist-alg'], [['Thu',17],'dist-alg'], [['Fri',10],'multi-agent'], [['Fri',11],'multi-agent'], [['Fri',12],'multi-agent'], [['Fri',14],'graphics']], '') }</div></div>
</blockquote>
<p>... although that's a fairly long-winded way of going about it. This represents dynamic code (delimited by braces) executing the SWITCH() function, which takes three arguments. The first is the item to find, the second is a list of two-element lists, and the third is the default to use if nothing is found. The first element of each two-element list is pattern-matched against the first argument of SWITCH(); if it matches, the function returns that list's second element.</p>
<p>Of course, this is just very rough, random and unstructured code, but it hopefully shows off some of the future possibilities for Insight.</p>
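<p>For what it's worth, the SWITCH() behaviour described above can be sketched in a few lines of Python. The names <tt>switch</tt>, <tt>TIMETABLE</tt> and <tt>current_course</tt> are invented for the example, and plain equality stands in for whatever pattern matching the real thing would use.</p>

```python
from datetime import datetime

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# The same timetable as in the query above, as [key, value] pairs.
TIMETABLE = [
    [["Mon", 11], "multi-agent"], [["Mon", 12], "multi-agent"],
    [["Tue", 14], "dist-alg"],
    [["Thu", 10], "graphics"], [["Thu", 11], "graphics"],
    [["Thu", 16], "dist-alg"], [["Thu", 17], "dist-alg"],
    [["Fri", 10], "multi-agent"], [["Fri", 11], "multi-agent"],
    [["Fri", 12], "multi-agent"], [["Fri", 14], "graphics"],
]


def switch(item, cases, default):
    """Return the value paired with the first key equal to item."""
    for key, value in cases:
        if key == item:  # assumption: equality instead of pattern matching
            return value
    return default


def current_course(now):
    """Course tag for the given time, or '' outside lecture hours."""
    return switch([DAYS[now.weekday()], now.hour], TIMETABLE, "")
```

<p>Calling <tt>current_course(datetime.now())</tt> would then give the tag to substitute into the <kbd>course:</kbd> part of the path.</p>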
]]></content:encoded>
			<wfw:commentRss>http://www.dmi.me.uk/blog/2008/02/14/insight-work-so-far/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	<creativeCommons:license>http://creativecommons.org/licenses/by-nc-nd/3.0/</creativeCommons:license>
	</item>
		<item>
		<title>Insight: First report done!</title>
		<link>http://www.dmi.me.uk/blog/2008/01/23/insight-first-report-done/</link>
		<comments>http://www.dmi.me.uk/blog/2008/01/23/insight-first-report-done/#comments</comments>
		<pubDate>Wed, 23 Jan 2008 19:42:29 +0000</pubDate>
		<dc:creator>Dave</dc:creator>
				<category><![CDATA[Insight (semantic filesystem)]]></category>
		<category><![CDATA[Life]]></category>
		<category><![CDATA[Personal]]></category>
		<category><![CDATA[University]]></category>

		<guid isPermaLink="false">http://www.dmi.me.uk/blog/2008/01/23/insight-first-report-done/</guid>
		<description><![CDATA[Well, after much hard work, criticism, and more: the first report is in! I'm actually only writing this post two weeks later because once it was done, I just had to take some time to catch up on everything else I've been letting slide while working almost non-stop on this report. This report has outlined [...]]]></description>
			<content:encoded><![CDATA[<p>Well, after much hard work, criticism, and more: the first report is in! I'm actually only writing this post two weeks later because once it was done, I just had to take some time to catch up on everything else I've been letting slide while working almost non-stop on this report.</p>
<p>This report has outlined a number of my thoughts and decisions, and one of the last ideas to be considered for inclusion in the report was the issue of backwards compatibility with existing systems. I have decided to follow the examples given in the paper by Gifford et al. on semantic file systems. I'll be describing this in more detail soon... I promise!</p>
<p>Anyway, I've decided to make the <a title="Insight Outsourcing Report" href="http://www.dmi.me.uk/blog/wp-content/uploads/2008/01/insight-outsourcing.pdf">Insight Outsourcing Report</a> available here for anyone who's interested. Once I get the time to deal with the site as well, I'll give it a more obvious home.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dmi.me.uk/blog/2008/01/23/insight-first-report-done/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	<creativeCommons:license>http://creativecommons.org/licenses/by-nc-nd/3.0/</creativeCommons:license>
	</item>
	</channel>
</rss>

