GtkOverlay

You can abuse GtkOverlay to do some pretty neat things. Here is a widget that obscures portions of the underlying GtkTextView based on matching search results. Ideally the overlay would also draw some fancy bevels around the transparent areas.

gb-search-overlay.c

Update: Now with bevel.

Seasons Change

Today was my last day at Catch.com. I spent the last 20 months there building the back-end and sync algorithm for their collaborative notes platform. In the process I built a couple of async drivers for MongoDB (twisted and glib) and various tools around the database. 10gen took notice and so I'll be heading there to work on MongoDB in March. (Software on github/gitorious/cgit is a great résumé).

Until then, I have a few weeks of relaxation that will likely be used to hack on some things from the developer hackfest. I spent some time prototyping tools with clang that I would like to finish. It was largely based on editor integration for some ideas I've had. Once that work becomes useful I'll take a stab at JavaScript code completion if nobody else does. Well, at least the GIR resolving side of things. Jasper pointed out Reflect.parse() which looks useful to extract an AST in spidermonkey. Does JSCore have anything similar?

Just like everyone else, I ended up getting that horrible FOSDEM flu. If you live near anyone that got it, bring them some soup and hugs. They'll be down for a couple of days.

Compiling with checks and assertions

Some people like to complain about g_return_if_fail() and g_assert() being left in when compiling release executables and libraries. I have learned that it can actually have a positive effect on performance, as it can start pre-fetching that data into cache-lines. To get the performance I lost by compiling with -DG_DISABLE_ASSERT and -DG_DISABLE_CHECKS I had to resort to the GCC builtin function __builtin_prefetch(addr). (Cast checks will occur a bit more overhead, but I'm not using that here).

Most interesting. Just something to think about.

Data Structures: Trie

Over the holidays I often enjoy learning a new data-structure by implementing it. This year is no different, and so I implemented a Trie (pronounced "try").

http://en.wikipedia.org/wiki/Trie

TL;DR

What is a Trie?

A Trie is a tree structure where each node of the tree contains a single character from an inserted string key. The first character of the string key is in the root node, and the second in a child, and so forth until you reach the end of the string key. Each node can point to multiple children and therefore the prefix of similar strings share nodes in the tree.

In the example this is illustrated with the two string keys "gnome" and "gnu".

What are they used for?

A trie is a useful data-structure when you have a very large data set of keys you would like to search through for a particular key. Each node you traverse in the tree gets you closer to your target string. However, one fantastic addition is that it allows you to find all of the other strings keys sharing this prefix without looking at the strings that do not.

For example, if I wanted to see all of the strings that started with "gn", I would get the results "gnome" and "gnu" based on the above example.

What did I build?

So I set out to build one of these for myself. However, I wanted to add a twist on it. I wanted to try to make a variant of the data-structure that fit well into the cache-line of my computer.

A cache-line is a short run of data that your CPU can access in the L1 cache. On most (all?) x86_64 machines, this is 64 contiguous bytes. Every time you access main memory, the CPU accesses the run of 64 bytes that contains your target memory address and loads that into the CPU. Therefore, if you can optimize your data-structure to align with these cache-lines, you can get a good performance boost!

What is needed to build this structure?

To build this data-structure, we need a couple of things. We know that it is a tree-like structure, so nodes will be required. We know that we need to have key/value pairs, so we need a place to store a pointer to the value provided with each string key. We know that we need pointers to the children nodes because this is a tree (and I learned that a pointer to the parent is also quite useful).

Additionally, since I plan to align these to cachelines, we need the ability to allocate more memory for more keys as they are added to each node.

How did I layout the nodes in the Trie?

I took a novel approach using dynamically sized structures. This allowed me to design each structure to be exactly one cacheline on x86_64. But since we need to be able to grow as more children are added to a node, the node also serves as the head to a linked-list. Each chunk in the link list contains a small array for up to 6 keys, up to 6 pointers to children nodes (saving a little space for overhead), and a pointer to the next chunk in the linked-list.

  1. struct _TrieNode
  2. {
  3. TrieNode *parent;
  4. gpointer value;
  5. TrieNodeChunk chunk;
  6. };
  7.  
  8. struct _TrieNodeChunk
  9. {
  10. guint8 flags;
  11. guint8 count;
  12. guint8 keys[6];
  13. TrieNodeChunk *next;
  14. TrieNode *children[0];
  15. };

By making the chunk a dynamically sized struct, I can place it inside of the node (sacrificing two pointers) and still fit four. That is pretty good considering it also buys me one less allocation and better cache locality when following pointers from the parent nodes.

How does Move-to-Front help cache locality?

I read a paper recently that mentioned that "move-to-front" can significantly raise your "cache-hit" ratio for L1 cache. A "cache-hit" is when your CPU needs to access a piece of memory and it is already in the L1 cache (meaning you do not need to fetch it from main memory). So I decided to do this for mine as well.

Move-to-Front moves a key that was just accessed to the first key in its containing node. It turns out that many workloads end up accessing the same node multiple times. Therefore, having that data embedded in the target node results in very large improvement in lookups, inserts, and removals per second!

In the use case I was working on, lookups took only 40% of the original elapsed time! Inserts and and removals were also quite dramatic, both taking about 70% of the original elapsed time. I attribute the difference between the lookup case and the insertion/delete case to malloc and free overhead.

Above test is for test_trie_gauntlet located here.

How does the memory allocator affect runtime?

I started by using plain old g_malloc0() and g_free(). (I'm still using those now, actually). But I wanted to see how I could change the allocator to improve the performance of the data structure.

Since we are doing almost exclusively 64-byte allocations, this seems like a perfect problem for the GSlice allocator (a SLAB based allocator that keeps around free'd memory to re-use for structures of the same size). However, that ended up being a lot slower. Perhaps that had to do with the attempt to be lock-free in some cases, and therefore flushing the cachelines that I'm working so hard to keep full with relevant data.

Bummer.

Next up, manage my own linked-list of free'd memory and re-use that instead of calling g_malloc0(). Additionally, I can allocate in one big chunk a few thousand structures at a time (and even more importantly, memset() them to zero in a single call).

Another Big Win!

How does that compare to a string array for this problem?

One of the interesting comparisons for this test was to see if auto-completion for a source editor widget could be faster with a Trie as compared to a string array. If you have an unsorted string array you must scan through all strings to find matching prefixes. Otherwise you can binary search them (and then walk the array until you find a string that does not match the prefix).

When is a string array faster?

When using unsorted string arrays and my sample set of about 16,000 function symbols, the searching the string array was faster for up to 3-4 characters.

When is a Trie faster?

Once you have reached about 5 characters, the Trie would be faster. The neat property of the Trie is that the more input you provide it, the faster it gets. So by time you are about 10-15 characters deep in your symbol name, the amount of time it takes to run is close to irrelevant (for my use case).

How could I make the string array faster?

Make it sorted, and binary search it for matching prefixes (and then walk until you get input that does not share the prefix).

Conclusion

In not sure either the Trie or the String Array provide quite what I'm looking for yet. I've found that tab-completion in bash is what most people are familiar with when it comes to completion. So providing a similar experience might be in my best interest. For those unfamiliar, the experience is one that when you press TAB, the completion is provided up to the next character that can provide disambiguation.

I'm considering the idea of using a string array up to a certain number of characters and then the Trie after that, but I'm not sure its worth the doubling of memory overhead. Both data-structures could be mmap()d however (in fact some CTags completion providers do this). On one hand, the Trie could give me a pointer to symbol information (like docs, parameters, etc), but a string array could do the same with a second array).

A pretty clear win would be to merge nodes with a single child into one node so that instead of having one character key in the node you could have a couple characters of the key in a node. This would move more into the realm of a Radix tree.

So I guess it requires further science. I'll likely follow up at a later time as I play with this more.

Manpages

A friend of mine said they had the perception that GNOME technologies appeared unfinished due to the nature that manpages were not installed for the APIs. While I didn't particularly agree with that sentiment, I did decide I would make manpages for them.

The gnome-manpages can be found here.

The resulting manpages look very similar to what you look at in devhelp, except in a terminal. They work with man -k keyword as well as Shift+K in VIM for typedefs, macros, functions, and structs. I do hope to provide configure patches to generate these in the various projects at some point soon.

Searching with man -k

Viewing documentation from VIM

GNOME University @ FOSDEM ‘13?

Is anyone interested in a GNOME University meetup while at FOSDEM '13? Leave a comment if you are so I can get an idea of how many.

I would imagine we could use that time for asking questions, guidance, individual help regarding your free software projects and what not. jhbuild also seems to be a common problem for which we can work through.

bloom filters

Everyone seems to be talking about bloom filters the last couple of years. So last night I implemented one to see what all the rage was about. I'm sure you can find better implementations, but you can find it at https://github.com/chergert/bloom-glib/blob/master/bloom-filter.h.

Also, if you want the opposite of a bloom filter, (direct-mapped cache), you can find that at https://github.com/catch/postal/blob/master/postal/postal-dm-cache.c.

It's likely you are smarter than I and can improve on them, so pull requests accepted.

PSA: Testing GObject Lifecycles

It's a good idea to test that the lifecycle of your object is what you expect when writing your unit tests with GTest (and I hope you do). Here is a quick way to do that.

  1. static void
  2. my_object_test (void)
  3. {
  4. MyObject *obj = my_object_new();
  5. /* play with obj */
  6. g_object_add_weak_pointer(G_OBJECT(obj), (gpointer *)&obj);
  7. g_object_unref(obj);
  8. g_assert(!obj);
  9. }

PSA: no more subdriectories in includes

If you are not writing GLib or Gtk, you probably don't need subdirectories in your header paths. I, too, do this often, and I think I'm ready to stop.

#include <gtk/gtk.h>

For example, in the above example, we include gtk/gtk.h. This was necessary (at least at one time) because we would have gtk/*, gdk/*, and possibly other headers all within this top-level directory (/usr/include/gtk-3.0).

For the rest of us, having such a separation of headers is likely not necessary. And if it is, you can just have a toplevel header that can include files from those subdirectories. So for example, my quite annoying choice to do

#include <mongo-glib/mongo-glib.h>

should just be

#include <mongo-glib.h>

I'll likely fix these in the near future unless someone has a compelling reason that I've overlooked.

Update: Mongo-GLib

Mongo-GLib was an attempt I did to write a MongoDB driver for GObject. I actually use it quite a bit at work now. Especially the MongoBson and MongoBsonStream structures for reading MongoDB backups off disk. It turns out to be quite useful for analyzing your backups as they stream to disk.

However, last night I added a new hack to the code-base as I work on cleaning up the API and making it more usuable. You can now implement your own MongoDB server. This could be useful if you want to, or need to, store data in a system other than MongoDB but want to export the data in MongoDB format. It allows you to piggyback on top of the client-side failover of the MongoDB drivers as well. It's almost always better to use a tested failover design than make your own. Also, I hope to use this to implement more comprehensive unit tests, checking both sides of the wire protocol.

My goal for using this is to be able to store certain pieces of data in Redis (as it had data-structures optimized for my use case) but would like to provide failover and documents using MongoDB.

It is just starting to work, so there is plenty more to do, but I thought it was worthy of a blog update.

  1. from gi.repository import Mongo
  2. from gi.repository import GLib
  3.  
  4. def handleRequest(server, client, message):
  5. bson = Mongo.Bson()
  6. bson.append_string("hello", "world")
  7. message.set_reply_bson(Mongo.ReplyFlags.NONE, bson)
  8. return True
  9.  
  10. Server = Mongo.Server()
  11. Server.add_inet_port(27017, None)
  12. Server.connect('request-query', handleRequest)
  13.  
  14. GLib.MainLoop().run()

It works from JavaScript as well.

https://github.com/chergert/mongo-glib