Catalina

I've written another library, Catalina. It started as an example for using the threading library Iris and turned into what I think is a useful library. Catalina is an object data-store for GLib and GObject. It provides access through a natural key/value pair interface.

Transparent serialization to and from storage is supported for types that can be stored in GValues. A compact binary format is provided with the library. It supports basic types such as integers, doubles, floats, and strings, as well as GObjects, in an endian-safe manner. However, someone should double-check it and call my bluff (that is, verify its correctness). A JSON serializer would be a quick hack if someone were interested.
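To illustrate what "endian-safe" means here, a minimal sketch of a tagged binary encoding: each value is written as a one-byte type tag followed by a payload in a fixed (big-endian) byte order, so a buffer written on a little-endian x86 box decodes identically on a big-endian machine. The tag values and layout are hypothetical, not Catalina's actual format, and GObject support is left out.

```python
import struct

# Hypothetical one-byte type tags; Catalina's real format is not shown here.
TAG_INT, TAG_DOUBLE, TAG_STRING = 1, 2, 3

def serialize(value) -> bytes:
    """Encode one value as tag + payload, always in big-endian ('>') order."""
    if isinstance(value, bool):
        raise TypeError("booleans are not handled in this sketch")
    if isinstance(value, int):
        return struct.pack(">Bq", TAG_INT, value)        # 64-bit signed int
    if isinstance(value, float):
        return struct.pack(">Bd", TAG_DOUBLE, value)     # IEEE-754 double
    if isinstance(value, str):
        data = value.encode("utf-8")
        # Length prefix so the decoder knows where the string ends.
        return struct.pack(">BI", TAG_STRING, len(data)) + data
    raise TypeError(f"unsupported type: {type(value).__name__}")

def deserialize(buf: bytes):
    """Decode a buffer from serialize(); the byte order is explicit, never native."""
    tag = buf[0]
    if tag == TAG_INT:
        return struct.unpack_from(">q", buf, 1)[0]
    if tag == TAG_DOUBLE:
        return struct.unpack_from(">d", buf, 1)[0]
    if tag == TAG_STRING:
        (length,) = struct.unpack_from(">I", buf, 1)
        return buf[5:5 + length].decode("utf-8")
    raise ValueError(f"unknown tag: {tag}")
```

Using an explicit byte order instead of the native one ('=' or '@' in struct terms) is the whole trick: the bytes on disk no longer depend on which machine wrote them.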

In addition to serialization, Catalina supports buffer transformations to and from storage. Included is CatalinaZlibTransform, which applies compression using zlib. It skips compression for buffers smaller than its watermark property. This helps with data sets that occasionally contain small buffers, which compression would actually enlarge.
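A minimal sketch of the watermark idea using Python's zlib module: buffers shorter than the watermark are stored as-is, and a one-byte flag records which path was taken so the reverse transform knows whether to inflate. The flag byte and the default watermark of 64 bytes are assumptions for illustration, not CatalinaZlibTransform's actual behavior.

```python
import zlib

RAW, DEFLATED = b"\x00", b"\x01"   # hypothetical header flag values

def transform_write(buf: bytes, watermark: int = 64) -> bytes:
    """Compress on the way to storage, unless the buffer is below the watermark."""
    if len(buf) < watermark:
        return RAW + buf               # too small: compression would likely grow it
    return DEFLATED + zlib.compress(buf)

def transform_read(stored: bytes) -> bytes:
    """Undo transform_write() using the leading flag byte."""
    flag, payload = stored[:1], stored[1:]
    if flag == DEFLATED:
        return zlib.decompress(payload)
    if flag == RAW:
        return payload
    raise ValueError("unknown transform flag")
```

Tiny inputs really do grow under zlib, because the stream header and checksum alone cost several bytes, which is why a threshold pays off.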

Catalina is an asynchronous data-store by design, so the optimal way of accessing it is asynchronous as well.

Everything is built upon Trivial DB (TDB) from the Samba project. It was chosen over Berkeley DB because of its license: like Catalina, it is LGPL, so unlike BDB it does not impose extra restrictions on linking applications.

However, the one downside to using TDB is its lack of concurrent transactions. This means that if multiple threads are doing work and updating storage, their transactions would interleave. Since we are using Iris, we can use message passing to manage concurrent transactions (this is done by queuing messages until the commit phase).
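A rough sketch of "queue until commit", with plain Python threads standing in for Iris message passing and a dict standing in for TDB: each transaction only records its writes locally, and at commit time the whole batch is sent as one message to a single writer thread. Because the commit queue has exactly one consumer, batches from different threads can never interleave. All class and method names here are hypothetical.

```python
import queue
import threading

class Store:
    """Single-writer store: commits are serialized through one queue."""

    def __init__(self):
        self.data = {}                      # stand-in for the TDB file
        self._commits = queue.Queue()
        self._writer = threading.Thread(target=self._run, daemon=True)
        self._writer.start()

    def _run(self):
        while True:
            batch, done = self._commits.get()
            for key, value in batch:        # no other commit can run in between
                self.data[key] = value
            done.set()

    def begin(self):
        return Transaction(self)

class Transaction:
    def __init__(self, store):
        self._store = store
        self._pending = []                  # writes queued until commit

    def put(self, key, value):
        self._pending.append((key, value))

    def commit(self):
        done = threading.Event()
        self._store._commits.put((self._pending, done))
        done.wait()                         # block until the writer applied it
```

Iris would deliver the batch as a message to a receiver rather than blocking a thread on an Event, but the serialization point is the same.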

Here is a short example using Vala to asynchronously open the store, then serialize and store a bunch of "Person" GObjects, compressing each buffer with zlib along the way. Don't be scared by the mutex/cond; they are there to avoid the need for a main loop.

I intend to add indexes soon; however, that is going to take a bit of planning.

So there you have it, my newest hack.

git clone git://git.dronelabs.com/catalina

Learning from others' mistakes and successes

I seriously hate writing overly long blog posts, but this turned into one. You are forewarned.


What can Linux and the Free Desktop learn from recent marketing campaigns by Apple and Microsoft? Let's quickly take a look at a few of the campaigns over recent years from Apple.

  • "There's an app for that"
  • Seamless hardware support with built-in drivers
  • Built-in applications for digital media (iLife)
  • Does not crash (quite debatable it seems)
  • No malware or viruses

I was surprised how well they were able to comfort users about switching to OS X. The same qualms exist for Linux and in very similar ways.

Rather than worrying about migrating existing applications to OS X (the iPhone, really, but it still applies), Apple comforted users with the knowledge that anything they want to do can be achieved. With Debian, for example, there are tens of thousands of applications. Do we have an app for that? Probably.

The first commercials for OS X talked about how hardware just worked when you plugged it in. No extra driver installation or hunting for installation CD-ROMs was needed. Of course, now that more hardware vendors support the platform, that is no longer the case. Linux has an advantage here due to its frequent release cycles: the steady release of new software and drivers gives it a leg up in supporting current hardware sooner. Granted, someone still needs to write those new drivers. But if Greg Kroah-Hartman (GKH) is right, Linux also has more drivers than any other operating system ever written.

Apple talks about their iLife applications a lot. They are good and all, but we have acceptable alternatives. Providing a fully Office-compatible product is quite important, yet you don't see either company bringing that up. Granted, I would love to see an application as sleek as Apple's Keynote or Pages.

They also made hardware that developers wanted to play with, such as the AirTunes device. Has anyone made an AirTunes-like device (an AirPort Express) with only F/OSS software? I'd think that PulseAudio could do most of what is needed.

Each of the framework libraries performs a single task well, yet they all still integrate with each other. For example, an application can control window animations outside its own window. Say I'm writing a book reader, and when the user turns a page I want the page to tear off the application window and fly across the screen. This just isn't possible in a practical way today. Now that X has compositor support, shouldn't that control be available to applications? I would love to give Marina a native newspaper interface that does exactly that. This is just one example; many facets of the system layer need fresh innovation.

There are still tools to be written that would make our daily lives easier. Streamlining development will only shorten our time-to-market.

How is Microsoft reacting to Apple's marketing campaigns? They had a few failed attempts using celebrities such as Jerry Seinfeld. More recently, there are the "Laptop Hunters" ads. These are quite funny, as you'll notice the shoppers end up with laptops that don't match what they claimed to want at all. Most importantly, though, the ads attack Apple on price and trendiness. I guess they tout gaming on PCs too. Gaming, however, is a strange problem, since PC gamers make up quite a small share of all PC users. That share is also shrinking as the Xbox, Wii, and PS3 continue to expand their reach. Regardless, both of them are beaten on price.

Additionally, I thought the slogan "Life without walls" was funny since without walls you can't have windows.

Many pundits, myself included, have talked about how netbooks can totally change the game. The iPhone was similar in the phone market. Do you think it would have been as successful without the developer platform and thousands of applications?

So finally, how can we replicate the positive results Apple had? What is missing from our platform today (can linuxhator kick our asses into shape)? What are our weaknesses, and how can we turn them into strengths? What story do we have to tell developers? What do we really enjoy about our platform?

Ethos 0.2.0

I just shipped the bits and release email for Ethos 0.2.0. Ethos is a shared library I put together for adding plugins to your application.

Currently, Ethos is focused on plugins for applications written in C or Vala since they provide the common denominator to enable the largest variety of scripting languages. However, that's not to say you couldn't provide a thunk layer for something else.

Plugins written in C, Vala, Python, and JavaScript are supported. Mono support is mostly finished and will be added in a follow-up release. Each plugin language is supported through a separate shared library, so your process will not be polluted with languages that are not being used.
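One way to get "only load the languages you use" is a registry of loader factories that are invoked on first demand. A hypothetical sketch: in Ethos each factory would correspond to loading a per-language shared library, but here they are plain callables so the idea runs standalone. Names such as register_loader and load_plugin are invented for illustration and are not Ethos API.

```python
from typing import Callable, Dict

class PluginManager:
    """Creates a language loader only when a plugin in that language is loaded."""

    def __init__(self):
        self._factories: Dict[str, Callable[[], object]] = {}
        self._loaders: Dict[str, object] = {}

    def register_loader(self, language: str, factory: Callable[[], object]):
        self._factories[language] = factory     # cheap: nothing is loaded yet

    def load_plugin(self, name: str, language: str):
        loader = self._loaders.get(language)
        if loader is None:
            # First plugin in this language: pay the cost of the loader now.
            loader = self._factories[language]()
            self._loaders[language] = loader
        return loader.load(name)

    def loaded_languages(self):
        return sorted(self._loaders)

class EchoLoader:
    """Toy loader that 'loads' a plugin by returning a description string."""

    def __init__(self, language):
        self.language = language

    def load(self, name):
        return f"{self.language}:{name}"
```

Registering every language is free; only a language that actually has an active plugin ever gets its runtime started.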

I'd like to thank the giftwrap hackers for helping me dog-food Ethos and get it ready for an initial release.

# dronelabs.com ppa for releases
deb http://ppa.launchpad.net/audidude/dronelabs/ubuntu jaunty main
deb-src http://ppa.launchpad.net/audidude/dronelabs/ubuntu jaunty main

Short on talk, Long on screenshots

I branched Thomas Wood's GObject generator code the other day and started adding some desired features.

  • Select a license including LGPL-2, GPL-2, MIT-X11, Apache 2.0, or no license
  • Generate gtk-doc in-code documentation
  • Generate and install GObject properties, including proper switch cases for basic GLib types
  • Generate getter and setter methods for GObject properties
  • Generate methods, including guards in the stubs (such as g_return_if_fail)
  • Generate and install GObject signals, including default handlers in the class vtable
  • Write code using a Dialect class, which can be subclassed to add a different coding style or output language
  • Installable using Python's distutils
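To make the Dialect idea concrete, a hypothetical sketch (not gobject-gen's actual classes) of a base code writer that emits the get_property switch for basic GLib types, plus a subclass that changes the house style by overriding a single attribute. The g_value_set_* functions and G_OBJECT_WARN_INVALID_PROPERTY_ID are standard GObject API; the Dialect and TwoSpaceDialect classes, the PROP_ naming, and the generated `priv->field` access are assumptions.

```python
# GValue setters for a few basic GLib types (standard GObject API names).
SETTERS = {
    "gint": "g_value_set_int",
    "gboolean": "g_value_set_boolean",
    "gdouble": "g_value_set_double",
    "gchar*": "g_value_set_string",
}

class Dialect:
    """Hypothetical base code writer; subclasses override style hooks."""

    indent = "    "

    def const_name(self, prop_name: str) -> str:
        # "page-count" -> "PROP_PAGE_COUNT", the usual property-enum spelling.
        return "PROP_" + prop_name.upper().replace("-", "_")

    def case(self, prop_name: str, gtype: str, field: str) -> str:
        """Emit one arm of a get_property switch statement."""
        return "\n".join([
            f"case {self.const_name(prop_name)}:",
            f"{self.indent}{SETTERS[gtype]} (value, priv->{field});",
            f"{self.indent}break;",
        ])

    def get_property_switch(self, props) -> str:
        """Emit the full switch from (name, gtype, field) triples."""
        arms = [self.case(*p) for p in props]
        arms.append("default:\n" + self.indent +
                    "G_OBJECT_WARN_INVALID_PROPERTY_ID (object, prop_id, pspec);")
        return "switch (prop_id) {\n" + "\n".join(arms) + "\n}"

class TwoSpaceDialect(Dialect):
    """Example subclass: a house style that only changes indentation."""
    indent = "  "
```

Keeping every formatting decision behind overridable hooks is what lets a subclass change style (or, with more hooks, the output language) without touching the generation logic.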

You can see some sample output in foo-person.c and foo-person.h.

git clone git://git.dronelabs.com/gobject-gen
cd gobject-gen
git checkout -b codewriter origin/codewriter



Planet GNOME Introduction

Like Paul, I would also like to thank Jeff, Lucas, and Vincent for adding my blog to Planet GNOME.

I, too, have been a long-time GNOME user. Until a month or so ago, I was working at MySpace.com on various open-source projects, including Mono. Before that, I was a Medsphere employee working on their Gtk# medical application. I recently quit my job at MySpace to spend as much of the summer as I can writing free software from my apartment on the beach in Santa Monica, California.

While I'm not going to describe the projects I'm prototyping over the summer in much detail (I prefer to show code, not talk), readers may be interested in the following.

Iris

Iris is a toolkit to help programmers write applications that take advantage of concurrency on multi-core platforms. Users of python-twisted will feel at home with IrisTask, which provides much of the same callback/errback concepts. Users of Apple's NSOperation will find it comfortable, since it offers object inheritance as an option. And finally, users of CCR will enjoy the powerful message passing upon which it is all built.

It includes a sprinkling of lock-free data-structures and a work-stealing scheduler that is roughly 8x faster than GThreadPool in my test-cases performed on quad-core and dual quad-core machines. Bindings are in the works for JavaScript, Python, Vala, and Mono. One of my goals is to have fully asynchronous applications that cross the language vm barrier.
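For readers who haven't used Twisted, here is a minimal sketch of the callback/errback chaining idea mentioned above, in plain Python. It is not IrisTask's API; the Chain class and its method names are hypothetical, and it passes the raw exception to errbacks where Twisted would pass a Failure object. Each step receives the previous result; when a callback raises, control skips to the next errback, and an errback that returns normally puts the chain back on the success track.

```python
class Chain:
    """Twisted-style callback/errback pipeline (illustrative only)."""

    def __init__(self):
        self._steps = []                 # list of (callback, errback) pairs

    def add_callback(self, fn):
        self._steps.append((fn, None))
        return self

    def add_errback(self, fn):
        self._steps.append((None, fn))
        return self

    def run(self, value):
        failure = None
        for callback, errback in self._steps:
            handler = errback if failure is not None else callback
            if handler is None:
                continue                 # this step belongs to the other track
            try:
                value = handler(failure if failure is not None else value)
                failure = None           # handled: back on the success track
            except Exception as exc:
                failure = exc
        if failure is not None:
            raise failure                # nobody handled the error
        return value
```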

Ethos

Ethos is an LGPL-2 library for adding plug-ins to your application. It is modeled on the plug-in infrastructure of gedit, which has had incredible success at converting users into application extenders.

Ethos has two main purposes. First, it should simplify the effort required to support plug-ins in an application. More importantly, it should provide a consistent way to add plug-ins to any application that uses it. If the scripters of one application can apply what they know to much more of the desktop, I think we will see applications gain new features faster. Ethos includes bindings for JavaScript, Python, and Vala. Mono bindings are almost complete.

There is an additional library, ethos-ui, which provides a re-usable GtkWidget for managing plug-ins during runtime.

Marina

Marina is an RSS and syndication reader I started writing a while back, when the Liferea authors mentioned wanting to do a rewrite. Unfortunately, I got so busy with work and other projects that I had to put it on hold for a little while. I just started updating it to use Iris and Ethos so I have a practical test application.

Storage for Marina is currently done in Berkeley DB, for better or worse. The upside is that we can use DB_RECNO keys in the B-tree, which turns a row offset directly into a record lookup. It uses my BdbListStore, which includes an LRU cache for fast access when attached to a GtkTreeView. This turns out to be useful, as it lets the consumer trade memory consumption against speed.
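A minimal sketch of the LRU idea behind BdbListStore, using collections.OrderedDict. The fetch_row callable is a hypothetical stand-in for the record-number lookup in Berkeley DB, and RowCache is not the actual BdbListStore code. Rows are looked up by offset, recently used rows stay in memory, and capacity is the knob that trades memory for speed.

```python
from collections import OrderedDict
from typing import Any, Callable

class RowCache:
    """LRU cache keyed by row offset, in front of a slower record lookup."""

    def __init__(self, fetch_row: Callable[[int], Any], capacity: int = 128):
        self._fetch_row = fetch_row     # hypothetical stand-in for a DB lookup
        self._capacity = capacity       # the memory-vs-speed dial
        self._rows = OrderedDict()
        self.misses = 0

    def get(self, offset: int) -> Any:
        if offset in self._rows:
            self._rows.move_to_end(offset)        # mark as most recently used
            return self._rows[offset]
        self.misses += 1
        row = self._fetch_row(offset)
        self._rows[offset] = row
        if len(self._rows) > self._capacity:
            self._rows.popitem(last=False)        # evict least recently used
        return row
```

A tree view typically asks for the visible rows over and over while scrolling, so even a small capacity can absorb most lookups.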

So those are my primary projects. Feel free to look around my blog; there are plenty of posts on writing custom GTK+ widgets and whatnot. I'm on GitHub, and you can follow me on Twitter.

Ethos - Simplifying Plugins for GNOME Applications

Plenty of desktop applications have extension frameworks so that users can extend functionality for their own purposes. However, many either invent their own plug-in system or use a framework tied to the application's source language. For example, there is pkg_resources for Python and Mono.Addins for Mono. Both are great tools when confined to their respective languages. Where they do not fit well is in applications that want to enable several language communities at once, such as JavaScript, Python, Ruby, .NET, C, and Vala.

I really liked the plug-in system from gedit, but it is licensed under the GPL. It would also require a lot of changes to be reusable outside of gedit; I should know, since that's what I did in my syndication reader prototype, Marina. The framework is simple and has been incredibly successful at converting users into application extenders. If GNOME desktop applications can share a single framework for extending applications, the barrier to entry becomes much lower for "repeat extenders". We instantly repurpose those application scripters for new applications.

Therefore, I put together a library named ethos, which contains a system similar to gedit's but under the LGPL-2.1 license. It also supports multiple plug-in languages, including Python, JavaScript, Vala, and C. Mono support is partially there as well. I need to learn the ruby-gnome2 framework so I can add Ruby support.

In addition, there is the ethos-ui library, which provides a reusable Gtk widget for configuring plug-ins at runtime. The lovely screenshot above demonstrates this widget. In it, you will see plugins written in C, Python, JavaScript, and C# all living within a single process, with their virtual machines cooperating in harmony.

git clone git://git.dronelabs.com/ethos

iris Python Bindings

I started polishing up the generated Python bindings for iris today. They aren't perfect, but they're starting to come together. Compare this to the Vala example to see how similar it is.

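from iris import *

# A work-stealing scheduler plus two ports to receive messages on.
s, e, c = WSScheduler(), Port(), Port()

# doStateChange and doHttpRequest are the application's own handlers.
# Presumably the first receiver is the exclusive one and the second the
# concurrent one, following the order used in the Vala example.
Arbiter.coordinate(
    Arbiter.receive(e, lambda msg: doStateChange(msg), scheduler=s),
    Arbiter.receive(c, lambda msg: doHttpRequest(msg), scheduler=s))
# ...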

Steve Bjorg of MindTouch also implemented a Work-Stealing scheduler for .NET in the MindTouch Dream library. I'm pretty sure it runs on Mono too. You can take a look at it here.

I’m out! *expletive deleted*

During my long vacation from work, I thought about the current state of MySpace and whether or not I wanted to continue my tenure there.

I had been hacking full time on an open source project there for the last year (a project I've been a part of, in one form or another, for years now). After we succeeded in making this open source project a viable opportunity for the company, we ended up not moving forward with it. It's too bad; I was doing some of the best work of my career. I'm sorry if this sounds vague, but if I get the go-ahead, I'll share more later.

So, instead of going back to work today, I decided to quit. I'm so happy I made this decision that I have the biggest grin on my face right now. I don't know where I'm going next, but I'd really like to enjoy the summer here in Los Angeles without any responsibilities. Besides, living at the beach means every day is a vacation.

Fortunately, I had plenty of time to experiment with multi-core optimization and decent freedom to use the languages of my choice. Hopefully the Python, C, and C# I left behind will be maintainable by others :-)

In the meantime, one of the projects I've started is a cache system. Expect its initial release to have:

  • Transparent cache: cache misses fetch the data from your storage tier
  • Python, JavaScript, Ruby, C, Vala, and C# support (on both the cache servers and the load balancer)
  • Cache object versioning and intelligent merging
  • Related cache data locality: related data gets cached together (I'll write in the future about why this is so important for smooth scaling)
  • A manhole, like Twisted's, for administering a live cache
  • Dynamo-style data mirroring

So to my fellow MySpacers: For those about to rock, I salute you!

Iris Update

Today I finished off a couple more bug fixes for Iris. It now has the Coordination Arbiter which is similar to CCR's Arbiter.Interleave(). I'm quite happy I ventured down the path of doing GTask based on message passing. It's already been much easier to find bugs and triage them.

So like the previous post, you can use the coordination arbiter with the work stealing scheduler for extra hotness.

So what does the Coordination Arbiter do? It lets you manage the delivery of messages (each of which culminates in an executable work-item) from multiple sources based on their concurrency requirements. The arbiter can control message delivery from three types of message receivers.

  1. Exclusive -- Only one of these can be executed at a time
  2. Concurrent -- Up to your desired concurrency level can be executed at a time
  3. Teardown -- Think of this as a dispose, everything going forward stops

No messages from different groups can be executed simultaneously. This can help you write your code without having to manage lots of locks (granted the arbiter has the locks, but at least you can tune in a single place).
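To make the exclusive/concurrent/teardown rules concrete, here is a minimal sketch of the same gating logic using Python's threading.Condition. It is an illustration, not Iris code: callers block until admitted instead of having their messages queued for a scheduler, the class and method names are hypothetical, and there is no fairness, so a steady stream of concurrent work could starve exclusive work.

```python
import threading

class Coordinator:
    """Exclusive / concurrent / teardown gating, in the spirit of CCR's Interleave."""

    def __init__(self, max_concurrent: int = 4):
        self._cond = threading.Condition()
        self._max = max_concurrent
        self._active = 0                 # concurrent handlers currently running
        self._exclusive = False          # is an exclusive handler running?
        self._torn_down = False

    def run_exclusive(self, handler, msg) -> bool:
        with self._cond:
            # Wait until nothing else at all is running.
            self._cond.wait_for(lambda: self._torn_down or
                                (not self._exclusive and self._active == 0))
            if self._torn_down:
                return False
            self._exclusive = True
        try:
            handler(msg)
        finally:
            with self._cond:
                self._exclusive = False
                self._cond.notify_all()
        return True

    def run_concurrent(self, handler, msg) -> bool:
        with self._cond:
            # Share with other concurrent handlers, up to the limit.
            self._cond.wait_for(lambda: self._torn_down or
                                (not self._exclusive and self._active < self._max))
            if self._torn_down:
                return False
            self._active += 1
        try:
            handler(msg)
        finally:
            with self._cond:
                self._active -= 1
                self._cond.notify_all()
        return True

    def teardown(self, handler, msg) -> None:
        with self._cond:
            self._torn_down = True       # refuse all new work from now on
            self._cond.notify_all()
            self._cond.wait_for(lambda: not self._exclusive and self._active == 0)
        handler(msg)                     # runs after in-flight work has drained
```

All locking lives in one class, which mirrors the point above: handlers themselves stay lock-free, and tuning happens in a single place.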

Let's take a look at a quick Vala snippet.

Arbiter.coordinate (
    // the "exclusive" receiver
    Arbiter.receive (
        null, // new WSScheduler (),
        (msg) => {
            if (msg.what == CONFIG_PORT)
                this.port = msg.get_int ("port");
        }),
    // the "concurrent" receiver
    Arbiter.receive (
        null,
        (msg) => {
            // process some http request or whatever
            // kids are doing these days
        }),
    // the "teardown" receiver
    null);

I always liked the gen_server behaviour in Erlang OTP. So my plan is to add IrisService which helps you write re-usable, concurrent, application services.

For those interested, the task implementation uses the coordination arbiter with a single exclusive receiver. It makes it much easier than managing locks within the task code.

Also, you can follow me on Twitter and fork the code on github.

Vacation! and Work-Stealing Schedulers

For the first time in years, I'm taking a vacation for something other than a holiday. It already feels great. I also decided to shave my head in preparation for summer.

I've been busy working on a new prototype that will eventually become the future of libgtask. Currently, its prototype name is iris. I haven't decided whether I'll keep using the name gtask, since many of the new data structures would cause obvious collisions within GLib.

At the core of iris are message passing and basic concurrency constructs. There are many reusable lock-free data structures as well.

These types of data structures are always interesting when you don't have a garbage collector. Typically, you end up using free-lists to prevent dereferencing freed memory in the case of a poorly timed pre-emption.

However, things get really fun when you run into the ABA problem. You end up doing fun things like using the lower 2 bits of properly aligned pointers to store version information. You can see my hackish attempt at that with gstamppointer.
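A small sketch of the "version in the low bits" trick, using integers to stand in for addresses. With 4-byte alignment the two low bits of a valid pointer are always zero, so they can carry a 2-bit counter that is bumped on every update. A compare-and-swap then compares pointer and stamp together, so a pointer that went A, then B, then back to A is still seen as changed, unless the counter has wrapped around (it only has four values). The function names are hypothetical and are not gstamppointer's API.

```python
ALIGN_MASK = 0b11          # 4-byte alignment leaves the low 2 bits free

def pack(ptr: int, stamp: int) -> int:
    """Combine an aligned address and a 2-bit stamp into one word."""
    assert ptr & ALIGN_MASK == 0, "pointer must be 4-byte aligned"
    return ptr | (stamp & ALIGN_MASK)

def unpack(word: int) -> tuple:
    """Split a word back into (address, stamp)."""
    return word & ~ALIGN_MASK, word & ALIGN_MASK

def next_word(word: int, new_ptr: int) -> int:
    """Value to CAS in when swapping to new_ptr: bump the stamp (mod 4)."""
    _, stamp = unpack(word)
    return pack(new_ptr, stamp + 1)
```

For example, starting from address 0x1000 with stamp 0, swapping to 0x2000 and back to 0x1000 yields the word for (0x1000, 2), which no longer equals the original word, so a stale CAS expecting the old value fails.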

I just started working on this a few weeks ago in my time away from work, and it's been incredibly fun. The work-stealing scheduler landed today, and it's quite fast in my very few test cases. It performs especially well on recursive workloads, which occur when work-items yield new work-items. This use-case tries to hit the fast path of the work-stealing queue, which pushes the new work item onto the local thread's queue. The placement is also important so that the active thread can try to process the item while it is still hot in the CPU's cache.
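The fast path comes from the shape of a work-stealing deque: the owning thread pushes and pops at one end (LIFO, so the newest, likely cache-hot item runs next), while idle threads steal from the other end (FIFO, taking the oldest work). The sketch below shows only that access pattern and guards the deque with a single lock for simplicity; production work-stealing schedulers typically use a lock-free deque (Chase-Lev is the well-known design), and none of this is IrisWSScheduler's actual code.

```python
import collections
import threading

class WorkStealingDeque:
    """Owner works LIFO at the bottom; thieves take FIFO from the top."""

    def __init__(self):
        self._items = collections.deque()
        self._lock = threading.Lock()   # simplification; real ones avoid this

    def push(self, item):               # owner only
        with self._lock:
            self._items.append(item)

    def pop(self):                      # owner only: newest item, likely cache-hot
        with self._lock:
            return self._items.pop() if self._items else None

    def steal(self):                    # other threads: oldest item
        with self._lock:
            return self._items.popleft() if self._items else None

def next_item(me, others):
    """One scheduler-loop step: local queue first, then try to steal."""
    item = me.pop()
    if item is not None:
        return item
    for victim in others:
        item = victim.steal()
        if item is not None:
            return item
    return None
```

Stealing from the opposite end also keeps thieves and the owner apart most of the time, which is what makes the lock-free versions practical.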

There are currently three scheduler implementations. The default, IrisScheduler, uses a GAsyncQueue at its heart for dispatching. I know what you're thinking, and you're right: it has a lot of contention between threads. In addition, there is a lock-free scheduler, IrisLFScheduler, and the aforementioned IrisWSScheduler. The lock-free scheduler is not ideal, but there is a specific workload it's good for (lots of work-items created from outside of a scheduler thread, with no regard for power consumption).

So let's quickly look at a simple use-case for recursive workloads and compare schedulers. The test consists of 1,000 messages being created, dispatched, scheduled, and executed. However, each of these 1,000 messages in turn generates 1,000 more messages to be created, dispatched, scheduled, and executed. So just over 1 million in all.

Here is the result of the test with the default scheduler. Not fast enough, if you ask me. My goal is millions of messages per second per core. (Well, to be more specific, it's millions of callbacks per second per core, but it all starts from a message, so yeah.)

chergert@chergert-desktop:~/Projects/iris/examples$ time ./recursive
** (process:6911): DEBUG: Done pushing items
** (process:6911): DEBUG: Waiting for items to complete
** (process:6911): DEBUG: Signal received, all done

real 0m3.062s
user 0m2.664s
sys 0m3.176s

So now we will change just one line of our code to use a different scheduler, the work-stealing scheduler.

//scheduler = iris_scheduler_new ();
scheduler = iris_wsscheduler_new ();

And again, run our use-case.

chergert@chergert-desktop:~/Projects/iris/examples$ time ./recursive
** (process:7491): DEBUG: Done pushing items
** (process:7491): DEBUG: Waiting for items to complete
** (process:7491): DEBUG: Signal received, all done

real 0m0.336s
user 0m1.180s
sys 0m0.048s

Ah, the beauty of a multi-core box that actually uses its cores in a nice, even manner. So we went from 3.062 seconds to 0.336 seconds. A 9.1x speedup, not bad for a single line of code!

For those who are curious, the numbers for this use-case come from a quad-core Intel Q6600 @ 2.40GHz. I've run the test on a low-voltage dual-core with similar results, once you account for there being only two cores instead of four. The 8-core box I tested with had results similar to the quad-core, just a hair slower. That could be due either to a bit more contention for me to solve, or to its slower clock speed.
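The arithmetic behind these numbers, worked out from the wall-clock ("real") times shown above. Counting precisely, 1,000 outer messages plus 1,000 × 1,000 inner ones is 1,001,000 messages. The per-core figure divides by the Q6600's four cores and assumes the work was spread evenly, which is an approximation.

```python
outer, fanout = 1_000, 1_000
messages = outer + outer * fanout          # 1,001,000 messages in total

default_real = 3.062    # seconds, IrisScheduler (GAsyncQueue)
ws_real = 0.336         # seconds, IrisWSScheduler
cores = 4               # quad-core Q6600

speedup = default_real / ws_real
ws_rate = messages / ws_real               # messages per second, whole box
per_core = ws_rate / cores                 # assumes an even spread over cores

print(f"speedup:    {speedup:.1f}x")
print(f"throughput: {ws_rate:,.0f} msgs/s ({per_core:,.0f} per core)")
```

That works out to a 9.1x speedup and roughly 2.98 million messages per second, or about 745,000 per core. The drop in sys time (3.176s to 0.048s) is also consistent with far less time spent in the kernel on contended locks.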

So if you had any doubts, there you go. A bit of proof on how killer lock-contention can be. And if you are wondering what it is I'm going to be writing that requires all this, you'll have to wait a bit :-)