Yosemite – August 2009

I go to Yosemite a few times a year because its possibly the best hiking in California. Drew and Jordan went with me and took some amazing photos. There was a large forest fire at the time which made hiking Half Dome impossible due to smoke inhalation.

Regardless, they got some really good photos during our time there. Photos are here.




My take on Tornado and "modern" web servers

I am happy to see Facebook release Tornado, their Python based web-server that was developed to run the FriendFeed service. They have done their best to optimize the stack within the Python process. However, I'm inclined to believe that it is more of the same problem from which many HTTP toolkits suffer.

Since Tornado is written in Python it is typically restricted[1] by the Global Interpreter Lock. The GIL, for short, is a technique often used by dynamic languages such as Python and Ruby 1.9 to simplify development of the languages runtime as well as providing certain features to be implemented safely in the language. For example, it is the GIL that makes multi-swap features atomic in Python (a, b = b, a). The question is "How do we scale Tornado"? The answer is the same as most GIL based systems. You scale using processes, typically, equal to the number of concurrent threads your system can execute.

Update: The previous paragraph reads differently than I intended. Tornado doesn't really suffer from the GIL as its single threaded. What I meant to get across that is if you needed to scale to all the cores of your system you need either threads or processes. Future paragraphs explain this in better detail.

Scaling in this model, however, presents a few challenges. Each process ends up requiring connections to your database and cache tiers. So now instead of N connections you will ultimately grow to N*N_PROCESSES connections. Scaling the number of cores will inversely effect your data tier performance which is typically your bottleneck to begin with.

I think the need to design the system in this direction is ultimately due to the real web-servers letting us down. Tornado handling data in an event-based fashion should not be news in general. Most web-servers have been doing this for a while. Cherokee[2] does it quite well, Apache's mpm-event does it, and Python Twisted's reactor has done it for years. However, all of these systems let your infrastructure developers down in a critical component, the module API.

Once you have a socket accepted you need to do something with it. Web-servers tend to differentiate themselves in this area but I don't think they have gotten it right. Apache provides different models such as MPM which is a combination of worker processes and threads. This was done primarily to work around inefficiencies in massive numbers of threads per process. "Why would you need a massive number of threads" you might ask? It's due to the design-flaw that the module API's are designed around executing synchronously. After the bottom-half (the socket handler) accepts a new request, it is passed to a handler. The handler calls the configured modules to process the request and blocks while this happens. Ultimately, this means you are wasting a thread regardless of your application having the resources to immediately process the request. When your system cannot keep up with requests you create a massive number of threads/processes up to your configured bounds which slows down the threads that actually are able to process a request. Obviously this exacerbates the problem.

What needs to happen is a module API that supports a fully asynchronous and event-based processing of requests from the socket handling bottom-half through the application-based upper-half. A primary example of where this can be useful is exemplified by "partition-based" data such as most user-content systems. If you accept too many requests on a single server for a data partition you will slow down the average response time per request. This is due to the contention you create on your database and/or cache connections. So while you may want to accept the request, you may not want to start processing it until a number of other requests in that partition range are completed.

Once we have switched to a server that can work in this model we will run into the problem of how to handle all those cores efficiently. Thankfully, Apache styled MPM models wont be necessary anymore. If applications can be processed in an asynchronous model then they will be able to yield the thread on IO and other shared resources that are not yet available. Doing so will mean the optimum number of threads will be equal to the number of cores you have to process.

For this to work optimally with language runtimes dependent on a GIL we need to use a little-known technique to have multiple runtimes within a single process. If you have multiple copies of the runtimes shared library each subsequent call to dlopen() will result in new memory zones for the loaded library. In such cases, you can even have different versions of the same runtime loaded within one process (python2.3, 2.4, 2.5, etc). We can load a runtime per core so that each worker has it's own runtime with the GIL disabled (since its single threaded). As I'm sure you noticed this is an implementation detail.

A few things are preventing this from becoming an immediate reality. There needs to be a standard for languages to be able to integrate into the web-server so that shared resources can be stored outside of each runtime. For example, one of our python workers might yield on getting a database connection from the servers connection pool. Now the worker can pause that request until the connection is available and process other requests. This should feel natural to those who have worked in co-routine based systems.

One of the problems that interests me is getting the request from the bottom-half to workers in the upper-half as efficient as possible. Typically this is done with a thread pool. Most thread pools are implemented with a single Queue and various worker threads that pop their future work item off the queue. What research has shown is that this creates a considerable amount of lock-contention as you scale the number of cores. For each additional worker you create a new contender for the lock to retrieve an item from the queue. Nir Shavit et al pioneered a concept called work-stealing which greatly reduces this problem. It consists of a queue per worker and distributing work items in a round-robin fashion among the workers. To prevent worker-starvation, when a thread runs out of work-items, it will look to its neighbors to try to steal an item off their queue. Some fancy footwork is performed to reduce the contention by stealing from the opposite side of the queue and implementing the fast path as lock-free. Additionally, you can choose neighbors that share your die on NUMA. I was so interested in this and the challenges of implementing lock-free algorithms in C (they are easier to implement with garbage collection), that I implemented it in Iris[3], my concurrency toolkit for glib.

The primary thing we've learned from web-servers over the recent years is that to handle high-load scenarios you need to do everything you can to reduce resource usage when forward-motion cannot be performed. By continuing to build web and application frameworks within the container of the synchronous module API we cannot hope to get the massive improvements in performance that we all desire. We need to consider the high-performance web-server as part of our application which means writing HTTP container standards.

[1] I imagine that someone will make this work on IronPython or Google's LLVM based Python.
[2] http://www.cherokee-project.org
[3] http://git.dronelabs.com/iris

Connecting people and Mentoring

Is there anything out there to help connect newer programmers with experienced programmers in GNOME or Linux in general? A big-brother/big-sister type thing. It's quite rewarding to both those learning and those with experience to help each other when they have common goals and passions. It allows for broader dissemination of information, more testers, more eyes to find bugs and a sense of shared vision.

Maybe a yellowpages of people and their interests?

Faster code completion

The python code completion engine in MonoDevelop is now much faster. It's was a simple fix and one that I think gets overlooked too often in these "dynamic" days.

Thankfully, python makes it easy to index information based on a fully-qualified name. Everything can basically be represented in package format such as "xml.etree.ElementTree.ElementTree.isinstance". Of course, the simplest way to find anything matching a prefix is to just use the SQL LIKE operator. It's fast enough, and gets the job done.

Now, where it gets tricky. I need to be able to limit the result set to only show items that do not have a '.' after the prefix of my LIKE query. SQLite does not have a built in function to help us do this, and if it did, it would be the wrong way.

Since the data in each row will never change (other than a complete row update) we know that we have the ability to safely pre-compute information about the row. Remember how the data is in a package format? Each of those '.' indicate a new depth which we can use for indexing.

So the simple fix is to pre-compute the number of '.' in the fully qualified name and simply add your desired depth to the query. Less data returned in your data reader means less memory allocated by the VM, less memory to free and less memory fragmentation. Good times.

The new field will mean your completion database needs to be upgraded. No fear though, that's done for you automatically.

MonoDevelop Python Tutorial 1

I put together a quick walk-through of how to setup the Python add-in for real Python development. You will need a recent check-out of MonoDevelop trunk.

Python add-in for MonoDevelop (Updated)

I wrote a python add-in for MonoDevelop last year. Unfortunately, I have a hard time sticking to a single project and therefore it suffered a bit. However, we are getting ready for feature freeze for MonoDevelop 2.2 and so I kicked my ars in gear to fix a few things.

Namely, the code completion is much more bearable. Most of the other niceties come from general improvement in MonoDevelop over the last year. You can thank the incredibly talented folks that run the project for that.

For readers that don't know what monodevelop-python is about, let me show you a couple of the features quick.

MonoDevelop knows where you are within the file. It provides a couple drop-down menus to help you jump around in large files.

It knows when you have typed incorrect syntax. The python add-in will continually check your syntax as you type so you always know immediately when you have syntax errors.

Like PyFlakes? So do I, so the add-in will also lint through common mistakes and underline those in yellow.

What good is code completion if its limited to your current file? Seriously, how useless was that before? Well, now, if you configure the path to your site modules monodevelop will parse those files too and build a code completion index to help you along. It's great for exploring API's! Unfortunately, it doesn't yet work well against libraries like pygtk where most of the data is provided dynamically at runtime from a shared library.

There is plenty more to do for this add-in and I hope I can continue to have time to improve it. But life is crazy like that, and you never know. I'm really quite interested in what people use for a debugger in python. I'd like to add one, but if "pdb" is the best that's universally available then I'm not sure how awesome it can get. Go ahead and leave a comment on what you would like to see and what debugger suggestions you have.

Oh, and here is a little screencast displaying what the add-in looks like live http://www.youtube.com/watch?v=vW-lNuoSNv0.

Visualizing gtk+ exposures postmortem

As of yesterday, I'm 25. It was a fun day of hiking, drinks, video games, and district9.

In gdkrecord, I added recording of GdkWindow sizes and exposure areas so that I can troubleshoot some pesky drawing problems with a few custom widgets.

And HTML5 Video below or at http://www.youtube.com/watch?v=XdOLjd52R4E.

Usage:

./gdkrecord-tool exposures myrecordfile.out

Gdk Event Loop Profiling

I put together a gtk-module yesterday to record events from the main loop for post-analysis. I'm sure this has been done before. You can find it at http://git.dronelabs.com/gdkrecord/.

GTK_MODULES=gdkrecord ./myprog

By default the output file is pid-prgname.gdkrecord. You can override this with

GDKRECORD_FILENAME=myoutfile

Inside of the tools directory is gdkrecord-tool. It can dump the information in plain text, xml, json, and csv as well as generate graphs like the one below.

./gdkrecord-tool graph myprog.gdkrecord

Which will generate myprog.gdrecord.png. It only adds information on a few event types right now, but I'll continue adding information extractors as I need them.

First prototype

To start following through with my commitment of prototyping I set for myself, I put together a TreeMap widget this weekend. After seeing Miguel's in Moonlight, I had to write one too. Turns out they are pretty fun. It uses cairo and pango for drawing.

Implemented in pygtk here and in C# here.

And Youtube Video Example or screenshot.

On writing throw-away, prototype code

I don't think I do enough quick prototyping. So to help me change that, I'd like to start putting together a "short list" of things I think should be explored. I'm not talking about a full fledged project with mailing lists and forums like typically happens. Just some code showing how things could be; allowing us to see if we like it. And of course, be willing to throw it all away at the end of the day.

1) A FUSE filesystem driver for installing applications. The end result would be where I can drag an application (a .deb, .rpm, or whatever an ISV wants to ship) onto an "Applications" folder that is nothing more than a FUSE driver to hook into package-kit. For extra credit, allow shoving some html/css/js/svg into a .deb or .rpm so that when it is "opened" it shows neat little backgrounds with drag-n-drop support for installation (I'm thinking nautilus+webkit). The .deb/.rpm could even include a meta file for where the upstream source is so the package manager can maintain application updates.

2) Compiz (or Mutter) plug-in that allows applications to alter their state during run-time. It want it to be as simple as:

win = gtk.Window()
...
win.compositor.explode('slow')
or
win.compositor.shrink('fast')

Also, if we go beyond this and allow some of the GL animations to be pushed onto a thread within the application, it will be possible for apps to provide effects that make sense within the context of the task at hand. For example, I'd love gnome-do to have some sort of cute animation when leaving the screen rather than just gtk_widget_hide().

3) Compiz (or Mutter) plug-in that shows Modal windows similar to "sheets" on OS X. As I've been playing with this the last few days (from my gtk+ patch), it's become pretty obvious to me that it's good from a usability perspective.

4) Performance tuning is always a pain in the ass. So I'd like to have a "perfkit" that can encapsulate all the tuning utilities out there. Whether its strace, ltrace, oprofile, systemtap, or some python script your co-worker wrote. This would also give us somewhere to store application-level counters in a sane way. Imagine being able to record input events from GDK and then play them back on subsequent profile runs while you then profile other aspects of the application. Strangely enough, if you slow down the sample rate, you basically have kick ass remote monitoring sans-snmp.

FWIW, I've already started hacking on this but got side-tracked by a few other things (including up some contract work so I can continue to ... you know ... eat).

5) Rails, sea-side, and other pioneering web-frameworks focused a lot on input validation. This sort of thing is missing from the gtk+ toolkit and would be a lovely addition to 3.0. I started on a quick idea around this using two concepts; validators and modifiers. Validators give us a way to check validity of content, and modifiers will modify a series of widgets upon validation failure or success. Source.

Update:

6) The future of web-squared (or whatever they call it now) is all about data. That's quite obvious I'm sure. However, why do I need to retrieve various bits data from a service just to transform it into some other format I want to display. These Web-Services seem arbitrarily limited. Why can't we push small snippets of code to the service to run within a sandbox to format the data as I want. It's really not any different from map/reduce I guess.

So how about server side java-script? It provides a simple language that could have "lazy-loaded" data within the scope of the script. It could be multiplexed into a co-routine framework for time-slice enforcement. The scripts could be pushed as simple HTTP POST's. And even more-so, it's a language that web-geeks already know.

For example, I could have access to all my personal info via a webservice on my server. And within it, "user" would be a lazy loaded variable. Maybe I want to get access to my email inbox with "user.email.inbox.each(function(m) {});".

I'm curious how much bandwidth this would save too.