Errbit Performance Improvement Results

My battle with Errbit performance is over for the time being. This concludes an effort I began in June to improve throughput on error inserts and error searching as the database grows over time. If you’re interested in reading about the effort leading up to this point, here are the related posts:

The short version of the story is that I tried all kinds of ideas but failed to notice the actual improvements because of an issue with the special-purpose test rig I had built to measure them. Once I found and fixed the test rig issue, it was clear that my efforts had paid off. Unfortunately, it was not clear which changes had the biggest impact. I had hoped this final post would be an evidence-based exploration of where the performance issues lived.

Instead, I only have evidence that the sum total of my effort led to a real and measurable performance improvement, and I can only speculate as to where the largest gains came from. But first, let’s look at the overall impact in the two areas where we have data, starting with error insertion:

My best guess as to why we’re seeing this improvement is twofold. First, the number of mongo queries required to insert an error is down to a minimum of five rather than a minimum of nine. Second, inserting an error no longer requires instantiating a Mongoid document for every line in the backtrace. In fact, the model representing a backtrace line no longer exists at all. There could be other explanations, but I’m satisfied with the results as they are. Although I’d like to know exactly where the improvements came from, I’m not inclined to spend the time to figure it out at this point.
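
To make the second point concrete, here is a minimal sketch of treating a backtrace as plain data instead of one model object per line. It is not Errbit’s actual code: the `normalize_backtrace` helper, the regular expression, and the `file`/`number`/`method` field names are all assumptions made for illustration.

```ruby
# Hypothetical sketch: turn raw backtrace lines into plain hashes so the whole
# backtrace can be stored as a single array, with no model object instantiated
# per line. The field names are assumptions, not Errbit's schema.

BACKTRACE_LINE = /\A(?<file>.+?):(?<number>\d+)(?::in [`'](?<method>.*)')?\z/

def normalize_backtrace(raw_lines)
  raw_lines.map do |line|
    match = BACKTRACE_LINE.match(line)
    if match
      {
        'file'   => match[:file],
        'number' => match[:number].to_i,
        'method' => match[:method] # nil when the line has no ":in ..." part
      }
    else
      # Keep unparseable lines rather than silently dropping them.
      { 'file' => line, 'number' => nil, 'method' => nil }
    end
  end
end

lines = normalize_backtrace([
  "app/models/user.rb:42:in `save'",
  "lib/worker.rb:7"
])
```

An array of hashes like this can be written as part of one document in a single insert, which is the kind of change that avoids per-line object allocation.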

Next, we’ll look at error searching:

This is exactly the kind of result I was looking for. What began as seemingly linear performance degradation now looks a lot more like constant time. It isn’t actually that good, but the performance degradation between zero and 100k records is barely perceptible.

I’m convinced the meat of this improvement came from switching to mongo’s built-in full-text search mechanism. It makes sense that using mongo’s full-text search implementation would be much more performant than doing a multi-index string search.
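
For readers unfamiliar with the difference, the sketch below contrasts the two query shapes as plain filter hashes. The post does not show the original query, so the "before" version assumes it was roughly a case-insensitive regex match across several indexed fields. The helper names and the field list are hypothetical. `$or`, `$text`, `$search`, and the `'text'` index type are standard MongoDB features.

```ruby
# Hypothetical helpers that build MongoDB filter documents as plain hashes.
# The field names are assumptions for illustration, not Errbit's schema.

SEARCH_FIELDS = %w[message error_class where].freeze

# Assumed "before" shape: one case-insensitive regex per field, joined by $or.
def regex_filter(query)
  pattern = Regexp.new(Regexp.escape(query), Regexp::IGNORECASE)
  { '$or' => SEARCH_FIELDS.map { |field| { field => pattern } } }
end

# "After" shape: a single $text clause. It only works if the collection has a
# text index covering the searched fields.
def text_filter(query)
  { '$text' => { '$search' => query } }
end

# Key specification for one text index spanning the same fields.
def text_index_spec
  SEARCH_FIELDS.to_h { |field| [field, 'text'] }
end
```

With the official Ruby driver, hashes like these could be passed to `collection.indexes.create_one(text_index_spec)` and `collection.find(text_filter('timeout'))`. The regex version has to test the pattern against field values, while the text version is answered from a single purpose-built index.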

Hopefully our users will notice and appreciate these performance gains. Keep in mind that the results shown above were measured with a single thread and a single process. Depending on your hardware, you should get better throughput in a real deployment by running multiple processes and, hopefully, multiple threads in the future.