A recent client of mine needed an app to help him build bite-sized CSV files
from a large PostgreSQL table. The problem was simple enough: it takes little
time to write a simple Rails action that queries a table, generates CSV from
the objects in memory, and flushes it out to the client. Writing an app to do
this one thing using a traditional Rails action is a matter of just a few hours.
But our client wanted to run queries that could potentially return 10, 20, or
even 100 thousand records. With that many records, performance suffers because
the application has to spend a lot of CPU time turning all those in-memory
records into a pile of in-memory strings for the CSV file. Doing this work in
the application, entirely before sending the response, means the app consumes a
lot of memory and a lot of CPU time. Eventually these responses would come
through, but once you're talking about 30+ second response times, you run into
trouble both from users who don't want to wait that long and from application
environments where that kind of resource use and those response times are
unacceptable or even disallowed.
Since I was querying a pretty large Postgres table (200M+ rows) with a fairly
involved query type (geographic proximity), I spent a lot of time debugging the
query before realizing the problem was really in my own app. After I realized
what was going on, I set about looking for a better way to build the CSV and
send it to the waiting client. I found two things. First, Postgres can generate
CSV directly from any query and stream it back over the socket. And second,
using ActionController::Live, Rails can stream that response from Postgres
straight to the end user waiting on the other end of the HTTP connection.
Here’s how it works. I’ve taken all the application-specific content out of
here so you can more clearly see this technique:
class SearchController < ApplicationController
  include ActionController::Live

  def run
    response.headers['Content-Disposition'] = 'attachment; filename="filename.csv"'
    response.headers['Content-Type'] = 'text/csv'

    conn = ActiveRecord::Base.connection.instance_variable_get(:@connection)
    conn.copy_data "COPY ( #{query} ) TO STDOUT WITH CSV HEADER;" do
      while row = conn.get_copy_data
        response.stream.write row
      end
    end
  ensure
    response.stream.close
  end
end
To tell Rails you want to stream responses from this controller, include
ActionController::Live at the top. Basically, this tells Rails you want to use
chunked encoding for your HTTP responses. And in your action, your response
object now has a special stream property, which is an IO-like object
representing the outward-facing HTTP response. Anything you write to the stream
is sent immediately to the user agent.
That’s why it’s important to set any headers you need to set before writing any
of the response body. In this case, I’m using the Content-Disposition header so
browsers know to treat the response as a file download.
I am using a bit of a hack to grab hold of the raw Postgres connection
underlying the ActiveRecord connection because I don’t know a better way.
copy_data is a method the Postgres gem provides which invokes an SQL COPY
command. It typically would copy query results to a file, but since I’ve
specified “TO STDOUT” I’ll be able to read the response right here from the
Postgres connection using get_copy_data. As a bonus, I can ask for the results
in CSV format and not have to worry about converting it myself. Now that
Postgres is generating CSV for me, all my action needs to do is read the lines
from the Postgres socket and write them to the HTTP response stream.
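To see the moving parts without the Rails scaffolding, here's a minimal sketch
using the pg gem directly; the database, table, and column names are made up
for illustration:

require 'pg'

conn  = PG.connect(dbname: 'my_database')
query = 'SELECT id, name, created_at FROM widgets'

File.open('widgets.csv', 'w') do |file|
  # COPY ... TO STDOUT streams CSV rows back over the connection instead of
  # writing a file on the database server.
  conn.copy_data "COPY (#{query}) TO STDOUT WITH CSV HEADER;" do
    # get_copy_data returns one CSV line at a time, and nil when the copy is done.
    while row = conn.get_copy_data
      file.write row
    end
  end
end

In the Rails action above, the only difference is that each row goes to
response.stream instead of a local file.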
The results shocked me. Queries for even large amounts of data felt
instantaneous. The download starts so quickly it's not even worth measuring,
and the data transfer bottleneck is certainly my own middle-tier cable Internet
connection.
My battle with Errbit performance is over for the time being. This concludes an
effort I began in June to improve throughput on error inserts and error
searching as the database grows over time. If you’re interested in reading
about the effort leading up to this point, here are the related posts:
The short version of the story is that I tried all kinds of ideas, but failed
to notice the actual improvements due to an issue with the special-purpose test
rig I created specifically for measuring them. Once I found and fixed the test
rig issue, it was clear that my efforts had paid off. Unfortunately, it's not
clear which changes had the biggest impact, which is a shame because I had
hoped this final post would be an evidence-based exploration of where the
performance issues lived.
Instead, I only have evidence that the sum total of my effort led to a real
and measurable performance impact, and I can only speculate as to where the
largest gains were had. But first, let's look at the overall impact in the two
areas where we have data, starting with error insertion:
My best guess as to why we're seeing this improvement is twofold. First, the
number of mongo queries required to insert an error is down to a minimum of
five rather than a minimum of nine. Second, inserting an error no longer
requires instantiating a Mongoid document for every line in the backtrace; in
fact, the model representing a backtrace line no longer exists at all. There
could be other explanations, but I'm satisfied with the results as they are.
Although I'd like to know where the improvements came from, I'm not inclined to
spend the time to figure it out at this point.
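To make that second point concrete, here's a hedged sketch of the kind of
change involved; the class and field names are illustrative rather than
Errbit's actual code:

require 'mongoid'

# Hypothetical "before": every backtrace line is its own embedded document,
# so saving a deep backtrace instantiates one Mongoid object per line.
class BacktraceLine
  include Mongoid::Document
  field :number, type: Integer
  field :file,   type: String
  embedded_in :backtrace
end

class Backtrace
  include Mongoid::Document
  field :fingerprint, type: String
  embeds_many :lines, class_name: 'BacktraceLine'
end

# Hypothetical "after": the lines live in a plain Array field, so inserting a
# backtrace allocates simple hashes instead of per-line model instances.
class SlimBacktrace
  include Mongoid::Document
  field :fingerprint, type: String
  field :lines, type: Array
end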
Next, we’ll look at error searching:
This is exactly the kind of result I was looking for. What began as seemingly
linear performance degradation now looks a lot more like constant time. It
isn’t actually that good, but the performance degradation between zero and 100k
records is barely perceptible.
I’m convinced the meat of this improvement came from switching to mongo’s
built-in full-text search
mechanism.
It makes sense that using mongo’s full-text search implementation would be much
more performant than doing a multi-index string search.
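For reference, here's a rough sketch of what mongo's text search looks like
from the Ruby driver; the collection, index fields, and search term are
illustrative, not necessarily what Errbit actually does:

require 'mongo'

client   = Mongo::Client.new(['127.0.0.1:27017'], database: 'errbit_development')
problems = client[:problems]

# A text index lets mongo answer keyword searches server-side instead of
# scanning several regular indexes with string matching.
problems.indexes.create_one(message: 'text', error_class: 'text')

# $text queries use that index rather than walking the collection.
results = problems.find('$text' => { '$search' => 'undefined method' }).limit(20)
results.each { |doc| puts doc['message'] }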
Hopefully our users will notice and appreciate these performance gains. Keep in
mind, the results shown above ran with a single thread and a single process.
Depending on your hardware, you should get better throughput in a real
deployment by running multiple processes and hopefully multiple threads in the
future.
Last week, I continued my
exploration into Errbit slowness. I eliminated a few ideas and added a few more
conjectures to the list of optimization targets. While testing some ideas, I
began to suspect that my test
rig was not working as designed.
There was a flaw in the way I was using
Typhoeus that was causing the test
process to run slower than it should and, even worse, report inaccurate
statistics.
So I reworked the test rig to manage its own threads and take its own request
time measurements. The test process now runs much faster and on v0.4.0 reports
response times that are a little less than half of what I had logged
previously.
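A rough sketch of the shape of that kind of rig, using only the standard
library; the URL, thread count, request count, and payload here are
placeholders, not errbit_loader's actual code:

require 'net/http'
require 'thread'
require 'uri'

uri     = URI('http://127.0.0.1:8080/notifier_api/v2/notices')
payload = '<notice>...</notice>'   # placeholder request body
times   = Queue.new

workers = 8.times.map do
  Thread.new do
    Net::HTTP.start(uri.host, uri.port) do |http|
      125.times do
        request      = Net::HTTP::Post.new(uri.path, 'Content-Type' => 'text/xml')
        request.body = payload
        started      = Process.clock_gettime(Process::CLOCK_MONOTONIC)
        http.request(request)
        # Record the wall-clock time for this request as measured by the rig itself.
        times << Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
      end
    end
  end
end
workers.each(&:join)

samples = []
samples << times.pop until times.empty?
puts "requests: #{samples.size}, mean: #{(samples.inject(:+) / samples.size).round(4)}s"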
Testing against my latest
mongoid5
branch, where I've been accumulating performance optimization ideas, reveals
that performance has actually improved significantly. Unfortunately, because my
test rig was inaccurate while I was making those changes, I can't easily show
how much of the improvement each change is responsible for.
My focus up until this point has been mainly on creating error reports, and
these numbers reflect a pretty significant improvement. It's nice to finally
see some real progress after all this research.
Round One
One of the first things I tried when tracking down performance issues was
motivated by a lot of experience with Rails apps. A very common issue I've seen
with Rails applications comes down to a simple misuse of active_record. Errbit
doesn't use active_record, but it does use Mongoid, which presents something
rather like an active_record interface for mongo-powered applications.
After spending some time with Errbit and some cursory analysis, I suspected the
performance problem was in the data layer (making unnecessary and expensive
queries). So I set out to record all mongo queries made while inserting
notices.
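For what it's worth, here is one way to capture every command an app sends to
mongo, using the 2.x Ruby driver's command monitoring API; this is a hedged
sketch, not necessarily the mechanism used to produce the list below:

require 'mongo'

# Logs each mongo command and its server-reported duration.
class QueryLogger
  def started(event)
    Rails.logger.info "mongo #{event.database_name}.#{event.command_name} #{event.command.inspect}"
  end

  def succeeded(event)
    Rails.logger.info format('  runtime: %.4fms', event.duration * 1000)
  end

  def failed(event)
    Rails.logger.warn "  failed: #{event.message}"
  end
end

# In an initializer: subscribe the logger to all command events globally.
Mongo::Monitoring::Global.subscribe(Mongo::Monitoring::COMMAND, QueryLogger.new)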
Here’s the list of queries I collected from one notice insertion:
namespace=errbit_development.apps selector={"api_key"=>"acae731a1bd8cd2b0c0946f2a952027b"} flags=[:slave_ok] limit=0 skip=0 project=nil | runtime: 31.7640ms
namespace=errbit_development.backtraces selector={"fingerprint"=>"a6e12c6962939b955c0bf325c0cff0936ae0e98f"} flags=[:slave_ok] limit=0 skip=0 project=nil | runtime: 2.1269ms
namespace=errbit_development.backtraces selector={"_id"=><BSON::ObjectId:0x70012658875760 data=55748bb163726f67e4000004>} flags=[:slave_ok] limit=0 skip=0 project=nil | runtime: 1.8985ms
namespace=errbit_development.errs selector={"fingerprint"=>"ece6d74c1056b16c50ed3ba2f03dfc0bf6e5e7d2"} flags=[:slave_ok] limit=0 skip=0 project=nil | runtime: 0.4275ms
namespace=errbit_development.$cmd selector={:insert=>"notices", :documents=>[{"_id"=><BSON::ObjectId:0x70012660942640 data=5574c1a963726f7f89000000>, "message"=>"Whatever the mind of man can conceive and believe, it can achieve", "error_class"=>"ArgumentError", "backtrace_id"=><BSON::ObjectId:... flags=[] limit=-1 skip=0 project=nil | runtime: 0.6309ms
namespace=errbit_development.problems selector={"_id"=><BSON::ObjectId:0x70012665027820 data=55748bb563726f67e400008c>} flags=[:slave_ok] limit=0 skip=0 project=nil | runtime: 0.3846ms
namespace=errbit_development.$cmd selector={:update=>"problems", :updates=>[{:q=>{"_id"=><BSON::ObjectId:0x70012665383440 data=55748bb563726f67e400008c>}, :u=>{"$inc"=>{"notices_count"=>1}}, :multi=>false, :upsert=>false}], :writeConcern=>{:w=>1}, :ordered=>true} flags=[] limit=-1 skip=0 project=nil | runtime: 0.4282ms
namespace=errbit_development.$cmd selector={:update=>"problems", :updates=>[{:q=>{"_id"=><BSON::ObjectId:0x70012665383440 data=55748bb563726f67e400008c>}, :u=>{"$set"=>{"messages"=>{"cd4c2e0346474d0b2eb7f1a630309239"=>{"value"=>"Whatever the mind of man can conceive and believe, it can achieve... flags=[] limit=-1 skip=0 project=nil | runtime: 0.5181ms
namespace=errbit_development.apps selector={"_id"=>"55748b8463726f67e4000001"} flags=[:slave_ok] limit=0 skip=0 project=nil | runtime: 0.3517ms
While working on these queries, I noticed that Durran Jordan had recently
announced a new chapter of mongoid
development.
Upgrading to the latest version seemed like a nice, easy thing to try, so I
did, hoping to see a cheap improvement in Errbit. If anything, the performance
measurements were slightly worse than with Mongoid 4, but not enough to warrant
rolling back. I decided to stick with Mongoid 5 while continuing the
investigation.
I noticed by looking at this list that some of these queries are in fact
unnecessary. By making fewer queries, it should be possible to improve
performance because each query blocks the running thread from doing any other
work until the response comes in from the socket.
After a bunch of work, I was able to reduce the query count from nine to five:
namespace=errbit_development.apps selector={"api_key"=>"3777b363b3c03bc4e146f1e2523bf70b"} flags=[:slave_ok] limit=0 skip=0 project=nil | runtime: 0.4294ms
namespace=errbit_development.$cmd selector={:findandmodify=>"backtraces", :query=>{"fingerprint"=>"b8ff41db0f445d80a3a9c3ec91a2ecd9a06b1b56"}, :update=>{"$setOnInsert"=>{:fingerprint=>"b8ff41db0f445d80a3a9c3ec91a2ecd9a06b1b56", :lines=>[{"number"=>"425", "file"=>"[GEM_ROOT]/gems/activesupport-... flags=[:slave_ok] limit=-1 skip=0 project=nil | runtime: 2.0866ms
namespace=errbit_development.errs selector={"fingerprint"=>"155745cfe8d1e28f7485ce3236da5438f11d49df"} flags=[:slave_ok] limit=0 skip=0 project=nil | runtime: 0.4470ms
namespace=errbit_development.$cmd selector={:insert=>"notices", :documents=>[{"_id"=><BSON::ObjectId:0x69947441683300 data=557e0c9b63726f35570004c7>, "message"=>"Life is about making an impact, not making an income", "error_class"=>"ArgumentError", "request"=>{"url"=>"http://example.org/verify... flags=[] limit=-1 skip=0 project=nil | runtime: 0.3948ms
namespace=errbit_development.$cmd selector={:findandmodify=>"problems", :query=>{"_id"=><BSON::ObjectId:0x69947441715500 data=557dfdd763726f1d6d000031>}, :update=>{"$set"=>{"app_name"=>"test", "environment"=>"development", "error_class"=>"ArgumentError", "last_notice_at"=>Sun, 14 Jun 2015 23:2... flags=[:slave_ok] limit=-1 skip=0 project=nil | runtime: 0.6351ms
By making use of findandmodify, I was able to combine the find and modify
operations for backtraces and problems, going from five queries down to just
two. And I was able to shave one more query from the end by caching the result
of querying the app record.
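As a hedged illustration of that findandmodify pattern (the collection, field
names, and data are made up, not Errbit's exact code), a single round trip can
upsert a backtrace by fingerprint and hand back the resulting document:

require 'digest'

lines       = [{ 'number' => '42', 'file' => 'app/models/user.rb' }]
fingerprint = Digest::SHA1.hexdigest(lines.to_s)

backtraces = Mongoid.default_client[:backtraces]

# One query instead of a find followed by an insert or update: look up the
# backtrace by fingerprint, create it if it doesn't exist, return the document.
backtrace = backtraces.find(fingerprint: fingerprint).find_one_and_update(
  { '$setOnInsert' => { fingerprint: fingerprint, lines: lines } },
  upsert: true,
  return_document: :after
)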
Expecting this to make a big difference in performance, I took some new
measurements. But in my next set of tests, I found no measurable impact.
Round Two
Not to be deterred, I kept playing with Errbit until I noticed something
interesting that I hadn't seen before. While running my tests, the request log
seemed to flow in bursts, even though my server was running a single process
and a single thread. This burst-pause-burst behavior made me suspect Ruby GC
was somehow involved, so I instrumented Errbit with
gc_tracer to see if I could confirm it.
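As a hedged sketch of the same idea without any extra gems (the post itself
uses gc_tracer), a tiny piece of Rack middleware can compare GC.stat counts
around each request and announce when a collection happened:

# Prints a marker into the log whenever one or more GC runs occurred while
# handling a request (an approximation of per-event tracing).
class GcAnnouncer
  def initialize(app)
    @app = app
  end

  def call(env)
    count_before = GC.stat[:count]
    response     = @app.call(env)
    puts 'GC!!!!!!' if GC.stat[:count] > count_before
    response
  end
end

# config.ru: use GcAnnouncer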
Here’s a look at the server log during this test where I print GC!!!!!! on
every GC event:
$ bundle exec unicorn -c config/unicorn.default.rb > /dev/null
I, [2015-06-14T17:36:21.656538 #27248] INFO -- : Refreshing Gem list
I, [2015-06-14T17:36:24.054633 #27248] INFO -- : listening on addr=0.0.0.0:8080 fd=10
I, [2015-06-14T17:36:24.057564 #27248] INFO -- : master process ready
I, [2015-06-14T17:36:24.058008 #27323] INFO -- : worker=0 spawned pid=27323
I, [2015-06-14T17:36:24.060624 #27323] INFO -- : worker=0 ready
127.0.0.1 - - [14/Jun/2015 17:36:29] "POST /notifier_api/v2/notices HTTP/1.1" 200 - 1.0742
127.0.0.1 - - [14/Jun/2015 17:36:29] "POST /notifier_api/v2/notices HTTP/1.1" 200 - 0.0239
GC!!!!!!
127.0.0.1 - - [14/Jun/2015 17:36:29] "POST /notifier_api/v2/notices HTTP/1.1" 200 - 0.0910
127.0.0.1 - - [14/Jun/2015 17:36:29] "POST /notifier_api/v2/notices HTTP/1.1" 200 - 0.0273
127.0.0.1 - - [14/Jun/2015 17:36:29] "POST /notifier_api/v2/notices HTTP/1.1" 200 - 0.0400
127.0.0.1 - - [14/Jun/2015 17:36:29] "POST /notifier_api/v2/notices HTTP/1.1" 200 - 0.0267
127.0.0.1 - - [14/Jun/2015 17:36:29] "POST /notifier_api/v2/notices HTTP/1.1" 200 - 0.0253
127.0.0.1 - - [14/Jun/2015 17:36:29] "POST /notifier_api/v2/notices HTTP/1.1" 200 - 0.0266
127.0.0.1 - - [14/Jun/2015 17:36:30] "POST /notifier_api/v2/notices HTTP/1.1" 200 - 0.7641
127.0.0.1 - - [14/Jun/2015 17:36:30] "POST /notifier_api/v2/notices HTTP/1.1" 200 - 0.0310
127.0.0.1 - - [14/Jun/2015 17:36:30] "POST /notifier_api/v2/notices HTTP/1.1" 200 - 0.0827
127.0.0.1 - - [14/Jun/2015 17:36:30] "POST /notifier_api/v2/notices HTTP/1.1" 200 - 0.0264
GC!!!!!!
127.0.0.1 - - [14/Jun/2015 17:36:30] "POST /notifier_api/v2/notices HTTP/1.1" 200 - 0.0952
127.0.0.1 - - [14/Jun/2015 17:36:30] "POST /notifier_api/v2/notices HTTP/1.1" 200 - 0.0313
127.0.0.1 - - [14/Jun/2015 17:36:30] "POST /notifier_api/v2/notices HTTP/1.1" 200 - 0.0236
127.0.0.1 - - [14/Jun/2015 17:36:30] "POST /notifier_api/v2/notices HTTP/1.1" 200 - 0.0262
127.0.0.1 - - [14/Jun/2015 17:36:31] "POST /notifier_api/v2/notices HTTP/1.1" 200 - 0.7083
127.0.0.1 - - [14/Jun/2015 17:36:31] "POST /notifier_api/v2/notices HTTP/1.1" 200 - 0.0324
127.0.0.1 - - [14/Jun/2015 17:36:31] "POST /notifier_api/v2/notices HTTP/1.1" 200 - 0.1000
127.0.0.1 - - [14/Jun/2015 17:36:31] "POST /notifier_api/v2/notices HTTP/1.1" 200 - 0.0381
127.0.0.1 - - [14/Jun/2015 17:36:31] "POST /notifier_api/v2/notices HTTP/1.1" 200 - 0.0689
127.0.0.1 - - [14/Jun/2015 17:36:31] "POST /notifier_api/v2/notices HTTP/1.1" 200 - 0.0357
GC!!!!!!
127.0.0.1 - - [14/Jun/2015 17:36:31] "POST /notifier_api/v2/notices HTTP/1.1" 200 - 0.0375
127.0.0.1 - - [14/Jun/2015 17:36:31] "POST /notifier_api/v2/notices HTTP/1.1" 200 - 0.0266
127.0.0.1 - - [14/Jun/2015 17:36:32] "POST /notifier_api/v2/notices HTTP/1.1" 200 - 0.6870
...
The amount of time spent performing the actual GC is not negligible, but also
not immense. We’re seeing one GC event for every 10 requests. We can tune
Ruby’s GC parameters to change this behavior, but it will come at the expense
of memory. And increasing the time between GC events will likely increase the
time used to perform GC since in theory it should take more time to sweep more
memory space. Running into this many garbage collection events on a regular
basis indicates the process is steadily allocating chunks of memory.
It’s worth pointing out that what we’re seeing in these logs is not inherently
bad. It’s merely indicative of a process that steadily allocates more
garbage-collectible memory. Whether or not this is too much depends entirely on
the actual work the process is doing. In this case, the process is basically
recording one error notification per request, and I’m inclined to think there’s
a large amount of unnecessary memory allocation contributing in a sizable way
to Errbit’s performance issues. Since allocating and garbage collecting memory
takes time, figuring out where all that memory is going will be my next
priority.
For some time, I’ve been aware that the performance of
Errbit is not great. But that problem has
so far taken a back seat to Errbit’s other priorities. For v0.5.0, I want to at
least make a dent here and actually do something to improve this situation.
But before I got started, I felt it was a good idea to put together some
kind of bench test, for two reasons. First, I want a record of where this began
so I can see how much progress we make over time. Second, and more importantly,
I need a decision-making framework. If I need to decide whether to sacrifice
some functionality for performance, I want to have a handle on how much
relative performance is at stake.
My goal is not to create a perfect or objective or complete measurement of
Errbit’s behavior, but simply to get a sense of how Errbit’s performance
characteristics change as errors are added to its database and as the code
changes over the course of this performance improvement project.
Testing Methodology
To start, I created
errbit_loader with a pair of
load tests for the two most common Errbit uses: creating and finding errors.
You start the test by specifying the number of errors to create and search for
in each block (count), and the number of parallel requests to keep open at any
one time (concurrency). In this round of testing, I used count=1000 and
concurrency=8. In each test, errbit_loader runs the create batch, followed by
the search batch, and then repeats this pair of tests until stopped.
90% of the errors created by errbit_loader are repeats, so they should be
grouped with other like errors. The other 10% have unique messages, which
should prevent them from being grouped with other errors.
I know from experience that performance degrades tremendously with many
millions of errors, but it took more than six hours just to create 100,000
errors in these load tests in my environment. For that reason, I have decided
to stop at 100,000 for now and increase the total count after we make some
progress.
For this round of testing, I am using Errbit
v0.4.0, Ruby 2.1.2, and mongo
2.6.3. Errbit is using its default deployment strategy, which is single-threaded
unicorn with three worker processes and preload_app set to true. The test
process, mongo, and Errbit itself are all running on the same physical machine,
which is a Lenovo ThinkPad T430s.
In just over six hours (6:02.59), I inserted 100,000 errors in blocks of 1,000
and completed 100,000 searches. There is a bit of noise in the data between 80k
and 90k because I forgot the tests were running at one point and started using
my laptop for other things. The results should still be usable, though, since
the underlying trends are clearly visible.
This first chart is a total request time overview. Out of each block of 1,000
tests, I’m showing the 95th percentile, meaning five percent of requests were
slower than each point on the chart. Time to insert errors appears to be fairly
constant as the database grows, while searching seems to grow linearly.
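For clarity, here's roughly how each plotted point can be derived from a block
of measurements (a sketch, not the actual charting code; the sample data is
made up):

# Sort one block of request times and pick the value at the given percentile.
def percentile(times, pct)
  sorted = times.sort
  sorted[((pct / 100.0) * (sorted.length - 1)).round]
end

block = Array.new(1_000) { rand(0.02..1.5) }  # stand-in for 1,000 measured request times
p percentile(block, 95)   # five percent of requests in the block were slower than this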
Here’s a closer look at the results for creates. This chart shows the 90th,
95th and 99th percentile.
And here’s the same view of search requests.
My hunch is we can do a lot better than this. Instinctively, it doesn’t seem
like it should take anywhere near a full second to insert an error. I’m not
completely unhappy with the results for search requests, but I think that’s
only because there are still relatively few errors in the database. Once we
speed up the inserts, we should be able to more easily test how performance
degrades as we start getting into millions of errors.