
trading algorithm — how to write and deploy

Through our work on the OptionsHouse API client, we've somehow become known as trading algorithm experts. At least once a week, Branded Crate gets a phone call or email from someone who wants to automate trading activity. To even have a thought like this requires some level of sophistication. Even so, many potential clients aren't aware of what it takes to create and manage a system like this. That's our area of expertise, so if you're considering trading automation, read on to learn more about how we do it.

At the heart of any automated trading system is the algorithm itself, written as instructions a machine can understand (code). This is mainly what clients think about when they talk to us. The idea generally seems simple at first, but complexities emerge as soon as you begin to consider automation. Without even thinking, clients "just know" to do things a certain way as they execute their trading strategies manually. Computers, on the other hand, don't know anything.

Let's say a client wants to buy N shares of some stock when its current price is lower than it was at the same time on the previous trading day, and to sell when the current price is higher than it was at the same time on the previous trading day. This is probably a terrible strategy, but ignore that; it can still serve as an example of how and where complexities emerge.

Consider the first condition of our algorithm: buy when the current stock price is lower than it was yesterday. Checking the current stock price is easy. Most of the APIs I've seen have a call specifically for this purpose: you send in the ticker symbol and get back the current price. But how can the computer know what the price was yesterday at a specific time? Unless an external service can provide this information, the program will actually have to have run yesterday and recorded the price at the right time. Storing data means creating and managing a database, and it introduces statefulness, because database contents (state) affect the output of the algorithm. Anything can fail, and a database is one more thing that can; when it does, the algorithm should probably stop itself as safely as possible, and that special handling has to be written too.
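To make that concrete, here's a minimal sketch of what the buy condition might look like. Everything named here is hypothetical: getCurrentPrice stands in for the broker API call, previousTradingDay for a market-calendar helper, and priceStore for the database of prices the program recorded on earlier days.

```javascript
// Minimal sketch of the buy condition. All three helpers are hypothetical
// stand-ins: a broker API call, a market-calendar helper, and a database of
// prices this program recorded while it was running on previous days.
async function shouldBuy(symbol, now) {
  const current = await getCurrentPrice(symbol);          // easy: one API call
  const yesterday = previousTradingDay(now);               // weekends and holidays matter
  const prior = await priceStore.get(symbol, yesterday);   // only exists if we recorded it

  // If yesterday's price was never recorded (the program wasn't running, or the
  // database is unavailable), there is no safe decision to make, so do nothing.
  if (prior == null) return false;

  return current < prior;
}
```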

Now you have a stateful algorithm that constantly records price information for your target security and is capable of deciding to buy according to its instructions. Let's say it does so and buys some stock at 10:00am. Now 10:05am comes and the same buy condition is met again. As a human being, you probably understood the algorithm to cover this scenario, and you likely wouldn't buy again. But a computer follows its instructions to the letter. Unless the buy condition is qualified, it will buy every time it sees the stock price lower than it was yesterday. You can qualify the instruction by adding a condition like 'buy only once per day,' but even that is still ambiguous to a computer. Is one day a trading day, or a 24-hour period? Computers are unable to make assumptions, so this level of specificity is required for every aspect of the program. Either way, the algorithm now has to take into account the last time it made a purchase, which introduces more state and more failure scenarios to account for.
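Something like the following could express that qualification. Again, the pieces are hypothetical: tradeStore records the algorithm's own past orders (more state to manage), isSameTradingDay pins down what "one day" means, and shouldBuy is the earlier sketch.

```javascript
// Sketch of the "buy only once per day" qualification, layered on the earlier
// shouldBuy sketch. tradeStore and isSameTradingDay are hypothetical.
async function shouldBuyToday(symbol, now) {
  const lastBuy = await tradeStore.lastBuyDate(symbol);

  // "Once per day" has to be pinned down: here it means once per *trading* day,
  // not once per rolling 24-hour window. The computer won't assume either one.
  if (lastBuy != null && isSameTradingDay(lastBuy, now)) return false;

  return shouldBuy(symbol, now);
}
```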

There are many other scenarios to consider. You probably want to set some overall limit on how much your algorithm can buy so it doesn't exhaust your entire cash balance and start buying on margin. Even if you do want to trade on margin, you still need a limit so the algorithm doesn't keep entering orders your broker can't fill once you're out of money. Likewise, you probably want to limit selling to some extent, or you could end up with a short position you never intended. And what should the algorithm do when the market is unexpectedly closed? What if your broker is having issues and failing to respond appropriately to requests? What if your broker returns the wrong current price? That can happen, and it's especially troublesome, but the algorithm can only take input and follow instructions, so it needs to be told how to respond.
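Those limits end up as still more explicit checks. Here's a rough sketch, assuming hypothetical order, account, and positions objects refreshed from the broker before each decision.

```javascript
// Sketch of overall safety limits. order, account, and positions are hypothetical
// objects refreshed from the broker before every decision.
function withinLimits(order, account, positions) {
  if (order.side === 'buy') {
    // Never spend more cash than we actually have: no accidental margin buying.
    return order.quantity * order.limitPrice <= account.cash;
  }
  // Never sell more shares than we hold: no accidental short position.
  return order.quantity <= (positions[order.symbol] || 0);
}
```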

Speaking of input, algorithms generally require some kind of user input so that program behavior can change without requiring a code change. For this example algorithm, you can probably imagine wanting to change the stock symbol or the transaction size, among other things. You may also want to send signals to the running program instructing it to change modes or stop altogether. These are all program inputs or messages that need to be relayed to your algorithm. For those of you familiar with command line interfaces, this can be accomplished easily via command line arguments. But most of the time, clients need a more user-friendly way to manage their algorithm, and that means creating a user interface and making it accessible somewhere, like a web address.
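Even so, the command-line version illustrates the idea in just a few lines of Node.js; the symbol and quantity arguments and the stop flag here are purely illustrative.

```javascript
// Illustrative only: take the symbol and transaction size from the command line
// so behavior can change without a code change (Node.js).
const [symbol = 'ACME', quantity = '100'] = process.argv.slice(2);
console.log(`trading ${quantity} shares of ${symbol}`);

// A stop request is another kind of input. Flip a flag instead of killing the
// process so in-flight work can be wound down safely.
let stopping = false;
process.on('SIGINT', () => { stopping = true; });
```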

In order to do much of anything useful, the algorithm will have to be connected to the Internet, and as with anything on the Internet, there are some important security considerations. In order to access your broker, the program will need sufficient security credentials to do whatever it does. Most brokers I've looked at require your one and only account username and password to access their API. And in order for the program to be remotely manageable, it will have to provide some kind of remote interface. Putting your secret brokerage credentials on a remotely accessible computer connected to the Internet is a security risk to say the least. The risk can be managed to some degree, but that takes some effort and the risk cannot be eliminated entirely.

If you have a trading algorithm, you'll probably want to be able to check on it from time to time. You may want to know if it's running, and if so, what it's doing. The program itself can be made to answer any question about its current state, and it can create audit trails of what it's done in the past. For simple programs, a daily email digest could be sufficient. But more complex programs could require custom reports with charts and graphs, and all that needs to be provided from a remotely accessible interface.

An automated trading algorithm is ultimately a computer program. In order to run, it needs to be installed on a computer somewhere (the host). The host needs to be on all the time, keep an accurate clock, and have a reliable Internet connection. Generally, this is not something you should plan on running on your desktop or laptop computer. There are cloud computing companies that provide this exact service at a reasonable price, and we recommend using them. But management is generally not included. Computers and software programs can fail in all kinds of ways. A smart program can recover from many failures on its own, but not all. Some failures will require human intervention to diagnose and fix, either from you or from someone you trust.

Since trading algorithms are inherently dangerous things, it is absolutely vital that they be tested rigorously. If your broker provides a paper trading account, that's one good way to test, but it's not enough. All the decision-making and trading code must get special scrutiny and extra attention from the developers, including peer review wherever possible. It's important to take all the time needed to ensure the safety of your algorithm to the best of your ability.

These considerations are just a minimum set for anyone thinking about running a trading algorithm for any length of time. There are other considerations like high availability and disaster recovery, but they're not necessarily essential. This may sound like a lot to deal with and it is. But it's still doable and certainly worth the effort if the algorithm pays off.

Roll Your Own PaaS With Flynn

As a developer, I've long struggled with the problem of how to deploy the applications I create. In an ideal world, I could spend all of my energy doing what I do best (building applications), and none of my energy dealing with operations concerns. That sounds like a good reason to have an operations team. But an operations team has the same problem because ideally, the operations team could spend all their time handling operations concerns, and none of their time worrying about how applications were created.

Deploying an application is largely an exercise in defining (or discovering) the relationship between an application and its environment. This can be a tricky and error-prone job because there is so much variety in applications, environments and the people who create them. If everyone involved could agree on an interface contract, we'd all save a lot of time and energy.

This is what PaaS has tried to do. Solutions like EngineYard, Heroku, Google App Engine, and OpenShift have sprung up with varying degrees of success. Of these, Heroku has had the largest impact on the way we think about software service deployment and what PaaS can do. You can find an entire ecosystem of software packages on GitHub designed to make your applications adhere to the tenets of The Twelve-Factor App. And that's a good thing, because we're starting to see what life could be like in a world where apps fit neatly into PaaS-shaped boxes.

I was planning to write a lot about why I'm such a big fan of PaaS, and why Flynn makes sense, but I can't do a better job than Jeff Lindsay already did. In case you haven't heard of it, Flynn is a collection of services that essentially comprise a free and open-source PaaS. The project is crowd-funded and written almost entirely in Go. After playing with Flynn off and on for a few months, I was hooked on the idea, and I had to talk with Jonathan Rudenberg to hear more about what his plans are for the future of this project.

Flynn, when viewed as a whole system, is a Heroku-like PaaS that you run and manage yourself. But it's more flexible than Heroku because it's free and open-source. Another nice benefit is you can run services that bind to arbitrary TCP ports like data stores or mail servers. Eventually it will also be available as a paid service. Jonathan didn't get too much more specific on the paid service idea, but did mention you'd be able to bring your own hardware and let the service manage Flynn for you.

Flynn is not a production-ready system at the moment, but that hasn't stopped me from playing with it. flynn-demo is a project that uses Vagrant to launch a local demo of Flynn. Following along with the Vagrantfile, I was able to create a RightScale ServerTemplate to deploy Flynn nodes. If you have a RightScale account, you can import this ServerTemplate and try it out for yourself.

Bootstrapping a Flynn node is a pretty simple process thanks to flynn-bootstrap, which does most of the work. flynn-bootstrap uses Docker to download and run images for each component of the system. You do have to provide a compatible Docker environment and pass in some configuration to end up with a working system, though, and that's worth a little explanation.

If you aim to try this out right now, you'll need to install Docker v0.10.0. Flynn requires Docker to support hairpin NAT configuration, which is broken in more recent Docker releases. So we're stuck with Docker v0.10.0 for now. Here's the boot script I used to install Flynn's dependencies on a plain Ubuntu 12.04 EC2 instance and bootstrap Flynn:

The Flynn stack is divided into two logical layers. The first layer (layer-0, or "the grid") is designed to link a cluster of nodes to an abstract set of containerized processes. Flynn provides its own service discovery system called discoverd, which is currently backed by etcd, and all services managed by Flynn are registered with discoverd. flynn-host is the outermost grid component and acts as the glue between Flynn and all the containerized processes it manages. When flynn-host starts, it registers itself with discoverd and awaits instructions to do things like start and stop Docker containers. You just need to provide it with an IP address that can be used to reach it, and the IP addresses of other nodes if you're building a multi-node system.

flynn-host is the Flynn host service. An instance of it runs on every host in the Flynn cluster. It is responsible for running jobs (in Linux containers) and reporting back to schedulers and the leader.

Flynn's second layer (layer-1) takes care of higher-level concerns. The controller provides an API for managing Flynn itself and application deployments, strowger handles TCP routing, and gitreceive is an SSH server that receives git pushes and funnels them through the buildpack-based build process and on to flynn-host for deployment.

From the outside, you can manage Flynn using the controller API. You can do it on your own, or you can use flynn-cli, which is what I'm using. But in order to actually reach the controller, you must give it a hostname. Strowger routes HTTP/HTTPS traffic by hostname so that multiple web services can share the same TCP port, and the controller is just one of those services.

Let's see what happens when you actually run these bootstrap steps:

Pay close attention to the very last line because it contains all the connection parameters needed to talk to the Flynn controller on this server. Actually, not quite, because there's a bug in this output. It should actually read:

But that's OK, because as long as you stay on the rails, it all works great from here. The full Flynn stack is running on a single server and waiting for you to connect and deploy some apps. Here's a quick walkthrough:

Start by adding your public SSH key so gitreceive can authenticate you:

You'll need an app to deploy. Flynn has an example app you can use:

From your app's git repository, run 'flynn create example' which creates an app named 'example' on the remote flynn server automatically and adds the remote server as a git remote named 'flynn.'

Now you're ready to deploy. Just push to the remote named 'flynn' and flynn will deploy your app.

By default, there are no processes actually running your app, so you'll need to add some web workers to run the web service. Just as in Heroku and Foreman, you define your workers in a Procfile. The one for this example app defines only one worker named 'web.' Scale it up to three workers using flynn scale.

Now your app is running, but you still can't reach it until you tell Flynn which hostname should be routed to your application. Add an HTTP route for your new app and you should be able to hit it with curl. Obviously, you'll need to make a corresponding DNS entry as well:

You can see that Flynn has set up three web workers inside three Docker containers and exposed them on sequential TCP ports on the host server. Strowger routes incoming HTTP requests to the workers at random, but having looked at the code, it seems other routing methods could be added without too much trouble.

Keep in mind, this is a young project with lofty goals, so there are some rough edges. But I'm impressed with what I've seen so far. The Flynn team, and especially Jonathan, have gone out of their way to help me understand this system. I can't wait to see how it shapes up.

Get SSH Protocol Links Working in Ubuntu+Chrome+Unity

This has been plaguing me for years and I finally figured it out. Thanks to eleperte, who created ssh-xdg-open, I was able to see what to do. ssh-xdg-open didn't work for me, but there was enough information available for me to figure out the missing pieces.

Forget about gconftool; you don't need ssh-xdg-open either. If all you want is working ssh:// protocol links, just use xdg-mime to set the default application for handling ssh protocol links and create an application handler with the same name as that application.

All the handler does is launch bash, parse the host from the URL, and execute ssh. When ssh exits, it executes bash again so the window stays open. I wrote it this way because you can't count on everything working all the time, and if you don't keep the window open, the error messages will vanish into the ether and your sanity with them.

Is UglifyJS Really Worth It?

Like the rest of the world, RightScale has been moving more and more of its application from the server to the client. That means we've suddenly found ourselves managing larger and larger piles of JavaScript. All that JavaScript needs to be delivered to clients as quickly as possible in order to minimize the time customers spend waiting for web pages to load.

So we created a nice little build tool leveraging Grunt which, among other things, takes all that JavaScript and compiles it into one big blob for each application. In order to make that big blob as small as possible, we use UglifyJS.
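For context, the build task is roughly the stock grunt-contrib-uglify setup; the paths here are made up.

```javascript
// Roughly the shape of our build step: compile everything into one minified blob
// per application using grunt-contrib-uglify. Paths are made up.
module.exports = function (grunt) {
  grunt.initConfig({
    uglify: {
      app: {
        files: {
          'build/app.min.js': ['src/**/*.js'] // one big blob per application
        }
      }
    }
  });

  grunt.loadNpmTasks('grunt-contrib-uglify');
  grunt.registerTask('build', ['uglify']);
};
```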

Unfortunately, some of our apps are so big that running the uglify Grunt task can take a long time. Ideally, this task would be fast enough that it could be run at, or just before, deploy time. Fast enough is a pretty subjective term, but we deploy code all the time to production and to various kinds of staging systems, so fast enough means however long you're willing to add to the time a deploy already takes. In my case, three extra minutes is not fast enough.

Continue reading →

Upload to YouTube Through Google API v3 and CORS

Do a search on Google for "youtube api javascript upload" and you'll get all kinds of results. There are a huge number of ways people try to get around the browser's same-origin policy to make an HTTP request using JavaScript. Let's go through some of them:

You can create a real HTML form and submit it with JavaScript, and you can avoid the page refresh by submitting to an iframe. You can use JSONP to sneak by and load remote JavaScript using a script tag. You can fruitlessly attempt to muck with document.domain. There are all kinds of other crazy hacks people use to circumvent the same-origin policy, but they are all either severely limited or suffer in terms of your ability to control the HTTP request parameters and properly handle the response in failure scenarios.

Another option is to skip the whole idea of submitting your requests directly from the browser to the remote server. You can install your own proxy server on the same domain as your client JavaScript application and make requests to your proxy, which then makes the disallowed requests for you, because your proxy server isn't governed by the same-origin policy. This method gives you full control over the entire process, but setting up and maintaining a proxy server, paying for bandwidth and storage, and dealing with the added complexity might be too expensive and time-consuming. It might also be totally unnecessary.

CORS is here to save the day. CORS has existed for a long time, but for some reason (maybe browser compatibility), it hasn't yet caught on in a big way. Many well-known APIs, including Google's YouTube Data API v3, already support CORS. And chances are, the browser you're currently using supports CORS too.
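Here's a bare-bones sketch of the idea using XMLHttpRequest. The endpoint and query parameters are only illustrative (check Google's API docs for the exact upload call), and accessToken and videoBlob are assumed to already exist from the OAuth and file-selection steps.

```javascript
// Bare-bones sketch of a cross-origin upload. The browser handles the CORS
// preflight; we just make the request. accessToken and videoBlob are assumed to
// exist already, and the endpoint/parameters are illustrative.
var xhr = new XMLHttpRequest();
xhr.open('POST', 'https://www.googleapis.com/upload/youtube/v3/videos?part=snippet');
xhr.setRequestHeader('Authorization', 'Bearer ' + accessToken);
xhr.setRequestHeader('Content-Type', 'video/*');

xhr.onload = function () {
  // Unlike the iframe and JSONP hacks, we get real status codes and a real body.
  if (xhr.status >= 200 && xhr.status < 300) {
    console.log('uploaded:', JSON.parse(xhr.responseText).id);
  } else {
    console.error('upload failed:', xhr.status, xhr.responseText);
  }
};
xhr.onerror = function () { console.error('network or CORS failure'); };

xhr.send(videoBlob);
```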

Continue reading →

automated trading via optionshouse api

Trading securities is a dangerous game. It can be difficult to develop a strategy and stick to it in the face of an emotional marketplace that stampedes from one extreme to the other. Sticking to a trading strategy takes time, discipline and serious balls far beyond the capacity of most human beings.

One way to rise above these impediments is to encode your strategy into an algorithm and instruct a machine to execute that strategy for you. You can still freak out and pull the plug at any time, but until you do, machines can execute your strategy without hesitation or emotion. Just the exercise of encoding potential trading strategies into machine instructions is enough to spot problems and potential weaknesses.

Continue reading →

in praise of the mundane

It's easy to fall into the trap of feeling special. From our own perspective we seem so original and in many respects we really are unique. Western society rightly encourages us to celebrate the things that make us special. Individuality is virtuous.

In reality, we're much more similar than we are different. The great ideas we have are at best incremental improvements on existing theory. At worst, they're complete plagiarism. Even the problems we face are just as commonplace. Nothing is new. Nothing is special. If you think otherwise, you're only deluding yourself. As the ecclesiast said so long ago:

That which has been is that which will be,
And that which has been done is that which will be done.
So there is nothing new under the sun.
Is there anything of which one might say,
“See this, it is new”?
Already it has existed for ages
Which were before us.
(Ecclesiastes 1:9,10 NASB)

There is nothing new under the sun, and yet we act as though there is. I see this pattern all over the place in the software world. We worry and fret over our silly problems and bite our nails wondering how to solve the same problems that have been solved countless times before. We think, "Here is some problem that is uniquely mine. In fact, this problem is so unique that I must invent a new kind of solution."

Continue reading →

getting started with cloud computing

I was talking to a friend (let's call him Dave) the other day. He had a good idea for how he could run his QuickBooks accounting software in the cloud. By running the software in the cloud, he wouldn't need to ship QuickBooks backup files back and forth to his accountants; he could just launch a cloud instance and let the accountants RDP into it and use the software.

It sounds great, but Dave is cheap, and he wanted to run this on an EC2 t1.micro, and the machine just couldn't handle it. So of course he wanted to upgrade the instance. Since I'm the cloud computing guy, he called me up and asked me how to do it. At first, I thought it was a silly question, and I told him that of course it is impossible to upgrade the memory on a running EC2 instance.

Continue reading →

jQuery Deferreds and the jQuery Promise Method

jqXHR and the Promise Interface

jQuery 1.5 features a brand new mechanism for dealing with asynchronous event-driven processing. This system, called deferreds, was first implemented for jQuery's $.ajax method and so we'll be looking closely at that method.
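As a quick taste, here's a small sketch against jQuery 1.5's jqXHR promise interface; the URLs are made up.

```javascript
// $.ajax() now returns a jqXHR object that implements the promise interface,
// so callbacks can be attached after the call instead of packed into the options.
var request = $.ajax({ url: '/api/things', dataType: 'json' });

request.done(function (data) {
  console.log('loaded', data.length, 'things');
});

request.fail(function (jqXHR, textStatus) {
  console.error('request failed:', textStatus);
});

// Deferreds also compose: $.when() waits on several asynchronous operations.
$.when($.ajax('/api/things'), $.ajax('/api/stuff')).done(function () {
  console.log('both requests finished');
});
```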

Continue reading →

hp 02 ciss for my photosmart c7280

Having been thoroughly satisfied with prior HP printer experiences, I made the mistake of purchasing a brand new HP Photosmart c7280. I'm a big fan of these all-in-one devices. I especially like having a WiFi interface, and scanning to a USB disk as opposed to some ridiculous TWAIN protocol is such a great idea it's hard to imagine why some devices still don't support it. But all the things I love about this printer are outweighed by the horrible ink system.

Let's start with the most obvious problem with these ink cartridges: they're way too small. The color cartridges hold only 11 mL. I've seen claims that they can yield up to 500 pages. I have no data to argue with that figure, but I can tell you it seems very high compared to what I've seen.
Continue reading →