23 Nov

Switchboard: A Reverse Proxy Handler for URL-based Namespacing

Posted by Elf Sternberg as Uncategorized

tl;dr version: We recently replaced our Nginx with an HTTP reverse proxy written entirely in Node.js, using the Coffeescript programming language. We leveraged Nodejitsu’s node-http-proxy and added our own logic to provide a useful switchboard for several small HTTP servers, each of which “does one thing well.”

There are three scenarios where a reverse proxy is useful. The first is when you have several back-end servers that all do the same thing, and you want to distribute the load across those servers. The second is when you have several different back-end servers on the same box, one per hostname, and you want to dispatch each request to the correct handler.

The last is when you have several different services that are deployed from several different kinds of webservers, and you want all of those to be fed out of the same address/port combination. For a real-world example, we at Spiral Genetics have our core application server; an authentication and accounting server (“Who are you and do you have any money?”); a media server; an import handler for customer uploads; and a pub/sub/hub that coordinates among these, and publishes events to our web-based UI as events happen on the back-end.

Nginx buffers requests before passing them on to the back-end service, and it doesn’t speak HTTP/1.1 natively to its back-ends, a requirement for websockets. Both of these limitations needed to be overcome.

My initial research led me to Node-HTTP-Proxy, which did about 90% of what I wanted it to do. The request-response cycle within Node-HTTP-Proxy’s HttpProxy object was exactly what I wanted, with non-buffering of the body of the HTTP request, and automatic upgrading to both HTTP/1.1 and the Websocket protocol as needed.

Node-HTTP-Proxy’s examples came in three flavors: a single route to a single back-end, multiple routes to hostname-related back-ends, and a how-to example that showed how to route to different back-ends based on URL fragments.

It was this last example, proxy-by-url, that provided the bulk of the intelligence needed for our solution.


Segmentation

There were two things that needed to be addressed: first, since we had multiple back-end servers that would be addressed through a single hostname, segmenting those back-end servers would have to be done on a URL basis. Fortunately, Node 0.6’s request/response cycle pauses when the head of the request has been processed and allows us to process the inbound data in chunks, the same way we’d handle outbound data.

The second problem was that one of our services was “macros of instructions to the data server,” which was easier to store as a set of procedural steps somewhere other than the data server. So we needed a URL re-writer that looked like it was part of the data server’s namespace, but was actually in its own little application.

Node-HTTP-Proxy’s “proxy-by-url” example demonstrates how to do this, for the most part, but here’s what we wanted:

<configure and run the application>= (U->) [D->]
routes = [
    ['http://127.0.0.1:80', ['^/AnimatedSkills/', '^/180minutes/'], "Event Hub"],
    ['http://127.0.0.1:8002', '^/soundpodcast/', "Auth Server"]
]

This is the base configuration: For each destination, we have a list of URLs that we want to go to that destination. You’ll note that the last one in “Auth Server” goes to the home page; it’s a Django application that’s perfectly capable of figuring out if you’re logged in and serving you the application, or demanding you log in if you aren’t.

You’ll also note that the path to “Auth Server” comes before the “Data Server.” It’s a more precise path, and since the first matching route wins, we want it to be analyzed first.
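To see why the order matters, here is a minimal sketch in plain JavaScript. It is not the switchboard itself: the `/accounts/` and `/` patterns and the `firstMatch` helper are hypothetical, made up for illustration. The only thing borrowed from the switchboard is the first-match-wins rule.

```javascript
// Hypothetical route table: a precise path and a catch-all.
const preciseFirst = [
  { match: /^\/accounts\//, comment: 'Auth Server' },
  { match: /^\//, comment: 'Data Server' },
];
// The same two routes with the catch-all listed first.
const catchAllFirst = [preciseFirst[1], preciseFirst[0]];

// Return the comment of the first route whose pattern matches, or null.
function firstMatch(routes, path) {
  for (const route of routes) {
    if (route.match.test(path)) return route.comment;
  }
  return null;
}

console.log(firstMatch(preciseFirst, '/accounts/login'));  // Auth Server
console.log(firstMatch(catchAllFirst, '/accounts/login')); // Data Server
```

In the second table the catch-all shadows the precise route, so a request meant for the authentication server would land on the wrong back-end.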

We also wanted to do URL rewrites after the path had been determined. So we’re going to pass that to our switchboard:

<configure and run the application>+= (U->) [<-D->]
switchboard_options =
    rewrites: [['^/soundpodcast/', '/']]

And finally we start the application, passing our switchboard to node-http-proxy’s httpProxy.createServer function:

<configure and run the application>+= (U->) [<-D]
server_options = {}
console.log("Listening on port 8120")
proxy = httpProxy.createServer(server_options, Connect.logger(), switchboard(routes, switchboard_options))
proxy.listen(8120)

The Switchboard

I’m asking three things of the switchboard: efficient look-ups based on the URL’s content, efficient rewrite of the URL to be passed to the back-end server, and a minimal number of proxy handlers to be configured in the meantime.

What I want, ultimately, is to create an array of routes, each associated with the appropriate proxy, that I will search, in order, for a match. First match wins.

Using Coffeescript, let’s put that into a constructor.

<Switchboard constructor>= (U->) [D->]
constructor: (routes, routing_options = {}) ->
    @rewrites = []
    @routes = []

    for route in routes
        target = url.parse(route[0])

        options = {}
        options.target = {}
        options.target.host = target.hostname
        options.target.port = target.port
        options.target.https = (target.protocol == 'https:')

        proxy = new HttpProxy(options)
        proxy.on 'start', (req, res, target) => @emit('start', req, res, target)
        proxy.on 'end', (req, res) => @emit('end', req, res)

Here, for a given back-end, we’re creating a new HttpProxy object that knows the basic information needed to contact the back-end. That information won’t be needed until we make a request, but now it’s cached in the proxy object.
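As a rough illustration of what goes into those options, this plain-JavaScript sketch pulls the same three fields out of a target URL with Node’s url.parse(). The `targetOptions` helper name and the `https://10.0.0.5:8443` address are made up for the example.

```javascript
const url = require('url');

// Hypothetical helper: split a back-end URL into the fields the
// constructor above stores under options.target.
function targetOptions(backend) {
  const target = url.parse(backend);
  return {
    host: target.hostname,               // host name without the port
    port: target.port,                   // a string, or null if the URL has no port
    https: target.protocol === 'https:', // protocol includes the trailing colon
  };
}

console.log(targetOptions('http://127.0.0.1:8002'));
console.log(targetOptions('https://10.0.0.5:8443'));
```

Note that url.parse() returns a null port when the URL does not spell one out, and the constructor above does not fill in a default, so it is probably safest to always give an explicit port in the routing table.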

Now, for each path pattern (the second item in each routing table entry), we’re going to create a RegExp object for matching, keep a reference to the proxy we just created, and keep the comment. (We don’t actually do anything smart with the comment right now.)

<Switchboard constructor>+= (U->) [<-D->]
        for r in (if (route[1] instanceof Array) then route[1] else [route[1]])
            @routes.push
                match: new RegExp(r)
                proxy: proxy
                comment: route[2]

A little syntactic sugar there allows you to pass in plain paths rather than arrayed ones, but that’s about it.
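The same normalization written out in plain JavaScript, as a sketch; `asList` is a hypothetical helper name, not something from the switchboard.

```javascript
// Wrap a bare pattern in an array; leave arrays alone.
function asList(paths) {
  return Array.isArray(paths) ? paths : [paths];
}

// Both forms of the second routing-table item yield a list of RegExps.
const single = asList('^/soundpodcast/').map((p) => new RegExp(p));
const several = asList(['^/AnimatedSkills/', '^/180minutes/']).map((p) => new RegExp(p));

console.log(single.length, several.length); // 1 2
```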

And finally, we need to remember the rewriting options. Again, we care about a regexp for the inbound path.

<Switchboard constructor>+= (U->) [<-D]
    if routing_options and routing_options.rewrites?
        @rewrites = ({match: new RegExp(r[0]), rewrite: r[1]} for r in routing_options.rewrites)

You can’t tell me that Coffeescript’s comprehensions aren’t cool.

Next, we want to be able to determine if a given request matches our switchboard. That’s a fairly straightforward RegExp match. When we get a hit, we want to return the route object associated with the match. If we don’t get a hit, return null.

<Switchboard URL matcher>= (U->)
match: (path) ->
    for route in @routes
        m = route.match.exec(path)
        if m
            return route
    null

And finally, now that we’ve had a match, we want to proxy that match. The handler has all the context information we need.

<Switchboard proxy dispatch>= (U->)
proxy: (handler, req, res) ->
    # Rewrite the URL in place, then hand the request to the back-end's proxy.
    req.url = @rewrite(req.url)
    handler.proxy.proxyRequest req, res, null

Rewrites are probably more complicated than the proxy itself. We have to get the URL right, but we can’t afford to confuse the HttpProxy, which has its own needs about URLs. The popular convention is to replace $1 with the first capture group of the RegExp, $2 with the second, and so on. Here, we use Node’s url.parse() method to create a URL object and perform a match against our list of rewrites. If we get a hit, we make a new URL object, copying the fields from the original, and then mangle the pathname. We transform this new object back into a standard URL for HttpProxy to make sense of, and return it. Otherwise, we return the original URL unchanged.

<Switchboard URL rewriter>= (U->)
rewrite: (ourl) ->
    path = url.parse(ourl)
    for check in @rewrites
        m = check.match.exec(path.pathname)
        if m
            resp = {}
            for i of path
                if path.hasOwnProperty(i)
                    resp[i] = path[i]
            resp.pathname = check.rewrite
            # Exclusive range: with no capture groups, [1..0] would count down through 1 and 0.
            for i in [1...m.length]
                resp.pathname = resp.pathname.replace(@substitutions[i], m[i])
            return url.format(resp)
    return ourl

The @substitutions are pre-cached regular expressions for $0 through $10, indexed so that @substitutions[i] matches $i:

<Substitution regexps for URL rewriting>= (U->)
substitutions: (new RegExp('\\$' + i) for i in [0..10])
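To make the $1 convention concrete, here is a standalone sketch of the rewrite step in plain JavaScript. It is not the switchboard’s code: the `rewritePath` helper and the `^/soundpodcast/(.*)` rule with its capture group are assumptions for the example, and it works on a bare pathname instead of a full URL object.

```javascript
// Hypothetical rule: capture everything after the prefix and keep it.
const rewrites = [
  { match: /^\/soundpodcast\/(.*)/, rewrite: '/$1' },
];

// Apply the first matching rule; return the path unchanged otherwise.
function rewritePath(pathname) {
  for (const check of rewrites) {
    const m = check.match.exec(pathname);
    if (!m) continue;
    let result = check.rewrite;
    // Start at 1: m[0] is the whole match, m[1] the first group.
    for (let i = 1; i < m.length; i++) {
      // A function replacement inserts m[i] literally, even if it contains "$".
      result = result.replace('$' + i, () => m[i]);
    }
    return result;
  }
  return pathname;
}

console.log(rewritePath('/soundpodcast/episode/12')); // /episode/12
console.log(rewritePath('/180minutes/index.html'));   // /180minutes/index.html
```

With the rule from the configuration earlier, which has no capture group, every matching path becomes `/`, because the rewriter replaces the whole pathname with the rewrite template.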


And our class looks like:

<Class Switchboard>= (U->)
class Switchboard extends events.EventEmitter
      <Substitution regexps for URL rewriting>

      <Switchboard constructor>

      <Switchboard URL matcher>

      <Switchboard URL rewriter>

      <Switchboard proxy dispatch>

It’s nice and all that we have this class, but for our purposes we’re going to create a single function that pulls it all together, and handles the case where there is no match. And then we’ll export this function so other people can use it:

<export the switchboard function>= (U->)
module.exports = (routes, options) ->
    switchboard = new Switchboard routes, options
    (req, res, next) ->
        path = url.parse(req.url)
        handler = switchboard.match(path.pathname)
        if handler
            return switchboard.proxy handler, req, res, next
        try
            res.writeHead 404
            res.end()
        catch error
            console.error("res.writeHead/res.end error: %s", error.message)
        undefined
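One way to check the no-match branch without starting any servers is to call a handler with stand-in objects. The sketch below is plain JavaScript and does not load the real module: `makeHandler` is a simplified stand-in for the exported function, with a hypothetical `dispatch` callback in place of `switchboard.proxy`, and `fakeResponse` imitates only the two response methods used here.

```javascript
// Simplified stand-in for the exported function: match or 404.
function makeHandler(patterns, dispatch) {
  return (req, res) => {
    const pathname = req.url.split('?')[0]; // ignore any query string
    const hit = patterns.find((p) => p.test(pathname));
    if (hit) return dispatch(hit, req, res);
    res.writeHead(404);
    res.end();
  };
}

// Minimal fake response that records what the handler did.
function fakeResponse() {
  return {
    status: null,
    ended: false,
    writeHead(code) { this.status = code; },
    end() { this.ended = true; },
  };
}

const dispatched = [];
const handler = makeHandler([/^\/soundpodcast\//], (hit, req) => dispatched.push(req.url));

const missed = fakeResponse();
handler({ url: '/nowhere' }, missed);                       // no route: 404
handler({ url: '/soundpodcast/feed?x=1' }, fakeResponse()); // routed

console.log(missed.status, missed.ended); // 404 true
console.log(dispatched.length);           // 1
```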

So now our programs become, first, our library:

<switchboard.coffee>=
events = require('events')
HttpProxy = require('http-proxy').HttpProxy
url = require('url')

<Class Switchboard>

<export the switchboard function>

And secondly, our example:

<dispatchexample.coffee>=
Connect = require('connect')
httpProxy = require('http-proxy')
switchboard = require('./switchboard')

<configure and run the application>

As always, this code, in both the Literate Programming original and a fully functional NodeJS/Coffeescript library, is available at my github under elfsternberg/node-http-proxy-switchboard.

3 Responses to Switchboard: A Reverse Proxy Handler for URL-based Namespacing

Jaddy

November 25th, 2011 at 5:03 am

Having had to solve a similar problem this week (i.e. mobile servers behind their carrier’s NAT being publicly available), I found Varnish (varnish-cache.org), which seems to me to do everything you need.

Did you evaluate varnish, and if so: why did you find it not suitable for your configuration?

Elf Sternberg

November 28th, 2011 at 10:20 am

We looked at Varnish, but were unhappy with both the configuration issue and the upload scale issue. It is the latter that seems to kill a lot of proxy servers. Someone recommended Pound to us, but that didn’t work either.

Chuck

August 14th, 2012 at 1:41 am

Hi Elf, I found this blog post after your SO question. Thanks for providing this. Very informative, and useful to me as I’m in a similar position.

To speak for this solution rather than Varnish and Pound, I’d like to add that for some (e.g. me) https is a crucial factor – albeit you could use varnish internally, but then you’d still need a reverse proxy on top of this.

And lastly, not to nitpick, only to make this post perfect:
“Ngnix buffers requests before passing them on to the back-end service, and it doesn’t speak HTTP/1.0”, you mean HTTP/1.1 right?

Thanks again for a nice blog post 🙂
