Elf Sternberg: Coffeescript and Node-Promise, for a simple CGI script

This is a simple little program that I wrote mostly to practice using Coffeescript, Node, and Promises. I have a lot of little web projects going on all the time on my PC, and keeping track of them all is a sometimes difficult task. I wanted to make that task easier.

The basics of the task are simple: get a list of all the ports where I usually drop off a web-oriented project, try to get the home page, and if it's there, try to get the title out of the HTML.

For this project, I used the excellent node-promise library, mostly because its behavior most closely matched that of jQuery's Deferreds library. I also used the scraper library for screen-scraping the HTML; Scraper actually returns a jQuery object suitable for manipulating on the client.

Although Node is famous for being asynchronous, let's face it: there is an order in which some things must be done. In this case, you must get all the ports, then visit every port to get the title, and then print the results. Because I'm going to use Haml, I must also get the template; this can happen in parallel with, well, just about everything else. But we cannot display the results until we have all the titles and the template.

Literate Program

As is my usual practice, this article was written with the Literate Programming toolkit Noweb. Where you see something that looks like <this>, it's a placeholder for code described elsewhere in the document. Placeholders with an equal sign at the end of them indicate the place where that code is defined. The link (U->) indicates that the code you're seeing is used later in the document, and (<-U) indicates it was used earlier but is being defined here.

The Program

The first step is to get all the ports. Netstat is the cheapest way to do that, and spawning processes and reading from them is something Node does very well.

The only thing of note here is the promise. This function returns a promise that, when resolved, yields everything netstat wrote to its standard output.

<get ports>= (U->)
get_ports = () ->
    promise = new deferred.Promise()
    data = ''

    accrue = (d) ->
        data += d

    netstat = spawn 'netstat', ['-anp', '-t', 'tcp']
    netstat.stdout.on 'data', accrue

    netstat.on 'exit', () ->
        promise.resolve(data)

    promise
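
The same spawn-and-accumulate pattern can be sketched with the Promise built into current versions of Node, which is an assumption on my part; the article itself uses node-promise. The helper name `run` is hypothetical. One detail differs from the chunk above: Node's documentation notes that a child's stdio streams may still be open when 'exit' fires, so this sketch waits for 'close' instead.

```coffeescript
{spawn} = require 'child_process'

# Hypothetical helper (not part of the article): run a command and resolve
# with everything it wrote to stdout. Uses the built-in Promise, not node-promise.
run = (cmd, args) ->
    new Promise (resolve, reject) ->
        chunks = []
        child = spawn cmd, args
        child.stdout.on 'data', (d) -> chunks.push d
        child.on 'error', reject
        # 'close' is emitted after the stdio streams have been closed,
        # so no trailing output is dropped.
        child.on 'close', ->
            resolve Buffer.concat(chunks).toString('utf8')

# Usage: run the current Node binary so the example does not depend on netstat.
result = run process.execPath, ['-e', 'console.log("3000")']
result.then (out) -> console.log out.trim()
```

Swapping in `run 'netstat', ['-anp', '-t', 'tcp']` would give roughly the behavior of get_ports().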

Once we have the ports, we want to de-dupe them, as netstat sometimes returns duplicates. The de-dupe is trivial in coffeescript:

<de-duplicate an array>= (U->)
dedupe = (arr) ->
    obj={}
    for i in arr
        obj[i] = 0
    for i of obj
        i
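
One side effect of the object-key trick is worth seeing in action: object keys are strings, so the numbers that go in come back out as strings. That is harmless here, since the ports are only pasted into a URL. The sketch below repeats the chunk above and compares it with a hypothetical variant, `dedupeKeepType`, which assumes a Node version that has `Set`.

```coffeescript
# The article's version: object keys de-duplicate, but come back as strings.
dedupe = (arr) ->
    obj = {}
    for i in arr
        obj[i] = 0
    for i of obj
        i

# Hypothetical variant (not in the article): a Set keeps the original
# values, their types, and their first-seen order.
dedupeKeepType = (arr) ->
    Array.from new Set(arr)

console.log dedupe([8080, 3000, 8080])          # two entries, both strings
console.log dedupeKeepType([8080, 3000, 8080])  # two entries, both numbers
```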

For each port, we want to get the title. Nothing actually blocks in Node, so we wrap each request in a promise; that lets the rest of the program wait until every request has finished.

<get titles>= (U->)
get_title = (port) ->
    promise = new deferred.Promise()
    scraper 'http://localhost:' + port, (err, jQuery) ->
        if err
            promise.resolve [port, err.message]
            return
        promise.resolve [port, jQuery('title').text()]
    promise
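
Note that get_title() resolves even when the scrape fails, putting the error message where the title would go. A minimal sketch of why that matters, using the built-in Promise and a hypothetical `fakeScrape` function in place of the real scraper library: because no promise is ever rejected, one dead port cannot abort the whole batch.

```coffeescript
# Hypothetical stand-in for the scraper call: only port 3000 "answers".
fakeScrape = (port, callback) ->
    if port is 3000
        callback null, 'My Project'
    else
        callback new Error('connection refused')

getTitle = (port) ->
    new Promise (resolve) ->
        fakeScrape port, (err, title) ->
            # Resolve in both cases; the error message takes the title's place.
            text = if err then err.message else title
            resolve [port, text]

promises = (getTitle(p) for p in [3000, 3001])
batch = Promise.all promises
batch.then (rows) -> console.log rows
```

Both ports end up in `rows`, one with a title and one with an error message, which is what the template needs in order to list every port.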

We want to get the titles from all of the ports, and then spew out the results. As this is a CGI program, we want a simple header.

It's that double deferred.all() that makes the difference. when() takes a promise as an argument and calls the handler once that promise resolves. deferred.all() takes a bunch of promises and returns a single promise that resolves when all of the promises passed in finish. So here, we're saying that all of the get_title() operations, and the getTemplate() operation, must complete before we go on to render the results. Notice how the data handed to the callback is a pair: the array from the inner deferral, and the template.

<display ports>= (U->)
display_ports = (data) ->
    <return matched ports>

    <get template file>

    matches =  dedupe(matcher(i) for i in data.split(/\n/) when matcher(i))
    promises = (get_title(i) for i in matches)
    deferred.when deferred.all(deferred.all(promises), getTemplate()), (data) ->
        [data, template] = data
        console.log("Content-type: text/html\r\n\r\n")
        handler = haml(template)
        console.log handler({data: data})
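
The nesting is easier to see in isolation. This is a sketch using the built-in Promise.all, which takes a single array, rather than node-promise; `getTitle` and `getTemplate` here are hypothetical stand-ins that resolve immediately.

```coffeescript
# Hypothetical stand-ins for get_title() and getTemplate().
getTitle = (port) -> Promise.resolve [port, "Site on #{port}"]
getTemplate = -> Promise.resolve '%ul'

# Inner all(): one promise for the whole batch of titles.
titles = Promise.all (getTitle(p) for p in [3000, 8080])

# Outer all(): waits for that batch *and* the template.
ready = Promise.all [titles, getTemplate()]

ready.then ([rows, template]) ->
    console.log rows      # the array produced by the inner all()
    console.log template  # the template text
```

The destructured argument mirrors the `[data, template] = data` line in the chunk above.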

The matcher is just a regular expression check:

<return matched ports>= (<-U)
matcher = (i) ->
    r = (/^.{20}.*?\:(\d+)/).exec(i)
    if not r
        return null
    r = parseInt(r[1])
    if (r >= 3000 and r < 3099) or (r >= 8000 and r < 8300) or (r == 80) or (r == 81)
        return r
    null
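
To see what the expression does, here is a sketch that feeds the matcher a few made-up rows. It assumes the Linux netstat layout, where the Proto, Recv-Q, and Send-Q columns fill the first 20 characters; the `row` helper and the sample addresses are hypothetical.

```coffeescript
# Copy of the article's matcher (plus an explicit radix), so this block runs on its own.
matcher = (i) ->
    r = (/^.{20}.*?\:(\d+)/).exec(i)
    if not r
        return null
    r = parseInt(r[1], 10)
    if (r >= 3000 and r < 3099) or (r >= 8000 and r < 8300) or (r == 80) or (r == 81)
        return r
    null

# Hypothetical helper: lay out a row in the assumed netstat column widths.
# The first three columns plus their separators come to exactly 20 characters.
row = (local, foreign) ->
    ['tcp'.padEnd(5), '0'.padStart(6), '0'.padStart(6),
     local.padEnd(23), foreign.padEnd(23), 'LISTEN'].join ' '

samples = [
    row('127.0.0.1:3000', '0.0.0.0:*')   # inside the 3000 range
    row(':::8080', ':::*')               # IPv6 wildcard address
    row('0.0.0.0:22', '0.0.0.0:*')       # ssh, not a watched port
    row('127.0.0.1:3099', '0.0.0.0:*')   # upper bound is exclusive
]
console.log(matcher(s) for s in samples)
```

The first two rows yield their port numbers and the last two yield null. The leading `.{20}` skips the fixed columns, and the lazy `.*?` then stops at the first colon that is followed by digits, which is the local port.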

And the template get is equally trivial. dReadFile is an asynchronous read function from node-promise's fs-promise module; it returns a promise, the resolution of which is the contents of the file.

<get template file>= (<-U)
getTemplate = () ->
    dReadFile('layout.haml', 'utf8')
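
For readers without node-promise at hand, a function with the same shape can be sketched on top of the standard fs module and the built-in Promise. The name `readFileP` and the temporary file are hypothetical; this illustrates the idea and is not how node-promise implements it.

```coffeescript
fs = require 'fs'
os = require 'os'
path = require 'path'

# Hypothetical equivalent of dReadFile, built on the standard fs module.
readFileP = (file, encoding) ->
    new Promise (resolve, reject) ->
        fs.readFile file, encoding, (err, text) ->
            if err then reject err else resolve text

# Usage with a throwaway file standing in for layout.haml.
tmp = path.join os.tmpdir(), "layout-sketch-#{process.pid}.haml"
fs.writeFileSync tmp, '%h1 Ports\n'
template = readFileP tmp, 'utf8'
template.then (text) -> console.log text
```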

The whole of the program becomes:

<counter.cgi>=
#!/usr/bin/coffee

deferred = require('promise')
dReadFile = require('fs-promise').readFile
spawn = require('child_process').spawn
scraper = require('scraper')
haml = require('haml')

<get ports>

<get titles>

<de-duplicate an array>

<display ports>

deferred.when get_ports(), display_ports

And that is pretty much it. The last line launches the script, and guarantees the process runs in the right order.

Cakefile

<Cakefile>=
exec = require('child_process').exec

task 'build', 'Build the main program out of Noweb', ->
    exec 'notangle -c -Rcounter.cgi counter.nw > counter.cgi', (err) ->
        console.log err if err

xelatex_cmd = ('xelatex counter.tex; ' +
    'while grep -s "Rerun to get cross-references right" counter.log; ' +
    'do xelatex counter.tex;\n done')

task 'docs', 'Build the PDF of this document', ->
    exec 'noweave -x -delay counter.nw > counter.tex', (err, stdout) ->
        if err
            console.log err
            return
        exec xelatex_cmd, (err) ->
            console.log err if err

task 'html', 'Build the HTML version of this document', ->
    exec 'noweave -filter l2h -delay -index -autodefs c -html counter.nw > counter_doc.html', (err) ->
        if err
            console.log err

Index

  * _<Cakefile>_: D1
  * _<counter.cgi>_: D1
  * _<de-duplicate an array>_: D1, U2
  * _<display ports>_: D1, U2
  * _<get ports>_: D1, U2
  * _<get template file>_: U1, D2
  * _<get titles>_: D1, U2
  * _<return matched ports>_: U1, D2

Warning

Okay, so this is a fairly simple program. It also requires a ton of stuff be installed in a directory where the CGI is going to be run from, so it exposes a lot of stuff you might not want to expose. Like I said, this was an experiment.

Source Code

The source code is available from GitHub at PortProject. Yeah, it's a boring name. Also, at this moment there is a bug in jsdom (Fix ReferenceError in the scanForImportRules helper function) that causes this script to spew warnings about CSS parsing. Those can safely be ignored (really!).

Elf M. Sternberg
