Socket hang up crashes fixed with node.js domains

@tomgco and I were hacking late on a new Clock node.js project. The caffeine-fueled @tomgco loves pounding browser refresh like a freaking machine gun, and then I hear “Oh, my web server has crashed!” Developers pummel refresh; it's a fact of life, but it doesn't normally cause the httpServer to crash. Earlier that day we'd upgraded node to 0.8.20, so it didn't take long to turn our attention to the changelog and then on to a tweet that Tom had spotted.

https://twitter.com/nodejs/status/303893363877363712

‘No more leaking memory’: this killer line fills me with mixed emotions. Memory leaks are our new worst enemy since switching to node.js, sneaking up on us, killing our services at peak times and keeping me up at night reading the dtrace manual. Naturally I’m ecstatic to find there will be fewer of them, but at the same time, ALL MY NODE APPS ARE LEAKING MEMORY and the fix requires a code change. Dang!

After some googling and testing we confirmed the following fix in 0.8.20 was now causing our development web server to crash:

  http: Raise hangup error on destroyed socket write (isaacs)

Here is the original commit:

https://github.com/isaacs/node/commit/e261156e7386e3d870543bee4218c7f106bfcf22

Here is the pull request that brought it down to the stable branch: https://github.com/joyent/node/pull/4775

We also found that issues were already coming in: https://github.com/ether/etherpad-lite/issues/1541 https://github.com/LearnBoost/socket.io/issues/1160

In case you missed it, this isn’t going to get fixed properly in 0.8:

“The proper fix is to treat ECONNRESET correctly. However, this is a behavior/semantics change, and cannot land in a stable branch. So, the full-of-sad bandaid fix is to not put data into the output buffer if the socket is destroyed, and also remove anything that is in the output buffer when the HTTP request sees that it closes.” - isaacs

We just needed a ‘bandaid’ on our 0.8 apps, and I was actually glad to have a good reason to retrofit Domains around our apps.

The Problem

Below is a simple web server that waits 5 seconds before responding. This will error in 0.8.20 when the client connection hangs up.

var http = require('http')

http.createServer(function (req, res) {

  // Wait 5 seconds before responding
  setTimeout(function () {
    res.writeHead(200, {'Content-Type': 'text/plain'})
    res.end('Hello World\n')
  }, 5000)

}).listen(1337, '127.0.0.1')

setInterval(function () {
  console.log(process.memoryUsage().rss)
}, 2000)

console.log('Server running at http://127.0.0.1:1337/')

With the server running on a version of node before 0.8.20, you can run:

  curl http://127.0.0.1:1337/ & sleep 2 && killall curl

This kills the connection after 2 seconds. You won't see any errors from the server, but you will get a memory leak.

Switch to 0.8.20 (we use nave to quickly switch node versions):

  nave use 0.8.20

Run the server, then run the curl one-liner again:

  curl http://127.0.0.1:1337/ & sleep 2 && killall curl

You'll see the server error and die:

timers.js:103
            if (!process.listeners('uncaughtException').length) throw e;
                                                                      ^
Error: socket hang up
    at createHangUpError (http.js:1360:15)
    at ServerResponse.OutgoingMessage._writeRaw (http.js:507:26)
    at ServerResponse.OutgoingMessage._send (http.js:476:15)
    at ServerResponse.OutgoingMessage.write (http.js:740:18)
    at ServerResponse.OutgoingMessage.end (http.js:882:16)
    at Object._onTimeout (/socket-hangup/server.js:8:9)
    at Timer.list.ontimeout (timers.js:101:19)
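
The trace shows the error being raised when the timer callback calls res.end() on a connection that has already gone away. As a minimal sketch of a narrower approach than wrapping everything in a domain, the pending work can be cancelled when the response emits 'close', which signals that the connection was terminated before res.end() was called. The respondLater helper and its state object are hypothetical names for illustration, and the sketch assumes the 'close' event fires before the timer does.

```javascript
// Hypothetical helper (not from the original post): respond after a delay,
// but cancel the pending write if the client hangs up first.
function respondLater(res, delayMs, body) {
  // Small state object so callers can see what happened
  var state = { cancelled: false, sent: false }

  var timer = setTimeout(function () {
    state.sent = true
    res.writeHead(200, {'Content-Type': 'text/plain'})
    res.end(body)
  }, delayMs)

  // 'close' is emitted when the connection is terminated before res.end()
  res.on('close', function () {
    clearTimeout(timer)
    if (!state.sent) {
      state.cancelled = true
    }
  })

  return state
}
```

In the server above, the body of the request handler would become respondLater(res, 5000, 'Hello World\n'). This only protects the one timer it knows about, so it is not a general replacement for catching the error.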

Our Solution

Wrap the request and response in a domain.

var http = require('http')
  , domain = require('domain')
  , serverDomain = domain.create()

// Domain for the server
serverDomain.run(function () {

  http.createServer(function (req, res) {

    var reqd = domain.create()
    reqd.add(req)
    reqd.add(res)

    // On error dispose of the domain
    reqd.on('error', function (error) {
      console.error('Error', error, req.url)
      reqd.dispose()
    })

    // Wait 5 seconds before responding
    setTimeout(function () {
      res.writeHead(200, {'Content-Type': 'text/plain'})
      res.end('Hello World\n')
    }, 5000)

  }).listen(1337, '127.0.0.1')

})


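// Log resident memory every 2 seconds.
// gc() is only defined when node is started with --expose-gc
setInterval(function () {
  console.log(process.memoryUsage().rss)
  if (typeof gc === 'function') {
    gc()
  }
}, 2000)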

console.log('Server running at http://127.0.0.1:1337/')
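
If several servers need the same treatment, the per-request domain setup can be pulled out into a small wrapper. The following is a minimal sketch, not the code we run: withRequestDomain and its onError callback are hypothetical names. Unlike the server above, it also runs the handler inside the request's domain, on the assumption that you want timers created by the handler bound to that domain too.

```javascript
var domain = require('domain')

// Hypothetical helper (not from the original post): wrap a (req, res) handler
// so that every request gets its own domain.
function withRequestDomain(handler, onError) {
  return function (req, res) {
    var reqd = domain.create()
    reqd.add(req)
    reqd.add(res)

    reqd.on('error', function (error) {
      onError(error, req, res)
      // dispose() exists in node 0.8 but was removed from later versions
      if (typeof reqd.dispose === 'function') {
        reqd.dispose()
      }
    })

    // Running the handler inside the domain also binds timers and callbacks
    // created by the handler to this request's domain
    reqd.run(function () {
      handler(req, res)
    })
  }
}
```

It would be used as http.createServer(withRequestDomain(handler, logError)), where handler is the usual request callback and logError is whatever reporting function you prefer.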

Express

If you are using Express 3, you can apply a fix like this:

var http = require('http')
  , domain = require('domain')
  , serverDomain = domain.create()
  , express = require('express')
  , app = express()

app.get('/', function (req, res) {

  // Wait 5 seconds before responding
  setTimeout(function () {
    res.send('Hello World')
  }, 5000)

})

// Domain for the server
serverDomain.run(function () {

  http.createServer(function (req, res) {

    var reqd = domain.create()
    reqd.add(req)
    reqd.add(res)

    // On error dispose of the domain
    reqd.on('error', function (error) {
      console.error('Error', error.code, error.message, req.url)
      reqd.dispose()
    })

    // Pass the request to express
    app(req, res)

  }).listen(1337, '127.0.0.1')

})

setInterval(function () {
  console.log(process.memoryUsage().rss)
  if (typeof gc === 'function') {
    gc()
  }
}, 2000)

console.log('Server running at http://127.0.0.1:1337/')
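
The same idea can also be expressed as middleware, which avoids wrapping the app in your own http.createServer call. This is a rough sketch under the assumption that the middleware is registered before any routes; requestDomain is a hypothetical name, and it only logs the error because the client has usually gone by the time the handler fires.

```javascript
var domain = require('domain')

// Hypothetical middleware (not from the original post): give each request
// its own domain before the rest of the middleware stack runs.
function requestDomain(req, res, next) {
  var reqd = domain.create()
  reqd.add(req)
  reqd.add(res)

  reqd.on('error', function (error) {
    // Log only: there is usually nobody left to send a response to
    console.error('Error', error.code, error.message, req.url)
    // dispose() exists in node 0.8 but was removed from later versions
    if (typeof reqd.dispose === 'function') {
      reqd.dispose()
    }
  })

  // Continue down the middleware stack inside the request's domain
  reqd.run(next)
}
```

Register it with app.use(requestDomain) ahead of the route definitions so that every route handler runs inside the request's domain.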

We’ve not got this in production yet but this patch looks like it is going to get us by. If you have a better solution please let us know.
