This weekend, I was checking Google Webmaster Tools for my blog when I noticed that the site property was missing. Google had failed to verify my site because it couldn't find the Google Analytics tracking code (I chose the GA tracking method to verify the site in Webmaster Tools).
I was pretty sure the tracking code was installed. It must be a bug on Google's side, I thought.
I clicked the button to check my site again. To my surprise, it reported that my site had failed to load. I opened a new tab and entered my site URL, and all I got was a spinner until, a few minutes later, Google Chrome displayed an error saying something like "connection timed out".
That's weird. I use `now` to deploy my blog, and I assumed it would restart the server if there was an error. I searched the ZEIT community Slack channel and confirmed that, indeed, `now` restarts my app if it crashes.
But it didn't.
I started investigating this weird behavior. The only clue I had was the latest error report from Sentry:
Error: connection timeout
node_modules/f13b11854ffeb4f8d2920fd5cedc61d358741cfd/lib/drivers/node-mongodb-native/connection.js in Db.<anonymous> at line 168:17
I think I know what crashed my server. Occasionally, my (free) MongoDB server becomes unresponsive, causing a timeout whenever a page that needs a database connection is requested.
The question is: why didn't `now` restart my app after the crash?
To make sure it wasn't `now`'s fault, I tried to reproduce this behavior locally.
`now` uses something similar to `pm2` to manage services and restart them in the event of an error. I installed both binaries on my local machine and started the server. Once the server was running, I killed my local `mongod` instance and then visited the page. Both `now` and `pm2` printed the error when the page was visited after `mongod` was killed.
Interestingly, neither of them tried to restart the server after the error. Even after I started `mongod` again and revisited the page, the server was still down.
Here's the server code that connects to the database and creates an HTTP server.
As you can see, I had already handled the error case by exiting the process if the connection failed. So if there was an error, the app would quit and `now` would restart it.
It turns out I had only added error handling for the initial connection to the MongoDB server. Any connection failure after the initial connection crashed the server, with no way to recover unless it was restarted manually.
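To make the gap concrete, here is a minimal sketch, not the post's original code. It uses Node's built-in `events` module and a hypothetical `fakeConnect()` stand-in for the driver (not Mongoose's real API): the initial result is reported through a callback, and later events go through an `EventEmitter`. With no listener attached, emitting a later `disconnected` event does nothing and `emit()` returns `false`, so the process keeps running. A supervisor that restarts apps only when their process exits would then have nothing to restart. That is one plausible reading of the symptoms, not something the post confirms.

```javascript
const EventEmitter = require('events');

// Hypothetical stand-in for a driver connection (NOT Mongoose's API):
// the initial result is reported via callback, later events via EventEmitter.
function fakeConnect(failInitially, callback) {
  const connection = new EventEmitter();
  callback(failInitially ? new Error('initial connect failed') : null);
  return connection;
}

// "Before" pattern: only the initial connection error is handled.
let exitRequested = false;
const connection = fakeConnect(false, (err) => {
  if (err) exitRequested = true; // the real app would call process.exit(1) here
});

// Later, the database goes away. Nobody listens for this event, so
// emit() returns false and nothing happens: the process stays alive
// while every DB-backed request keeps failing.
const someoneHandledIt = connection.emit('disconnected');

console.log(exitRequested, someoneHandledIt); // false false
```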
Working on a Fix
The fix is quite straightforward; I found it by searching for Express and Mongoose integration examples. I didn't know that `mongoose.connect` returns an object with a `connection` property that behaves like an `EventEmitter`. This `connection` property can listen to various events, such as `error`, `open`, and `disconnected`.
The initial error handling moves into the callback for the `error` event. Attaching the Express listener moves into the callback for the `open` event. The only handler missing from the previous code is the one for the `disconnected` event. This is where we handle connection failures that happen mid-request: we simply reconnect to the MongoDB server.
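The handler wiring described above can be sketched without a real database. This is a minimal sketch under assumptions, not the post's actual code: a hypothetical `FakeConnection` built on Node's `EventEmitter` stands in for the object exposed by `mongoose.connect`, its `reconnect()` method is invented for illustration (in the real app this is where the connect call would be repeated), and the `process.exit(1)` and `app.listen()` calls are replaced by flags so the flow is observable.

```javascript
const EventEmitter = require('events');

// Hypothetical stand-in for the connection object (NOT Mongoose's API).
// reconnect() only counts calls; a real driver would open a new connection.
class FakeConnection extends EventEmitter {
  constructor() {
    super();
    this.reconnects = 0;
  }
  reconnect() {
    this.reconnects += 1;
  }
}

let listening = false;
let exitCode = null;

const connection = new FakeConnection();

// 'error': takes over the old initial-connection check. Exiting lets
// the process manager restart the app.
connection.on('error', (err) => {
  console.error('DB error:', err.message);
  exitCode = 1; // the real app would call process.exit(1) here
});

// 'open': only start accepting HTTP requests once the DB is ready.
connection.on('open', () => {
  listening = true; // the real app would call app.listen(port) here
});

// 'disconnected': the previously missing handler. Reconnect instead of
// leaving the process alive with a dead connection.
connection.on('disconnected', () => {
  connection.reconnect();
});

// Simulate the lifecycle: connect, then lose the database twice.
connection.emit('open');
connection.emit('disconnected');
connection.emit('disconnected');

console.log(listening, connection.reconnects, exitCode); // true 2 null
```

Registering all three listeners before any event fires matters: an `EventEmitter` does not replay past events, so a listener attached after `open` has already fired would never run.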
Here's the code after the fix:
Honestly, I still don't know why these crashes are unrecoverable. But at least for now, my blog is up and running again. I also added health check monitoring just in case.
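The post doesn't say what the health check monitors. One option, sketched here purely as an assumption, is an endpoint that reports whether the database connection is up, so an external uptime monitor notices when the process is alive but the database is not. The `dbConnected` flag, the `healthStatus()` helper, and the `/health` path are hypothetical names; the flag would be set from the connection's `open` and `disconnected` listeners. The server is created but not started, so the snippet runs and exits on its own.

```javascript
const http = require('http');

// Hypothetical flag, meant to be set to true in the 'open' listener
// and back to false in the 'disconnected' listener.
let dbConnected = false;

// Pure function so the decision can be tested without opening a socket.
function healthStatus() {
  return dbConnected
    ? { status: 200, body: 'ok' }
    : { status: 503, body: 'database unavailable' };
}

// Hypothetical /health endpoint; a monitor treats non-200 as "down".
const server = http.createServer((req, res) => {
  if (req.url === '/health') {
    const { status, body } = healthStatus();
    res.writeHead(status, { 'Content-Type': 'text/plain' });
    res.end(body);
    return;
  }
  res.writeHead(404);
  res.end();
});
// In the real app: server.listen(port) inside the 'open' listener.
```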