
Monday, September 28, 2020

ElasticBeanstalk HTTP Timeouts

Web servers with NodeJS are fast, right? Yep, most of the time. The use case for our system is that users collect data with photos on their mobile app offline until they send their data. It's a nice hand-shake that the data exists on the device until the server receives it, so there is no data loss, and working offline was a requirement for the app. Typically an upload is a big JSON object and some photos, a few hundred K in most cases. Recently one of our auditors kept a week's worth of data with a lot of photos, and got a timeout when trying to send it: at 60 seconds the upload fails. (We don't chunk it into multiple requests, to maintain that handshake, and in 99.99% of cases it is never a problem.)

We won't release a new mobile version for this single user, so the change needs to be made on the servers. Ours run on ElasticBeanstalk, which is a fairly standard setup, in that our traffic follows this flow:

LoadBalancer -> Instances, and each instance NGINX -> NodeJS

Several of these stages can time out an HTTP request, mostly via their defaults. I had a reproducible HTTP request that I made through Chrome, watching the results in the Dev Tools Network window. Initial testing died at exactly 60 seconds.
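If you'd rather test from the command line than Dev Tools, curl's write-out variables give the same timing read-out. The URL below is a placeholder for the real upload endpoint, and the real test was a POST with a body, but the timing output works the same way:

```shell
# -w prints timing details after the transfer completes or dies;
# --max-time caps curl's own client-side timeout so it outlasts the server's
curl -o /dev/null -s -w 'HTTP %{http_code} after %{time_total}s\n' \
     --max-time 360 https://example.com/
```

Whichever stage kills the connection, `%{time_total}` tells you exactly how many seconds it survived, which is how you can tell one layer's timeout from another's.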

LoadBalancer

We are using AWS, so go to the EC2 service, choose Load Balancers on the left, and change the "Idle Timeout."
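If you'd rather keep that setting in version control instead of clicking through the console, Elastic Beanstalk also exposes it as an option setting in an `.ebextensions` config file. This sketch assumes a Classic Load Balancer, which is what the EC2 console path above suggests:

```yaml
# .ebextensions/timeouts.config
option_settings:
  aws:elb:policies:
    ConnectionSettingIdleTimeout: 300
```

Environments fronted by an Application Load Balancer use the `aws:elbv2:loadbalancer` namespace and its `IdleTimeout` option instead.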

NGINX

There are two places to change here: NGINX itself (which is really handling all the web traffic) and the NGINX-to-Node proxy.

I made these changes in our server { context:

        client_max_body_size 50M;
        client_body_timeout 300s;
        keepalive_timeout 300s;
        send_timeout 300s;

And these in our location /api { block (which is a proxy_pass to the NodeJS upstream):

            proxy_send_timeout 300s;
            proxy_read_timeout 300s;
            send_timeout 300s;

These NGINX directives are all documented at http://nginx.org/en/docs/dirindex.html. The idea is to let the client connection to NGINX stay alive and upload for a while (the server changes), and to have NGINX keep its connection to Node open longer (the location changes). These all had defaults in the 30s to 60s range.

This worked well! My test request now made it to 120s, past the previous 60s ceiling, but still not the 300s the settings suggested.

NodeJS

The default Node http server has timeouts in two places. First is the server-wide timeout, and second is the per-request (socket) timeout. The server-wide timeout was 120s but defaults to 0 in Node 13 (and though we are using v12.18, this one was not our issue). Each request can also have its own timeout, so I gave our two problematic endpoints a longer one like this:

req.socket.setTimeout(5 * 60 * 1000) // 5 minute timeout!

That was the final one. My test call actually completed in 4.3 minutes, so the extended timeouts worked through the full system.