Squid Proxy Behind a Load Balancer on AWS

Squid is proxy software that lets a computer without internet access reach the internet through another computer that does have it.

Squid is very easy to set up. The computer that needs internet access only has to set two environment variables, HTTP_PROXY and HTTPS_PROXY, both with the value http://squid.ip.address:3128/
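As a rough illustration of the client side, here is a minimal Python sketch. The helper names proxy_env and squid_opener are made up for this example, and squid.ip.address is the placeholder from above, not a real host. It shows the two environment variables and, as an alternative for a Python client, an explicit urllib opener that routes traffic through Squid.

```python
import urllib.request

# Placeholder address taken from the text; replace with your Squid host.
SQUID_URL = "http://squid.ip.address:3128/"


def proxy_env(squid_url):
    """Return the environment variables a client needs to use the proxy."""
    return {"HTTP_PROXY": squid_url, "HTTPS_PROXY": squid_url}


def squid_opener(squid_url):
    """Build a urllib opener that sends http and https traffic via Squid."""
    handler = urllib.request.ProxyHandler({"http": squid_url, "https": squid_url})
    return urllib.request.build_opener(handler)


if __name__ == "__main__":
    # Print shell export lines that could be pasted on the client machine.
    for name, value in proxy_env(SQUID_URL).items():
        print(f"export {name}={value}")
```

Most command line tools pick up the environment variables on their own, so the explicit opener is only needed when you want a single program to use the proxy without changing the environment.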

The complication comes when you need a Squid instance (running on an EC2 instance) to sit behind an AWS load balancer. This is usually done for reasons such as service redundancy, uptime guarantees or reducing the impact of maintenance windows.

Unfortunately, if you place an Application Load Balancer (which works at the HTTP level) in front of Squid, the hostname is stripped out of every request that is passed on to Squid. You end up with a 400 error and a message saying INVALID_URL=0.
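One way to picture the failure, offered as an assumption about what is going on rather than something checked against AWS documentation: a client talking to a forward proxy puts the full URL in the request line, while a request aimed at an ordinary web server carries only the path. The sketch below builds both request lines for the same URL; the function names and example.com are illustrative only.

```python
from urllib.parse import urlsplit


def absolute_form(method, url):
    """Request line a client sends to a forward proxy: the full URL is kept."""
    return f"{method} {url} HTTP/1.1"


def origin_form(method, url):
    """Request line sent to an ordinary web server: path and query only."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return f"{method} {target} HTTP/1.1"


if __name__ == "__main__":
    url = "http://example.com/index.html"
    print(absolute_form("GET", url))  # what Squid expects as a forward proxy
    print(origin_form("GET", url))    # hostname no longer in the request line
```

If Squid receives the second form while acting as a forward proxy, it has no complete URL to fetch, which would be consistent with the invalid URL error.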

This means you are forced to use a TCP load balancer instead. The complication, however, is that TCP load balancers don't work out of the box as you would expect.

The trick here is to use an AWS Classic Load Balancer listening on TCP 3128 and forwarding to TCP 3128 on the Squid instance. If you use a standard TCP balancer, it just times out, and if you use an HTTP balancer (ALB), it strips the hostname from the GET requests.
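A minimal sketch of that setup using boto3 (the AWS SDK for Python, a third-party package) might look like the following. The load balancer name, subnet, security group and instance IDs are all hypothetical arguments you would supply. The internal scheme and the health check numbers are assumptions of this example, not something the setup above prescribes.

```python
SQUID_PORT = 3128


def tcp_listener(port=SQUID_PORT):
    """Classic Load Balancer listener: plain TCP in, plain TCP out, same port."""
    return {
        "Protocol": "TCP",
        "LoadBalancerPort": port,
        "InstanceProtocol": "TCP",
        "InstancePort": port,
    }


def create_squid_clb(name, subnet_ids, security_group_ids, instance_ids,
                     port=SQUID_PORT):
    """Create a Classic Load Balancer in front of Squid and return its DNS name."""
    import boto3  # imported here so the helper above works without boto3

    elb = boto3.client("elb")  # "elb" is the Classic Load Balancer API
    response = elb.create_load_balancer(
        LoadBalancerName=name,
        Listeners=[tcp_listener(port)],
        Subnets=subnet_ids,
        SecurityGroups=security_group_ids,
        Scheme="internal",  # assumption: clients reach the proxy from inside the VPC
    )
    # Assumed health check: a plain TCP connect to the Squid port.
    elb.configure_health_check(
        LoadBalancerName=name,
        HealthCheck={
            "Target": f"TCP:{port}",
            "Interval": 30,
            "Timeout": 5,
            "UnhealthyThreshold": 2,
            "HealthyThreshold": 2,
        },
    )
    elb.register_instances_with_load_balancer(
        LoadBalancerName=name,
        Instances=[{"InstanceId": instance_id} for instance_id in instance_ids],
    )
    return response["DNSName"]
```

The clients would then point HTTP_PROXY and HTTPS_PROXY at the returned DNS name on port 3128 instead of at a single Squid instance.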