Saturday, Nov 22, 2008

Vite,Vite! Squid Reverse Proxy ...

So HTTP acceleration is something I've been looking to take for a spin for some time now. I've had a lot of very good experience with application acceleration using memcached, and I was eager to explore this concept further. A popular option for web acceleration has been Squid, so I thought it would be a nice one to try out first.

Wow.

It was hard! But I got through it. And that's the important thing I suppose. I do have a couple of thoughts I'd like to share on the experience, with the most important being:

Know what you intend to accomplish. You'll find that (at least for a novice user like me) Squid is a difficult service to configure correctly, and can easily make things a lot worse on your server if you aren't careful. So think carefully about what you intend to accomplish by implementing the Squid Web Cache (or really, any other caching strategy). If you run a site that serves a huge proportion of static content (such as html pages, images, or other media), then a reverse proxy like Squid can dramatically reduce the load on your web-server. This is particularly handy if you currently handle a high volume of traffic, or if you expect to receive a large influx of visitors over a short period of time. If you serve mostly dynamic content, on the other hand, then be aware that you won't reap a proportional benefit by simply running Squid out of the box. You would need to complement your caching strategy with some mechanism to turn your dynamic pages into static, cacheable elements.
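One way to do that, sketched here under the assumption that your application can set its own response headers, is to mark dynamic responses as cacheable for a short time so that a cache like Squid may serve repeat requests itself. The function name and the five-minute lifetime are made up for illustration.

```python
import time
from email.utils import formatdate


def cache_headers(max_age_seconds, now=None):
    """Build response headers that let a shared cache keep a page briefly.

    `max_age_seconds` is how long the page may be reused; `now` is a Unix
    timestamp (defaults to the current time) and exists mainly for testing.
    """
    if now is None:
        now = time.time()
    return {
        # "public" allows shared caches (such as a reverse proxy) to store it
        "Cache-Control": "public, max-age=%d" % max_age_seconds,
        # Expires carries the same deadline as an HTTP date for older caches
        "Expires": formatdate(now + max_age_seconds, usegmt=True),
    }
```

A page rendered with `cache_headers(300)` attached would be eligible for reuse for five minutes. Whether that is acceptable depends entirely on how stale your dynamic content is allowed to be.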

Once you have decided (with excellent reason of course) that you must implement a web-accelerator, another key point to consider is benchmarking performance before and after the implementation. I can tell you from experience that few things are more frustrating than spending hours tuning and troubleshooting a solution, only to realize once you are done that you have no idea whether you are better off for your efforts. There are simple techniques to hedge against this, such as benchmarking your web-server with a realistic load before you begin.
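As a rough illustration of a before-and-after measurement, here is a minimal Python sketch. It fetches one URL serially, so it is a sanity check and not a realistic load; a dedicated tool such as ApacheBench (ab) is a better fit for concurrent traffic. The function names and the request count are assumptions for the example.

```python
import math
import statistics
import time
import urllib.request


def time_requests(url, count=20):
    """Fetch `url` `count` times and return per-request latencies in seconds."""
    latencies = []
    for _ in range(count):
        start = time.perf_counter()
        with urllib.request.urlopen(url) as response:
            response.read()  # drain the body so the whole transfer is timed
        latencies.append(time.perf_counter() - start)
    return latencies


def summarize(latencies):
    """Reduce a list of latencies to a few numbers that are easy to compare."""
    ordered = sorted(latencies)
    # Nearest-rank 95th percentile, clamped for very short lists
    p95_index = max(0, math.ceil(0.95 * len(ordered)) - 1)
    return {
        "requests": len(ordered),
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "p95": ordered[p95_index],
    }
```

Running `summarize(time_requests(url))` against the same page once before Squid goes in and once after gives you two sets of numbers to compare, instead of a vague feeling that things got faster.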

Stress test your site or application to reveal problem areas or low-hanging fruit that you can target specifically. But bottom line: make sure that you can quantify the benefit of your efforts. So of course, I did none of this, and proceeded to undertake the acceleration of a tiny personal site with no real traffic to speak of. All in the name of science! A couple of quick tips:

  • Webmin is useless for Squid: Don't get me wrong, I love Webmin, but the Squid module is easily one of the most obtuse configuration tools I have ever used. You'll be better off editing the squid.conf directly.
  • Squid.conf is a scary beast: Over 4000 lines long in its default form, the conf file contains pretty much every imaginable switch and optional mode under the sun, almost none of which I needed for my fairly simple caching setup.
  • Beware the version used in online tutorials: There are vast differences in fairly fundamental aspects of configuration between 2.5, 2.6 (stable) and 3.0 (devel). My notes are for 2.6.

Here is what I intended to accomplish:

  • Accelerate multiple name-based virtual hosts on a single physical machine: This is important, because configuring Squid for multiple physical web-servers (even in a load balanced setup) appears to be simpler than setting up the proxy on the same machine as your web-server.
  • Understand the Squid configuration well enough to set up a transparent proxy at a later time.

Well, so let's get started. Installing Squid is painless. My server is running FC6, so a quick yum install squid did the trick. As I mentioned earlier, while I have grown to adore Webmin for administering the different services on my machine, its Squid configuration tool demands a thorough understanding of some seemingly complex issues. The Squid Wiki hosts a set of useful config examples, so I was able to cobble together the bits I needed for my desired setup.

http_port XXX.XXX.XXX.XXX:80 accel defaultsite=www.yoursitehere.com vhost
cache_peer 127.0.0.1 parent 81 0 no-query originserver name=myAccel weight=1
forwarded_for on
acl serverBeach dstdomain .shaheebroshan.com
cache_peer_access myAccel allow serverBeach

 

The first line essentially defines this particular Squid configuration as a web-accelerator. The last keyword, vhost, further instructs Squid to be aware that the webserver is set up with name-based virtual hosts. Note that the setting tells Squid to listen for requests on port 80. Obviously, we will then need to tell the real webserver to listen and serve on a different port. The cache_peer setting points Squid to the real webserver (127.0.0.1 on port 81 here) and also provides the reference name for later use (myAccel in this case).

Once the Squid behavior is configured, you must specify the domains you wish to accelerate using a set of ACLs. The acl line names the rule, and then lists the domains to include in the group. I am still trying to better understand the syntax, but a leading dot acts as a wildcard: it matches the domain itself and any subdomain beneath it. This configuration format allows users to reach this site using shaheebroshan.com or anything I set up at xyz.shaheebroshan.com etc. Finally, the cache_peer_access line allows all requests matching the acl group to be forwarded to the myAccel peer. Well, that fairly well does it for the Squid configuration.
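To make the dot rule concrete, here is a small Python sketch of how a single dstdomain pattern is compared against a requested host name. It is an approximation written for illustration, not Squid's actual code, and the function name is made up.

```python
def dstdomain_matches(pattern, host):
    """Approximate Squid's dstdomain test for one pattern and one host name."""
    pattern = pattern.lower()
    host = host.lower().rstrip(".")
    if pattern.startswith("."):
        bare = pattern[1:]
        # Leading dot: the domain itself, or any subdomain beneath it
        return host == bare or host.endswith(pattern)
    # No leading dot: the host must match exactly
    return host == pattern
```

With this model, the pattern `.shaheebroshan.com` accepts both `shaheebroshan.com` and `xyz.shaheebroshan.com`, while the pattern `shaheebroshan.com` (no dot) accepts only the bare domain.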

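Now, before you go restarting servers, remember that we have to change the behavior of the "real" web-server before the proxy can take effect. Squid now owns port 80, so Apache has to move to the port named in the cache_peer line (81 here). If your virtual hosts are declared with an explicit port (for example NameVirtualHost *:80 or <VirtualHost *:80>), change those to match. Telling Apache to bind to a different port is as simple as: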

Listen 127.0.0.1:81

Now restart the Squid service and the webserver and point your browser to your new speed-demon website. Logs like the cache.log and access.log will provide ample information about the server's behavior, as well as the efficiency of your cache configuration.
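For a quick read on cache efficiency, the access.log can be boiled down to a hit ratio. The Python sketch below assumes Squid's default native log format, where the fourth whitespace-separated field looks like TCP_HIT/200, and it counts any result code containing "HIT" as a hit, which is only a rough measure. The function name is made up for illustration.

```python
from collections import Counter


def hit_ratio(lines):
    """Return (hits, total, ratio) for access.log lines in native format."""
    codes = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or truncated lines
        # Field 4 looks like "TCP_HIT/200"; keep the part before the slash
        codes[fields[3].split("/")[0]] += 1
    total = sum(codes.values())
    hits = sum(count for code, count in codes.items() if "HIT" in code)
    return hits, total, (hits / total if total else 0.0)
```

Since the function accepts any iterable of lines, an open file handle on the log works directly. On my kind of install the log would likely live at /var/log/squid/access.log, but treat that path as an assumption and check your own squid.conf.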

Whew! All right, well go on and give it a try! I'd love to hear more about successful Squid deployments, particularly in accelerator mode.

In the meantime, I'm going to start researching the setup of a Squid cache on a router. DD-WRT has been very good to me, and I'm sure there's a good tutorial out there to set something like this up. Good luck!


Reader Comments (1)

I don't think my husband is nearly as cool as he thinks he is.

November 23, 2008 | Unregistered Commenter Mona Roshan
