.. _tutorial-increasing_your_hitrate:

Achieving a high hitrate
------------------------

Now that Varnish is up and running, you can access your web
application through Varnish. Unless your application is specifically
written to work behind a web accelerator, you'll probably need to make
some changes to either the configuration or the application in order
to get a high hit rate in Varnish.

Varnish will not cache your data unless it's absolutely sure it is
safe to do so. So, for you to understand how Varnish decides if and
how to cache a page, I'll guide you through a couple of tools that you
will find useful.

Note that you need a tool to see what HTTP headers fly between you and
the web server. On the Varnish server, the easiest is to use
varnishlog and varnishtop but sometimes a client-side tool makes
sense. Here are the ones I use.

Tool: varnishtop
~~~~~~~~~~~~~~~~

You can use varnishtop to identify what URLs are hitting the backend
the most. ``varnishtop -i txurl`` is an essential command. You can see
some other examples of varnishtop usage in :ref:`tutorial-statistics`.


Tool: varnishlog
~~~~~~~~~~~~~~~~

When you have identified a URL which is frequently sent to the
backend you can use varnishlog to have a look at the request.
``varnishlog -c -m 'RxURL:^/foo/bar'`` will show you the requests
coming from the client (-c) matching /foo/bar.

For more information on how varnishlog works please see
:ref:`tutorial-logging` or man :ref:`ref-varnishlog`.

For extended diagnostics headers, see
http://www.varnish-cache.org/trac/wiki/VCLExampleHitMissHeader
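
A minimal sketch of such a diagnostic header, assuming Varnish 3 VCL
syntax, could look like this. The header name X-Cache is only a
convention, not something Varnish requires::

  sub vcl_deliver {
      # obj.hits counts how many times this object has been
      # delivered from the cache; 0 means it was just fetched.
      if (obj.hits > 0) {
          set resp.http.X-Cache = "HIT";
      } else {
          set resp.http.X-Cache = "MISS";
      }
  }

You can then check the header from the client side with one of the
tools described below.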


Tool: lwp-request
~~~~~~~~~~~~~~~~~

lwp-request is part of The World-Wide Web library for Perl. It's a
couple of really basic programs that can execute an HTTP request and
give you the result. I mostly use two programs, GET and HEAD.

vg.no was the first site to use Varnish and the people running Varnish
there are quite clueful. So it's interesting to look at their HTTP
Headers. Let's send a GET request for their home page::

  $ GET -H 'Host: www.vg.no' -Used http://vg.no/
  GET http://vg.no/
  Host: www.vg.no
  User-Agent: lwp-request/5.834 libwww-perl/5.834
  
  200 OK
  Cache-Control: must-revalidate
  Refresh: 600
  Title: VG Nett - Forsiden - VG Nett
  X-Age: 463
  X-Cache: HIT
  X-Rick-Would-Never: Let you down
  X-VG-Jobb: http://www.finn.no/finn/job/fulltime/result?keyword=vg+multimedia Merk:HeaderNinja
  X-VG-Korken: http://www.youtube.com/watch?v=Fcj8CnD5188
  X-VG-WebCache: joanie
  X-VG-WebServer: leon

OK. Let me explain what it does. GET usually sends off HTTP 0.9
requests, which lack the Host header. So I add a Host header with the
-H option. -U prints request headers, -s prints response status, -e
prints response headers and -d discards the actual content. We don't
really care about the content, only the headers.

As you can see, VG adds quite a bit of information in their
headers. Some of the headers, like X-Rick-Would-Never, are specific
to vg.no and their somewhat odd sense of humour. Others, like
X-VG-WebCache, are for debugging purposes.

So, to check whether a site sets cookies for a specific URL, just do::

  GET -Used http://example.com/ |grep ^Set-Cookie

Tool: Live HTTP Headers
~~~~~~~~~~~~~~~~~~~~~~~

There is also a plugin for Firefox. *Live HTTP Headers* can show you
what headers are being sent and received. Live HTTP Headers can be
found at https://addons.mozilla.org/en-US/firefox/addon/3829/ or by
googling "Live HTTP Headers".


The role of HTTP Headers
~~~~~~~~~~~~~~~~~~~~~~~~

Along with each HTTP request and response comes a bunch of headers
carrying metadata. Varnish will look at these headers to determine if
it is appropriate to cache the contents and how long Varnish can keep
the content.

Please note that when considering these headers Varnish actually
considers itself *part of* the actual webserver, the rationale being
that both are under your control.

The term *surrogate origin cache* is not really well defined by the
IETF or RFC 2616, so the various ways Varnish works might differ from
your expectations.

Let's take a look at the important headers you should be aware of:

Cache-Control
~~~~~~~~~~~~~

The Cache-Control header instructs caches how to handle the content.
Varnish cares about the *max-age* parameter and uses it to calculate
the TTL for an object.

"Cache-Control: no-cache" is ignored but if you need this you can
easily add support for it.
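
As a rough sketch, assuming Varnish 3 VCL syntax, support for the
no-cache directive could be added in vcl_fetch like this. The regular
expression is a simple illustration and does not parse the header
strictly::

  sub vcl_fetch {
      # Do not cache responses the backend marks as no-cache.
      if (beresp.http.Cache-Control ~ "no-cache") {
          return (hit_for_pass);
      }
  }

Returning hit_for_pass makes Varnish remember the decision for a
while, so later requests for the same object go straight to the
backend instead of queueing up.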

So make sure you issue a Cache-Control header with a max-age
parameter. You can have a look at what Varnish Software's Drupal
server issues::

  $ GET -Used http://www.varnish-software.com/|grep ^Cache-Control
  Cache-Control: public, max-age=600

Age
~~~

Varnish adds an Age header to indicate how long the object has been
kept inside Varnish. You can grep out Age from varnishlog like this::

  varnishlog -i TxHeader -I ^Age

Pragma
~~~~~~

An HTTP 1.0 server might send "Pragma: no-cache". Varnish ignores this
header. You could easily add support for this header in VCL.

In vcl_fetch::

  if (beresp.http.Pragma ~ "no-cache") {
     return(hit_for_pass);
  }

Authorization
~~~~~~~~~~~~~

If Varnish sees an Authorization header it will pass the request. If
this is not what you want you can unset the header.
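
A minimal sketch of unsetting the header, assuming Varnish 3 VCL
syntax, might look like the following. The ``/public/`` prefix is a
made-up example. Only do this for content that is identical for every
user, since the backend will no longer see the credentials::

  sub vcl_recv {
      # Hypothetical path: content here does not depend on the user.
      if (req.url ~ "^/public/") {
          unset req.http.Authorization;
      }
  }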

Overriding the time-to-live (ttl)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Sometimes your backend will misbehave. It might, depending on your
setup, be easier to override the ttl in Varnish than to fix your
somewhat cumbersome backend. 

You need VCL to identify the objects you want and then you set the
beresp.ttl to whatever you want::

  sub vcl_fetch {
      if (req.url ~ "^/legacy_broken_cms/") {
          set beresp.ttl = 5d;
      }
  }

The example will set the TTL to 5 days for the old legacy stuff on
your site.

Forcing caching for certain requests and certain responses
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Since you still have this cumbersome backend that isn't very friendly
to work with, you might want to override more stuff in Varnish. We
recommend that you rely as much as you can on the default caching
rules. It is perfectly easy to force Varnish to look up an object in
the cache but it isn't really recommended.
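
If you decide to do it anyway, a sketch along these lines shows the
general idea, assuming Varnish 3 VCL syntax. The ``/static/`` prefix
and the one hour TTL are made-up values for illustration::

  sub vcl_recv {
      # Hypothetical path: force a cache lookup even if the
      # client sends cookies.
      if (req.url ~ "^/static/") {
          unset req.http.Cookie;
          return (lookup);
      }
  }

  sub vcl_fetch {
      # Make the matching responses cacheable regardless of
      # what the backend says.
      if (req.url ~ "^/static/") {
          unset beresp.http.Set-Cookie;
          set beresp.ttl = 1h;
      }
  }

Note that returning from vcl_recv like this skips the default logic
for the matching requests, which is one reason to use it sparingly.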


Normalizing your namespace
~~~~~~~~~~~~~~~~~~~~~~~~~~

Some sites are accessed via lots of
hostnames. http://www.varnish-software.com/,
http://varnish-software.com/ and http://varnishsoftware.com/ all point
at the same site. Since Varnish doesn't know they are the same,
Varnish will cache different versions of every page for every
hostname. You can mitigate this in your web server configuration by
setting up redirects or by using the following VCL::

  if (req.http.host ~ "(?i)^(www\.)?varnish-?software\.com") {
    set req.http.host = "varnish-software.com";
  }


Ways of increasing your hitrate even more
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The following chapters should give you ways of further increasing
your hitrate, especially the chapter on Cookies.

 * :ref:`tutorial-cookies`
 * :ref:`tutorial-vary`
 * :ref:`tutorial-purging`
 * :ref:`tutorial-esi`

