Efficient use of mod_rewrite, part 1

Consider the following mod_rewrite configuration:

RewriteCond %{REQUEST_URI} !^/FR/fr
RewriteCond %{REQUEST_URI} !^/common/
RewriteCond %{REQUEST_URI} !^/favicon.ico$
RewriteRule ^(.*)$ /FR/fr$1 [PT,L]

The goal here is to serve any request whose URL doesn't begin with /FR/fr from $DOCROOT/FR/fr (possibly passing through other modules' rules, hence the PT flag, which mattered in the setup these rules came from), unless the URL begins with /common/ or is /favicon.ico.

Depending on the request pattern, reordering the RewriteConds may perform better. Anyway, the case at hand was to duplicate this setup massively across different countries and languages.

Basically, domain.fr/foo/bar would need to come from $DOCROOT/FR/fr/foo/bar, domain.co.uk/foo/bar from $DOCROOT/GB/en/foo/bar, while anydomain/common/foo/bar would come from $DOCROOT/common/foo/bar.
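
To make the target behaviour concrete, here is a small reference model of the mapping described above, written in Python rather than Apache configuration. The names PREFIXES and resolve() are purely illustrative and not part of any Apache API; only the domains and paths come from the examples in this post.

```python
# Hypothetical reference model of the desired URL-to-file mapping.
# PREFIXES and resolve() are illustrative names, not an Apache API.
PREFIXES = {
    "domain.fr": "/FR/fr",
    "domain.co.uk": "/GB/en",
}


def resolve(host, uri):
    """Return the path under $DOCROOT that should serve `uri` for `host`."""
    prefix = PREFIXES[host]
    # Shared resources and already-prefixed URLs are left untouched.
    if uri.startswith("/common/") or uri == "/favicon.ico" or uri.startswith(prefix):
        return uri
    # Everything else is served from the per-country/language tree.
    return prefix + uri


print(resolve("domain.fr", "/foo/bar"))         # /FR/fr/foo/bar
print(resolve("domain.co.uk", "/foo/bar"))      # /GB/en/foo/bar
print(resolve("domain.fr", "/common/foo/bar"))  # /common/foo/bar
```

Any of the configurations discussed below should agree with this model for the domains it covers.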

In such a case, you have a lot of possible solutions, each of which has its pros and cons.

  • Create a virtual host per domain, duplicating all the rules in each of them.

    <VirtualHost *:80>
    ServerName domain.fr
    RewriteCond %{REQUEST_URI} !^/FR/fr
    RewriteCond %{REQUEST_URI} !^/common/
    RewriteCond %{REQUEST_URI} !^/favicon.ico$
    RewriteRule ^(.*)$ /FR/fr$1 [PT,L]
    </VirtualHost>

    <VirtualHost *:80>
    ServerName domain.co.uk
    RewriteCond %{REQUEST_URI} !^/GB/en
    RewriteCond %{REQUEST_URI} !^/common/
    RewriteCond %{REQUEST_URI} !^/favicon.ico$
    RewriteRule ^(.*)$ /GB/en$1 [PT,L]
    </VirtualHost>
    (...)

    Pros: quite efficient at execution time. Cons: doesn't scale very well for humans: a lot of virtual hosts and rewrite rules to maintain; if the list of URLs that don't need rewriting grows, every set of rules needs to be updated.

  • Put all the rewrite rules into a single virtual host, adding a condition on the SERVER_NAME:

    <VirtualHost *:80>
    ServerName domain.fr
    ServerAlias domain.co.uk

    RewriteCond %{SERVER_NAME} =domain.fr
    RewriteCond %{REQUEST_URI} !^/FR/fr
    RewriteCond %{REQUEST_URI} !^/common/
    RewriteCond %{REQUEST_URI} !^/favicon.ico$
    RewriteRule ^(.*)$ /FR/fr$1 [PT,L]

    RewriteCond %{SERVER_NAME} =domain.co.uk
    RewriteCond %{REQUEST_URI} !^/GB/en
    RewriteCond %{REQUEST_URI} !^/common/
    RewriteCond %{REQUEST_URI} !^/favicon.ico$
    RewriteRule ^(.*)$ /GB/en$1 [PT,L]

    (...)
    </VirtualHost>

    Pros: are there any? Cons: doesn't scale very well: a lot of rewrite rules to maintain; rewrite rules being executed sequentially, the more domains there are, the more checks are done for the last domains; if the list of URLs that don't need rewriting grows, every set of rules needs to be updated.

  • Same as above, refactoring the common parts.

    <VirtualHost *:80>
    ServerName domain.fr
    ServerAlias domain.co.uk

    RewriteCond %{REQUEST_URI} ^/common/ [OR]
    RewriteCond %{REQUEST_URI} ^/favicon.ico$
    RewriteRule .* - [L]

    RewriteCond %{SERVER_NAME} =domain.fr
    RewriteCond %{REQUEST_URI} !^/FR/fr
    RewriteRule ^(.*)$ /FR/fr$1 [PT,L]

    RewriteCond %{SERVER_NAME} =domain.co.uk
    RewriteCond %{REQUEST_URI} !^/GB/en
    RewriteRule ^(.*)$ /GB/en$1 [PT,L]

    (...)
    </VirtualHost>

    Pros: gives a canonical place for URLs that don't need rewriting. Cons: doesn't scale very well: a lot of rewrite rules to maintain; rewrite rules being executed sequentially, the more domains there are, the more checks are done for the last domains.

  • Use a RewriteMap.
    Pros: scales better; gives a canonical place for URLs that don't need rewriting. Cons: a separate file to maintain; can be tricky to set up.

The main problem with mod_rewrite is that there is no way to use variables or back references in the test patterns. In our case, we'd like to be able to do something like this:

# switch to dbm when necessary
RewriteMap l10n txt:l10n.map
RewriteCond %{REQUEST_URI} !^${l10n:%{SERVER_NAME}}
RewriteCond %{REQUEST_URI} !^/common/
RewriteCond %{REQUEST_URI} !^/favicon.ico$
RewriteRule ^(.*)$ ${l10n:%{SERVER_NAME}}$1 [PT,L]

Or

# switch to dbm when necessary
RewriteMap l10n txt:l10n.map
RewriteCond ${l10n:%{SERVER_NAME}} ^(.+)$
RewriteCond %{REQUEST_URI} !^%1
RewriteCond %{REQUEST_URI} !^/common/
RewriteCond %{REQUEST_URI} !^/favicon.ico$
RewriteRule ^(.*)$ %1$1 [PT,L]

The map file would have on each line a domain name followed by the corresponding url start (/FR/fr for domain.fr, etc.).
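
As an illustration of that layout, the sketch below embeds a possible l10n.map as a string and parses it with a few lines of Python. The parser is a simplification written for this post (first field is the key, second is the value, blank lines and lines starting with # are skipped); MAP_TEXT, parse_txt_map() and lookup() are hypothetical names, and this is not mod_rewrite's actual implementation.

```python
# A possible l10n.map, embedded as a string for the sake of the example.
MAP_TEXT = """\
# domain        URL start
domain.fr       /FR/fr
domain.co.uk    /GB/en
"""


def parse_txt_map(text):
    """Parse 'key value' lines, skipping blank lines and # comment lines."""
    entries = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[0].startswith("#"):
            continue
        entries[fields[0]] = fields[1]
    return entries


def lookup(entries, key):
    """Mimic ${l10n:key}: an unknown key yields an empty string."""
    return entries.get(key, "")


l10n = parse_txt_map(MAP_TEXT)
print(lookup(l10n, "domain.fr"))       # /FR/fr
print(lookup(l10n, "domain.co.uk"))    # /GB/en
print(repr(lookup(l10n, "other.example")))  # ''
```

The empty result for an unknown domain is what makes a RewriteCond such as `${l10n:%{SERVER_NAME}} ^(.+)$` useful: it both captures the URL start and skips the rule for domains missing from the map.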

Unfortunately, neither of the above setups is possible. A way around this missing functionality is to use a nice Perl regexp trick:

# switch to dbm when necessary
RewriteMap l10n txt:l10n.map
RewriteCond ${l10n:%{SERVER_NAME}} ^(.+)$
RewriteCond %1%{REQUEST_URI} !^(.+)\1
RewriteCond %{REQUEST_URI} !^/common/
RewriteCond %{REQUEST_URI} !^/favicon.ico$
RewriteRule ^(.*)$ %1$1 [PT,L]

As you can see, only the second RewriteCond differs. What we really want to test is whether the %{REQUEST_URI} begins with the proper url start or not. Let's say we're considering domain.fr and the map gave us /FR/fr. Following the first RewriteCond, %1 contains /FR/fr.

At the second RewriteCond, if %{REQUEST_URI} begins with /FR/fr then %1%{REQUEST_URI} begins with /FR/fr/FR/fr. Otherwise, it will only begin with /FR/fr.

What we can test, then, is whether this %1%{REQUEST_URI} aggregate contains a repeating pattern at its beginning. This is exactly what the Perl regexp does: it captures at least one character at the beginning of the tested string (^(.+)), and requires the captured string to appear again right after it (\1).
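
The behaviour of that pattern can be checked outside Apache. The sketch below uses Python's re module as a stand-in for the regexp engine mod_rewrite relies on (an assumption that holds for this simple backreference); begins_with_start() is a hypothetical helper that builds the same string as %1%{REQUEST_URI}.

```python
import re

# Same pattern as in the second RewriteCond, before negation.
REPEATED_START = re.compile(r"^(.+)\1")


def begins_with_start(url_start, request_uri):
    # Build the equivalent of %1%{REQUEST_URI} and look for a repeated start.
    return REPEATED_START.search(url_start + request_uri) is not None


# "/FR/fr" + "/FR/fr/foo/bar" starts with "/FR/fr" twice: the pattern matches.
print(begins_with_start("/FR/fr", "/FR/fr/foo/bar"))  # True

# "/FR/fr" + "/foo/bar" has no repeated start: the pattern does not match.
print(begins_with_start("/FR/fr", "/foo/bar"))        # False
```

In the configuration the pattern is negated, so the rewrite happens in the second case only.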

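It is worth mentioning that this falls flat when the URL start itself contains a repeating pattern (e.g. /fr/fr instead of /FR/fr): the tested string then always begins with a repetition (/fr followed by /fr), whatever the request is, so the rule never fires.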

Since we want the RewriteRule to apply only when %{REQUEST_URI} does not begin with /FR/fr, we negate the regexp. A negated pattern captures nothing, so %1 in the RewriteRule still holds the last captured text, the one from the very first RewriteCond.

Note that if all the tested URL starts share the same length and/or pattern, it may be worth making the regexp more precise (and faster to match or reject), such as ^(/../..)\1 in the present case.
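
The difference between the two patterns can be seen with a quick experiment, again using Python's re module as a stand-in for Apache's regexp engine. LOOSE, TIGHT and looks_prefixed() are names made up for this sketch, and the tight pattern assumes every URL start has the /XX/xx shape used in this post.

```python
import re

LOOSE = re.compile(r"^(.+)\1")      # generic pattern
TIGHT = re.compile(r"^(/../..)\1")  # assumes 6-character starts like /FR/fr


def looks_prefixed(pattern, url_start, request_uri):
    # Test the equivalent of %1%{REQUEST_URI} against the given pattern.
    return pattern.search(url_start + request_uri) is not None


# Both patterns agree on the well-behaved /FR/fr start.
print(looks_prefixed(TIGHT, "/FR/fr", "/FR/fr/foo/bar"))  # True
print(looks_prefixed(TIGHT, "/FR/fr", "/foo/bar"))        # False

# With a self-repeating start such as /fr/fr, the loose pattern reports a
# match for a request that is not prefixed at all...
print(looks_prefixed(LOOSE, "/fr/fr", "/foo"))            # True
# ...whereas the tight pattern has to capture the whole start first.
print(looks_prefixed(TIGHT, "/fr/fr", "/foo"))            # False
print(looks_prefixed(TIGHT, "/fr/fr", "/fr/fr/foo"))      # True
```

In this sketch, fixing the length of the capture also sidesteps the self-repeating URL start problem, at the price of tying the rule to one URL layout.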

2009-04-14 22:58:43+0900

5 Responses to “Efficient use of mod_rewrite, part 1”

  1. linkfeedr » Blog Archive » Mike Hommey: Efficient use of mod_rewrite, part 1 - RSS Indexer (beta) Says:

    […] part 1 VA:F [1.1.8_518]Rating: 0.0/5 (0 votes cast) This article was found on Planet Debian. Click here to visit the full article on the original website.Consider the following mod_rewrite configuration :… more on the original website Report This […]

  2. Simon Says:

    Erm, wouldn’t “Options Multiviews” do the same thing?

  3. glandium Says:

    Multiviews uses the browser Accept-Language header to select language, not the domain name. And it requires the content to be static.

  4. Shot Says:

    You could always aptitude install libapache2-mod-macro and factor everything out. Something to the point of (note: not tested, even for syntax…):

    <Macro LangRewrite $domain $lang>
    <VirtualHost *:80>
    ServerName $domain
    RewriteCond %{REQUEST_URI} !^/$lang
    RewriteCond %{REQUEST_URI} !^/common/
    RewriteCond %{REQUEST_URI} !^/favicon.ico$
    RewriteRule ^(.*)$ /FR/fr$1 [PT,L]
    </VirtualHost>
    </Macro>

    LangRewrite domain.fr FR/fr
    LangRewrite domain.co.uk GB/en
    …

  5. Shot Says:

    (Aaand you’d need to change the RewriteRule to use $lang instead of FR/fr, of course. Apologies.)