Efficient use of mod_rewrite, part 1
Consider the following mod_rewrite configuration:
RewriteCond %{REQUEST_URI} !^/FR/fr
RewriteCond %{REQUEST_URI} !^/common/
RewriteCond %{REQUEST_URI} !^/favicon.ico$
RewriteRule ^(.*)$ /FR/fr$1 [PT,L]
The goal here is to serve any request that does not begin with /FR/fr from $DOCROOT/FR/fr (possibly passing through other modules' rules, which mattered in the setup these rules came from), unless the request begins with /common/ or is /favicon.ico.
Depending on the request pattern, reordering the RewriteCond directives may perform better. Anyway, the case at hand was to duplicate this setup on a large scale across different countries and languages.
Basically, domain.fr/foo/bar would need to come from $DOCROOT/FR/fr/foo/bar, domain.co.uk/foo/bar from $DOCROOT/GB/en/foo/bar, while anydomain/common/foo/bar would come from $DOCROOT/common/foo/bar.
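The intended mapping can be restated as a small Python sketch. This is only an illustration of the requirement, not of how Apache does it: the L10N dictionary and the document_path function are hypothetical names, and the two entries simply repeat the examples above.

```python
# Hypothetical table: host name -> URL start under the document root.
L10N = {
    "domain.fr": "/FR/fr",
    "domain.co.uk": "/GB/en",
}

def document_path(host: str, uri: str) -> str:
    """Return the path under $DOCROOT that should serve `uri` for `host`."""
    # Shared resources are served as-is, whatever the domain.
    if uri.startswith("/common/") or uri == "/favicon.ico":
        return uri
    prefix = L10N[host]
    # Requests already under the localized tree are left alone.
    if uri.startswith(prefix):
        return uri
    return prefix + uri
```

For instance, document_path("domain.fr", "/foo/bar") yields /FR/fr/foo/bar, while /common/foo/bar is returned unchanged for any domain.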
In such a case, you have a lot of possible solutions, each of which has its pros and cons.
- Create a virtual host per domain, duplicating all the rules in each of them.
<VirtualHost *:80>
ServerName domain.fr
RewriteCond %{REQUEST_URI} !^/FR/fr
RewriteCond %{REQUEST_URI} !^/common/
RewriteCond %{REQUEST_URI} !^/favicon.ico$
RewriteRule ^(.*)$ /FR/fr$1 [PT,L]
</VirtualHost>
<VirtualHost *:80>
ServerName domain.co.uk
RewriteCond %{REQUEST_URI} !^/GB/en
RewriteCond %{REQUEST_URI} !^/common/
RewriteCond %{REQUEST_URI} !^/favicon.ico$
RewriteRule ^(.*)$ /GB/en$1 [PT,L]
</VirtualHost>
(...)
Pros: quite efficient at execution time. Cons: doesn't scale very well for humans: a lot of virtual hosts and rewrite rules to maintain; if the list of URLs that must not be rewritten grows, every set of rules needs to be updated.
- Put all the rewrite rules into a single virtual host, adding a condition on the SERVER_NAME:
<VirtualHost *:80>
ServerName domain.fr
ServerAlias domain.co.uk
RewriteCond %{SERVER_NAME} =domain.fr
RewriteCond %{REQUEST_URI} !^/FR/fr
RewriteCond %{REQUEST_URI} !^/common/
RewriteCond %{REQUEST_URI} !^/favicon.ico$
RewriteRule ^(.*)$ /FR/fr$1 [PT,L]
RewriteCond %{SERVER_NAME} =domain.co.uk
RewriteCond %{REQUEST_URI} !^/GB/en
RewriteCond %{REQUEST_URI} !^/common/
RewriteCond %{REQUEST_URI} !^/favicon.ico$
RewriteRule ^(.*)$ /GB/en$1 [PT,L]
(...)
</VirtualHost>
Pros: are there any? Cons: doesn't scale very well: a lot of rewrite rules to maintain; since rewrite rules are evaluated sequentially, the more domains there are, the more checks are done for the last domains; if the list of URLs that must not be rewritten grows, every set of rules needs to be updated.
- Same as above, refactoring the common parts.
<VirtualHost *:80>
ServerName domain.fr
ServerAlias domain.co.uk
RewriteCond %{REQUEST_URI} ^/common/ [OR]
RewriteCond %{REQUEST_URI} ^/favicon.ico$
RewriteRule .* - [L]
RewriteCond %{SERVER_NAME} =domain.fr
RewriteCond %{REQUEST_URI} !^/FR/fr
RewriteRule ^(.*)$ /FR/fr$1 [PT,L]
RewriteCond %{SERVER_NAME} =domain.co.uk
RewriteCond %{REQUEST_URI} !^/GB/en
RewriteRule ^(.*)$ /GB/en$1 [PT,L]
(...)
</VirtualHost>
Pros: gives a canonical place for URLs that must not be rewritten. Cons: doesn't scale very well: a lot of rewrite rules to maintain; since rewrite rules are evaluated sequentially, the more domains there are, the more checks are done for the last domains.
- Use a RewriteMap.
Pros: scales better; gives a canonical place for URLs that must not be rewritten. Cons: a separate file to maintain; can be tricky to set up.
The main problem with mod_rewrite is that there is no way to use variables or mod_rewrite back-references (such as %1) in the test patterns: only the tested string and the substitution are expanded. In our case, we'd like to be able to do something like this:
# switch to dbm when necessary
RewriteMap l10n txt:l10n.map
RewriteCond %{REQUEST_URI} !^${l10n:%{SERVER_NAME}}
RewriteCond %{REQUEST_URI} !^/common/
RewriteCond %{REQUEST_URI} !^/favicon.ico$
RewriteRule ^(.*)$ ${l10n:%{SERVER_NAME}}$1 [PT,L]
Or
# switch to dbm when necessary
RewriteMap l10n txt:l10n.map
RewriteCond ${l10n:%{SERVER_NAME}} ^(.+)$
RewriteCond %{REQUEST_URI} !^%1
RewriteCond %{REQUEST_URI} !^/common/
RewriteCond %{REQUEST_URI} !^/favicon.ico$
RewriteRule ^(.*)$ %1$1 [PT,L]
The map file would have on each line a domain name followed by the corresponding URL start (/FR/fr for domain.fr, etc.).
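To make that layout concrete, here is a Python sketch that holds sample map contents in a string and looks a domain up in it. The two entries only repeat the examples from this post, and parse_map is a hypothetical helper written for the illustration; Apache reads the file by itself.

```python
# Sample contents for l10n.map: one "domain url-start" pair per line.
L10N_MAP = """\
# domain name    URL start
domain.fr        /FR/fr
domain.co.uk     /GB/en
"""

def parse_map(text: str) -> dict:
    """Parse whitespace-separated key/value lines, skipping comments and blanks."""
    table = {}
    for line in text.splitlines():
        # Drop anything after a '#', then surrounding whitespace.
        parts = line.split("#", 1)[0].split()
        if len(parts) >= 2:
            table[parts[0]] = parts[1]
    return table

l10n = parse_map(L10N_MAP)
```

With these contents, l10n["domain.fr"] is /FR/fr, which is the value ${l10n:%{SERVER_NAME}} is meant to produce for that domain.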
Unfortunately, the above setup is not possible. A way around this missing functionality is to use a nice Perl regexp trick:
# switch to dbm when necessary
RewriteMap l10n txt:l10n.map
RewriteCond ${l10n:%{SERVER_NAME}} ^(.+)$
RewriteCond %1%{REQUEST_URI} !^(.+)\1
RewriteCond %{REQUEST_URI} !^/common/
RewriteCond %{REQUEST_URI} !^/favicon.ico$
RewriteRule ^(.*)$ %1$1 [PT,L]
As you can see, only the second RewriteCond differs. What we really want to test is whether the %{REQUEST_URI} begins with the proper URL start or not. Let's say we're considering domain.fr and the map gave us /FR/fr. Following the first RewriteCond, %1 contains /FR/fr.
At the second RewriteCond, if %{REQUEST_URI} begins with /FR/fr then %1%{REQUEST_URI} begins with /FR/fr/FR/fr. Otherwise, it will only begin with /FR/fr.
What we can test, then, is whether this %1%{REQUEST_URI} aggregate contains a repeating pattern at its beginning. This is exactly what the Perl regexp does: it captures at least one character at the beginning of the tested string (^(.+)), and requires this captured string to appear again right after it (\1).
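The behaviour of that pattern can be checked outside Apache. The following is a Python sketch using the standard re module, which handles this simple back-reference the same way; starts_with_prefix is a hypothetical helper name, and the pattern is shown without the negation.

```python
import re

# Same pattern as in the second RewriteCond, before negation.
REPEATED_START = re.compile(r"^(.+)\1")

def starts_with_prefix(prefix: str, uri: str) -> bool:
    """Mimic testing %1%{REQUEST_URI}: True when the aggregate starts with a repeat."""
    return REPEATED_START.search(prefix + uri) is not None
```

Here starts_with_prefix("/FR/fr", "/FR/fr/foo") is true, because the aggregate begins with /FR/fr/FR/fr, while starts_with_prefix("/FR/fr", "/foo/bar") is false.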
It is worth mentioning that this falls flat when the URL start itself contains a repeating pattern (e.g. /fr/fr instead of /FR/fr): the regexp then matches on the URL start alone, capturing /fr, whatever the request is.
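A quick check of that failure mode, again as a Python sketch with the re module rather than Apache itself. The request /foo/bar is an arbitrary example; any other request gives the same result.

```python
import re

REPEATED_START = re.compile(r"^(.+)\1")

# The URL start alone already repeats "/fr", so the test succeeds for a
# request that is NOT under /fr/fr, and the negated RewriteCond fails.
match = REPEATED_START.search("/fr/fr" + "/foo/bar")
captured = match.group(1) if match else None
```

The captured text is /fr, not the whole URL start, so the rewrite would never be applied for such a domain.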
As we want the RewriteRule to apply only when our %{REQUEST_URI} does not begin with /FR/fr, we negate the regexp. A negated pattern captures nothing, so %1 in the RewriteRule still holds the last captured text, from the very first RewriteCond.
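Putting the pieces together, the whole rule set can be walked through in a Python sketch. It is a simplified model under stated assumptions: the L10N dictionary stands in for l10n.map, rewrite is a hypothetical function, a missing map key is modelled as the empty string, and the [PT,L] flags are not modelled at all.

```python
import re

# Stands in for l10n.map (entries taken from the examples in this post).
L10N = {"domain.fr": "/FR/fr", "domain.co.uk": "/GB/en"}

def rewrite(host: str, uri: str) -> str:
    """Return the rewritten URI, or `uri` unchanged if a condition fails."""
    # RewriteCond ${l10n:%{SERVER_NAME}} ^(.+)$      (sets %1)
    looked_up = L10N.get(host, "")
    first = re.search(r"^(.+)$", looked_up)
    if first is None:
        return uri
    percent_1 = first.group(1)
    # RewriteCond %1%{REQUEST_URI} !^(.+)\1          (negated: %1 is kept)
    if re.search(r"^(.+)\1", percent_1 + uri):
        return uri
    # RewriteCond %{REQUEST_URI} !^/common/
    if re.search(r"^/common/", uri):
        return uri
    # RewriteCond %{REQUEST_URI} !^/favicon.ico$
    if re.search(r"^/favicon.ico$", uri):
        return uri
    # RewriteRule ^(.*)$ %1$1
    return percent_1 + uri
```

Note how percent_1 is only assigned by the first condition; the negated test reads it but never replaces it, which is what lets the final substitution reuse it.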
Note that if all the tested URL starts have the same length and/or pattern, it may be worth making the regexp more precise (and faster to match or reject), such as ^(/../..)\1 in our present case.
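The two patterns can be compared in a Python sketch with the re module, used here as an approximation of Apache's regex engine for this simple case. Besides being cheaper, the more precise form appears to avoid a false positive of the generic one when the URL start shows up again later in the request; the request /x/FR/fr/x below is a made-up example, and already_localized is a hypothetical helper.

```python
import re

GENERIC = re.compile(r"^(.+)\1")
PRECISE = re.compile(r"^(/../..)\1")

def already_localized(pattern, prefix: str, uri: str) -> bool:
    """True when prefix + uri starts with a repeat, according to `pattern`."""
    return pattern.search(prefix + uri) is not None

# "/FR/fr" + "/x/FR/fr/x" is "/FR/fr/x" twice: the generic pattern is fooled.
generic_fooled = already_localized(GENERIC, "/FR/fr", "/x/FR/fr/x")
# The precise pattern only accepts a repeat of the six-character start.
precise_fooled = already_localized(PRECISE, "/FR/fr", "/x/FR/fr/x")
```

In this sketch the precise pattern also copes with a URL start such as /fr/fr, since it can only capture the full six characters.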
2009-04-14 22:58:43+0900