{"id":329,"date":"2009-04-14T22:58:43","date_gmt":"2009-04-14T20:58:43","guid":{"rendered":"http:\/\/glandium.org\/blog\/?p=329"},"modified":"2010-01-27T08:52:24","modified_gmt":"2010-01-27T07:52:24","slug":"efficient-use-of-mod_rewrite-part-1","status":"publish","type":"post","link":"https:\/\/glandium.org\/blog\/?p=329","title":{"rendered":"Efficient use of mod_rewrite, part 1"},"content":{"rendered":"<p>Consider the following <em>mod_rewrite<\/em> configuration:<\/p>\n<blockquote><p><code>RewriteCond %{REQUEST_URI} !^\/FR\/fr<br \/>\nRewriteCond %{REQUEST_URI} !^\/common\/<br \/>\nRewriteCond %{REQUEST_URI} !^\/favicon.ico$<br \/>\nRewriteRule ^(.*)$ \/FR\/fr$1 [PT,L]<\/code><\/p><\/blockquote>\n<p>The goal here is to serve any request that doesn't begin with <code>\/FR\/fr<\/code> from <code>$DOCROOT\/FR\/fr<\/code> (possibly passing through other modules' rules, which was important in the setup these rules came from), unless it begins with <code>\/common\/<\/code> or is <code>\/favicon.ico<\/code>.<\/p>\n<p>Depending on the request pattern, reordering the <code>RewriteCond<\/code>s may be more efficient. 
Anyway, the case at hand was to duplicate this setup across many countries and languages.<\/p>\n<p>Basically, <code>domain.fr\/foo\/bar<\/code> would need to come from <code>$DOCROOT\/FR\/fr\/foo\/bar<\/code>, <code>domain.co.uk\/foo\/bar<\/code> from <code>$DOCROOT\/GB\/en\/foo\/bar<\/code>, while <code>anydomain\/common\/foo\/bar<\/code> would come from <code>$DOCROOT\/common\/foo\/bar<\/code>.<\/p>\n<p>In such a case, there are many possible solutions, each with its pros and cons.<\/p>\n<ul>\n<li>Create a virtual host per domain, duplicating all the rules in each of them.<br \/>\n<blockquote><p><code>&lt;VirtualHost *:80&gt;<br \/>\nServerName domain.fr<br \/>\nRewriteCond %{REQUEST_URI} !^\/FR\/fr<br \/>\nRewriteCond %{REQUEST_URI} !^\/common\/<br \/>\nRewriteCond %{REQUEST_URI} !^\/favicon.ico$<br \/>\nRewriteRule ^(.*)$ \/FR\/fr$1 [PT,L]<br \/>\n&lt;\/VirtualHost&gt;<\/code><\/p>\n<p><code>&lt;VirtualHost *:80&gt;<br \/>\nServerName domain.co.uk<br \/>\nRewriteCond %{REQUEST_URI} !^\/GB\/en<br \/>\nRewriteCond %{REQUEST_URI} !^\/common\/<br \/>\nRewriteCond %{REQUEST_URI} !^\/favicon.ico$<br \/>\nRewriteRule ^(.*)$ \/GB\/en$1 [PT,L]<br \/>\n&lt;\/VirtualHost&gt;<br \/>\n(...)<\/code><\/p><\/blockquote>\n<p>Pros: quite efficient at execution time. 
Cons: doesn't scale very well for humans: a lot of virtual hosts and rewrite rules to maintain; if the list of URLs that don't need rewriting grows, every set of rules needs to be updated.<\/li>\n<li>Put all the rewrite rules into a single virtual host, adding a condition on the <code>SERVER_NAME<\/code>:<br \/>\n<blockquote><p><code>&lt;VirtualHost *:80&gt;<br \/>\nServerName domain.fr<br \/>\nServerAlias domain.co.uk<\/code><\/p>\n<p><code>RewriteCond %{SERVER_NAME} =domain.fr<br \/>\nRewriteCond %{REQUEST_URI} !^\/FR\/fr<br \/>\nRewriteCond %{REQUEST_URI} !^\/common\/<br \/>\nRewriteCond %{REQUEST_URI} !^\/favicon.ico$<br \/>\nRewriteRule ^(.*)$ \/FR\/fr$1 [PT,L]<\/code><\/p>\n<p><code>RewriteCond %{SERVER_NAME} =domain.co.uk<br \/>\nRewriteCond %{REQUEST_URI} !^\/GB\/en<br \/>\nRewriteCond %{REQUEST_URI} !^\/common\/<br \/>\nRewriteCond %{REQUEST_URI} !^\/favicon.ico$<br \/>\nRewriteRule ^(.*)$ \/GB\/en$1 [PT,L]<\/code><\/p>\n<p><code>(...)<br \/>\n&lt;\/VirtualHost&gt;<\/code><\/p><\/blockquote>\n<p>Pros: are there any? 
Cons: doesn't scale very well: a lot of rewrite rules to maintain; since rewrite rules are executed sequentially, the more domains there are, the more checks are done for the last domains; if the list of URLs that don't need rewriting grows, every set of rules needs to be updated.<\/li>\n<li>Same as above, refactoring the common parts.<br \/>\n<blockquote><p><code>&lt;VirtualHost *:80&gt;<br \/>\nServerName domain.fr<br \/>\nServerAlias domain.co.uk<\/code><\/p>\n<p><code>RewriteCond %{REQUEST_URI} ^\/common\/ [OR]<br \/>\nRewriteCond %{REQUEST_URI} ^\/favicon.ico$<br \/>\nRewriteRule .* - [L]<\/code><\/p>\n<p><code>RewriteCond %{SERVER_NAME} =domain.fr<br \/>\nRewriteCond %{REQUEST_URI} !^\/FR\/fr<br \/>\nRewriteRule ^(.*)$ \/FR\/fr$1 [PT,L]<\/code><\/p>\n<p><code>RewriteCond %{SERVER_NAME} =domain.co.uk<br \/>\nRewriteCond %{REQUEST_URI} !^\/GB\/en<br \/>\nRewriteRule ^(.*)$ \/GB\/en$1 [PT,L]<\/code><\/p>\n<p><code>(...)<br \/>\n&lt;\/VirtualHost&gt;<\/code><\/p><\/blockquote>\n<p>Pros: gives a canonical place for URLs that don't need rewriting. Cons: doesn't scale very well: a lot of rewrite rules to maintain; since rewrite rules are executed sequentially, the more domains there are, the more checks are done for the last domains.<\/li>\n<li>Use a <code>RewriteMap<\/code>.<br \/>\nPros: scales better; gives a canonical place for URLs that don't need rewriting. Cons: a separate file to maintain; can be tricky to set up.<\/li>\n<\/ul>\n<p>The main problem with <em>mod_rewrite<\/em> is that there is no way to use variables or back references in the test patterns. 
In our case, we'd like to be able to do something like this:<\/p>\n<blockquote><p><code># switch to dbm when necessary<br \/>\nRewriteMap l10n txt:l10n.map<br \/>\nRewriteCond %{REQUEST_URI} !^${l10n:%{SERVER_NAME}}<br \/>\nRewriteCond %{REQUEST_URI} !^\/common\/<br \/>\nRewriteCond %{REQUEST_URI} !^\/favicon.ico$<br \/>\nRewriteRule ^(.*)$ ${l10n:%{SERVER_NAME}}$1 [PT,L]<\/code><\/p><\/blockquote>\n<p>Or<\/p>\n<blockquote><p><code># switch to dbm when necessary<br \/>\nRewriteMap l10n txt:l10n.map<br \/>\nRewriteCond ${l10n:%{SERVER_NAME}} ^(.+)$<br \/>\nRewriteCond %{REQUEST_URI} !^%1<br \/>\nRewriteCond %{REQUEST_URI} !^\/common\/<br \/>\nRewriteCond %{REQUEST_URI} !^\/favicon.ico$<br \/>\nRewriteRule ^(.*)$ %1$1 [PT,L]<\/code><\/p><\/blockquote>\n<p>The map file would have on each line a domain name followed by the corresponding URL start (<code>\/FR\/fr<\/code> for <code>domain.fr<\/code>, etc.).<\/p>\n<p>Unfortunately, neither of the above setups works. A way around this lack of functionality is to use a nice Perl regexp trick:<\/p>\n<blockquote><p><code># switch to dbm when necessary<br \/>\nRewriteMap l10n txt:l10n.map<br \/>\nRewriteCond ${l10n:%{SERVER_NAME}} ^(.+)$<br \/>\nRewriteCond %1%{REQUEST_URI} !^(.+)\\1<br \/>\nRewriteCond %{REQUEST_URI} !^\/common\/<br \/>\nRewriteCond %{REQUEST_URI} !^\/favicon.ico$<br \/>\nRewriteRule ^(.*)$ %1$1 [PT,L]<\/code><\/p><\/blockquote>\n<p>As you can see, only the second <code>RewriteCond<\/code> differs. What we really want to test is whether the <code>%{REQUEST_URI}<\/code> begins with the proper URL start or not. Let's say we're considering <code>domain.fr<\/code> and the map gave us <code>\/FR\/fr<\/code>. Following the first <code>RewriteCond<\/code>, <code>%1<\/code> contains <code>\/FR\/fr<\/code>.<\/p>\n<p>At the second <code>RewriteCond<\/code>, if <code>%{REQUEST_URI}<\/code> begins with <code>\/FR\/fr<\/code> then <code>%1%{REQUEST_URI}<\/code> begins with <code>\/FR\/fr\/FR\/fr<\/code>. 
Otherwise, it will only begin with <code>\/FR\/fr<\/code>.<\/p>\n<p>What we can test, then, is whether this <code>%1%{REQUEST_URI}<\/code> aggregate contains a repeating pattern at its beginning. This is exactly what the Perl regexp does: it captures at least one character at the beginning of the tested string (<code>^(.+)<\/code>), and then requires this captured string to appear again immediately after (<code>\\1<\/code>).<\/p>\n<p>It is worth mentioning that this obviously falls flat when the URL start itself contains a repeating pattern (e.g. <code>\/fr\/fr<\/code> instead of <code>\/FR\/fr<\/code>).<\/p>\n<p>As we want the <code>RewriteRule<\/code> to apply only when our <code>%{REQUEST_URI}<\/code> does <b>not<\/b> begin with <code>\/FR\/fr<\/code>, we negate the regexp, which means nothing will actually be captured, so that <code>%1<\/code> in the <code>RewriteRule<\/code> will still be the last captured text, from the very first <code>RewriteCond<\/code>.<\/p>\n<p>Note that if all the tested URL starts have the same length and\/or pattern, it may be worth making the regexp more precise (and faster to match or not). 
Such as <code>^(\/..\/..)\\1<\/code> in our present case.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Consider the following mod_rewrite configuration : RewriteCond %{REQUEST_URI} !^\/FR\/fr RewriteCond %{REQUEST_URI} !^\/common\/ RewriteCond %{REQUEST_URI} !^\/favicon.ico$ RewriteRule ^(.*)$ \/FR\/fr$1 [PT,L] The need here, is to get anything that wouldn&#8217;t begin with \/FR\/fr from $DOCROOT\/FR\/fr (through, possibly, any other module rules, which was important in the setup where these rules were), except if they begin with \/common\/ [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[5],"tags":[23],"class_list":["post-329","post","type-post","status-publish","format-standard","hentry","category-pdo","tag-en"],"_links":{"self":[{"href":"https:\/\/glandium.org\/blog\/index.php?rest_route=\/wp\/v2\/posts\/329","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/glandium.org\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/glandium.org\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/glandium.org\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/glandium.org\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=329"}],"version-history":[{"count":17,"href":"https:\/\/glandium.org\/blog\/index.php?rest_route=\/wp\/v2\/posts\/329\/revisions"}],"predecessor-version":[{"id":644,"href":"https:\/\/glandium.org\/blog\/index.php?rest_route=\/wp\/v2\/posts\/329\/revisions\/644"}],"wp:attachment":[{"href":"https:\/\/glandium.org\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=329"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/glandium.org\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=329"},{"taxonomy":"post_tag","embeddable":true,"hre
f":"https:\/\/glandium.org\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=329"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}