--- /dev/null
+<?xml version="1.0" encoding="utf-8"?>
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
+<html lang="en"><head>
+<title>Technical guide: mod_proxy_html</title>
+<style type="text/css">
+@import url(/index.css) ;
+</style>
+</head><body>
+<div id="apache">
+<h1>mod_proxy_html: Technical Guide</h1>
+<p><a href="./">mod_proxy_html</a> Version 2.4 (Sept-Nov 2004).</p>
+<h2>Contents</h2>
+<ul id="toc">
+<li><a href="#url">URL Rewriting</a>
+<ul>
+<li><a href="#html">HTML Links</a></li>
+<li><a href="#event">Scripting Events</a></li>
+<li><a href="#cdata">Embedded Scripts and Stylesheets</a></li>
+</ul>
+</li>
+<li><a href="#output">Output Transformation</a>
+<ul>
+<li><a href="#fpi">FPI (Doctype)</a></li>
+<li><a href="#ml">HTML vs XHTML</a></li>
+<li><a href="#charset">Character Encoding</a></li>
+</ul>
+</li>
+<li><a href="#meta">meta http-equiv support</a></li>
+<li><a href="#misc">Other Fixups</a></li>
+<li><a href="#debug">Debugging your Configuration</a></li>
+<li><a href="#browser">Workarounds for Browser Bugs</a></li>
+</ul>
+<h2 id="url">URL Rewriting</h2>
+<p>Rewriting URLs into a proxy's address space is of course the primary
+purpose of this module. From Version 2.0, this capability has been
+extended from rewriting HTML URLs to processing scripts and stylesheets
+that <em>may</em> contain URLs.</p>
+<p>Because the module doesn't contain parsers for JavaScript or CSS, this
+additional processing means we have had to introduce some heuristic parsing.
+What that means is that the parser cannot automatically distinguish between
+a URL that should be replaced and one that merely appears as text. It's
+up to you to match the right things! To help you do this, we have introduced
+some new features:</p>
+<ol>
+<li>The <code>ProxyHTMLExtended</code> directive. The extended processing
+will only be activated if this is On. The default is Off, which gives you
+the old behaviour.</li>
+<li>Regular Expression match-and-replace. This can be used anywhere,
+but is most useful where context information can help distinguish URLs
+that should be replaced and avoid false positives. For example,
+to rewrite URLs of CSS @import, we might define a rule<br />
+<code>ProxyHTMLURLMap url\(http://internal.example.com([^\)]*)\) url(http://proxy.example.com$1) Rihe</code><br />
+This explicitly rewrites from one servername to another, and uses regexp
+memory to match a path and append it unchanged in $1, while using the
+<code>url(...)</code> context to reduce the danger of a match that shouldn't
+be rewritten. The <b>R</b> flag invokes regexp processing for this rule;
+<b>i</b> makes the match case-insensitive; while <b>h</b> and <b>e</b>
+save processing cycles by preventing the match being applied to HTML links
+and scripting events, where it is clearly irrelevant.</li>
+</ol>
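+<p>The regexp rule above can be pictured with a small standalone C sketch
+built on POSIX <code>regcomp</code> and <code>regexec</code>. It is only an
+illustration under stated assumptions: <code>rewrite_regex</code> and its
+prefix/suffix arguments are hypothetical stand-ins for the module's
+<code>$1</code> expansion, not code taken from mod_proxy_html.</p>

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of mod_proxy_html): rewrite every
 * non-overlapping match of an extended, case-insensitive regex in `in`.
 * Each match is replaced by to_prefix + capture group 1 + to_suffix,
 * a hand-rolled stand-in for "$1" expansion.
 * Returns the number of rewrites, or -1 on a bad pattern or if the
 * result does not fit in `out`. */
int rewrite_regex(const char *pattern, const char *to_prefix,
                  const char *to_suffix, const char *in,
                  char *out, size_t outsz)
{
    regex_t re;
    regmatch_t m[2];
    size_t used = 0;
    int count = 0;
    int n;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_ICASE) != 0)
        return -1;

    /* Stop on an empty match so the loop always makes progress. */
    while (regexec(&re, in, 2, m, 0) == 0 && m[0].rm_eo > m[0].rm_so) {
        /* Group 1 may not have taken part in the match. */
        int glen = (m[1].rm_so >= 0) ? (int)(m[1].rm_eo - m[1].rm_so) : 0;
        const char *group = (m[1].rm_so >= 0) ? in + m[1].rm_so : "";

        n = snprintf(out + used, outsz - used, "%.*s%s%.*s%s",
                     (int)m[0].rm_so, in,      /* text before the match */
                     to_prefix, glen, group, to_suffix);
        if (n < 0 || (size_t)n >= outsz - used) {
            regfree(&re);
            return -1;
        }
        used += (size_t)n;
        in += m[0].rm_eo;                      /* resume after the match */
        ++count;
    }
    regfree(&re);

    /* Copy whatever follows the last match. */
    n = snprintf(out + used, outsz - used, "%s", in);
    if (n < 0 || (size_t)n >= outsz - used)
        return -1;
    return count;
}
```

+<p>Called with the pattern from the rule above, a prefix of
+<code>url(http://proxy.example.com</code> and a suffix of <code>)</code>,
+the sketch rewrites each matching <code>url(...)</code>, carries the captured
+path over unchanged, and copies all other text as-is.</p>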
+<h3 id="html">HTML Links</h3>
+<p>HTML links are those attributes defined by the HTML 4 and XHTML 1
+DTDs as of type <strong>%URI</strong>. For example, the <strong>href</strong>
+attribute of the <strong>a</strong> element. For a full list, see the
+declaration of <code>linked_elts</code> in <code>pstartElement</code>.
+Rules are applicable provided the <b>h</b> flag is not set.</p>
+<p>An HTML link always contains exactly one URL. So whenever mod_proxy_html
+finds a matching <code>ProxyHTMLURLMap</code> rule, it will apply the
+transformation once and stop processing the attribute.</p>
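+<p>As a rough sketch of this "first match wins" behaviour for simple
+(non-regexp) rules, the following standalone C fragment tries each rule in
+order as a case-insensitive prefix of the URL and stops at the first hit.
+The <code>struct rule</code> type and <code>rewrite_link</code> function are
+hypothetical names for illustration; the module's own rule structure carries
+more fields.</p>

```c
#include <stdio.h>
#include <string.h>
#include <strings.h>   /* strncasecmp (POSIX) */

/* Hypothetical rule type for illustration only. */
struct rule {
    const char *from;   /* prefix to look for */
    const char *to;     /* replacement for that prefix */
};

/* Rewrite `url` using the first rule whose `from` is a case-insensitive
 * prefix of it, then stop. Returns 1 if a rule was applied, 0 if none
 * matched (the URL is copied unchanged), -1 if `out` is too small. */
int rewrite_link(const struct rule *rules, size_t nrules,
                 const char *url, char *out, size_t outsz)
{
    int n;
    for (size_t i = 0; i < nrules; ++i) {
        size_t flen = strlen(rules[i].from);
        if (strncasecmp(url, rules[i].from, flen) == 0) {
            /* One URL per attribute: apply this rule and look no further. */
            n = snprintf(out, outsz, "%s%s", rules[i].to, url + flen);
            return (n < 0 || (size_t)n >= outsz) ? -1 : 1;
        }
    }
    n = snprintf(out, outsz, "%s", url);
    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

+<p>Because processing stops at the first matching rule, more specific
+prefixes need to be listed before more general ones.</p>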
+<h3 id="event">Scripting Events</h3>
+<p>Scripting events are the contents of event attributes as defined in the
+HTML4 and XHTML1 DTDs; for example <code>onclick</code>. For a full list,
+see the declaration of <code>events</code> in <code>pstartElement</code>.
+Rules are applicable provided the <b>e</b> flag is not set.</p>
+<p>A scripting event may contain more than one URL, and will contain other
+text. So when <code>ProxyHTMLExtended</code> is On, all applicable rules
+will be applied in order until and unless a rule with the <b>L</b> flag
+matches. A rule may match more than once, provided the matches do not
+overlap, so a URL/pattern that appears more than once is rewritten
+every time it matches.</p>
+<h3 id="cdata">Embedded Scripts and Stylesheets</h3>
+<p>Embedded scripts and stylesheets are the contents of
+<code>&lt;script&gt;</code> and <code>&lt;style&gt;</code> elements.
+Rules are applicable provided the <b>c</b> flag is not set.</p>
+<p>A script or stylesheet may contain more than one URL, and will contain other
+text. So when <code>ProxyHTMLExtended</code> is On, all applicable rules
+will be applied in order until and unless a rule with the <b>L</b> flag
+matches. A rule may match more than once, provided the matches do not
+overlap, so a URL/pattern that appears more than once is rewritten
+every time it matches.</p>
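+<p>The matching behaviour described for scripting events and embedded
+scripts can be sketched in standalone C as follows. This is a simplified
+illustration for literal (non-regexp) rules only: <code>struct text_rule</code>,
+<code>replace_all</code> and <code>apply_rules</code> are hypothetical names,
+and the <code>last</code> field stands in for the <b>L</b> flag.</p>

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical rule type; `last` stands in for the L flag. */
struct text_rule {
    const char *from;
    const char *to;
    int last;
};

/* Replace every non-overlapping occurrence of r->from in buf, in place.
 * Returns the number of replacements, or -1 if the result would not
 * fit in bufsz bytes (including the terminating NUL). */
static int replace_all(char *buf, size_t bufsz, const struct text_rule *r)
{
    size_t flen = strlen(r->from), tlen = strlen(r->to);
    int count = 0;
    char *p = buf;

    if (flen == 0)
        return 0;
    while ((p = strstr(p, r->from)) != NULL) {
        if (strlen(buf) - flen + tlen + 1 > bufsz)
            return -1;
        /* Shift the tail (with its NUL), then drop in the replacement. */
        memmove(p + tlen, p + flen, strlen(p + flen) + 1);
        memcpy(p, r->to, tlen);
        p += tlen;      /* resume after the replacement: no overlaps */
        ++count;
    }
    return count;
}

/* Apply rules in order; stop after a matching rule flagged `last`.
 * Returns the total number of replacements, or -1 on overflow. */
int apply_rules(char *buf, size_t bufsz,
                const struct text_rule *rules, size_t nrules)
{
    int total = 0;
    for (size_t i = 0; i < nrules; ++i) {
        int n = replace_all(buf, bufsz, &rules[i]);
        if (n < 0)
            return -1;
        total += n;
        if (n > 0 && rules[i].last)
            break;
    }
    return total;
}
```

+<p>Resuming the search after each replacement is what keeps matches from
+overlapping, and it also prevents a rule from re-matching its own output.</p>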
+<h2 id="output">Output Transformation</h2>
+<p>mod_proxy_html uses a SAX parser. This means that the input stream
+- and hence the output generated - will be normalised in various ways,
+even where nothing is actually rewritten. To an HTML or XML parser,
+the document is not changed by normalisation, except as noted below.
+Exceptions to this may arise where the input stream is malformed, when
+the output of mod_proxy_html may be undefined. These should of course
+be fixed at the backend: if mod_proxy_html doesn't work as expected,
+then neither will browsers in real life, except by coincidence.</p>
+<h3 id="fpi">FPI (Doctype)</h3>
+<p>Strictly speaking, HTML and XHTML documents are required to have a
+Formal Public Identifier (FPI), also known as a Document Type Declaration.
+This references a Document Type Definition (DTD) which defines the grammar/
+syntax to which the contents of the document must conform.</p>
+<p>The parser in mod_proxy_html loses any FPI in the input document, but
+gives you the option to insert one. You may select either HTML or XHTML
+(see below), and if your backend is sloppy you may also want to use the
+"Legacy" keyword to make it declare documents "Transitional". You may
+also declare a custom DTD, or (if your backend is seriously screwed
+so no DTD would be appropriate) omit it altogether.</p>
+<h3 id="ml">HTML vs XHTML</h3>
+<p>The differences between HTML 4.01 and XHTML 1.0 are essentially negligible,
+and mod_proxy_html can transform between the two. You can safely select
+either, regardless of what the backend generates, and mod_proxy_html will
+apply the appropriate rules in generating output. HTML saves a few bytes.</p>
+<p>If you declare a custom DTD, you should specify whether to generate
+HTML or XHTML syntax in the output. This affects empty elements:
+HTML <b>&lt;br&gt;</b> vs XHTML <b>&lt;br /&gt;</b>.</p>
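+<p>A minimal C sketch of that difference, assuming a shortened list of empty
+elements and a hypothetical <code>start_tag</code> helper (attributes are
+left out for brevity):</p>

```c
#include <stdio.h>
#include <string.h>

/* A subset of the HTML 4 empty elements, enough for illustration. */
static int is_empty(const char *name)
{
    static const char *const empty[] = {
        "br", "hr", "img", "input", "link", "meta", NULL
    };
    for (const char *const *p = empty; *p; ++p)
        if (strcmp(*p, name) == 0)
            return 1;
    return 0;
}

/* Write an attribute-less start tag for `name` into `out`.
 * Empty elements close with " />" in XHTML mode and ">" in HTML mode;
 * all other elements always close with ">".
 * Returns 0 on success, -1 if `out` is too small. */
int start_tag(const char *name, int xhtml, char *out, size_t outsz)
{
    const char *etag = (xhtml && is_empty(name)) ? " />" : ">";
    int n = snprintf(out, outsz, "<%s%s", name, etag);
    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```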
+<h3 id="charset">Character Encoding</h3>
+<p>The parser uses <strong>UTF-8</strong> (Unicode) internally, and
+mod_proxy_html <em>always</em> generates output as UTF-8. This is
+supported by all general-purpose web software, and supports more
+character sets and languages than any other charset.</p>
+<p>The character encoding should be declared in HTTP: for example<br />
+<code>Content-Type: text/html; charset=latin1</code><br />
+mod_proxy_html has always supported this in its input, and ensured
+this happens in output. But prior to version 2, it did not fully
+support detecting (sniffing) the charset when a backend fails to
+set the HTTP header.</p>
+<p>From version 2.0, mod_proxy_html will detect the encoding of its input
+as follows:</p>
+<ol>
+<li>The HTTP headers, where available, always take precedence over other
+information.</li>
+<li>If the first 2-4 bytes are an XML Byte Order Mark (BOM), this is used.</li>
+<li>If the document starts with an XML declaration
+<code>&lt;?xml .... ?&gt;</code>, this determines encoding by XML rules.</li>
+<li>If the document contains the HTML hack
+<code>&lt;meta http-equiv="Content-Type" ...&gt;</code>, any charset declared
+here is used.</li>
+<li>In the absence of any of the above indications, the HTML-over-HTTP default
+encoding <b>ISO-8859-1</b> is assumed.</li>
+<li>The parser is set to ignore invalid characters, so a malformed input
+stream will generate glitches (unexpected characters) rather than risk
+aborting a parse altogether.</li>
+</ol>
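+<p>The order of precedence in the list above can be sketched as a standalone
+C function. This is a simplified illustration, not the module's
+implementation: <code>detect_charset</code> and its helpers are hypothetical
+names, the body is assumed to be a NUL-terminated string, only the UTF-8 and
+UTF-16 byte order marks are recognised, and the lenient handling of invalid
+characters (the last item) is not modelled.</p>

```c
#include <stdio.h>
#include <string.h>
#include <strings.h>   /* strncasecmp (POSIX) */

/* Case-insensitive search for `key` in [s, end); returns a pointer just
 * past the key, or NULL if it does not occur. */
static const char *find_ci(const char *s, const char *end, const char *key)
{
    size_t klen = strlen(key);
    for (; (size_t)(end - s) >= klen; ++s)
        if (strncasecmp(s, key, klen) == 0)
            return s + klen;
    return NULL;
}

/* Copy the value following `key` (e.g. "charset=") in [s, end) into out,
 * skipping one optional opening quote. Returns 1 on success, else 0. */
static int value_after(const char *s, const char *end, const char *key,
                       char *out, size_t outsz)
{
    const char *v = find_ci(s, end, key);
    size_t n = 0;
    if (!v)
        return 0;
    if (v < end && (*v == '"' || *v == '\''))
        ++v;
    while (v + n < end && !strchr(" ;\"'>?", v[n]))
        ++n;
    if (n == 0 || n >= outsz)
        return 0;
    memcpy(out, v, n);
    out[n] = '\0';
    return 1;
}

/* Writes the detected charset name to `out` and returns a label naming
 * the rule that decided it. content_type may be NULL. */
const char *detect_charset(const char *content_type, const char *body,
                           char *out, size_t outsz)
{
    const unsigned char *b = (const unsigned char *)body;
    const char *end, *p, *q;

    /* 1. HTTP headers take precedence. */
    if (content_type &&
        value_after(content_type, content_type + strlen(content_type),
                    "charset=", out, outsz))
        return "HTTP";

    /* 2. Byte order mark (short-circuit keeps reads inside the string). */
    if (b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF) {
        snprintf(out, outsz, "UTF-8");
        return "BOM";
    }
    if (b[0] == 0xFE && b[1] == 0xFF) {
        snprintf(out, outsz, "UTF-16BE");
        return "BOM";
    }
    if (b[0] == 0xFF && b[1] == 0xFE) {
        snprintf(out, outsz, "UTF-16LE");
        return "BOM";
    }

    /* 3. XML declaration at the very start of the document. */
    if (strncmp(body, "<?xml", 5) == 0 &&
        (q = strstr(body, "?>")) != NULL &&
        value_after(body, q, "encoding=", out, outsz))
        return "XML";

    /* 4. A meta http-equiv element that declares a charset. */
    end = body + strlen(body);
    for (p = body; (p = find_ci(p, end, "<meta")) != NULL; p = q) {
        q = strchr(p, '>');
        if (!q)
            break;
        if (find_ci(p, q, "http-equiv") &&
            value_after(p, q, "charset=", out, outsz))
            return "META";
    }

    /* 5. Fall back to the HTML-over-HTTP default. */
    snprintf(out, outsz, "ISO-8859-1");
    return "default";
}
```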
+<h2 id="meta">meta http-equiv support</h2>
+<p>The HTML <code>meta</code> element includes a form
+<code>&lt;meta http-equiv="Some-Header" content="some-value"&gt;</code>
+which should notionally be converted to a real HTTP header by the webserver.
+In practice, it is more commonly supported in browsers than servers, and
+is common in constructs such as ClientPull (aka "meta refresh").
+The <code>ProxyHTMLMeta</code> directive supports the server generating
+real HTTP headers from these. However, it does not strip them from the
+HTML (except for Content-Type, which is removed in case it contains
+conflicting charset information).</p>
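+<p>To make the idea concrete, here is a hedged C sketch of turning one
+<code>meta</code> element into a header name and value. The names
+<code>attr_value</code> and <code>meta_to_header</code> are hypothetical, the
+tag is assumed to contain no <code>&gt;</code> inside attribute values, and
+the special handling of Content-Type described above is not reproduced.</p>

```c
#include <ctype.h>
#include <string.h>
#include <strings.h>   /* strncasecmp (POSIX) */

/* Find attribute `name` in a single tag and copy its value into out.
 * Handles quoted and unquoted values. Returns 1 if found, else 0. */
static int attr_value(const char *tag, const char *name,
                      char *out, size_t outsz)
{
    size_t nlen = strlen(name);
    for (const char *p = tag; *p && *p != '>'; ++p) {
        const char *v;
        size_t n;

        /* An attribute name must follow whitespace. */
        if (p == tag || !isspace((unsigned char)p[-1]) ||
            strncasecmp(p, name, nlen) != 0)
            continue;
        v = p + nlen;
        while (isspace((unsigned char)*v))
            ++v;
        if (*v != '=')
            continue;           /* e.g. "content-type" when seeking "content" */
        ++v;
        while (isspace((unsigned char)*v))
            ++v;
        if (*v == '"' || *v == '\'') {
            char delim = *v++;
            const char *e = strchr(v, delim);
            if (!e)
                return 0;       /* unterminated quote */
            n = (size_t)(e - v);
        } else {
            n = strcspn(v, " \t\r\n>");
        }
        if (n >= outsz)
            return 0;
        memcpy(out, v, n);
        out[n] = '\0';
        return 1;
    }
    return 0;
}

/* Extract header name and value from <meta http-equiv=... content=...>.
 * Returns 1 if both attributes were found, else 0. */
int meta_to_header(const char *tag, char *header, size_t hsz,
                   char *value, size_t vsz)
{
    if (strncasecmp(tag, "<meta", 5) != 0)
        return 0;
    return attr_value(tag, "http-equiv", header, hsz) &&
           attr_value(tag, "content", value, vsz);
}
```

+<p>Requiring an equals sign after the attribute name matters here: a naive
+search for <code>content</code> would otherwise stop inside a value such as
+<code>content-type</code>.</p>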
+<h2 id="misc">Other Fixups</h2>
+<p>For additional minor functions of mod_proxy_html, please see the
+<code>ProxyHTMLFixups</code> and <code>ProxyHTMLStripComments</code>
+directives in the <a href="config.html">Configuration Guide</a>.</p>
+<h2 id="debug">Debugging your Configuration</h2>
+<p>From Version 2.1, mod_proxy_html supports a <code>ProxyHTMLLogVerbose</code>
+directive, to enable verbose logging at <code>LogLevel Info</code>. This
+is designed to help with setting up your proxy configuration and
+diagnosing unexpected behaviour; it is not recommended for normal
+operation, and can be disabled altogether at compile time for extra
+performance (see the top of the source).</p>
+<p>When verbose logging is enabled, the following messages will be logged:</p>
+<ol>
+<li>In <strong>Charset Detection</strong>, it will report what charset is
+detected and how (HTTP rules, XML rules, or HTML rules). Note that,
+regardless of verbose logging, an error or warning will be logged if an
+unsupported charset is detected or if no information can be found.</li>
+<li>When <code>ProxyHTMLMeta</code> is enabled, it logs each header/value
+pair processed.</li>
+<li>Whenever a <code>ProxyHTMLURLMap</code> rule matches and causes a
+rewrite, it is logged. The message contains abbreviated context information:
+<strong>H</strong> denotes an HTML link matched; <strong>E</strong>
+denotes a match in a scripting event, <strong>C</strong> denotes a match
+in an inline script or stylesheet. When the match is a regexp
+find-and-replace, it is also marked as <strong>RX</strong>.</li>
+</ol>
+<h2 id="browser">Workarounds for Browser Bugs</h2>
+<p>Because mod_proxy_html unsets the Content-Length header, it risks
+losing the performance advantage of HTTP Keep-Alive. It therefore sets
+up HTTP Chunked Encoding when responding to HTTP/1.1 requests. This
+enables keep-alive again for HTTP/1.1 agents.</p>
+<p>Unfortunately some buggy agents will send an HTTP/1.1 request but
+choke on an HTTP/1.1 response. Typically you will see numbers before
+and after, and possibly in the middle of, a page. To work around this, set the
+<code>force-response-1.0</code> environment variable in httpd.conf.
+For example,<br /><code>BrowserMatch MSIE force-response-1.0</code></p>
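+<p>The decision described above boils down to a small predicate. The sketch
+below is a standalone illustration with a hypothetical function name; it
+assumes httpd's convention that the protocol number is 1000 times the major
+version plus the minor version, so 1001 means HTTP/1.1, and that the
+environment variable is passed as NULL when unset.</p>

```c
#include <stddef.h>

/* Decide whether to use chunked encoding for a response.
 * proto_num:         1000*major + minor (1001 is HTTP/1.1)
 * is_subrequest:     non-zero for subrequests and internal redirects
 * force_response_10: value of the force-response-1.0 variable, or NULL */
int use_chunked(int proto_num, int is_subrequest,
                const char *force_response_10)
{
    if (proto_num < 1001)
        return 0;       /* HTTP/1.0 clients cannot handle chunking */
    if (is_subrequest)
        return 0;       /* only the main request sets the encoding */
    return force_response_10 == NULL;
}
```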
+</div>
+<div id="navbar"><a class="internal" href="./" title="Up">Up</a>
+*
+<a class="internal" href="/" title="WebThing Apache Centre">Home</a>
+*
+<a class="internal" href="/contact.html" title="Contact WebThing">Contact</a>
+*
+<a class="external" href="http://www.webthing.com/" title="WebThing Ltd">WebÞing</a>
+*
+<a class="external" href="http://www.apache.org/" title="Apache Software Foundation">Apache</a></div></body></html>
--- /dev/null
+/********************************************************************
+ Copyright (c) 2003-4, WebThing Ltd
+ Author: Nick Kew <nick@webthing.com>
+
+This program is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 2 of the License, or
+(at your option) any later version.
+
+This program is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; if not, write to the Free Software
+Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+
+*********************************************************************/
+
+
+/********************************************************************
+ Note to Users
+
+ You are requested to register as a user, at
+ http://apache.webthing.com/registration.html
+
+ This entitles you to support from the developer.
+ I'm unlikely to reply to help/support requests from
+ non-registered users, unless you're paying and/or offering
+ constructive feedback such as bug reports or sensible
+ suggestions for further development.
+
+ It also makes a small contribution to the effort
+ that's gone into developing this work.
+*********************************************************************/
+
+/* End of Notices */
+
+
+
+
+/* GO_FASTER
+
+ You can #define GO_FASTER to disable informational logging.
+ This disables the ProxyHTMLLogVerbose option altogether.
+
+ Default is to leave it undefined, and enable verbose logging
+ as a configuration option. Binaries are supplied with verbose
+ logging enabled.
+*/
+
+#ifdef GO_FASTER
+#define VERBOSE(x)
+#else
+#define VERBOSE(x) if ( verbose ) x
+#endif
+
+#define VERSION_STRING "proxy_html/2.4"
+
+#include <ctype.h>
+
+/* libxml */
+#include <libxml/HTMLparser.h>
+
+/* apache */
+#include <http_protocol.h>
+#include <http_config.h>
+#include <http_log.h>
+#include <apr_strings.h>
+
+module AP_MODULE_DECLARE_DATA proxy_html_module ;
+
+#define M_HTML 0x01
+#define M_EVENTS 0x02
+#define M_CDATA 0x04
+#define M_REGEX 0x08
+#define M_ATSTART 0x10
+#define M_ATEND 0x20
+#define M_LAST 0x40
+
+typedef struct {
+ unsigned int start ;
+ unsigned int end ;
+} meta ;
+typedef struct urlmap {
+ struct urlmap* next ;
+ unsigned int flags ;
+ union {
+ const char* c ;
+ regex_t* r ;
+ } from ;
+ const char* to ;
+} urlmap ;
+typedef struct {
+ urlmap* map ;
+ const char* doctype ;
+ const char* etag ;
+ unsigned int flags ;
+ int extfix ;
+ int metafix ;
+ int strip_comments ;
+#ifndef GO_FASTER
+ int verbose ;
+#endif
+ size_t bufsz ;
+} proxy_html_conf ;
+typedef struct {
+ htmlSAXHandlerPtr sax ;
+ ap_filter_t* f ;
+ proxy_html_conf* cfg ;
+ htmlParserCtxtPtr parser ;
+ apr_bucket_brigade* bb ;
+ char* buf ;
+ size_t offset ;
+ size_t avail ;
+} saxctxt ;
+
+static int is_empty_elt(const char* name) {
+ const char** p ;
+ static const char* empty_elts[] = {
+ "br" ,
+ "link" ,
+ "img" ,
+ "hr" ,
+ "input" ,
+ "meta" ,
+ "base" ,
+ "area" ,
+ "param" ,
+ "col" ,
+ "frame" ,
+ "isindex" ,
+ "basefont" ,
+ NULL
+ } ;
+ for ( p = empty_elts ; *p ; ++p )
+ if ( !strcmp( *p, name) )
+ return 1 ;
+ return 0 ;
+}
+
+typedef struct {
+ const char* name ;
+ const char** attrs ;
+} elt_t ;
+
+#define NORM_LC 0x1
+#define NORM_MSSLASH 0x2
+#define NORM_RESET 0x4
+
+typedef enum { ATTR_IGNORE, ATTR_URI, ATTR_EVENT } rewrite_t ;
+
+static void normalise(unsigned int flags, char* str) {
+ char* p ;
+ if ( flags & NORM_LC )
+ for ( p = str ; *p ; ++p )
+ if ( isupper((unsigned char)*p) )
+ *p = tolower((unsigned char)*p) ;
+
+ if ( flags & NORM_MSSLASH )
+ for ( p = strchr(str, '\\') ; p ; p = strchr(p+1, '\\') )
+ *p = '/' ;
+
+}
+
+#define FLUSH ap_fwrite(ctx->f->next, ctx->bb, (chars+begin), (i-begin)) ; begin = i+1
+static void pcharacters(void* ctxt, const xmlChar *chars, int length) {
+ saxctxt* ctx = (saxctxt*) ctxt ;
+ int i ;
+ int begin ;
+ for ( begin=i=0; i<length; i++ ) {
+ switch (chars[i]) {
+ case '&' : FLUSH ; ap_fputs(ctx->f->next, ctx->bb, "&amp;") ; break ;
+ case '<' : FLUSH ; ap_fputs(ctx->f->next, ctx->bb, "&lt;") ; break ;
+ case '>' : FLUSH ; ap_fputs(ctx->f->next, ctx->bb, "&gt;") ; break ;
+ case '"' : FLUSH ; ap_fputs(ctx->f->next, ctx->bb, "&quot;") ; break ;
+ default : break ;
+ }
+ }
+ FLUSH ;
+}
+static void preserve(saxctxt* ctx, const size_t len) {
+ char* newbuf ;
+ if ( len <= ( ctx->avail - ctx->offset ) )
+ return ;
+ else while ( len > ( ctx->avail - ctx->offset ) )
+ ctx->avail += ctx->cfg->bufsz ;
+
+ newbuf = realloc(ctx->buf, ctx->avail) ;
+ if ( newbuf != ctx->buf ) {
+ if ( ctx->buf )
+ apr_pool_cleanup_kill(ctx->f->r->pool, ctx->buf, (void*)free) ;
+ apr_pool_cleanup_register(ctx->f->r->pool, newbuf,
+ (void*)free, apr_pool_cleanup_null);
+ ctx->buf = newbuf ;
+ }
+}
+static void pappend(saxctxt* ctx, const char* buf, const size_t len) {
+ preserve(ctx, len) ;
+ memcpy(ctx->buf+ctx->offset, buf, len) ;
+ ctx->offset += len ;
+}
+static void dump_content(saxctxt* ctx) {
+ urlmap* m ;
+ char* found ;
+ size_t s_from, s_to ;
+ size_t match ;
+ char c = 0 ;
+ int nmatch ;
+ regmatch_t pmatch[10] ;
+ char* subs ;
+ size_t len, offs ;
+#ifndef GO_FASTER
+ int verbose = ctx->cfg->verbose ;
+#endif
+
+ pappend(ctx, &c, 1) ; /* append null byte */
+ /* parse the text for URLs */
+ for ( m = ctx->cfg->map ; m ; m = m->next ) {
+ if ( ! ( m->flags & M_CDATA ) )
+ continue ;
+ if ( m->flags & M_REGEX ) {
+ nmatch = 10 ;
+ offs = 0 ;
+ while ( ! ap_regexec(m->from.r, ctx->buf+offs, nmatch, pmatch, 0) ) {
+ match = pmatch[0].rm_so ;
+ s_from = pmatch[0].rm_eo - match ;
+ subs = ap_pregsub(ctx->f->r->pool, m->to, ctx->buf+offs,
+ nmatch, pmatch) ;
+ s_to = strlen(subs) ;
+ len = strlen(ctx->buf) ;
+ offs += match ;
+ VERBOSE( {
+ const char* f = apr_pstrndup(ctx->f->r->pool,
+ ctx->buf + offs , s_from ) ;
+ ap_log_rerror(APLOG_MARK, APLOG_INFO, 0, ctx->f->r,
+ "C/RX: match at %s, substituting %s", f, subs) ;
+ } )
+ if ( s_to > s_from) {
+ preserve(ctx, s_to - s_from) ;
+ memmove(ctx->buf+offs+s_to, ctx->buf+offs+s_from,
+ len + 1 - s_from - offs) ;
+ memcpy(ctx->buf+offs, subs, s_to) ;
+ } else {
+ memcpy(ctx->buf + offs, subs, s_to) ;
+ memmove(ctx->buf+offs+s_to, ctx->buf+offs+s_from,
+ len + 1 - s_from - offs) ;
+ }
+ offs += s_to ;
+ }
+ } else {
+ s_from = strlen(m->from.c) ;
+ s_to = strlen(m->to) ;
+ for ( found = strstr(ctx->buf, m->from.c) ; found ;
+ found = strstr(ctx->buf+match+s_to, m->from.c) ) {
+ match = found - ctx->buf ;
+ if ( ( m->flags & M_ATSTART ) && ( match != 0) )
+ break ;
+ len = strlen(ctx->buf) ;
+ if ( ( m->flags & M_ATEND ) && ( match < (len - s_from) ) )
+ continue ;
+ VERBOSE( ap_log_rerror(APLOG_MARK, APLOG_INFO, 0, ctx->f->r,
+ "C: matched %s, substituting %s", m->from.c, m->to) ) ;
+ if ( s_to > s_from ) {
+ preserve(ctx, s_to - s_from) ;
+ memmove(ctx->buf+match+s_to, ctx->buf+match+s_from,
+ len + 1 - s_from - match) ;
+ memcpy(ctx->buf+match, m->to, s_to) ;
+ } else {
+ memcpy(ctx->buf+match, m->to, s_to) ;
+ memmove(ctx->buf+match+s_to, ctx->buf+match+s_from,
+ len + 1 - s_from - match) ;
+ }
+ }
+ }
+ }
+ ap_fputs(ctx->f->next, ctx->bb, ctx->buf) ;
+}
+static void pcdata(void* ctxt, const xmlChar *chars, int length) {
+ saxctxt* ctx = (saxctxt*) ctxt ;
+ if ( ctx->cfg->extfix ) {
+ pappend(ctx, chars, length) ;
+ } else {
+ ap_fwrite(ctx->f->next, ctx->bb, chars, length) ;
+ }
+}
+static void pcomment(void* ctxt, const xmlChar *chars) {
+ saxctxt* ctx = (saxctxt*) ctxt ;
+ if ( ctx->cfg->strip_comments )
+ return ;
+
+ if ( ctx->cfg->extfix ) {
+ pappend(ctx, "<!--", 4) ;
+ pappend(ctx, chars, strlen(chars) ) ;
+ pappend(ctx, "-->", 3) ;
+ } else {
+ ap_fputstrs(ctx->f->next, ctx->bb, "<!--", chars, "-->", NULL) ;
+ }
+}
+static void pendElement(void* ctxt, const xmlChar* name) {
+ saxctxt* ctx = (saxctxt*) ctxt ;
+ if ( ctx->offset > 0 ) {
+ dump_content(ctx) ;
+ ctx->offset = 0 ; /* having dumped it, we can re-use the memory */
+ }
+ if ( ! is_empty_elt(name) )
+ ap_fprintf(ctx->f->next, ctx->bb, "</%s>", name) ;
+}
+static void pstartElement(void* ctxt, const xmlChar* name,
+ const xmlChar** attrs ) {
+
+ int num_match ;
+ size_t offs, len ;
+ char* subs ;
+ rewrite_t is_uri ;
+ const char** linkattrs ;
+ const xmlChar** a ;
+ const elt_t* elt ;
+ const char** linkattr ;
+ urlmap* m ;
+ size_t s_to, s_from, match ;
+ char* found ;
+ saxctxt* ctx = (saxctxt*) ctxt ;
+ size_t nmatch ;
+ regmatch_t pmatch[10] ;
+#ifndef GO_FASTER
+ int verbose = ctx->cfg->verbose ;
+#endif
+
+ static const char* href[] = { "href", NULL } ;
+ static const char* cite[] = { "cite", NULL } ;
+ static const char* action[] = { "action", NULL } ;
+ static const char* imgattr[] = { "src", "longdesc", "usemap", NULL } ;
+ static const char* inputattr[] = { "src", "usemap", NULL } ;
+ static const char* scriptattr[] = { "src", "for", NULL } ;
+ static const char* frameattr[] = { "src", "longdesc", NULL } ;
+ static const char* objattr[] = { "classid", "codebase", "data", "usemap", NULL } ;
+ static const char* profile[] = { "profile", NULL } ;
+ static const char* background[] = { "background", NULL } ;
+ static const char* codebase[] = { "codebase", NULL } ;
+
+ static const elt_t linked_elts[] = {
+ { "a" , href } ,
+ { "img" , imgattr } ,
+ { "form", action } ,
+ { "link" , href } ,
+ { "script" , scriptattr } ,
+ { "base" , href } ,
+ { "area" , href } ,
+ { "input" , inputattr } ,
+ { "frame", frameattr } ,
+ { "iframe", frameattr } ,
+ { "object", objattr } ,
+ { "q" , cite } ,
+ { "blockquote" , cite } ,
+ { "ins" , cite } ,
+ { "del" , cite } ,
+ { "head" , profile } ,
+ { "body" , background } ,
+ { "applet", codebase } ,
+ { NULL, NULL }
+ } ;
+ static const char* events[] = {
+ "onclick" ,
+ "ondblclick" ,
+ "onmousedown" ,
+ "onmouseup" ,
+ "onmouseover" ,
+ "onmousemove" ,
+ "onmouseout" ,
+ "onkeypress" ,
+ "onkeydown" ,
+ "onkeyup" ,
+ "onfocus" ,
+ "onblur" ,
+ "onload" ,
+ "onunload" ,
+ "onsubmit" ,
+ "onreset" ,
+ "onselect" ,
+ "onchange" ,
+ NULL
+ } ;
+
+ ap_fputc(ctx->f->next, ctx->bb, '<') ;
+ ap_fputs(ctx->f->next, ctx->bb, name) ;
+
+ if ( attrs ) {
+ linkattrs = 0 ;
+ for ( elt = linked_elts; elt->name != NULL ; ++elt )
+ if ( !strcmp(elt->name, name) ) {
+ linkattrs = elt->attrs ;
+ break ;
+ }
+ for ( a = attrs ; *a ; a += 2 ) {
+ ctx->offset = 0 ;
+ if ( a[1] ) {
+ pappend(ctx, a[1], strlen(a[1])+1) ;
+ is_uri = ATTR_IGNORE ;
+ if ( linkattrs ) {
+ for ( linkattr = linkattrs ; *linkattr ; ++linkattr) {
+ if ( !strcmp(*linkattr, *a) ) {
+ is_uri = ATTR_URI ;
+ break ;
+ }
+ }
+ }
+ if ( (is_uri == ATTR_IGNORE) && ctx->cfg->extfix ) {
+ for ( linkattr = events; *linkattr; ++linkattr ) {
+ if ( !strcmp(*linkattr, *a) ) {
+ is_uri = ATTR_EVENT ;
+ break ;
+ }
+ }
+ }
+ switch ( is_uri ) {
+ case ATTR_URI:
+ num_match = 0 ;
+ for ( m = ctx->cfg->map ; m ; m = m->next ) {
+ if ( ! ( m->flags & M_HTML ) )
+ continue ;
+ if ( m->flags & M_REGEX ) {
+ nmatch = 10 ;
+ if ( ! ap_regexec(m->from.r, ctx->buf, nmatch, pmatch, 0) ) {
+ ++num_match ;
+ offs = match = pmatch[0].rm_so ;
+ s_from = pmatch[0].rm_eo - match ;
+ /* pmatch offsets are relative to ctx->buf, the string that was matched */
+ subs = ap_pregsub(ctx->f->r->pool, m->to, ctx->buf,
+ nmatch, pmatch) ;
+ VERBOSE( {
+ const char* f = apr_pstrndup(ctx->f->r->pool,
+ ctx->buf + offs , s_from ) ;
+ ap_log_rerror(APLOG_MARK, APLOG_INFO, 0, ctx->f->r,
+ "H/RX: match at %s, substituting %s", f, subs) ;
+ } )
+ s_to = strlen(subs) ;
+ len = strlen(ctx->buf) ;
+ if ( s_to > s_from) {
+ preserve(ctx, s_to - s_from) ;
+ memmove(ctx->buf+offs+s_to, ctx->buf+offs+s_from,
+ len + 1 - s_from - offs) ;
+ memcpy(ctx->buf+offs, subs, s_to) ;
+ } else {
+ memcpy(ctx->buf + offs, subs, s_to) ;
+ memmove(ctx->buf+offs+s_to, ctx->buf+offs+s_from,
+ len + 1 - s_from - offs) ;
+ }
+ }
+ } else {
+ s_from = strlen(m->from.c) ;
+ if ( ! strncasecmp(ctx->buf, m->from.c, s_from ) ) {
+ ++num_match ;
+ s_to = strlen(m->to) ;
+ len = strlen(ctx->buf) ;
+ VERBOSE( ap_log_rerror(APLOG_MARK, APLOG_INFO, 0, ctx->f->r,
+ "H: matched %s, substituting %s", m->from.c, m->to) ) ;
+ if ( s_to > s_from ) {
+ preserve(ctx, s_to - s_from) ;
+ memmove(ctx->buf+s_to, ctx->buf+s_from,
+ len + 1 - s_from ) ;
+ memcpy(ctx->buf, m->to, s_to) ;
+ } else { /* it fits in the existing space */
+ memcpy(ctx->buf, m->to, s_to) ;
+ memmove(ctx->buf+s_to, ctx->buf+s_from,
+ len + 1 - s_from) ;
+ }
+ break ;
+ }
+ }
+ if ( num_match > 0 ) /* URIs only want one match */
+ break ;
+ }
+ break ;
+ case ATTR_EVENT:
+ for ( m = ctx->cfg->map ; m ; m = m->next ) {
+ num_match = 0 ; /* reset here since we're working per-rule */
+ if ( ! ( m->flags & M_EVENTS ) )
+ continue ;
+ if ( m->flags & M_REGEX ) {
+ nmatch = 10 ;
+ offs = 0 ;
+ while ( ! ap_regexec(m->from.r, ctx->buf+offs,
+ nmatch, pmatch, 0) ) {
+ match = pmatch[0].rm_so ;
+ s_from = pmatch[0].rm_eo - match ;
+ subs = ap_pregsub(ctx->f->r->pool, m->to, ctx->buf+offs,
+ nmatch, pmatch) ;
+ offs += match ; /* offs now points at the start of the match */
+ VERBOSE( {
+ const char* f = apr_pstrndup(ctx->f->r->pool,
+ ctx->buf + offs , s_from ) ;
+ ap_log_rerror(APLOG_MARK, APLOG_INFO, 0, ctx->f->r,
+ "E/RX: match at %s, substituting %s", f, subs) ;
+ } )
+ s_to = strlen(subs) ;
+ len = strlen(ctx->buf) ;
+ if ( s_to > s_from) {
+ preserve(ctx, s_to - s_from) ;
+ memmove(ctx->buf+offs+s_to, ctx->buf+offs+s_from,
+ len + 1 - s_from - offs) ;
+ memcpy(ctx->buf+offs, subs, s_to) ;
+ } else {
+ memcpy(ctx->buf + offs, subs, s_to) ;
+ memmove(ctx->buf+offs+s_to, ctx->buf+offs+s_from,
+ len + 1 - s_from - offs) ;
+ }
+ offs += s_to ;
+ ++num_match ;
+ }
+ } else {
+ found = strstr(ctx->buf, m->from.c) ;
+ if ( (m->flags & M_ATSTART) && ( found != ctx->buf) )
+ continue ;
+ while ( found ) {
+ s_from = strlen(m->from.c) ;
+ s_to = strlen(m->to) ;
+ match = found - ctx->buf ;
+ if ( ( s_from < strlen(found) ) && (m->flags & M_ATEND ) ) {
+ found = strstr(ctx->buf+match+s_from, m->from.c) ;
+ continue ;
+ } else {
+ found = strstr(ctx->buf+match+s_to, m->from.c) ;
+ }
+ VERBOSE( ap_log_rerror(APLOG_MARK, APLOG_INFO, 0, ctx->f->r,
+ "E: matched %s, substituting %s", m->from.c, m->to) ) ;
+ len = strlen(ctx->buf) ;
+ if ( s_to > s_from ) {
+ preserve(ctx, s_to - s_from) ;
+ memmove(ctx->buf+match+s_to, ctx->buf+match+s_from,
+ len + 1 - s_from - match) ;
+ memcpy(ctx->buf+match, m->to, s_to) ;
+ } else {
+ memcpy(ctx->buf+match, m->to, s_to) ;
+ memmove(ctx->buf+match+s_to, ctx->buf+match+s_from,
+ len + 1 - s_from - match) ;
+ }
+ ++num_match ;
+ }
+ }
+ if ( num_match && ( m->flags & M_LAST ) )
+ break ;
+ }
+ break ;
+ case ATTR_IGNORE:
+ break ;
+ }
+ }
+ if ( ! a[1] )
+ ap_fputstrs(ctx->f->next, ctx->bb, " ", a[0], NULL) ;
+ else {
+
+ if ( ctx->cfg->flags != 0 )
+ normalise(ctx->cfg->flags, ctx->buf) ;
+
+ /* write the attribute, using pcharacters to html-escape
+ anything that needs it in the value.
+ */
+ ap_fputstrs(ctx->f->next, ctx->bb, " ", a[0], "=\"", NULL) ;
+ pcharacters(ctx, ctx->buf, strlen(ctx->buf)) ;
+ ap_fputc(ctx->f->next, ctx->bb, '"') ;
+ }
+ }
+ }
+ ctx->offset = 0 ;
+ if ( is_empty_elt(name) )
+ ap_fputs(ctx->f->next, ctx->bb, ctx->cfg->etag) ;
+ else
+ ap_fputc(ctx->f->next, ctx->bb, '>') ;
+}
+static htmlSAXHandlerPtr setupSAX(apr_pool_t* pool) {
+ htmlSAXHandlerPtr sax = apr_pcalloc(pool, sizeof(htmlSAXHandler) ) ;
+ sax->startDocument = NULL ;
+ sax->endDocument = NULL ;
+ sax->startElement = pstartElement ;
+ sax->endElement = pendElement ;
+ sax->characters = pcharacters ;
+ sax->comment = pcomment ;
+ sax->cdataBlock = pcdata ;
+ return sax ;
+}
+
+static regex_t* seek_meta_ctype ;
+static regex_t* seek_charset ;
+static regex_t* seek_meta ;
+
+static void proxy_html_child_init(apr_pool_t* pool, server_rec* s) {
+ seek_meta_ctype = ap_pregcomp(pool,
+ "(<meta[^>]*http-equiv[ \t\r\n='\"]*content-type[^>]*>)",
+ REG_EXTENDED|REG_ICASE) ;
+ seek_charset = ap_pregcomp(pool, "charset=([A-Za-z0-9_-]+)",
+ REG_EXTENDED|REG_ICASE) ;
+ seek_meta = ap_pregcomp(pool, "<meta[^>]*(http-equiv)[^>]*>",
+ REG_EXTENDED|REG_ICASE) ;
+}
+
+static xmlCharEncoding sniff_encoding(request_rec* r, const char* cbuf, size_t bytes
+#ifndef GO_FASTER
+ , int verbose
+#endif
+ ) {
+ xmlCharEncoding ret ;
+ char* encoding = NULL ;
+ char* p ;
+ char* q ;
+ regmatch_t match[2] ;
+ unsigned char* buf = (unsigned char*)cbuf ;
+
+ VERBOSE( ap_log_rerror(APLOG_MARK, APLOG_INFO, 0, r,
+ "Content-Type is %s", r->content_type) ) ;
+
+/* If we've got it in the HTTP headers, there's nothing to do */
+ if ( r->content_type &&
+ ( p = ap_strcasestr(r->content_type, "charset=") , p != NULL ) ) {
+ p += 8 ;
+ if ( encoding = apr_pstrndup(r->pool, p, strcspn(p, " ;") ) , encoding ) {
+ if ( ret = xmlParseCharEncoding(encoding),
+ ret != XML_CHAR_ENCODING_ERROR ) {
+ VERBOSE( ap_log_rerror(APLOG_MARK, APLOG_INFO, 0, r,
+ "Got charset %s from HTTP headers", encoding) ) ;
+ return ret ;
+ } else {
+ ap_log_rerror(APLOG_MARK, APLOG_ERR, 0, r,
+ "Unsupported charset %s in HTTP headers", encoding) ;
+ encoding = NULL ;
+ }
+ }
+ }
+
+/* to sniff, first we look for BOM */
+ if ( ret = xmlDetectCharEncoding(buf, bytes),
+ ret != XML_CHAR_ENCODING_NONE ) {
+ VERBOSE( ap_log_rerror(APLOG_MARK, APLOG_INFO, 0, r,
+ "Got charset from XML rules.") ) ;
+ return ret ;
+ }
+
+/* If none of the above, look for a META-thingey */
+ encoding = NULL ;
+ if ( ap_regexec(seek_meta_ctype, buf, 1, match, 0) == 0 ) {
+ p = apr_pstrndup(r->pool, buf + match[0].rm_so,
+ match[0].rm_eo - match[0].rm_so) ;
+ if ( ap_regexec(seek_charset, p, 2, match, 0) == 0 )
+ encoding = apr_pstrndup(r->pool, p+match[1].rm_so,
+ match[1].rm_eo - match[1].rm_so) ;
+ }
+
+/* either it's set to something we found or it's still the default */
+ if ( encoding )
+ if ( ret = xmlParseCharEncoding(encoding),
+ ret != XML_CHAR_ENCODING_ERROR ) {
+ VERBOSE( ap_log_rerror(APLOG_MARK, APLOG_INFO, 0, r,
+ "Got charset %s from HTML META", encoding) ) ;
+ return ret ;
+ } else {
+ ap_log_rerror(APLOG_MARK, APLOG_ERR, 0, r,
+ "Unsupported charset %s in HTML META", encoding) ;
+ }
+
+/* the old HTTP default is a last resort */
+ ap_log_rerror(APLOG_MARK, APLOG_WARNING, 0, r,
+ "No usable charset information: using old HTTP default LATIN1") ;
+ return XML_CHAR_ENCODING_8859_1 ;
+}
+static meta* metafix(request_rec* r, const char* buf /*, size_t bytes*/
+#ifndef GO_FASTER
+ , int verbose
+#endif
+ ) {
+ meta* ret = NULL ;
+ size_t offs = 0 ;
+ const char* p ;
+ const char* q ;
+ char* header ;
+ char* content ;
+ regmatch_t pmatch[2] ;
+ char delim ;
+
+ while ( ! ap_regexec(seek_meta, buf+offs, 2, pmatch, 0) ) {
+ header = NULL ;
+ content = NULL ;
+ p = buf+offs+pmatch[1].rm_eo ;
+ while ( *++p && !isalpha((unsigned char)*p) ) ; /* stop at end of buffer */
+ for ( q = p ; isalnum(*q) || (*q == '-') ; ++q ) ;
+ header = apr_pstrndup(r->pool, p, q-p) ;
+ if ( strncasecmp(header, "Content-", 8) ) {
+/* find content=... string */
+ for ( p = strstr(buf+offs+pmatch[0].rm_so, "content") ; p ;
+ p = strstr(p, "content") ) {
+ p += 7 ;
+ while ( *p && isspace(*p) )
+ ++p ;
+ if ( *p != '=' )
+ continue ;
+ while ( *p && isspace(*++p) ) ;
+ if ( ( *p == '\'' ) || ( *p == '"' ) ) {
+ delim = *p++ ;
+ for ( q = p ; *q && *q != delim ; ++q ) ;
+ } else {
+ for ( q = p ; *q && !isspace(*q) && (*q != '>') ; ++q ) ;
+ }
+ content = apr_pstrndup(r->pool, p, q-p) ;
+ break ;
+ }
+ } else if ( !strncasecmp(header, "Content-Type", 12) ) {
+ ret = apr_palloc(r->pool, sizeof(meta) ) ;
+ ret->start = offs + pmatch[0].rm_so ; /* offsets relative to buf */
+ ret->end = offs + pmatch[0].rm_eo ;
+ }
+ if ( header && content ) {
+ VERBOSE( ap_log_rerror(APLOG_MARK, APLOG_INFO, 0, r,
+ "Adding header [%s: %s] from HTML META", header, content) ) ;
+ apr_table_setn(r->headers_out, header, content) ;
+ }
+ offs += pmatch[0].rm_eo ;
+ }
+ return ret ;
+}
+
+static int proxy_html_filter_init(ap_filter_t* f) {
+ const char* env ;
+ saxctxt* fctx ;
+
+#if 0
+/* remove content-length filter */
+ ap_filter_rec_t* clf = ap_get_output_filter_handle("CONTENT_LENGTH") ;
+ ap_filter_t* ff = f->next ;
+
+ do {
+ ap_filter_t* fnext = ff->next ;
+ if ( ff->frec == clf )
+ ap_remove_output_filter(ff) ;
+ ff = fnext ;
+ } while ( ff ) ;
+#endif
+
+ fctx = f->ctx = apr_pcalloc(f->r->pool, sizeof(saxctxt)) ;
+ fctx->sax = setupSAX(f->r->pool) ;
+ fctx->f = f ;
+ fctx->bb = apr_brigade_create(f->r->pool, f->r->connection->bucket_alloc) ;
+ fctx->cfg = ap_get_module_config(f->r->per_dir_config,&proxy_html_module);
+
+ if ( f->r->proto_num >= 1001 ) {
+ if ( ! f->r->main && ! f->r->prev ) {
+ env = apr_table_get(f->r->subprocess_env, "force-response-1.0") ;
+ if ( !env )
+ f->r->chunked = 1 ;
+ }
+ }
+
+ apr_table_unset(f->r->headers_out, "Content-Length") ;
+ apr_table_unset(f->r->headers_out, "ETag") ;
+ return OK ;
+}
+static saxctxt* check_filter_init (ap_filter_t* f) {
+
+ const char* errmsg = NULL ;
+ if ( ! f->r->proxyreq ) {
+ errmsg = "Non-proxy request; not inserting proxy-html filter" ;
+ } else if ( ! f->r->content_type ) {
+ errmsg = "No content-type; bailing out of proxy-html filter" ;
+ } else if ( strncasecmp(f->r->content_type, "text/html", 9) &&
+ strncasecmp(f->r->content_type, "application/xhtml+xml", 21) ) {
+ errmsg = "Non-HTML content; not inserting proxy-html filter" ;
+ }
+
+ if ( errmsg ) {
+#ifndef GO_FASTER
+ proxy_html_conf* cfg
+ = ap_get_module_config(f->r->per_dir_config, &proxy_html_module);
+ if ( cfg->verbose ) {
+ ap_log_rerror(APLOG_MARK, APLOG_INFO, 0, f->r, "%s", errmsg) ;
+ }
+#endif
+ ap_remove_output_filter(f) ;
+ return NULL ;
+ }
+ if ( ! f->ctx )
+ proxy_html_filter_init(f) ;
+ return f->ctx ;
+}
+static int proxy_html_filter(ap_filter_t* f, apr_bucket_brigade* bb) {
+ apr_bucket* b ;
+ meta* m = NULL ;
+ xmlCharEncoding enc ;
+ const char* buf = 0 ;
+ apr_size_t bytes = 0 ;
+ int xmlopts = XML_PARSE_RECOVER | XML_PARSE_NONET |
+ XML_PARSE_NOBLANKS | XML_PARSE_NOERROR | XML_PARSE_NOWARNING ;
+
+ saxctxt* ctxt = check_filter_init(f) ;
+ if ( ! ctxt )
+ return ap_pass_brigade(f->next, bb) ;
+
+ for ( b = APR_BRIGADE_FIRST(bb) ;
+ b != APR_BRIGADE_SENTINEL(bb) ;
+ b = APR_BUCKET_NEXT(b) ) {
+ if ( APR_BUCKET_IS_EOS(b) ) {
+ if ( ctxt->parser != NULL ) {
+ htmlParseChunk(ctxt->parser, buf, 0, 1) ;
+ }
+ APR_BRIGADE_INSERT_TAIL(ctxt->bb,
+ apr_bucket_eos_create(ctxt->bb->bucket_alloc) ) ;
+ ap_pass_brigade(ctxt->f->next, ctxt->bb) ;
+ } else if ( apr_bucket_read(b, &buf, &bytes, APR_BLOCK_READ)
+ == APR_SUCCESS ) {
+ if ( ctxt->parser == NULL ) {
+ /* always take a NUL-terminated copy for the parse routines:
+ * buf[bytes] lies past the bucket data and must not be read */
+ buf = apr_pstrmemdup(f->r->pool, buf, bytes) ;
+#ifndef GO_FASTER
+ enc = sniff_encoding(f->r, buf, bytes, ctxt->cfg->verbose) ;
+ if ( ctxt->cfg->metafix )
+ m = metafix(f->r, buf, ctxt->cfg->verbose) ;
+#else
+ enc = sniff_encoding(f->r, buf, bytes) ;
+ if ( ctxt->cfg->metafix )
+ m = metafix(f->r, buf) ;
+#endif
+ ap_set_content_type(f->r, "text/html;charset=utf-8") ;
+ ap_fputs(f->next, ctxt->bb, ctxt->cfg->doctype) ;
+ if ( m ) {
+ ctxt->parser = htmlCreatePushParserCtxt(ctxt->sax, ctxt,
+ buf, m->start, 0, enc ) ;
+ htmlParseChunk(ctxt->parser, buf+m->end, bytes-m->end, 0) ;
+ } else {
+ ctxt->parser = htmlCreatePushParserCtxt(ctxt->sax, ctxt,
+ buf, bytes, 0, enc ) ;
+ }
+ apr_pool_cleanup_register(f->r->pool, ctxt->parser,
+ (void*)htmlFreeParserCtxt, apr_pool_cleanup_null) ;
+ /* xmlCtxtUseOptions returns the options it could not apply */
+ xmlopts = xmlCtxtUseOptions(ctxt->parser, xmlopts) ;
+ if ( xmlopts )
+ ap_log_rerror(APLOG_MARK, APLOG_WARNING, 0, f->r,
+ "Unsupported parser opts %x", xmlopts) ;
+ } else {
+ htmlParseChunk(ctxt->parser, buf, bytes, 0) ;
+ }
+ } else {
+ ap_log_rerror(APLOG_MARK, APLOG_ERR, 0, f->r, "Error in bucket read") ;
+ }
+ }
+ /*ap_fflush(ctxt->f->next, ctxt->bb) ; // uncomment for debug */
+ apr_brigade_cleanup(bb) ;
+ return APR_SUCCESS ;
+}
+static const char* fpi_html =
+ "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.01//EN\">\n" ;
+static const char* fpi_html_legacy =
+ "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\">\n" ;
+static const char* fpi_xhtml =
+ "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">\n" ;
+static const char* fpi_xhtml_legacy =
+ "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\n" ;
+static const char* html_etag = ">" ;
+static const char* xhtml_etag = " />" ;
+/*#define DEFAULT_DOCTYPE fpi_html */
+static const char* DEFAULT_DOCTYPE = "" ;
+#define DEFAULT_ETAG html_etag
+
+static void* proxy_html_config(apr_pool_t* pool, char* x) {
+ proxy_html_conf* ret = apr_pcalloc(pool, sizeof(proxy_html_conf) ) ;
+ ret->doctype = DEFAULT_DOCTYPE ;
+ ret->etag = DEFAULT_ETAG ;
+ ret->bufsz = 8192 ;
+ return ret ;
+}
+static void* proxy_html_merge(apr_pool_t* pool, void* BASE, void* ADD) {
+ proxy_html_conf* base = (proxy_html_conf*) BASE ;
+ proxy_html_conf* add = (proxy_html_conf*) ADD ;
+ proxy_html_conf* conf = apr_palloc(pool, sizeof(proxy_html_conf)) ;
+
+ if ( add->map && base->map ) {
+ urlmap* a ;
+ conf->map = NULL ;
+ for ( a = base->map ; a ; a = a->next ) {
+ urlmap* save = conf->map ;
+ conf->map = apr_pmemdup(pool, a, sizeof(urlmap)) ;
+ conf->map->next = save ;
+ }
+ for ( a = add->map ; a ; a = a->next ) {
+ urlmap* save = conf->map ;
+ conf->map = apr_pmemdup(pool, a, sizeof(urlmap)) ;
+ conf->map->next = save ;
+ }
+ } else
+ conf->map = add->map ? add->map : base->map ;
+
+ conf->doctype = ( add->doctype == DEFAULT_DOCTYPE )
+ ? base->doctype : add->doctype ;
+ conf->etag = ( add->etag == DEFAULT_ETAG ) ? base->etag : add->etag ;
+ conf->bufsz = add->bufsz ;
+ if ( add->flags & NORM_RESET ) {
+ conf->flags = add->flags ^ NORM_RESET ;
+ conf->metafix = add->metafix ;
+ conf->extfix = add->extfix ;
+ conf->strip_comments = add->strip_comments ;
+#ifndef GO_FASTER
+ conf->verbose = add->verbose ;
+#endif
+ } else {
+ conf->flags = base->flags | add->flags ;
+ conf->metafix = base->metafix | add->metafix ;
+ conf->extfix = base->extfix | add->extfix ;
+ conf->strip_comments = base->strip_comments | add->strip_comments ;
+#ifndef GO_FASTER
+ conf->verbose = base->verbose | add->verbose ;
+#endif
+ }
+ return conf ;
+}
+#define REGFLAG(n,s,c) ( (s&&(strchr((s),(c))!=NULL)) ? (n) : 0 )
+#define XREGFLAG(n,s,c) ( (!s||(strchr((s),(c))==NULL)) ? (n) : 0 )
+static const char* set_urlmap(cmd_parms* cmd, void* CFG,
+ const char* from, const char* to, const char* flags) {
+ int regflags ;
+ proxy_html_conf* cfg = (proxy_html_conf*)CFG ;
+ urlmap* map ;
+ urlmap* newmap = apr_palloc(cmd->pool, sizeof(urlmap) ) ;
+
+ newmap->next = NULL ;
+ newmap->flags
+ = XREGFLAG(M_HTML,flags,'h')
+ | XREGFLAG(M_EVENTS,flags,'e')
+ | XREGFLAG(M_CDATA,flags,'c')
+ | REGFLAG(M_ATSTART,flags,'^')
+ | REGFLAG(M_ATEND,flags,'$')
+ | REGFLAG(M_REGEX,flags,'R')
+ | REGFLAG(M_LAST,flags,'L')
+ ;
+
+ if ( cfg->map ) {
+ for ( map = cfg->map ; map->next ; map = map->next ) ;
+ map->next = newmap ;
+ } else
+ cfg->map = newmap ;
+
+ if ( ! (newmap->flags & M_REGEX) ) {
+ newmap->from.c = apr_pstrdup(cmd->pool, from) ;
+ newmap->to = apr_pstrdup(cmd->pool, to) ;
+ } else {
+ regflags
+ = REGFLAG(REG_EXTENDED,flags,'x')
+ | REGFLAG(REG_ICASE,flags,'i')
+ | REGFLAG(REG_NOSUB,flags,'n')
+ | REGFLAG(REG_NEWLINE,flags,'s')
+ ;
+ newmap->from.r = ap_pregcomp(cmd->pool, from, regflags) ;
+ if ( newmap->from.r == NULL )
+ return "ProxyHTMLURLMap: failed to compile regular expression" ;
+ newmap->to = apr_pstrdup(cmd->pool, to) ;
+ }
+ return NULL ;
+}
+static const char* set_doctype(cmd_parms* cmd, void* CFG, const char* t,
+ const char* l) {
+ proxy_html_conf* cfg = (proxy_html_conf*)CFG ;
+ if ( !strcasecmp(t, "xhtml") ) {
+ cfg->etag = xhtml_etag ;
+ if ( l && !strcasecmp(l, "legacy") )
+ cfg->doctype = fpi_xhtml_legacy ;
+ else
+ cfg->doctype = fpi_xhtml ;
+ } else if ( !strcasecmp(t, "html") ) {
+ cfg->etag = html_etag ;
+ if ( l && !strcasecmp(l, "legacy") )
+ cfg->doctype = fpi_html_legacy ;
+ else
+ cfg->doctype = fpi_html ;
+ } else {
+ cfg->doctype = apr_pstrdup(cmd->pool, t) ;
+ if ( l && ( ( l[0] == 'x' ) || ( l[0] == 'X' ) ) )
+ cfg->etag = xhtml_etag ;
+ else
+ cfg->etag = html_etag ;
+ }
+ return NULL ;
+}
+static void set_param(proxy_html_conf* cfg, const char* arg) {
+ if ( arg && *arg ) {
+ if ( !strcmp(arg, "lowercase") )
+ cfg->flags |= NORM_LC ;
+ else if ( !strcmp(arg, "dospath") )
+ cfg->flags |= NORM_MSSLASH ;
+ else if ( !strcmp(arg, "reset") )
+ cfg->flags |= NORM_RESET ;
+ }
+}
+static const char* set_flags(cmd_parms* cmd, void* CFG, const char* arg1,
+ const char* arg2, const char* arg3) {
+ set_param( (proxy_html_conf*)CFG, arg1) ;
+ set_param( (proxy_html_conf*)CFG, arg2) ;
+ set_param( (proxy_html_conf*)CFG, arg3) ;
+ return NULL ;
+}
+static const command_rec proxy_html_cmds[] = {
+ AP_INIT_TAKE23("ProxyHTMLURLMap", set_urlmap, NULL,
+ RSRC_CONF|ACCESS_CONF, "Map URL From To [Flags]" ) ,
+ AP_INIT_TAKE12("ProxyHTMLDoctype", set_doctype, NULL,
+ RSRC_CONF|ACCESS_CONF, "(HTML|XHTML) [Legacy]" ) ,
+ AP_INIT_TAKE123("ProxyHTMLFixups", set_flags, NULL,
+ RSRC_CONF|ACCESS_CONF, "Options are lowercase, dospath, reset" ) ,
+ AP_INIT_FLAG("ProxyHTMLMeta", ap_set_flag_slot,
+ (void*)APR_OFFSETOF(proxy_html_conf, metafix),
+ RSRC_CONF|ACCESS_CONF, "Fix META http-equiv elements" ) ,
+ AP_INIT_FLAG("ProxyHTMLExtended", ap_set_flag_slot,
+ (void*)APR_OFFSETOF(proxy_html_conf, extfix),
+ RSRC_CONF|ACCESS_CONF, "Map URLs in Javascript and CSS" ) ,
+ AP_INIT_FLAG("ProxyHTMLStripComments", ap_set_flag_slot,
+ (void*)APR_OFFSETOF(proxy_html_conf, strip_comments),
+ RSRC_CONF|ACCESS_CONF, "Strip out comments" ) ,
+#ifndef GO_FASTER
+ AP_INIT_FLAG("ProxyHTMLLogVerbose", ap_set_flag_slot,
+ (void*)APR_OFFSETOF(proxy_html_conf, verbose),
+ RSRC_CONF|ACCESS_CONF, "Verbose Logging (use with LogLevel Info)" ) ,
+#endif
+ AP_INIT_TAKE1("ProxyHTMLBufSize", ap_set_int_slot,
+ (void*)APR_OFFSETOF(proxy_html_conf, bufsz),
+ RSRC_CONF|ACCESS_CONF, "Buffer size" ) ,
+ { NULL }
+} ;
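+/* Usage sketch (illustrative only, not part of the module's logic).
+ * The hostname and paths are hypothetical; the directive names are the
+ * ones registered in proxy_html_cmds above, and "proxy-html" is the
+ * filter name registered in proxy_html_hooks below. Assuming a backend
+ * at http://internal.example.com/ is reverse-proxied under /app/, a
+ * configuration might look roughly like this:
+ *
+ *   ProxyPass        /app/ http://internal.example.com/
+ *   ProxyPassReverse /app/ http://internal.example.com/
+ *   <Location /app/>
+ *     SetOutputFilter        proxy-html
+ *     ProxyHTMLURLMap        http://internal.example.com/ /app/
+ *     ProxyHTMLDoctype       XHTML Legacy
+ *     ProxyHTMLFixups        lowercase dospath
+ *     ProxyHTMLMeta          On
+ *     ProxyHTMLStripComments On
+ *   </Location>
+ *
+ * The filter only acts on proxied text/html or application/xhtml+xml
+ * responses (see check_filter_init); other content passes through.
+ */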
+static int mod_proxy_html(apr_pool_t* p, apr_pool_t* p1, apr_pool_t* p2,
+ server_rec* s) {
+ ap_add_version_component(p, VERSION_STRING) ;
+ return OK ;
+}
+static void proxy_html_hooks(apr_pool_t* p) {
+ ap_register_output_filter("proxy-html", proxy_html_filter,
+ NULL, AP_FTYPE_RESOURCE) ;
+ ap_hook_post_config(mod_proxy_html, NULL, NULL, APR_HOOK_MIDDLE) ;
+ ap_hook_child_init(proxy_html_child_init, NULL, NULL, APR_HOOK_MIDDLE) ;
+}
+module AP_MODULE_DECLARE_DATA proxy_html_module = {
+ STANDARD20_MODULE_STUFF,
+ proxy_html_config,
+ proxy_html_merge,
+ NULL,
+ NULL,
+ proxy_html_cmds,
+ proxy_html_hooks
+} ;