From: Emmanuel Lacour
Date: Sat, 13 Oct 2007 14:27:09 +0000 (+0000)
Subject: [svn-inject] Installing original source of mod-proxy-html
X-Git-Tag: 2.4.3-2~2
X-Git-Url: http://git.home-dn.net/?p=manu%2Fmod-proxy-html.git;a=commitdiff_plain;h=7254aa90316b0f07c538db96cb6a1740654c9a74;ds=sidebyside

[svn-inject] Installing original source of mod-proxy-html
---
7254aa90316b0f07c538db96cb6a1740654c9a74
diff --git a/config.html b/config.html
new file mode 100644
index 0000000..1317a3e
--- /dev/null
+++ b/config.html
@@ -0,0 +1,154 @@
mod_proxy_html
+

mod_proxy_html: Configuration

+

mod_proxy_html Version 2.4 (Sept-Nov 2004)

+

Configuration Directives

+

The following can be used anywhere in an httpd.conf or included configuration file.

+
+
ProxyHTMLURLMap
+
+

Syntax: ProxyHTMLURLMap from-pattern to-pattern [flags]

This is the key directive for rewriting HTML links. When parsing a document, whenever a link target matches from-pattern, the matching portion will be rewritten to to-pattern.

Starting at version 2.0, this supports a wider range of pattern-matching and substitutions, including regular expression search and replace, controlled by the optional third flags argument.

+

Flags for ProxyHTMLURLMap

+

Flags are case-sensitive.

+
+
h
+

Ignore HTML links (pass through unchanged)

+
e
+

Ignore scripting events (pass through unchanged)

+
c
+

Pass embedded script and style sections through untouched.

+
L
+

Last-match. If this rule matches, no more rules are applied (note that this happens automatically for HTML links).

+
R
+

Use Regular Expression matching-and-replace. from-pattern is a regexp, and to-pattern a replacement string that may be based on the regexp. Regexp memory is supported: you can use brackets () in the from-pattern and retrieve the matches with $1 to $9 in the to-pattern.

If R is not set, it will use string-literal search-and-replace, as in versions 1.x. Logic is starts-with in HTML links, but contains in scripting events and embedded script and style sections.

+
+
x
+

Use POSIX extended Regular Expressions. Only applicable with R.

+
i
+

Case-insensitive matching. Only applicable with R.

+
n
+

Disable regexp memory (for speed). Only applicable with R.

+
s
+

Line-based regexp matching. Only applicable with R.

+
^
+

Match at start only. This applies only to string matching (not regexps) and is irrelevant to HTML links.

+
$
+

Match at end only. This applies only to string matching (not regexps) and is irrelevant to HTML links.

+
+ +
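As a rough illustration of how the flags combine, the following httpd.conf fragment sketches one literal rule and one regexp rule. The hostnames internal.example.com and proxy.example.com and the /app/ path are placeholders chosen for this example, not values required by the module.

```apache
# Literal rule (no R flag): starts-with match in HTML links.
ProxyHTMLURLMap http://internal.example.com/ /app/

# Regexp rule: R enables regexp matching, x selects POSIX extended syntax,
# i makes the match case-insensitive. $1 is the text captured by (.*).
ProxyHTMLURLMap http://internal\.example\.com/(.*) http://proxy.example.com/$1 Rxi
```

The (.*) capture is greedy, so a rule like this is best kept to HTML links, where each attribute holds a single URL. In scripts and stylesheets a tighter pattern is the safer choice.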
+
ProxyHTMLDoctype
+
+

Syntax: ProxyHTMLDoctype HTML|XHTML [Legacy]

Alternative Syntax: ProxyHTMLDoctype fpi [SGML|XML]

In the first form, documents will be declared as HTML 4.01 or XHTML 1.0 according to the option selected. This option also determines whether HTML or XHTML syntax is used for output. Note that the format of the documents coming from the backend server is immaterial: the parser will deal with it automatically. If the optional second argument is set to "Legacy", documents will be declared "Transitional", an option that may be necessary if you are proxying pre-1998 content or working with defective authoring/publishing tools.

In the second form, it will insert your own FPI. The optional second argument determines whether SGML/HTML or XML/XHTML syntax will be used.

Starting at version 2.0, the default is changed to omitting any FPI, on the grounds that no FPI is better than a bogus one. If your backend generates decent HTML or XHTML, set it accordingly.
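A minimal sketch of the first form, assuming two hypothetical proxied paths (/app/ and /legacy/) that need different declarations:

```apache
# Backend produces reasonable markup: declare HTML 4.01, emit HTML syntax.
<Location /app/>
    ProxyHTMLDoctype HTML
</Location>

# Older or sloppier backend: declare XHTML 1.0 Transitional, emit XHTML syntax.
<Location /legacy/>
    ProxyHTMLDoctype XHTML Legacy
</Location>
```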

+
+
ProxyHTMLFixups
+
+

Syntax: ProxyHTMLFixups [lowercase] [dospath] [reset]

+

This directive takes one to three arguments as follows:

  • lowercase: URLs are rewritten to lowercase.
  • dospath: Backslashes in URLs are rewritten to forward slashes.
  • reset: Unset any options set at a higher level in the configuration.

Take care when using these. The fixes will correct certain authoring mistakes, but they also risk erroneously "fixing" links that were correct to start with. Only use them if you know you have a broken backend server.
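The following sketch shows how the three arguments might be combined. It assumes a hypothetical layout in which everything under /app/ comes from a broken backend except the /app/clean/ subtree:

```apache
# Apply both fixups to links from the broken backend.
<Location /app/>
    ProxyHTMLFixups lowercase dospath
</Location>

# This subtree serves correct, case-sensitive URLs: undo the inherited fixups.
<Location /app/clean/>
    ProxyHTMLFixups reset
</Location>
```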

+
+
ProxyHTMLMeta
+

Syntax: ProxyHTMLMeta [On|Off]

Parses <meta http-equiv ...> elements and converts them to real HTTP headers.

+
+
ProxyHTMLExtended
+

Syntax: ProxyHTMLExtended [On|Off]

Set to Off, this gives the same behaviour as 1.x versions of mod_proxy_html. HTML links are rewritten according to the ProxyHTMLURLMap directives, but links appearing in JavaScript and CSS are ignored.

Set to On, all scripting events and embedded scripts or stylesheets are also processed by the ProxyHTMLURLMap rules, according to the flags set for each rule. Since this requires more parsing, performance will be best if you only enable it when strictly necessary.

+
+
ProxyHTMLStripComments
+

Syntax: ProxyHTMLStripComments [On|Off]

This directive will cause mod_proxy_html to strip HTML comments. Note that this will also kill off any scripts or styles embedded in comments (a bogosity introduced in 1995/6 with Netscape 2 for the benefit of then-older browsers, but still in use today). It may also interfere with comment-based processors such as SSI or ESI: be sure to run any of those before mod_proxy_html in the filter chain if stripping comments!
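One way to get the ordering right is to name both filters explicitly. This sketch assumes that mod_proxy_html registers its output filter as proxy-html (not shown in this documentation) and that SSI is handled by mod_include's INCLUDES filter. The /app/ path is a placeholder.

```apache
<Location /app/>
    # Filters run left to right: SSI directives are expanded first,
    # then mod_proxy_html parses the result and strips remaining comments.
    SetOutputFilter INCLUDES;proxy-html
    ProxyHTMLStripComments On
</Location>
```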

+
+
ProxyHTMLLogVerbose
+

Syntax: ProxyHTMLLogVerbose [On|Off]

Turns on verbose logging. This causes mod_proxy_html to make error log entries (at LogLevel Info) about charset detection and about all meta substitutions and rewrites made. When Off, only errors and warnings (if any) are logged.
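Because the verbose entries are written at LogLevel Info, they only appear if the server's log level admits them. A minimal sketch for a debugging session:

```apache
# Let Info-level messages through to the error log.
LogLevel info

# Ask mod_proxy_html to report charset detection, meta handling and rewrites.
ProxyHTMLLogVerbose On
```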

+
+
ProxyHTMLBufSize
+

Syntax: ProxyHTMLBufSize nnnn

Set the buffer size increment for buffering inline stylesheets and scripts.

In order to parse non-HTML content (stylesheets and scripts), mod_proxy_html has to read the entire script or stylesheet into a buffer. This buffer will be expanded as necessary to hold the largest script or stylesheet in a page, in increments of [nnnn] as set by this directive.

The default is 8192, and will work well for almost all pages. However, if you know you're proxying a lot of pages containing stylesheets and/or scripts bigger than 8K, it will be more efficient to set a larger buffer size and avoid the need to resize the buffer dynamically during a request.
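As an example, for a backend known to embed scripts of around 30K, the increment might be raised as below. The value 32768 is an illustrative choice, not a recommendation from the module.

```apache
# Scripts and stylesheets are only buffered when extended processing is on.
ProxyHTMLExtended On

# Grow the buffer in 32K steps so a typical large inline script fits in one allocation.
ProxyHTMLBufSize 32768
```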

+
+
+
diff --git a/guide.html b/guide.html
new file mode 100644
index 0000000..50ffee8
--- /dev/null
+++ b/guide.html
@@ -0,0 +1,202 @@
Technical guide: mod_proxy_html
+

mod_proxy_html: Technical Guide

+

mod_proxy_html Version 2.4 (Sept-Nov 2004).

+

Contents

+ +

URL Rewriting

+

Rewriting URLs into a proxy's address space is of course the primary purpose of this module. From Version 2.0, this capability has been extended from rewriting HTML URLs to processing scripts and stylesheets that may contain URLs.

Because the module doesn't contain parsers for JavaScript or CSS, this additional processing means we have had to introduce some heuristic parsing. What that means is that the parser cannot automatically distinguish between a URL that should be replaced and one that merely appears as text. It's up to you to match the right things! To help you do this, we have introduced some new features:

+
  1. The ProxyHTMLExtended directive. The extended processing will only be activated if this is On. The default is Off, which gives you the old behaviour.
  2. Regular Expression match-and-replace. This can be used anywhere, but is most useful where context information can help distinguish URLs that should be replaced and avoid false positives. For example, to rewrite URLs of CSS @import, we might define a rule
     ProxyHTMLURLMap url\(http://internal.example.com([^\)]*)\) url(http://proxy.example.com$1) Rihe
     This explicitly rewrites from one servername to another, and uses regexp memory to match a path and append it unchanged in $1, while using the url(...) context to reduce the danger of a match that shouldn't be rewritten. The R flag invokes regexp processing for this rule; i makes the match case-insensitive; while h and e save processing cycles by preventing the match being applied to HTML links and scripting events, where it is clearly irrelevant.
+

HTML Links

+

HTML links are those attributes defined by the HTML 4 and XHTML 1 DTDs as of type %URI. For example, the href attribute of the a element. For a full list, see the declaration of linked_elts in pstartElement. Rules are applicable provided the h flag is not set.

An HTML link always contains exactly one URL. So whenever mod_proxy_html finds a matching ProxyHTMLURLMap rule, it will apply the transformation once and stop processing the attribute.
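The first-match behaviour for HTML links can be seen in a conventional reverse-proxy setup. This is a sketch under assumptions: ProxyPass and ProxyPassReverse are mod_proxy directives rather than part of this module, the proxy-html filter name is assumed, and internal.example.com and /app/ are placeholders.

```apache
# Map the backend into the proxy's URL space (mod_proxy).
ProxyPass        /app/ http://internal.example.com/
ProxyPassReverse /app/ http://internal.example.com/

<Location /app/>
    SetOutputFilter proxy-html
    # Rules are tried in order and a link is rewritten by the first one that matches.
    # Absolute links to the backend are caught here...
    ProxyHTMLURLMap http://internal.example.com/ /app/
    # ...and server-relative links such as /images/logo.png are caught here.
    ProxyHTMLURLMap / /app/
</Location>
```

Listing the more specific rule first matters here. Since processing of a link stops after one match, an absolute backend URL is handled by the first rule and never reaches the second.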

+

Scripting Events

+

Scripting events are the contents of event attributes as defined in the HTML4 and XHTML1 DTDs; for example onclick. For a full list, see the declaration of events in pstartElement. Rules are applicable provided the e flag is not set.

A scripting event may contain more than one URL, and will contain other text. So when ProxyHTMLExtended is On, all applicable rules will be applied in order until and unless a rule with the L flag matches. A rule may match more than once, provided the matches do not overlap, so a URL/pattern that appears more than once is rewritten every time it matches.
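Going by the description above, a pair of rules aimed only at scripting events might look like the following sketch. The hostnames are placeholders, and the flags h and c are used simply to keep these rules away from HTML links and embedded script or style sections.

```apache
ProxyHTMLExtended On

# Applies to scripting events only. If it matches, L stops later rules
# from being tried on the same attribute.
ProxyHTMLURLMap http://internal.example.com/ http://proxy.example.com/ hcL

# Only reached for event attributes in which the rule above did not match.
ProxyHTMLURLMap http://old.example.com/ http://proxy.example.com/old/ hc
```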

+

Embedded Scripts and Stylesheets

+

Embedded scripts and stylesheets are the contents of <script> and <style> elements. Rules are applicable provided the c flag is not set.

A script or stylesheet may contain more than one URL, and will contain other text. So when ProxyHTMLExtended is On, all applicable rules will be applied in order until and unless a rule with the L flag matches. A rule may match more than once, provided the matches do not overlap, so a URL/pattern that appears more than once is rewritten every time it matches.

+

Output Transformation

+

mod_proxy_html uses a SAX parser. This means that the input stream - and hence the output generated - will be normalised in various ways, even where nothing is actually rewritten. To an HTML or XML parser, the document is not changed by normalisation, except as noted below. Exceptions to this may arise where the input stream is malformed, when the output of mod_proxy_html may be undefined. These should of course be fixed at the backend: if mod_proxy_html doesn't work as expected, then neither will browsers in real life, except by coincidence.

+

FPI (Doctype)

+

Strictly speaking, HTML and XHTML documents are required to have a Formal Public Identifier (FPI), also known as a Document Type Declaration. This references a Document Type Definition (DTD) which defines the grammar/syntax to which the contents of the document must conform.

The parser in mod_proxy_html loses any FPI in the input document, but gives you the option to insert one. You may select either HTML or XHTML (see below), and if your backend is sloppy you may also want to use the "Legacy" keyword to make it declare documents "Transitional". You may also declare a custom DTD, or (if your backend is seriously screwed so no DTD would be appropriate) omit it altogether.

+

HTML vs XHTML

+

The differences between HTML 4.01 and XHTML 1.0 are essentially negligible, and mod_proxy_html can transform between the two. You can safely select either, regardless of what the backend generates, and mod_proxy_html will apply the appropriate rules in generating output. HTML saves a few bytes.

If you declare a custom DTD, you should specify whether to generate HTML or XHTML syntax in the output. This affects empty elements: HTML <br> vs XHTML <br />.

+

Character Encoding

+

The parser uses UTF-8 (Unicode) internally, and mod_proxy_html always generates output as UTF-8. This is supported by all general-purpose web software, and supports more character sets and languages than any other charset.

The character encoding should be declared in HTTP: for example
    Content-Type: text/html; charset=latin1
mod_proxy_html has always supported this in its input, and ensured this happens in output. But prior to version 2, it did not fully support detection (sniffing) of the charset when a backend fails to set the HTTP header.

From version 2.0, mod_proxy_html will detect the encoding of its input as follows:

+
  1. The HTTP headers, where available, always take precedence over other information.
  2. If the first 2-4 bytes are an XML Byte Order Mark (BOM), this is used.
  3. If the document starts with an XML declaration <?xml .... ?>, this determines encoding by XML rules.
  4. If the document contains the HTML hack <meta http-equiv="Content-Type" ...>, any charset declared here is used.
  5. In the absence of any of the above indications, the HTML-over-HTTP default encoding ISO-8859-1 is assumed.
  6. The parser is set to ignore invalid characters, so a malformed input stream will generate glitches (unexpected characters) rather than risk aborting a parse altogether.
+

meta http-equiv support

+

The HTML meta element includes a form <meta http-equiv="Some-Header" content="some-value"> which should notionally be converted to a real HTTP header by the webserver. In practice, it is more commonly supported in browsers than servers, and is common in constructs such as ClientPull (aka "meta refresh"). The ProxyHTMLMeta directive supports the server generating real HTTP headers from these. However, it does not strip them from the HTML (except for Content-Type, which is removed in case it contains conflicting charset information).
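A minimal sketch of enabling this for one proxied path (the /app/ location is a placeholder). With it on, a backend page containing, say, <meta http-equiv="Refresh" content="30"> would be expected, per the description above, to go out with a real Refresh header as well as the unchanged meta element.

```apache
<Location /app/>
    # Promote <meta http-equiv=...> elements in proxied pages to HTTP headers.
    ProxyHTMLMeta On
</Location>
```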

+

Other Fixups

+

For additional minor functions of mod_proxy_html, please see the ProxyHTMLFixups and ProxyHTMLStripComments directives in the Configuration Guide.

+

Debugging your Configuration

+

From Version 2.1, mod_proxy_html supports a ProxyHTMLLogVerbose directive, to enable verbose logging at LogLevel Info. This is designed to help with setting up your proxy configuration and diagnosing unexpected behaviour; it is not recommended for normal operation, and can be disabled altogether at compile time for extra performance (see the top of the source).

+

When verbose logging is enabled, the following messages will be logged:

+
  1. In Charset Detection, it will report what charset is detected and how (HTTP rules, XML rules, or HTML rules). Note that, regardless of verbose logging, an error or warning will be logged if an unsupported charset is detected or if no information can be found.
  2. When ProxyHTMLMeta is enabled, it logs each header/value pair processed.
  3. Whenever a ProxyHTMLURLMap rule matches and causes a rewrite, it is logged. The message contains abbreviated context information: H denotes an HTML link matched; E denotes a match in a scripting event; C denotes a match in an inline script or stylesheet. When the match is a regexp find-and-replace, it is also marked as RX.
+

Workarounds for Browser Bugs

+

Because mod_proxy_html unsets the Content-Length header, it risks losing the performance advantage of HTTP Keep-Alive. It therefore sets up HTTP Chunked Encoding when responding to HTTP/1.1 requests. This enables keep-alive again for HTTP/1.1 agents.

Unfortunately some buggy agents will send an HTTP/1.1 request but choke on an HTTP/1.1 response. Typically you will see numbers (the hexadecimal chunk sizes, displayed as if they were page text) before and after, and possibly in the middle of, a page. To work around this, set the force-response-1.0 environment variable in httpd.conf. For example,
BrowserMatch MSIE force-response-1.0

+
+ diff --git a/mod_proxy_html.c b/mod_proxy_html.c new file mode 100644 index 0000000..dfdbf60 --- /dev/null +++ b/mod_proxy_html.c @@ -0,0 +1,1041 @@ +/******************************************************************** + Copyright (c) 2003-4, WebThing Ltd + Author: Nick Kew + +This program is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 2 of the License, or +(at your option) any later version. + +This program is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +GNU General Public License for more details. + +You should have received a copy of the GNU General Public License +along with this program; if not, write to the Free Software +Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. + +*********************************************************************/ + + +/******************************************************************** + Note to Users + + You are requested to register as a user, at + http://apache.webthing.com/registration.html + + This entitles you to support from the developer. + I'm unlikely to reply to help/support requests from + non-registered users, unless you're paying and/or offering + constructive feedback such as bug reports or sensible + suggestions for further development. + + It also makes a small contribution to the effort + that's gone into developing this work. +*********************************************************************/ + +/* End of Notices */ + + + + +/* GO_FASTER + + You can #define GO_FASTER to disable informational logging. + This disables the ProxyHTMLLogVerbose option altogether. + + Default is to leave it undefined, and enable verbose logging + as a configuration option. Binaries are supplied with verbose + logging enabled. 
+*/ + +#ifdef GO_FASTER +#define VERBOSE(x) +#else +#define VERBOSE(x) if ( verbose ) x +#endif + +#define VERSION_STRING "proxy_html/2.4" + +#include + +/* libxml */ +#include + +/* apache */ +#include +#include +#include +#include + +module AP_MODULE_DECLARE_DATA proxy_html_module ; + +#define M_HTML 0x01 +#define M_EVENTS 0x02 +#define M_CDATA 0x04 +#define M_REGEX 0x08 +#define M_ATSTART 0x10 +#define M_ATEND 0x20 +#define M_LAST 0x40 + +typedef struct { + unsigned int start ; + unsigned int end ; +} meta ; +typedef struct urlmap { + struct urlmap* next ; + unsigned int flags ; + union { + const char* c ; + regex_t* r ; + } from ; + const char* to ; +} urlmap ; +typedef struct { + urlmap* map ; + const char* doctype ; + const char* etag ; + unsigned int flags ; + int extfix ; + int metafix ; + int strip_comments ; +#ifndef GO_FASTER + int verbose ; +#endif + size_t bufsz ; +} proxy_html_conf ; +typedef struct { + htmlSAXHandlerPtr sax ; + ap_filter_t* f ; + proxy_html_conf* cfg ; + htmlParserCtxtPtr parser ; + apr_bucket_brigade* bb ; + char* buf ; + size_t offset ; + size_t avail ; +} saxctxt ; + +static int is_empty_elt(const char* name) { + const char** p ; + static const char* empty_elts[] = { + "br" , + "link" , + "img" , + "hr" , + "input" , + "meta" , + "base" , + "area" , + "param" , + "col" , + "frame" , + "isindex" , + "basefont" , + NULL + } ; + for ( p = empty_elts ; *p ; ++p ) + if ( !strcmp( *p, name) ) + return 1 ; + return 0 ; +} + +typedef struct { + const char* name ; + const char** attrs ; +} elt_t ; + +#define NORM_LC 0x1 +#define NORM_MSSLASH 0x2 +#define NORM_RESET 0x4 + +typedef enum { ATTR_IGNORE, ATTR_URI, ATTR_EVENT } rewrite_t ; + +static void normalise(unsigned int flags, char* str) { + xmlChar* p ; + if ( flags & NORM_LC ) + for ( p = str ; *p ; ++p ) + if ( isupper(*p) ) + *p = tolower(*p) ; + + if ( flags & NORM_MSSLASH ) + for ( p = strchr(str, '\\') ; p ; p = strchr(p+1, '\\') ) + *p = '/' ; + +} + +#define FLUSH 
ap_fwrite(ctx->f->next, ctx->bb, (chars+begin), (i-begin)) ; begin = i+1 +static void pcharacters(void* ctxt, const xmlChar *chars, int length) { + saxctxt* ctx = (saxctxt*) ctxt ; + int i ; + int begin ; + for ( begin=i=0; if->next, ctx->bb, "&") ; break ; + case '<' : FLUSH ; ap_fputs(ctx->f->next, ctx->bb, "<") ; break ; + case '>' : FLUSH ; ap_fputs(ctx->f->next, ctx->bb, ">") ; break ; + case '"' : FLUSH ; ap_fputs(ctx->f->next, ctx->bb, """) ; break ; + default : break ; + } + } + FLUSH ; +} +static void preserve(saxctxt* ctx, const size_t len) { + char* newbuf ; + if ( len <= ( ctx->avail - ctx->offset ) ) + return ; + else while ( len > ( ctx->avail - ctx->offset ) ) + ctx->avail += ctx->cfg->bufsz ; + + newbuf = realloc(ctx->buf, ctx->avail) ; + if ( newbuf != ctx->buf ) { + if ( ctx->buf ) + apr_pool_cleanup_kill(ctx->f->r->pool, ctx->buf, (void*)free) ; + apr_pool_cleanup_register(ctx->f->r->pool, newbuf, + (void*)free, apr_pool_cleanup_null); + ctx->buf = newbuf ; + } +} +static void pappend(saxctxt* ctx, const char* buf, const size_t len) { + preserve(ctx, len) ; + memcpy(ctx->buf+ctx->offset, buf, len) ; + ctx->offset += len ; +} +static void dump_content(saxctxt* ctx) { + urlmap* m ; + char* found ; + size_t s_from, s_to ; + size_t match ; + char c = 0 ; + int nmatch ; + regmatch_t pmatch[10] ; + char* subs ; + size_t len, offs ; +#ifndef GO_FASTER + int verbose = ctx->cfg->verbose ; +#endif + + pappend(ctx, &c, 1) ; /* append null byte */ + /* parse the text for URLs */ + for ( m = ctx->cfg->map ; m ; m = m->next ) { + if ( ! ( m->flags & M_CDATA ) ) + continue ; + if ( m->flags & M_REGEX ) { + nmatch = 10 ; + offs = 0 ; + while ( ! 
ap_regexec(m->from.r, ctx->buf+offs, nmatch, pmatch, 0) ) { + match = pmatch[0].rm_so ; + s_from = pmatch[0].rm_eo - match ; + subs = ap_pregsub(ctx->f->r->pool, m->to, ctx->buf+offs, + nmatch, pmatch) ; + s_to = strlen(subs) ; + len = strlen(ctx->buf) ; + offs += match ; + VERBOSE( { + const char* f = apr_pstrndup(ctx->f->r->pool, + ctx->buf + offs , s_from ) ; + ap_log_rerror(APLOG_MARK, APLOG_INFO, 0, ctx->f->r, + "C/RX: match at %s, substituting %s", f, subs) ; + } ) + if ( s_to > s_from) { + preserve(ctx, s_to - s_from) ; + memmove(ctx->buf+offs+s_to, ctx->buf+offs+s_from, + len + 1 - s_from - offs) ; + memcpy(ctx->buf+offs, subs, s_to) ; + } else { + memcpy(ctx->buf + offs, subs, s_to) ; + memmove(ctx->buf+offs+s_to, ctx->buf+offs+s_from, + len + 1 - s_from - offs) ; + } + offs += s_to ; + } + } else { + s_from = strlen(m->from.c) ; + s_to = strlen(m->to) ; + for ( found = strstr(ctx->buf, m->from.c) ; found ; + found = strstr(ctx->buf+match+s_to, m->from.c) ) { + match = found - ctx->buf ; + if ( ( m->flags & M_ATSTART ) && ( match != 0) ) + break ; + len = strlen(ctx->buf) ; + if ( ( m->flags & M_ATEND ) && ( match < (len - s_from) ) ) + continue ; + VERBOSE( ap_log_rerror(APLOG_MARK, APLOG_INFO, 0, ctx->f->r, + "C: matched %s, substituting %s", m->from.c, m->to) ) ; + if ( s_to > s_from ) { + preserve(ctx, s_to - s_from) ; + memmove(ctx->buf+match+s_to, ctx->buf+match+s_from, + len + 1 - s_from - match) ; + memcpy(ctx->buf+match, m->to, s_to) ; + } else { + memcpy(ctx->buf+match, m->to, s_to) ; + memmove(ctx->buf+match+s_to, ctx->buf+match+s_from, + len + 1 - s_from - match) ; + } + } + } + } + ap_fputs(ctx->f->next, ctx->bb, ctx->buf) ; +} +static void pcdata(void* ctxt, const xmlChar *chars, int length) { + saxctxt* ctx = (saxctxt*) ctxt ; + if ( ctx->cfg->extfix ) { + pappend(ctx, chars, length) ; + } else { + ap_fwrite(ctx->f->next, ctx->bb, chars, length) ; + } +} +static void pcomment(void* ctxt, const xmlChar *chars) { + saxctxt* ctx = (saxctxt*) 
ctxt ; + if ( ctx->cfg->strip_comments ) + return ; + + if ( ctx->cfg->extfix ) { + pappend(ctx, "", 3) ; + } else { + ap_fputstrs(ctx->f->next, ctx->bb, "", NULL) ; + } +} +static void pendElement(void* ctxt, const xmlChar* name) { + saxctxt* ctx = (saxctxt*) ctxt ; + if ( ctx->offset > 0 ) { + dump_content(ctx) ; + ctx->offset = 0 ; /* having dumped it, we can re-use the memory */ + } + if ( ! is_empty_elt(name) ) + ap_fprintf(ctx->f->next, ctx->bb, "", name) ; +} +static void pstartElement(void* ctxt, const xmlChar* name, + const xmlChar** attrs ) { + + int num_match ; + size_t offs, len ; + char* subs ; + rewrite_t is_uri ; + const char** linkattrs ; + const xmlChar** a ; + const elt_t* elt ; + const char** linkattr ; + urlmap* m ; + size_t s_to, s_from, match ; + char* found ; + saxctxt* ctx = (saxctxt*) ctxt ; + size_t nmatch ; + regmatch_t pmatch[10] ; +#ifndef GO_FASTER + int verbose = ctx->cfg->verbose ; +#endif + + static const char* href[] = { "href", NULL } ; + static const char* cite[] = { "cite", NULL } ; + static const char* action[] = { "action", NULL } ; + static const char* imgattr[] = { "src", "longdesc", "usemap", NULL } ; + static const char* inputattr[] = { "src", "usemap", NULL } ; + static const char* scriptattr[] = { "src", "for", NULL } ; + static const char* frameattr[] = { "src", "longdesc", NULL } ; + static const char* objattr[] = { "classid", "codebase", "data", "usemap", NULL } ; + static const char* profile[] = { "profile", NULL } ; + static const char* background[] = { "background", NULL } ; + static const char* codebase[] = { "codebase", NULL } ; + + static const elt_t linked_elts[] = { + { "a" , href } , + { "img" , imgattr } , + { "form", action } , + { "link" , href } , + { "script" , scriptattr } , + { "base" , href } , + { "area" , href } , + { "input" , inputattr } , + { "frame", frameattr } , + { "iframe", frameattr } , + { "object", objattr } , + { "q" , cite } , + { "blockquote" , cite } , + { "ins" , cite } , + { "del" , 
cite } , + { "head" , profile } , + { "body" , background } , + { "applet", codebase } , + { NULL, NULL } + } ; + static const char* events[] = { + "onclick" , + "ondblclick" , + "onmousedown" , + "onmouseup" , + "onmouseover" , + "onmousemove" , + "onmouseout" , + "onkeypress" , + "onkeydown" , + "onkeyup" , + "onfocus" , + "onblur" , + "onload" , + "onunload" , + "onsubmit" , + "onreset" , + "onselect" , + "onchange" , + NULL + } ; + + ap_fputc(ctx->f->next, ctx->bb, '<') ; + ap_fputs(ctx->f->next, ctx->bb, name) ; + + if ( attrs ) { + linkattrs = 0 ; + for ( elt = linked_elts; elt->name != NULL ; ++elt ) + if ( !strcmp(elt->name, name) ) { + linkattrs = elt->attrs ; + break ; + } + for ( a = attrs ; *a ; a += 2 ) { + ctx->offset = 0 ; + if ( a[1] ) { + pappend(ctx, a[1], strlen(a[1])+1) ; + is_uri = ATTR_IGNORE ; + if ( linkattrs ) { + for ( linkattr = linkattrs ; *linkattr ; ++linkattr) { + if ( !strcmp(*linkattr, *a) ) { + is_uri = ATTR_URI ; + break ; + } + } + } + if ( (is_uri == ATTR_IGNORE) && ctx->cfg->extfix ) { + for ( linkattr = events; *linkattr; ++linkattr ) { + if ( !strcmp(*linkattr, *a) ) { + is_uri = ATTR_EVENT ; + break ; + } + } + } + switch ( is_uri ) { + case ATTR_URI: + num_match = 0 ; + for ( m = ctx->cfg->map ; m ; m = m->next ) { + if ( ! ( m->flags & M_HTML ) ) + continue ; + if ( m->flags & M_REGEX ) { + nmatch = 10 ; + if ( ! 
ap_regexec(m->from.r, ctx->buf, nmatch, pmatch, 0) ) { + ++num_match ; + offs = match = pmatch[0].rm_so ; + s_from = pmatch[0].rm_eo - match ; + subs = ap_pregsub(ctx->f->r->pool, m->to, ctx->buf+offs, + nmatch, pmatch) ; + VERBOSE( { + const char* f = apr_pstrndup(ctx->f->r->pool, + ctx->buf + offs , s_from ) ; + ap_log_rerror(APLOG_MARK, APLOG_INFO, 0, ctx->f->r, + "H/RX: match at %s, substituting %s", f, subs) ; + } ) + s_to = strlen(subs) ; + len = strlen(ctx->buf) ; + if ( s_to > s_from) { + preserve(ctx, s_to - s_from) ; + memmove(ctx->buf+offs+s_to, ctx->buf+offs+s_from, + len + 1 - s_from - offs) ; + memcpy(ctx->buf+offs, subs, s_to) ; + } else { + memcpy(ctx->buf + offs, subs, s_to) ; + memmove(ctx->buf+offs+s_to, ctx->buf+offs+s_from, + len + 1 - s_from - offs) ; + } + } + } else { + s_from = strlen(m->from.c) ; + if ( ! strncasecmp(ctx->buf, m->from.c, s_from ) ) { + ++num_match ; + s_to = strlen(m->to) ; + len = strlen(ctx->buf) ; + VERBOSE( ap_log_rerror(APLOG_MARK, APLOG_INFO, 0, ctx->f->r, + "H: matched %s, substituting %s", m->from.c, m->to) ) ; + if ( s_to > s_from ) { + preserve(ctx, s_to - s_from) ; + memmove(ctx->buf+s_to, ctx->buf+s_from, + len + 1 - s_from ) ; + memcpy(ctx->buf, m->to, s_to) ; + } else { /* it fits in the existing space */ + memcpy(ctx->buf, m->to, s_to) ; + memmove(ctx->buf+s_to, ctx->buf+s_from, + len + 1 - s_from) ; + } + break ; + } + } + if ( num_match > 0 ) /* URIs only want one match */ + break ; + } + break ; + case ATTR_EVENT: + for ( m = ctx->cfg->map ; m ; m = m->next ) { + num_match = 0 ; /* reset here since we're working per-rule */ + if ( ! ( m->flags & M_EVENTS ) ) + continue ; + if ( m->flags & M_REGEX ) { + nmatch = 10 ; + offs = 0 ; + while ( ! 
ap_regexec(m->from.r, ctx->buf+offs, + nmatch, pmatch, 0) ) { + match = pmatch[0].rm_so ; + s_from = pmatch[0].rm_eo - match ; + subs = ap_pregsub(ctx->f->r->pool, m->to, ctx->buf+offs, + nmatch, pmatch) ; + VERBOSE( { + const char* f = apr_pstrndup(ctx->f->r->pool, + ctx->buf + offs , s_from ) ; + ap_log_rerror(APLOG_MARK, APLOG_INFO, 0, ctx->f->r, + "E/RX: match at %s, substituting %s", f, subs) ; + } ) + s_to = strlen(subs) ; + offs += match ; + len = strlen(ctx->buf) ; + if ( s_to > s_from) { + preserve(ctx, s_to - s_from) ; + memmove(ctx->buf+offs+s_to, ctx->buf+offs+s_from, + len + 1 - s_from - offs) ; + memcpy(ctx->buf+offs, subs, s_to) ; + } else { + memcpy(ctx->buf + offs, subs, s_to) ; + memmove(ctx->buf+offs+s_to, ctx->buf+offs+s_from, + len + 1 - s_from - offs) ; + } + offs += s_to ; + ++num_match ; + } + } else { + found = strstr(ctx->buf, m->from.c) ; + if ( (m->flags & M_ATSTART) && ( found != ctx->buf) ) + continue ; + while ( found ) { + s_from = strlen(m->from.c) ; + s_to = strlen(m->to) ; + match = found - ctx->buf ; + if ( ( s_from < strlen(found) ) && (m->flags & M_ATEND ) ) { + found = strstr(ctx->buf+match+s_from, m->from.c) ; + continue ; + } else { + found = strstr(ctx->buf+match+s_to, m->from.c) ; + } + VERBOSE( ap_log_rerror(APLOG_MARK, APLOG_INFO, 0, ctx->f->r, + "E: matched %s, substituting %s", m->from.c, m->to) ) ; + len = strlen(ctx->buf) ; + if ( s_to > s_from ) { + preserve(ctx, s_to - s_from) ; + memmove(ctx->buf+match+s_to, ctx->buf+match+s_from, + len + 1 - s_from - match) ; + memcpy(ctx->buf+match, m->to, s_to) ; + } else { + memcpy(ctx->buf+match, m->to, s_to) ; + memmove(ctx->buf+match+s_to, ctx->buf+match+s_from, + len + 1 - s_from - match) ; + } + ++num_match ; + } + } + if ( num_match && ( m->flags & M_LAST ) ) + break ; + } + break ; + case ATTR_IGNORE: + break ; + } + } + if ( ! 
a[1] ) + ap_fputstrs(ctx->f->next, ctx->bb, " ", a[0], NULL) ; + else { + + if ( ctx->cfg->flags != 0 ) + normalise(ctx->cfg->flags, ctx->buf) ; + + /* write the attribute, using pcharacters to html-escape + anything that needs it in the value. + */ + ap_fputstrs(ctx->f->next, ctx->bb, " ", a[0], "=\"", NULL) ; + pcharacters(ctx, ctx->buf, strlen(ctx->buf)) ; + ap_fputc(ctx->f->next, ctx->bb, '"') ; + } + } + } + ctx->offset = 0 ; + if ( is_empty_elt(name) ) + ap_fputs(ctx->f->next, ctx->bb, ctx->cfg->etag) ; + else + ap_fputc(ctx->f->next, ctx->bb, '>') ; +} +static htmlSAXHandlerPtr setupSAX(apr_pool_t* pool) { + htmlSAXHandlerPtr sax = apr_pcalloc(pool, sizeof(htmlSAXHandler) ) ; + sax->startDocument = NULL ; + sax->endDocument = NULL ; + sax->startElement = pstartElement ; + sax->endElement = pendElement ; + sax->characters = pcharacters ; + sax->comment = pcomment ; + sax->cdataBlock = pcdata ; + return sax ; +} + +static regex_t* seek_meta_ctype ; +static regex_t* seek_charset ; +static regex_t* seek_meta ; + +static void proxy_html_child_init(apr_pool_t* pool, server_rec* s) { + seek_meta_ctype = ap_pregcomp(pool, + "(]*http-equiv[ \t\r\n='\"]*content-type[^>]*>)", + REG_EXTENDED|REG_ICASE) ; + seek_charset = ap_pregcomp(pool, "charset=([A-Za-z0-9_-]+)", + REG_EXTENDED|REG_ICASE) ; + seek_meta = ap_pregcomp(pool, "]*(http-equiv)[^>]*>", + REG_EXTENDED|REG_ICASE) ; +} + +static xmlCharEncoding sniff_encoding(request_rec* r, const char* cbuf, size_t bytes +#ifndef GO_FASTER + , int verbose +#endif + ) { + xmlCharEncoding ret ; + char* encoding = NULL ; + char* p ; + char* q ; + regmatch_t match[2] ; + unsigned char* buf = (unsigned char*)cbuf ; + + VERBOSE( ap_log_rerror(APLOG_MARK, APLOG_INFO, 0, r, + "Content-Type is %s", r->content_type) ) ; + +/* If we've got it in the HTTP headers, there's nothing to do */ + if ( r->content_type && + ( p = ap_strcasestr(r->content_type, "charset=") , p > 0 ) ) { + p += 8 ; + if ( encoding = apr_pstrndup(r->pool, p, 
strcspn(p, " ;") ) , encoding ) { + if ( ret = xmlParseCharEncoding(encoding), + ret != XML_CHAR_ENCODING_ERROR ) { + VERBOSE( ap_log_rerror(APLOG_MARK, APLOG_INFO, 0, r, + "Got charset %s from HTTP headers", encoding) ) ; + return ret ; + } else { + ap_log_rerror(APLOG_MARK, APLOG_ERR, 0, r, + "Unsupported charset %s in HTTP headers", encoding) ; + encoding = NULL ; + } + } + } + +/* to sniff, first we look for BOM */ + if ( ret = xmlDetectCharEncoding(buf, bytes), + ret != XML_CHAR_ENCODING_NONE ) { + VERBOSE( ap_log_rerror(APLOG_MARK, APLOG_INFO, 0, r, + "Got charset from XML rules.") ) ; + return ret ; + } + +/* If none of the above, look for a META-thingey */ + encoding = NULL ; + if ( ap_regexec(seek_meta_ctype, buf, 1, match, 0) == 0 ) { + p = apr_pstrndup(r->pool, buf + match[0].rm_so, + match[0].rm_eo - match[0].rm_so) ; + if ( ap_regexec(seek_charset, p, 2, match, 0) == 0 ) + encoding = apr_pstrndup(r->pool, p+match[1].rm_so, + match[1].rm_eo - match[1].rm_so) ; + } + +/* either it's set to something we found or it's still the default */ + if ( encoding ) + if ( ret = xmlParseCharEncoding(encoding), + ret != XML_CHAR_ENCODING_ERROR ) { + VERBOSE( ap_log_rerror(APLOG_MARK, APLOG_INFO, 0, r, + "Got charset %s from HTML META", encoding) ) ; + return ret ; + } else { + ap_log_rerror(APLOG_MARK, APLOG_ERR, 0, r, + "Unsupported charset %s in HTML META", encoding) ; + } + +/* the old HTTP default is a last resort */ + ap_log_rerror(APLOG_MARK, APLOG_WARNING, 0, r, + "No usable charset information: using old HTTP default LATIN1") ; + return XML_CHAR_ENCODING_8859_1 ; +} +static meta* metafix(request_rec* r, const char* buf /*, size_t bytes*/ +#ifndef GO_FASTER + , int verbose +#endif + ) { + meta* ret = NULL ; + size_t offs = 0 ; + const char* p ; + const char* q ; + char* header ; + char* content ; + regmatch_t pmatch[2] ; + char delim ; + + while ( ! 
ap_regexec(seek_meta, buf+offs, 2, pmatch, 0) ) { + header = NULL ; + content = NULL ; + p = buf+offs+pmatch[1].rm_eo ; + while ( !isalpha(*++p) ) ; + for ( q = p ; isalnum(*q) || (*q == '-') ; ++q ) ; + header = apr_pstrndup(r->pool, p, q-p) ; + if ( strncasecmp(header, "Content-", 8) ) { +/* find content=... string */ + for ( p = strstr(buf+offs+pmatch[0].rm_so, "content") ; *p ; ) { + p += 7 ; + while ( *p && isspace(*p) ) + ++p ; + if ( *p != '=' ) + continue ; + while ( *p && isspace(*++p) ) ; + if ( ( *p == '\'' ) || ( *p == '"' ) ) { + delim = *p++ ; + for ( q = p ; *q != delim ; ++q ) ; + } else { + for ( q = p ; *q && !isspace(*q) && (*q != '>') ; ++q ) ; + } + content = apr_pstrndup(r->pool, p, q-p) ; + break ; + } + } else if ( !strncasecmp(header, "Content-Type", 12) ) { + ret = apr_palloc(r->pool, sizeof(meta) ) ; + ret->start = pmatch[0].rm_so ; + ret->end = pmatch[0].rm_eo ; + } + if ( header && content ) { + VERBOSE( ap_log_rerror(APLOG_MARK, APLOG_INFO, 0, r, + "Adding header [%s: %s] from HTML META", header, content) ) ; + apr_table_setn(r->headers_out, header, content) ; + } + offs += pmatch[0].rm_eo ; + } + return ret ; +} + +static int proxy_html_filter_init(ap_filter_t* f) { + const char* env ; + saxctxt* fctx ; + +#if 0 +/* remove content-length filter */ + ap_filter_rec_t* clf = ap_get_output_filter_handle("CONTENT_LENGTH") ; + ap_filter_t* ff = f->next ; + + do { + ap_filter_t* fnext = ff->next ; + if ( ff->frec == clf ) + ap_remove_output_filter(ff) ; + ff = fnext ; + } while ( ff ) ; +#endif + + fctx = f->ctx = apr_pcalloc(f->r->pool, sizeof(saxctxt)) ; + fctx->sax = setupSAX(f->r->pool) ; + fctx->f = f ; + fctx->bb = apr_brigade_create(f->r->pool, f->r->connection->bucket_alloc) ; + fctx->cfg = ap_get_module_config(f->r->per_dir_config,&proxy_html_module); + + if ( f->r->proto_num >= 1001 ) { + if ( ! f->r->main && ! 
f->r->prev ) { + env = apr_table_get(f->r->subprocess_env, "force-response-1.0") ; + if ( !env ) + f->r->chunked = 1 ; + } + } + + apr_table_unset(f->r->headers_out, "Content-Length") ; + apr_table_unset(f->r->headers_out, "ETag") ; + return OK ; +} +static saxctxt* check_filter_init (ap_filter_t* f) { + + const char* errmsg = NULL ; + if ( ! f->r->proxyreq ) { + errmsg = "Non-proxy request; not inserting proxy-html filter" ; + } else if ( ! f->r->content_type ) { + errmsg = "No content-type; bailing out of proxy-html filter" ; + } else if ( strncasecmp(f->r->content_type, "text/html", 9) && + strncasecmp(f->r->content_type, "application/xhtml+xml", 21) ) { + errmsg = "Non-HTML content; not inserting proxy-html filter" ; + } + + if ( errmsg ) { +#ifndef GO_FASTER + proxy_html_conf* cfg + = ap_get_module_config(f->r->per_dir_config, &proxy_html_module); + if ( cfg->verbose ) { + ap_log_rerror(APLOG_MARK, APLOG_INFO, 0, f->r, errmsg) ; + } +#endif + ap_remove_output_filter(f) ; + return NULL ; + } + if ( ! f->ctx ) + proxy_html_filter_init(f) ; + return f->ctx ; +} +static int proxy_html_filter(ap_filter_t* f, apr_bucket_brigade* bb) { + apr_bucket* b ; + meta* m = NULL ; + xmlCharEncoding enc ; + const char* buf = 0 ; + apr_size_t bytes = 0 ; + int xmlopts = XML_PARSE_RECOVER | XML_PARSE_NONET | + XML_PARSE_NOBLANKS | XML_PARSE_NOERROR | XML_PARSE_NOWARNING ; + + saxctxt* ctxt = check_filter_init(f) ; + if ( ! 
ctxt ) + return ap_pass_brigade(f->next, bb) ; + + for ( b = APR_BRIGADE_FIRST(bb) ; + b != APR_BRIGADE_SENTINEL(bb) ; + b = APR_BUCKET_NEXT(b) ) { + if ( APR_BUCKET_IS_EOS(b) ) { + if ( ctxt->parser != NULL ) { + htmlParseChunk(ctxt->parser, buf, 0, 1) ; + } + APR_BRIGADE_INSERT_TAIL(ctxt->bb, + apr_bucket_eos_create(ctxt->bb->bucket_alloc) ) ; + ap_pass_brigade(ctxt->f->next, ctxt->bb) ; + } else if ( apr_bucket_read(b, &buf, &bytes, APR_BLOCK_READ) + == APR_SUCCESS ) { + if ( ctxt->parser == NULL ) { + if ( buf[bytes] != 0 ) { + /* make a string for parse routines to play with */ + char* buf1 = apr_palloc(f->r->pool, bytes+1) ; + memcpy(buf1, buf, bytes) ; + buf1[bytes] = 0 ; + buf = buf1 ; + } +#ifndef GO_FASTER + enc = sniff_encoding(f->r, buf, bytes, ctxt->cfg->verbose) ; + if ( ctxt->cfg->metafix ) + m = metafix(f->r, buf, ctxt->cfg->verbose) ; +#else + enc = sniff_encoding(f->r, buf, bytes) ; + if ( ctxt->cfg->metafix ) + m = metafix(f->r, buf) ; +#endif + ap_set_content_type(f->r, "text/html;charset=utf-8") ; + ap_fputs(f->next, ctxt->bb, ctxt->cfg->doctype) ; + if ( m ) { + ctxt->parser = htmlCreatePushParserCtxt(ctxt->sax, ctxt, + buf, m->start, 0, enc ) ; + htmlParseChunk(ctxt->parser, buf+m->end, bytes-m->end, 0) ; + } else { + ctxt->parser = htmlCreatePushParserCtxt(ctxt->sax, ctxt, + buf, bytes, 0, enc ) ; + } + apr_pool_cleanup_register(f->r->pool, ctxt->parser, + (void*)htmlFreeParserCtxt, apr_pool_cleanup_null) ; + if ( xmlopts = xmlCtxtUseOptions(ctxt->parser, xmlopts ), xmlopts ) + ap_log_rerror(APLOG_MARK, APLOG_WARNING, 0, f->r, + "Unsupported parser opts %x", xmlopts) ; + } else { + htmlParseChunk(ctxt->parser, buf, bytes, 0) ; + } + } else { + ap_log_rerror(APLOG_MARK, APLOG_ERR, 0, f->r, "Error in bucket read") ; + } + } + /*ap_fflush(ctxt->f->next, ctxt->bb) ; // uncomment for debug */ + apr_brigade_cleanup(bb) ; + return APR_SUCCESS ; +} +static const char* fpi_html = + "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.01//EN\">\n" ; +static const char* fpi_html_legacy = + "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\">\n" ; +static const
char* fpi_xhtml = + "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">\n" ; +static const char* fpi_xhtml_legacy = + "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\n" ; +static const char* html_etag = ">" ; +static const char* xhtml_etag = " />" ; +/*#define DEFAULT_DOCTYPE fpi_html */ +static const char* DEFAULT_DOCTYPE = "" ; +#define DEFAULT_ETAG html_etag + +static void* proxy_html_config(apr_pool_t* pool, char* x) { + proxy_html_conf* ret = apr_pcalloc(pool, sizeof(proxy_html_conf) ) ; + ret->doctype = DEFAULT_DOCTYPE ; + ret->etag = DEFAULT_ETAG ; + ret->bufsz = 8192 ; + return ret ; +} +static void* proxy_html_merge(apr_pool_t* pool, void* BASE, void* ADD) { + proxy_html_conf* base = (proxy_html_conf*) BASE ; + proxy_html_conf* add = (proxy_html_conf*) ADD ; + proxy_html_conf* conf = apr_palloc(pool, sizeof(proxy_html_conf)) ; + + if ( add->map && base->map ) { + urlmap* a ; + conf->map = NULL ; + for ( a = base->map ; a ; a = a->next ) { + urlmap* save = conf->map ; + conf->map = apr_pmemdup(pool, a, sizeof(urlmap)) ; + conf->map->next = save ; + } + for ( a = add->map ; a ; a = a->next ) { + urlmap* save = conf->map ; + conf->map = apr_pmemdup(pool, a, sizeof(urlmap)) ; + conf->map->next = save ; + } + } else + conf->map = add->map ? add->map : base->map ; + + conf->doctype = ( add->doctype == DEFAULT_DOCTYPE ) + ? base->doctype : add->doctype ; + conf->etag = ( add->etag == DEFAULT_ETAG ) ?
base->etag : add->etag ; + conf->bufsz = add->bufsz ; + if ( add->flags & NORM_RESET ) { + conf->flags = add->flags ^ NORM_RESET ; + conf->metafix = add->metafix ; + conf->extfix = add->extfix ; + conf->strip_comments = add->strip_comments ; +#ifndef GO_FASTER + conf->verbose = add->verbose ; +#endif + } else { + conf->flags = base->flags | add->flags ; + conf->metafix = base->metafix | add->metafix ; + conf->extfix = base->extfix | add->extfix ; + conf->strip_comments = base->strip_comments | add->strip_comments ; +#ifndef GO_FASTER + conf->verbose = base->verbose | add->verbose ; +#endif + } + return conf ; +} +#define REGFLAG(n,s,c) ( (s&&(strchr((s),(c))!=NULL)) ? (n) : 0 ) +#define XREGFLAG(n,s,c) ( (!s||(strchr((s),(c))==NULL)) ? (n) : 0 ) +static const char* set_urlmap(cmd_parms* cmd, void* CFG, + const char* from, const char* to, const char* flags) { + int regflags ; + proxy_html_conf* cfg = (proxy_html_conf*)CFG ; + urlmap* map ; + urlmap* newmap = apr_palloc(cmd->pool, sizeof(urlmap) ) ; + + newmap->next = NULL ; + newmap->flags + = XREGFLAG(M_HTML,flags,'h') + | XREGFLAG(M_EVENTS,flags,'e') + | XREGFLAG(M_CDATA,flags,'c') + | REGFLAG(M_ATSTART,flags,'^') + | REGFLAG(M_ATEND,flags,'$') + | REGFLAG(M_REGEX,flags,'R') + | REGFLAG(M_LAST,flags,'L') + ; + + if ( cfg->map ) { + for ( map = cfg->map ; map->next ; map = map->next ) ; + map->next = newmap ; + } else + cfg->map = newmap ; + + if ( ! 
(newmap->flags & M_REGEX) ) { + newmap->from.c = apr_pstrdup(cmd->pool, from) ; + newmap->to = apr_pstrdup(cmd->pool, to) ; + } else { + regflags + = REGFLAG(REG_EXTENDED,flags,'x') + | REGFLAG(REG_ICASE,flags,'i') + | REGFLAG(REG_NOSUB,flags,'n') + | REGFLAG(REG_NEWLINE,flags,'s') + ; + newmap->from.r = ap_pregcomp(cmd->pool, from, regflags) ; + newmap->to = apr_pstrdup(cmd->pool, to) ; + } + return NULL ; +} +static const char* set_doctype(cmd_parms* cmd, void* CFG, const char* t, + const char* l) { + proxy_html_conf* cfg = (proxy_html_conf*)CFG ; + if ( !strcasecmp(t, "xhtml") ) { + cfg->etag = xhtml_etag ; + if ( l && !strcasecmp(l, "legacy") ) + cfg->doctype = fpi_xhtml_legacy ; + else + cfg->doctype = fpi_xhtml ; + } else if ( !strcasecmp(t, "html") ) { + cfg->etag = html_etag ; + if ( l && !strcasecmp(l, "legacy") ) + cfg->doctype = fpi_html_legacy ; + else + cfg->doctype = fpi_html ; + } else { + cfg->doctype = apr_pstrdup(cmd->pool, t) ; + if ( l && ( ( l[0] == 'x' ) || ( l[0] == 'X' ) ) ) + cfg->etag = xhtml_etag ; + else + cfg->etag = html_etag ; + } + return NULL ; +} +static void set_param(proxy_html_conf* cfg, const char* arg) { + if ( arg && *arg ) { + if ( !strcmp(arg, "lowercase") ) + cfg->flags |= NORM_LC ; + else if ( !strcmp(arg, "dospath") ) + cfg->flags |= NORM_MSSLASH ; + else if ( !strcmp(arg, "reset") ) + cfg->flags |= NORM_RESET ; + } +} +static const char* set_flags(cmd_parms* cmd, void* CFG, const char* arg1, + const char* arg2, const char* arg3) { + set_param( (proxy_html_conf*)CFG, arg1) ; + set_param( (proxy_html_conf*)CFG, arg2) ; + set_param( (proxy_html_conf*)CFG, arg3) ; + return NULL ; +} +static const command_rec proxy_html_cmds[] = { + AP_INIT_TAKE23("ProxyHTMLURLMap", set_urlmap, NULL, + RSRC_CONF|ACCESS_CONF, "Map URL From To" ) , + AP_INIT_TAKE12("ProxyHTMLDoctype", set_doctype, NULL, + RSRC_CONF|ACCESS_CONF, "(HTML|XHTML) [Legacy]" ) , + AP_INIT_TAKE123("ProxyHTMLFixups", set_flags, NULL, + RSRC_CONF|ACCESS_CONF, "Options 
are lowercase, dospath" ) , + AP_INIT_FLAG("ProxyHTMLMeta", ap_set_flag_slot, + (void*)APR_OFFSETOF(proxy_html_conf, metafix), + RSRC_CONF|ACCESS_CONF, "Fix META http-equiv elements" ) , + AP_INIT_FLAG("ProxyHTMLExtended", ap_set_flag_slot, + (void*)APR_OFFSETOF(proxy_html_conf, extfix), + RSRC_CONF|ACCESS_CONF, "Map URLs in Javascript and CSS" ) , + AP_INIT_FLAG("ProxyHTMLStripComments", ap_set_flag_slot, + (void*)APR_OFFSETOF(proxy_html_conf, strip_comments), + RSRC_CONF|ACCESS_CONF, "Strip out comments" ) , +#ifndef GO_FASTER + AP_INIT_FLAG("ProxyHTMLLogVerbose", ap_set_flag_slot, + (void*)APR_OFFSETOF(proxy_html_conf, verbose), + RSRC_CONF|ACCESS_CONF, "Verbose Logging (use with LogLevel Info)" ) , +#endif + AP_INIT_TAKE1("ProxyHTMLBufSize", ap_set_int_slot, + (void*)APR_OFFSETOF(proxy_html_conf, bufsz), + RSRC_CONF|ACCESS_CONF, "Buffer size" ) , + { NULL } +} ; +static int mod_proxy_html(apr_pool_t* p, apr_pool_t* p1, apr_pool_t* p2, + server_rec* s) { + ap_add_version_component(p, VERSION_STRING) ; + return OK ; +} +static void proxy_html_hooks(apr_pool_t* p) { + ap_register_output_filter("proxy-html", proxy_html_filter, + NULL, AP_FTYPE_RESOURCE) ; + ap_hook_post_config(mod_proxy_html, NULL, NULL, APR_HOOK_MIDDLE) ; + ap_hook_child_init(proxy_html_child_init, NULL, NULL, APR_HOOK_MIDDLE) ; +} +module AP_MODULE_DECLARE_DATA proxy_html_module = { + STANDARD20_MODULE_STUFF, + proxy_html_config, + proxy_html_merge, + NULL, + NULL, + proxy_html_cmds, + proxy_html_hooks +} ;
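Illustrative note (not part of the commit): the string-literal branch of the filter above rewrites matches in place, shifting the tail of the buffer with memmove and then copying the replacement in with memcpy. The standalone sketch below shows the same idea outside Apache. The function name replace_all and the fixed-capacity buffer are assumptions made for this example only; the module itself grows its buffer through preserve() rather than failing when space runs out.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not taken from mod_proxy_html.
 * Replaces every occurrence of `from` in the NUL-terminated string `buf`
 * (total capacity `cap` bytes) with `to`, editing the buffer in place.
 * Returns the number of replacements, or -1 if a replacement would not
 * fit (replacements made before that point stay applied). */
static int replace_all(char *buf, size_t cap, const char *from, const char *to)
{
    size_t s_from = strlen(from);
    size_t s_to = strlen(to);
    size_t len = strlen(buf);
    int num_match = 0;
    char *found;

    if (s_from == 0)
        return 0;               /* an empty pattern would match forever */

    found = strstr(buf, from);
    while (found != NULL) {
        size_t match = (size_t)(found - buf);

        /* The module would call preserve() here to enlarge its buffer. */
        if (s_to > s_from && len + (s_to - s_from) + 1 > cap)
            return -1;

        /* Move the tail, including the terminating NUL, to its new
         * position; memmove is required because the regions overlap. */
        memmove(buf + match + s_to, buf + match + s_from,
                len + 1 - s_from - match);
        /* Drop the replacement into the space that was opened or kept. */
        memcpy(buf + match, to, s_to);

        len = len - s_from + s_to;
        ++num_match;

        /* Resume after the inserted text so it is never rescanned. */
        found = strstr(buf + match + s_to, from);
    }
    return num_match;
}
```

Calling replace_all(buf, sizeof buf, "/old/", "/new/path/") on a buffer holding "href=/old/x /old/y" rewrites both occurrences. Resuming the search after the inserted text mirrors the filter's strstr(ctx->buf+match+s_to, ...) step and avoids looping when the replacement itself contains the pattern.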