This answers some of the most frequently asked questions that aren't dealt with (or that people overlook) in the documentation and the apachetutor tutorial. This was written for Version 2, and most of the questions are moot in Version 3.
Input charset support depends entirely on libxml2. mod_proxy_html detects the charset of a document, but does not itself implement any charsets. It works by passing the detected charset to libxml2 when it sets up the parser.
This means that mod_proxy_html inherits its charset support from libxml2, and will always support exactly the same charsets available in the version of libxml2 you have installed. So bug the libxml2 folks, not us!
In Version 3, charset support is much expanded provided
ProxyHTMLMeta is enabled, and any charset can be supported
by aliasing it.
libxml2 uses utf-8 internally for everything.
Generating output with another charset is therefore an additional
overhead, and the decision was taken to exclude any such capability
from mod_proxy_html. There is an easy workaround: you can transcode
the output using another filter, such as mod_charset_lite.
Version 3 supports output transformation to other charsets.
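As a rough illustration of the mod_charset_lite workaround for Version 2, a configuration along the following lines might be used. This is a sketch only: the /app/ path, the backend hostname and the ISO-8859-1 target charset are placeholders, and it assumes mod_charset_lite adds its translation filter implicitly so that it runs after proxy-html. Check the filter order on your own build before relying on it.

```apache
# Placeholder reverse-proxy mapping; substitute your own path and backend.
ProxyPass        /app/ http://backend.example.com/
ProxyPassReverse /app/ http://backend.example.com/

<Location /app/>
    # mod_proxy_html (Version 2) rewrites the links; its output is utf-8.
    SetOutputFilter proxy-html
    ProxyHTMLURLMap http://backend.example.com/ /app/

    # mod_charset_lite transcodes from utf-8 to the desired output charset.
    # ISO-8859-1 is only an example target.
    CharsetSourceEnc UTF-8
    CharsetDefault   ISO-8859-1
</Location>
```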
mod_proxy_html is based on W3C HTML 4.01 and XHTML 1.0 (which are identical in terms of elements and attributes). It supports all links defined in W3C HTML, even those that have been deprecated since 1997. But it does NOT support proprietary pseudo-HTML "extensions" that have never been part of any published HTML standard. Of course, it's trivial to add them to the source.
Support for such extensions has been the most commonly requested feature since mod_proxy_html 2.0 was released in 2004. The request cannot reasonably be satisfied in the code, because everyone's pet "extensions" are different. Version 3 deals with this by taking all HTML knowledge out of the code and loading it from httpd.conf instead, so admins can meet their own needs without recompiling.
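To give an idea of what this looks like in practice, Version 3 declares link attributes with the ProxyHTMLLinks directive (element name followed by one or more attributes). The fragment below is a hedged sketch: the first two lines mirror standard HTML 4.01 links, and the non-standard "background" attribute on table elements is chosen purely as an example of a proprietary extension an admin might want to add.

```apache
# Standard HTML 4.01 links, as they might appear in a Version 3 configuration.
ProxyHTMLLinks a     href
ProxyHTMLLinks img   src longdesc usemap

# Illustrative proprietary extension: the non-standard "background"
# attribute on tables and table cells, added without recompiling.
ProxyHTMLLinks table background
ProxyHTMLLinks td    background
ProxyHTMLLinks th    background
```

Each admin lists only the extensions their own backend actually emits, so the module itself never needs to know about them.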