public class ContextAwareLexer
extends NodeUtils
The Lexer that comes with htmlparser does not handle non-escaped HTML
entities within SCRIPT tags. By default, such content can cause the lexer
to skip over a large part of the document. Technically, content like that
isn't valid HTML, but of course, people write it all the time. So this
class uses a ParseContext object, passed in at construction. The lexer
observes the SCRIPT and STYLE tags, records what it sees as properties on
the ParseContext, and uses that state information to perform a
parseCDATA() call instead of a nextNode() call at the right time, to try
to keep the SAX parsing in sync with the document.
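
The idea described above can be illustrated with a small, self-contained sketch. It does not use htmlparser or this class's real code: ContextSketch, Context, tokenize and indexOfIgnoreCase are hypothetical names invented for the illustration, and the toy tokenizer only stands in for the switch between a normal node read and a CDATA-style read.

```java
import java.util.ArrayList;
import java.util.List;

class ContextSketch {
    /** Hypothetical stand-in for ParseContext: remembers whether we are inside SCRIPT or STYLE. */
    static final class Context {
        boolean inRawText;   // true right after an opening <script> or <style> tag
        String closeTag;     // the close tag that ends the raw section
    }

    /** Case-insensitive indexOf, so "</SCRIPT" also ends a raw section. */
    static int indexOfIgnoreCase(String s, String needle, int from) {
        for (int i = from; i + needle.length() <= s.length(); i++) {
            if (s.regionMatches(true, i, needle, 0, needle.length())) return i;
        }
        return -1;
    }

    /** Toy tokenizer (assumption, not htmlparser's algorithm): emits TAG, TEXT and CDATA tokens. */
    static List<String> tokenize(String html) {
        List<String> out = new ArrayList<>();
        Context ctx = new Context();
        int pos = 0;
        while (pos < html.length()) {
            if (ctx.inRawText) {
                // CDATA-style read: take everything up to the close tag, ignoring any '<' inside.
                int end = indexOfIgnoreCase(html, ctx.closeTag, pos);
                if (end < 0) end = html.length();
                if (end > pos) out.add("CDATA:" + html.substring(pos, end));
                pos = end;
                ctx.inRawText = false;
            } else if (html.charAt(pos) == '<') {
                // Normal read of a tag; update the context when SCRIPT or STYLE opens.
                int end = html.indexOf('>', pos);
                if (end < 0) end = html.length() - 1;
                String tag = html.substring(pos, end + 1);
                out.add("TAG:" + tag);
                if (tag.regionMatches(true, 0, "<script", 0, 7)) {
                    ctx.inRawText = true;
                    ctx.closeTag = "</script";
                } else if (tag.regionMatches(true, 0, "<style", 0, 6)) {
                    ctx.inRawText = true;
                    ctx.closeTag = "</style";
                }
                pos = end + 1;
            } else {
                // Normal read of text up to the next tag.
                int end = html.indexOf('<', pos);
                if (end < 0) end = html.length();
                out.add("TEXT:" + html.substring(pos, end));
                pos = end;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String html = "<p>hi</p><script>if (a < b && c > d) run();</script><p>bye</p>";
        for (String token : tokenize(html)) System.out.println(token);
    }
}
```

Without the context flag, the '<' inside the script body would be read as the start of a tag and the tokenizer would lose its place. With it, the script body comes back as a single CDATA token and normal reading resumes at the close tag.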
- Author:
- brad