Interface | Description |
---|---|
EncodingSniffer | A step of character encoding sniffing. |
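Each EncodingSniffer is described as a single step. Detection can therefore be pictured as a chain in which each step either names a charset or defers to the next one. The sketch below assumes a hypothetical `Sniffer` interface with a `sniff(byte[] content, String contentType)` method that returns a charset name or `null`, plus a hypothetical `SnifferChain` runner. The real `EncodingSniffer` signature is not given here and may differ.

```java
import java.util.List;

// Hypothetical stand-in for EncodingSniffer: one step that either names a charset or returns null.
interface Sniffer {
    String sniff(byte[] content, String contentType);
}

// Hypothetical runner: tries each step in order and stops at the first answer.
final class SnifferChain {
    private final List<Sniffer> steps;

    SnifferChain(List<Sniffer> steps) {
        this.steps = List.copyOf(steps);
    }

    // Returns the first non-null charset reported by a step, otherwise the fallback.
    String detect(byte[] content, String contentType, String fallback) {
        for (Sniffer step : steps) {
            String charset = step.sniff(content, contentType);
            if (charset != null) {
                return charset;
            }
        }
        return fallback;
    }
}
```

Keeping each step behind one small interface lets a detector reorder or add steps (for example, putting BOM detection first) without touching the other steps.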
Class | Description |
---|---|
BaseEncodingSniffer | Implements common utility methods for EncodingSniffer. |
ByteOrderMarkSniffer | EncodingSniffer that peeks at the content for Byte Order Mark (BOM) bytes. |
CharsetDetector | Abstract class containing common methods for determining the character encoding of a text Resource, most of which should be refactored into a Util package. |
ContentTypeHeaderSniffer | EncodingSniffer that obtains the character encoding from the Content-Type HTTP header. |
PrescanMetadataSniffer | EncodingSniffer that pre-scans the byte stream for a `<meta http-equiv="content-type" ...>` tag. |
RotatingCharsetDetector | |
StandardCharsetDetector | CharsetDetector that roughly follows the steps prescribed by the WHAT-WG recommendation, with the following simplifications: (1) no support for inheriting the parent browsing context's character encoding (this information is not readily available to Wayback); (2) the default is fixed to UTF-8, regardless of the user's locale (a crawler's locale information is not readily available to Wayback); (3) confidence is not supported, so encoding switching is not supported either (this is more a consequence of CharsetDetector's design). CHANGE 1.8.1 2014-07-07: added BOM detection as the first step. |
UniversalChardetSniffer | EncodingSniffer that runs UniversalDetector on the content. |
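Assuming StandardCharsetDetector chains the sniffers listed above in the WHAT-WG order (BOM first, then the Content-Type header, then a `<meta>` prescan, then the fixed UTF-8 default), the flow can be sketched as follows. `SimpleCharsetDetector` is a hypothetical, simplified class, not Wayback's implementation: its meta prescan is a regular expression over the first 1024 bytes rather than the full WHAT-WG prescan algorithm, and the UniversalDetector step is omitted because it needs an external library.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified detector: BOM, then Content-Type header, then <meta> prescan, then UTF-8.
final class SimpleCharsetDetector {
    // Matches charset=... inside a Content-Type value (HTTP header or meta attribute).
    private static final Pattern CHARSET_PARAM =
        Pattern.compile("charset\\s*=\\s*[\"']?([A-Za-z0-9._:-]+)", Pattern.CASE_INSENSITIVE);
    // Simplified prescan: any <meta ...> tag; the real WHAT-WG prescan is a byte-level state machine.
    private static final Pattern META_TAG =
        Pattern.compile("<meta\\s[^>]*>", Pattern.CASE_INSENSITIVE);

    static String detect(byte[] content, String contentTypeHeader) {
        String bom = sniffBom(content);            // step 1: Byte Order Mark wins outright
        if (bom != null) return bom;
        String header = charsetParam(contentTypeHeader); // step 2: transport-layer charset
        if (header != null) return header;
        String meta = prescanMeta(content);        // step 3: <meta> in the first 1024 bytes
        if (meta != null) return meta;
        return "UTF-8";                            // step 4: fixed default, no locale lookup
    }

    static String sniffBom(byte[] b) {
        if (b.length >= 3 && (b[0] & 0xFF) == 0xEF && (b[1] & 0xFF) == 0xBB && (b[2] & 0xFF) == 0xBF) return "UTF-8";
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFE && (b[1] & 0xFF) == 0xFF) return "UTF-16BE";
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFF && (b[1] & 0xFF) == 0xFE) return "UTF-16LE";
        return null;
    }

    static String charsetParam(String value) {
        if (value == null) return null;
        Matcher m = CHARSET_PARAM.matcher(value);
        return m.find() ? m.group(1) : null;
    }

    static String prescanMeta(byte[] content) {
        int n = Math.min(content.length, 1024);
        // ISO-8859-1 maps each byte to one char, so ASCII markup is readable whatever the real encoding.
        String head = new String(content, 0, n, StandardCharsets.ISO_8859_1);
        Matcher tag = META_TAG.matcher(head);
        while (tag.find()) {
            String charset = charsetParam(tag.group());
            if (charset != null) return charset;
        }
        return null;
    }
}
```

Checking the header before the `<meta>` tag mirrors the WHAT-WG precedence of transport-layer information over in-document declarations. The regex shortcut can misfire, for example on a `<meta name="description">` whose content happens to contain `charset=`, which is one reason a faithful implementation follows the byte-level prescan instead.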
Copyright © 2005–2015 IIPC. All rights reserved.