| Interface | Description |
|---|---|
| EncodingSniffer | A step of character encoding sniffing. |

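
To illustrate what "a step of character encoding sniffing" can look like, here is a minimal sketch of a single step that inspects the leading bytes for a byte order mark. The `SniffStep` interface, the `BOM_STEP` constant, and the `startsWith` helper are hypothetical names invented for this example; the real `EncodingSniffer` signature is not shown on this page and may differ.

```java
import java.util.Optional;

final class BomSniffSketch {

    /** Hypothetical stand-in for one sniffing step; NOT the real EncodingSniffer signature. */
    interface SniffStep {
        /** Returns a charset name if this step can decide, otherwise empty. */
        Optional<String> sniff(byte[] head);
    }

    /** One step: look only at the leading bytes for a UTF-8 or UTF-16 byte order mark. */
    static final SniffStep BOM_STEP = head -> {
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return Optional.of("UTF-8");
        if (startsWith(head, 0xFE, 0xFF)) return Optional.of("UTF-16BE");
        if (startsWith(head, 0xFF, 0xFE)) return Optional.of("UTF-16LE");
        return Optional.empty();
    };

    /** True if data begins with the given unsigned byte values. */
    static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        byte[] plain = {'h', 'i'};
        System.out.println(BOM_STEP.sniff(withBom)); // a UTF-8 BOM is present
        System.out.println(BOM_STEP.sniff(plain));   // no BOM, so the step cannot decide
    }
}
```

Returning an empty result rather than a default lets a caller fall through to the next step when this one cannot decide.
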
| Class | Description |
|---|---|
| BaseEncodingSniffer | Implements common utility methods for EncodingSniffer. |
| ByteOrderMarkSniffer | EncodingSniffer that peeks at the content for Byte Order Mark bytes. |
| CharsetDetector | Abstract class containing common methods for determining the character encoding of a text Resource, most of which should be refactored into a Util package. |
| ContentTypeHeaderSniffer | EncodingSniffer that obtains the character encoding from the Content-Type HTTP header. |
| PrescanMetadataSniffer | EncodingSniffer that pre-scans the byte stream for {@code |
| RotatingCharsetDetector | |
| StandardCharsetDetector | CharsetDetector that roughly follows the steps prescribed by the WHAT-WG recommendation, with the following simplifications: (1) no support for inheriting the parent browsing context's character encoding (that information is not readily available to Wayback); (2) the default is fixed to UTF-8, regardless of the user's locale (a crawler's locale information is not readily available to Wayback); (3) no support for confidence, and thus no support for encoding switching (this is more about CharsetDetector's design). |
| UniversalChardetSniffer | EncodingSniffer that runs UniversalDetector on the content. |

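
The classes above suggest a chain of sniffing steps with a fixed UTF-8 fallback. The sketch below shows that idea under stated assumptions: the `Input` holder, the step methods, the `STEPS` list, and the order of the steps are all hypothetical and are not taken from the StandardCharsetDetector source. Only two steps are shown (a UTF-8 byte order mark check and a Content-Type `charset` parameter lookup); the first step that returns a result wins.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SnifferChainSketch {

    /** Hypothetical input: Content-Type header value (may be null) and the first bytes of the body. */
    static final class Input {
        final String contentType;
        final byte[] head;

        Input(String contentType, byte[] head) {
            this.contentType = contentType;
            this.head = head;
        }
    }

    private static final Pattern CHARSET_PARAM =
            Pattern.compile("charset\\s*=\\s*[\"']?([^\\s;\"']+)", Pattern.CASE_INSENSITIVE);

    /** Step 1 (simplified): only the UTF-8 byte order mark is recognized here. */
    static Optional<String> fromBom(Input in) {
        byte[] h = in.head;
        boolean utf8Bom = h.length >= 3
                && (h[0] & 0xFF) == 0xEF && (h[1] & 0xFF) == 0xBB && (h[2] & 0xFF) == 0xBF;
        return utf8Bom ? Optional.of("UTF-8") : Optional.empty();
    }

    /** Step 2: read the charset parameter of the Content-Type header, if it names a known charset. */
    static Optional<String> fromContentType(Input in) {
        if (in.contentType == null) return Optional.empty();
        Matcher m = CHARSET_PARAM.matcher(in.contentType);
        if (!m.find()) return Optional.empty();
        try {
            // forName rejects illegal or unsupported names; name() gives the canonical form.
            return Optional.of(Charset.forName(m.group(1)).name());
        } catch (IllegalArgumentException e) {
            return Optional.empty();
        }
    }

    /** Assumed order for this sketch: byte order mark first, then the HTTP header. */
    static final List<Function<Input, Optional<String>>> STEPS = new ArrayList<>();
    static {
        STEPS.add(SnifferChainSketch::fromBom);
        STEPS.add(SnifferChainSketch::fromContentType);
    }

    /** Runs each step in order; falls back to UTF-8 when no step decides. */
    static String detect(Input in) {
        for (Function<Input, Optional<String>> step : STEPS) {
            Optional<String> found = step.apply(in);
            if (found.isPresent()) return found.get();
        }
        return "UTF-8";
    }

    public static void main(String[] args) {
        byte[] body = {'h', 'i'};
        System.out.println(detect(new Input("text/html; charset=iso-8859-1", body)));
        System.out.println(detect(new Input(null, body)));
    }
}
```

Because there is no confidence value in this design, the first answer is final, which mirrors the "no encoding switching" simplification described for StandardCharsetDetector.
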
Copyright © 2005–2019 IIPC. All rights reserved.