public class RotatingCharsetDetectorTest
extends junit.framework.TestCase
Test of RotatingCharsetDetector.

| Constructor and Description |
|---|
| RotatingCharsetDetectorTest() |

| Modifier and Type | Method and Description |
|---|---|
| protected WarcResource | createResource(String payload, String encoding) |
| protected WarcResource | createResource(String contentType, String payload, String encoding) |
| void | testContentTypeHeaderSniffer(): test of ContentTypeHeaderSniffer. |
| void | testFalseMetaUTF16(): content is UTF-8 encoded, but META tag says it's UTF-16. |
| void | testFalseMetaUTF8(): content is UTF-16 encoded, but META tag says it's UTF-8. |
| void | testXCharsetName(): test of x- charset names. |
| void | testXUserDefined(): test of x-user-defined charset name. |
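
The x- charset name tests can be illustrated with a minimal sketch. The method details on this page state only that x-user-defined is mapped to windows-1252. Everything else here is an assumption made for illustration: the class name XCharsetNameSketch, the helper names resolve and isSupported, and the fallback of stripping the x- prefix. None of it is taken from RotatingCharsetDetector itself.

```java
import java.nio.charset.Charset;
import java.util.Locale;

// Hypothetical sketch, not the RotatingCharsetDetector implementation.
class XCharsetNameSketch {

    // Returns a charset name that is usable on this JVM, or null if none is found.
    static String resolve(String name) {
        if (name == null) {
            return null;
        }
        String lc = name.trim().toLowerCase(Locale.ROOT);
        if (lc.isEmpty()) {
            return null;
        }
        // Stated on this page: x-user-defined is mapped to windows-1252.
        if (lc.equals("x-user-defined")) {
            return "windows-1252";
        }
        if (isSupported(lc)) {
            return lc;
        }
        // Assumption: retry an unsupported x- name without its prefix.
        if (lc.startsWith("x-") && isSupported(lc.substring(2))) {
            return lc.substring(2);
        }
        return null;
    }

    // Charset.isSupported throws for syntactically illegal names; treat those as unsupported.
    static boolean isSupported(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve("x-user-defined")); // prints "windows-1252"
        System.out.println(resolve("UTF-8"));          // prints "utf-8"
    }
}
```

Returning null for an unresolvable name leaves the decision to the caller, which could then try the next detection strategy.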

Methods inherited from class junit.framework.TestCase:
countTestCases, createResult, getName, run, runBare, runTest, setName, setUp, tearDown, toString

Methods inherited from class junit.framework.Assert:
assertEquals, assertFalse, assertNotNull, assertNotSame, assertNull, assertSame, assertTrue, fail

protected WarcResource createResource(String payload, String encoding) throws IOException

protected WarcResource createResource(String contentType, String payload, String encoding) throws IOException

public void testFalseMetaUTF16() throws Exception
PrescanMetadataSniffer overrides UTF-16 to UTF-8.

public void testXCharsetName() throws Exception
Test of x- charset names.

public void testXUserDefined() throws Exception
Test of the x-user-defined charset name, which is mapped to windows-1252.

public void testFalseMetaUTF8() throws Exception
PrescanMetadataSniffer should fail because the content is UTF-16 encoded, and UniversalChardetSniffer should detect UTF-16.
Unfortunately, this test currently fails: Universal Chardet returns null for the sample content, even with some non-ASCII characters. Hopefully, real UTF-16 texts have a BOM, or enough non-ASCII characters for Universal Chardet to work.
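
The remark about the BOM can be made concrete with a small sketch. It checks only the leading bytes of a payload for a byte-order mark. The class name BomSniffSketch and the method sniffBom are hypothetical; this is neither how UniversalChardetSniffer works nor part of the tested code. It only shows why UTF-16 content with a BOM is easy to identify and content without one is not.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of BOM sniffing; not part of RotatingCharsetDetector.
class BomSniffSketch {

    // Returns the charset implied by a leading byte-order mark, or null if there is none.
    // Note: FF FE is also the start of the UTF-32LE BOM, which this sketch ignores.
    static String sniffBom(byte[] b) {
        if (b.length >= 3 && (b[0] & 0xFF) == 0xEF && (b[1] & 0xFF) == 0xBB
                && (b[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFE && (b[1] & 0xFF) == 0xFF) {
            return "UTF-16BE";
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFF && (b[1] & 0xFF) == 0xFE) {
            return "UTF-16LE";
        }
        return null;
    }

    public static void main(String[] args) {
        // UTF-16 content whose META tag wrongly claims UTF-8.
        String html = "<html><head><meta charset=\"UTF-8\"></head>"
                + "<body>caf\u00e9</body></html>";
        // Java's "UTF-16" encoder writes a big-endian BOM followed by big-endian data.
        byte[] withBom = html.getBytes(StandardCharsets.UTF_16);
        // The UTF-16LE encoder writes no BOM.
        byte[] withoutBom = html.getBytes(StandardCharsets.UTF_16LE);

        System.out.println(sniffBom(withBom));    // prints "UTF-16BE"
        System.out.println(sniffBom(withoutBom)); // prints "null"
    }
}
```

Without a BOM the sketch has nothing to go on, which is the situation in which a statistical detector has to decide from the content alone.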

public void testContentTypeHeaderSniffer() throws Exception
Test of ContentTypeHeaderSniffer.

Copyright © 2005–2015 IIPC. All rights reserved.