public class RotatingCharsetDetectorTest
extends junit.framework.TestCase
Test of RotatingCharsetDetector.

Constructor and Description |
---|
RotatingCharsetDetectorTest() |

Modifier and Type | Method and Description |
---|---|
protected WarcResource | createResource(String payload, String encoding) |
protected WarcResource | createResource(String contentType, String payload, String encoding) |
void | testContentTypeHeaderSniffer() test of ContentTypeHeaderSniffer |
void | testFalseMetaUTF16() content is UTF-8 encoded, but META tag says it's UTF-16. |
void | testFalseMetaUTF8() content is UTF-16 encoded, but META tag says it's UTF-8. |
void | testXCharsetName() test of x- charset names. |
void | testXUserDefined() test of x-user-defined charset name. |

Methods inherited from class junit.framework.TestCase: countTestCases, createResult, getName, run, runBare, runTest, setName, setUp, tearDown, toString
Methods inherited from class junit.framework.Assert: assertEquals, assertFalse, assertNotNull, assertNotSame, assertNull, assertSame, assertTrue, fail
protected WarcResource createResource(String payload, String encoding) throws IOException
Throws: IOException
protected WarcResource createResource(String contentType, String payload, String encoding) throws IOException
Throws: IOException
public void testFalseMetaUTF16() throws Exception
Content is UTF-8 encoded, but the META tag says it's UTF-16. PrescanMetadataSniffer overrides UTF-16 to UTF-8.
Throws: Exception
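The override exercised by this test can be illustrated with a simplified stand-in for a META prescan. This is only a sketch, not the library's PrescanMetadataSniffer: the class MetaPrescanSketch, the method sniff, the 1024-byte window, and the regular expression are all hypothetical. It relies on one observation: if a META tag can be found by scanning the bytes as ASCII, the content cannot really be UTF-16, so a declared UTF-16 is replaced with UTF-8.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the documented library.
final class MetaPrescanSketch {
    // Simplified pattern (an assumption); a real prescan follows the HTML parsing rules.
    private static final Pattern META_CHARSET = Pattern.compile(
            "<meta[^>]*charset\\s*=\\s*[\"']?([A-Za-z0-9_\\-]+)",
            Pattern.CASE_INSENSITIVE);

    /** Returns the upper-cased charset name declared in a META tag, or null if none is found. */
    static String sniff(byte[] content) {
        // Decode as ISO-8859-1 so each byte becomes exactly one char.
        int limit = Math.min(content.length, 1024);
        String head = new String(content, 0, limit, StandardCharsets.ISO_8859_1);
        Matcher m = META_CHARSET.matcher(head);
        if (!m.find()) {
            return null;
        }
        String declared = m.group(1).toUpperCase(Locale.ROOT);
        // The tag was readable byte by byte, so the declaration of UTF-16 must be false.
        if (declared.startsWith("UTF-16")) {
            return "UTF-8";
        }
        return declared;
    }

    public static void main(String[] args) {
        String html = "<html><head><meta charset=\"UTF-16\"></head><body>caf\u00e9</body></html>";
        System.out.println(sniff(html.getBytes(StandardCharsets.UTF_8)));
    }
}
```

When the same markup is actually encoded as UTF-16, every ASCII character is accompanied by a zero byte, so this byte-wise scan finds no META tag and returns null.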
public void testXCharsetName() throws Exception
Test of x- charset names.
Throws: Exception
public void testXUserDefined() throws Exception
Test of the x-user-defined charset name, which is mapped to windows-1252.
Throws: Exception
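The mapping described above can be sketched as a small name-normalization step. The class CharsetNameSketch and the method normalize are hypothetical names chosen for illustration, and where the library actually performs this mapping is not stated here. Only the x-user-defined to windows-1252 rule comes from the description; all other names are passed through in lower case.

```java
import java.nio.charset.Charset;
import java.util.Locale;

// Hypothetical helper for illustration; not part of the documented library.
final class CharsetNameSketch {
    /** Maps x-user-defined to windows-1252; other names are only trimmed and lower-cased. */
    static String normalize(String name) {
        String lower = name.trim().toLowerCase(Locale.ROOT);
        if (lower.equals("x-user-defined")) {
            return "windows-1252";
        }
        return lower;
    }

    public static void main(String[] args) {
        String mapped = normalize("X-User-Defined");
        // Charset.isSupported reports whether this JVM can decode the mapped name.
        System.out.println(mapped + " supported: " + Charset.isSupported(mapped));
    }
}
```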
public void testFalseMetaUTF8() throws Exception
Content is UTF-16 encoded, but the META tag says it's UTF-8. PrescanMetadataSniffer should fail because the content is UTF-16 encoded, and UniversalChardetSniffer should detect UTF-16.
Unfortunately, this test currently fails: Universal Chardet returns null for the sample content, even with some non-ASCII characters. Hopefully, real UTF-16 texts have a BOM, or contain enough non-ASCII characters for Universal Chardet to work.
Throws: Exception
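The hope that UTF-16 texts carry a BOM can be made concrete with a minimal byte-order-mark check. This sketch is independent of Universal Chardet and of UniversalChardetSniffer; the class Utf16BomSketch and the method detect are hypothetical names. It only inspects the first two bytes and returns null when no UTF-16 BOM is present.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of the documented library.
final class Utf16BomSketch {
    /** Returns "UTF-16BE" or "UTF-16LE" if the content starts with that BOM, else null. */
    static String detect(byte[] content) {
        if (content.length < 2) {
            return null;
        }
        int b0 = content[0] & 0xFF;
        int b1 = content[1] & 0xFF;
        if (b0 == 0xFE && b1 == 0xFF) {
            return "UTF-16BE";
        }
        // Limitation: FF FE followed by 00 00 is also the UTF-32LE BOM; not handled here.
        if (b0 == 0xFF && b1 == 0xFE) {
            return "UTF-16LE";
        }
        return null;
    }

    public static void main(String[] args) {
        // Java's "UTF-16" encoder writes a big-endian BOM; UTF_16LE writes none.
        System.out.println(detect("sample".getBytes(StandardCharsets.UTF_16)));
        System.out.println(detect("sample".getBytes(StandardCharsets.UTF_16LE)));
    }
}
```

Content encoded without a BOM, as in the second call, gives this check nothing to work with, which is the situation where a statistical detector is needed.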
public void testContentTypeHeaderSniffer() throws Exception
Test of ContentTypeHeaderSniffer.
Throws: Exception
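As an illustration of what sniffing a Content-Type header involves, the following sketch extracts the charset parameter from a header value. It is not the library's ContentTypeHeaderSniffer; the class ContentTypeCharsetSketch and the method charsetOf are hypothetical, and the parsing is deliberately simple (it splits on semicolons, so quoted values containing a semicolon are not handled).

```java
// Hypothetical helper for illustration; not part of the documented library.
final class ContentTypeCharsetSketch {
    /** Returns the charset parameter of a Content-Type value, or null if absent. */
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type itself; parameters start at index 1.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq < 0) {
                continue;
            }
            String key = param.substring(0, eq).trim();
            if (!key.equalsIgnoreCase("charset")) {
                continue;
            }
            String value = param.substring(eq + 1).trim();
            // Strip one pair of surrounding double quotes, if present.
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);
            }
            return value.isEmpty() ? null : value;
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=UTF-8"));
    }
}
```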
Copyright © 2005–2015 IIPC. All rights reserved.