public class TestWARCReader
extends org.archive.io.ArchiveReader
It behaves as an ArchiveReader reading from a WARC file that contains exactly one WARC record, at offset 0 (there is no "warcinfo" record).
The content of that record is customized through WARCRecordInfo (TestWARCRecordInfo offers commonly used default values and convenient factory methods).
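Typical test code:

```java
import org.archive.io.warc.WARCRecord;
import org.archive.io.warc.WARCRecordInfo;

// Inside a test method that declares "throws IOException":
String payload = "hogehogehogehogehoge";
WARCRecordInfo recinfo = TestWARCRecordInfo.createHttpResponse(payload);
TestWARCReader ar = new TestWARCReader(recinfo);
WARCRecord rec = ar.get(0); // the only record is at offset 0
```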
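buildRecordContent is documented only as building "a minimal WARC record byte stream". To show roughly what such a stream looks like, here is a self-contained sketch that serializes one record by hand in the WARC/1.0 layout: a version line, named header fields each terminated by CRLF, an empty line, the content block, then CRLF CRLF. It is a hypothetical illustration, not the library's code: the class `MinimalWarcRecord`, its methods, and the header values are made up, it does not use WARCRecordInfo, and it writes a plain `resource` record rather than the HTTP response that `TestWARCRecordInfo.createHttpResponse` presumably produces. Like the reader described above, the stream holds a single record at offset 0 with no preceding "warcinfo" record.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of the WARC/1.0 record layout; this is NOT the
// library's buildRecordContent and does not use WARCRecordInfo.
public class MinimalWarcRecord {
    // WARC header lines and the record trailer use CR LF line endings.
    static final String CRLF = "\r\n";

    /** Serializes one record: version line, header fields, empty line, block, CRLF CRLF. */
    public static byte[] buildRecordBytes(String type, String recordId, String date,
                                          String contentType, byte[] block) {
        StringBuilder header = new StringBuilder();
        header.append("WARC/1.0").append(CRLF);
        header.append("WARC-Type: ").append(type).append(CRLF);
        header.append("WARC-Record-ID: ").append(recordId).append(CRLF);
        header.append("WARC-Date: ").append(date).append(CRLF);
        header.append("Content-Type: ").append(contentType).append(CRLF);
        // Content-Length counts only the block bytes, not the header or the trailer.
        header.append("Content-Length: ").append(block.length).append(CRLF);
        header.append(CRLF); // empty line terminates the header block

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.writeBytes(header.toString().getBytes(StandardCharsets.US_ASCII)); // Java 11+
        out.writeBytes(block);
        out.writeBytes((CRLF + CRLF).getBytes(StandardCharsets.US_ASCII)); // record trailer
        return out.toByteArray();
    }

    /** The same single record exposed as an InputStream, starting at offset 0. */
    public static InputStream buildRecordStream(String type, String recordId, String date,
                                                String contentType, byte[] block) {
        return new ByteArrayInputStream(buildRecordBytes(type, recordId, date, contentType, block));
    }
}
```

For example, `buildRecordBytes("resource", "<urn:uuid:00000000-0000-0000-0000-000000000000>", "2015-01-01T00:00:00Z", "text/plain", payloadBytes)` uses a placeholder record ID and date; a real record would carry a fresh UUID and the actual capture time.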
| Modifier and Type | Field and Description |
|---|---|
| static String | CRLF |
Inherited fields: ABSOLUTE_OFFSET_KEY, CDX, CDX_FILE, CDX_LINE_BUFFER_SIZE, DATE_FIELD_KEY, DEFAULT_DIGEST_METHOD, DOT_COMPRESSED_FILE_EXTENSION, DUMP, GZIP_DUMP, HEADER, INVALID_SUFFIX, LENGTH_FIELD_KEY, MIMETYPE_FIELD_KEY, NOHEAD, OCCUPIED_SUFFIX, READER_IDENTIFIER_FIELD_KEY, RECORD_IDENTIFIER_FIELD_KEY, SINGLE_SPACE, TYPE_FIELD_KEY, URL_FIELD_KEY, VERSION_FIELD_KEY

| Constructor and Description |
|---|
| TestWARCReader(InputStream is) |
| TestWARCReader(org.archive.io.warc.WARCRecordInfo recinfo) |
| Modifier and Type | Method and Description |
|---|---|
| static InputStream | buildRecordContent(org.archive.io.warc.WARCRecordInfo recinfo): Builds a minimal WARC record byte stream. |
| protected org.archive.io.warc.WARCRecord | createArchiveRecord(InputStream is, long offset) |
| void | dump(boolean compress) |
| org.archive.io.warc.WARCRecord | get(long offset) |
| org.archive.io.ArchiveReader | getDeleteFileOnCloseReader(File f) |
| String | getDotFileExtension() |
| String | getFileExtension() |
| protected void | gotoEOR(org.archive.io.ArchiveRecord record) |
Methods inherited from class org.archive.io.ArchiveReader: cdxOutput, cleanupCurrentRecord, close, currentRecord, get, getCurrentRecord, getFileName, getIn, getInputStream, getLogger, getOptions, getReaderIdentifier, getStrippedFileName, getStrippedFileName, getTrueOrFalse, getVersion, initialize, isCompressed, isDigest, isStrict, isValid, iterator, logStdErr, output, outputRecord, outputRecord, positionForRecord, setCompressed, setDigest, setIn, setReaderIdentifier, setStrict, setVersion, stripExtension, validate, validate

public static final String CRLF
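CRLF is presumably the two-character terminator "\r\n" that WARC uses for header lines and for the two-line record trailer. Reading a single record back reverses that layout: read CRLF-terminated header lines up to the empty line, read exactly Content-Length bytes of block, then consume the trailing CRLF CRLF so the stream sits at the end of the record. The sketch below is hypothetical (the class `MinimalWarcParser` and its methods are invented for illustration) and deliberately simplified: uncompressed input, ASCII header bytes, case-sensitive field names, no header continuation lines. It makes no claim about how createArchiveRecord or gotoEOR are actually implemented.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified reader for one uncompressed WARC record.
public class MinimalWarcParser {

    /** Reads one line terminated by CR LF and returns it without the terminator. */
    static String readLine(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int prev = -1;
        while (true) {
            int b = in.read();
            if (b == -1) throw new EOFException("stream ended before CRLF");
            if (b == '\n') {
                if (prev != '\r') throw new IOException("line not terminated by CRLF");
                sb.setLength(sb.length() - 1); // drop the CR appended on the previous pass
                return sb.toString();
            }
            sb.append((char) b); // assumes ASCII header bytes
            prev = b;
        }
    }

    /** Parses the header block; the version line is stored under the key "version". */
    public static Map<String, String> readHeader(InputStream in) throws IOException {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("version", readLine(in));
        for (String line = readLine(in); !line.isEmpty(); line = readLine(in)) {
            int colon = line.indexOf(':');
            if (colon < 0) throw new IOException("malformed header line: " + line);
            // Field names are matched case-sensitively here, a simplification.
            fields.put(line.substring(0, colon).trim(), line.substring(colon + 1).trim());
        }
        return fields;
    }

    /** Reads Content-Length block bytes, then consumes the CRLF CRLF record trailer. */
    public static byte[] readBlockToEndOfRecord(InputStream in, Map<String, String> fields)
            throws IOException {
        String len = fields.get("Content-Length");
        if (len == null) throw new IOException("missing Content-Length");
        int length = Integer.parseInt(len);
        byte[] block = new byte[length];
        if (in.readNBytes(block, 0, length) != length) { // Java 9+
            throw new EOFException("truncated content block");
        }
        // The record ends with two empty CRLF-terminated lines.
        if (!readLine(in).isEmpty() || !readLine(in).isEmpty()) {
            throw new IOException("missing CRLF CRLF record trailer");
        }
        return block;
    }
}
```

Calling readHeader and then readBlockToEndOfRecord on a one-record stream leaves the stream exhausted, since nothing follows the single record.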
public TestWARCReader(InputStream is)
public TestWARCReader(org.archive.io.warc.WARCRecordInfo recinfo) throws IOException
Throws: IOException

public org.archive.io.warc.WARCRecord get(long offset) throws IOException
Overrides: get in class org.archive.io.ArchiveReader
Throws: IOException

protected org.archive.io.warc.WARCRecord createArchiveRecord(InputStream is, long offset) throws IOException
Overrides: createArchiveRecord in class org.archive.io.ArchiveReader
Throws: IOException

protected void gotoEOR(org.archive.io.ArchiveRecord record) throws IOException
Overrides: gotoEOR in class org.archive.io.ArchiveReader
Throws: IOException

public String getFileExtension()
Overrides: getFileExtension in class org.archive.io.ArchiveReader

public String getDotFileExtension()
Overrides: getDotFileExtension in class org.archive.io.ArchiveReader

public void dump(boolean compress) throws IOException, ParseException
Overrides: dump in class org.archive.io.ArchiveReader
Throws: IOException, ParseException

public org.archive.io.ArchiveReader getDeleteFileOnCloseReader(File f)
Overrides: getDeleteFileOnCloseReader in class org.archive.io.ArchiveReader

public static InputStream buildRecordContent(org.archive.io.warc.WARCRecordInfo recinfo) throws IOException
Builds a minimal WARC record byte stream.
Parameters: recinfo - WARCRecordInfo with record metadata and content
Throws: IOException

Copyright © 2005–2015 IIPC. All rights reserved.