public class TestWARCReader
extends org.archive.io.ArchiveReader
An ArchiveReader that behaves as if it were reading a WARC file containing exactly one WARC record at offset 0 (there is no "warcinfo" record).
The content of the record is customized through WARCRecordInfo.
(TestWARCRecordInfo offers commonly used default values and convenient factory methods.)
Typical test code would be:
    String payload = "hogehogehogehogehoge";
    WARCRecordInfo recinfo = TestWARCRecordInfo.createHttpResponse(payload);
    TestWARCReader ar = new TestWARCReader(recinfo);
    WARCRecord rec = (WARCRecord)ar.get(0);
Modifier and Type | Field and Description |
---|---|
static String | CRLF |
ABSOLUTE_OFFSET_KEY, CDX, CDX_FILE, CDX_LINE_BUFFER_SIZE, DATE_FIELD_KEY, DEFAULT_DIGEST_METHOD, DOT_COMPRESSED_FILE_EXTENSION, DUMP, GZIP_DUMP, HEADER, INVALID_SUFFIX, LENGTH_FIELD_KEY, MIMETYPE_FIELD_KEY, NOHEAD, OCCUPIED_SUFFIX, READER_IDENTIFIER_FIELD_KEY, RECORD_IDENTIFIER_FIELD_KEY, SINGLE_SPACE, TYPE_FIELD_KEY, URL_FIELD_KEY, VERSION_FIELD_KEY
Constructor and Description |
---|
TestWARCReader(InputStream is) |
TestWARCReader(org.archive.io.warc.WARCRecordInfo recinfo) |
Modifier and Type | Method and Description |
---|---|
static InputStream | buildRecordContent(org.archive.io.warc.WARCRecordInfo recinfo): builds a minimal WARC record byte stream. |
protected org.archive.io.warc.WARCRecord | createArchiveRecord(InputStream is, long offset) |
void | dump(boolean compress) |
org.archive.io.warc.WARCRecord | get(long offset) |
org.archive.io.ArchiveReader | getDeleteFileOnCloseReader(File f) |
String | getDotFileExtension() |
String | getFileExtension() |
protected void | gotoEOR(org.archive.io.ArchiveRecord record) |
cdxOutput, cleanupCurrentRecord, close, currentRecord, get, getCurrentRecord, getFileName, getIn, getInputStream, getLogger, getOptions, getReaderIdentifier, getStrippedFileName, getStrippedFileName, getTrueOrFalse, getVersion, initialize, isCompressed, isDigest, isStrict, isValid, iterator, logStdErr, output, outputRecord, outputRecord, positionForRecord, setCompressed, setDigest, setIn, setReaderIdentifier, setStrict, setVersion, stripExtension, validate, validate
public static final String CRLF
public TestWARCReader(InputStream is)
public TestWARCReader(org.archive.io.warc.WARCRecordInfo recinfo) throws IOException
Throws: IOException
public org.archive.io.warc.WARCRecord get(long offset) throws IOException
get in class org.archive.io.ArchiveReader
Throws: IOException
protected org.archive.io.warc.WARCRecord createArchiveRecord(InputStream is, long offset) throws IOException
createArchiveRecord in class org.archive.io.ArchiveReader
Throws: IOException
protected void gotoEOR(org.archive.io.ArchiveRecord record) throws IOException
gotoEOR in class org.archive.io.ArchiveReader
Throws: IOException
public String getFileExtension()
getFileExtension in class org.archive.io.ArchiveReader
public String getDotFileExtension()
getDotFileExtension in class org.archive.io.ArchiveReader
public void dump(boolean compress) throws IOException, ParseException
dump in class org.archive.io.ArchiveReader
Throws: IOException, ParseException
public org.archive.io.ArchiveReader getDeleteFileOnCloseReader(File f)
getDeleteFileOnCloseReader in class org.archive.io.ArchiveReader
public static InputStream buildRecordContent(org.archive.io.warc.WARCRecordInfo recinfo) throws IOException
Parameters: recinfo - WARCRecordInfo with record metadata and content
Throws: IOException
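To make "minimal WARC record byte stream" concrete, here is a rough sketch of how a single WARC/1.0 record can be serialized using only the JDK. It is not the implementation of buildRecordContent: the class name MinimalWarcRecordSketch, the build method, its parameters, and the fixed record ID are all hypothetical and chosen for illustration. The layout (version line, named fields, blank line, content block, two CRLFs) follows the WARC file format.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration only; not part of TestWARCReader. */
public class MinimalWarcRecordSketch {
    static final String CRLF = "\r\n";

    /**
     * Serializes one WARC/1.0 record: version line, named header fields,
     * an empty line, the content block, and a two-CRLF record terminator.
     */
    public static byte[] build(String type, String targetUri, String date,
                               String contentType, byte[] content) {
        StringBuilder header = new StringBuilder();
        header.append("WARC/1.0").append(CRLF);
        header.append("WARC-Type: ").append(type).append(CRLF);
        header.append("WARC-Target-URI: ").append(targetUri).append(CRLF);
        header.append("WARC-Date: ").append(date).append(CRLF);
        // Placeholder ID (assumption); a real writer would generate a unique URI.
        header.append("WARC-Record-ID: <urn:uuid:00000000-0000-0000-0000-000000000000>")
              .append(CRLF);
        header.append("Content-Type: ").append(contentType).append(CRLF);
        // Content-Length counts the bytes of the content block only.
        header.append("Content-Length: ").append(content.length).append(CRLF);
        header.append(CRLF); // empty line ends the header

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] headerBytes = header.toString().getBytes(StandardCharsets.UTF_8);
        out.write(headerBytes, 0, headerBytes.length);
        out.write(content, 0, content.length);
        byte[] trailer = (CRLF + CRLF).getBytes(StandardCharsets.UTF_8);
        out.write(trailer, 0, trailer.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] payload = "hogehogehogehogehoge".getBytes(StandardCharsets.UTF_8);
        byte[] record = build("resource", "http://example.com/",
                "2015-01-01T00:00:00Z", "text/plain", payload);
        System.out.print(new String(record, StandardCharsets.UTF_8));
    }
}
```

Because the stream holds a single record and no "warcinfo" record, that record starts at byte 0, which is why the typical test code reads it with get(0).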
Copyright © 2005–2015 IIPC. All rights reserved.