public class TestARCReader
extends org.archive.io.arc.ARCReader
An ARCReader for tests. It works as an ArchiveReader reading from an ARC file that contains just one ARC record at offset 0 (no version_block line).
Record content is customized through WARCRecordInfo.
(TestWARCRecordInfo offers commonly used default values and convenient factory methods.)
See Also: TestWARCReader
ACTUAL_LENGTH_KEY, ALEXA_DAT_MIME, ARC_FILE_EXTENSION, ARC_GZIP_EXTRA_FIELD, ARC_MAGIC_NUMBER, ARC_META_CHARSET, CARRIAGE_RETURN_ORD, CHECKSUM_FIELD_KEY, CHECKSUM_HEADER_FIELD_KEY, CODE_HEADER_FIELD_KEY, COMPRESSED_ARC_FILE_EXTENSION, DATE_STRING_KEY, DECLARED_LENGTH_KEY, DEFAULT_ENCODING, DEFAULT_GZIP_HEADER_LENGTH, DEFAULT_MAX_ARC_FILE_SIZE, DEFAULT_MIME, DELIMITER, DNS_MIME, DOT_ARC_FILE_EXTENSION, DOT_COMPRESSED_ARC_FILE_EXTENSION, DOT_COMPRESSED_FILE_EXTENSION, FIELD_COUNT, FILEDESC_SCHEME, FILENAME_FIELD_KEY, FILENAME_HEADER_FIELD_KEY, GZIP_HEADER_BEGIN, HEADER_FIELD_SEPARATOR, HEADER_LENGTH, IP_HEADER_FIELD_KEY, IP_KEY, LEADING_NEWLINES_KEY, LINE_SEPARATOR, LOCATION_HEADER_FIELD_KEY, MAX_HEADER_LINE_LENGTH, MAX_META_LENGTH, MAX_METADATA_LINE_LENGTH, MIME_KEY, MINIMUM_RECORD_LENGTH, NEW_LINE_ORD, OFFSET_FIELD_KEY, OFFSET_HEADER_FIELD_KEY, REQUIRED_VERSION_1_HEADER_FIELDS, STATUSCODE_FIELD_KEY, TOKENIZED_PREFIX, TRAILING_CRUFT_KEY, TRAILING_NEWLINES_KEY, URL_KEY
ABSOLUTE_OFFSET_KEY, CDX, CDX_FILE, CDX_LINE_BUFFER_SIZE, CRLF, DATE_FIELD_KEY, DEFAULT_DIGEST_METHOD, DUMP, GZIP_DUMP, HEADER, INVALID_SUFFIX, LENGTH_FIELD_KEY, MIMETYPE_FIELD_KEY, NOHEAD, OCCUPIED_SUFFIX, READER_IDENTIFIER_FIELD_KEY, RECORD_IDENTIFIER_FIELD_KEY, SINGLE_SPACE, TYPE_FIELD_KEY, URL_FIELD_KEY, VERSION_FIELD_KEY
Constructor and Description |
---|
TestARCReader(InputStream is) |
TestARCReader(org.archive.io.warc.WARCRecordInfo recinfo) |
Modifier and Type | Method and Description |
---|---|
InputStream | buildRecordContent(org.archive.io.warc.WARCRecordInfo recinfo) Builds a minimally conforming ARC record byte stream. |
org.archive.io.arc.ARCRecord | get(long offset) |
createArchiveRecord, createCDXIndexFile, dump, getDeleteFileOnCloseReader, getDotFileExtension, getFileExtension, getVersion, gotoEOR, isAlignedOnFirstRecord, isParseHttpHeaders, main, output, output, outputRecord, setAlignedOnFirstRecord, setParseHttpHeaders
cdxOutput, cleanupCurrentRecord, close, currentRecord, get, getCurrentRecord, getFileName, getIn, getInputStream, getLogger, getOptions, getReaderIdentifier, getStrippedFileName, getStrippedFileName, getTrueOrFalse, initialize, isCompressed, isDigest, isStrict, isValid, iterator, logStdErr, outputRecord, positionForRecord, setCompressed, setDigest, setIn, setReaderIdentifier, setStrict, setVersion, stripExtension, validate, validate
public TestARCReader(InputStream is)
public TestARCReader(org.archive.io.warc.WARCRecordInfo recinfo) throws IOException
Throws: IOException
public org.archive.io.arc.ARCRecord get(long offset) throws IOException
Overrides: get in class org.archive.io.ArchiveReader
Throws: IOException
public InputStream buildRecordContent(org.archive.io.warc.WARCRecordInfo recinfo) throws IOException
Parameters: recinfo - WARCRecordInfo with record metadata and content. The type parameter does not matter; be sure to set contentType to that of the payload.
Throws: IOException
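To illustrate what a "minimally conforming ARC record byte stream" can look like, here is a rough, self-contained sketch. It does not use TestARCReader or WARCRecordInfo. The class ArcRecordSketch and its buildRecord method are hypothetical names introduced only for this example, and the layout assumes the ARC v1 record header (URL, IP address, 14-digit date, content type, length, separated by single spaces and ended by a newline). The actual output of buildRecordContent may differ in detail.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration; not part of TestARCReader. */
public class ArcRecordSketch {

    /**
     * Builds one ARC v1 style record with no preceding version block:
     * a space-separated header line followed directly by the body bytes.
     * The length field is the byte length of the body.
     */
    public static InputStream buildRecord(String url, String ip, String date14,
            String contentType, byte[] body) throws IOException {
        String header = url + " " + ip + " " + date14 + " "
                + contentType + " " + body.length + "\n";
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(header.getBytes(StandardCharsets.UTF_8));
        out.write(body);
        return new ByteArrayInputStream(out.toByteArray());
    }

    public static void main(String[] args) throws IOException {
        byte[] body = "hello".getBytes(StandardCharsets.UTF_8);
        // The content type describes the payload, mirroring the advice
        // to set contentType to that of the payload.
        InputStream in = buildRecord("http://example.com/", "127.0.0.1",
                "20150101000000", "text/plain", body);
        System.out.write(in.readAllBytes()); // readAllBytes requires Java 9+
        System.out.flush();
    }
}
```

Since the stream holds a single record starting at its first byte, a reader over it would be asked for the record at offset 0, which matches the single-record setup described for this class.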
Copyright © 2005–2015 IIPC. All rights reserved.