public class TestARCReader
extends org.archive.io.arc.ARCReader
Works as an ArchiveReader that reads from an ARC file containing just one ARC record at offset 0 (no version_block line).
Record content is customized through WARCRecordInfo.
(TestWARCRecordInfo offers commonly used default values and convenient factory
methods.)
See Also: TestWARCReader

Fields inherited from interface org.archive.io.arc.ARCConstants:
ACTUAL_LENGTH_KEY, ALEXA_DAT_MIME, ARC_FILE_EXTENSION, ARC_GZIP_EXTRA_FIELD, ARC_MAGIC_NUMBER, ARC_META_CHARSET, CARRIAGE_RETURN_ORD, CHECKSUM_FIELD_KEY, CHECKSUM_HEADER_FIELD_KEY, CODE_HEADER_FIELD_KEY, COMPRESSED_ARC_FILE_EXTENSION, DATE_STRING_KEY, DECLARED_LENGTH_KEY, DEFAULT_ENCODING, DEFAULT_GZIP_HEADER_LENGTH, DEFAULT_MAX_ARC_FILE_SIZE, DEFAULT_MIME, DELIMITER, DNS_MIME, DOT_ARC_FILE_EXTENSION, DOT_COMPRESSED_ARC_FILE_EXTENSION, DOT_COMPRESSED_FILE_EXTENSION, FIELD_COUNT, FILEDESC_SCHEME, FILENAME_FIELD_KEY, FILENAME_HEADER_FIELD_KEY, GZIP_HEADER_BEGIN, HEADER_FIELD_SEPARATOR, HEADER_LENGTH, IP_HEADER_FIELD_KEY, IP_KEY, LEADING_NEWLINES_KEY, LINE_SEPARATOR, LOCATION_HEADER_FIELD_KEY, MAX_HEADER_LINE_LENGTH, MAX_META_LENGTH, MAX_METADATA_LINE_LENGTH, MIME_KEY, MINIMUM_RECORD_LENGTH, NEW_LINE_ORD, OFFSET_FIELD_KEY, OFFSET_HEADER_FIELD_KEY, REQUIRED_VERSION_1_HEADER_FIELDS, STATUSCODE_FIELD_KEY, TOKENIZED_PREFIX, TRAILING_CRUFT_KEY, TRAILING_NEWLINES_KEY, URL_KEY

Fields inherited from interface org.archive.io.ArchiveFileConstants:
ABSOLUTE_OFFSET_KEY, CDX, CDX_FILE, CDX_LINE_BUFFER_SIZE, CRLF, DATE_FIELD_KEY, DEFAULT_DIGEST_METHOD, DUMP, GZIP_DUMP, HEADER, INVALID_SUFFIX, LENGTH_FIELD_KEY, MIMETYPE_FIELD_KEY, NOHEAD, OCCUPIED_SUFFIX, READER_IDENTIFIER_FIELD_KEY, RECORD_IDENTIFIER_FIELD_KEY, SINGLE_SPACE, TYPE_FIELD_KEY, URL_FIELD_KEY, VERSION_FIELD_KEY

| Constructor and Description |
|---|
| TestARCReader(InputStream is) |
| TestARCReader(org.archive.io.warc.WARCRecordInfo recinfo) |

| Modifier and Type | Method and Description |
|---|---|
| InputStream | buildRecordContent(org.archive.io.warc.WARCRecordInfo recinfo) Builds a minimally conforming ARC record byte stream. |
| org.archive.io.arc.ARCRecord | get(long offset) |
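
As an illustration of what a "minimally-conforming ARC record byte stream" could look like, here is a minimal sketch that follows the ARC version 1 record layout: one space-separated header line (URL, IP address, 14-digit archive date, content type, length) followed by the record body. It is not the actual implementation of buildRecordContent. The class `ArcRecordSketch`, its methods, and the sample values are hypothetical names introduced for this example only, and the trailing newline after the body is an assumption about how records are usually terminated.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration; not part of the library. */
class ArcRecordSketch {

    /**
     * Builds the bytes of a single ARC v1 record: header line, newline, body.
     * The content type in the header should be that of the payload.
     */
    static byte[] buildRecord(String url, String ip, String date14,
                              String contentType, byte[] body) {
        // ARC v1 URL record: url SP ip SP date SP content-type SP length NL
        String header = String.join(" ", url, ip, date14, contentType,
                Integer.toString(body.length)) + "\n";
        byte[] headerBytes = header.getBytes(StandardCharsets.UTF_8);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(headerBytes, 0, headerBytes.length);
        out.write(body, 0, body.length);
        // Assumption: a single newline terminates the record.
        out.write('\n');
        return out.toByteArray();
    }

    /** Wraps the record bytes in a stream, e.g. to feed an InputStream-based reader. */
    static InputStream toStream(byte[] record) {
        return new ByteArrayInputStream(record);
    }

    public static void main(String[] args) {
        byte[] body = "HTTP/1.0 200 OK\r\nContent-Type: text/plain\r\n\r\nhello"
                .getBytes(StandardCharsets.UTF_8);
        byte[] record = buildRecord("http://example.com/", "127.0.0.1",
                "20150101000000", "text/plain", body);
        System.out.write(record, 0, record.length);
        System.out.flush();
    }
}
```

Because such a stream starts directly with a record and has no version_block line, a reader over it would find its only record at offset 0, which matches the class description above.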

Methods inherited from class org.archive.io.arc.ARCReader:
createArchiveRecord, createCDXIndexFile, dump, getDeleteFileOnCloseReader, getDotFileExtension, getFileExtension, getVersion, gotoEOR, isAlignedOnFirstRecord, isParseHttpHeaders, main, output, output, outputRecord, setAlignedOnFirstRecord, setParseHttpHeaders

Methods inherited from class org.archive.io.ArchiveReader:
cdxOutput, cleanupCurrentRecord, close, currentRecord, get, getCurrentRecord, getFileName, getIn, getInputStream, getLogger, getOptions, getReaderIdentifier, getStrippedFileName, getStrippedFileName, getTrueOrFalse, initialize, isCompressed, isDigest, isStrict, isValid, iterator, logStdErr, outputRecord, positionForRecord, setCompressed, setDigest, setIn, setReaderIdentifier, setStrict, setVersion, stripExtension, validate, validate

public TestARCReader(InputStream is)

public TestARCReader(org.archive.io.warc.WARCRecordInfo recinfo)
throws IOException
Throws: IOException

public org.archive.io.arc.ARCRecord get(long offset)
throws IOException
Overrides: get in class org.archive.io.ArchiveReader
Throws: IOException

public InputStream buildRecordContent(org.archive.io.warc.WARCRecordInfo recinfo) throws IOException
Parameters: recinfo - WARCRecordInfo with record metadata and content.
The type parameter does not matter; be sure to set contentType to that of
the payload.
Throws: IOException

Copyright © 2005–2015 IIPC. All rights reserved.