public class DupeTimestampBestStatusFilter extends WrappedProcessor
Groups CDX lines by timestamp prefix and filters out all but one CDX line per group.
The timestamp prefix is specified as a number of digits, counted from the left. If two CDX lines have timestamps whose prefixes are identical, they are considered to be in the same group.
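The grouping rule above can be sketched as a plain string comparison. This is a minimal illustration, not the library's code: the class and method names are hypothetical, and treating a timestamp shorter than the prefix length as its own prefix is an assumption.

```java
// Hypothetical helper illustrating the documented grouping rule.
class TimestampPrefixGroups {

    /** Returns the leftmost {@code digits} characters of a timestamp.
     *  Assumption: a timestamp shorter than {@code digits} is used whole. */
    public static String prefix(String timestamp, int digits) {
        return timestamp.length() <= digits
                ? timestamp
                : timestamp.substring(0, digits);
    }

    /** Two timestamps are in the same group when their prefixes are identical. */
    public static boolean sameGroup(String a, String b, int digits) {
        return prefix(a, digits).equals(prefix(b, digits));
    }

    public static void main(String[] args) {
        // With 8 digits (yyyyMMdd), captures from the same day share a group.
        System.out.println(sameGroup("20170102030405", "20170102235959", 8)); // true
        System.out.println(sameGroup("20170102030405", "20170103000000", 8)); // false
    }
}
```

A smaller digit count gives coarser groups: 4 digits would group by year, 6 by month.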
Within each group, it picks the first CDX line with the best (i.e. smallest) statuscode field.
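The selection rule can be illustrated with a small batch version. This sketch only mirrors the documented outcome and is not the filter's actual streaming implementation. The Capture type is hypothetical, the input is assumed to be sorted by timestamp so that groups are contiguous, and statuscode is assumed to be numeric.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of "first line with the smallest statuscode per group".
class BestStatusPerGroup {

    /** Minimal stand-in for a CDX line: only the two fields the rule uses. */
    static final class Capture {
        final String timestamp;
        final int status;

        Capture(String timestamp, int status) {
            this.timestamp = timestamp;
            this.status = status;
        }
    }

    /** Keeps one capture per timestamp-prefix group.
     *  Assumption: {@code sorted} is ordered by timestamp. */
    public static List<Capture> collapse(List<Capture> sorted, int digits) {
        List<Capture> out = new ArrayList<>();
        String groupKey = null;
        Capture best = null;
        for (Capture c : sorted) {
            String key = c.timestamp.length() <= digits
                    ? c.timestamp
                    : c.timestamp.substring(0, digits);
            if (!key.equals(groupKey)) {
                // A new group starts: emit the winner of the previous one.
                if (best != null) {
                    out.add(best);
                }
                groupKey = key;
                best = c;
            } else if (c.status < best.status) {
                // Strict '<' keeps the first line when status codes tie.
                best = c;
            }
        }
        if (best != null) {
            out.add(best);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Capture> in = new ArrayList<>();
        in.add(new Capture("20170101000000", 404));
        in.add(new Capture("20170101120000", 200));
        in.add(new Capture("20170101130000", 200));
        in.add(new Capture("20170102000000", 302));
        for (Capture c : collapse(in, 8)) {
            System.out.println(c.timestamp + " " + c.status);
        }
    }
}
```

With an 8-digit prefix, the three captures from 20170101 collapse to the first one with status 200, and the single capture from 20170102 is kept as is.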
CDX lines whose filename starts with any of the prefixes specified in noCollapsePrefix are written out regardless of their timestamp or statuscode, in addition to the one picked for the group.
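The prefix test described above amounts to a startsWith check against each configured prefix. The following is a minimal sketch with hypothetical names and made-up file names. Returning false for a null filename or a null prefix array is an assumption, not documented behavior.

```java
// Hypothetical illustration of the noCollapsePrefix check.
class NoCollapseCheck {

    /** Returns true if {@code filename} starts with any of {@code prefixes}.
     *  Assumption: null inputs mean "no match". */
    public static boolean startsWithAny(String filename, String[] prefixes) {
        if (filename == null || prefixes == null) {
            return false;
        }
        for (String p : prefixes) {
            if (filename.startsWith(p)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] noCollapsePrefix = {"LIVE-", "PRIORITY-"}; // example values only
        System.out.println(startsWithAny("LIVE-20170102.warc.gz", noCollapsePrefix));   // true
        System.out.println(startsWithAny("CRAWL-20170102.warc.gz", noCollapsePrefix));  // false
    }
}
```

A line that matches would be written out even when another line in its group has already been picked.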
Instantiated by CDXServer as part of the CDX line processing pipeline.
Modifier and Type | Field and Description |
---|---|
protected int | bestHttpCode |
protected String | lastTimestamp |
protected String[] | noCollapsePrefix |
protected int | timestampDedupLength |
Fields inherited from class WrappedProcessor: inner
Constructor and Description |
---|
DupeTimestampBestStatusFilter(BaseProcessor output, int timestampDedupLength, String[] noCollapsePrefix) |
Modifier and Type | Method and Description |
---|---|
protected boolean | include(org.archive.format.cdx.CDXLine line) |
protected boolean | isBlocked(org.archive.format.cdx.CDXLine line) |
protected boolean | noCollapse(org.archive.format.cdx.CDXLine line) |
protected boolean | passThrough(org.archive.format.cdx.CDXLine line) Return true if line is to be passed through, as specified by noCollapsePrefix. |
int | writeLine(org.archive.format.cdx.CDXLine line) Process line. |
Methods inherited from class WrappedProcessor: begin, end, modifyOutputFormat, trackLine, writeResumeKey
protected String lastTimestamp
protected int bestHttpCode
protected int timestampDedupLength
protected String[] noCollapsePrefix
public DupeTimestampBestStatusFilter(BaseProcessor output, int timestampDedupLength, String[] noCollapsePrefix)
protected final boolean passThrough(org.archive.format.cdx.CDXLine line)
Returns true if line is to be passed through, as specified by noCollapsePrefix. Soft-blocked captures are also passed through.
Parameters: line - CDX line
protected final boolean isBlocked(org.archive.format.cdx.CDXLine line)
protected final boolean noCollapse(org.archive.format.cdx.CDXLine line)
public int writeLine(org.archive.format.cdx.CDXLine line)
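Description copied from interface: BaseProcessor
Process line.
Specified by: writeLine in interface BaseProcessor
Overrides: writeLine in class WrappedProcessor
Parameters: line - CDXLine
Returns: a value indicating whether line is sent to output; 0 if it is not
protected boolean include(org.archive.format.cdx.CDXLine line)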
Copyright © 2005–2017 IIPC. All rights reserved.