Class | Description |
---|---|
BeanShellFilter | |
ClosestResultTrackingFilter | Class which observes CaptureSearchResults, keeping track of the closest result found to a given date. |
CompositeExclusionFilter | SearchResultFilter that wraps multiple SearchResultFilters: if all filters return INCLUDE, the result is included, but the first filter to return ABORT or EXCLUDE short-circuits the rest. |
CompositeFilter | Simple composite ObjectFilter, which includes a result only if all of its component filters include it. |
ConditionalGetAnnotationFilter | WARC files allow two forms of deduplication. |
CounterFilter | SearchResultFilter which INCLUDEs all checked records, but keeps track of how many were seen during processing. |
DateEmbargoFilter | |
DateRangeFilter | SearchResultFilter that excludes records falling outside the range between a start date and an end date. |
DuplicateHashFilter | |
DuplicateRecordFilter | ObjectFilter which omits exact duplicate URL+date records from a stream of CaptureSearchResult. |
DuplicateTimestampFilter | |
EndDateFilter | SearchResultFilter which includes all records until one is found beyond the end date, at which point it aborts processing. |
ExclusionFilter | |
FilePrefixDateEmbargoFilter | Blocks only files matching a given prefix, and only if they are newer than a given embargo period. |
FilePrefixFilter | |
FileRegexFilter | |
GuardRailFilter | SearchResultFilter which aborts processing when too many records have been inspected. |
HostMatchFilter | SearchResultFilter which includes only records whose original host matches. |
HttpCodeFilter | ObjectFilter which allows including or excluding results based on the HTTP response code. |
MimeTypeFilter | SearchResultFilter which includes only records matching one or more supplied MIME types. |
OracleAnnotationFilter | SearchResult filter class which contacts an access-control Oracle, using information from the public comment field to annotate SearchResult objects. |
SchemeMatchFilter | ObjectFilter which omits CaptureSearchResult objects if their scheme does not match the specified scheme. |
SelfRedirectFilter | SearchResultFilter which INCLUDEs all records, unless they redirect to themselves, via whatever URL purification schemes are in use. |
StartDateFilter | SearchResultFilter which includes all records until one is found before the start date, at which point it aborts processing. |
UrlMatchFilter | SearchResultFilter which includes only records whose URL matches, and aborts as soon as a URL does not match. |
UrlPrefixMatchFilter | SearchResultFilter which includes any URL which begins with a given prefix, and aborts processing when any URL does not match the prefix. |
UserInfoInAuthorityFilter | Class which omits CaptureSearchResults that have an '@' in the original URL field, if that '@' is after the scheme and before the first '/' or ':'. |
WARCRevisitAnnotationFilter | Filter class that observes a stream of SearchResults, tracking for each complete record a mapping of that record's Digest to: Arc/Warc filename, Arc/Warc offset, HTTP response, MIME type, and redirect URL. If subsequent SearchResults are missing these fields ("-") and the Digest field is in the map, then the missing fields are replaced with the values from the previously seen record with the same digest, and an additional annotation field is added. |
WindowEndFilter<T> | SearchResultFilter that includes the first N records seen. |
WindowStartFilter<T> | SearchResultFilter that omits the first N records seen. |
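
The INCLUDE / EXCLUDE / ABORT behaviour described in the table can be illustrated with a small, self-contained sketch. It does not use the package's real classes: `Verdict`, `Filter`, `allOf`, `endDate`, `skipFirst`, and `apply` are hypothetical names invented for this example, and the 8-digit string timestamps are an assumption made for simplicity. The sketch combines a filter that omits the first N records with one that aborts once a record falls beyond an end date, using a composite that short-circuits on the first non-INCLUDE answer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT the library's types.
class FilterChainSketch {
    enum Verdict { INCLUDE, EXCLUDE, ABORT }

    interface Filter<T> {
        Verdict check(T item);
    }

    // Composite: includes only if every part includes; the first part that
    // answers EXCLUDE or ABORT short-circuits the remaining parts.
    static <T> Filter<T> allOf(List<Filter<T>> parts) {
        return item -> {
            for (Filter<T> part : parts) {
                Verdict v = part.check(item);
                if (v != Verdict.INCLUDE) {
                    return v;
                }
            }
            return Verdict.INCLUDE;
        };
    }

    // End-date style filter: includes records until one sorts after the end
    // date, then aborts. Assumes fixed-width, lexicographically sortable timestamps.
    static Filter<String> endDate(String end) {
        return ts -> ts.compareTo(end) > 0 ? Verdict.ABORT : Verdict.INCLUDE;
    }

    // Window-start style filter: omits the first n records it sees (stateful).
    static <T> Filter<T> skipFirst(int n) {
        int[] seen = {0};
        return item -> seen[0]++ < n ? Verdict.EXCLUDE : Verdict.INCLUDE;
    }

    // Runs a filter over a stream, stopping entirely on ABORT.
    static <T> List<T> apply(Filter<T> filter, List<T> stream) {
        List<T> kept = new ArrayList<>();
        for (T item : stream) {
            Verdict v = filter.check(item);
            if (v == Verdict.ABORT) {
                break;
            }
            if (v == Verdict.INCLUDE) {
                kept.add(item);
            }
        }
        return kept;
    }

    static List<String> demo() {
        List<String> timestamps = Arrays.asList(
                "20050101", "20060101", "20070101", "20080101", "20090101");
        List<Filter<String>> parts = new ArrayList<>();
        parts.add(skipFirst(1));
        parts.add(endDate("20071231"));
        return apply(allOf(parts), timestamps);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [20060101, 20070101]
    }
}
```

Because the composite short-circuits, a stateful filter placed later in the chain never sees records that an earlier filter excluded, so the order of the parts can change the result.
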
Copyright © 2005–2017 IIPC. All rights reserved.