public class CaptureSearchResult extends SearchResult implements Capture
Modifier and Type | Field and Description |
---|---|
protected long |
cachedCompressedLength |
protected long |
cachedDate |
protected long |
cachedOffset |
static String |
CAPTURE_CAPTURE_TIMESTAMP
Result: 14-digit timestamp when document was captured
|
static String |
CAPTURE_CLOSEST_INDICATOR
Result: flag within a SearchResult that indicates this is the closest to
a particular requested date.
|
static String |
CAPTURE_CLOSEST_VALUE |
static String |
CAPTURE_COMPRESSED_LENGTH
Result: compressed byte offset within ARC/WARC file where this document's
gzip envelope ends.
|
static String |
CAPTURE_DIGEST
Result: some form of document fingerprint.
|
static String |
CAPTURE_DUPLICATE_ANNOTATION
Result: this key being present indicates that this particular capture was
not actually stored, and that other values within this SearchResult are
actually values from a different record which *should* be identical to
this capture, had it been stored.
|
static String |
CAPTURE_DUPLICATE_DIGEST
flag indicates that this document was downloaded and verified as
identical to a previous capture by digest.
|
static String |
CAPTURE_DUPLICATE_HTTP
flag indicates that this document was NOT downloaded, but that the origin
server indicated that the document had not changed, based on If-Modified
HTTP request headers.
|
static String |
CAPTURE_DUPLICATE_PAYLOAD_COMPRESSED_LENGTH
For identical content digest revisit records, the compressed length of the
payload record in CAPTURE_DUPLICATE_PAYLOAD_FILE, if known.
|
static String |
CAPTURE_DUPLICATE_PAYLOAD_FILE
For identical content digest revisit records, the file where the payload
can be found, if known.
|
static String |
CAPTURE_DUPLICATE_PAYLOAD_OFFSET
For identical content digest revisit records, the offset in
CAPTURE_DUPLICATE_PAYLOAD_FILE where the payload record can be found, if
known.
|
static String |
CAPTURE_DUPLICATE_STORED_TS
Result: this key is present when the CAPTURE_DUPLICATE_ANNOTATION is also
present, with the value indicating the last date that was actually stored
for this duplicate.
|
static String |
CAPTURE_FILE
Result: basename of ARC/WARC file containing this document.
|
static String |
CAPTURE_HTTP_CODE
Result: 3-digit integer HTTP response code. May be '0' in some fringe
conditions (old ARCs, a bug in the crawler, etc.).
|
static String |
CAPTURE_MIME_TYPE
Result: best-guess at mime-type of this document.
|
static String |
CAPTURE_OFFSET
Result: compressed byte offset within ARC/WARC file where this document's
gzip envelope begins.
|
static String |
CAPTURE_ORACLE_POLICY |
static String |
CAPTURE_ORIGINAL_HOST |
static String |
CAPTURE_ORIGINAL_URL |
static String |
CAPTURE_REDIRECT_URL
Result: URL that this document redirected to, or '-' if it does not
redirect.
|
static char |
CAPTURE_ROBOT_BLOCKED
non-standard robot flag indicating the capture is soft-blocked
(not available for direct replay, but available as the original for
a revisit).
|
static String |
CAPTURE_ROBOT_FLAGS
Result: String flags which indicate robot instructions found in an HTML
page.
|
static String |
CAPTURE_ROBOT_IGNORE |
static String |
CAPTURE_ROBOT_NOARCHIVE |
static String |
CAPTURE_ROBOT_NOFOLLOW |
static String |
CAPTURE_ROBOT_NOINDEX |
static String |
CAPTURE_URL_KEY
Result: canonicalized (lookup key) form of the URL of the captured document.
|
Fields inherited from class SearchResult: CUSTOM_HEADER_PREFIX, data, RESULT_TRUE_VALUE
Modifier | Constructor and Description |
---|---|
|
CaptureSearchResult() |
protected |
CaptureSearchResult(boolean autocreateMap) |
Modifier and Type | Method and Description |
---|---|
void |
flagDuplicateDigest() |
void |
flagDuplicateDigest(CaptureSearchResult payload)
Mark this capture as a revisit of the previous capture
payload, identified by content digest. |
void |
flagDuplicateDigest(Date storedDate)
Deprecated.
|
void |
flagDuplicateDigest(String storedTS)
Deprecated.
|
void |
flagDuplicateHTTP(Date storedDate) |
void |
flagDuplicateHTTP(String storedTS) |
Date |
getCaptureDate() |
String |
getCaptureTimestamp()
return time of capture.
|
long |
getCompressedLength() |
String |
getDigest() |
Date |
getDuplicateDigestStoredDate() |
String |
getDuplicateDigestStoredTimestamp()
Same as
getDuplicateDigestStoredDate(), but
returns the raw timestamp value. |
Date |
getDuplicateHTTPStoredDate() |
String |
getDuplicateHTTPStoredTimestamp() |
CaptureSearchResult |
getDuplicatePayload() |
long |
getDuplicatePayloadCompressedLength() |
String |
getDuplicatePayloadFile() |
Long |
getDuplicatePayloadOffset() |
String |
getFile() |
String |
getHttpCode() |
String |
getMimeType() |
CaptureSearchResult |
getNextResult() |
long |
getOffset() |
String |
getOraclePolicy() |
String |
getOriginalHost() |
String |
getOriginalUrl()
return the original URL (ordinary, non-SURT form) which resulted in the capture.
|
CaptureSearchResult |
getPrevResult() |
String |
getRedirectUrl() |
String |
getRobotFlags()
return robot flags field value.
|
String |
getUrlKey()
return the URL key of this capture.
|
boolean |
isClosest() |
boolean |
isDuplicateDigest()
whether this capture is a re-fetch of previously archived capture
(revisit), detected by content's digest, and replay of
that previous capture is not blocked.
|
boolean |
isDuplicateHTTP()
whether this capture is an archive of a
304 Not Modified response
from the server. |
boolean |
isHttpError()
true if HTTP response code is either 4xx or 5xx. |
boolean |
isHttpRedirect()
true if HTTP response code is 3xx. |
boolean |
isHttpSuccess()
true if HTTP response code is 2xx. |
boolean |
isRevisitDigest()
whether this capture is a re-fetch of previously archived capture
(revisit), detected by content's digest.
|
boolean |
isRobotFlagSet(char flag)
test if
robotflags field has the given flag set. |
boolean |
isRobotFlagSet(String flag)
test if
robotflags field has the given flag set. |
boolean |
isRobotIgnore() |
boolean |
isRobotNoArchive() |
boolean |
isRobotNoFollow() |
boolean |
isRobotNoIndex() |
void |
removeFromList() |
void |
setCaptureDate(Date date) |
void |
setCaptureTimestamp(String timestamp) |
void |
setClosest(boolean value) |
void |
setCompressedLength(long offset) |
void |
setDigest(String digest) |
void |
setFile(String file) |
void |
setHttpCode(String httpCode) |
void |
setMimeType(String mimeType) |
void |
setNextResult(CaptureSearchResult result) |
void |
setOffset(long offset) |
void |
setOraclePolicy(String policy) |
void |
setOriginalHost(String originalHost) |
void |
setOriginalUrl(String originalUrl) |
void |
setPrevResult(CaptureSearchResult result) |
void |
setRedirectUrl(String url) |
void |
setRobotFlag(char flag)
Add a flag to
robotflags field. |
void |
setRobotFlag(String flag)
Add a flag to
robotflags field. |
void |
setRobotFlags(String robotFlags)
Set robot flags field value as a whole.
|
void |
setRobotIgnore() |
void |
setRobotNoArchive() |
void |
setRobotNoFollow() |
void |
setRobotNoIndex() |
void |
setUrlKey(String urlKey) |
String |
toString() |
Methods inherited from class SearchResult: dateToTS, ensureMap, fromCanonicalStringMap, get, getBoolean, getCustom, put, putBoolean, putCustom, toCanonicalStringMap, tsToDate
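The three status predicates listed above, isHttpError(), isHttpRedirect() and isHttpSuccess(), classify the getHttpCode() string as 4xx/5xx, 3xx or 2xx. The following is a minimal sketch of such a classification, not the library's implementation. The class HttpCodeSketch and the helper hasLeadingDigit are hypothetical, and it is an assumption that a null or non-3-digit code (such as the '0' fringe case noted for CAPTURE_HTTP_CODE) matches none of the three classes.

```java
class HttpCodeSketch {
    // Hypothetical helper: true if code is exactly three digits and its
    // first digit is one of the characters in leadingDigits.
    private static boolean hasLeadingDigit(String code, String leadingDigits) {
        if (code == null || code.length() != 3) {
            return false;
        }
        for (int i = 0; i < code.length(); i++) {
            if (!Character.isDigit(code.charAt(i))) {
                return false;
            }
        }
        return leadingDigits.indexOf(code.charAt(0)) >= 0;
    }

    static boolean isHttpSuccess(String code) {
        return hasLeadingDigit(code, "2");
    }

    static boolean isHttpRedirect(String code) {
        return hasLeadingDigit(code, "3");
    }

    static boolean isHttpError(String code) {
        return hasLeadingDigit(code, "45");
    }

    public static void main(String[] args) {
        for (String code : new String[] {"200", "302", "404", "503", "0"}) {
            System.out.println(code
                    + " success=" + isHttpSuccess(code)
                    + " redirect=" + isHttpRedirect(code)
                    + " error=" + isHttpError(code));
        }
    }
}
```

Because the HTTP code is stored as a String, a check like this avoids parsing errors on unusual values; callers that need the numeric value would still have to handle '0' separately.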
protected long cachedOffset
protected long cachedCompressedLength
protected long cachedDate
public static final String CAPTURE_ORIGINAL_URL
public static final String CAPTURE_ORIGINAL_HOST
public static final String CAPTURE_URL_KEY
public static final String CAPTURE_CAPTURE_TIMESTAMP
public static final String CAPTURE_FILE
public static final String CAPTURE_OFFSET
public static final String CAPTURE_COMPRESSED_LENGTH
public static final String CAPTURE_MIME_TYPE
public static final String CAPTURE_HTTP_CODE
public static final String CAPTURE_DIGEST
public static final String CAPTURE_REDIRECT_URL
public static final String CAPTURE_ROBOT_FLAGS
public static final String CAPTURE_ROBOT_NOARCHIVE
public static final String CAPTURE_ROBOT_NOFOLLOW
public static final String CAPTURE_ROBOT_NOINDEX
public static final String CAPTURE_ROBOT_IGNORE
public static final char CAPTURE_ROBOT_BLOCKED
public static final String CAPTURE_CLOSEST_INDICATOR
public static final String CAPTURE_CLOSEST_VALUE
public static final String CAPTURE_DUPLICATE_ANNOTATION
public static final String CAPTURE_DUPLICATE_STORED_TS
public static final String CAPTURE_DUPLICATE_DIGEST
public static final String CAPTURE_DUPLICATE_PAYLOAD_FILE
public static final String CAPTURE_DUPLICATE_PAYLOAD_OFFSET
public static final String CAPTURE_DUPLICATE_PAYLOAD_COMPRESSED_LENGTH
public static final String CAPTURE_DUPLICATE_HTTP
public static final String CAPTURE_ORACLE_POLICY
public CaptureSearchResult()
protected CaptureSearchResult(boolean autocreateMap)
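public String getOriginalUrl()
Description copied from interface: Capture
return the original URL (ordinary, non-SURT form) which resulted in the capture.
Specified by: getOriginalUrl in interface Capture
public void setOriginalUrl(String originalUrl)
Parameters: originalUrl - as close to the original URL by which this Resource
was captured as is possible
public String getOriginalHost()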
public void setOriginalHost(String originalHost)
public String getUrlKey()
Description copied from interface: Capture
return the URL key of this capture.
public void setUrlKey(String urlKey)
public Date getCaptureDate()
public void setCaptureDate(Date date)
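The capture time is available both as a Date (getCaptureDate()) and as a 14-digit timestamp string (getCaptureTimestamp()). The sketch below shows one way to convert between the two forms with java.time. The class TimestampSketch and its method names are hypothetical, and interpreting the timestamp as UTC is an assumption not stated on this page; the inherited tsToDate and dateToTS methods may behave differently.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Date;

class TimestampSketch {
    // 14 digits: year, month, day, hour, minute, second.
    private static final DateTimeFormatter TS14 =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    // Hypothetical helper: parse a 14-digit timestamp, assuming UTC.
    static Date parseTimestamp14(String timestamp) {
        LocalDateTime local = LocalDateTime.parse(timestamp, TS14);
        return Date.from(local.toInstant(ZoneOffset.UTC));
    }

    // Hypothetical helper: format a Date as a 14-digit timestamp, assuming UTC.
    static String formatTimestamp14(Date date) {
        return LocalDateTime.ofInstant(date.toInstant(), ZoneOffset.UTC).format(TS14);
    }

    public static void main(String[] args) {
        Date captured = parseTimestamp14("20141002123000");
        // Round trip back to the original 14-digit string.
        System.out.println(formatTimestamp14(captured));
    }
}
```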
public String getCaptureTimestamp()
Description copied from interface: Capture
return time of capture.
Specified by: getCaptureTimestamp in interface Capture
Returns: timestamp in "YYYYmmddHHMMSS" format.
public void setCaptureTimestamp(String timestamp)
public String getFile()
public void setFile(String file)
public long getOffset()
public void setOffset(long offset)
public long getCompressedLength()
public void setCompressedLength(long offset)
public String getMimeType()
public void setMimeType(String mimeType)
public String getHttpCode()
public void setHttpCode(String httpCode)
public String getDigest()
public void setDigest(String digest)
public String getRedirectUrl()
public void setRedirectUrl(String url)
public boolean isClosest()
public void setClosest(boolean value)
public void flagDuplicateDigest()
public void flagDuplicateDigest(CaptureSearchResult payload)
Mark this capture as a revisit of the previous capture payload, identified by content digest.
Record location information is copied from payload so that the content can be
loaded from the record later.
ResourceIndex implementations should call this method before returning
CaptureSearchResults to AccessPoint.
Parameters: payload - capture being revisited
See Also: getDuplicateDigestStoredTimestamp(), getDuplicateDigestStoredDate(),
getDuplicatePayloadFile(), getDuplicatePayloadOffset(),
getDuplicatePayloadCompressedLength()
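The description of flagDuplicateDigest(CaptureSearchResult payload) says that record location information is copied from the payload capture into the revisit. A minimal sketch of that idea follows. The class RevisitLocationSketch, its fields, and the method flagRevisitOf are hypothetical stand-ins; which values the real class copies, and how it stores them, is an assumption here.

```java
class RevisitLocationSketch {
    // Location of this capture's own record.
    final String timestamp;
    final String file;
    final long offset;
    final long compressedLength;

    // Location of the payload record, filled in only for revisits.
    boolean revisit = false;
    String payloadTimestamp = null;
    String payloadFile = null;
    Long payloadOffset = null;
    long payloadCompressedLength = -1;

    RevisitLocationSketch(String timestamp, String file, long offset, long compressedLength) {
        this.timestamp = timestamp;
        this.file = file;
        this.offset = offset;
        this.compressedLength = compressedLength;
    }

    // Hypothetical counterpart of flagDuplicateDigest(payload): mark this
    // capture as a revisit and remember where the payload record lives.
    void flagRevisitOf(RevisitLocationSketch payload) {
        revisit = true;
        payloadTimestamp = payload.timestamp;
        payloadFile = payload.file;
        payloadOffset = payload.offset;
        payloadCompressedLength = payload.compressedLength;
    }

    public static void main(String[] args) {
        // File names and offsets below are made-up sample values.
        RevisitLocationSketch original =
                new RevisitLocationSketch("20140101000000", "a.warc.gz", 1024L, 2048L);
        RevisitLocationSketch revisit =
                new RevisitLocationSketch("20140201000000", "b.warc.gz", 4096L, 512L);
        revisit.flagRevisitOf(original);
        System.out.println(revisit.payloadFile + " @ " + revisit.payloadOffset);
    }
}
```

With the payload location copied up front, replay code can open the payload record directly from the revisit result without a second index lookup.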
public CaptureSearchResult getDuplicatePayload()
public String getDuplicatePayloadFile()
public Long getDuplicatePayloadOffset()
public long getDuplicatePayloadCompressedLength()
public void flagDuplicateDigest(Date storedDate)
public void flagDuplicateDigest(String storedTS)
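public boolean isDuplicateDigest()
whether this capture is a re-fetch of previously archived capture
(revisit), detected by content's digest, and replay of
that previous capture is not blocked.
1.8.1 2014-10-02 behavior change. This method now returns false
even for revisits, if the original capture
is blocked. Use isRevisitDigest() for the old behavior.
Returns: true if revisit
public boolean isRevisitDigest()
whether this capture is a re-fetch of previously archived capture
(revisit), detected by content's digest.
This method is meant for use by replay processing. For use in
user interface / web API code, isDuplicateDigest()
is more appropriate.
Returns: true if revisit
public Date getDuplicateDigestStoredDate()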
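The difference between isDuplicateDigest() and isRevisitDigest() described above can be summarized in a small truth-table sketch. The class RevisitPredicateSketch and its two boolean fields are hypothetical; how the real class determines that the original capture is blocked is not shown on this page and is reduced to a plain flag here.

```java
class RevisitPredicateSketch {
    private final boolean revisit;          // detected by content digest
    private final boolean originalBlocked;  // replay of the original capture is blocked

    RevisitPredicateSketch(boolean revisit, boolean originalBlocked) {
        this.revisit = revisit;
        this.originalBlocked = originalBlocked;
    }

    // Old behavior: true for every digest revisit.
    boolean isRevisitDigest() {
        return revisit;
    }

    // Behavior since 1.8.1: false if the original capture is blocked.
    boolean isDuplicateDigest() {
        return revisit && !originalBlocked;
    }

    public static void main(String[] args) {
        RevisitPredicateSketch blocked = new RevisitPredicateSketch(true, true);
        System.out.println("isRevisitDigest=" + blocked.isRevisitDigest()
                + " isDuplicateDigest=" + blocked.isDuplicateDigest());
    }
}
```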
public String getDuplicateDigestStoredTimestamp()
Same as getDuplicateDigestStoredDate(), but returns the raw timestamp value.
public void flagDuplicateHTTP(Date storedDate)
public void flagDuplicateHTTP(String storedTS)
public boolean isDuplicateHTTP()
whether this capture is an archive of a 304 Not Modified response from the server.
public Date getDuplicateHTTPStoredDate()
public String getDuplicateHTTPStoredTimestamp()
public String getRobotFlags()
return robot flags field value.
public void setRobotFlags(String robotFlags)
Set robot flags field value as a whole.
To add a single flag, see setRobotFlag(char) or setRobotFlag(String).
Parameters: robotFlags - new field value
public void setRobotFlag(String flag)
Add a flag to robotflags field.
If flag is already set, this is a no-op.
Parameters: flag - a flag to add (don't put multiple flags).
public void setRobotFlag(char flag)
Add a flag to robotflags field.
If flag is already set, this is a no-op.
Parameters: flag - a flag to add
public boolean isRobotFlagSet(String flag)
test if robotflags field has the given flag set.
Caveat: if flag has more than one character, robotflags must have flag
as its substring for this method to return true (not really useful).
Parameters: flag - flag to test
Returns: true if flag is set.
public boolean isRobotFlagSet(char flag)
test if robotflags field has the given flag set.
Parameters: flag - one flag to test
Returns: true if flag is set.
public boolean isRobotNoArchive()
public boolean isRobotNoIndex()
public boolean isRobotNoFollow()
public boolean isRobotIgnore()
public void setRobotNoArchive()
public void setRobotNoIndex()
public void setRobotNoFollow()
public void setRobotIgnore()
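The robot flag methods above treat the robotflags field as a string of single-character flags: setRobotFlag is a no-op when the flag is already present, and isRobotFlagSet(String) is a substring test. A minimal sketch of those semantics follows. The class RobotFlagsSketch is hypothetical, and the flag letters "A" and "X" are placeholder values, not confirmed constant values from this class.

```java
class RobotFlagsSketch {
    private String flags = null;  // null until the first flag is added

    // Add a single flag; adding a flag that is already present changes nothing.
    void setFlag(String flag) {
        if (flags == null) {
            flags = flag;
        } else if (!flags.contains(flag)) {
            flags = flags + flag;
        }
    }

    // Substring test, which is why multi-character arguments are of little use.
    boolean isFlagSet(String flag) {
        return flags != null && flags.contains(flag);
    }

    String getFlags() {
        return flags;
    }

    public static void main(String[] args) {
        RobotFlagsSketch capture = new RobotFlagsSketch();
        capture.setFlag("A");
        capture.setFlag("X");
        capture.setFlag("A");  // already set: no-op
        System.out.println(capture.getFlags());
        // Both flags are set, but "XA" is not a substring of the stored
        // value, so the multi-character test fails.
        System.out.println(capture.isFlagSet("XA"));
    }
}
```

The last call illustrates the caveat in the isRobotFlagSet(String) description: testing flags one character at a time avoids the ordering problem.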
public String getOraclePolicy()
public void setOraclePolicy(String policy)
public void setPrevResult(CaptureSearchResult result)
public CaptureSearchResult getPrevResult()
public void setNextResult(CaptureSearchResult result)
public CaptureSearchResult getNextResult()
public void removeFromList()
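The setPrevResult/getPrevResult, setNextResult/getNextResult and removeFromList() signatures suggest that captures are chained as a doubly linked list. This page does not describe their behavior, so the sketch below is an assumption about what unlinking a node from such a list typically looks like. The class CaptureNodeSketch and the helper link are hypothetical.

```java
class CaptureNodeSketch {
    final String timestamp;
    CaptureNodeSketch prev;
    CaptureNodeSketch next;

    CaptureNodeSketch(String timestamp) {
        this.timestamp = timestamp;
    }

    // Hypothetical helper: connect two nodes in order.
    static void link(CaptureNodeSketch first, CaptureNodeSketch second) {
        first.next = second;
        second.prev = first;
    }

    // Unlink this node so that its neighbours point at each other.
    void removeFromList() {
        if (prev != null) {
            prev.next = next;
        }
        if (next != null) {
            next.prev = prev;
        }
        prev = null;
        next = null;
    }

    public static void main(String[] args) {
        CaptureNodeSketch a = new CaptureNodeSketch("20140101000000");
        CaptureNodeSketch b = new CaptureNodeSketch("20140201000000");
        CaptureNodeSketch c = new CaptureNodeSketch("20140301000000");
        link(a, b);
        link(b, c);
        b.removeFromList();
        System.out.println(a.next.timestamp);
    }
}
```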
public boolean isHttpError()
Returns: true if HTTP response code is either 4xx or 5xx.
public boolean isHttpRedirect()
Returns: true if HTTP response code is 3xx.
public boolean isHttpSuccess()
Returns: true if HTTP response code is 2xx.
Copyright © 2005–2015 IIPC. All rights reserved.