public class AggressiveUrlCanonicalizer extends Object implements UrlCanonicalizer
Constructor and Description |
---|
AggressiveUrlCanonicalizer() |
Modifier and Type | Method and Description |
---|---|
String | canonicalize(String url) Idempotent operation that determines the 'fuzziest' form of the url argument. |
protected boolean | doStripRegexMatch(StringBuilder url, Matcher matcher) Runs a regex against a StringBuilder, removing group 1 if it matches. |
boolean | isSurtForm() Returns true if this Canonicalizer returns SURTs, false if it returns URLs. |
static void | main(String[] args) |
String | urlStringToKey(String urlString) |
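
The summary of doStripRegexMatch says it runs a regex against a StringBuilder and removes group 1 on a match. The following is a minimal sketch of that idea, not the class's actual implementation: the class name `StripGroupSketch`, the helper `stripGroupOne`, and the `WWW` pattern are all hypothetical and chosen only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StripGroupSketch {

    // Hypothetical pattern, not taken from AggressiveUrlCanonicalizer:
    // group 1 captures a leading "www." label right after the scheme.
    public static final Pattern WWW =
            Pattern.compile("^https?://(www\\.)", Pattern.CASE_INSENSITIVE);

    /**
     * Deletes the text captured by group 1 from url if the matcher finds a match.
     * Returns true if something was removed, false otherwise.
     */
    public static boolean stripGroupOne(StringBuilder url, Matcher matcher) {
        if (matcher != null && matcher.find() && matcher.start(1) >= 0) {
            // The offsets were computed by find(), so they still describe
            // the unmodified buffer at the moment of deletion.
            url.delete(matcher.start(1), matcher.end(1));
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        StringBuilder url = new StringBuilder("http://www.example.com/index.html");
        boolean changed = stripGroupOne(url, WWW.matcher(url));
        System.out.println(changed + " " + url); // true http://example.com/index.html
    }
}
```

Passing the Matcher in, rather than the Pattern, lets the caller decide which rule to apply while the helper only handles the in-place deletion.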
protected boolean doStripRegexMatch(StringBuilder url, Matcher matcher)
Parameters:
url - URL to search in.
matcher - Matcher whose form yields a group to remove.

public String urlStringToKey(String urlString) throws org.apache.commons.httpclient.URIException
Specified by:
urlStringToKey in interface UrlCanonicalizer
Parameters:
urlString - String representation of a URL, in as original and unchanged a form as possible.
Throws:
org.apache.commons.httpclient.URIException - if the input url String is not a valid URL.

public String canonicalize(String url)
Parameters:
url - to be canonicalized.

public static void main(String[] args)
Parameters:
args - program arguments

public boolean isSurtForm()
Specified by:
isSurtForm in interface UrlCanonicalizer
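
canonicalize(String url) is documented as idempotent: canonicalizing an already canonical URL should change nothing. The sketch below shows one way to check that property. It is an illustration under stated assumptions: `IdempotenceSketch`, `toyCanonicalize`, and `isIdempotentFor` are hypothetical names, and the toy rules (trim, lower-case, strip leading "www." labels) are stand-ins, not the rules this class applies.

```java
import java.util.Locale;
import java.util.function.UnaryOperator;

public class IdempotenceSketch {

    // Hypothetical stand-in for a canonicalizer. The real rules of
    // AggressiveUrlCanonicalizer are not described on this page.
    public static String toyCanonicalize(String url) {
        String s = url.trim().toLowerCase(Locale.ROOT);
        String prefix = "http://www.";
        // Loop so that repeated "www." labels are all removed in one call;
        // a single strip would not be idempotent for "http://www.www.x/".
        while (s.startsWith(prefix)) {
            s = "http://" + s.substring(prefix.length());
        }
        return s;
    }

    /** Returns true if applying f twice gives the same result as applying it once. */
    public static boolean isIdempotentFor(UnaryOperator<String> f, String url) {
        String once = f.apply(url);
        return once.equals(f.apply(once));
    }

    public static void main(String[] args) {
        String input = " HTTP://WWW.www.Example.com/A ";
        System.out.println(toyCanonicalize(input));
        System.out.println(isIdempotentFor(IdempotenceSketch::toyCanonicalize, input));
    }
}
```

Because `isIdempotentFor` takes any `UnaryOperator<String>`, the same check could presumably be pointed at a real canonicalizer instance through a method reference to its canonicalize method.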
Copyright © 2005–2015 IIPC. All rights reserved.