- java.lang.Object
-
- org.hsqldb.lib.KMPSearchAlgorithm
-
public class KMPSearchAlgorithm extends java.lang.Object
Implements the Knuth-Morris-Pratt string search algorithm for searching streams or arrays of octets or characters.This algorithm is a good choice for searching large, forward-only access streams for repeated search using pre-processed small to medium-sized patterns.
This is because in addition to the facts that it:
- does not require pre-processing the searched data (only the pattern)
- scans strictly left-to-right
- does not need to perform back tracking
- does not need to employ reverse scan order
- does not need to perform effectively random access lookups against the searched data or pattern
- a very simple, highly predictable behavior
- an O(n) complexity once the a search pattern is preprocessed
- an O(m) complexity for pre-processing search patterns
- a worst case performance characteristic of only 2n
- an average case performance characteristic that is deemed to be
2-3 times better than the naive search algorithm employed by
String.indexOf(java.lang.String,int)
.
In particular, its higher average performance is biased toward larger search patterns, due to its ability to skip ahead further and with fewer tests under reverse pattern scan. But when searching forward-only access streams, overall performance considerations require the use a circular buffer of the same size as the search pattern to hold data from the searched stream as it is being compared in reverse order to the search pattern. Hence, Boyer-Moore requires at minimum twice the memory required by Knuth-Morris-Pratt to search for the same pattern and that factor has the greatest impact precisely on the same class of patterns (larger) for which it is most outperforms Knuth-Morris-Pratt.
- Since:
- 2.1
- Author:
- Campbell Burnet (campbell-burnet@users dot sourceforge.net)
- See Also:
- Knuth-Morris-Pratt algorithm
-
-
Method Summary
All Methods Static Methods Concrete Methods Modifier and Type Method Description static int[]
computeTable(byte[] pattern)
computes the table used to optimize octet pattern searchstatic int[]
computeTable(char[] pattern)
computes the table used to optimize octet pattern searchstatic int[]
computeTable(java.lang.String pattern)
computes the table used to optimize octet pattern searchstatic int
search(byte[] source, byte[] pattern, int[] table, int start)
Searches the given octet string for the given octet pattern returning the zero-based offset from given start position at which the first match is detected.static int
search(char[] source, char[] pattern, int[] table, int start)
Searches the given character array for the given character pattern returning the zero-based offset from given start position at which the first match is detected.static long
search(java.io.InputStream inputStream, byte[] pattern, int[] table)
Searches the given octet stream for the given octet pattern returning the zero-based offset from the initial stream position at which the first match is detected.static long
search(java.io.Reader reader, char[] pattern, int[] table)
Searches the given character stream for the given character pattern returning the zero-based offset from the initial stream position at which the first match is detected.static long
search(java.io.Reader reader, java.lang.String pattern, int[] table)
Searches the given character stream for the given character pattern returning the zero-based offset from the initial stream position at which the first match is detected.static int
search(java.lang.String source, java.lang.String pattern, int[] table, int start)
Searches the given String object for the given character pattern returning the zero-based offset from given start position at which the first match is detected.
-
-
-
Method Detail
-
computeTable
public static int[] computeTable(byte[] pattern)
computes the table used to optimize octet pattern search- Parameters:
pattern
- for which to compute the table.- Returns:
- the table computed from the octet pattern.
- Throws:
java.lang.IllegalArgumentException
- ifpattern == null || pattern.length < 2
.
-
computeTable
public static int[] computeTable(char[] pattern)
computes the table used to optimize octet pattern search- Parameters:
pattern
- for which to compute the table.- Returns:
- the table computed from the character pattern.
- Throws:
java.lang.IllegalArgumentException
- ifpattern == null || pattern.length < 2
.
-
computeTable
public static int[] computeTable(java.lang.String pattern)
computes the table used to optimize octet pattern search- Parameters:
pattern
- for which to compute the table.- Returns:
- the table computed from the String pattern.
- Throws:
java.lang.IllegalArgumentException
- ifpattern == null || pattern.length < 2
.
-
search
public static long search(java.io.InputStream inputStream, byte[] pattern, int[] table) throws java.io.IOException
Searches the given octet stream for the given octet pattern returning the zero-based offset from the initial stream position at which the first match is detected.Note that the signature includes a slot for the table so that searches for a pattern can be performed multiple times without incurring the overhead of computing the table each time.
- Parameters:
inputStream
- in which to searchpattern
- for which to searchtable
- computed from the pattern that optimizes the search. If null, automatically computed.- Returns:
- zero-based offset of first match; -1 if
inputStream == null || pattern == null
or no match found, ; zero (0) ifpattern.length == 0
. - Throws:
java.io.IOException
- when an error occurs accessing the input stream.
-
search
public static long search(java.io.Reader reader, char[] pattern, int[] table) throws java.io.IOException
Searches the given character stream for the given character pattern returning the zero-based offset from the initial stream position at which the first match is detected.Note that the signature includes a slot for the table so that searches for a pattern can be performed multiple times without incurring the overhead of computing the table each time.
- Parameters:
reader
- in which to searchpattern
- for which to searchtable
- computed from the pattern that optimizes the search If null, automatically computed.- Returns:
- zero-based offset of first match; -1 if
reader == null || pattern == null
or no match found, ; zero (0) ifpattern.length == 0
. - Throws:
java.io.IOException
- when an error occurs accessing the input stream.
-
search
public static long search(java.io.Reader reader, java.lang.String pattern, int[] table) throws java.io.IOException
Searches the given character stream for the given character pattern returning the zero-based offset from the initial stream position at which the first match is detected.Note that the signature includes a slot for the table so that searches for a pattern can be performed multiple times without incurring the overhead of computing the table each time.
- Parameters:
reader
- in which to searchpattern
- for which to searchtable
- computed from the pattern that optimizes the search If null, automatically computed.- Returns:
- zero-based offset of first match; -1 if
reader == null || pattern == null
or no match found, ; zero (0) ifpattern.length() == 0
. - Throws:
java.io.IOException
- when an error occurs accessing the input stream.
-
search
public static int search(byte[] source, byte[] pattern, int[] table, int start)
Searches the given octet string for the given octet pattern returning the zero-based offset from given start position at which the first match is detected.Note that the signature includes a slot for the table so that searches for a pattern can be performed multiple times without incurring the overhead of computing the table each time.
- Parameters:
source
- array in which to searchpattern
- to be matchedtable
- computed from the pattern that optimizes the search If null, automatically computed.start
- position in source at which to start the search- Returns:
- zero-based offset of first match; -1 if
source == null || pattern == null
or no match found, ; start ifpattern.length == 0 and start <= source.length
.
-
search
public static int search(char[] source, char[] pattern, int[] table, int start)
Searches the given character array for the given character pattern returning the zero-based offset from given start position at which the first match is detected.- Parameters:
source
- array in which to searchpattern
- to be matchedtable
- computed from the pattern that optimizes the search If null, automatically computed.start
- position in source at which to start the search- Returns:
- zero-based offset of first match; -1 if
source == null || pattern == null
or no match found, ; start ifpattern.length == 0 and start <= source.length
.
-
search
public static int search(java.lang.String source, java.lang.String pattern, int[] table, int start)
Searches the given String object for the given character pattern returning the zero-based offset from given start position at which the first match is detected.- Parameters:
source
- array to be searchedpattern
- to be matchedtable
- computed from the pattern that optimizes the searchstart
- position in source at which to start the search- Returns:
- zero-based offset of first match; -1 if
source == null || pattern == null
or no match found, ; start ifpattern.length == 0 and start <= source.length
.
-
-