Module org.hsqldb

Class KMPSearchAlgorithm


  • public class KMPSearchAlgorithm
    extends java.lang.Object
    Implements the Knuth-Morris-Pratt string search algorithm for searching streams or arrays of octets or characters.

    This algorithm is a good choice for searching large, forward-only access streams for repeated search using pre-processed small to medium sized patterns.

    This is because in addition to the facts that it:

    • does not require pre-processing the searched data (only the pattern)
    • scans strictly left-to-right
    • does not need to perform back tracking
    • does not need to employ reverse scan order
    • does not need to perform effectively random access lookups against the searched data or pattern
    it also has:
    • a very simple, highly predictable behavior
    • an O(n) complexity once the a search pattern is preprocessed
    • an O(m) complexity for preprocessing search patterns
    • a worst case performance characteristic of only 2n
    • a typical performance characteristic that is deemed to be 2-3 times better than the naive search algorithm employed by String.indexOf(java.lang.String,int).
    Note that the Boyer-Moore algorithm is generally considered to be the better practical, all-round exact sub-string search algorithm, but due to its reverse pattern scan order, performance considerations dictate that it requires more space and that is somewhat more complex to implement efficiently for searching forward-only access streams.

    In particular, its higher average performance is biased toward larger search patterns, due to its ability to skip ahead further and with fewer tests under reverse pattern scan. But when searching forward-only access streams, overall performance considerations require the use a circular buffer of the same size as the search pattern to hold data from the searched stream as it is being compared in reverse order to the search pattern. Hence, Boyer-Moore requires at minimum twice the memory required by Knuth-Morris-Pratt to search for the same pattern and that factor has the greatest impact precisely on the same class of patterns (larger) for which it is most outperforms Knuth-Morris-Pratt.

    Since:
    2.1
    Author:
    Campbell Burnet (campbell-burnet@users dot sourceforge.net)
    See Also:
    Knuth-Morris-Pratt algorithm
    • Method Summary

      All Methods Static Methods Concrete Methods 
      Modifier and Type Method Description
      static int[] computeTable​(byte[] pattern)
      computes the table used to optimize octet pattern search
      static int[] computeTable​(char[] pattern)  
      static int[] computeTable​(java.lang.String pattern)  
      static int search​(byte[] source, byte[] pattern, int[] table, int start)
      Searches the given octet string for the given octet pattern returning the zero-based offset from given start position at which the first match is detected.
      static int search​(char[] source, char[] pattern, int[] table, int start)
      Searches the given character array for the given character pattern returning the zero-based offset from given start position at which the first match is detected.
      static long search​(java.io.InputStream inputStream, byte[] pattern, int[] table)
      Searches the given octet stream for the given octet pattern returning the zero-based offset from the initial stream position at which the first match is detected.
      static long search​(java.io.Reader reader, char[] pattern, int[] table)
      Searches the given character stream for the given character pattern returning the zero-based offset from the initial stream position at which the first match is detected.
      static int search​(java.lang.String source, java.lang.String pattern, int[] table, int start)
      Searches the given String object for the given character pattern returning the zero-based offset from given start position at which the first match is detected.
      • Methods inherited from class java.lang.Object

        equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • KMPSearchAlgorithm

        public KMPSearchAlgorithm()
    • Method Detail

      • search

        public static long search​(java.io.InputStream inputStream,
                                  byte[] pattern,
                                  int[] table)
                           throws java.io.IOException
        Searches the given octet stream for the given octet pattern returning the zero-based offset from the initial stream position at which the first match is detected.

        Note that the signature includes a slot for the table so that searches for a pattern can be performed multiple times without incurring the overhead of computing the table each time.

        Parameters:
        inputStream - in which to search
        pattern - for which to search
        table - computed from the pattern that optimizes the search. If null, automatically computed.
        Returns:
        zero-based offset of first match; -1 if no match found.
        Throws:
        java.io.IOException - when an error occurs accessing the input stream.
      • search

        public static long search​(java.io.Reader reader,
                                  char[] pattern,
                                  int[] table)
                           throws java.io.IOException
        Searches the given character stream for the given character pattern returning the zero-based offset from the initial stream position at which the first match is detected.

        Note that the signature includes a slot for the table so that searches for a pattern can be performed multiple times without incurring the overhead of computing the table each time.

        Parameters:
        reader - in which to search
        pattern - for which to search
        table - computed from the pattern that optimizes the search If null, automatically computed.
        Returns:
        zero-based offset of first match; -1 if no match found.
        Throws:
        java.io.IOException - when an error occurs accessing the input stream.
      • search

        public static int search​(byte[] source,
                                 byte[] pattern,
                                 int[] table,
                                 int start)
        Searches the given octet string for the given octet pattern returning the zero-based offset from given start position at which the first match is detected.

        Note that the signature includes a slot for the table so that searches for a pattern can be performed multiple times without incurring the overhead of computing the table each time.

        Parameters:
        source - array in which to search
        pattern - to be matched
        table - computed from the pattern that optimizes the search If null, automatically computed.
        start - position in source at which to start the search
        Returns:
        zero-based offset of first match; -1 if no match found.
      • search

        public static int search​(char[] source,
                                 char[] pattern,
                                 int[] table,
                                 int start)
        Searches the given character array for the given character pattern returning the zero-based offset from given start position at which the first match is detected.
        Parameters:
        source - array in which to search
        pattern - to be matched
        table - computed from the pattern that optimizes the search If null, automatically computed.
        start - position in source at which to start the search
        Returns:
        zero-based offset of first match; -1 if no match found.
      • search

        public static int search​(java.lang.String source,
                                 java.lang.String pattern,
                                 int[] table,
                                 int start)
        Searches the given String object for the given character pattern returning the zero-based offset from given start position at which the first match is detected.
        Parameters:
        source - array to be searched
        pattern - to be matched
        table - computed from the pattern that optimizes the search
        start - position in source at which to start the search
        Returns:
        zero-based offset of first match; -1 if no match found.
      • computeTable

        public static int[] computeTable​(byte[] pattern)
        computes the table used to optimize octet pattern search
        Parameters:
        pattern - for which to compute the table.
        Returns:
        the table computed from the octet pattern.
      • computeTable

        public static int[] computeTable​(char[] pattern)
      • computeTable

        public static int[] computeTable​(java.lang.String pattern)