ABSTRACT

Because of its importance, sequence classification has attracted attention from researchers in various domains, including data mining, machine learning, and statistics. Earlier works on this task come mostly from the field of statistics and are concerned with classifying sequences of real numbers that vary with time. This gave rise to a series of works under the name of time series classification [15], which has found numerous real-life applications. Noteworthy examples include classification of ECG signals to distinguish between various states of a patient [20], and classification of stock market time series [22]. In recent years, however, particularly in data mining, the focus has shifted towards classifying discrete sequences [25, 28], where a sequence is an ordered list of events and each event is composed of one item or a set of items from an alphabet. Examples include a sequence of queries in a Web session, a sequence of events in a manufacturing process, a sequence of financial transactions from an account, and so on. Unlike in time series, the events in a discrete sequence are not necessarily ordered by their temporal position. For instance, a DNA sequence is composed of four nucleotides, A, C, G, and T, and a DNA segment such as ACCGTTACG is simply a string of nucleotides with no temporal connotation attached to their order in the sequence.