Fast Phonetic Similarity Search over Large Repositories
Abstract: Analysis of unstructured data may be inefficient in the presence of spelling errors. Existing approaches use string similarity methods, supported by a dictionary, to search for valid words within a text. However, these methods are not rich enough to encode phonetic information to assist the search. In this paper, we present a novel approach for efficiently performing phonetic similarity search over large data sources, using a data structure called PhoneticMap to encode language-specific phonetic information. We validate our approach through an experiment that automatically corrects words with spelling errors, over a data set built from a Portuguese variant of a well-known repository.
    author    = {Hegler Tissot and Gabriel Peschl and 
                 Marcos Didonet Del Fabro},
    title     = {Fast Phonetic Similarity Search over Large Repositories},
    booktitle = {Database and Expert Systems Applications - 
                 25th International Conference, {DEXA} 2014, Munich, Germany, 
                 September 1-4, 2014. Proceedings, Part {II}},
    pages     = {74--81},
    year      = {2014},
    doi       = {10.1007/978-3-319-10085-2_6}