Fast Phonetic Similarity Search over Large Repositories


Analysis of unstructured data may be inefficient in the presence of spelling errors. Existing approaches use string similarity methods, supported by a dictionary, to search for valid words within a text. However, these methods are not rich enough to encode the phonetic information that could assist the search. In this paper, we present a novel approach for efficiently performing phonetic similarity search over large data sources, using a data structure called PhoneticMap to encode language-specific phonetic information. We validate our approach through an experiment that automatically corrects misspelled words in a data set built from a Portuguese variant of a well-known repository.
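To make the general idea of dictionary-backed phonetic lookup concrete, the following is a minimal sketch in Python. It is not the PhoneticMap structure itself, whose design is not described in this abstract; it is a generic phonetic-key index, in which words are grouped under a normalized key. The replacement rules, and the names `phonetic_key`, `build_index`, and `candidates`, are assumptions chosen purely for illustration and are loosely inspired by Portuguese spelling confusions.

```python
from collections import defaultdict

# Hypothetical, heavily simplified rewrite rules (illustration only).
# These are NOT the language-specific rules encoded by PhoneticMap.
# Order matters: digraphs are rewritten before single letters.
_REPLACEMENTS = [
    ("ss", "s"),
    ("ç", "s"),
    ("ch", "x"),
    ("ph", "f"),
    ("z", "s"),
    ("y", "i"),
    ("h", ""),
]


def phonetic_key(word: str) -> str:
    """Map a word to a rough phonetic key using the toy rules above."""
    key = word.lower()
    for old, new in _REPLACEMENTS:
        key = key.replace(old, new)
    # Collapse runs of the same letter into a single letter.
    collapsed = []
    for ch in key:
        if not collapsed or collapsed[-1] != ch:
            collapsed.append(ch)
    return "".join(collapsed)


def build_index(dictionary):
    """Group valid dictionary words by their phonetic key."""
    index = defaultdict(set)
    for word in dictionary:
        index[phonetic_key(word)].add(word)
    return index


def candidates(index, misspelled):
    """Return valid words that share a phonetic key with the input."""
    return sorted(index.get(phonetic_key(misspelled), ()))


if __name__ == "__main__":
    index = build_index(["casa", "chave", "passo"])
    for token in ["caza", "xave", "paso", "livro"]:
        print(token, "->", candidates(index, token))
```

Because each lookup is a single key computation followed by a hash lookup, its cost does not grow with the size of the dictionary. A pairwise string-distance scan, by contrast, compares the input against every entry.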






Hegler Tissot, Gabriel Peschl, and Marcos Didonet Del Fabro. Fast Phonetic Similarity Search over Large Repositories. In: Database and Expert Systems Applications - 25th International Conference, DEXA 2014, Munich, Germany, September 1-4, 2014. Proceedings, Part II, pp. 74-81, 2014. doi: 10.1007/978-3-319-10085-2_6