Towards open data discovery: a comparative study


Open Data discovery enables the retrieval of the data sources most likely to contain the information needed, facilitating data access and transparency. This work presents a comparative study of three methods: a hybrid algorithm based on Linear Discriminant Analysis and Word2Vec, the cosine similarity measure, and a Semantic Test proposed for Open Data search. Each method was evaluated on its ability to identify, among eight open datasets and using only their metadata and descriptions, the one most likely to answer an input question. Three evaluation rounds were performed with different sets of questions and databases, and every method achieved a classification accuracy above 81%.
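As an illustration of the cosine similarity baseline named above, the following is a minimal sketch and not the authors' implementation. It assumes plain term-frequency vectors built from each dataset's description. The dataset names, descriptions, and the helper functions `tokenize`, `cosine_similarity`, and `rank_datasets` are hypothetical and chosen only for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


def rank_datasets(question, descriptions):
    """Return (dataset name, score) pairs, highest similarity first."""
    q = Counter(tokenize(question))
    scores = {
        name: cosine_similarity(q, Counter(tokenize(desc)))
        for name, desc in descriptions.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical metadata descriptions (assumption: not the study's datasets).
DATASETS = {
    "air_quality": "Daily air quality measurements and pollution levels by city",
    "public_transport": "Bus and metro routes, timetables and stops",
    "school_enrollment": "Number of students enrolled in public schools by year",
}

ranking = rank_datasets("Which city has the highest pollution levels?", DATASETS)
best_name, best_score = ranking[0]
print(best_name, round(best_score, 3))
```

Raw term counts keep the sketch dependency-free. A fuller pipeline would likely add stop-word removal and TF-IDF weighting, which this example deliberately omits.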






	author    = {Franciscatto, Maria Helena and Fabro, Marcos Didonet Del and Trois, Celio and Tissot, Hegler},
	title     = {Towards Open Data Discovery: A Comparative Study},
	year      = {2022},
	isbn      = {9781450387132},
	publisher = {Association for Computing Machinery},
	address   = {New York, NY, USA},
	booktitle = {Proceedings of the 37th ACM/SIGAPP Symposium on Applied Computing},
	pages     = {713–716},
	numpages  = {4},
	keywords  = {open data, source discovery, similarity methods},
	location  = {Virtual Event},
	series    = {SAC '22}