Publications

Towards open data discovery: a comparative study

Abstract

Open Data discovery enables the retrieval of the data sources most likely to contain the information needed, facilitating data access and transparency. This work presents a comparative study of three methods: a hybrid algorithm based on Linear Discriminant Analysis and Word2Vec, the cosine similarity measure, and a Semantic Test proposed for Open Data search. Each method was evaluated on its ability to identify, among eight open datasets and using only their metadata and descriptions, the dataset most likely to answer an input question. Three evaluation rounds were performed with different sets of questions and databases, and all methods achieved a classification accuracy above 81%.
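Cosine similarity is one of the compared methods. As a rough illustration only, and not the authors' implementation, the sketch below ranks dataset descriptions against a question using plain term-frequency vectors. The tokenizer, the absence of any weighting or stop-word removal, and the dataset names and descriptions are all assumptions made for this example; the paper's actual preprocessing and datasets are not described here.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens (assumed preprocessing)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    # Counter returns 0 for missing terms, so only shared terms add to the dot product.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_datasets(question: str, descriptions: dict[str, str]) -> list[tuple[str, float]]:
    """Score every dataset description against the question, best match first."""
    q_vec = Counter(tokenize(question))
    scores = [
        (name, cosine(q_vec, Counter(tokenize(text))))
        for name, text in descriptions.items()
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical datasets, described only by short metadata text.
datasets = {
    "air_quality": "Hourly air quality measurements of pollutant levels per monitoring station",
    "public_budget": "Municipal budget and public spending by department and year",
    "school_census": "School census with enrollment numbers per school and grade",
}

ranking = rank_datasets("Which department had the highest public spending?", datasets)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Here the question shares "department", "public" and "spending" with the budget description, so that dataset ranks first; the other two share no tokens with the question and score zero. Exact word matching is the main limitation of this baseline (for instance, "spend" would not match "spending"), which is one motivation for embedding-based or semantic approaches.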

URL

https://dl.acm.org/doi/10.1145/3477314.3507351

DOI

10.1145/3477314.3507351

BibTeX

@inproceedings{Franciscatto2022SAC,
	author    = {Franciscatto, Maria Helena and Fabro, Marcos Didonet Del and Trois, Celio and Tissot, Hegler},
	title     = {Towards Open Data Discovery: A Comparative Study},
	year      = {2022},
	isbn      = {9781450387132},
	publisher = {Association for Computing Machinery},
	address   = {New York, NY, USA},
	booktitle = {Proceedings of the 37th ACM/SIGAPP Symposium on Applied Computing},
	pages     = {713--716},
	numpages  = {4},
	keywords  = {open data, source discovery, similarity methods},
	location  = {Virtual Event},
	series    = {SAC '22}
}