Evaluating the Effectiveness of Margin Parameter when Learning Knowledge Embedding Representation for Domain-specific Multi-relational Categorized Data
Learning knowledge embedding representations is an increasingly important technique.
However, the choice of hyperparameters is seldom justified and usually relies on
exhaustive search. Understanding the effect of hyperparameter combinations on embedding
quality is crucial to avoiding this inefficient process and to making embedding
representations more practical for subsequent machine learning applications.
This work focuses on translational embedding models for multi-relational categorized
data in the clinical domain. We trained and evaluated models with different combinations
of hyperparameters on two clinical datasets. We contrasted the results by comparing
metric distributions and fitting a random forest regression model. Classifiers were
trained to assess embedding representation quality. Finally, clustering was tested as a
validation protocol. We observed consistent patterns of hyperparameter preference and
identified the hyperparameter settings that achieved better results. However, the
results for link prediction show different patterns, which we take as strong evidence
that the traditional evaluation protocol used for open-domain data does not necessarily
lead to the best embedding representation for categorized data.
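
To make the role of the margin hyperparameter concrete, the following is a minimal sketch of the margin-based ranking loss commonly used to train translational embedding models. It assumes a TransE-style score with Euclidean distance, which may differ from the exact models studied here; the function names and the toy two-dimensional embeddings are hypothetical and chosen only for illustration.

```python
import math

def translational_distance(head, relation, tail):
    """Assumed TransE-style dissimilarity: Euclidean norm of (h + r - t)."""
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))

def margin_ranking_loss(pos_dist, neg_dist, margin):
    """Hinge loss for one (true, corrupted) triple pair:
    max(0, margin + d(true) - d(corrupted))."""
    return max(0.0, margin + pos_dist - neg_dist)

# Toy embeddings, made up for illustration only.
head = [1.0, 0.0]
relation = [0.0, 1.0]
tail = [1.0, 1.0]          # h + r equals t, so the true triple has distance 0
corrupt_tail = [4.0, 5.0]  # corrupted triple lies at distance 5

pos = translational_distance(head, relation, tail)
neg = translational_distance(head, relation, corrupt_tail)

loss_small_margin = margin_ranking_loss(pos, neg, margin=1.0)
loss_large_margin = margin_ranking_loss(pos, neg, margin=8.0)
```

In this toy case the corrupted triple is already more than 1.0 further away than the true triple, so the small margin yields zero loss and no update, while the larger margin of 8.0 still yields a loss of 3.0. This is one way the margin can influence which triples keep shaping the embedding during training.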