Publications

SYNNER: Synthetic Data Generator Framework

 

Abstract

Objectives
Sharing medical data is hampered by technical, regulatory, and privacy challenges, including compliance with the Health Insurance Portability and Accountability Act of 1996 (HIPAA). Existing data anonymization methods are error-prone or vulnerable to re-identification, and synthetic data generation approaches are limited. This study introduces SYNNER, a novel synthetic data generation framework that overcomes these limitations, preserving data utility while ensuring privacy.

Methods
We employ knowledge graph embeddings to encode data into a k-dimensional space, capturing complex relationships. For each entity, we identify its nearest neighbors and use their characteristics to generate a synthetic version that maintains statistical consistency. We evaluated SYNNER on seven publicly available datasets, measuring the preservation of original data signals and comparing macro-F1 scores across prediction tasks. We also introduce a novel evaluation protocol for differential privacy that simulates an adversarial attack to infer missing values.
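The neighbor-based generation step described above can be illustrated with a minimal sketch. This is not the SYNNER implementation. It assumes that embeddings have already been learned (the hand-written 2-D vectors below are hypothetical stand-ins for knowledge graph embeddings), that neighbors are found by Euclidean distance, and that each synthetic attribute is sampled from a randomly chosen neighbor. The function names, record fields, and sampling rule are all illustrative assumptions.

```python
import math
import random


def nearest_neighbors(embeddings, idx, k):
    """Return indices of the k entities closest to embeddings[idx].

    Assumption: Euclidean distance in the embedding space.
    """
    target = embeddings[idx]
    dists = [
        (math.dist(target, vec), j)
        for j, vec in enumerate(embeddings)
        if j != idx  # an entity is never its own neighbor
    ]
    dists.sort()
    return [j for _, j in dists[:k]]


def synthesize_record(records, embeddings, idx, k, rng):
    """Build one synthetic record from the neighbors of entity idx.

    Assumption: each attribute is copied from a randomly chosen
    neighbor, so the original record's own values are never read.
    """
    neighbors = nearest_neighbors(embeddings, idx, k)
    synthetic = {}
    for field in records[idx]:
        donor = rng.choice(neighbors)
        synthetic[field] = records[donor][field]
    return synthetic


def synthesize_dataset(records, embeddings, k=3, seed=0):
    """Generate one synthetic record per original entity."""
    rng = random.Random(seed)
    return [
        synthesize_record(records, embeddings, i, k, rng)
        for i in range(len(records))
    ]


# Hypothetical toy data: two clusters of similar entities.
records = [
    {"age_group": "60-69", "diagnosis": "diabetes"},
    {"age_group": "60-69", "diagnosis": "hypertension"},
    {"age_group": "70-79", "diagnosis": "diabetes"},
    {"age_group": "20-29", "diagnosis": "asthma"},
    {"age_group": "30-39", "diagnosis": "asthma"},
]
embeddings = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (5.0, 5.0), (5.1, 5.0)]

if __name__ == "__main__":
    for row in synthesize_dataset(records, embeddings, k=2, seed=0):
        print(row)
```

Sampling attributes independently from different neighbors is only one possible design. It keeps each synthetic value plausible for the local neighborhood, but it can weaken correlations between attributes, so a real system would need to check that statistical consistency is retained.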

Results
The evaluation shows that SYNNER maintains an average of 83.2% of the signals from the original datasets. In predictive tasks, models trained on SYNNER-generated data achieved a proportional average macro-F1 score of 74.4%, comparable to those trained on the original data. The proposed evaluation protocol for differential privacy assesses whether synthetic datasets meet expected privacy standards and highlights potential risks of individual data point reconstruction.
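As a rough illustration of how a proportional macro-F1 score might be computed, the sketch below compares a model trained on synthetic data with one trained on the original data, both scored on the same held-out labels. It assumes that "proportional" means the ratio of the two macro-F1 values; the abstract does not define the term, so this reading is an assumption. The labels and predictions are invented for the example and do not come from the study.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); guard against an empty class.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


def proportional_score(f1_synthetic, f1_original):
    """Assumed definition: synthetic-data score as a fraction of the original."""
    return f1_synthetic / f1_original


# Hypothetical held-out labels and predictions from two models.
y_true = [0, 0, 1, 1, 2, 2]
pred_from_original = [0, 0, 1, 1, 2, 1]   # model trained on original data
pred_from_synthetic = [0, 1, 1, 1, 2, 0]  # model trained on synthetic data

f1_original = macro_f1(y_true, pred_from_original)
f1_synthetic = macro_f1(y_true, pred_from_synthetic)
ratio = proportional_score(f1_synthetic, f1_original)

if __name__ == "__main__":
    print(f"original: {f1_original:.3f}")
    print(f"synthetic: {f1_synthetic:.3f}")
    print(f"proportional: {ratio:.1%}")
```

Averaging such ratios across tasks and datasets would give a single summary figure, though how the study aggregates its scores is not stated in the abstract.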

Conclusion
SYNNER provides a scalable and effective way to generate synthetic data that maintains statistical fidelity. It overcomes the limitations of existing methods, offering a privacy-preserving solution for synthetic data generation and advancing research in sensitive domains such as healthcare.

 

URL
https://journals.sagepub.com/doi/10.1177/20552076251411621

 

DOI
10.1177/20552076251411621

 

BibTeX
@article{SYNNER2026,
    title = {SYNNER synthetic data generator framework},
    author = {Hegler Tissot and Justin Moore and Eric Benton and Sarah Alshahrani and Maria Helena Franciscatto and Marcos D Del Fabro},
    journal = {DIGITAL HEALTH},
    volume = {12},
    number = {},
    pages = {},
    year = {2026},
    doi = {10.1177/20552076251411621},
    URL = {https://journals.sagepub.com/doi/abs/10.1177/20552076251411621},
    eprint = {https://journals.sagepub.com/doi/pdf/10.1177/20552076251411621}
}