SHIELD: Evolutionary Synthesis of Privacy-Preserving Pipelines for Live Stream Data Sharing
Type:
Conf
Authors:
Silvia Perelli, , Vincenzo Gulisano
In:
20th ACM International Conference on Distributed and Event-based Systems (DEBS), held in Lisbon (Portugal)
Year:
2026
Notes:
To appear
Abstract
Modern data-driven organizations rely on pipelines that transform data from infrastructure, applications, and users, where correctness and performance are critical for reliability and business value. These pipelines are often developed and optimized by teams separate from the data owners defining semantics, so meaningful testing and benchmarking require sharing representative data. This creates a tension: real data is needed to validate semantics, detect subtle errors, and tune performance, yet it is often sensitive and restricted by legal and commercial constraints. To enable privacy-preserving data sharing without sacrificing utility, we present SHIELD, a framework that automatically constructs stream processing (SP) pipelines that transform sensitive data on the fly into shareable data while retaining the characteristics required for downstream processing and optimization. Internally, SHIELD leverages evolutionary computation to synthesize executable SP queries under predefined privacy constraints and utility requirements. Using real-world use cases, we show that SHIELD can synthesize privacy-preserving pipelines that retain analytical value while scaling to realistic workloads.
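The abstract does not describe SHIELD's operator set, genome encoding, or fitness function. As a rough illustration only, the sketch below shows a generic genetic search over per-field map operators applied to a small window of stream tuples. A k-anonymity check stands in for the privacy constraint, and a hand-assigned information-loss score stands in for the utility requirement. The operator menu, loss weights, `k = 2`, the toy events, and every function name here are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter

# Hypothetical window of stream tuples (not data from the paper).
EVENTS = [
    {"user_id": "u1", "age": 34, "zip": "41296", "latency_ms": 12.0},
    {"user_id": "u2", "age": 37, "zip": "41258", "latency_ms": 15.5},
    {"user_id": "u3", "age": 52, "zip": "41301", "latency_ms": 9.8},
    {"user_id": "u4", "age": 55, "zip": "41377", "latency_ms": 11.2},
]

# Hypothetical operator menu: per field, (name, transform, information_loss).
# "drop" projects the field out by setting it to None; it is always listed last.
OPERATORS = {
    "user_id": [("keep", lambda v: v, 0), ("drop", lambda v: None, 1)],
    "age": [("keep", lambda v: v, 0), ("bucket10", lambda v: v // 10 * 10, 1),
            ("drop", lambda v: None, 3)],
    "zip": [("keep", lambda v: v, 0), ("prefix3", lambda v: v[:3], 1),
            ("drop", lambda v: None, 3)],
    "latency_ms": [("keep", lambda v: v, 0), ("drop", lambda v: None, 3)],
}
FIELDS = list(OPERATORS)
QUASI_IDENTIFIERS = ("user_id", "age", "zip")  # assumed privacy-relevant fields
ALL_DROP = tuple(len(OPERATORS[f]) - 1 for f in FIELDS)  # trivially private genome


def apply_pipeline(genome, event):
    """Run the map stage encoded by `genome` (one operator index per field)."""
    return {f: OPERATORS[f][g][1](event[f]) for f, g in zip(FIELDS, genome)}


def is_private(genome, events, k=2):
    """k-anonymity: every quasi-identifier combination occurs at least k times."""
    outputs = (apply_pipeline(genome, e) for e in events)
    groups = Counter(tuple(o[f] for f in QUASI_IDENTIFIERS) for o in outputs)
    return all(count >= k for count in groups.values())


def fitness(genome, events):
    """Infeasible pipelines score -inf; feasible ones score minus total loss."""
    if not is_private(genome, events):
        return float("-inf")
    return -sum(OPERATORS[f][g][2] for f, g in zip(FIELDS, genome))


def evolve(events, generations=30, pop_size=12, seed=0):
    rng = random.Random(seed)
    # Seed with the all-drop genome so at least one feasible pipeline exists.
    pop = [ALL_DROP] + [tuple(rng.randrange(len(OPERATORS[f])) for f in FIELDS)
                        for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=lambda g: fitness(g, events), reverse=True)
        parents = pop[: pop_size // 2]  # elitism: the best half survives
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, len(FIELDS))  # one-point crossover
            child = list(a[:cut] + b[cut:])
            i = rng.randrange(len(FIELDS))  # mutate one randomly chosen gene
            child[i] = rng.randrange(len(OPERATORS[FIELDS[i]]))
            children.append(tuple(child))
        pop = parents + children
    return max(pop, key=lambda g: fitness(g, events))
```

Example usage: `best = evolve(EVENTS)` returns a genome, and `[OPERATORS[f][g][0] for f, g in zip(FIELDS, best)]` lists the chosen operator names. Because the elite half is carried over unchanged, the returned pipeline is never worse than dropping every field. Treating privacy as a hard constraint (infeasible genomes score `-inf`) rather than as a weighted penalty is one simple design choice; the paper may handle constraints differently.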