17–19 Sept 2025
Technical University of Moldova
Europe/Bucharest timezone

Information Retention in Trimmed Datasets

18 Sept 2025, 12:10
15m
Room 1

Technical University of Moldova
Paper presentation
Grid, Cloud & High Performance Computing in Science
High Performance Computing in Science

Speaker

Florin Bogdan MANOLACHE (Carnegie Mellon University)

Description

The structure and usage scenarios of a software package for trimming datasets with minimal information loss are described. Several information models, applied to a large dataset generated by an enterprise information system, are analyzed. Different strategies and procedures are compared to find the best compromise between computing time and information retention. A set of data profiling tests is presented for detecting anomalies such as data flooding. The results show that a block trimming strategy preserves most of the information while speeding up the computation by one or more orders of magnitude. The software automatically detects the optimum trimming level associated with the model, allowing autonomous real-time control of large datasets.
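The abstract does not specify the information models or the trimming procedure, so the following is only an illustrative sketch under stated assumptions, not the package's actual method. It assumes that information is measured as the Shannon entropy of a single categorical column, that block trimming means keeping one contiguous block of records out of every few blocks, and that the optimum level is the strongest trimming that still meets a retention threshold. All function names and parameters (`block_trim`, `retention`, `strongest_trim`, `min_retention`) are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of values."""
    total = len(values)
    if total == 0:
        return 0.0
    counts = Counter(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def block_trim(records, block_size, keep_every):
    """Assumed trimming rule: keep one block of `block_size` records
    out of every `keep_every` consecutive blocks."""
    kept = []
    for start in range(0, len(records), block_size * keep_every):
        kept.extend(records[start:start + block_size])
    return kept


def retention(records, trimmed):
    """Share of the original entropy still present after trimming, capped at 1."""
    full = shannon_entropy(records)
    if full == 0:
        return 1.0
    return min(1.0, shannon_entropy(trimmed) / full)


def strongest_trim(records, block_size, levels, min_retention=0.95):
    """Largest trimming level in `levels` whose retention meets the threshold.

    Returns 1 (no trimming) if no candidate level qualifies.
    """
    best = 1
    for level in sorted(levels):
        trimmed = block_trim(records, block_size, level)
        if retention(records, trimmed) >= min_retention:
            best = level
    return best
```

For example, `strongest_trim(records, block_size=100, levels=[2, 5, 10, 50])` would scan the candidate levels and return the most aggressive one that keeps at least 95% of the entropy under this assumed model; a real information model would likely involve more than one column.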

Author

Florin Bogdan MANOLACHE (Carnegie Mellon University)

Co-authors

Octavian Rusu (Alexandru Ioan Cuza University, Iasi, Romania)
Xinran Su (Carnegie Mellon University)
Zhilan Wang (Carnegie Mellon University)
