Description
In recent years Apache Spark has become one of the most important Big Data platforms. In-memory processing performance and the ability to connect to virtually any major data server, source, or format have been two of the main drivers of Spark’s popularity. However, finding the most suitable setup for a given data processing task is challenging: performance depends not only on the data structure and the nature and complexity of the task, but also on the myriad of setup parameters to be tuned. In this paper we propose a model for assessing the processing performance of a Spark-and-Hadoop cluster deployed on a university cloud managed with OpenStack. Randomly generated SparkSQL queries on the TPC-H benchmark schema were executed on data sets of 5 GB, 10 GB, and 50 GB, varying four data source formats and two memory settings. Predictive models built with three Machine Learning techniques (Multivariate Adaptive Regression Splines, Random Forest, and Extreme Gradient Boosting) provided encouraging results. For the given data sets, the most important predictors appear to be related to the volume of processed data and the query complexity, whereas the data source formats and memory settings seem less important.
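As a rough illustration of the kind of measurement the description implies, the PySpark sketch below runs the same SparkSQL query against a TPC-H table stored in different source formats and records the wall-clock execution time. This is not the authors' actual harness: the four formats, the file paths, and the sample query (a TPC-H Q1-style aggregation over lineitem) are illustrative assumptions only.

import time
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("tpch-format-benchmark").getOrCreate()

# TPC-H Q1-style aggregation over the lineitem table (illustrative query).
QUERY = """
    SELECT l_returnflag, l_linestatus,
           SUM(l_extendedprice * (1 - l_discount)) AS revenue
    FROM lineitem
    GROUP BY l_returnflag, l_linestatus
"""

# Hypothetical formats and paths; the paper's four formats are not named here.
sources = {
    "parquet": "/data/tpch/lineitem.parquet",
    "orc":     "/data/tpch/lineitem.orc",
    "csv":     "/data/tpch/lineitem.csv",
    "json":    "/data/tpch/lineitem.json",
}

for fmt, path in sources.items():
    df = spark.read.format(fmt).option("header", "true").load(path)
    df.createOrReplaceTempView("lineitem")
    start = time.time()
    spark.sql(QUERY).collect()   # force full execution of the query
    print(f"{fmt}: {time.time() - start:.1f} s")

Recording such timings across data volumes, formats, memory settings, and query features would yield the kind of training data on which regression models such as Random Forest or Extreme Gradient Boosting could then be fitted to predict processing performance.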