A Counter-Based Profiling Scheme For Improving Locality Through Data and Reducer Placement

Roy, Diptendu Sinha

doi:https://doi.org/10.1007/978-981-16-8930-7_4

Date Issued

2022

Author(s)

Hussain, Mir Wajahat

Roy, Diptendu Sinha

DOI

https://doi.org/10.1007/978-981-16-8930-7_4

Abstract

Hadoop is regarded as the de facto standard for data-intensive distributed applications, with its popular storage and processing engines, the Hadoop Distributed File System (HDFS) and MapReduce. Hadoop’s inherent assumption of cluster homogeneity is a major cause of performance deterioration, owing to the large shuffle required when processing data across the map and reduce phases. This chapter addresses this deterioration by proposing a counter placement scheme (CPS) whose main contributions are: (i) profiling of nodes based on the completion of maps, (ii) movement of high-performance nodes into a single rack for tracking higher computation, (iii) a data placement strategy that places at least one block of each file in the rack with the highest computation, and (iv) assignment of reducers to the rack and node with the highest computation. Experiments show that CPS reduces average completion time, reduce time, and off-local shuffle by about 1.9–22.83%, 2.1–21.5%, and 4.25–24%, respectively, across several benchmarks. © 2022, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
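The four steps listed in the abstract can be illustrated with a small, self-contained sketch. This is not the chapter's implementation and does not use any Hadoop API: the function names (`profile_nodes`, `fastest_nodes`, `place_replicas`, `pick_reducer_node`), the node labels, the round-robin replica choice, and the replication factor of 3 are all assumptions made for illustration. The "fast rack" is modelled here simply as a list of the nodes with the highest completed-map counts.

```python
from collections import Counter


def profile_nodes(completed_maps):
    """Step (i): count completed map tasks per node (hypothetical input:
    one node name per finished map task)."""
    return Counter(completed_maps)


def fastest_nodes(counts, rack_size):
    """Step (ii): pick the rack_size nodes with the most completed maps;
    this list stands in for the single high-computation rack."""
    return [node for node, _ in counts.most_common(rack_size)]


def place_replicas(block_id, fast_rack, other_nodes, replicas=3):
    """Step (iii): put one replica on the fast rack and the remaining
    replicas on other nodes, chosen round-robin (an assumed policy)."""
    first = fast_rack[block_id % len(fast_rack)]
    extra = min(replicas - 1, len(other_nodes))
    rest = [other_nodes[(block_id + i) % len(other_nodes)] for i in range(extra)]
    return [first] + rest


def pick_reducer_node(counts, fast_rack):
    """Step (iv): assign the reducer to the fast-rack node with the
    highest completed-map count."""
    return max(fast_rack, key=lambda node: counts[node])


# Hypothetical trace of which node finished each map task.
completed = ["n1", "n2", "n1", "n3", "n1", "n2", "n4"]
counts = profile_nodes(completed)
fast_rack = fastest_nodes(counts, rack_size=2)
others = [n for n in counts if n not in fast_rack]

print(fast_rack)                              # nodes grouped into the fast rack
print(place_replicas(0, fast_rack, others))   # replica targets for block 0
print(pick_reducer_node(counts, fast_rack))   # node chosen for the reducer
```

Keeping one replica on the fast rack and running the reducer there is what would let more of the shuffle stay rack-local in this simplified model; the chapter's actual counters and placement rules may differ.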

Subjects

Data placement

Hadoop

HDFS

MapReduce

Reducer placement