Hussain, Mir Wajahat; Roy, Diptendu Sinha
Date issued: 2022 (accessioned/available: 2025-03-21)
Vol. 218, pp. 101-118
ISBN: 978-981-16-8932-1; 978-981-16-8930-7; 978-981-16-8929-1
ISSN: 1868-4394; eISSN: 1868-4408
DOI: https://doi.org/10.1007/978-981-16-8930-7_4
Handle: https://gnanaganga.alliance.edu.in/handle/123456789/5369

Title: A Counter-Based Profiling Scheme for Improving Locality Through Data and Reducer Placement
Type: Book chapter
Language: en
Keywords: Data placement; Hadoop; HDFS; MapReduce; Reducer placement

Abstract: Hadoop has been regarded as the de facto standard for handling data-intensive distributed applications, with its popular storage and processing engines, the Hadoop Distributed File System (HDFS) and MapReduce. Hadoop's inherent assumption of homogeneity in the cluster is a major cause of performance deterioration, owing to the large shuffle required to move data between the map and reduce phases. This chapter addresses this deterioration by proposing a counter placement scheme (CPS), whose main contributions are as follows: (i) profiling of nodes based on the completion of maps; (ii) movement of high-performance nodes into a single rack for tracking higher computation; (iii) a data placement strategy that places at least a single block of a file in the rack with the highest computation; and (iv) finally, assigning reducers to the rack and node with the highest computation. The experiments performed clearly signify the merits of CPS, with reductions in average completion time, reduce time, and off-local shuffle of about 1.9-22.83%, 2.1-21.5%, and 4.25-24%, respectively, while running several benchmarks. © 2022, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
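
The counter-based idea behind steps (i) and (iv) of the abstract can be sketched as follows. This is a minimal illustration, not the chapter's actual implementation: the function names (`profile_nodes`, `best_rack_and_node`), the node/rack identifiers, and the tie-breaking behavior of `max` are all assumptions made for the sketch.

```python
from collections import Counter

def profile_nodes(map_completions):
    """Step (i), sketched: count completed map tasks per node.

    `map_completions` is a hypothetical event log, one node name per
    finished map task; nodes finishing more maps are treated as
    higher-performance.
    """
    counters = Counter()
    for node in map_completions:
        counters[node] += 1
    return counters

def best_rack_and_node(counters, rack_of):
    """Step (iv), sketched: pick the rack with the highest aggregate
    map-completion count, then the best node inside it, as the
    reducer placement target. `rack_of` maps node -> rack."""
    rack_totals = Counter()
    for node, done in counters.items():
        rack_totals[rack_of[node]] += done
    best_rack = max(rack_totals, key=rack_totals.get)
    best_node = max((n for n in counters if rack_of[n] == best_rack),
                    key=counters.get)
    return best_rack, best_node
```

For example, with completions `["n1", "n2", "n1", "n3", "n1", "n2"]` and racks `{"n1": "r1", "n2": "r1", "n3": "r2"}`, rack `r1` accumulates 5 completions versus 1 for `r2`, so the reducer target is `("r1", "n1")`.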