Clustering Algorithms for Big Data: A Survey

doi:10.1201/9781315368061-17

ABSTRACT

There is an explosion in data generation from multiple sources such as social media, business enterprises, and sensors. Such vast amounts of data, termed Big Data, need to be analyzed. However, Big Data cannot be processed effectively with traditional methods. Data mining techniques are well-known and powerful knowledge discovery tools, in which clustering plays an important role in uncovering hidden patterns and extracting meaningful and accurate information. Clustering divides the data into groups called clusters, such that the similarity between objects within a cluster (intracluster similarity) is much higher than the similarity between objects in different clusters (intercluster similarity). Most traditional techniques are suited to small data sets and generally run on a single machine. As data size grows, however, handling the data on a single machine becomes impractical, because it requires huge storage and computation capacity. The main objective of this chapter is to give readers insight into the clustering techniques available for Big Data. The pros and cons of each scheme are discussed, along with a brief description of some algorithms in each technique. Methods for Big Data computing are also discussed, to give researchers direction and motivation for dealing with intricate data sets.
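To make the definition of a cluster concrete, the following is a minimal sketch, not one of the algorithms surveyed in this chapter, that runs classic k-means (Lloyd's algorithm) on a small synthetic two-blob data set. It then compares the mean intracluster and intercluster distances, treating Euclidean distance as the inverse of similarity. The data set, the function names `kmeans` and `mean_pairwise_distance`, and all parameter values are hypothetical choices made only for illustration. Note that it runs entirely in memory on a single machine, which is precisely the setting that the abstract argues does not scale to Big Data.

```python
import math
import random

# Illustrative sketch only: function names, data, and parameters are hypothetical.

def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's k-means on 2-D points; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize with k distinct data points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return centroids, labels


def mean_pairwise_distance(points, labels, same):
    """Mean distance over point pairs in the same (same=True) or different clusters."""
    total, count = 0.0, 0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if (labels[i] == labels[j]) == same:
                total += math.dist(points[i], points[j])
                count += 1
    return total / count


# Two well-separated synthetic Gaussian blobs (hypothetical data).
data_rng = random.Random(42)
blob_a = [(data_rng.gauss(0, 0.5), data_rng.gauss(0, 0.5)) for _ in range(50)]
blob_b = [(data_rng.gauss(5, 0.5), data_rng.gauss(5, 0.5)) for _ in range(50)]
points = blob_a + blob_b

centroids, labels = kmeans(points, k=2)
intra = mean_pairwise_distance(points, labels, same=True)
inter = mean_pairwise_distance(points, labels, same=False)
print(f"mean intracluster distance: {intra:.2f}")
print(f"mean intercluster distance: {inter:.2f}")
```

A good clustering should report a much smaller intracluster distance than intercluster distance. The nested pairwise loop costs O(n²) time, which is harmless here but is one small example of why single-machine methods struggle as data sets grow.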