The clusters formed should be highly internally homogeneous and highly externally heterogeneous. Clustering techniques attempt to minimize variation within groups while maximizing variation between groups.
Clustering methods may be top-down, employing logical division, or bottom-up, undertaking aggregation. Aggregation procedures, which combine cases by assessing their similarities, are the most commonly used.
Cluster Analysis is used in:
- Market Segmentation
- New Product Development and Product Positioning
Some general approaches include:
Hierarchical clustering is used on smaller samples to group cases by similarity or distance. The number of clusters retained depends on the research objective and may increase or decrease depending on how close or similar the cases are. Typically, after the researcher has used this technique, the entire data set is analyzed using K-means.
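The bottom-up (aggregation) procedure can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the function name `agglomerate`, the sample points, and the use of nearest-neighbor distance between groups are assumptions made for the example.

```python
import math
from itertools import combinations

def group_distance(g1, g2):
    """Nearest-neighbor distance between two groups (assumed for this sketch)."""
    return min(math.dist(a, b) for a in g1 for b in g2)

def agglomerate(points, n_clusters):
    """Start with one cluster per case and repeatedly merge the closest pair."""
    clusters = [[p] for p in points]
    merges = []  # history of (merge distance, merged cluster)
    while len(clusters) > n_clusters:
        # Find the pair of clusters separated by the smallest distance.
        best_pair, best_d = None, math.inf
        for i, j in combinations(range(len(clusters)), 2):
            d = group_distance(clusters[i], clusters[j])
            if d < best_d:
                best_pair, best_d = (i, j), d
        i, j = best_pair
        merged = clusters[i] + clusters[j]
        merges.append((best_d, merged))
        # Replace the two merged clusters with their union.
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters, merges

# Hypothetical sample data: two tight pairs and one distant case.
cases = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
clusters, merges = agglomerate(cases, n_clusters=2)
for distance, members in merges:
    print(round(distance, 3), members)
```

The recorded merge distances are the information a tree diagram of the clustering would display, which is one way to judge how many clusters to keep.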
K-means Clustering (Non-Hierarchical):
K-means clustering uses Euclidean distance, and the desired number of clusters (K) is determined in advance. After the initial cluster centers are chosen at random, each observation is assigned to the cluster whose mean is nearest. The cluster means are then recalculated and the assignment is repeated. The intention is to minimize variance within clusters and maximize variability between clusters. The process continues until the assignments stop changing or an iteration limit is reached.
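As a rough sketch of the K-means procedure described above, the following Python code picks random starting centers from the data, assigns each observation to the nearest center, and recomputes the means until nothing changes. The function name `kmeans`, the `max_iter` and `seed` parameters, and the sample data are assumptions for illustration only.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Basic K-means with Euclidean distance (illustrative sketch)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # random initial centers drawn from the data
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Assignment step: each observation joins the nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: each center moves to the mean of its cluster.
        new_centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:  # converged: centers no longer move
            break
        centers = new_centers
    return centers, clusters

# Hypothetical data: two well-separated groups along the x-axis.
data = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0),
        (100.0, 0.0), (101.0, 0.0), (102.0, 0.0)]
centers, clusters = kmeans(data, k=2)
print(sorted(centers))
```

Because the starting centers are random, results on less clearly separated data can differ from run to run; fixing the seed or trying several starts is a common precaution.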
- Multidimensional Scaling (MDS)
- Discriminant Analysis
The two key steps within cluster analysis are measuring the distances between objects and grouping the objects based upon the resultant distances (linkages). The distances may be measured in a variety of ways, such as the Euclidean and Manhattan metrics. Linkages are based upon how the association between groups is measured.
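To make the two distance measures concrete, here is a small Python sketch. The function names and the sample points are chosen for illustration.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute coordinate differences (city-block distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))

p, q = (0, 0), (3, 4)
print(euclidean(p, q))  # 5.0
print(manhattan(p, q))  # 7
```

The same pair of objects is 5 units apart under one metric and 7 under the other, so the choice of metric can change which objects count as nearest.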
Some Commonly Used Terminology:
Single linkage, or nearest-neighbor distance, measures the distance between the nearest objects in two groups, while complete linkage, or furthest-neighbor linkage, measures the distance between the furthest objects.
Centroid linkage computes a new value representing the group centroid, which is compared with the ungrouped point (or with another group's centroid) to weigh inclusion.
Ward's method is variance based: the within-group variance is assessed to decide which groups to merge. Ward's is a popular default linkage that tends to produce compact groups of similar size.
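The linkage rules above can be compared side by side on two small groups. The sketch below is illustrative only: the function names and sample groups are assumptions, and the Ward quantity shown is the standard increase in within-group sum of squares that a merge would cause, which is one common way to express the criterion.

```python
import math
from itertools import product

def nearest_neighbor(g1, g2):
    """Nearest-neighbor linkage: distance between the closest pair of objects."""
    return min(math.dist(a, b) for a, b in product(g1, g2))

def furthest_neighbor(g1, g2):
    """Furthest-neighbor (complete) linkage: distance between the furthest pair."""
    return max(math.dist(a, b) for a, b in product(g1, g2))

def centroid(group):
    """Mean position of the objects in a group."""
    n = len(group)
    return tuple(sum(coord) / n for coord in zip(*group))

def centroid_linkage(g1, g2):
    """Distance between group centroids (a single point is a group of one)."""
    return math.dist(centroid(g1), centroid(g2))

def ward_increase(g1, g2):
    """Increase in within-group sum of squares if the two groups were merged."""
    n1, n2 = len(g1), len(g2)
    return (n1 * n2) / (n1 + n2) * centroid_linkage(g1, g2) ** 2

group_a = [(0, 0), (1, 0)]
group_b = [(4, 0), (6, 0)]
print(nearest_neighbor(group_a, group_b))   # 3.0
print(furthest_neighbor(group_a, group_b))  # 6.0
print(centroid_linkage(group_a, group_b))   # 4.5
print(ward_increase(group_a, group_b))      # 20.25
```

The same two groups are anywhere from 3 to 6 units apart depending on the rule, which is why different linkages can produce different clusterings of the same data.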
Standardization of variables is undertaken to minimize the bias in weighting that may result from differing measurement scales and ranges. Converting each variable to Z-scores (subtracting its mean and dividing by its standard deviation) puts all variables on a common scale with a mean of 0 and a standard deviation of 1. Multicollinearity will bias the clusters, because highly correlated variables effectively count the same underlying dimension more than once.
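A short Python sketch shows why standardization matters before distances are computed. The variable names and values are hypothetical, and the sample standard deviation is assumed here; some tools use the population formula instead.

```python
import statistics

def zscores(values):
    """Rescale values to mean 0 and (sample) standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # assumption: sample standard deviation
    if sd == 0:
        return [0.0 for _ in values]  # a constant variable carries no information
    return [(v - mean) / sd for v in values]

# Hypothetical variables measured on very different scales.
income = [20000, 40000, 60000]
age = [25, 35, 45]

print(zscores(income))
print(zscores(age))
```

On the raw scale, income differences would swamp age differences in any distance calculation; after conversion, both variables contribute on the same footing.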
Output of a Cluster Analysis
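The main output of a hierarchical cluster analysis is a dendrogram, also called a tree diagram, which shows the order in which cases and groups were merged and the distance at which each merge occurred.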
Market Equations - Research | Analytics | Outsourcing
- Bangalore, Delhi