Book title: Artificial Intelligence By Example
Author: Denis Rothman
Last updated: 2021-06-25 21:33:37

The k-means clustering program

k-means clustering is a powerful unsupervised learning algorithm: it groups unlabeled data points into k clusters, each organized around its own center. We often perform k-means clustering informally in our everyday lives. Take, for example, a lunch you want to organize for a team of about 50 people in an open space that can barely fit them. It will be a bit cramped, but a project must be finished that day, and the lunch is a good way to bond the team for the final sprint.

You and a friend first decide to set up a table in the middle. Your friend then points out that the people in the room will form one big cluster k, and that a single table at the geometric center (or centroid) c will not be practical. The people near the walls will not have easy access to the main table.

You have an idea. You call Pert, who runs a computation to confirm your friend's intuition. Pert shows you both the problem with the following one-table plan:

The people not close to the table (rectangle in the middle) will not have easy access to the table.
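To make the one-table problem concrete, here is a minimal sketch in Python with NumPy. It is not Pert's program: the 10 m by 10 m room, the random seating, and the names `people`, `table`, and `distances` are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical data: 50 people placed at random in a 10 m x 10 m room.
rng = np.random.default_rng(seed=0)
people = rng.uniform(0.0, 10.0, size=(50, 2))

# One table at the centroid: the coordinate-wise mean of all positions.
table = people.mean(axis=0)

# Euclidean distance from each person to that single table.
distances = np.linalg.norm(people - table, axis=1)

print("table position:", table.round(2))
print("mean distance: ", round(float(distances.mean()), 2))
print("max distance:  ", round(float(distances.max()), 2))
```

With a single centroid, the maximum distance is noticeably larger than the mean distance: the people seated near the walls are the ones who pay for it.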

You go to the room with your friend and Pert and try moving two tables, c₁ and c₂, to various places for two clusters of people, k₁ and k₂.

The people x₁ to xₙ form a dataset X. Your friend still has doubts and says, "Look at table c₁. It is badly positioned. Some people x near the wall will be right next to it, and the others will be too far away. We need the table c₁ to be in the center of that cluster k₁. The same goes for table c₂."

You and your friend move a table c, and then estimate that the mean distance of the people x around it will be about the same within their group or cluster k. You do the same for the other table. You draw a line with chalk on the floor to make sure that each group or cluster is at about the mean distance from its table.

Pert speaks up and says, "Gee, we can simulate that with this Python program I am writing for my project!":

Step 1: You have been drawing lines with chalk to decide which group (cluster k) each person x will be in, by looking at the mean distance from the table c.
Step 2: You have been moving the tables around accordingly to optimize step 1.
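The two steps above can be sketched as a short loop. This is a minimal illustration, not Pert's actual program: the function name `lloyd_kmeans`, its parameters, and the random room data are hypothetical. Step 1 is interpreted here as assigning each person to the nearest table, and step 2 as moving each table to the mean position of its group.

```python
import numpy as np

def nearest_table(X, centroids):
    """Step 1: index of the closest table for each person."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def lloyd_kmeans(X, k, n_iter=100, seed=0):
    """Alternate steps 1 and 2 until the tables stop moving."""
    rng = np.random.default_rng(seed)
    # Assumption: start with the tables at k distinct, randomly chosen people.
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = nearest_table(X, centroids)
        # Step 2: move each table to the mean position of its group.
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # an empty group leaves its table in place
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break  # no table moved, so the groups will not change either
        centroids = new_centroids
    # Final assignment so the labels match the returned table positions.
    return centroids, nearest_table(X, centroids)

# Hypothetical data: 50 people in a 10 m x 10 m room, three tables.
X = np.random.default_rng(1).uniform(0.0, 10.0, size=(50, 2))
centroids, labels = lloyd_kmeans(X, k=3)
for j, c in enumerate(centroids):
    print(f"table {j}: position {c.round(2)}, {int(np.sum(labels == j))} people")
```

The result depends on where the tables start, so a different `seed` can produce a different seating plan. In practice, people often run the loop from several starting points and keep the plan with the smallest total distance.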

Pert shows you the two-table model computed by the k-means clustering program; it looks exactly like what you just did. You then add a third table, and it looks good. Pert says, "Look at what you did in the following screenshot.":

You and your friend look at Pert and say, "So what?"

Pert smiles and says, "Well, you just re-invented Lloyd's algorithm to solve a k-means clustering problem!"

You and your friend are curious. You want to know more. So you each get a paperboard and ask Pert to explain, in mathematical terms, how this model applies to the warehouse project or to any other project.