
Waste Allocation Load Lifter: Earth Class

Wall-E masters binary classification (metal vs. plastic) using logistic regression, then tackles multi-class metal identification with KNN — and learns the art of model selection.

13 min read

As a robot faithful to his mission, and despite his occasional forays into gold estimation, Wall-E never forgets his primary task: sorting waste. It is during his expeditions to remote corners of the planet that he encounters the ultimate challenge for his curious mind. Among the objects he gathers, Wall-E comes across a diverse collection of old electronic components: some in good condition, which he brings back to the base, and others, defective, which he stores in a corner.

Passionate about such objects, he begins to unscrew, pull apart, and carefully sort each component into two distinct categories (binary classification): on one side the precious metal pieces, which he will later sort by type of metal (multi-class classification), and on the other the worthless plastic. The fundamental question is: how does he accomplish such a mission?

It is strongly recommended to have read Episodes I and II before continuing!

Wall-E’s New Horizons: Classification

The epic journey of our intrepid lone robot, Wall-E, takes on a fascinating new dimension as he explores the vast realm of classification. After brilliantly mastering the art of regression in the previous episode (see Wall-E: The Little Gold Miner), Wall-E courageously embarks on a new phase by delving into binary classification between precious metals and plastics. This first preparatory step marks the beginning of a more complex quest, where Wall-E boldly deploys his classification skills.

Building on his initial successes, Wall-E decides to broaden his scope by tackling the multi-class classification of metals, making the K-Nearest Neighbors (KNN) algorithm his preferred ally. This new challenge requires Wall-E to gain a deeper understanding, as he must not only distinguish between two categories but also classify different types of metals such as bronze, gold, and silver.

The Key Element: Always the Data

Meeting the Samples

Each sample $k$ is an element with specific characteristics, such as its density $x_1^{(k)}$, thermal conductivity $x_2^{(k)}$, electrical conductivity $x_3^{(k)}$, and so on. To keep the illustration simple, we will consider only the density, written simply $x^{(k)}$.

For each sample $k$, Wall-E notes whether it is metal (labeled $y^{(k)}=1$) or plastic (labeled $y^{(k)}=0$). Here is a concrete example of 5 samples (among the 650 he already knows):

| Sample | Density | Material |
| --- | --- | --- |
| 1 | 2.165747 | Plastic |
| 2 | 7.151579 | Metal |
| 3 | 0.901240 | Plastic |
| 4 | 19.24357 | Metal |
| 5 | 12.54564 | Metal |

Decision Boundaries

The difference with the previous problem is that here we need to define what we call a decision boundary. Instead of looking at each object individually, Wall-E decides to divide this space into different regions. Each region is intended to receive a specific type of object, either metal or plastic. The limits of these regions define decision boundaries.

When Wall-E draws these boundaries, he wants to ensure that similar objects fall in the same region. Ideally, the two groups should be perfectly separated, as if in distinct boxes. A simple straight line could do the trick.

A good decision boundary classifying an object as metal or plastic
A bad decision boundary classifying an object as metal or plastic

However, reality is not always so simple. Sometimes objects end up on the wrong side of the line. Wall-E is also aware that he should not draw zigzagging, overly complex boundaries — that would lead to overfitting.

Building the Binary Classification Model

Wall-E chooses a classic logistic regression model whose associated function (ranging between 0 and 1) is expressed as:

$$\sigma(x) = \frac{1}{1+\exp(-x)}$$

This is the sigmoid function (named for its “S” shape).

Linear regression curve on material labels as a function of density
Credits: Disney/PIXAR

The logistic / sigmoid function.

Applied to our dataset, we obtain $\sigma(z) = \frac{1}{1+\exp(-z)}$, where $z$ can be a linear function $ax+b$ or a polynomial $ax^2 + bx + c$, with $a$, $b$, and $c$ as parameters to fit.

The big advantage of such a function is that we can easily define a decision boundary by setting a decision threshold value. If the probability is greater than 0.5, we classify the object as “metal”; below 0.5, as plastic.
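As a quick illustration, here is a minimal NumPy sketch of the sigmoid and the 0.5 threshold rule (the parameter values are placeholders, not Wall-E's fitted ones):

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict(density, a, b, threshold=0.5):
    """Classify samples as metal (1) or plastic (0) from their density."""
    proba_metal = sigmoid(a * density + b)
    return (proba_metal >= threshold).astype(int)

# Example with arbitrary (not fitted) parameters
densities = np.array([2.17, 7.15, 0.90, 19.24, 12.55])
print(predict(densities, a=1.0, b=-4.0))
```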

Logistic regression curve on material labels as a function of density
Credits: Disney/PIXAR

Evaluating Predictions: the Cost Function

Wall-E builds his model, but he wants to evaluate the accuracy of his predictions. He could use a classic regression cost function, MSE (see Wall-E: The Little Gold Miner):

$$\text{MSE}(a, b) = \frac{1}{n}\sum_{k=1}^{n}\left(\sigma(x^{(k)}) - y^{(k)}\right)^2.$$

However, this cost function is not convex; it has multiple local minima. This makes gradient descent inefficient — using the mountain-relief analogy, Wall-E might get trapped in a local minimum that is not necessarily the global one.

A non-convex function with several local minima and one global minimum.

Wall-E therefore introduces a new cost function based on the logarithm (negative log-likelihood):

$$L(a,b) = -\frac{1}{n}\sum_{k=1}^n\left[y^{(k)}\log\left\{\sigma\left(ax^{(k)}+b\right)\right\} + \left(1-y^{(k)}\right)\log\left\{1-\sigma\left(ax^{(k)}+b\right)\right\}\right]$$

This function is convex: it has a single global minimum and no other local minima. This makes gradient descent much more reliable for adjusting the model parameters.

The first part of the cost function, used when a label is 1, is $-y^{(k)}\log\left\{\sigma\left(x^{(k)}\right)\right\}$. The second part handles the case where the label is 0: $-\left(1-y^{(k)}\right)\log\left\{1-\sigma\left(x^{(k)}\right)\right\}$.

First part of the cost function
Second part of the cost function

When $y^{(k)}=1$, only the first part of the function acts; when $y^{(k)}=0$, only the second. The total cost function is the average of these two parts, taken over all $n$ training examples, to obtain a global measure of how well the model fits the training data.
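In code, this cost could be sketched as follows, assuming `x` and `y` are NumPy arrays holding the densities and the 0/1 labels:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(a, b, x, y, eps=1e-12):
    """Negative log-likelihood L(a, b) averaged over the n samples."""
    p = sigmoid(a * x + b)           # predicted probability of "metal"
    p = np.clip(p, eps, 1.0 - eps)   # avoid log(0)
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
```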

Re-Descending the Gradient

We do exactly the same thing as for regression but with the new cost function. The gradients are:

$$\frac{\partial L(a, b)}{\partial a} = \frac{1}{n}\sum_{k=1}^n\left(\sigma\left(ax^{(k)}+b\right) - y^{(k)}\right)x^{(k)}$$

$$\frac{\partial L(a, b)}{\partial b} = \frac{1}{n}\sum_{k=1}^n\left(\sigma\left(ax^{(k)}+b\right) - y^{(k)}\right)$$

Parameter updates:

$$a^* = a - \delta \times \frac{\partial L(a, b)}{\partial a}, \quad b^* = b - \delta \times \frac{\partial L(a, b)}{\partial b}$$
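A possible implementation of this descent for the model $\sigma(ax+b)$, with an arbitrary learning rate and iteration count:

```python
import numpy as np

def fit_logistic(x, y, lr=0.1, n_iter=5000):
    """Gradient descent on the log loss for the model sigmoid(a*x + b)."""
    a, b = 0.0, 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(a * x + b)))   # predicted probabilities
        grad_a = np.mean((p - y) * x)            # dL/da
        grad_b = np.mean(p - y)                  # dL/db
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b
```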

Credits: Disney/PIXAR

The Power of the Little Waste Sorter

In matrix notation, with the label vector $\mathbf{Y}$, the matrix $\mathbf{X}$ (one row per sample, containing the density $x^{(k)}$ and a constant 1 for the intercept), and the parameter vector $\mathbf{P} = \begin{bmatrix} a \\ b \end{bmatrix}$, the sigmoid is applied component-wise:

$$\sigma(\mathbf{X} \times \mathbf{P}) = \begin{bmatrix} \frac{1}{1+\exp(-(ax^{(1)}+b))} \\ \frac{1}{1+\exp(-(ax^{(2)}+b))} \\ \vdots \\ \frac{1}{1+\exp(-(ax^{(n)}+b))} \end{bmatrix}.$$

The cost function becomes:

$$L(\mathbf{P}) = -\frac{1}{n}\left[\mathbf{Y} \cdot \log\left\{\sigma(\mathbf{X} \times \mathbf{P})\right\} + (\mathbf{1}-\mathbf{Y}) \cdot \log\left\{1-\sigma(\mathbf{X} \times \mathbf{P})\right\}\right]$$

and its gradient:

$$\frac{\partial}{\partial \mathbf{P}} L(\mathbf{P}) = \frac{1}{n}\mathbf{X}^\top \left(\sigma(\mathbf{X} \times \mathbf{P}) - \mathbf{Y}\right)$$

with the update rule:

$$\mathbf{P}^* = \mathbf{P} - \delta \, \frac{\partial}{\partial \mathbf{P}} L(\mathbf{P})$$
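A vectorized sketch of the same procedure, again with placeholder hyperparameters; the design matrix stacks each density with a constant 1 for the intercept:

```python
import numpy as np

def fit_logistic_vectorized(x, y, lr=0.1, n_iter=5000):
    """Vectorized gradient descent; returns the parameter vector P = [a, b]."""
    X = np.column_stack([x, np.ones_like(x)])   # design matrix, one row per sample
    P = np.zeros(2)
    n = len(y)
    for _ in range(n_iter):
        probs = 1.0 / (1.0 + np.exp(-X @ P))    # sigma(X P), component-wise
        grad = X.T @ (probs - y) / n            # (1/n) X^T (sigma(X P) - Y)
        P -= lr * grad
    return P
```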

Final fit of the sigmoid function on the data.

The optimal parameters obtained after gradient descent are $a = 16.85$ and $b = 9.71$. The model can determine, with an overall accuracy of 93%, whether the supplied material is metal or plastic.

Beyond Waste Sorting: Determining the Type of Metal

Diving deeper into metal classification, Wall-E confronts a more complex challenge: determining the specific type of metal among a wide variety of alloys including bronze, gold, silver, and many others.

To meet this challenge, Wall-E’s tool of choice becomes the K-Nearest Neighbors (KNN) algorithm. Here is the list of all pure metals in Wall-E’s reference table:

| Metal | Electrical Conductivity (MS/m) | Density (g/cm³) |
| --- | --- | --- |
| Steel | 1.5 | 7.500 - 8.100 |
| Aluminum | 37.7 | 2.700 |
| Silver | 63 | 10.500 |
| Beryllium | 31.3 | 1.848 |
| Bronze | 7.4 | 8.400 - 9.200 |
| Carbon (graphite) | 61 | 2.250 |
| Copper | 59.6 | 8.960 |
| Tin | 9.17 | 7.290 |
| Iron | 9.93 | 7.860 |
| Iridium | 19.7 | 22.560 |
| Lithium | 10.8 | 5.30 |
| Magnesium | 22.6 | 1.750 |
| Mercury | 1.04 | 13.545 |
| Molybdenum | 18.7 | 10.200 |
| Nickel | 14.3 | 8.900 |
| Gold | 45.2 | 19.300 |
| Osmium | 10.9 | 22.610 |
| Palladium | 9.5 | 12.000 |
| Platinum | 9.66 | 21.450 |
| Lead | 4.81 | 11.350 |
| Potassium | 13.9 | 0.850 |
| Tantalum | 7.61 | 16.600 |
| Titanium | 2.34 | 4.500 |
| Tungsten | 8.9 | 19.300 |
| Uranium | 3.8 | 19.100 |
| Vanadium | 4.89 | 6.100 |
| Zinc | 16.6 | 7.150 |

These reference values are used to simulate metallic alloys: each alloy consists of a pure metal to be identified, plus impurities that slightly modify its characteristics. Wall-E's database contains 300 samples for each type of metallic alloy. Here are five samples, each with its distinctive properties:

| Metal | Electrical Conductivity (MS/m) | Density (g/cm³) |
| --- | --- | --- |
| Steel | 2.7093 | 7.7446 |
| Vanadium | 5.8000 | 7.5000 |
| Iron | 9.2600 | 8.4000 |
| Gold | 43.000 | 18.500 |
| Bronze | 7.5132 | 8.7000 |

Wall-E’s goal is therefore to classify each metallic sample he finds — based on its density and electrical conductivity — into one of the pure-metal categories.

Contrast with Binary Classification: the Complexity of Multi-Class

After mastering binary classification to distinguish precious metals from plastics, Wall-E realizes that the next step — multi-class classification — is a more complex challenge. Where binary classification simply splits objects into two distinct categories, Wall-E must now differentiate between many specific types. The simple decision boundary of a straight line is no longer enough.

Type of metal as a function of density and electrical conductivity (mega siemens per meter, MS/m). 300 samples for each alloy.

In this new territory, Wall-E must navigate a complex feature space where metals can overlap. This complexity requires a more refined approach — and this is where Wall-E turns to a method that takes the subtleties of relationships between metals into account.

The Power of Proximity: K-Nearest Neighbors

The essential idea behind KNN is to group similar objects in feature space. In our context, if a piece of bronze shares similar characteristics with other bronze pieces, those objects will be located near each other in this multidimensional space.

The process is fairly intuitive. When a new piece of metal must be classified, Wall-E measures its specific characteristics and positions it in feature space. The algorithm then identifies the $k$ nearest neighbors, and KNN assigns to the new piece the type of metal that gathers the most votes among those neighbors.
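To make the voting idea concrete, here is a minimal from-scratch sketch of the KNN decision for one new sample (`features` and `labels` are assumed NumPy arrays; this is an illustration, not the article's actual code):

```python
import numpy as np
from collections import Counter

def knn_predict(new_point, features, labels, k=5):
    """Return the majority label among the k nearest training samples."""
    distances = np.linalg.norm(features - new_point, axis=1)  # Euclidean distances
    nearest = np.argsort(distances)[:k]                       # indices of the k closest samples
    votes = Counter(labels[nearest])                          # count labels among neighbors
    return votes.most_common(1)[0][0]
```

In practice, density and electrical conductivity live on different scales, so the features are usually standardized before computing distances.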

Classification using the KNN algorithm with 5 neighbors and 3 classes that depend on two features.

Metal Classification in Action: the Neighbors at Work

Wall-E searches his fairly extensive database, comprising various types of metals and alloys, each associated with specific characteristics such as electrical conductivity, density, and other unique properties. When a new piece of metal arrives, Wall-E activates the KNN algorithm:

Credits: Disney/PIXAR

With an optimal parameter of $k = 20$ neighbors, Wall-E can claim with around 95% confidence to identify any metallic alloy.
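With scikit-learn, this classification step could look like the sketch below. The tiny simulated dataset (pure-metal values from the reference table plus random noise) and the standardization step are assumptions for illustration only:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Tiny simulated stand-in for Wall-E's database: pure-metal values plus noise.
pure = {"Gold": (45.2, 19.3), "Bronze": (7.4, 8.8), "Iron": (9.93, 7.86)}
X = np.vstack([rng.normal(v, (1.0, 0.5), size=(300, 2)) for v in pure.values()])
y = np.repeat(list(pure), 300)

knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=20))
knn.fit(X, y)
print(knn.predict([[43.0, 18.5]]))  # the gold-alloy sample from the five-sample table above
```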

Classifying an unknown metallic sample by its density and electrical conductivity — turns out to be a gold alloy.

Classifying an unknown metallic sample: an iron alloy (70% iron, 20% bronze, 10% tin).

Classifying an unknown metallic sample: a vanadium alloy (70% vanadium, 25% tin, 5% iron).

The Art of Model Selection

Train and Test

Wall-E quickly grasps the importance of never evaluating his model on the same data used for training. He splits the dataset into two parts: a training set, used to fit the model, and a test set, kept aside for the final evaluation.

Credits: Disney/PIXAR
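With scikit-learn, such a split can be sketched as follows, reusing the simulated `X` and `y` arrays from the KNN example above (the 80/20 ratio is an illustrative choice, not necessarily Wall-E's):

```python
from sklearn.model_selection import train_test_split

# Hold out 20% of the samples for the final test, preserving class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
```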

Model Validation

To tune hyperparameters (such as the number of KNN neighbors), Wall-E introduces a third subset: the validation set. He compares different models (KNN with 2, 3, 20, or 100 neighbors) following this methodology, sketched in code right after the list:

  1. Train the models on the training set.
  2. Select the model with the best performance on the validation set.
  3. Evaluate that chosen model on the test set.
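Continuing the sketch with the `X_train` and `X_test` splits above, and keeping the candidate values of k from the article (everything else is an assumption), the three steps could look like:

```python
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Carve a validation set out of the training data.
X_tr, X_val, y_tr, y_val = train_test_split(
    X_train, y_train, test_size=0.25, stratify=y_train, random_state=42
)

best_k, best_acc = None, 0.0
for k in (2, 3, 20, 100):                       # candidate models
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    model.fit(X_tr, y_tr)                       # 1. train on the training set
    acc = model.score(X_val, y_val)             # 2. compare on the validation set
    if acc > best_acc:
        best_k, best_acc = k, acc

final = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=best_k))
final.fit(X_train, y_train)
print(best_k, final.score(X_test, y_test))      # 3. evaluate the chosen model on the test set
```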

Validation and training curves: accuracy as a function of the number of neighbors used in the KNN algorithm.

More precisely, the validation curve shows how training and validation accuracy evolve as the number of neighbors $k$ varies, and thus which value of $k$ serves Wall-E best.

In Wall-E’s case, both curves quickly reach 95.7% accuracy at around 20 to 25 neighbors.
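scikit-learn can compute both curves in a single call; a minimal sketch, with an assumed range of k values:

```python
import numpy as np
from sklearn.model_selection import validation_curve
from sklearn.neighbors import KNeighborsClassifier

k_values = np.arange(1, 101)
train_scores, val_scores = validation_curve(
    KNeighborsClassifier(), X_train, y_train,
    param_name="n_neighbors", param_range=k_values, cv=5
)
print(k_values[val_scores.mean(axis=1).argmax()])  # k with the best mean validation accuracy
```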

Cross-Validation

Wall-E uses the K-fold method (with 5 partitions). During training, the model is systematically trained on $K-1$ partitions and validated on the remaining one, repeating this process $K$ times. He also exploits Stratified K-fold, which preserves the class distribution within each fold.
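A possible sketch of this procedure, reusing the training arrays from above:

```python
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=20))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)  # 5 folds, class ratios preserved
scores = cross_val_score(model, X_train, y_train, cv=cv)
print(scores.mean(), scores.std())
```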

Learning Curves

Curious to know whether his model could benefit from additional data, Wall-E examines the learning curves. While adding data initially improves performance, those benefits eventually plateau.

Credits: Disney/PIXAR
Validation and training curves: accuracy as a function of the number of samples

Examining this curve, he notices that a sufficiently accurate model could have been obtained with fewer than 1,000 metal samples. Even after accumulating 300 samples of each type (8,100 objects in total), accuracy plateaus at 95.7%.
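Such learning curves can be obtained, for example, with scikit-learn's `learning_curve` helper; a minimal sketch reusing the arrays above:

```python
import numpy as np
from sklearn.model_selection import learning_curve
from sklearn.neighbors import KNeighborsClassifier

sizes, train_scores, val_scores = learning_curve(
    KNeighborsClassifier(n_neighbors=20), X_train, y_train,
    train_sizes=np.linspace(0.1, 1.0, 10), cv=5
)
for n, acc in zip(sizes, val_scores.mean(axis=1)):
    print(f"{n} samples -> validation accuracy {acc:.3f}")
```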

The End of a Trilogy, the Beginning of a Technological Era

This last episode marks the conclusion of the captivating saga of the little robot Wall-E — an adventure that began with the foundations of machine learning. From his first steps in the realm of artificial intelligence, Wall-E has evolved through several chapters, exploring the basics of supervised learning, diving deep into regression, and finally climbing the complex peaks of classification.

This saga, rich in lessons, closes with the certainty that Wall-E is now ready to face new challenges in the complex world of artificial intelligence.

Credits: Disney/PIXAR

Bibliography