Multi-Class Classification with a Neural Network on the Iris Dataset


In this post, we'll walk through:
- Data exploration
- Preprocessing
- Model architecture and training
- Model evaluation
1. Data Download
We start by downloading the Iris dataset from Kaggle. The dataset contains 150 samples in total, 50 from each of three Iris species (Iris-setosa, Iris-versicolor, and Iris-virginica), with four features per sample:
- Sepal length
- Sepal width
- Petal length
- Petal width
# requires the Kaggle CLI and API credentials
kaggle datasets download -d uciml/iris
unzip iris.zip
2. Data Exploration and EDA (Exploratory Data Analysis)
Understanding the structure and distribution of the data is crucial before building a model. In this section, we’ll inspect the dataset and generate meaningful insights.
Dataset Overview
import pandas as pd

df_org = pd.read_csv('Iris.csv')
df = df_org.copy()
df.head()
The dataset contains five columns: the four numerical features (sepal length, sepal width, petal length, petal width) and one categorical column (Species). Each flower species can be identified based on the measurements of these four features.
df.info()
This output shows the dataset consists of 150 entries with no missing values.
Class Distribution
df['Species'].value_counts()
The dataset is balanced, with 50 samples for each Iris species. This balance helps the model avoid bias toward any specific class during training.
Descriptive Statistics and Distributions
Before preprocessing, let’s visualize the distributions of the numeric features.
plot_columns(df, ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"])
The KDE plots show that all four features exhibit reasonable distributions, with slight skewness in some variables. These plots help us understand feature ranges and potential outliers.
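Note that plot_columns is a small helper rather than a pandas or seaborn built-in. A minimal sketch of what it might do, assuming seaborn is available:

import matplotlib.pyplot as plt
import seaborn as sns

def plot_columns(df, columns):
    """Plot a histogram with a KDE overlay for each of the given numeric columns."""
    fig, axes = plt.subplots(1, len(columns), figsize=(4 * len(columns), 4))
    for ax, col in zip(axes, columns):
        sns.histplot(df[col], kde=True, ax=ax)  # distribution of one feature
        ax.set_title(col)
    plt.tight_layout()
    plt.show()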
Outlier Detection and Removal
Outliers can negatively affect model performance. We detect and remove outliers using the Interquartile Range (IQR) method.
df = remove_anomalies(df, ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"])
This step ensures that the extreme values are removed, making the data cleaner for model training.
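Like plot_columns, remove_anomalies is a custom helper. Here is a minimal sketch of an IQR-based filter it could implement; the 1.5 multiplier is the usual convention and an assumption here:

import pandas as pd

def remove_anomalies(df, columns, k=1.5):
    """Drop rows whose value in any given column lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    mask = pd.Series(True, index=df.index)
    for col in columns:
        q1, q3 = df[col].quantile(0.25), df[col].quantile(0.75)
        iqr = q3 - q1
        mask &= df[col].between(q1 - k * iqr, q3 + k * iqr)
    return df[mask].reset_index(drop=True)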
Normalization
To ensure all features are on the same scale, we normalize the data. Normalization is especially important for neural networks because it helps the model converge faster and perform better.
df = normalize(df, ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"])
At this point, all numeric columns have been standardized to have a mean of 0 and a standard deviation of 1.
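Since the columns end up with mean 0 and standard deviation 1, normalize here performs z-score standardization. A minimal sketch of such a helper (scikit-learn's StandardScaler would work equally well):

def normalize(df, columns):
    """Standardize the given columns to zero mean and unit variance (z-score)."""
    df = df.copy()
    for col in columns:
        df[col] = (df[col] - df[col].mean()) / df[col].std()
    return df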
3. Model Architecture
Why Neural Networks?
The Iris dataset is relatively small and has been historically solved using simpler algorithms like logistic regression or k-nearest neighbors. However, we are using a feedforward neural network (FNN) to showcase how deep learning can solve multi-class classification tasks, even on simpler datasets. The advantages of neural networks include flexibility and the ability to capture complex, nonlinear relationships in the data.
Defining the Neural Network
Our neural network is a simple feedforward architecture. Here’s the breakdown:
import torch.nn as nn

class NNMulticlassClassifier(nn.Module):
    def __init__(self, input_size):
        super(NNMulticlassClassifier, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(input_size, 64),   # fully connected layer with 64 neurons
            nn.BatchNorm1d(64),          # batch normalization for faster convergence
            nn.ReLU(),                   # ReLU activation function
            nn.Dropout(0.5),             # dropout to prevent overfitting
            nn.Linear(64, 128),          # second fully connected layer with 128 neurons
            nn.BatchNorm1d(128),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(128, 64),          # third fully connected layer
            nn.BatchNorm1d(64),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(64, 3),            # output layer with 3 neurons (one per class)
            nn.ReLU()                    # ReLU for the final output
        )

    def forward(self, data):
        output = self.model(data)
        return output
This model consists of three hidden layers, each followed by batch normalization and dropout to prevent overfitting. The ReLU activation function is used to introduce non-linearity into the network, helping it learn complex patterns in the data.
Explanation of Key Components:
- Linear Layers: Each fully connected (linear) layer connects every neuron from one layer to every neuron in the next layer.
- Batch Normalization: Helps the model converge faster and stabilize the training by normalizing the inputs to each layer.
- ReLU Activation: Introduces non-linearity, allowing the model to learn more complex relationships.
- Dropout: Randomly turns off a fraction of neurons during training, preventing overfitting and improving generalization.
- Output Layer: The output layer has 3 neurons, each corresponding to one of the three Iris species. The raw scores it produces are converted into a probability distribution by the softmax applied inside the cross-entropy loss (see the snippet below).
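Strictly speaking, the network emits raw class scores: nn.CrossEntropyLoss applies log-softmax internally during training, and at inference time the scores can be turned into probabilities explicitly. A small illustrative example (the score values are made up):

import torch

logits = torch.tensor([[2.0, 0.5, 0.1],
                       [0.2, 1.7, 0.3]])  # hypothetical scores for 2 samples, 3 classes
probs = torch.softmax(logits, dim=1)      # each row now sums to 1
pred = torch.argmax(probs, dim=1)         # index of the most likely species per sample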
4. Training the Model
Splitting the Data
Before training, we split the data into training, validation, and test sets.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.15, random_state=42)
We use 15% of the data for testing and an additional 15% from the training set for validation. This validation set is used during training to monitor overfitting and tune hyperparameters.
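The split above assumes X and y have already been prepared, and the training loop below relies on train_loader, val_loader, and test_loader. A minimal sketch of that glue code, where the use of LabelEncoder and a batch size of 16 are assumptions:

import numpy as np
import torch
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from torch.utils.data import TensorDataset, DataLoader

feature_cols = ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"]
X = df[feature_cols].values                      # standardized features
y = LabelEncoder().fit_transform(df["Species"])  # species names -> 0, 1, 2

def make_loader(X, y, batch_size=16, shuffle=False):
    """Wrap feature/label arrays in a TensorDataset and return a DataLoader."""
    ds = TensorDataset(torch.tensor(np.asarray(X), dtype=torch.float32),
                       torch.tensor(np.asarray(y), dtype=torch.long))
    return DataLoader(ds, batch_size=batch_size, shuffle=shuffle)

train_loader = make_loader(X_train, y_train, shuffle=True)
val_loader = make_loader(X_val, y_val)
test_loader = make_loader(X_test, y_test)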
Model Training Loop
We train the model for 1500 epochs using Adam as the optimizer and cross-entropy as the loss function, which is ideal for multi-class classification tasks.
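Before the loop runs, the model, loss function, and optimizer need to be instantiated. A minimal setup, in which the learning rate and the metric-tracking lists are assumptions (the lists feed the plots in the next section):

import torch.nn as nn
import torch.optim as optim

epochs = 1500
multiClf = NNMulticlassClassifier(input_size=4)           # four numeric features
criterion = nn.CrossEntropyLoss()                         # standard multi-class loss
optimizer = optim.Adam(multiClf.parameters(), lr=0.001)   # lr=0.001 is an assumption

train_loss = []       # summed training loss, recorded every 20 epochs
model_accuracy = []   # validation accuracy, recorded every 20 epochs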
for epoch in range(epochs):
    multiClf.train()
    epoch_train_loss = 0
    for batch_X, batch_y in train_loader:
        optimizer.zero_grad()
        batch_y_hat = multiClf(batch_X)
        batch_loss = criterion(batch_y_hat, batch_y)
        batch_loss.backward()
        optimizer.step()
        epoch_train_loss += batch_loss.item()
    if epoch % 20 == 0:
        correct_preds = 0
        total_preds = 0
        multiClf.eval()
        with torch.no_grad():
            for val_batch_X, val_batch_y in val_loader:
                val_batch_y_hat = multiClf(val_batch_X)
                _, pred = torch.max(val_batch_y_hat.data, 1)
                total_preds += val_batch_y.size(0)
                correct_preds += (pred == val_batch_y).sum().item()
        epoch_accuracy = correct_preds / total_preds
        train_loss.append(epoch_train_loss)      # track metrics for the plots below
        model_accuracy.append(epoch_accuracy)
        print(f"After epoch {epoch}, train loss: {epoch_train_loss}, accuracy: {int(epoch_accuracy * 100)}%")
In this loop:
- Training: The model is trained by minimizing the cross-entropy loss.
- Validation: Every 20 epochs, we calculate the model's accuracy on the validation set to monitor its progress.
5. Model Evaluation
Training and Validation Performance
After training for 1500 epochs, the model’s training loss decreases significantly, and the accuracy steadily improves. Here’s a summary of the performance:
After epoch 0, train loss: 1.120, model accuracy: 65%.
After epoch 20, train loss: 0.977, model accuracy: 70%.
After epoch 200, train loss: 0.620, model accuracy: 95%.
After epoch 1000, train loss: 0.532, model accuracy: 95%.
The model reaches a validation accuracy of around 95% by the 1000th epoch. This is a strong indicator that the network is learning the underlying patterns effectively.
Plotting Loss and Accuracy
To visualize the training process, we plot the training loss and validation accuracy recorded every 20 epochs:
import matplotlib.pyplot as plt

plt.plot(train_loss, label="Train loss", color="red")
plt.plot(model_accuracy, label="Model accuracy", color="blue")
plt.xlabel("evaluation step (every 20 epochs)")
plt.ylabel("loss / accuracy")
plt.title("Loss and Accuracy for Feedforward Neural Network on Iris Data")
plt.legend()
plt.show()
As seen in the graph, the loss decreases and the accuracy improves as training progresses, which shows that the model is learning effectively.
Test Set Evaluation
After training, we evaluate the model on the test set to see how well it generalizes.
correct_preds = 0
total_preds = 0
multiClf.eval()
with torch.no_grad():
    for test_X, test_y in test_loader:
        test_y_hat = multiClf(test_X)
        _, pred = torch.max(test_y_hat.data, 1)
        total_preds += test_y.size(0)
        correct_preds += (pred == test_y).sum().item()

test_accuracy = correct_preds / total_preds
print(f"Test accuracy: {int(test_accuracy * 100)}%")
The model achieves a test accuracy of over 95%, indicating that it generalizes well to unseen data.
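As a final sanity check, the trained model can be applied to a single new flower, as long as its measurements are standardized the same way as the training data. A minimal sketch, where the measurement values and the alphabetical label order are assumptions:

species = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]  # assumed label-encoding order

# one flower, already standardized: sepal length, sepal width, petal length, petal width
sample = torch.tensor([[0.31, -0.59, 0.54, 0.77]], dtype=torch.float32)

multiClf.eval()
with torch.no_grad():
    probs = torch.softmax(multiClf(sample), dim=1)   # class probabilities for this flower
    print(species[int(torch.argmax(probs))], probs.squeeze().tolist())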
Conclusion
In this blog post, we demonstrated how to build a neural network for multi-class classification using the Iris dataset. We went through data exploration, preprocessing, model design, and evaluation. The model was able to achieve high accuracy, demonstrating the power of neural networks even on simpler datasets.
By following this workflow, you can extend these concepts to more complex datasets and tasks. Happy coding!