πŸ“ˆ Multi-Class Classification with a Neural Network on the Iris Dataset

Aug 15, 2023 · Moitree Basu · 6 min read
In this post, we will use a simple feedforward neural network (FNN) to solve a multi-class classification task on the Iris dataset: identifying the species of an Iris flower from its sepal and petal measurements. We will proceed through the following steps:
  • data exploration
  • preprocessing
  • model architecture
  • model evaluation

1. Data Download

We start by downloading the Iris dataset from Kaggle. This dataset includes 150 samples, 50 from each of three Iris species (Iris-setosa, Iris-versicolor, and Iris-virginica), with four features per sample:

  • Sepal length
  • Sepal width
  • Petal length
  • Petal width
kaggle datasets download -d uciml/iris
unzip iris.zip

2. Data Exploration and EDA (Exploratory Data Analysis)

Understanding the structure and distribution of the data is crucial before building a model. In this section, we’ll inspect the dataset and generate meaningful insights.

Dataset Overview

import pandas as pd

df_org = pd.read_csv('Iris.csv')
df = df_org.copy()
df.head()

The dataset provides four numerical features (sepal length, sepal width, petal length, petal width) and one categorical target column, Species. The three species can be distinguished well from these four measurements alone.

df.info()

This output shows the dataset consists of 150 entries with no missing values.

Class Distribution

df['Species'].value_counts()

The dataset is balanced, with 50 samples for each Iris species. This balance helps the model avoid bias toward any specific class during training.

Descriptive Statistics and Distributions

Before preprocessing, let’s visualize the distributions of the numeric features.

plot_columns(df, ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"])

Distributions

The KDE plots show the spread of each of the four features, with slight skewness in some variables. They give a first impression of feature ranges and potential outliers.
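plot_columns is a small plotting helper rather than a library function; a minimal sketch of how it might be written with seaborn KDE plots is below (the exact layout and styling used in the post may differ):

import matplotlib.pyplot as plt
import seaborn as sns

def plot_columns(df, columns):
    # One KDE plot per numeric column, arranged in a single row
    fig, axes = plt.subplots(1, len(columns), figsize=(4 * len(columns), 4))
    for ax, col in zip(axes, columns):
        sns.kdeplot(data=df, x=col, fill=True, ax=ax)
        ax.set_title(col)
    plt.tight_layout()
    plt.show()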

Outlier Detection and Removal

Outliers can negatively affect model performance. We detect and remove outliers using the Interquartile Range (IQR) method.

df = remove_anomalies(df, ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"])

This step drops rows containing extreme values, giving the model cleaner data to train on.
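remove_anomalies is another helper defined in the notebook; a sketch of an IQR-based filter that drops rows falling outside 1.5 times the IQR from the quartiles (the assumed implementation, which may differ from the original) looks like this:

def remove_anomalies(df, columns, k=1.5):
    # Drop rows where any of the given columns falls outside [Q1 - k*IQR, Q3 + k*IQR]
    for col in columns:
        q1, q3 = df[col].quantile(0.25), df[col].quantile(0.75)
        iqr = q3 - q1
        df = df[(df[col] >= q1 - k * iqr) & (df[col] <= q3 + k * iqr)]
    return df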

Normalization

To put all features on the same scale, we normalize the data using z-score standardization. Scaling is especially important for neural networks because it helps the model converge faster and perform better.

df = normalize(df, ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"])

At this point, all numeric columns have been standardized to have a mean of 0 and a standard deviation of 1.
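A minimal sketch of what normalize might look like (scikit-learn's StandardScaler would work equally well; this is an assumed implementation):

def normalize(df, columns):
    # Z-score standardization: subtract the mean and divide by the standard deviation
    for col in columns:
        df[col] = (df[col] - df[col].mean()) / df[col].std()
    return df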


3. Model Architecture

Why Neural Networks?

The Iris dataset is relatively small and has been historically solved using simpler algorithms like logistic regression or k-nearest neighbors. However, we are using a feedforward neural network (FNN) to showcase how deep learning can solve multi-class classification tasks, even on simpler datasets. The advantages of neural networks include flexibility and the ability to capture complex, nonlinear relationships in the data.

Defining the Neural Network

Our neural network is a simple feedforward architecture. Here’s the breakdown:

import torch.nn as nn

class NNMulticlassClassifier(nn.Module):
    def __init__(self, input_size):
        super(NNMulticlassClassifier, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(input_size, 64),  # Fully connected layer with 64 neurons
            nn.BatchNorm1d(64),         # Batch normalization for faster convergence
            nn.ReLU(),                  # ReLU activation function
            nn.Dropout(0.5),            # Dropout to prevent overfitting
            nn.Linear(64, 128),         # Second fully connected layer with 128 neurons
            nn.BatchNorm1d(128),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(128, 64),         # Third fully connected layer
            nn.BatchNorm1d(64),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(64, 3)            # Output layer: raw logits for the 3 classes (CrossEntropyLoss applies softmax)
        )
    def forward(self, data):
        output = self.model(data)
        return output

This model consists of three hidden layers, each followed by batch normalization and dropout to prevent overfitting. The ReLU activation function is used to introduce non-linearity into the network, helping it learn complex patterns in the data.

Explanation of Key Components:

  • Linear Layers: Each fully connected (linear) layer connects every neuron from one layer to every neuron in the next layer.
  • Batch Normalization: Helps the model converge faster and stabilize the training by normalizing the inputs to each layer.
  • ReLU Activation: Introduces non-linearity, allowing the model to learn more complex relationships.
  • Dropout: Randomly turns off a fraction of neurons during training, preventing overfitting and improving generalization.
  • Output Layer: The output layer has 3 neurons, one per Iris species. It emits raw scores (logits); the cross-entropy loss applies softmax internally to turn them into a probability distribution over the classes (see the small example below).
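Because the output layer emits raw logits rather than probabilities, a softmax can be applied explicitly at inference time to recover class probabilities; a small illustrative example (the model instance and the dummy feature batch are placeholders):

import torch

model = NNMulticlassClassifier(input_size=4)
features = torch.randn(5, 4)                  # a dummy batch of 5 standardized samples
model.eval()                                  # switch off dropout, use running batch-norm stats
with torch.no_grad():
    logits = model(features)                  # raw scores, shape (5, 3)
    probs = torch.softmax(logits, dim=1)      # probability distribution over the 3 species
    preds = torch.argmax(probs, dim=1)        # predicted class index per sample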

4. Training the Model

Splitting the Data

Before training, we split the data into training, validation, and test sets.

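The split below assumes that the feature matrix X and integer labels y have already been extracted from the dataframe; a minimal sketch of that step (mapping the Species strings to the integer codes that CrossEntropyLoss expects) could be:

from sklearn.model_selection import train_test_split

feature_cols = ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"]
X = df[feature_cols].values
y = df["Species"].astype("category").cat.codes.values  # Iris-setosa -> 0, Iris-versicolor -> 1, Iris-virginica -> 2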
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.15, random_state=42)

We use 15% of the data for testing and an additional 15% from the training set for validation. This validation set is used during training to monitor overfitting and tune hyperparameters.

Model Training Loop

We train the model for 1500 epochs using Adam as the optimizer and cross-entropy as the loss function, which is ideal for multi-class classification tasks.
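The loop below assumes the model, optimizer, loss function, and data loaders have already been created. A minimal setup sketch is shown here; the batch size and learning rate are arbitrary choices, not values taken from the post:

import torch
from torch.utils.data import DataLoader, TensorDataset

def make_loader(X, y, batch_size=16, shuffle=False):
    dataset = TensorDataset(torch.tensor(X, dtype=torch.float32),
                            torch.tensor(y, dtype=torch.long))
    return DataLoader(dataset, batch_size=batch_size, shuffle=shuffle)

train_loader = make_loader(X_train, y_train, shuffle=True)
val_loader = make_loader(X_val, y_val)
test_loader = make_loader(X_test, y_test)

multiClf = NNMulticlassClassifier(input_size=4)
optimizer = torch.optim.Adam(multiClf.parameters(), lr=1e-3)
criterion = torch.nn.CrossEntropyLoss()
epochs = 1500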

for epoch in range(epochs):
    multiClf.train()
    epoch_train_loss = 0
    
    for batch_X, batch_y in train_loader:
        optimizer.zero_grad()
        batch_y_hat = multiClf(batch_X)
        batch_loss = criterion(batch_y_hat, batch_y)
        batch_loss.backward()
        optimizer.step()
        epoch_train_loss += batch_loss.item()

    if epoch % 20 == 0:
        correct_preds = 0
        total_preds = 0
        multiClf.eval()
        with torch.no_grad():
            for val_batch_X, val_batch_y in val_loader:
                val_batch_y_hat = multiClf(val_batch_X)
                _, pred = torch.max(val_batch_y_hat.data, 1)
                total_preds += val_batch_y.size(0)
                correct_preds += (pred == val_batch_y).sum().item()

        epoch_accuracy = correct_preds / total_preds
        print(f"After epoch {epoch}, train loss: {epoch_train_loss}, accuracy: {int(epoch_accuracy * 100)}%")

In this loop:

  • Training: The model is trained by minimizing the cross-entropy loss.
  • Validation: Every 20 epochs, we calculate the model’s accuracy on the validation set to monitor its progress; as sketched below, both quantities can also be recorded for plotting later.
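To produce the loss and accuracy plots in the next section, the loop can record these quantities as it runs; a small addition (the list names match the plotting code below):

train_loss = []         # summed training loss, appended once per epoch
model_accuracy = []     # validation accuracy, appended every 20 epochs

# inside the training loop:
#     train_loss.append(epoch_train_loss)
#     model_accuracy.append(epoch_accuracy)   # only when the validation block runs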

5. Model Evaluation

Training and Validation Performance

After training for 1500 epochs, the model’s training loss decreases significantly, and the accuracy steadily improves. Here’s a summary of the performance:

After epoch 0, train loss: 1.120, model accuracy: 65%.
After epoch 20, train loss: 0.977, model accuracy: 70%.
After epoch 200, train loss: 0.620, model accuracy: 95%.
After epoch 1000, train loss: 0.532, model accuracy: 95%.

The model reaches a validation accuracy between 95% and 100% by the 1000th epoch. This high accuracy is a strong indicator that the network is learning the patterns effectively.

Plotting Loss and Accuracy

To visualize the training process, we plot the training loss and accuracy:

import matplotlib.pyplot as plt

# train_loss and model_accuracy are the lists recorded during training
plt.plot(train_loss, label="Train loss", color="red")
plt.plot(model_accuracy, label="Model accuracy", color="blue")
plt.xlabel("epoch")
plt.ylabel("loss/accuracy")
plt.title("Loss and Accuracy for Feedforward Neural Network on Iris Data")
plt.legend()
plt.show()

Training Performance

As the graph shows, the loss decreases and the accuracy improves as training progresses, indicating that the model is learning effectively.

Test Set Evaluation

After training, we evaluate the model on the test set to see how well it generalizes.

correct_preds = 0
total_preds = 0
multiClf.eval()
with torch.no_grad():
    for test_X, test_y in test_loader:
        test_y_hat = multiClf(test_X)
        _, pred = torch.max(test_y_hat.data, 1)
        total_preds += test_y.size(0)
        correct_preds += (pred == test_y).sum().item()

test_accuracy = correct_preds / total_preds
print(f"Test accuracy: {int(test_accuracy * 100)}%")

The model achieves a test accuracy of over 95%, indicating that it generalizes well to unseen data.


Conclusion

In this blog post, we demonstrated how to build a neural network for multi-class classification using the Iris dataset. We went through data exploration, preprocessing, model design, and evaluation. The model was able to achieve high accuracy, demonstrating the power of neural networks even on simpler datasets.

By following this workflow, you can extend these concepts to more complex datasets and tasks. Happy coding!