Steven Lora - MSIT 675 Project 2 - Generative Adversarial Networks

Generate fake Fashion MNIST images

The goal of this project is to develop a Conditional Generative Adversarial Network (CGAN) that generates fake images of three items from the Fashion MNIST dataset: Trouser (labeled 1), Pullover (labeled 2), and Sneaker (labeled 7), by training on images from that dataset.

Import

In [ ]:
# import libraries

import keras
from keras import layers, models, Input, optimizers, ops
import matplotlib.pyplot as plt
import tensorflow as tf
import numpy as np

Get data

Use the function get_data() to obtain images of Trousers, Pullovers, and Sneakers, relabeled 0, 1, 2.

In [ ]:
def get_data():
  """Returns images of Trousers, Pullovers, and Sneakers relabeled 0, 1, 2"""
  (x_train, y_train), (x_test, y_test) = keras.datasets.fashion_mnist.load_data()
  all_images = np.concatenate([x_train, x_test])
  all_labels = np.concatenate([y_train, y_test])
  retained_indices = np.where(np.isin(all_labels, [1, 2, 7]))
  retained_images = all_images[retained_indices]
  retained_labels = all_labels[retained_indices]
  # Map labels to new values
  label_mapping = {1: 0, 2: 1, 7: 2}
  mapped_labels = np.vectorize(label_mapping.get)(retained_labels)
  ITEMS = ['Trouser', 'Pullover', 'Sneaker']
  return retained_images, mapped_labels, ITEMS

all_images, all_class_labels, ITEMS = get_data()

print(f'Shape of images: {all_images.shape}')
print(f'Shape of labels: {all_class_labels.shape}')
print(f'Unique labels: {np.unique(all_class_labels)}')
print(f'Items: {[(i, item) for i,item in enumerate(ITEMS)]}')
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-labels-idx1-ubyte.gz
29515/29515 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-images-idx3-ubyte.gz
26421880/26421880 ━━━━━━━━━━━━━━━━━━━━ 1s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-labels-idx1-ubyte.gz
5148/5148 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-images-idx3-ubyte.gz
4422102/4422102 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step
Shape of images: (21000, 28, 28)
Shape of labels: (21000,)
Unique labels: [0 1 2]
Items: [(0, 'Trouser'), (1, 'Pullover'), (2, 'Sneaker')]
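The filter-and-remap logic inside get_data() can be exercised on its own with a tiny stand-in label array (a sketch, independent of the Fashion MNIST download):

```python
import numpy as np

# Keep only classes 1, 2, 7 and remap them to 0, 1, 2, mirroring the
# np.isin / np.vectorize steps in get_data()
labels = np.array([0, 1, 2, 7, 9, 1, 7])
kept = labels[np.isin(labels, [1, 2, 7])]  # [1 2 7 1 7]
mapping = {1: 0, 2: 1, 7: 2}
remapped = np.vectorize(mapping.get)(kept)
print(remapped.tolist())  # [0, 1, 2, 0, 2]
```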

Display images

You may use the function displayImages to display images.

In [ ]:
def displayImages(images, labels, nCols=10):
    """Displays images with labels (nCols per row)"""
    nRows = np.ceil(len(labels)/nCols).astype('int') # number of rows
    plt.figure(figsize=(nCols,nRows)) # figure size
    for i in range(len(labels)):
        plt.subplot(nRows,nCols,i+1)
        plt.xticks([])
        plt.yticks([])
        plt.grid(False)
        plt.imshow(images[i], interpolation='spline16', cmap='gray_r')
        plt.xlabel(f'{labels[i]}', fontsize=12)
    plt.tight_layout()
    plt.show()
    return

# display the first k images with labels
k = 30
images = all_images[:k]
labels = [ITEMS[label] for label in all_class_labels[:k]]
displayImages(images, labels)
(image output: the first 30 training images with labels)

Specify parameters [2 Points]

Specify parameters to create your CGAN model in the code cell below

In [ ]:
# Specify parameters to create your CGAN model in this code cell

batch_size = 64 # batch size used for training
num_channels = 1 # grayscale images (3 for RGB)
num_classes = 3 # 3 Classes (0, 'Trouser'), (1, 'Pullover'), (2, 'Sneaker')
image_size = 28 # width and height of images
latent_dim = 512 # dimensionality of the latent noise vector
generator_in_channels = latent_dim + num_classes # number of channels in generator input
discriminator_in_channels = num_channels + num_classes # number of channels in discriminator input

In this section, we define the core parameters for our Conditional GAN model. The image shape is based on the Fashion MNIST dataset (28×28 grayscale), and we use a 512-dimensional noise vector as input to the generator. Since we are focusing on only three specific classes (Trouser, Pullover, Sneaker), the number of classes is set to 3. These parameters will guide the architecture and conditioning of both the generator and discriminator models.

Preprocess [3 Points]

Type in the code to preprocess the data in the code cell below. Scale the pixel values of images to [0, 1] range, add a channel dimension to the images, and one-hot encode the labels. Print the shape of processed images and the shape of processed labels.

In [ ]:
# Code to preprocess the data

# Scale the pixel values to [0, 1] range
all_images = all_images.astype("float32") / 255.0

# add a channel dimension to the images
all_images = np.reshape(all_images, (-1, 28, 28, 1))

# one-hot encode the labels
all_labels = keras.utils.to_categorical(all_class_labels, 3)

# Create tf.data.Dataset
dataset = tf.data.Dataset.from_tensor_slices((all_images, all_labels))
dataset = dataset.shuffle(buffer_size=1024).batch(batch_size)

# print the shapes of the resulting images and labels
print(f"Shape of images: {all_images.shape}")
print(f"Shape of labels: {all_labels.shape}")
Shape of images: (21000, 28, 28, 1)
Shape of labels: (21000, 3)

In this section, we preprocess the dataset to prepare it for training. First, we normalize the image pixel values to the [0, 1] range by dividing by 255.0. Since the Fashion MNIST images are grayscale, we also add a single channel dimension to match the expected input shape for convolutional layers.

Next, we one-hot encode the class labels to use them effectively during training, especially for conditional generation. The final step converts the processed images and labels into a tf.data.Dataset, enabling efficient batching and shuffling during model training. The printed shapes confirm that the images are now in (21000, 28, 28, 1) format and the labels in (21000, 3) format, representing three target classes.
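The three preprocessing steps can be sketched with NumPy alone on a tiny stand-in batch of four fake "images" (the real pipeline above applies the same operations to the 21,000 Fashion MNIST images):

```python
import numpy as np

# Stand-in batch: four 28x28 "images" with integer pixel values
images = np.random.randint(0, 256, size=(4, 28, 28)).astype("float32")
labels = np.array([0, 2, 1, 0])

images = images / 255.0                       # scale pixels to [0, 1]
images = images.reshape(-1, 28, 28, 1)        # add a channel dimension
one_hot = np.eye(3, dtype="float32")[labels]  # one-hot encode the labels

print(images.shape, one_hot.shape)  # (4, 28, 28, 1) (4, 3)
```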

Create discriminator [5 Points]

In the code cell below create your discriminator model, print the shape of the model input, and display the summary of the model.

In [ ]:
# Create the discriminator model
discriminator = keras.Sequential(
    [
        # input shape 28 x 28 x 4
        keras.layers.InputLayer((28, 28, discriminator_in_channels)),

        # First Convolution Layer
        # Number of Parameters = (3 * 3 * 4 + 1) * 64 = 2,368
        layers.Conv2D(64, (3, 3), strides=(2, 2), padding="same", name='Conv1'),
        layers.LeakyReLU(negative_slope=0.2),
        layers.Dropout(0.25),
        # Output shape = 14 x 14 x 64

        # Second Convolution Layer
        # Number of Parameters = (3 * 3 * 64 + 1) * 128 = 73,856
        layers.Conv2D(128, (3, 3), strides=(2, 2), padding="same", name='Conv2'),
        layers.LeakyReLU(negative_slope=0.2),
        layers.Dropout(0.25),
        # Output shape = 7 x 7 x 128

        # Third Convolution Layer
        # Number of Parameters = (3 * 3 * 128 + 1) * 256 = 295,168
        layers.Conv2D(256, (3, 3), strides=(1, 1), padding="same", name='Conv3'),
        layers.LeakyReLU(negative_slope=0.2),
        layers.Dropout(0.25),
        # Output shape = 7 x 7 x 256

        # Fourth Convolution Layer
        # Number of Parameters = (3 * 3 * 256 + 1) * 512 = 1,180,160
        layers.Conv2D(512, (3, 3), strides=(1, 1), padding="same", name='Conv4'),
        layers.LeakyReLU(negative_slope=0.2),
        layers.Dropout(0.25),
        # Output shape = 7 x 7 x 512

        # Global Max Pooling takes max value from the 512 Channels
        layers.GlobalMaxPooling2D(),
        # Output: a single real/fake logit (the training labels below use fake = 1, real = 0)
        layers.Dense(1),
    ],
    name="discriminator",
)

# print the shape of the model input
print(f'Input shape for discriminator: {discriminator.input_shape}')

# display the summary of the model.
discriminator.summary()
Input shape for discriminator: (None, 28, 28, 4)
Model: "discriminator"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Layer (type)                         ┃ Output Shape                ┃         Param # ┃
┑━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
β”‚ Conv1 (Conv2D)                       β”‚ (None, 14, 14, 64)          β”‚           2,368 β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ leaky_re_lu (LeakyReLU)              β”‚ (None, 14, 14, 64)          β”‚               0 β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ dropout (Dropout)                    β”‚ (None, 14, 14, 64)          β”‚               0 β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Conv2 (Conv2D)                       β”‚ (None, 7, 7, 128)           β”‚          73,856 β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ leaky_re_lu_1 (LeakyReLU)            β”‚ (None, 7, 7, 128)           β”‚               0 β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ dropout_1 (Dropout)                  β”‚ (None, 7, 7, 128)           β”‚               0 β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Conv3 (Conv2D)                       β”‚ (None, 7, 7, 256)           β”‚         295,168 β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ leaky_re_lu_2 (LeakyReLU)            β”‚ (None, 7, 7, 256)           β”‚               0 β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ dropout_2 (Dropout)                  β”‚ (None, 7, 7, 256)           β”‚               0 β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Conv4 (Conv2D)                       β”‚ (None, 7, 7, 512)           β”‚       1,180,160 β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ leaky_re_lu_3 (LeakyReLU)            β”‚ (None, 7, 7, 512)           β”‚               0 β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ dropout_3 (Dropout)                  β”‚ (None, 7, 7, 512)           β”‚               0 β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ global_max_pooling2d                 β”‚ (None, 512)                 β”‚               0 β”‚
β”‚ (GlobalMaxPooling2D)                 β”‚                             β”‚                 β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ dense (Dense)                        β”‚ (None, 1)                   β”‚             513 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
 Total params: 1,552,065 (5.92 MB)
 Trainable params: 1,552,065 (5.92 MB)
 Non-trainable params: 0 (0.00 B)

In this section, we define the architecture of the discriminator model. The discriminator takes a 28×28 grayscale image with an additional channel for the class label (resulting in 4 channels total) and outputs a single scalar value indicating whether the input image is real or fake.

The architecture consists of four convolutional blocks, each followed by a LeakyReLU activation and Dropout for regularization. The number of filters increases progressively (64 → 128 → 256 → 512), allowing the model to learn increasingly abstract spatial features. After the final convolutional layer, a GlobalMaxPooling2D layer compresses the spatial features into a vector, which is passed through a Dense layer to produce the final real/fake prediction score.

The model summary confirms the input shape (28, 28, 4) and shows a total of approximately 1.55 million trainable parameters, indicating a reasonably expressive discriminator suited for this task.
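As a sanity check, the per-layer parameter counts in the code comments (and in the model summary) can be reproduced with the standard Conv2D formula, no Keras required:

```python
# Conv2D with bias: (kernel_h * kernel_w * in_channels + 1) * filters
def conv2d_params(kh, kw, in_ch, filters):
    return (kh * kw * in_ch + 1) * filters

conv_counts = [
    conv2d_params(3, 3, 4, 64),     # Conv1: 2,368
    conv2d_params(3, 3, 64, 128),   # Conv2: 73,856
    conv2d_params(3, 3, 128, 256),  # Conv3: 295,168
    conv2d_params(3, 3, 256, 512),  # Conv4: 1,180,160
]
dense_count = 512 * 1 + 1           # Dense(1) after GlobalMaxPooling2D: 513
print(sum(conv_counts) + dense_count)  # 1552065, matching the summary
```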

Create generator [5 Points]

In the code cell below create your generator model, print the shape of the model input, and display the summary of the model.

In [ ]:
# Create the generator model
generator = keras.Sequential(
    [
        # Input Layer 512 + 3 = 515
        keras.layers.InputLayer((generator_in_channels,)),
        layers.Dense(7 * 7 * latent_dim),
        layers.BatchNormalization(),
        layers.LeakyReLU(negative_slope=0.2),
        layers.Reshape((7, 7, latent_dim)),
        # Output shape 7 x 7 x 512


        # First Convolution 2D Transpose Layer
        layers.Conv2DTranspose(128, (4, 4), strides=(2, 2), padding="same", name='C2DT_1'),
        layers.BatchNormalization(),
        layers.LeakyReLU(negative_slope=0.2),
        # Output 14 x 14 x 128

        # Second Convolution 2D Transpose Layer
        layers.Conv2DTranspose(64, (4, 4), strides=(2, 2), padding="same", name='C2DT_2'),
        layers.BatchNormalization(),
        layers.LeakyReLU(negative_slope=0.2),
        # Output 28 x 28 x 64

        # Third Convolution 2D Transpose Layer
        layers.Conv2DTranspose(32, (4, 4), strides=(1, 1), padding="same", name='C2DT_3'),
        layers.BatchNormalization(),
        layers.LeakyReLU(negative_slope=0.2),
        # Output 28 x 28 x 32

        # Fourth Convolution 2D Transpose Layer
        # layers.Conv2DTranspose(16, (4, 4), strides=(1, 1), padding="same", name='C2DT_4'),
        # layers.BatchNormalization(),
        # layers.LeakyReLU(negative_slope=0.2),
        # Output 28 x 28 x 16

        # Convolution Layer
        layers.Conv2D(1, (3, 3), padding="same", activation="sigmoid", name='Conv1'),
    ],
    name="generator",
)

# print the shape of the model input
print(f'Input shape for generator: {generator.input_shape}')

# display the summary of the model
print(generator.summary())
Input shape for generator: (None, 515)
Model: "generator"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Layer (type)                         ┃ Output Shape                ┃         Param # ┃
┑━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
β”‚ dense_1 (Dense)                      β”‚ (None, 25088)               β”‚      12,945,408 β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ batch_normalization                  β”‚ (None, 25088)               β”‚         100,352 β”‚
β”‚ (BatchNormalization)                 β”‚                             β”‚                 β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ leaky_re_lu_4 (LeakyReLU)            β”‚ (None, 25088)               β”‚               0 β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ reshape (Reshape)                    β”‚ (None, 7, 7, 512)           β”‚               0 β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ C2DT_1 (Conv2DTranspose)             β”‚ (None, 14, 14, 128)         β”‚       1,048,704 β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ batch_normalization_1                β”‚ (None, 14, 14, 128)         β”‚             512 β”‚
β”‚ (BatchNormalization)                 β”‚                             β”‚                 β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ leaky_re_lu_5 (LeakyReLU)            β”‚ (None, 14, 14, 128)         β”‚               0 β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ C2DT_2 (Conv2DTranspose)             β”‚ (None, 28, 28, 64)          β”‚         131,136 β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ batch_normalization_2                β”‚ (None, 28, 28, 64)          β”‚             256 β”‚
β”‚ (BatchNormalization)                 β”‚                             β”‚                 β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ leaky_re_lu_6 (LeakyReLU)            β”‚ (None, 28, 28, 64)          β”‚               0 β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ C2DT_3 (Conv2DTranspose)             β”‚ (None, 28, 28, 32)          β”‚          32,800 β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ batch_normalization_3                β”‚ (None, 28, 28, 32)          β”‚             128 β”‚
β”‚ (BatchNormalization)                 β”‚                             β”‚                 β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ leaky_re_lu_7 (LeakyReLU)            β”‚ (None, 28, 28, 32)          β”‚               0 β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Conv1 (Conv2D)                       β”‚ (None, 28, 28, 1)           β”‚             289 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
 Total params: 14,259,585 (54.40 MB)
 Trainable params: 14,208,961 (54.20 MB)
 Non-trainable params: 50,624 (197.75 KB)
None

In this section, we define the architecture of the generator model, which takes as input a random noise vector concatenated with a one-hot encoded class label. The model learns to generate 28×28 grayscale images conditioned on the specified class.

The generator begins with a dense layer that projects the latent space into a 7×7×latent_dim feature map, followed by three Conv2DTranspose (also known as "deconvolution") layers to progressively upsample the image to the target resolution. Each upsampling layer is followed by Batch Normalization and a LeakyReLU activation to promote stable training and improve feature diversity.

Although a fourth Conv2DTranspose layer was originally considered, it was commented out during experimentation. Including the additional layer tended to degrade image quality, likely due to over-smoothing or unnecessary complexity given the 28×28 output resolution. Therefore, the final architecture retains only three upsampling blocks, which yielded sharper and more distinguishable results for the target classes (Trousers, Pullovers, Sneakers).
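The upsampling arithmetic behind the shape comments above is simple: with padding="same", a Conv2DTranspose layer produces an output spatial size of input size times stride. A quick sketch confirms the 7 → 14 → 28 → 28 progression:

```python
# With padding="same", Conv2DTranspose output size = input size * stride
def conv2dtranspose_same_out(size, stride):
    return size * stride

h = 7                     # spatial size after the Reshape layer
sizes = []
for stride in (2, 2, 1):  # strides of C2DT_1, C2DT_2, C2DT_3
    h = conv2dtranspose_same_out(h, stride)
    sizes.append(h)
print(sizes)  # [14, 28, 28]: ends at the 28x28 Fashion MNIST resolution
```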

Create ConditionalGAN model [5 Points]

In the code cell below create your conditional GAN model and display the summary of the model.

In [ ]:
# class ConditionalGAN

class ConditionalGAN(keras.Model):
    def __init__(self, discriminator, generator, latent_dim):
        super().__init__()
        self.discriminator = discriminator
        self.generator = generator
        self.latent_dim = latent_dim
        self.seed_generator = keras.random.SeedGenerator(1337)
        self.gen_loss_tracker = keras.metrics.Mean(name="generator_loss")
        self.disc_loss_tracker = keras.metrics.Mean(name="discriminator_loss")

    @property
    def metrics(self):
        return [self.gen_loss_tracker, self.disc_loss_tracker]

    def compile(self, d_optimizer, g_optimizer, loss_fn):
        super().compile()
        self.d_optimizer = d_optimizer
        self.g_optimizer = g_optimizer
        self.loss_fn = loss_fn

    def train_step(self, data):
        # Unpack the data.
        real_images, one_hot_labels = data

        # Add dummy dimensions to the labels so that they can be concatenated with
        # the images. This is for the discriminator.
        image_one_hot_labels = one_hot_labels[:, :, None, None]
        image_one_hot_labels = ops.repeat(
            image_one_hot_labels, repeats=[image_size * image_size]
        )
        image_one_hot_labels = ops.reshape(
            image_one_hot_labels, (-1, image_size, image_size, num_classes)
        )

        # Sample random points in the latent space and concatenate the labels.
        # This is for the generator.
        batch_size = ops.shape(real_images)[0]
        random_latent_vectors = keras.random.normal(
            shape=(batch_size, self.latent_dim), seed=self.seed_generator
        )
        random_vector_labels = ops.concatenate(
            [random_latent_vectors, one_hot_labels], axis=1
        )

        # Decode the noise (guided by labels) to fake images.
        generated_images = self.generator(random_vector_labels)

        # Combine them with real images. Note that we are concatenating the labels
        # with these images here.
        fake_image_and_labels = ops.concatenate(
            [generated_images, image_one_hot_labels], -1
        )
        real_image_and_labels = ops.concatenate([real_images, image_one_hot_labels], -1)
        combined_images = ops.concatenate(
            [fake_image_and_labels, real_image_and_labels], axis=0
        )

        # Assemble labels discriminating real from fake images.
        labels = ops.concatenate(
            [ops.ones((batch_size, 1)), ops.zeros((batch_size, 1))], axis=0
        )

        # Train the discriminator.
        with tf.GradientTape() as tape:
            predictions = self.discriminator(combined_images)
            d_loss = self.loss_fn(labels, predictions)
        grads = tape.gradient(d_loss, self.discriminator.trainable_weights)
        self.d_optimizer.apply_gradients(
            zip(grads, self.discriminator.trainable_weights)
        )

        # Sample random points in the latent space.
        random_latent_vectors = keras.random.normal(
            shape=(batch_size, self.latent_dim), seed=self.seed_generator
        )
        random_vector_labels = ops.concatenate(
            [random_latent_vectors, one_hot_labels], axis=1
        )

        # Assemble labels that say "all real images".
        misleading_labels = ops.zeros((batch_size, 1))

        # Train the generator (note that we should *not* update the weights
        # of the discriminator)!
        with tf.GradientTape() as tape:
            fake_images = self.generator(random_vector_labels)
            fake_image_and_labels = ops.concatenate(
                [fake_images, image_one_hot_labels], -1
            )
            predictions = self.discriminator(fake_image_and_labels)
            g_loss = self.loss_fn(misleading_labels, predictions)
        grads = tape.gradient(g_loss, self.generator.trainable_weights)
        self.g_optimizer.apply_gradients(zip(grads, self.generator.trainable_weights))

        # Monitor loss.
        self.gen_loss_tracker.update_state(g_loss)
        self.disc_loss_tracker.update_state(d_loss)
        return {
            "g_loss": self.gen_loss_tracker.result(),
            "d_loss": self.disc_loss_tracker.result(),
        }

# create model
cond_gan = ConditionalGAN(
    discriminator = discriminator, generator = generator, latent_dim = latent_dim
)

cond_gan.compile(
    d_optimizer = keras.optimizers.Adam(learning_rate = 0.0003),
    g_optimizer = keras.optimizers.Adam(learning_rate = 0.0003),
    loss_fn = keras.losses.BinaryCrossentropy(from_logits=True),
    )

# display the summary of the model
cond_gan.summary()
Model: "conditional_gan"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Layer (type)                         ┃ Output Shape                ┃         Param # ┃
┑━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
β”‚ discriminator (Sequential)           β”‚ (None, 1)                   β”‚       1,552,065 β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ generator (Sequential)               β”‚ (None, 28, 28, 1)           β”‚      14,259,585 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
 Total params: 15,811,650 (60.32 MB)
 Trainable params: 15,761,026 (60.12 MB)
 Non-trainable params: 50,624 (197.75 KB)

In this section, we define the ConditionalGAN class, which encapsulates the custom training logic for our Conditional GAN using Keras' subclassed model API. The overall structure of this class is based on open-source examples provided in the official Keras documentation and tutorials on conditional GANs. We adapted the implementation to work with our specific dataset, generator and discriminator designs, and project parameters. Minor modifications were made to suit our data shape and training preferences.

The class tracks generator and discriminator losses as custom metrics and is compiled with two separate optimizers (one for each network) and a shared binary cross-entropy loss function.

The custom train_step() method controls the alternating training of both networks. Real and fake images are paired with label information using one-hot encoding and spatial expansion so that the discriminator receives both the image and class context. The generator is conditioned on the class labels via concatenation with noise vectors.

Though an alternative dual-input structure was explored, this one-hot + concatenation strategy proved more straightforward and yielded strong results during training.
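The spatial label-expansion step in train_step (the ops.repeat / ops.reshape calls) can be mimicked with NumPy on a tiny example, with the image size shrunk from 28 to 4 for readability:

```python
import numpy as np

batch_labels = [0, 2]           # two samples: a Trouser and a Sneaker
image_size, num_classes = 4, 3  # shrunk from 28 for illustration

one_hot = np.eye(num_classes, dtype="float32")[batch_labels]  # (2, 3)
maps = one_hot[:, :, None, None]                              # (2, 3, 1, 1)
maps = np.repeat(maps, image_size * image_size)               # flatten + repeat
maps = maps.reshape(-1, image_size, image_size, num_classes)  # (2, 4, 4, 3)

print(maps.shape)                # (2, 4, 4, 3)
print(maps.sum(axis=(1, 2, 3)))  # [16. 16.] -- image_size**2 ones per sample
```

Each sample ends up with exactly image_size² ones encoding its class, which is what the discriminator concatenates with the image channels.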

Function to generate fake images [5 Points]

In the code cell below define a function generate_fake_images that takes 2 arguments CGAN_model and class_label_list and returns fake images generated by the CGAN_model corresponding to the labels specified in the list class_label_list

In [ ]:
# Define function: generate_fake_images
def generate_fake_images(generator_model, class_label_list):
  """Returns fake images of digits in digit_list using model"""
  labels = keras.utils.to_categorical(class_label_list, num_classes)
  labels = ops.cast(labels, "float32")
  noise = keras.random.normal(shape=(len(class_label_list), latent_dim))
  noise_and_labels = ops.concatenate([noise, labels], 1)
  fake = generator_model.predict(noise_and_labels, verbose=0)
  return fake

This function generates fake images from the generator model based on a list of class labels. It first one-hot encodes the labels, samples random noise vectors, and concatenates the two. The combined input is then passed to the generator to produce class-conditioned outputs. This function is used to visualize the generator’s performance during and after training.
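The shape of the input assembled inside generate_fake_images can be sketched with NumPy stand-ins for keras.random.normal and keras.utils.to_categorical:

```python
import numpy as np

latent_dim, num_classes = 512, 3
class_label_list = [0, 0, 2]  # two Trousers and one Sneaker

noise = np.random.normal(size=(len(class_label_list), latent_dim))
labels = np.eye(num_classes)[class_label_list]
noise_and_labels = np.concatenate([noise, labels], axis=1)

print(noise_and_labels.shape)  # (3, 515): matches the generator's input shape
```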

Train model [5 Points]

In the code cell below type in your code to train the model over 30 epochs. After each epoch save the weights of the generator and display fake images generated for classes with labels specified in class_label_list.

In [ ]:
# Train the model over 30 epochs.

class_label_list = 3*[0] + 3*[1] + 3*[2] # display 3 images of each class
print(f"class_label_list: {class_label_list}")

epochs = 30 # number of epochs specified by assignment

for i in range(epochs):
  cond_gan.fit(dataset, epochs=1) # train model
  cond_gan.generator.save_weights(f"generator_epoch_{i+1}.weights.h5") # save weights
  images = generate_fake_images(cond_gan.generator, class_label_list) # fake images
  print(f'Fake images after epoch {i+1}:') # epoch heading
  displayImages(images, class_label_list) # display images
  print()
class_label_list: [0, 0, 0, 1, 1, 1, 2, 2, 2]
329/329 ━━━━━━━━━━━━━━━━━━━━ 35s 55ms/step - d_loss: 0.2747 - g_loss: 5.4549
Fake images after epoch 1:
329/329 ━━━━━━━━━━━━━━━━━━━━ 13s 39ms/step - d_loss: 0.2564 - g_loss: 3.4446
Fake images after epoch 2:
329/329 ━━━━━━━━━━━━━━━━━━━━ 13s 40ms/step - d_loss: 0.4253 - g_loss: 2.2602
Fake images after epoch 3:
329/329 ━━━━━━━━━━━━━━━━━━━━ 14s 41ms/step - d_loss: 0.4115 - g_loss: 2.4745
Fake images after epoch 4:
329/329 ━━━━━━━━━━━━━━━━━━━━ 13s 40ms/step - d_loss: 0.4104 - g_loss: 2.0053
Fake images after epoch 5:
329/329 ━━━━━━━━━━━━━━━━━━━━ 13s 40ms/step - d_loss: 0.4202 - g_loss: 2.1133
Fake images after epoch 6:
329/329 ━━━━━━━━━━━━━━━━━━━━ 13s 40ms/step - d_loss: 0.4556 - g_loss: 1.9841
Fake images after epoch 7:
329/329 ━━━━━━━━━━━━━━━━━━━━ 13s 41ms/step - d_loss: 0.4009 - g_loss: 1.9964
Fake images after epoch 8:
329/329 ━━━━━━━━━━━━━━━━━━━━ 13s 41ms/step - d_loss: 0.4124 - g_loss: 2.0065
Fake images after epoch 9:
329/329 ━━━━━━━━━━━━━━━━━━━━ 13s 41ms/step - d_loss: 0.3626 - g_loss: 2.0917
Fake images after epoch 10:
329/329 ━━━━━━━━━━━━━━━━━━━━ 13s 41ms/step - d_loss: 0.3498 - g_loss: 2.3558
Fake images after epoch 11:
329/329 ━━━━━━━━━━━━━━━━━━━━ 14s 41ms/step - d_loss: 0.3973 - g_loss: 2.4597
Fake images after epoch 12:
329/329 ━━━━━━━━━━━━━━━━━━━━ 14s 41ms/step - d_loss: 0.4216 - g_loss: 1.9138
Fake images after epoch 13:
329/329 ━━━━━━━━━━━━━━━━━━━━ 13s 41ms/step - d_loss: 0.4130 - g_loss: 1.8849
Fake images after epoch 14:
329/329 ━━━━━━━━━━━━━━━━━━━━ 13s 41ms/step - d_loss: 0.4309 - g_loss: 1.8939
Fake images after epoch 15:
329/329 ━━━━━━━━━━━━━━━━━━━━ 13s 41ms/step - d_loss: 0.4244 - g_loss: 1.9779
Fake images after epoch 16:
329/329 ━━━━━━━━━━━━━━━━━━━━ 13s 41ms/step - d_loss: 0.4285 - g_loss: 1.7745
Fake images after epoch 17:
329/329 ━━━━━━━━━━━━━━━━━━━━ 14s 41ms/step - d_loss: 0.4279 - g_loss: 1.9556
Fake images after epoch 18:
329/329 ━━━━━━━━━━━━━━━━━━━━ 13s 41ms/step - d_loss: 0.4262 - g_loss: 1.8995
Fake images after epoch 19:
329/329 ━━━━━━━━━━━━━━━━━━━━ 13s 40ms/step - d_loss: 0.4291 - g_loss: 1.7767
Fake images after epoch 20:
329/329 ━━━━━━━━━━━━━━━━━━━━ 13s 41ms/step - d_loss: 0.4544 - g_loss: 1.6738
Fake images after epoch 21:
329/329 ━━━━━━━━━━━━━━━━━━━━ 13s 41ms/step - d_loss: 0.4406 - g_loss: 1.7131
Fake images after epoch 22:
329/329 ━━━━━━━━━━━━━━━━━━━━ 13s 41ms/step - d_loss: 0.4405 - g_loss: 1.6870
Fake images after epoch 23:
329/329 ━━━━━━━━━━━━━━━━━━━━ 13s 41ms/step - d_loss: 0.4561 - g_loss: 1.6931
Fake images after epoch 24:
329/329 ━━━━━━━━━━━━━━━━━━━━ 13s 41ms/step - d_loss: 0.4401 - g_loss: 1.7038
Fake images after epoch 25:
329/329 ━━━━━━━━━━━━━━━━━━━━ 14s 41ms/step - d_loss: 0.4221 - g_loss: 1.9136
Fake images after epoch 26:
329/329 ━━━━━━━━━━━━━━━━━━━━ 14s 41ms/step - d_loss: 0.4587 - g_loss: 1.6694
Fake images after epoch 27:
329/329 ━━━━━━━━━━━━━━━━━━━━ 13s 41ms/step - d_loss: 0.4585 - g_loss: 1.6672
Fake images after epoch 28:
329/329 ━━━━━━━━━━━━━━━━━━━━ 13s 41ms/step - d_loss: 0.4639 - g_loss: 1.6952
Fake images after epoch 29:
329/329 ━━━━━━━━━━━━━━━━━━━━ 13s 41ms/step - d_loss: 0.4380 - g_loss: 1.6682
Fake images after epoch 30:

This section performs the training of the Conditional GAN over 30 epochs. After each epoch, the generator's weights are saved and a batch of class-conditioned images is generated for visual evaluation. The printed loss values and generated images help monitor training progress.

It's important to note that GAN training is inherently adversarial and stochastic. The generator and discriminator are locked in a dynamic competition, where improvements in one create challenges for the other. As a result, the generator's performance may not improve in every epoch; some generated images may look better or worse than those from the previous epoch. Over time, however, the overall quality typically improves as the generator learns to better fool the discriminator.

This iterative, game-like training process is central to GANs and distinguishes them from traditional supervised models.

Use trained generator [5 Points]

Create a generator with the weights saved at the epoch that you consider to generate the most authentic fake images, and use it to generate fake images for the classes specified in class_label_list.

In [ ]:
# choosing the weights and defining new generator
chosen_epoch = 30 # chosen epoch
chosen_generator = generator # reused the same generator instance defined earlier to ensure the architecture matches

# Load the weights
chosen_generator.load_weights(f"generator_epoch_{chosen_epoch}.weights.h5")

# Display fake images generated for classes specified in class_label_list.
class_label_list = 10*[0] + 10*[1] + 10*[2]
print(f"class_label_list: {class_label_list}")

images = generate_fake_images(chosen_generator, class_label_list)
print('Fake images:')
displayImages(images, [ITEMS[i] for i in class_label_list])
class_label_list: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
Fake images:

In this final step, we load the generator weights from the epoch that produced the most convincing outputs (in this case, epoch 30). Using the generate_fake_images() function, we produce and visualize a batch of fake images conditioned on each of the three target classes.

This allows us to evaluate the final quality of the generator's outputs and verify that it has learned to produce distinct and class-specific representations. The generated trousers, pullovers, and sneakers demonstrate the model's ability to synthesize realistic images that align with the corresponding labels.
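Reusing the same generator instance works because loading a checkpoint only requires that the stored weight arrays match the receiving model's layer shapes, which is why the architecture must be identical. The shape-matching idea can be sketched with plain numpy; the file name and array shapes below are illustrative, not the notebook's actual Keras checkpoint format:

```python
import os
import tempfile
import numpy as np

# Pretend these are one layer's kernel and bias at the chosen epoch
weights = {"kernel": np.random.rand(131, 256).astype("float32"),
           "bias": np.zeros(256, dtype="float32")}

# Save the named arrays to disk (analogous to save_weights(...))
path = os.path.join(tempfile.gettempdir(), "generator_demo.npz")
np.savez(path, **weights)

# Restore them (analogous to load_weights(...))
restored = np.load(path)

# Loading only succeeds if the receiving layer has identical shapes
assert restored["kernel"].shape == weights["kernel"].shape
assert np.array_equal(restored["bias"], weights["bias"])
```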

Conclusion

In this project, we successfully implemented a Conditional Generative Adversarial Network (CGAN) to generate grayscale Fashion MNIST images conditioned on class labels. We focused on three specific classes (trousers, pullovers, and sneakers) and used a one-hot encoded label strategy to guide the generator and discriminator.

The generator was trained to produce images from random noise and class labels, while the discriminator learned to distinguish between real and synthetic images using both pixel content and label information. Over the course of 30 epochs, we observed the generator improve its ability to produce visually coherent and class-specific outputs.

While we explored more advanced generator and discriminator structures, we ultimately found that a simpler one-hot concatenation approach yielded the most stable results for this dataset. Loss values and generated image samples provided insight into the training dynamics, and the final model was able to consistently produce distinguishable images for each target class.

This project reinforced key concepts in GAN training, conditional generation, adversarial learning, and image preprocessing, all while demonstrating how generative models can be guided using label information to produce more meaningful outputs.


Future Improvements:

  • Experiment with label embedding and dual-input models
  • Add regularization techniques like spectral normalization
  • Try more complex architectures (e.g., ResNet-based generators)
  • Extend to color datasets or higher resolutions