Steven Lora - MSIT 675 Project 2 - Generative Adversarial Networks
Generate fake Fashion MNIST images
The goal of this project is to develop a Conditional Generative Adversarial Network (CGAN) that generates fake images of three Fashion MNIST items: Trouser (labeled 1), Pullover (labeled 2), and Sneaker (labeled 7), by training on images from the Fashion MNIST dataset.
Import
# import libraries
import keras
from keras import layers, models, Input, optimizers, ops
import matplotlib.pyplot as plt
import tensorflow as tf
import numpy as np
Get data
Use the function get_data() to obtain images of Trousers, Pullovers, and Sneakers relabeled 0, 1, 2.
def get_data():
    """Returns images of Trousers, Pullovers, and Sneakers relabeled 0, 1, 2"""
    (x_train, y_train), (x_test, y_test) = keras.datasets.fashion_mnist.load_data()
    all_images = np.concatenate([x_train, x_test])
    all_labels = np.concatenate([y_train, y_test])
    retained_indices = np.where(np.isin(all_labels, [1, 2, 7]))
    retained_images = all_images[retained_indices]
    retained_labels = all_labels[retained_indices]
    # Map the original labels 1, 2, 7 to the contiguous values 0, 1, 2
    label_mapping = {1: 0, 2: 1, 7: 2}
    mapped_labels = np.vectorize(label_mapping.get)(retained_labels)
    ITEMS = ['Trouser', 'Pullover', 'Sneaker']
    return retained_images, mapped_labels, ITEMS
all_images, all_class_labels, ITEMS = get_data()
print(f'Shape of images: {all_images.shape}')
print(f'Shape of labels: {all_class_labels.shape}')
print(f'Unique labels: {np.unique(all_class_labels)}')
print(f'Items: {[(i, item) for i,item in enumerate(ITEMS)]}')
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-labels-idx1-ubyte.gz
29515/29515 - 0s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-images-idx3-ubyte.gz
26421880/26421880 - 1s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-labels-idx1-ubyte.gz
5148/5148 - 0s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-images-idx3-ubyte.gz
4422102/4422102 - 0s 0us/step
Shape of images: (21000, 28, 28)
Shape of labels: (21000,)
Unique labels: [0 1 2]
Items: [(0, 'Trouser'), (1, 'Pullover'), (2, 'Sneaker')]
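The filtering and relabeling inside get_data() can be traced in isolation; a minimal sketch using only NumPy, with a small toy label array standing in for the real dataset:

```python
import numpy as np

# Toy stand-in for the combined Fashion MNIST label array (values 0-9).
all_labels = np.array([1, 5, 2, 7, 7, 1, 0, 2])

# Keep only Trouser (1), Pullover (2), and Sneaker (7).
retained = all_labels[np.isin(all_labels, [1, 2, 7])]

# Remap the surviving labels to the contiguous range 0, 1, 2.
label_mapping = {1: 0, 2: 1, 7: 2}
mapped = np.vectorize(label_mapping.get)(retained)

print(retained.tolist())  # [1, 2, 7, 7, 1, 2]
print(mapped.tolist())    # [0, 1, 2, 2, 0, 1]
```

The contiguous 0-2 range is what lets the labels later be one-hot encoded into three columns.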
Display images
You may use the function displayImages to display images.
def displayImages(images, labels, nCols=10):
    """Displays images with labels (nCols per row)"""
    nRows = np.ceil(len(labels)/nCols).astype('int')  # number of rows
    plt.figure(figsize=(nCols, nRows))                # figure size
    for i in range(len(labels)):
        plt.subplot(nRows, nCols, i+1)
        plt.xticks([])
        plt.yticks([])
        plt.grid(False)
        plt.imshow(images[i], interpolation='spline16', cmap='gray_r')
        plt.xlabel(f'{labels[i]}', fontsize=12)
    plt.tight_layout()
    plt.show()
    return
# display the first k images with labels
k = 30
images = all_images[:k]
labels = [ITEMS[label] for label in all_class_labels[:k]]
displayImages(images, labels)
Specify parameters [2 Points]
Specify parameters to create your CGAN model in the code cell below
# Specify parameters to create your CGAN model in this code cell
batch_size = 64 # batch size used for training
num_channels = 1 # grayscale images (3 for RGB)
num_classes = 3 # 3 Classes (0, 'Trouser'), (1, 'Pullover'), (2, 'Sneaker')
image_size = 28 # width and height of images
latent_dim = 512 # dimensionality of the latent noise vector sampled for the generator
generator_in_channels = latent_dim + num_classes # number of channels in generator input
discriminator_in_channels = num_channels + num_classes # number of channels in discriminator input
In this section, we define the core parameters for our Conditional GAN model. The image shape is based on the Fashion MNIST dataset (28×28 grayscale), and we use a 512-dimensional noise vector as input to the generator. Since we are focusing on only three specific classes (Trouser, Pullover, Sneaker), the number of classes is set to 3. These parameters will guide the architecture and conditioning of both the generator and discriminator models.
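As a quick sanity check on the conditioning arithmetic above (a sketch reusing the same parameter names):

```python
latent_dim = 512
num_classes = 3
num_channels = 1

# The generator input is the latent vector with the one-hot label appended.
generator_in_channels = latent_dim + num_classes        # 512 + 3 = 515
# The discriminator input is the image with one label map per class stacked on.
discriminator_in_channels = num_channels + num_classes  # 1 + 3 = 4

print(generator_in_channels, discriminator_in_channels)  # 515 4
```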
Preprocess [3 Points]
Type in the code to preprocess the data in the code cell below. Scale the pixel values of images to [0, 1] range, add a channel dimension to the images, and one-hot encode the labels. Print the shape of processed images and the shape of processed labels.
# Code to preprocess the data
# Scale the pixel values to [0, 1] range
all_images = all_images.astype("float32") / 255.0
# add a channel dimension to the images
all_images = np.reshape(all_images, (-1, 28, 28, 1))
# one-hot encode the labels
all_labels = keras.utils.to_categorical(all_class_labels, 3)
# Create tf.data.Dataset
dataset = tf.data.Dataset.from_tensor_slices((all_images, all_labels))
dataset = dataset.shuffle(buffer_size=1024).batch(batch_size)
# print the shapes of the resulting images and labels
print(f"Shape of images: {all_images.shape}")
print(f"Shape of labels: {all_labels.shape}")
Shape of images: (21000, 28, 28, 1) Shape of labels: (21000, 3)
In this section, we preprocess the dataset to prepare it for training. First, we normalize the image pixel values to the [0, 1] range by dividing by 255.0. Since the Fashion MNIST images are grayscale, we also add a single channel dimension to match the expected input shape for convolutional layers.
Next, we one-hot encode the class labels to use them effectively during training, especially for conditional generation. The final step converts the processed images and labels into a tf.data.Dataset, enabling efficient batching and shuffling during model training. The printed shapes confirm that the images are now in (21000, 28, 28, 1) format and the labels in (21000, 3) format, representing three target classes.
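The same three preprocessing steps can be sketched with plain NumPy, independent of Keras (random toy images stand in for the dataset; `np.eye` performs the one-hot encoding that keras.utils.to_categorical does):

```python
import numpy as np

rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(5, 28, 28)).astype("float32")
labels = np.array([0, 2, 1, 1, 0])

images = images / 255.0                       # scale pixels to [0, 1]
images = images.reshape(-1, 28, 28, 1)        # add a channel dimension
one_hot = np.eye(3, dtype="float32")[labels]  # one-hot encode the labels

print(images.shape)   # (5, 28, 28, 1)
print(one_hot.shape)  # (5, 3)
print(one_hot[1])     # [0. 0. 1.]
```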
Create discriminator [5 Points]
In the code cell below create your discriminator model, print the shape of the model input, and display the summary of the model.
# Create the discriminator model
discriminator = keras.Sequential(
    [
        # Input shape: 28 x 28 x 4 (1 image channel + 3 label channels)
        keras.layers.InputLayer((28, 28, discriminator_in_channels)),
        # First convolution layer
        # Number of parameters = (3 * 3 * 4 + 1) * 64 = 2,368
        layers.Conv2D(64, (3, 3), strides=(2, 2), padding="same", name='Conv1'),
        layers.LeakyReLU(negative_slope=0.2),
        layers.Dropout(0.25),
        # Output shape = 14 x 14 x 64
        # Second convolution layer
        # Number of parameters = (3 * 3 * 64 + 1) * 128 = 73,856
        layers.Conv2D(128, (3, 3), strides=(2, 2), padding="same", name='Conv2'),
        layers.LeakyReLU(negative_slope=0.2),
        layers.Dropout(0.25),
        # Output shape = 7 x 7 x 128
        # Third convolution layer
        # Number of parameters = (3 * 3 * 128 + 1) * 256 = 295,168
        layers.Conv2D(256, (3, 3), strides=(1, 1), padding="same", name='Conv3'),
        layers.LeakyReLU(negative_slope=0.2),
        layers.Dropout(0.25),
        # Output shape = 7 x 7 x 256
        # Fourth convolution layer
        # Number of parameters = (3 * 3 * 256 + 1) * 512 = 1,180,160
        layers.Conv2D(512, (3, 3), strides=(1, 1), padding="same", name='Conv4'),
        layers.LeakyReLU(negative_slope=0.2),
        layers.Dropout(0.25),
        # Output shape = 7 x 7 x 512
        # Global max pooling takes the max value from each of the 512 channels
        layers.GlobalMaxPooling2D(),
        # Output: a single logit (score that the image is real)
        layers.Dense(1),
    ],
    name="discriminator",
)
# print the shape of the model input
print(f'Input shape for discriminator: {discriminator.input_shape}')
# display the summary of the model.
discriminator.summary()
Input shape for discriminator: (None, 28, 28, 4)
Model: "discriminator"
Layer (type)                               Output Shape          Param #
Conv1 (Conv2D)                             (None, 14, 14, 64)        2,368
leaky_re_lu (LeakyReLU)                    (None, 14, 14, 64)            0
dropout (Dropout)                          (None, 14, 14, 64)            0
Conv2 (Conv2D)                             (None, 7, 7, 128)        73,856
leaky_re_lu_1 (LeakyReLU)                  (None, 7, 7, 128)             0
dropout_1 (Dropout)                        (None, 7, 7, 128)             0
Conv3 (Conv2D)                             (None, 7, 7, 256)       295,168
leaky_re_lu_2 (LeakyReLU)                  (None, 7, 7, 256)             0
dropout_2 (Dropout)                        (None, 7, 7, 256)             0
Conv4 (Conv2D)                             (None, 7, 7, 512)     1,180,160
leaky_re_lu_3 (LeakyReLU)                  (None, 7, 7, 512)             0
dropout_3 (Dropout)                        (None, 7, 7, 512)             0
global_max_pooling2d (GlobalMaxPooling2D)  (None, 512)                   0
dense (Dense)                              (None, 1)                   513
Total params: 1,552,065 (5.92 MB)
Trainable params: 1,552,065 (5.92 MB)
Non-trainable params: 0 (0.00 B)
In this section, we define the architecture of the discriminator model. The discriminator takes a 28×28 grayscale image with additional channels for the class label (resulting in 4 channels total) and outputs a single scalar value indicating whether the input image is real or fake.
The architecture consists of four convolutional blocks, each followed by a LeakyReLU activation and Dropout for regularization. The number of filters increases progressively (64 → 128 → 256 → 512), allowing the model to learn increasingly abstract spatial features. After the final convolutional layer, a GlobalMaxPooling2D layer compresses the spatial features into a vector, which is passed through a Dense layer to produce the final real/fake prediction score.
The model summary confirms the input shape (28, 28, 4) and shows a total of approximately 1.55 million trainable parameters, indicating a reasonably expressive discriminator suited for this task.
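The per-layer parameter counts quoted in the code comments can be checked directly: a Conv2D layer with a bias term has (kh · kw · c_in + 1) · filters parameters. A small sketch reproducing the summary's totals:

```python
def conv2d_params(kh, kw, c_in, filters):
    """Weights (kh*kw*c_in per filter) plus one bias per filter."""
    return (kh * kw * c_in + 1) * filters

counts = [
    conv2d_params(3, 3, 4, 64),     # Conv1:  2,368
    conv2d_params(3, 3, 64, 128),   # Conv2:  73,856
    conv2d_params(3, 3, 128, 256),  # Conv3:  295,168
    conv2d_params(3, 3, 256, 512),  # Conv4:  1,180,160
    512 + 1,                        # Dense:  512 weights + 1 bias = 513
]
print(sum(counts))  # 1552065, matching the summary's 1,552,065
```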
Create generator [5 Points]
In the code cell below create your generator model, print the shape of the model input, and display the summary of the model.
# Create the generator model
generator = keras.Sequential(
    [
        # Input layer: 512 + 3 = 515 values (latent vector + one-hot label)
        keras.layers.InputLayer((generator_in_channels,)),
        layers.Dense(7 * 7 * latent_dim),
        layers.BatchNormalization(),
        layers.LeakyReLU(negative_slope=0.2),
        layers.Reshape((7, 7, latent_dim)),
        # Output shape 7 x 7 x 512
        # First transposed convolution layer
        layers.Conv2DTranspose(128, (4, 4), strides=(2, 2), padding="same", name='C2DT_1'),
        layers.BatchNormalization(),
        layers.LeakyReLU(negative_slope=0.2),
        # Output 14 x 14 x 128
        # Second transposed convolution layer
        layers.Conv2DTranspose(64, (4, 4), strides=(2, 2), padding="same", name='C2DT_2'),
        layers.BatchNormalization(),
        layers.LeakyReLU(negative_slope=0.2),
        # Output 28 x 28 x 64
        # Third transposed convolution layer
        layers.Conv2DTranspose(32, (4, 4), strides=(1, 1), padding="same", name='C2DT_3'),
        layers.BatchNormalization(),
        layers.LeakyReLU(negative_slope=0.2),
        # Output 28 x 28 x 32
        # Fourth transposed convolution layer (dropped during experimentation)
        # layers.Conv2DTranspose(16, (4, 4), strides=(1, 1), padding="same", name='C2DT_4'),
        # layers.BatchNormalization(),
        # layers.LeakyReLU(negative_slope=0.2),
        # Output 28 x 28 x 16
        # Final convolution layer maps the features to a single-channel image
        layers.Conv2D(1, (3, 3), padding="same", activation="sigmoid", name='Conv1'),
    ],
    name="generator",
)
# print the shape of the model input
print(f'Input shape for generator: {generator.input_shape}')
# display the summary of the model
generator.summary()
Input shape for generator: (None, 515)
Model: "generator"
Layer (type)                                Output Shape          Param #
dense_1 (Dense)                             (None, 25088)       12,945,408
batch_normalization (BatchNormalization)    (None, 25088)          100,352
leaky_re_lu_4 (LeakyReLU)                   (None, 25088)                0
reshape (Reshape)                           (None, 7, 7, 512)            0
C2DT_1 (Conv2DTranspose)                    (None, 14, 14, 128)  1,048,704
batch_normalization_1 (BatchNormalization)  (None, 14, 14, 128)        512
leaky_re_lu_5 (LeakyReLU)                   (None, 14, 14, 128)          0
C2DT_2 (Conv2DTranspose)                    (None, 28, 28, 64)     131,136
batch_normalization_2 (BatchNormalization)  (None, 28, 28, 64)         256
leaky_re_lu_6 (LeakyReLU)                   (None, 28, 28, 64)           0
C2DT_3 (Conv2DTranspose)                    (None, 28, 28, 32)      32,800
batch_normalization_3 (BatchNormalization)  (None, 28, 28, 32)         128
leaky_re_lu_7 (LeakyReLU)                   (None, 28, 28, 32)           0
Conv1 (Conv2D)                              (None, 28, 28, 1)          289
Total params: 14,259,585 (54.40 MB)
Trainable params: 14,208,961 (54.20 MB)
Non-trainable params: 50,624 (197.75 KB)
In this section, we define the architecture of the generator model, which takes as input a random noise vector concatenated with a one-hot encoded class label. The model learns to generate 28×28 grayscale images conditioned on the specified class.
The generator begins with a dense layer that projects the latent space into a 7×7×latent_dim feature map, followed by three Conv2DTranspose (also known as "deconvolution") layers to progressively upsample the image to the target resolution. Each upsampling layer is followed by Batch Normalization and a LeakyReLU activation to promote stable training and improve feature diversity.
Although a fourth Conv2DTranspose layer was originally considered, it was commented out during experimentation. Including the additional layer tended to degrade image quality, likely due to over-smoothing or unnecessary complexity given the 28×28 output resolution. Therefore, the final architecture retains only three upsampling blocks, which yielded sharper and more distinguishable results for the target classes (Trousers, Pullovers, Sneakers).
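The output sizes noted in the code comments follow from the stride arithmetic for Conv2DTranspose with padding="same", where the spatial size is simply multiplied by the stride (the kernel size drops out). A quick sketch:

```python
def conv2d_transpose_out(size, stride):
    # With padding="same", Keras gives out = in * stride for Conv2DTranspose.
    return size * stride

size = 7                  # after the Dense + Reshape block
for stride in (2, 2, 1):  # strides of C2DT_1, C2DT_2, C2DT_3
    size = conv2d_transpose_out(size, stride)
print(size)  # 28 -- the target image resolution
```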
Create ConditionalGAN model [5 Points]
In the code cell below create your conditional GAN model and display the summary of the model.
# class ConditionalGAN
class ConditionalGAN(keras.Model):
    def __init__(self, discriminator, generator, latent_dim):
        super().__init__()
        self.discriminator = discriminator
        self.generator = generator
        self.latent_dim = latent_dim
        self.seed_generator = keras.random.SeedGenerator(1337)
        self.gen_loss_tracker = keras.metrics.Mean(name="generator_loss")
        self.disc_loss_tracker = keras.metrics.Mean(name="discriminator_loss")

    @property
    def metrics(self):
        return [self.gen_loss_tracker, self.disc_loss_tracker]

    def compile(self, d_optimizer, g_optimizer, loss_fn):
        super().compile()
        self.d_optimizer = d_optimizer
        self.g_optimizer = g_optimizer
        self.loss_fn = loss_fn

    def train_step(self, data):
        # Unpack the data.
        real_images, one_hot_labels = data

        # Add dummy dimensions to the labels so that they can be concatenated with
        # the images. This is for the discriminator.
        image_one_hot_labels = one_hot_labels[:, :, None, None]
        image_one_hot_labels = ops.repeat(
            image_one_hot_labels, repeats=[image_size * image_size]
        )
        image_one_hot_labels = ops.reshape(
            image_one_hot_labels, (-1, image_size, image_size, num_classes)
        )

        # Sample random points in the latent space and concatenate the labels.
        # This is for the generator.
        batch_size = ops.shape(real_images)[0]
        random_latent_vectors = keras.random.normal(
            shape=(batch_size, self.latent_dim), seed=self.seed_generator
        )
        random_vector_labels = ops.concatenate(
            [random_latent_vectors, one_hot_labels], axis=1
        )

        # Decode the noise (guided by labels) to fake images.
        generated_images = self.generator(random_vector_labels)

        # Combine them with real images. Note that we are concatenating the labels
        # with these images here.
        fake_image_and_labels = ops.concatenate(
            [generated_images, image_one_hot_labels], -1
        )
        real_image_and_labels = ops.concatenate([real_images, image_one_hot_labels], -1)
        combined_images = ops.concatenate(
            [fake_image_and_labels, real_image_and_labels], axis=0
        )

        # Assemble labels discriminating real from fake images.
        labels = ops.concatenate(
            [ops.ones((batch_size, 1)), ops.zeros((batch_size, 1))], axis=0
        )

        # Train the discriminator.
        with tf.GradientTape() as tape:
            predictions = self.discriminator(combined_images)
            d_loss = self.loss_fn(labels, predictions)
        grads = tape.gradient(d_loss, self.discriminator.trainable_weights)
        self.d_optimizer.apply_gradients(
            zip(grads, self.discriminator.trainable_weights)
        )

        # Sample random points in the latent space.
        random_latent_vectors = keras.random.normal(
            shape=(batch_size, self.latent_dim), seed=self.seed_generator
        )
        random_vector_labels = ops.concatenate(
            [random_latent_vectors, one_hot_labels], axis=1
        )

        # Assemble labels that say "all real images".
        misleading_labels = ops.zeros((batch_size, 1))

        # Train the generator (note that we should *not* update the weights
        # of the discriminator)!
        with tf.GradientTape() as tape:
            fake_images = self.generator(random_vector_labels)
            fake_image_and_labels = ops.concatenate(
                [fake_images, image_one_hot_labels], -1
            )
            predictions = self.discriminator(fake_image_and_labels)
            g_loss = self.loss_fn(misleading_labels, predictions)
        grads = tape.gradient(g_loss, self.generator.trainable_weights)
        self.g_optimizer.apply_gradients(zip(grads, self.generator.trainable_weights))

        # Monitor loss.
        self.gen_loss_tracker.update_state(g_loss)
        self.disc_loss_tracker.update_state(d_loss)
        return {
            "g_loss": self.gen_loss_tracker.result(),
            "d_loss": self.disc_loss_tracker.result(),
        }

# create model
cond_gan = ConditionalGAN(
    discriminator=discriminator, generator=generator, latent_dim=latent_dim
)
cond_gan.compile(
    d_optimizer=keras.optimizers.Adam(learning_rate=0.0003),
    g_optimizer=keras.optimizers.Adam(learning_rate=0.0003),
    loss_fn=keras.losses.BinaryCrossentropy(from_logits=True),
)
# display the summary of the model
cond_gan.summary()
Model: "conditional_gan"
Layer (type)                  Output Shape         Param #
discriminator (Sequential)    (None, 1)             1,552,065
generator (Sequential)        (None, 28, 28, 1)    14,259,585
Total params: 15,811,650 (60.32 MB)
Trainable params: 15,761,026 (60.12 MB)
Non-trainable params: 50,624 (197.75 KB)
In this section, we define the ConditionalGAN class, which encapsulates the custom training logic for our Conditional GAN using Keras' subclassed model API. The overall structure of this class is based on open-source examples provided in the official Keras documentation and tutorials on conditional GANs. We adapted the implementation to work with our specific dataset, generator and discriminator designs, and project parameters. Minor modifications were made to suit our data shape and training preferences.
The class tracks generator and discriminator losses as custom metrics and is compiled with two separate optimizers (one for each network) and a shared binary cross-entropy loss function.
The custom train_step() method controls the alternating training of both networks. Real and fake images are paired with label information using one-hot encoding and spatial expansion so that the discriminator receives both the image and class context. The generator is conditioned on the class labels via concatenation with noise vectors.
Though an alternative dual-input structure was explored, this one-hot + concatenation strategy proved more straightforward and yielded strong results during training.
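The two conditioning paths inside train_step() can be traced shape-by-shape with plain NumPy (np.repeat/np.reshape behave like ops.repeat/ops.reshape here; a toy batch of 2 with image_size=28 and num_classes=3 as above):

```python
import numpy as np

batch_size, image_size, num_classes, latent_dim = 2, 28, 3, 512
one_hot_labels = np.eye(num_classes, dtype="float32")[[0, 2]]  # classes 0 and 2

# Discriminator path: expand each one-hot label into per-pixel label maps,
# mirroring the repeat/reshape sequence in train_step().
maps = one_hot_labels[:, :, None, None]
maps = np.repeat(maps, image_size * image_size)
maps = maps.reshape(-1, image_size, image_size, num_classes)
print(maps.shape)  # (2, 28, 28, 3) -- concatenated onto the image channels

# Generator path: append the one-hot label to the latent noise vector.
noise = np.random.normal(size=(batch_size, latent_dim)).astype("float32")
vec = np.concatenate([noise, one_hot_labels], axis=1)
print(vec.shape)  # (2, 515) -- matches generator_in_channels
```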
Function to generate fake images [5 Points]
In the code cell below define a function generate_fake_images that takes 2 arguments, CGAN_model and class_label_list, and returns fake images generated by CGAN_model corresponding to the labels specified in class_label_list.
# Define function: generate_fake_images
def generate_fake_images(generator_model, class_label_list):
    """Returns fake images of the classes in class_label_list using generator_model"""
    labels = keras.utils.to_categorical(class_label_list, num_classes)
    labels = ops.cast(labels, "float32")
    noise = keras.random.normal(shape=(len(class_label_list), latent_dim))
    noise_and_labels = ops.concatenate([noise, labels], 1)
    fake = generator_model.predict(noise_and_labels, verbose=0)
    return fake
This function generates fake images from the generator model based on a list of class labels. It first one-hot encodes the labels, samples random noise vectors, and concatenates the two. The combined input is then passed to the generator to produce class-conditioned outputs. This function is used to visualize the generator's performance during and after training.
Train model [5 Points]
In the code cell below type in your code to train the model over 30 epochs. After each epoch save the weights of the generator and display fake images generated for classes with labels specified in class_label_list.
# Train the model over 30 epochs.
class_label_list = 3*[0] + 3*[1] + 3*[2]  # display 3 images of each class
print(f"class_label_list: {class_label_list}")
epochs = 30  # number of epochs specified by the assignment
for i in range(epochs):
    cond_gan.fit(dataset, epochs=1)                                       # train model
    cond_gan.generator.save_weights(f"generator_epoch_{i+1}.weights.h5")  # save weights
    images = generate_fake_images(cond_gan.generator, class_label_list)   # fake images
    print(f'Fake images after epoch {i+1}:')                              # epoch heading
    displayImages(images, class_label_list)                               # display images
    print()
class_label_list: [0, 0, 0, 1, 1, 1, 2, 2, 2]
329/329 - 35s 55ms/step - d_loss: 0.2747 - g_loss: 5.4549
Fake images after epoch 1:
329/329 - 13s 39ms/step - d_loss: 0.2564 - g_loss: 3.4446
Fake images after epoch 2:
329/329 - 13s 40ms/step - d_loss: 0.4253 - g_loss: 2.2602
Fake images after epoch 3:
329/329 - 14s 41ms/step - d_loss: 0.4115 - g_loss: 2.4745
Fake images after epoch 4:
329/329 - 13s 40ms/step - d_loss: 0.4104 - g_loss: 2.0053
Fake images after epoch 5:
329/329 - 13s 40ms/step - d_loss: 0.4202 - g_loss: 2.1133
Fake images after epoch 6:
329/329 - 13s 40ms/step - d_loss: 0.4556 - g_loss: 1.9841
Fake images after epoch 7:
329/329 - 13s 41ms/step - d_loss: 0.4009 - g_loss: 1.9964
Fake images after epoch 8:
329/329 - 13s 41ms/step - d_loss: 0.4124 - g_loss: 2.0065
Fake images after epoch 9:
329/329 - 13s 41ms/step - d_loss: 0.3626 - g_loss: 2.0917
Fake images after epoch 10:
329/329 - 13s 41ms/step - d_loss: 0.3498 - g_loss: 2.3558
Fake images after epoch 11:
329/329 - 14s 41ms/step - d_loss: 0.3973 - g_loss: 2.4597
Fake images after epoch 12:
329/329 - 14s 41ms/step - d_loss: 0.4216 - g_loss: 1.9138
Fake images after epoch 13:
329/329 - 13s 41ms/step - d_loss: 0.4130 - g_loss: 1.8849
Fake images after epoch 14:
329/329 - 13s 41ms/step - d_loss: 0.4309 - g_loss: 1.8939
Fake images after epoch 15:
329/329 - 13s 41ms/step - d_loss: 0.4244 - g_loss: 1.9779
Fake images after epoch 16:
329/329 - 13s 41ms/step - d_loss: 0.4285 - g_loss: 1.7745
Fake images after epoch 17:
329/329 - 14s 41ms/step - d_loss: 0.4279 - g_loss: 1.9556
Fake images after epoch 18:
329/329 - 13s 41ms/step - d_loss: 0.4262 - g_loss: 1.8995
Fake images after epoch 19:
329/329 - 13s 40ms/step - d_loss: 0.4291 - g_loss: 1.7767
Fake images after epoch 20:
329/329 - 13s 41ms/step - d_loss: 0.4544 - g_loss: 1.6738
Fake images after epoch 21:
329/329 - 13s 41ms/step - d_loss: 0.4406 - g_loss: 1.7131
Fake images after epoch 22:
329/329 - 13s 41ms/step - d_loss: 0.4405 - g_loss: 1.6870
Fake images after epoch 23:
329/329 - 13s 41ms/step - d_loss: 0.4561 - g_loss: 1.6931
Fake images after epoch 24:
329/329 - 13s 41ms/step - d_loss: 0.4401 - g_loss: 1.7038
Fake images after epoch 25:
329/329 - 14s 41ms/step - d_loss: 0.4221 - g_loss: 1.9136
Fake images after epoch 26:
329/329 - 14s 41ms/step - d_loss: 0.4587 - g_loss: 1.6694
Fake images after epoch 27:
329/329 - 13s 41ms/step - d_loss: 0.4585 - g_loss: 1.6672
Fake images after epoch 28:
329/329 - 13s 41ms/step - d_loss: 0.4639 - g_loss: 1.6952
Fake images after epoch 29:
329/329 - 13s 41ms/step - d_loss: 0.4380 - g_loss: 1.6682
Fake images after epoch 30:
This section performs the training of the Conditional GAN over 30 epochs. After each epoch, the generator's weights are saved and a batch of class-conditioned images is generated for visual evaluation. The printed loss values and generated images help monitor training progress.
It's important to note that GAN training is inherently adversarial and stochastic. The generator and discriminator are locked in a dynamic competition, where improvements in one lead to challenges for the other. As a result, the generator's performance may not consistently improve in every single epoch; some generated images may look better or worse than those from the previous epoch. However, the overall quality typically improves over time as the generator learns to better fool the discriminator.
This iterative, game-like training process is central to GANs and distinguishes them from traditional supervised models.
Use trained generator [5 Points]
Create a generator with weights saved in an epoch that you consider generates the most authentic fake images and use it to generate fake images for the classes specified in class_label_list.
# choosing the weights and defining new generator
chosen_epoch = 30 # chosen epoch
chosen_generator = generator # reused the same generator instance defined earlier to ensure the architecture matches
# Load the weights
chosen_generator.load_weights(f"generator_epoch_{chosen_epoch}.weights.h5")
# Display fake images generated for classes specified in class_label_list.
class_label_list = 10*[0] + 10*[1] + 10*[2]
print(f"class_label_list: {class_label_list}")
images = generate_fake_images(chosen_generator, class_label_list)
print(f'Fake images:')
displayImages(images, [ITEMS[i] for i in class_label_list])
class_label_list: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2] Fake images:
In this final step, we load the generator weights from the epoch that produced the most convincing outputs (in this case, epoch 30). Using the generate_fake_images() function, we produce and visualize a batch of fake images conditioned on each of the three target classes.
This allows us to evaluate the final quality of the generator's outputs and verify that it has learned to produce distinct and class-specific representations. The generated trousers, pullovers, and sneakers demonstrate the model's ability to synthesize realistic images that align with the corresponding labels.
Conclusion
In this project, we successfully implemented a Conditional Generative Adversarial Network (CGAN) to generate grayscale Fashion MNIST images conditioned on class labels. We focused on three specific classes (trousers, pullovers, and sneakers) and used a one-hot encoded label strategy to guide the generator and discriminator.
The generator was trained to produce images from random noise and class labels, while the discriminator learned to distinguish between real and synthetic images using both pixel content and label information. Over the course of 30 epochs, we observed the generator improve its ability to produce visually coherent and class-specific outputs.
While we explored more advanced generator and discriminator structures, we ultimately found that a simpler one-hot concatenation approach yielded the most stable results for this dataset. Loss values and generated image samples provided insight into the training dynamics, and the final model was able to consistently produce distinguishable images for each target class.
This project reinforced key concepts in GAN training, conditional generation, adversarial learning, and image preprocessing, all while demonstrating how generative models can be guided using label information to produce more meaningful outputs.
Future Improvements:
- Experiment with label embedding and dual-input models
- Add regularization techniques like spectral normalization
- Try more complex architectures (e.g., ResNet-based generators)
- Extend to color datasets or higher resolutions