Transfer learning experiment: ResNet with CIFAR-10
Abstract:
Using transfer learning, I trained a ResNet-50 initialized with ImageNet weights on CIFAR-10, which has only 10 classes as opposed to the 1,000 classes in ImageNet. The result was a Top-1 accuracy of 90%.
Introduction:
Designing and training a model from scratch is a very time-consuming task, let alone a CNN, where depth is very important.
This is explained by the variety of implementation choices:
- The design and architecture of the network.
- Regularization and optimization.
- Hyperparameter tuning.
That's why we use already trained models, a practice called transfer learning. In this article I will cover how I trained a ResNet-50 model on the CIFAR-10 dataset in Keras, and the results.
Materials and Methods:
The first thing to consider in transfer learning is choosing the model. In Keras, I had multiple choices of pre-trained models in the keras.applications API[1].
For this project I chose ResNet-50[2] because of its small size and efficiency. Other models I considered were EfficientNets[3], which were introduced in 2019. I didn't opt for them because I was already getting good results with ResNet and I don't think they would bring much improvement, although this has yet to be tested.
I've organized the project into four parts: fetching the data, pre-processing the data, fetching and integrating the base model, and lastly the training process.
Fetching data:
Since I'm using CIFAR-10[4], I'm fetching the data through Keras with its keras.datasets API, which returns 50,000 images for training and 10,000 for validation.
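As a rough sketch, this is the kind of call involved (the variable names are my own):

```python
from tensorflow.keras.datasets import cifar10

# CIFAR-10: 50,000 training images and 10,000 validation images, 32x32 RGB
(x_train, y_train), (x_val, y_val) = cifar10.load_data()
print(x_train.shape, y_train.shape)  # (50000, 32, 32, 3) (50000, 1)
print(x_val.shape, y_val.shape)      # (10000, 32, 32, 3) (10000, 1)
```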
Preprocessing:
For the labels I had to apply one-hot encoding. For the images I had to apply the same pre-processing as in the ResNet paper, which consists of changing the color channel order from RGB to BGR and then zero-centering each color channel (subtracting the mean) with respect to the ImageNet dataset.
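A minimal sketch of that step, assuming the data was loaded as above; Keras exposes the ResNet-specific pre-processing as resnet50.preprocess_input:

```python
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.applications.resnet50 import preprocess_input

# One-hot encode the integer labels (10 classes)
y_train = to_categorical(y_train, num_classes=10)
y_val = to_categorical(y_val, num_classes=10)

# preprocess_input converts RGB -> BGR and zero-centers each channel
# with respect to the ImageNet dataset, matching the ResNet paper
x_train = preprocess_input(x_train.astype("float32"))
x_val = preprocess_input(x_val.astype("float32"))
```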
Modeling:
For modeling, I first fetched the ResNet model with pre-trained weights, then added a layer to resize the images to fit the model.
I added the final softmax layer, then bundled all the layers together into a new model. Another important note is that I return both the model and the base model (ResNet) so that the base model can be frozen/unfrozen later.
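One way to read this step is the sketch below; it is my own interpretation, not the exact code. In particular, keeping the ImageNet top (include_top=True), the 224x224 target size, and the layers.Resizing layer (available in recent TensorFlow versions) are assumptions:

```python
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import ResNet50

def build_model(num_classes=10, target_size=224):
    # Base model with ImageNet weights; include_top and target_size
    # are assumptions, not necessarily the exact choices made here
    base_model = ResNet50(weights="imagenet", include_top=True)

    inputs = layers.Input(shape=(32, 32, 3))
    # Resize the 32x32 CIFAR-10 images up to the size ResNet expects
    x = layers.Resizing(target_size, target_size)(inputs)
    x = base_model(x)
    # Final 10-way softmax classifier
    outputs = layers.Dense(num_classes, activation="softmax")(x)

    model = Model(inputs, outputs)
    # Return both so the base model can be frozen/unfrozen later
    return model, base_model

model, base_model = build_model()
```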
Training:
This is the most important and most time-consuming part of transfer learning, as this is where the transfer happens. First, I compiled the model with a very low learning rate of 0.00001 and no frozen layers, i.e. all the parameters are trainable; this process is called fine-tuning.
From my testing, doing this really helped the model's performance, and I think this is because the model readjusts its weights for the new problem domain. In my experiments, 2 epochs of fine-tuning were optimal.
After the initial fine-tuning, I froze the base model and only trained the last added layer, which pushed the accuracy further. With this, the training is done.
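Put together, the two training phases look roughly like this. The 0.00001 learning rate and the 2 fine-tuning epochs come from the text; the optimizer choice and the phase-2 learning rate and epoch count are my own assumptions for illustration:

```python
from tensorflow.keras.optimizers import Adam

# Phase 1: fine-tuning -- all layers trainable, very low learning rate
base_model.trainable = True
model.compile(optimizer=Adam(learning_rate=1e-5),
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=2)

# Phase 2: freeze the base model and train only the added top layer
# (learning rate and epoch count here are assumptions)
base_model.trainable = False
model.compile(optimizer=Adam(learning_rate=1e-4),
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=5)
```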
Important:
This method is not what the Keras/TensorFlow[5] tutorial recommends; instead, they recommend doing fine-tuning after training with frozen layers. This is because in the tutorial they didn't use the top layer of the base model and added their own layers on top of it.
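For comparison, the order recommended in the tutorial looks roughly like the sketch below (the learning rates and epoch counts are hypothetical, for illustration only):

```python
# Train the added layers first with the base model frozen
base_model.trainable = False
model.compile(optimizer=Adam(learning_rate=1e-3),
              loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=5)

# Then unfreeze and fine-tune the whole network at a much lower learning rate
base_model.trainable = True
model.compile(optimizer=Adam(learning_rate=1e-5),
              loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=2)
```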
Results:
With this approach I got an accuracy of 90% on the validation dataset and 92% on the training dataset. From my observation, this could be further improved by performing another round of fine-tuning, but I didn't experiment with this.