Human age estimation from a photo using neural networks

Yevhenii Verbenko

ORCID: https://orcid.org/0009-0001-8438-4990

Oles Honchar Dnipro National University

Olga Matsuga

ORCID: https://orcid.org/0000-0001-6444-8566

Oles Honchar Dnipro National University

The aim of this work was to compare neural network architectures on the task of age estimation from face images. Since age is a continuous variable, estimating a person's age from a face image is treated as a regression problem. The UTKFace dataset was used in this work; it contains about 24,000 images annotated with gender, race, and age. Four architectures were chosen for training: AlexNet, VGG-19, ResNet-50, and Inception-v4. These convolutional neural network architectures marked significant advances in image classification on the ImageNet dataset: AlexNet introduced the combination of ReLU activations, dropout, and max-pooling; VGG-19 emphasized deeper architectures built from small filters; ResNet-50 addressed the vanishing-gradient problem with residual connections; and Inception-v4 improved efficiency and gradient flow with optimized blocks and residual connections. In every network, the last layer was replaced with a fully connected layer containing a single neuron with a linear activation function. The mean squared error (MSE) served as the loss function during training, and the mean absolute error (MAE) as the quality metric. The data was split into training and testing sets in a 90%/10% ratio. Before training, the images were normalized and resized to fit each network's input requirements. AlexNet and VGG-19 were trained with the SGD optimizer at a learning rate of 0.2, ResNet-50 with the Adam optimizer at a learning rate of 0.02, and Inception-v4 with the Adadelta optimizer at a learning rate of 0.02. These optimizers and their parameters were selected as the best-performing after computational experiments. Each network was trained for as many epochs as it needed to converge. After training, VGG-19 and ResNet-50 achieved MAE values of 2.7 and 3.5 years, respectively, while Inception-v4 reached an MAE of 3.87 years; AlexNet exhibited significant overfitting.
ResNet-50 processed images the fastest.
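A per-image speed comparison like the one behind this claim could be made with a simple timing helper. The sketch below assumes PyTorch and single-image CPU inference; `mean_inference_time` is a hypothetical helper, and the toy model merely stands in for the four trained networks.

```python
import time
import torch
import torch.nn as nn

def mean_inference_time(model: nn.Module, input_size: int, runs: int = 5) -> float:
    """Average seconds per forward pass on one random RGB image."""
    model.eval()
    x = torch.randn(1, 3, input_size, input_size)
    with torch.no_grad():
        model(x)  # warm-up pass, excluded from timing
        start = time.perf_counter()
        for _ in range(runs):
            model(x)
    return (time.perf_counter() - start) / runs

# Hypothetical usage with a tiny stand-in model; a real comparison would pass
# the trained AlexNet, VGG-19, ResNet-50, and Inception-v4 instances, each
# with its own expected input size.
toy = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 1),
)
print(f"{mean_inference_time(toy, 224):.6f} s/image")
```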


