Automatic lipreading using convolutional neural networks and orthogonal moments

Vol. 12, No. 1, pp. 90–100 (2025)
Received: January 07, 2024
Revised: January 10, 2025
Accepted: January 13, 2025

Ait Khayi Y., El Ogri O., El-Mekkaoui J., Benslimane M., Hjouji A.  Automatic lipreading using convolutional neural networks and orthogonal moments.  Mathematical Modeling and Computing. Vol. 12, No. 1, pp. 90–100 (2025)

1 TI Laboratory, EST, Sidi Mohamed Ben Abdellah University, Fez, Morocco
2 TI Laboratory, EST, Sidi Mohamed Ben Abdellah University, Fez, Morocco; CED-ST, STIC, Laboratory of Information, Signals, Automation and Cognitivism LISAC, Dhar El Mahrez Faculty of Science, Sidi Mohamed Ben Abdellah-Fez University, Fez, Morocco
3 TI Laboratory, EST, Sidi Mohamed Ben Abdellah University, Fez, Morocco
4 TI Laboratory, EST, Sidi Mohamed Ben Abdellah University, Fez, Morocco
5 Sidi Mohamed Ben Abdellah University, Fez, Morocco

Understanding speech from the visual interpretation of a speaker's lip movements alone has recently become one of the most challenging computer vision tasks.  In the present paper, we propose a new approach, Optimized Quaternion Meixner Moments Convolutional Neural Networks (OQMMCNN), to build a lipreading system based only on video images.  The approach uses Quaternion Meixner Moments (QMMs) as filters in the Convolutional Neural Network (CNN) architecture.  In addition, we use the Grey Wolf Optimizer (GWO) to optimize the local parameters of the QMM filters and thus ensure high classification accuracy.  We show that this method effectively reduces both the high dimensionality of the video images and the training time.  The approach is evaluated on a public dataset and compared with methods from the literature that rely on complex models and deep architectures.
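To make the optimization step concrete, the following is a minimal sketch (in Python) of the Grey Wolf Optimizer loop of Mirjalili et al. [19] tuning two filter parameters, standing in for the local parameters of a QMM filter (e.g., the Meixner parameters beta and c). The fitness function here is a toy placeholder; in the paper's pipeline the objective would instead be the classification accuracy of the CNN built with the optimized QMM filters. All names, bounds, and values below are illustrative assumptions, not taken from the paper.

import numpy as np

def fitness(params):
    # Placeholder objective (assumption): in the real pipeline this would be
    # the validation accuracy of the OQMMCNN trained with QMM filters
    # generated from `params` (e.g., the Meixner parameters beta and c).
    beta, c = params
    return -((beta - 0.4) ** 2 + (c - 0.6) ** 2)  # toy optimum at (0.4, 0.6)

def gwo(fitness_fn, dim=2, n_wolves=10, n_iter=50, lb=0.01, ub=0.99, seed=0):
    rng = np.random.default_rng(seed)
    wolves = rng.uniform(lb, ub, size=(n_wolves, dim))   # random initial pack
    scores = np.array([fitness_fn(w) for w in wolves])

    for t in range(n_iter):
        # Alpha, beta, delta = three best wolves (maximization).
        order = np.argsort(scores)[::-1]
        alpha, beta_w, delta = wolves[order[:3]]

        a = 2 - 2 * t / n_iter  # control parameter decreases linearly 2 -> 0
        for i in range(n_wolves):
            X = np.zeros(dim)
            for leader in (alpha, beta_w, delta):
                r1, r2 = rng.random(dim), rng.random(dim)
                A = 2 * a * r1 - a
                C = 2 * r2
                D = np.abs(C * leader - wolves[i])   # distance to the leader
                X += leader - A * D                  # candidate guided by leader
            wolves[i] = np.clip(X / 3.0, lb, ub)     # average of the three guides
            scores[i] = fitness_fn(wolves[i])

    best = int(np.argmax(scores))
    return wolves[best], scores[best]

best_params, best_score = gwo(fitness)
print("best (beta, c):", best_params, "fitness:", best_score)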

  1. Fernandez-Lopez A., Sukno F. M.  Survey on automatic lip-reading in the era of deep learning.  Image and Vision Computing.  78, 53–72 (2018).
  2. Hao M., Mamut M., Yadikar N., Aysa A., Ubul K.  A survey of research on lipreading technology.  IEEE Access.  8, 204518–204544 (2020).
  3. Chen X., Du J., Zhang H.  Lipreading with DenseNet and resBi-LSTM.  Signal, Image and Video Processing.  14, 981–989 (2020).
  4. Fenghour S., Chen D., Guo K., Xiao P.  Lip reading sentences using deep learning with only visual cues.  IEEE Access.  8, 215516–215530 (2020).
  5. Rashkevych Yu., Peleshko D., Pelekh I., Izonin I. V.  Speech signal marking on the base of local magnitude and invariant segmentation.  Mathematical Modeling and Computing.  1 (2), 234–244 (2014).
  6. Ma S., Wang S., Lin X.  A transformer-based model for sentence-level Chinese Mandarin lipreading.  2020 IEEE Fifth International Conference on Data Science in Cyberspace (DSC). 78–81 (2020).
  7. Fisher C. G.  Confusions among visually perceived consonants.  Journal of Speech and Hearing Research.  11 (4), 796–804 (1968).
  8. Hilder S., Harvey R., Theobald B.-J.  Comparison of human and machine-based lip-reading.  AVSP 2009 – International Conference on Audio-Visual Speech Processing, University of East Anglia (2009).
  9. Matthews I., Cootes T. F., Bangham J. A., Cox S., Harvey R.  Extraction of visual features for lipreading.  IEEE Transactions on Pattern Analysis and Machine Intelligence.  24 (2), 198–213 (2002).
  10. Cox S., Harvey R., Lan Y., Newman J., Theobald B.-J.  The challenge of multispeaker lip-reading.  International Conference on Auditory-Visual Speech Processing (AVSP). (2008).
  11. Lee B., Hasegawa-Johnson M., Goudeseune C., Kamdar S., Borys S., Liu M., Huang T.  AVICAR: Audio-visual speech corpus in a car environment.  Eighth International Conference on Spoken Language Processing. 2489–2492 (2004).
  12. Hazen T. J., Saenko K., La C.-H., Glass J. R.  A segment-based audio-visual speech recognizer: Data collection, development, and initial experiments.  Proceedings of the 6th International Conference on Multimodal Interfaces.  235–242 (2004).
  13. Patterson E. K., Gurbuz S., Tufekci Z., Gowdy J. N.  CUAVE: A new audio-visual database for multimodal human-computer interface research.  2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.  II, 2017–2020 (2002).
  14. Cooke M., Barker J., Cunningham S., Shao X.  An audio-visual corpus for speech perception and automatic speech recognition.  The Journal of the Acoustical Society of America.  120 (5), 2421–2424 (2006).
  15. Petridis S., Pantic M.  Deep complementary bottleneck features for visual speech recognition.  2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2304–2308 (2016).
  16. Saitoh T., Zhou Z., Zhao G., Pietikäinen M.  Concatenated frame image based CNN for visual speech recognition.  Computer Vision – ACCV 2016 Workshops.  277–289 (2017).
  17. Mesbah A., Berrahou A., Hammouchi H., Berbia H., Qjidaa H., Daoudi M.  Lip reading with Hahn convolutional neural networks.  Image and Vision Computing.  88, 76–83 (2019).
  18. Kim M., Yeo J. H., Ro Y. M.  Distinguishing homophenes using multi-head visual-audio memory for lip reading.  Proceedings of the AAAI Conference on Artificial Intelligence.  36 (1), 1174–1182 (2022).
  19. Mirjalili S., Mirjalili S. M., Lewis A.  Grey Wolf Optimizer.  Advances in Engineering Software.  69, 46–61 (2014).
  20. Lewis A. C.  William Rowan Hamilton, Lectures on quaternions (1853).  Landmark Writings in Western Mathematics 1640–1940. 460–469 (2005).
  21. Sayyouri M., Hmimid A., Qjidaa H.  A fast computation of novel set of Meixner invariant moments for image analysis.  Circuits, Systems, and Signal Processing.  34,  875–900 (2015).
  22. Sadeeq H., Abdulazeez A. M.  Hardware implementation of firefly optimization algorithm using FPGAs.  2018 International Conference on Advanced Science and Engineering (ICOASE).  30–35 (2018).
  23. Sadeeq H. T., Abdulazeez A. M.  Giant trevally optimizer (GTO): A novel metaheuristic algorithm for global optimization and challenging engineering problems.  IEEE Access.  10, 121615–121640 (2022).
  24. Naserbegi A., Aghaie M.  Exergy optimization of nuclear-solar dual proposed power plant based on GWO algorithm.  Progress in Nuclear Energy.  140, 103925 (2021).
  25. Naserbegi A., Aghaie M., Zolfaghari A.  Implementation of Grey Wolf Optimization (GWO) algorithm to multi-objective loading pattern optimization of a PWR reactor.  Annals of Nuclear Energy.  148, 107703 (2020).
  26. Gachkevich M., Gachkevich O., Torskyy A., Dmytruk V.  Mathematical models and methods of optimization of technological heating regimes of the piecewise homogeneous glass shell.  State-of-the-art investigations.  Mathematical Modeling and Computing.  2 (2), 140–153 (2015).
  27. Raskin L., Sira O., Sagaydachny D.  Multi-criteria optimization in terms of fuzzy criteria definitions.  Mathematical Modeling and Computing.  5 (2), 207–220 (2018).
  28. Zhao G., Barnard M., Pietikäinen M.  Lipreading with local spatiotemporal descriptors.  IEEE Transactions on Multimedia.  11 (7), 1254–1265 (2009).