Audio-visual Biometrics Using Reliability-based Late Fusion and Deep Neural Networks
Online fraudulent activities are rapidly increasing with the growing use of web-based applications. This is one example of why biometrics are becoming essential in various commercial, government and forensic applications to verify the identity of an unknown person. Audio-visual biometrics provides a natural choice for person recognition, as both inputs (i.e., speech and facial images) are non-intrusive and provide complementary and correlated information. Recent advances in mobile phone technology and the emergence of low-cost data acquisition devices have further facilitated non-intrusive data acquisition for audio-visual biometric systems. However, the captured data may be of poor quality due to, in the case of face recognition, variations in pose, illumination and background. A quality-based fusion approach can be used as a solution to this problem: a quality index is estimated for each input, and these indices are used in the fusion of the two modalities. However, measuring quality at the signal level is difficult, particularly for visual inputs, because the sources of variation (e.g., illumination, pose and background) are difficult to model. This thesis analyses the impact of noisy inputs on matching scores and presents a reliability-based score-level fusion. A late fusion framework is then proposed to incorporate both score- and rank-level fusion. In addition, a multimodal deep neural network that infers joint features (i.e., feature-level fusion) is trained using a novel three-step algorithm.
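To make the quality-based fusion idea concrete, the following is a minimal sketch of quality-weighted score-level fusion. All names, the score and quality ranges in [0, 1], and the normalised-weight scheme are assumptions for illustration; they are not the specific method developed in this thesis.

```python
# Hypothetical sketch of quality-weighted score-level fusion.
# Assumes per-modality match scores and quality indices in [0, 1];
# the weighting scheme is illustrative, not the thesis's method.

def quality_weighted_fusion(score_audio, score_visual, q_audio, q_visual):
    """Fuse two modality scores, weighting each by its quality index."""
    total_q = q_audio + q_visual
    if total_q == 0:
        # Neither input is usable: fall back to a plain average.
        return 0.5 * (score_audio + score_visual)
    w_audio = q_audio / total_q
    w_visual = q_visual / total_q
    return w_audio * score_audio + w_visual * score_visual

# A noisy visual input (low quality index) contributes less to the fused score.
fused = quality_weighted_fusion(0.9, 0.3, q_audio=1.0, q_visual=0.25)
print(round(fused, 3))  # 0.78
```

Under this scheme, a degraded modality (e.g., a poorly illuminated face image) is automatically down-weighted rather than discarded, which is the intuition behind using quality or reliability estimates in late fusion.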
This research studies person recognition using audio-visual signals captured with mobile phones. The proposed methods can protect valuable information and systems while relying only on unobtrusive data modalities.