Faraj, Maycel Isaac
Publications (10 of 11)
Faraj, M. I. & Bigun, J. (2009). Lip Motion Features for Biometric Person Recognition. In: Alan Wee-Chung Liew & Shilin Wang (Ed.), Visual Speech Recognition: Lip Segmentation and Mapping (pp. 495-532). Hershey, PA: Medical Information Science Reference
2009 (English) In: Visual Speech Recognition: Lip Segmentation and Mapping / [ed] Alan Wee-Chung Liew & Shilin Wang, Hershey, PA: Medical Information Science Reference, 2009, p. 495-532. Chapter in book (Other academic)
Abstract [en]

The present chapter reports on the use of lip motion as a stand-alone biometric modality, as well as a modality integrated with audio speech, for identity recognition using digit recognition as a support. First, the authors estimate motion vectors from images of lip movements. The motion is modeled as the distribution of apparent line velocities in the movement of brightness patterns in an image. Then, they construct compact lip-motion features from the regional statistics of the local velocities. These can be used alone or merged with audio features to recognize identity or the uttered digit. The authors present person recognition results using the XM2VTS database, representing the video and audio data of 295 people. Furthermore, they present results on digit recognition when it is used in a text-prompted mode to verify the liveness of the user. Such user challenges are intended to reduce replay-attack risks of the audio system. © 2009, IGI Global.

Place, publisher, year, edition, pages
Hershey, PA: Medical Information Science Reference, 2009
National Category
Engineering and Technology
Identifiers
urn:nbn:se:hh:diva-14939 (URN); 10.4018/978-1-60566-186-5.ch017 (DOI); 2-s2.0-84900290172 (Scopus ID); 9781605661865 (ISBN); 9781605661872 (ISBN)
Available from: 2011-04-04. Created: 2011-04-04. Last updated: 2018-03-23. Bibliographically approved.
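The feature construction described in the abstract above — compact lip-motion features built from regional statistics of local velocities — can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the 3×3 grid, the choice of mean and standard deviation as the regional statistics, and the function name are all assumptions.

```python
import numpy as np

def lip_motion_features(vx, vy, grid=(3, 3)):
    """Compact lip-motion features from regional statistics of a dense
    velocity field (vx, vy) over a rectangular lip region.

    The region is split into a grid of blocks; each block contributes the
    mean and standard deviation of both velocity components, giving a
    fixed-length feature vector regardless of image size.
    """
    h, w = vx.shape
    gy, gx = grid
    feats = []
    for i in range(gy):
        for j in range(gx):
            ys = slice(i * h // gy, (i + 1) * h // gy)
            xs = slice(j * w // gx, (j + 1) * w // gx)
            for comp in (vx[ys, xs], vy[ys, xs]):
                feats.extend([comp.mean(), comp.std()])
    return np.array(feats)
```

For a 3×3 grid this yields 9 blocks × 2 components × 2 statistics = 36 features per frame, which can then be concatenated with audio features.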
Faraj, M. I. (2008). Lip-motion biometrics for audio-visual identity recognition. (Doctoral dissertation). Göteborg: Chalmers University of Technology
2008 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Biometric recognition systems have been established as powerful security tools to prevent unknown users from entering high-risk systems and areas. They are increasingly being utilized in surveillance and access management (city centers, banks, etc.) by using individuals' physical or biological characteristics. The present study reports on the use of lip motion as a stand-alone biometric modality as well as a modality integrated with audio speech for identity and digit recognition. First, we estimate motion vectors from a sequence of lip-movement images. The motion is modelled as the distribution of apparent line velocities in the movement of brightness patterns in an image. Then, we construct compact lip-motion features from the regional statistics of the local velocities. These can be used alone or merged with audio features to recognize individuals or speech (digits). In this work, we utilized two classifiers for identification and verification of identity as well as for digit recognition. Although the study is focused on processing lip movements in a video sequence, significant speech processing is a prerequisite, given that the contribution of video analysis to speech analysis is studied in conjunction with recognition of humans and what they say (digits). Such integration is necessary to understand multimodal biometric systems, to the benefit of recognition performance and robustness against noise. Extensive experiments utilizing one of the largest available databases, XM2VTS, are presented.

Place, publisher, year, edition, pages
Göteborg: Chalmers University of Technology, 2008. p. 161
Series
Doktorsavhandlingar vid Chalmers tekniska högskola. Ny serie, ISSN 0346-718X ; 2842
Keywords
Biometrics, Lip motion, Audio-visual signals, Speech recognition, Speaker recognition, Digit recognition
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:hh:diva-1980 (URN); 2082/2375 (Local ID); 978-91-7385-161-9 (ISBN); 2082/2375 (Archive number); 2082/2375 (OAI)
Public defence
2008-09-16, Room R1107, Högskolan i Halmstad, Kristian IV:s väg 3, Halmstad, 13:15 (English)
Available from: 2008-09-29. Created: 2008-09-29. Last updated: 2018-03-23. Bibliographically approved.
Faraj, M. I. & Bigun, J. (2007). Audio–visual person authentication using lip-motion from orientation maps. Pattern Recognition Letters, 28(11), 1368-1382
2007 (English) In: Pattern Recognition Letters, ISSN 0167-8655, E-ISSN 1872-7344, Vol. 28, no 11, p. 1368-1382. Article in journal (Refereed) Published
Abstract [en]

This paper describes a new identity authentication technique by a synergetic use of lip-motion and speech. The lip-motion is defined as the distribution of apparent velocities in the movement of brightness patterns in an image and is estimated by computing the velocity components of the structure tensor by 1D processing, in 2D manifolds. Since the velocities are computed without extracting the speaker’s lip-contours, more robust visual features can be obtained in comparison to motion features extracted from lip-contours. The motion estimations are performed in a rectangular lip-region, which affords increased computational efficiency. A person authentication implementation based on lip-movements and speech is presented along with experiments exhibiting a recognition rate of 98%. Besides its value in authentication, the technique can be used naturally to evaluate the “liveness” of someone speaking as it can be used in text-prompted dialogue. The XM2VTS database was used for performance quantification as it is currently the largest publicly available database (≈300 persons) containing both lip-motion and speech. Comparisons with other techniques are presented.

Place, publisher, year, edition, pages
Amsterdam: North-Holland, 2007
Keywords
Audio–visual recognition, Biometrics, Biometric recognition, Speaker verification, Speaker authentication, Person identification, Lip-movements, Motion, Structure tensor, Orientation, Optical flow, Hidden Markov model, Gaussian Markov model
National Category
Engineering and Technology
Identifiers
urn:nbn:se:hh:diva-1335 (URN); 10.1016/j.patrec.2007.02.017 (DOI); 000247807500013 (); 2-s2.0-34249752774 (Scopus ID); 2082/1714 (Local ID); 2082/1714 (Archive number); 2082/1714 (OAI)
Available from: 2008-04-16. Created: 2008-04-16. Last updated: 2018-03-23. Bibliographically approved.
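The "velocity components of the structure tensor" computed "without extracting the speaker's lip-contours" in the paper above amount to normal optical flow: under brightness constancy, only the velocity component along the spatial gradient is observable. A minimal two-frame sketch of that principle (an illustration, not the paper's tensor-based 1D implementation; the function name and the simple frame-difference derivative are assumptions):

```python
import numpy as np

def normal_flow(frames, eps=1e-6):
    """Normal image velocity ("normal flow") from a two-frame grayscale
    sequence.

    Under brightness constancy, the observable velocity component lies
    along the spatial gradient: v_n = -I_t * grad(I) / |grad(I)|^2.
    """
    I = frames.astype(float)
    It = I[1] - I[0]            # temporal derivative (frame difference)
    Iy, Ix = np.gradient(I[0])  # spatial derivatives (rows, columns)
    mag2 = Ix**2 + Iy**2 + eps  # squared gradient magnitude, regularized
    vx = -It * Ix / mag2
    vy = -It * Iy / mag2
    return vx, vy
```

On a horizontal intensity ramp shifted right by one pixel, the recovered horizontal velocity is close to one pixel per frame.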
Faraj, M. I. & Bigun, J. (2007). Lip Biometrics for Digit Recognition. In: Computer Analysis of Images and Patterns, Proceedings. Paper presented at the 12th International Conference on Computer Analysis of Images and Patterns, Vienna, Austria, August 27-29, 2007 (pp. 360-365). Berlin: Springer Berlin/Heidelberg, Vol. 4673
2007 (English) In: Computer Analysis of Images and Patterns, Proceedings, Berlin: Springer Berlin/Heidelberg, 2007, Vol. 4673, p. 360-365. Conference paper, Published paper (Refereed)
Abstract [en]

This paper presents a speaker-independent audio-visual digit recognition system that utilizes speech and visual lip signals. The extracted visual features are based on line-motion estimation obtained from video sequences with low resolution (128 × 128 pixels) to increase the robustness of audio recognition. The core experiments investigate lip-motion biometrics as a stand-alone as well as a merged modality in a speech recognition system. It uses Support Vector Machines, showing favourable experimental results with digit recognition rates of 83% to 100% on the XM2VTS database, depending on the amount of available visual information.

Place, publisher, year, edition, pages
Berlin: Springer Berlin/Heidelberg, 2007
Series
Lecture Notes in Computer Science, ISSN 0302-9743 ; 4673
Keywords
Image processing, Computer vision, Optical pattern recognition
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:hh:diva-2126 (URN); 10.1007/978-3-540-74272-2_45 (DOI); 000249585600045 (); 2-s2.0-38149002975 (Scopus ID); 2082/2521 (Local ID); 978-3-540-74271-5 (ISBN); 2082/2521 (Archive number); 2082/2521 (OAI)
Conference
12th International Conference on Computer Analysis of Images and Patterns, Vienna, Austria, August 27-29, 2007
Available from: 2008-11-12. Created: 2008-11-12. Last updated: 2018-03-23. Bibliographically approved.
Kollreider, K., Fronthaler, H., Faraj, M. & Bigun, J. (2007). Real-Time Face Detection and Motion Analysis With Application in “Liveness” Assessment. IEEE Transactions on Information Forensics and Security, 2(3 part 2), 548-558
2007 (English) In: IEEE Transactions on Information Forensics and Security, ISSN 1556-6013, E-ISSN 1556-6021, Vol. 2, no 3, part 2, p. 548-558. Article in journal (Refereed) Published
Abstract [en]

A robust face detection technique along with mouth localization, processing every frame in real time (video rate), is presented. Moreover, it is exploited for on-site motion analysis to verify "liveness" as well as to achieve lip reading of digits. A methodological novelty is the suggested quantized angle features ("quangles"), designed for illumination invariance without the need for preprocessing (e.g., histogram equalization). This is achieved by using both the gradient direction and the double-angle direction (the structure tensor angle), and by ignoring the magnitude of the gradient. Boosting techniques are applied in a quantized feature space. A major benefit is reduced processing time, i.e., the training of effective cascaded classifiers is feasible in a very short time (less than 1 h for data sets on the order of 10⁴). Scale invariance is implemented through the use of an image scale pyramid. We propose "liveness" verification barriers as applications for which a significant amount of computation is avoided when estimating motion. Novel strategies to avert advanced spoofing attempts (e.g., replayed videos which include person utterances) are demonstrated. We present favorable results on face detection for the YALE face test set and competitive results for the CMU-MIT frontal face test set as well as on "liveness" verification barriers.

Place, publisher, year, edition, pages
New York: IEEE Press, 2007
Keywords
AdaBoost, antispoofing, face detection, landmark detection, lip reading, liveness, object detection, optical flow of lines, quantized angles, real-time processing, support vector machine, SVM
National Category
Engineering and Technology
Identifiers
urn:nbn:se:hh:diva-2021 (URN); 10.1109/TIFS.2007.902037 (DOI); 000248832500007 (); 2-s2.0-34548094310 (Scopus ID); 2082/2416 (Local ID); 2082/2416 (Archive number); 2082/2416 (OAI)
Note

©2007 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.

Available from: 2008-10-06. Created: 2008-10-06. Last updated: 2018-03-23. Bibliographically approved.
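The "quangle" idea in the abstract above — keep only quantized direction information from the gradient and its double angle, and discard the gradient magnitude — can be sketched as below. The bin count and the exact quantization rule are assumptions, and the boosted cascade that the paper trains over these features is not reproduced.

```python
import numpy as np

def quangle_features(img, bins=8):
    """Quantized angle ("quangle") features: quantize both the gradient
    direction and the double-angle (structure tensor) direction, ignoring
    the gradient magnitude for illumination insensitivity."""
    Iy, Ix = np.gradient(img.astype(float))
    theta = np.arctan2(Iy, Ix)  # gradient direction in (-pi, pi]
    q_single = np.floor((theta + np.pi) / (2 * np.pi) * bins).astype(int) % bins
    # wrap 2*theta back into (-pi, pi] to get the double-angle direction
    double = np.angle(np.exp(1j * 2 * theta))
    q_double = np.floor((double + np.pi) / (2 * np.pi) * bins).astype(int) % bins
    return q_single, q_double
```

Because only directions are kept, multiplying the image by a positive gain leaves both quantized maps unchanged, which is the illumination-invariance property the paper exploits.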
Faraj, M. I. & Bigun, J. (2007). Speaker and Digit Recognition by Audio-Visual Lip Biometrics. In: Lee, SW and Li, SZ (Ed.), Advances in biometrics: international conference, ICB 2007, Seoul, Korea, August 27-29, 2007; proceedings. Paper presented at the International Conference on Biometrics (ICB 2007), Seoul, South Korea, August 27-29, 2007 (pp. 1016-1024). Berlin: Springer
2007 (English) In: Advances in biometrics: international conference, ICB 2007, Seoul, Korea, August 27-29, 2007; proceedings / [ed] Lee, SW and Li, SZ, Berlin: Springer, 2007, p. 1016-1024. Conference paper, Published paper (Other (popular science, discussion, etc.))
Abstract [en]

This paper proposes a new robust bi-modal audio-visual digit and speaker recognition system using lip-motion and speech biometrics. To increase the robustness of digit and speaker recognition, we propose a method using speaker lip-motion information extracted from video sequences with low resolution (128 × 128 pixels). In this paper, we investigate a biometric system for digit recognition and speaker identification based on line-motion estimation combined with speech information and Support Vector Machines. The acoustic and visual features are fused at the feature level, showing favourable results with digit recognition rates of 83% to 100% and speaker recognition of 100% on the XM2VTS database.

Place, publisher, year, edition, pages
Berlin: Springer, 2007
Series
Lecture Notes in Computer Science, ISSN 0302-9743 ; Volume 4642/2007
Keywords
Biometric identification, Pattern recognition systems, Computer networks
National Category
Engineering and Technology
Identifiers
urn:nbn:se:hh:diva-2129 (URN); 10.1007/978-3-540-74549-5_106 (DOI); 000249584900106 (); 2-s2.0-37849028022 (Scopus ID); 2082/2524 (Local ID); 978-3-540-74548-8 (ISBN); 2082/2524 (Archive number); 2082/2524 (OAI)
Conference
International Conference on Biometrics (ICB 2007), Seoul, South Korea, August 27-29, 2007
Available from: 2008-11-12. Created: 2008-11-12. Last updated: 2018-03-23. Bibliographically approved.
Faraj, M. & Bigun, J. (2007). Synergy of lip motion and acoustic features in biometric speech and speaker recognition. IEEE Transactions on Computers, 56(9), 1169-1175
2007 (English) In: IEEE Transactions on Computers, ISSN 0018-9340, E-ISSN 1557-9956, Vol. 56, no 9, p. 1169-1175. Article in journal (Refereed) Published
Abstract [en]

This paper presents the scheme and evaluation of a robust audio-visual digit-and-speaker-recognition system using lip motion and speech biometrics. Moreover, a liveness verification barrier based on a person's lip movement is added to the system to guard against advanced spoofing attempts such as replayed videos. The acoustic and visual features are integrated at the feature level and evaluated first by a support vector machine for digit and speaker identification and, then, by a Gaussian mixture model for speaker verification. Based on ≈300 different personal identities, this paper presents, to our knowledge, the first extensive study investigating the added value of lip motion features for speaker and speech-recognition applications. Digit recognition and person-identification and verification experiments are conducted on the publicly available XM2VTS database, showing favorable results (speaker verification is 98 percent, speaker identification is 100 percent, and digit identification is 83 percent to 100 percent).

Place, publisher, year, edition, pages
New York: IEEE Press, 2007
Keywords
GMM, SVM, Speech recognition, biometrics, Lip motion, Lip reading, Motion estimation, Normal image flow, normal image velocity, Speaker recognition
National Category
Engineering and Technology
Identifiers
urn:nbn:se:hh:diva-2058 (URN); 10.1109/TC.2007.1074 (DOI); 000248208300003 (); 2-s2.0-34548205797 (Scopus ID); 2082/2453 (Local ID); 2082/2453 (Archive number); 2082/2453 (OAI)
Note

©2007 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.

Available from: 2008-10-17. Created: 2008-10-17. Last updated: 2018-03-23. Bibliographically approved.
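Feature-level fusion, used throughout these papers, means concatenating the acoustic and visual feature vectors frame by frame before classification. A minimal sketch of that step (the per-stream z-normalization and the function names are assumptions; the SVM and GMM classifiers that follow fusion in the paper are omitted):

```python
import numpy as np

def fuse_features(audio_feats, visual_feats):
    """Feature-level fusion: z-normalize each stream column-wise so that
    neither modality dominates by scale, then concatenate the two streams
    frame-wise into single joint feature vectors."""
    def znorm(x):
        return (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-9)
    return np.hstack([znorm(audio_feats), znorm(visual_feats)])
```

The fused vectors can then be fed to any frame-level classifier; normalizing before concatenation is one common way to balance streams with very different dynamic ranges.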
Teferi, D., Faraj, M. I. & Bigun, J. (2007). Text Driven Face-Video Synthesis Using GMM and Spatial Correlation. In: Ersboll, B K, Pedersen, K S (Ed.), Image analysis: 15th Scandinavian Conference, SCIA 2007, Aalborg, Denmark, June 10-14, 2007 ; proceedings. Paper presented at 15th Scandinavian Conference on Image Analysis, Aalborg, Denmark, June 10-14, 2007 (pp. 572-580). Berlin: Springer Berlin/Heidelberg
2007 (English) In: Image analysis: 15th Scandinavian Conference, SCIA 2007, Aalborg, Denmark, June 10-14, 2007; proceedings / [ed] Ersboll, B K, Pedersen, K S, Berlin: Springer Berlin/Heidelberg, 2007, p. 572-580. Conference paper, Published paper (Refereed)
Abstract [en]

Liveness detection is increasingly planned to be incorporated into biometric systems to reduce the risk of spoofing and impersonation. Some of the techniques used include detection of head motion while posing/speaking, iris size in varying illumination, fingerprint sweat, text-prompted speech, speech-to-lip-motion synchronization, etc. In this paper, we propose to build a biometric signal to test the attack resilience of biometric systems by creating a text-driven video synthesis of faces. We synthesize new realistic-looking video sequences from real image sequences representing utterances of digits. We determine the image sequences for each digit by using a GMM-based speech recognizer. Then, depending on the system prompt (a sequence of digits), our method regenerates a video signal to test the attack resilience of a biometric system that asks for random digit utterances to prevent playback of pre-recorded data representing both audio and images. The discontinuities in the new image sequence, created at the connection of each digit, are removed by using a frame prediction algorithm that makes use of the well-known block matching algorithm. Other uses of our results include web-based video communication for electronic commerce and frame interpolation for low-frame-rate video.

Place, publisher, year, edition, pages
Berlin: Springer Berlin/Heidelberg, 2007
Series
Lecture Notes in Computer Science, ISSN 0302-9743 ; 4522
Keywords
Image analysis
National Category
Engineering and Technology
Identifiers
urn:nbn:se:hh:diva-2130 (URN); 000247364000058 (); 2-s2.0-38049080023 (Scopus ID); 2082/2525 (Local ID); 978-3-540-73039-2 (ISBN); 2082/2525 (Archive number); 2082/2525 (OAI)
Conference
15th Scandinavian Conference on Image Analysis, Aalborg, Denmark, June 10-14, 2007
Available from: 2008-11-12. Created: 2008-11-12. Last updated: 2018-03-23. Bibliographically approved.
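The frame-prediction step above, which smooths discontinuities at digit boundaries, rests on the well-known block matching algorithm. A minimal exhaustive-search version is sketched below; the block size, search range, and SAD (sum of absolute differences) cost are the standard textbook choices, not necessarily the authors' parameters.

```python
import numpy as np

def block_match(prev, curr, block=8, search=4):
    """Exhaustive block matching: for each block of `curr`, find the
    best-matching (minimum-SAD) block in `prev` within +/-`search` pixels,
    and return the resulting motion-compensated prediction of `curr`."""
    h, w = curr.shape
    pred = np.zeros_like(curr, dtype=float)
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            target = curr[by:by + block, bx:bx + block].astype(float)
            best, best_sad = None, np.inf
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = by + dy, bx + dx
                    if 0 <= y and y + block <= h and 0 <= x and x + block <= w:
                        cand = prev[y:y + block, x:x + block].astype(float)
                        sad = np.abs(cand - target).sum()
                        if sad < best_sad:
                            best_sad, best = sad, cand
            pred[by:by + block, bx:bx + block] = best
    return pred
```

When the motion between frames is a pure translation within the search range, the prediction reproduces the current frame exactly, which is why it can bridge the seam between two spliced digit clips.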
Faraj, M. I. (2006). Lip-motion and speech biometrics in person recognition. (Licentiate dissertation). Göteborg: Department of Signals and Systems, Chalmers University of Technology
2006 (English) Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

Biometric identification techniques are frequently used to improve security, e.g. in financial transactions, computer networks and secure critical locations. The purpose of biometric authentication systems is to verify an individual by her biological characteristics, including those generating characteristic behaviour. It is not only fingerprints that are used for authentication; our lips, eyes, speech, signatures and even facial temperature are now being used to identify us. This presumably increases security, since these traits are harder to copy, steal or lose.

This thesis attempts to present an effective scheme to extract discriminative features based on a novel motion estimation algorithm for lip movement. Motion is defined as the distribution of apparent velocities in the changes of brightness patterns in an image. The velocity components of a lip sequence are computed by the well-known 3D structure tensor using 1D processing, in 2D manifolds. Since the velocities are computed without extracting the speaker's lip contours, more robust visual features can be obtained. The velocity estimation is performed in rectangular lip regions, which affords increased computational efficiency.

To investigate the usefulness of the proposed motion features, we implement a person authentication system based on lip-movement information with (and without) speech information. It yields a speaker verification rate of 98% with lip and speech information. Comparisons are made with an alternative motion estimation technique, and a description of our proposed feature fusion technique is given. Besides its value in authentication, the technique can naturally be used to evaluate liveness, i.e. to determine whether the biometric data is captured from a legitimate, live user who is physically present at the point of acquisition, as it can be used in a text-prompted dialog.

Place, publisher, year, edition, pages
Göteborg: Department of Signals and Systems, Chalmers University of Technology, 2006. p. 51
Series
Technical report R, ISSN 1403-266X ; 2006:20
Keywords
Audio-visual recognition, biometrics, Biometric recognition, Speaker verification, Speaker authentication, Person identification, Lip movements, Motion, Structure tensor, Orientation, Optical flow, Hidden Markov Model, Gaussian Markov Model, Lip-motion
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:hh:diva-1977 (URN); 2082/2372 (Local ID); 2082/2372 (Archive number); 2082/2372 (OAI)
Available from: 2008-09-29. Created: 2008-09-29. Last updated: 2018-03-23. Bibliographically approved.
Faraj, M. I. & Bigun, J. (2006). Motion Features from Lip Movement for Person Authentication. In: Y Y Tang (Ed.), The 18th International Conference on Pattern Recognition: proceedings : 20 - 24 August, 2006, Hong Kong. Paper presented at The 18th International Conference on Pattern Recognition, 20 - 24 August, 2006, Hong Kong (pp. 1059-1062). Washington, D.C.: IEEE Computer Society
2006 (English) In: The 18th International Conference on Pattern Recognition: proceedings: 20-24 August, 2006, Hong Kong / [ed] Y Y Tang, Washington, D.C.: IEEE Computer Society, 2006, p. 1059-1062. Conference paper, Published paper (Refereed)
Abstract [en]

This paper describes a new motion-based feature extraction technique for speaker identification using orientation estimation in 2D manifolds. The motion is estimated by computing the components of the structure tensor from which normal flows are extracted. By projecting the 3D spatiotemporal data to 2D planes, we obtain projection coefficients which we use to evaluate the 3D orientations of brightness patterns in TV-like image sequences. This corresponds to the solutions of simple matrix eigenvalue problems in 2D, affording increased computational efficiency. An implementation based on joint lip movements and speech is presented along with experiments which confirm the theory, exhibiting a recognition rate of 98% on the publicly available XM2VTS database.

Place, publisher, year, edition, pages
Washington, D.C.: IEEE Computer Society, 2006
Series
International Conference on Pattern Recognition, ISSN 1051-4651 ; 2006
Keywords
eigenvalues and eigenfunctions, feature extraction, image sequences, matrix algebra, motion estimation, speaker recognition
National Category
Engineering and Technology
Identifiers
urn:nbn:se:hh:diva-2120 (URN); 10.1109/ICPR.2006.814 (DOI); 000240705600254 (); 2-s2.0-34147180471 (Scopus ID); 2082/2515 (Local ID); 0-7695-2521-0 (ISBN); 2082/2515 (Archive number); 2082/2515 (OAI)
Conference
The 18th International Conference on Pattern Recognition, 20 - 24 August, 2006, Hong Kong
Note

©2006 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.

Available from: 2008-11-11. Created: 2008-11-11. Last updated: 2018-03-23. Bibliographically approved.