Face Recognition Vendor Test 2002 March 16, 2002
Face Recognition Vendor Tests (FRVT) provide independent U.S. Government evaluations of commercially available and mature prototype face recognition systems.
These evaluations are designed to provide U.S. Government and law enforcement agencies with information to assist them in determining where and how facial recognition
technology can best be deployed. In addition, FRVT results help identify future research directions for the face recognition community.
By March 2003, scientists from NIST, DARPA and the DoD Counterdrug Technology Development Program Office had completed the most comprehensive evaluation to date of
commercially available face recognition systems and released the results of the Face Recognition Vendor Test 2002.
Brief History of Face Recognition Vendor Tests
FRVT 2002 follows four previous face recognition technology evaluations - three FERET evaluations (1994, 1995 and 1996) and FRVT 2000.
The FERET Program introduced evaluations to the face recognition community and helped advance face recognition from its infancy to the prototype system stage. By 2000, face recognition technology had matured from prototype systems to commercial systems.
The Face Recognition Vendor Test 2000 (FRVT 2000) measured the capabilities of these systems and their technical progress since the last FERET evaluation. Public interest in face recognition technology had risen significantly by 2002.
Each successive evaluation increased in size, difficulty and complexity, reflecting the maturing of face recognition technology as well as evaluation theory.
FRVT 2002 was designed to measure technical progress since 2000, to evaluate performance on real-life large-scale databases, and to introduce new experiments to help understand face recognition performance better.
Overview of the Face Recognition Vendor Test 2002
FRVT 2002 was announced on 25 April 2002 and was open to all developers and providers of core face recognition technology. This included academia, research laboratories and commercial companies.
FRVT 2002 was an independently administered technology evaluation. Ten participants were evaluated under the direct supervision of the FRVT 2002 organizers at a U.S. Naval Base in Dahlgren, Virginia, between July 10 and August 9, 2002.
FRVT 2002 consisted of two sub-tests: the High Computational Intensity (HCInt) Test and the Medium Computational Intensity (MCInt) Test. Each sub-test was designed to encourage broad participation in the evaluation.
The HCInt was designed to evaluate the performance of state-of-the-art systems on extremely challenging real-world problems. It consisted of 121,589 operational images of 37,437 people.
The images were provided from the U.S. Department of State’s Mexican non-immigrant Visa archive. From this data, real-world performance figures on a very large data set were computed. Performance statistics were computed for the verification, identification, and watch list tasks.
The MCInt was designed to provide an understanding of a participant's capability to perform face recognition tasks with several different formats of imagery (still and video) under varying conditions. The MCInt was also designed to help identify promising new face recognition technologies not identified in the HCInt.
The HCInt had to be performed on the equivalent of three high-end workstations, the MCInt on a single workstation.
Participants were given 11 days to complete each test and they were tested using data that they had not previously seen.
All images and video sequences in FRVT 2002 were sequestered prior to the test. Testing on sequestered data has a number of advantages: it provides a level playing field, and it ensures that systems are evaluated on the general face recognition task, not on the ability to tune a system to a particular data set.
Participant | MCInt | HCInt |
AcSys Biometrics Corp | + | |
Cognitec Systems GmbH | + | + |
C-VIS Computer Vision und Automation GmbH | + | + |
Dream Mirh Co. Ltd | + | + |
Eyematic Interfaces Inc. | + | + |
Iconquest | + | |
Identix | + | + |
Imagis Technologies Inc. | + | + |
Viisage Technology | + | + |
VisionSphere Technologies Inc. | + | + |
Image Data Sets
The HCInt data set consisted of 121,589 images of 37,437 individuals with at least 3 images of each person. The images were of good quality and were gathered in a consistent manner. The background was universally uniform.
The MCInt data set was composed of a heterogeneous set of still images and video sequences of subjects in a variety of poses, activities and illumination conditions. These images originated from two sources:
The first was the still facial image data set collected at the National Institute of Standards and Technology (NIST), Naval Surface Warfare Center (NSWC, Dahlgren) and the University of South Florida (USF) between 1999 and 2002.
The second set was from The University of Texas at Dallas and consisted of video sequences and still images taken in 2001. The NIST-NSWC-USF data set comprised images taken indoors and outdoors.
The outdoor stills were characterized by changing background and directional sunlight illumination.
Key Findings
FRVT 2002 results show that normal changes in indoor lighting do not significantly affect performance of the top systems. Approximately the same performance results were obtained using two indoor data sets with different lighting. In both experiments, the best performer had a 90% verification rate at a false accept rate of 1%.
For the best face recognition systems, the recognition rate for faces captured outdoors at a false accept rate of 1% was only 50%. Thus, face recognition from outdoor imagery remains a research challenge area.
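The figures above pair a verification rate with a fixed false accept rate. As a minimal sketch of how such an operating point can be read off a set of match scores, the Python below picks a threshold from impostor scores and then counts accepted genuine scores. The function name, the "accept when score > threshold" rule, and the toy scores are assumptions for illustration only; they are not FRVT 2002 data or the organizers' scoring code.

```python
def verification_rate_at_far(genuine, impostor, target_far):
    """Return (threshold, verification rate) such that FAR <= target_far.

    Assumes similarity scores: a claim is accepted when score > threshold.
    """
    ranked = sorted(impostor, reverse=True)
    # Number of impostor scores allowed to pass at the target false accept rate.
    allowed = int(target_far * len(ranked) + 1e-9)
    # Using the (allowed + 1)-th highest impostor score as the threshold means
    # at most `allowed` impostor scores are strictly above it.
    threshold = ranked[allowed] if allowed < len(ranked) else float("-inf")
    accepted = sum(1 for score in genuine if score > threshold)
    return threshold, accepted / len(genuine)


# Toy, made-up scores (hypothetical, for illustration only).
impostor = [i / 100 for i in range(100)]            # 0.00 .. 0.99
genuine = [0.999, 0.995, 0.985, 0.981, 0.975,
           0.9, 0.6, 0.99, 0.97, 0.983]

threshold, rate = verification_rate_at_far(genuine, impostor, target_far=0.01)
achieved_far = sum(1 for s in impostor if s > threshold) / len(impostor)
print(f"threshold={threshold}, FAR={achieved_far:.2%}, verification rate={rate:.0%}")
```

Tightening the target false accept rate raises the threshold and lowers the verification rate, which is why the two numbers are always reported together.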
The FRVT 2002 database also included images of the same person taken on different days. The performance results in this case, using indoor imagery, show improvement in the capabilities
of the face recognition systems over the last two years. Compared with similar experiments conducted two years earlier in FRVT 2000, the results of FRVT 2002 indicate there has been a 50% reduction in error rates.
A very important question for real-world applications is the rate of decrease in performance as time increases between the acquisition of the database images and the new images presented to a system. FRVT 2002 found that for the top systems, performance degraded at approximately 5 percentage points per year.
How Do Database and Watch List Size Affect Performance?
For the best system, the top-rank identification rate was 85% on a database of 800 people, 83% on a database of 1,600, and 73% on a database of 37,437. For every doubling of database size, performance decreases by two to three overall percentage points. In mathematical terms,
identification performance decreases linearly with respect to the logarithm of the database size.
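The log-linear relationship can be checked against the three data points quoted above. The sketch below fits a straight line to identification rate versus the base-2 logarithm of database size using ordinary least squares; the slope is then the change in percentage points per doubling. The helper name `fit_log2_line` is hypothetical, and a three-point fit is only a rough consistency check, not the analysis used in the evaluation.

```python
import math

# (database size, top-rank identification rate in percent) for the best
# system, as quoted in the text above.
reported = [(800, 85.0), (1600, 83.0), (37437, 73.0)]


def fit_log2_line(points):
    """Least-squares fit of rate = intercept + slope * log2(size)."""
    xs = [math.log2(size) for size, _ in points]
    ys = [rate for _, rate in points]
    n = len(points)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_log2_line(reported)
print(f"change per doubling of database size: {slope:.2f} percentage points")
for size, rate in reported:
    fitted = intercept + slope * math.log2(size)
    print(f"{size:>6} people: reported {rate:.0f}%, fitted {fitted:.1f}%")
```

The fitted slope comes out a little steeper than two points lost per doubling, in line with the "two to three percentage points" stated above.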
A similar effect was observed for the watch list task. As the watch list size increases, performance decreases. For the best system, the identification and detection rate was 77% at a false alarm rate of 1% for a watch list of 25 people. For a watch list of 300 people,
the identification and detection rate was 69% at a false alarm rate of 1%. In general, a watch list of 25 to 50 people will perform better than a larger watch list.
The Role of Demographics
Previous evaluations have reported face recognition performance as a function of imaging properties such as indoor versus outdoor images or frontal versus non-frontal images. In FRVT 2002, the effects of demographics on performance were examined for the first time.
Two major effects were found:
First, recognition rates for males were higher than for females. For the top systems, identification rates for males were 6 to 9 percentage points higher than those for females. For the best system, identification performance on males was 78% and for females was 79%.
Second, recognition rates for older people were higher than for younger people. For 18 to 22 year olds, the average identification rate for the top systems was 62%, and for 38 to 42 year olds it was 74%. For every ten-year increase in age, performance increases on average by approximately 5% through age 63.
3D Morphable Models
Since FRVT 2000, new techniques and approaches to assist face recognition have emerged. FRVT 2002 looked at two of these new techniques. The first was the three-dimensional morphable models technique of Blanz and Vetter.
Morphable models are a technique for improving recognition of non-frontal images. It was found that Blanz and Vetter’s technique significantly increased recognition performance.
The second technique is recognition from video sequences. It was found that, using FRVT 2002 data sets, recognition performance using video sequences was the same as the performance using still images.
Summary
Given reasonably controlled indoor lighting, the current state of the art in face recognition is 90% verification at a 1% false accept rate.
Better face recognition systems do not appear to be sensitive to normal indoor lighting changes.
Outdoor face recognition performance needs improvement.
The use of 3D morphable models can significantly improve non-frontal face recognition.
Watch list performance decreases as a function of size: performance using smaller watch lists is better than performance using larger watch lists.
In face recognition applications, accommodations should be made for demographic information since characteristics such as age and sex can significantly affect performance.
Males are easier to recognize than females and younger people are harder to recognize than older people.