EuclidVision

EuclidVision is a suite of tools, algorithms, and processing pipelines that draws on Computer Vision, Image Understanding, and Pattern Recognition algorithms and modeling techniques to provide higher video compression without a loss in visual quality.

Object-based and feature-based tracking and modeling identify objects and sub-objects, or features of interest (for example, a person's head, an eye, or an ear), in the video. The algorithms then use higher-level modeling to represent these parts of the video, relieving the conventional encoder of the costly task of encoding these highly complex regions. In this way, the modeling focuses its compression on the most critical and complex objects and sub-objects while maintaining or enhancing the quality of the coincident background. This approach makes it possible to reduce the size of the file without compromising the video quality. The conventional encoder remains free to perform its own compression both independently of EuclidVision and on top of EuclidVision's modeling, where this is advantageous.
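The division of labor described above can be illustrated with a minimal sketch. It assumes a tracker that reports features as pixel bounding boxes and a 16x16 block grid; the `Box` type, the `split_blocks` function, and the block size are hypothetical names chosen for illustration, not part of EuclidVision itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in pixels (hypothetical tracker output)."""
    x: int
    y: int
    w: int
    h: int


def split_blocks(frame_w, frame_h, features, block=16):
    """Partition the block grid of a frame into two lists.

    Blocks that overlap any tracked feature are handed to higher-level
    modeling; all other blocks stay with the conventional encoder.
    Returns (modeled, conventional) as lists of (x, y) block origins.
    """
    modeled, conventional = [], []
    for by in range(0, frame_h, block):
        for bx in range(0, frame_w, block):
            overlaps = any(
                bx < f.x + f.w and f.x < bx + block
                and by < f.y + f.h and f.y < by + block
                for f in features
            )
            (modeled if overlaps else conventional).append((bx, by))
    return modeled, conventional


# Usage: one tracked feature (say, a face) in a 64x48 frame.
modeled, conventional = split_blocks(64, 48, [Box(16, 16, 16, 16)])
```

A real system would likely work with feature shapes rather than whole blocks; the block grid is used here only to keep the hand-off to a block-based encoder easy to see.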

 

eFLEX Feature Tracker - Feature/Object Detection and Analysis

 

The fundamental difference between EuclidVision video compression and conventional block-based video coding is that EuclidVision can apply object-based and feature-based appearance and structural modeling directly to the data it processes. Without recognizing the phenomena occurring in the video, conventional compression is limited to implicit representations that cannot directly leverage the redundancy inherent in most video. For the minority of videos that do not allow this high-level modeling, EuclidVision leaves the encoding to the conventional encoder.

The following diagram shows the high-level steps EuclidVision takes to process video data, compared with those of H.264.

 
H.264 compression compared to EuclidVision compression techniques
 

EuclidVision runs alongside a conventional encoder, monitoring the complexity of different regions of the video. When EuclidVision determines that the complexity of a region exceeds the conventional encoder’s capabilities, it applies higher-level modeling to that region. This modeling is based on features and objects that are tracked throughout the video. Models for those features and objects are created, removing redundant data and, in the process, further simplifying the video for conventional compression. Structural models are derived through a variety of methods, from simple motion integration up through 3D Structure from Motion techniques. Further, EuclidVision “learns” about the video being encoded, creating feature/object libraries that eliminate the need to re-transmit those objects when they reappear in the video stream. Higher-level modeling of the video data allows EuclidVision to provide a significant compression gain over conventional solutions alone in many circumstances. Conventional solutions are limited to low-level modeling of the video data, processing it in arbitrary regions defined by macroblocks. This limitation obscures the redundancy that becomes available when persistent object models are used.
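Two ideas from this paragraph, the complexity check and the feature/object library, can be sketched as follows. This is only an illustration under stated assumptions: the text does not say how complexity is measured, so pixel variance stands in for it, and the threshold value, `choose_mode`, and `FeatureLibrary` are hypothetical.

```python
from statistics import pvariance

# Hypothetical threshold; the source gives neither a metric nor a value.
COMPLEXITY_THRESHOLD = 500.0


def choose_mode(region_pixels, threshold=COMPLEXITY_THRESHOLD):
    """Pick an encoding path for one region.

    Pixel variance is an assumed stand-in for the complexity measure.
    Regions above the threshold go to higher-level modeling; the rest
    stay with the conventional encoder.
    """
    return "model" if pvariance(region_pixels) > threshold else "conventional"


class FeatureLibrary:
    """Remembers feature/object models that have already been transmitted."""

    def __init__(self):
        self._models = {}

    def emit(self, feature_id, model):
        """Return what would be sent for this appearance of the feature.

        The full model is sent only on first appearance; later
        appearances send a short reference to the stored model.
        """
        if feature_id in self._models:
            return ("ref", feature_id)
        self._models[feature_id] = model
        return ("model", feature_id, model)


# Usage: a flat region stays conventional, a busy one is modeled,
# and a reappearing feature is sent as a reference only.
library = FeatureLibrary()
first = library.emit("face-1", [0.2, 0.7])
again = library.emit("face-1", [0.2, 0.7])
```

The point of the library is that the second appearance costs a reference rather than a full model, which is the redundancy a macroblock-only encoder cannot see.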

EuclidVision is a foundation technology on which a whole host of technologies can be built. It provides a higher-level representation of the video, opening up a new set of options for compressing, processing, distributing, storing, and indexing high-quality video and image data. It does not necessarily require specialized hardware: EuclidVision can be used with the hardware, software, and networking equipment in common use today.
 
Research

Euclid’s research and development initially focused on object modeling of single-subject face videos, which has been reduced to practice in the prototype’s Phase II implementation. The Phase II prototype creates training sets (ensembles of examples of the detected faces), which are used to model the face in a video stream during encoding. The encoder takes the previously generated ensembles and applies EuclidVision encoding techniques based on this training data. Euclid engineers have also produced an accompanying Disposition Report, which documents the testing methodology and quantifies the processing, compression, and quality values achieved.
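To make the idea of modeling a face from an ensemble of examples concrete, here is a minimal sketch that uses a one-component principal component model as a stand-in technique. The text does not state which appearance model the prototype uses, so this choice, and the names `fit_ensemble`, `encode`, and `decode`, are assumptions for illustration. Each example is a flattened list of pixel values.

```python
import math


def fit_ensemble(samples, iters=50):
    """Fit a mean vector and one principal direction to the ensemble.

    Uses power iteration on the (unnormalized) covariance. The start
    vector is arbitrary; if it happened to be orthogonal to the true
    direction, the iteration would not converge to it.
    """
    n, dim = len(samples), len(samples[0])
    mean = [sum(col) / n for col in zip(*samples)]
    centered = [[v - m for v, m in zip(s, mean)] for s in samples]
    start_norm = math.sqrt(sum((j + 1) ** 2 for j in range(dim)))
    direction = [(j + 1) / start_norm for j in range(dim)]
    for _ in range(iters):
        nxt = [0.0] * dim
        for c in centered:
            dot = sum(a * b for a, b in zip(c, direction))
            for j, a in enumerate(c):
                nxt[j] += dot * a
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0.0:  # all examples identical: no variation to model
            break
        direction = [v / norm for v in nxt]
    return mean, direction


def encode(sample, mean, direction):
    """Represent a new example by a single coefficient."""
    return sum((v - m) * d for v, m, d in zip(sample, mean, direction))


def decode(coef, mean, direction):
    """Synthesize an example from its coefficient."""
    return [m + coef * d for m, d in zip(mean, direction)]


# Usage: train on three tiny "faces", then encode a new one as one number.
mean, direction = fit_ensemble([[0, 0, 0], [1, 2, 2], [2, 4, 4]])
coef = encode([3, 6, 6], mean, direction)
rebuilt = decode(coef, mean, direction)
```

Once the mean and direction are known to both encoder and decoder, only the coefficient needs to be sent per frame, which is the kind of saving a trained ensemble makes possible.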

The Disposition Report documents a 460% compression improvement over MPEG-4, a widely used digital video standard, for single-subject videos tested in a lab environment, or a 600% improvement over MPEG-2, the standard used for DVD video.

 

 

Quality Comparison of Original vs Synthesized Video with eFLEX

 

Prediction Test - H.264 Synthesis compared with eFLEX Hybrid Synthesis

 

Macroblock Analysis: Comparing H.264 & eFLEX

Other Resources

Please refer to our Notes Section for more detailed technical information.