The fundamental difference between EuclidVision video compression and conventional block-based video coding is that EuclidVision models the appearance and structure of objects and features directly in the data it processes. Because conventional compression does not recognize the phenomena occurring in the video, it is limited to implicit representations that cannot directly exploit the redundancy inherent in most video. For the minority of videos that do not allow this high-level modeling, EuclidVision lets its conventional encoding capabilities handle them.
The following diagram shows the high-level steps EuclidVision undertakes to process video data, compared to H.264.

EuclidVision runs alongside a conventional encoder, monitoring the complexity of different regions of the video. When it determines that the complexity of the video has exceeded the conventional encoder’s capabilities, EuclidVision applies higher-level modeling to those regions. This modeling is based on features and objects that are tracked throughout the video. Models for those features and objects are created, which removes redundant data and further simplifies the video for conventional compression. Structural models are derived through a variety of methods, from simple motion integration up through 3D Structure from Motion techniques. Further, EuclidVision “learns” about the video being encoded, creating feature and object libraries that eliminate the need to re-transmit those objects when they reappear in the video stream. Higher-level modeling of the video data allows EuclidVision to provide a significant gain over conventional solutions alone in many circumstances. Conventional solutions are limited to low-level modeling of the video data, processing it in arbitrary regions defined by macroblocks. This limitation obscures the redundancy that becomes available when persistent object models are used.
EuclidVision is a foundation technology on which a range of other technologies can be built. It provides a higher-level representation of the video, opening up new options for compressing, processing, distributing, storing, and indexing high-quality video and image data, and it does not necessarily require specialized hardware. EuclidVision can be used with the hardware, software, and networking equipment in common use today.
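The control flow described above can be illustrated with a small sketch. It is not EuclidVision's implementation: the names FeatureLibrary, choose_mode, and encode_frame, the numeric complexity score, and the 0.5 threshold are all assumptions made for illustration. The sketch only shows the two ideas in the text: regions that are too complex for the conventional path are routed to model-based coding, and a model that has already been sent is replaced by a short reference when the same feature reappears.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureLibrary:
    """Hypothetical store of feature/object models learned during encoding."""
    models: dict = field(default_factory=dict)

    def encode(self, feature_id, model_bytes):
        """Return what would be transmitted for one tracked feature."""
        if feature_id in self.models:
            # Feature seen before: send only a reference, not the model.
            return ("ref", feature_id)
        # First appearance: send the full model and remember it.
        self.models[feature_id] = model_bytes
        return ("model", feature_id, model_bytes)


def choose_mode(region_complexity, threshold):
    """Pick the coding path for one region (threshold is an assumed value)."""
    return "model" if region_complexity > threshold else "conventional"


def encode_frame(regions, library, threshold=0.5):
    """Encode one frame given (feature_id, complexity, model_bytes) tuples."""
    out = []
    for feature_id, complexity, model_bytes in regions:
        if choose_mode(complexity, threshold) == "model":
            out.append(library.encode(feature_id, model_bytes))
        else:
            # Simple regions are left to the conventional encoder.
            out.append(("conventional", feature_id))
    return out
```

Encoding two frames that both contain the same complex feature with one shared FeatureLibrary yields a full model in the first frame and only a reference in the second, which is the re-transmission saving the text describes.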
Research
Euclid’s research and development initially focused on object modeling of single-subject face videos, and this work has been reduced to practice in the prototype’s Phase II implementation. The Phase II prototype creates training sets, which are ensembles of examples of the detected faces, and uses them to model the face in a video stream during encoding. The encoder takes the previously generated ensembles and applies EuclidVision encoding techniques based on this training data. Euclid engineers have produced an accompanying Disposition Report, which documents the testing methodology and quantifies the processing, compression, and quality values achieved.
The Disposition Report documents a 460% compression improvement over MPEG-4, a widely used digital video standard, for single-subject videos tested in a lab environment, or a 600% improvement over MPEG-2, the video standard used on DVDs.
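To make the training-set idea more concrete, the sketch below shows one common way an ensemble of examples can be turned into a compact model. The text does not say which technique the Phase II prototype uses, so this is an assumed stand-in: a one-component principal component model, with the hypothetical helpers train, encode, and decode, in which each sample (for example, a flattened face image) is represented by a single coefficient relative to the ensemble mean.

```python
import math


def mean_vector(ensemble):
    """Element-wise mean of equally sized sample vectors."""
    n = len(ensemble)
    return [sum(col) / n for col in zip(*ensemble)]


def principal_direction(centered, iterations=100):
    """Dominant direction of centered samples, found by power iteration."""
    dim = len(centered[0])
    # Start from the largest centered sample (assumed not orthogonal
    # to the dominant direction, which holds in typical cases).
    v = max(centered, key=lambda x: sum(a * a for a in x))
    norm = math.sqrt(sum(a * a for a in v))
    if norm == 0.0:
        return [0.0] * dim  # all samples identical: no variation to model
    v = [a / norm for a in v]
    for _ in range(iterations):
        # Apply the unnormalized covariance: sum over samples of (x . v) x
        w = [0.0] * dim
        for x in centered:
            dot = sum(a * b for a, b in zip(x, v))
            for i in range(dim):
                w[i] += dot * x[i]
        norm = math.sqrt(sum(a * a for a in w))
        if norm == 0.0:
            break
        v = [a / norm for a in w]
    return v


def train(ensemble):
    """Build the model (mean, direction) from the training ensemble."""
    mean = mean_vector(ensemble)
    centered = [[a - m for a, m in zip(x, mean)] for x in ensemble]
    return mean, principal_direction(centered)


def encode(sample, mean, direction):
    """Represent a sample by one coefficient along the learned direction."""
    return sum((a - m) * d for a, m, d in zip(sample, mean, direction))


def decode(coeff, mean, direction):
    """Reconstruct an approximation of the sample from its coefficient."""
    return [m + coeff * d for m, d in zip(mean, direction)]
```

Once the mean and direction are known to both encoder and decoder, a new sample that resembles the training ensemble can be sent as one number instead of a full vector. A practical model would keep several components and handle the residual, which this sketch omits.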