Ensenso Operation

How do Ensenso cameras work?

Stereo Vision

Projection rays of the Ensenso cameras
The projection rays show the different viewing angles of the cameras on the scene objects.
Ensenso cameras operate on the principle of Stereo Vision, which imitates human vision. Two cameras acquire images of the same scene from two different positions. Although both cameras see the same scene content, each object appears at a different position in the two images because the cameras' projection rays differ. Special matching algorithms compare the two images, search for corresponding points and record the displacement (disparity) of every point in a disparity map.
Knowing the distance between the cameras, their viewing angles and the lens focal length, the Ensenso software converts these disparities into units of length using the triangulation principle. This yields the 3D coordinates of each image pixel. The result is a 3D point cloud, which is the foundation for further applications based on 3D object information.
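For a rectified stereo pair, the triangulation step can be sketched with the standard pinhole relation Z = f·B/d, where f is the focal length in pixels, B the baseline (distance between the two cameras) and d the disparity in pixels. This is a minimal sketch that assumes rectified images and a known principal point (cx, cy); `StereoRig` and `pixel_to_point` are hypothetical names, not part of the Ensenso SDK.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class StereoRig:
    """Hypothetical parameters of a rectified stereo camera pair."""
    focal_px: float     # lens focal length expressed in pixels
    baseline_mm: float  # distance between the two camera centres
    cx: float           # principal point, x (pixels)
    cy: float           # principal point, y (pixels)


def pixel_to_point(rig: StereoRig, u: float, v: float,
                   disparity_px: float) -> Optional[Tuple[float, float, float]]:
    """Triangulate one left-image pixel (u, v) with a known disparity.

    Returns (X, Y, Z) in millimetres, or None when the disparity is invalid,
    which corresponds to a hole in the point cloud.
    """
    if disparity_px <= 0:
        return None
    z = rig.focal_px * rig.baseline_mm / disparity_px  # depth from Z = f*B/d
    x = (u - rig.cx) * z / rig.focal_px                # back-project along the pixel ray
    y = (v - rig.cy) * z / rig.focal_px
    return (x, y, z)


if __name__ == "__main__":
    rig = StereoRig(focal_px=1000.0, baseline_mm=100.0, cx=640.0, cy=512.0)
    # A 50 px disparity maps to 1000 * 100 / 50 = 2000 mm depth.
    print(pixel_to_point(rig, 740.0, 512.0, 50.0))  # (200.0, 0.0, 2000.0)
```

Because depth is inversely proportional to disparity, halving the disparity doubles the depth, so a one-pixel disparity error costs more depth accuracy for distant objects than for near ones.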
Weakly textured or reflective surfaces result in incomplete depth information.
The matching process compares contrast and brightness gradations of the sensor pixels, so the quality of Stereo Vision depends directly on the scene's lighting conditions and on the texture of the object surfaces. On weakly textured or reflective surfaces, corresponding points are very difficult to find, and the disparity cannot be determined uniquely. The result is incomplete depth information for the scene.
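The ambiguity on weakly textured surfaces can be illustrated with a minimal one-dimensional block matcher that compares brightness values along a single image row using the sum of absolute differences (SAD). SAD is a common textbook matching cost chosen here for illustration; the cost function actually used by the Ensenso software is not described in this text, and the function names below are hypothetical.

```python
from typing import List, Sequence


def sad_costs(left: Sequence[int], right: Sequence[int], x: int,
              half_win: int, max_disp: int) -> List[int]:
    """SAD between the window around left[x] and the window around
    right[x - d], for every candidate disparity d = 0..max_disp."""
    costs = []
    for d in range(max_disp + 1):
        cost = 0
        for k in range(-half_win, half_win + 1):
            cost += abs(left[x + k] - right[x - d + k])
        costs.append(cost)
    return costs


def best_disparities(costs: Sequence[int]) -> List[int]:
    """All disparities reaching the minimum cost; more than one means the
    match is ambiguous and no unique depth can be assigned."""
    lowest = min(costs)
    return [d for d, c in enumerate(costs) if c == lowest]


if __name__ == "__main__":
    textured_left = [10, 50, 20, 80, 30, 90, 40, 70, 60, 15, 25, 35]
    textured_right = textured_left[2:] + [0, 0]  # same row, shifted by 2 px
    print(best_disparities(sad_costs(textured_left, textured_right, 6, 1, 3)))  # [2]

    flat = [50] * 12  # uniform, texture-less surface
    print(best_disparities(sad_costs(flat, flat, 6, 1, 3)))  # [0, 1, 2, 3]
```

On the textured row only the true shift of 2 pixels gives zero cost, while on the uniform row every candidate disparity fits equally well, which is exactly the case where the depth value stays undetermined.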
Ensenso cameras extend the classic Stereo Vision principle with additional techniques that achieve higher-quality depth information and more precise measurement results. As a consequence, Stereo Vision can be used in a wider range of applications.

Pattern Projector

A high-intensity projector uses a pattern mask to produce a high-contrast texture on the object surface, even under difficult lighting conditions. The projected texture supplements weak or non-existent surface structure on the object.
This principle is therefore also called “Projected Texture Stereo Vision”. The result is a more detailed disparity map and more complete, homogeneous depth information for the scene.
The auxiliary pattern projected on the cup’s surface results in more complete, homogeneous depth information.
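The effect of a projected texture can be sketched with the same one-dimensional matching idea: a uniform surface gives the same matching cost for every candidate disparity, while the same surface with a projected dot pattern yields a single clear minimum. The binary dot pattern, its brightness gain and the SAD cost are illustrative assumptions, not the actual Ensenso pattern mask or matcher; all names are hypothetical.

```python
from typing import List, Sequence


def sad_costs(left: Sequence[int], right: Sequence[int], x: int,
              half_win: int, max_disp: int) -> List[int]:
    """SAD between the window around left[x] and around right[x - d]."""
    return [sum(abs(left[x + k] - right[x - d + k])
                for k in range(-half_win, half_win + 1))
            for d in range(max_disp + 1)]


def best_disparities(costs: Sequence[int]) -> List[int]:
    """All disparities with minimum cost (more than one = ambiguous)."""
    lowest = min(costs)
    return [d for d, c in enumerate(costs) if c == lowest]


def project_pattern(surface: Sequence[int], bits: Sequence[int], gain: int) -> List[int]:
    """Brighten every pixel where the (hypothetical) dot pattern has a dot."""
    return [s + gain * b for s, b in zip(surface, bits)]


def shifted_view(row: Sequence[int], disparity: int) -> List[int]:
    """Second camera's view of the same row: right[j] = row[j + disparity],
    padded with 0 where the content leaves the image."""
    return [row[j + disparity] if j + disparity < len(row) else 0
            for j in range(len(row))]


if __name__ == "__main__":
    surface = [50] * 16                                        # texture-less object
    dots = [1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0]  # illustrative pattern
    true_disparity = 3

    for label, left in (("plain", surface),
                        ("projected", project_pattern(surface, dots, 150))):
        right = shifted_view(left, true_disparity)
        costs = sad_costs(left, right, x=9, half_win=2, max_disp=4)
        print(label, best_disparities(costs))
    # plain [0, 1, 2, 3, 4]
    # projected [3]
```

The projected dots play the role of the missing surface structure: they give each matching window a distinctive brightness signature, so only the true disparity of 3 pixels matches exactly.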


The FlexView technology can further improve the level of detail of the disparity map for static scenes. A mechanical system with a piezoelectric actuator shifts the pattern mask in small steps within the projector's beam path, so the texture on the object surface varies. Acquiring multiple image pairs of the same scene with different textures yields many more usable image points, and the resolution increases. The matching algorithm uses all captured image pairs to calculate significantly improved disparity maps.
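One way to picture why several image pairs with a shifted pattern help is spatio-temporal cost aggregation: the matching cost of each candidate disparity is summed over all image pairs. In the sketch below, a single-pixel matching window (the finest possible spatial detail) is ambiguous with one image pair, while the sum over four shifted patterns has a unique minimum. This is a generic technique chosen for illustration; the FlexView algorithm itself is not described here, and the pattern, shift steps and function names are hypothetical.

```python
from typing import List, Sequence, Tuple

BASE_PATTERN = [1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0]  # illustrative


def capture_pair(shift: int, true_disparity: int, gain: int = 150,
                 surface: int = 50) -> Tuple[List[int], List[int]]:
    """Simulate one image pair of a uniform surface with the dot pattern
    translated by `shift` pixels (hypothetical stand-in for the mask motion)."""
    n = len(BASE_PATTERN)
    left = [surface + gain * BASE_PATTERN[(i + shift) % n] for i in range(n)]
    right = [left[j + true_disparity] if j + true_disparity < n else 0
             for j in range(n)]
    return left, right


def pixel_costs(left: Sequence[int], right: Sequence[int], x: int,
                max_disp: int) -> List[int]:
    """Single-pixel matching cost |left[x] - right[x - d]| per disparity d."""
    return [abs(left[x] - right[x - d]) for d in range(max_disp + 1)]


def aggregated_costs(pairs: Sequence[Tuple[List[int], List[int]]], x: int,
                     max_disp: int) -> List[int]:
    """Sum the per-pair matching costs over all acquired image pairs."""
    total = [0] * (max_disp + 1)
    for left, right in pairs:
        for d, c in enumerate(pixel_costs(left, right, x, max_disp)):
            total[d] += c
    return total


def best_disparities(costs: Sequence[int]) -> List[int]:
    """All disparities with minimum cost (more than one = ambiguous)."""
    lowest = min(costs)
    return [d for d, c in enumerate(costs) if c == lowest]


if __name__ == "__main__":
    pairs = [capture_pair(shift, true_disparity=3) for shift in range(4)]
    print(best_disparities(aggregated_costs(pairs[:1], x=9, max_disp=4)))  # [2, 3, 4]
    print(best_disparities(aggregated_costs(pairs, x=9, max_disp=4)))      # [3]
```

Each shifted pattern contributes one more brightness sample per pixel, so the sequence of samples over all pairs acts as a per-pixel signature. This allows matching with much smaller windows, which is one plausible reading of why more image pairs raise the resolution.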

Because the shifted texture adds structural information on glossy, dark or reflective surfaces, both the resolution and the robustness of the resulting data increase. Many processing algorithms benefit from the higher resolution and lower noise. FlexView reduces the post-processing steps needed for the point cloud and the time required for further 3D processing.

Comparison of FlexView1, FlexView2 and Single Shot Data

Ensenso offers cameras with and without FlexView technology. Each solution is optimized for particular applications, and whether the objects move plays a decisive role.
Cameras without FlexView and cameras with FlexView1 technology both produce a high-contrast texture using a random dot pattern. This allows depth information to be calculated very quickly, even from a single image pair. Both camera variants are equally suitable for applications with moving objects.
With static objects, FlexView1 cameras additionally benefit from algorithms that achieve a higher resolution by combining multiple image pairs acquired with a shifted dot pattern. With only 3 to 5 image pairs, the X, Y and Z resolution can be doubled. However, each additional image pair increases acquisition and processing time. Beyond approximately 8 image pairs, the quality of FlexView1 results does not improve any further.
Cameras with FlexView2 technology use a specially designed pattern mask with matching algorithms that double the X, Y and Z resolution for static objects compared with FlexView1.
Constraint: because of the special pattern, this optimization is only effective with at least 5 image pairs.
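The rules of thumb from this section can be collected into a small plausibility check for a planned capture. It is a hypothetical helper that only encodes the figures stated above (single-shot data for moving objects, roughly 8 image pairs as the useful upper limit for FlexView1, at least 5 image pairs for FlexView2); it is not part of the Ensenso SDK, whose actual capture parameters are not covered here.

```python
from enum import Enum
from typing import List


class Variant(Enum):
    """Camera variants discussed in this section."""
    NO_FLEXVIEW = "without FlexView"
    FLEXVIEW1 = "FlexView1"
    FLEXVIEW2 = "FlexView2"


def check_capture_plan(variant: Variant, moving_objects: bool,
                       image_pairs: int) -> List[str]:
    """Return warnings for a planned capture; an empty list means no objection.

    Hypothetical helper: it only encodes the rules of thumb stated in the text.
    """
    if image_pairs < 1:
        return ["at least one image pair is required"]

    warnings: List[str] = []
    if moving_objects and image_pairs > 1:
        warnings.append("moving objects: use single-shot data (one image pair)")
    if variant is Variant.NO_FLEXVIEW and image_pairs > 1:
        warnings.append("multi-pair FlexView optimization needs a FlexView camera")
    if variant is Variant.FLEXVIEW1 and image_pairs > 8:
        warnings.append("FlexView1: quality no longer improves beyond about 8 pairs")
    if variant is Variant.FLEXVIEW2 and image_pairs < 5:
        warnings.append("FlexView2: optimization is only effective with at least 5 pairs")
    return warnings


if __name__ == "__main__":
    print(check_capture_plan(Variant.FLEXVIEW1, moving_objects=False, image_pairs=4))  # []
    print(check_capture_plan(Variant.FLEXVIEW2, moving_objects=False, image_pairs=3))
    print(check_capture_plan(Variant.NO_FLEXVIEW, moving_objects=True, image_pairs=1))  # []
```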

Without FlexView

  • Very fast image acquisition and processing, using only one image pair
  • Usable for moving and stationary objects
  • Projector pattern optimized for single-shot data

Suitable for: fast applications and moving objects

FlexView1 and FlexView2 (in Multi-Acquisition Mode)

  • Greatly improved resolution and quality of depth information for static scenes
  • Finer object details and contours
  • More robust on dark, reflective and weakly textured surfaces
  • Only suitable for stationary objects
  • Longer acquisition and processing time

Suitable for: applications with stationary objects that require higher-accuracy results


The Ensenso camera selector helps you to choose the right model for your application.
