Today, anyone can design their own AI-based image processing applications, even without specialist knowledge of artificial intelligence or of the application programming that is also required. And while artificial intelligence can speed up many work processes and minimise sources of error, edge computing makes it possible to dispense with expensive industrial computers and the complex infrastructure that high-speed image data transmission would otherwise require.

New and different

However, AI, or machine learning (ML), works quite differently from classical, rule-based image processing, and this changes how image processing tasks are approached and handled. The quality of the results is no longer the product of programme code hand-crafted by an image processing expert, as was previously the case, but is determined by training the neural networks used with suitable image data. In other words, the object features relevant for inspection are no longer laid down in predefined rules; instead, the AI must be taught to recognise them itself in a training process. The more varied the training data, the more likely the ML algorithms are to recognise the truly relevant features later in operation. But what sounds so simple only leads to the desired goal with sufficient expertise and experience: without a skilled eye for the right image data, errors will creep in here as well.

The key competences for working with machine learning methods are therefore no longer the same as those for rule-based image processing, and not everyone has the time or manpower to build them up from scratch. That is the problem with anything new: it cannot be used productively straight away. And if it delivers good results without much effort, but those results cannot be clearly reproduced, it is hard to believe in and trust.

Complex and misunderstood

As a rationally thinking person, one would like to know how this AI vision works. But without recognisable, comprehensible explanations, its results are difficult to evaluate. Confidence in a new technology rests on skills and experience that sometimes take years to build up before one knows what the technology can do, how it works, how to use it and how to keep it under control. Complicating matters further, AI vision competes with an established approach for which a suitable environment of knowledge, documentation, training, hardware, software and development environments has grown over recent years. AI, by contrast, still comes across as raw and puristic, and despite its well-known advantages and the high accuracy that seeing AI can achieve, errors are often difficult to diagnose. The lack of insight into how it works, and the occasional inexplicable result, are the other side of the coin, and they inhibit the spread of these algorithms.

(Not) a black box

The way neural networks work is therefore often wrongly perceived as a black box whose decisions cannot be followed. "Although DL models are undoubtedly complex, they are not black boxes. In fact, it would be more accurate to call them glass boxes, because we can literally look inside and see what each component is doing." [Quote from "The black box metaphor in machine learning"]. Inference decisions of neural networks are not based on classical, comprehensible rules, and the complex interplay of their artificial neurons may be hard for humans to follow, but they are nevertheless the results of a mathematical system and thus reproducible and analysable. What is (still) missing are the right tools to support us; in this area of AI there is still plenty of room for improvement. It is here that the various AI systems on the market show how well they can support users in this endeavour.

Software makes AI explainable

For this reason, IDS Imaging Development Systems GmbH is researching precisely these tools together with institutes and universities, and the IDS NXT ocean inference camera system already incorporates the results of this cooperation. Statistical analysis with a so-called confusion matrix makes it possible to determine and understand the quality of a trained neural network. After the training process, the network is validated with a predetermined series of images whose correct results are already known. The expected results and the results actually determined by inference are compared in a table, which shows, for each trained object class, how often the test objects were recognised correctly or incorrectly. From these hit rates, an overall quality of the trained CNN can be derived. The matrix also shows clearly where the recognition accuracy might still be too low for productive use. What it does not show is the reason.

Confusion Matrix

This confusion matrix of a CNN classifying screws shows where the identification quality can be improved by retraining with more images.
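The idea behind such a validation table can be sketched in a few lines of code. The following is a minimal illustration, not the IDS NXT implementation: the class names and validation results are invented, and a real system would of course obtain the predictions from camera inference rather than from hard-coded lists.

```python
# Minimal sketch: build a confusion matrix from validation results and
# derive per-class hit rates and an overall quality figure.
# All class names and results below are illustrative.

def confusion_matrix(expected, predicted, classes):
    """Count (expected, predicted) label pairs into a nested dict."""
    matrix = {e: {p: 0 for p in classes} for e in classes}
    for e, p in zip(expected, predicted):
        matrix[e][p] += 1
    return matrix

classes = ["wood screw", "machine screw", "self-tapping"]

# Known labels of the validation images vs. the network's inference results
expected  = ["wood screw"] * 10 + ["machine screw"] * 10 + ["self-tapping"] * 10
predicted = (["wood screw"] * 9 + ["machine screw"]          # 1 wood screw misread
             + ["machine screw"] * 8 + ["self-tapping"] * 2  # 2 machine screws misread
             + ["self-tapping"] * 10)

m = confusion_matrix(expected, predicted, classes)

# Per-class hit rate: how often each class was recognised correctly
for c in classes:
    total = sum(m[c].values())
    print(f"{c}: {m[c][c] / total:.0%} correct")

# Overall quality: correctly recognised images over all validation images
overall = sum(m[c][c] for c in classes) / len(expected)
print(f"overall accuracy: {overall:.0%}")
```

Exactly as in the figure, the off-diagonal counts reveal which classes would profit most from retraining with more images, while the diagonal gives the hit rates.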

This is where the attention map comes in: a kind of heat map that highlights the areas or image contents receiving the most attention from the neural network and thus influencing its decisions. If this form of visualisation is activated during the training process in IDS NXT lighthouse, the decision paths generated during training allow the network to produce such a heat map for each image it analyses. This makes critical or seemingly inexplicable decisions of the AI easier to understand, ultimately increasing the acceptance of neural networks in industrial environments.
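One simple, generic way to obtain such a heat map is occlusion sensitivity: mask one image region at a time and record how much the class score drops. The sketch below illustrates only this general principle; it is not how IDS NXT lighthouse derives its maps, and the scoring function is a stand-in for a real network's inference.

```python
# Sketch of occlusion sensitivity as one way to build an attention
# heat map. The class_score function below is a stand-in for a real
# CNN's confidence in the target class; here it simply responds to
# brightness in the image centre.

import numpy as np

def class_score(image):
    # Stand-in for network inference: mean brightness of the centre
    h, w = image.shape
    return float(image[h // 4 : 3 * h // 4, w // 4 : 3 * w // 4].mean())

def attention_map(image, patch=4):
    """Mask patch x patch regions one by one; record the score drop."""
    h, w = image.shape
    baseline = class_score(image)
    heat = np.zeros((h // patch, w // patch))
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            occluded = image.copy()
            occluded[i : i + patch, j : j + patch] = 0.0  # mask one region
            # Large score drop = this region mattered for the decision
            heat[i // patch, j // patch] = baseline - class_score(occluded)
    return heat

image = np.ones((16, 16))  # dummy image filling the whole frame
heat = attention_map(image)
print(heat.round(2))       # the centre regions carry the attention here
```

Regions whose occlusion barely changes the score receive low attention values; regions the decision depends on light up, which is exactly the kind of insight the heat map in the figure below provides.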

Attention maps can also be used to detect and avoid data biases (see figure "Heat Map"), which would otherwise cause a neural network to make incorrect decisions during inference. A neural network does not become smart by itself: poor-quality input leads to poor output. To recognise patterns and make predictions, an AI system relies on data from which it can learn "correct behaviour". If it is trained under laboratory conditions with data that is not representative of the later application, or worse, with data whose patterns reflect biases, the system will adopt those biases.

Heat Map

This heat map shows a classic data bias: the visualisation reveals a high level of attention on the Chiquita label of the banana. Because of flawed or unrepresentative training images of bananas, the CNN used has evidently learned that this Chiquita label always indicates a banana.

With the help of such software tools, users can trace the behaviour and results of AI vision back to weaknesses within the training data set and correct them in a targeted manner. This makes AI more explainable and comprehensible for everyone, because at its core it is just mathematics and statistics. Following that mathematics is often not easy, but with confusion matrices and heat maps there are tools that make decisions, and the reasons behind them, visible and thus understandable.

We are only at the beginning

Used correctly, AI vision has the potential to improve many vision processes. But providing hardware alone is not enough to establish AI across the industry. Manufacturers are challenged to support users by sharing their expertise in the form of user-friendly software and built-in workflows. Compared with classical image processing, whose best practices have evolved over years and built up a loyal customer base with extensive documentation, knowledge transfer and many software tools, AI still has a lot of catching up to do, but that work is under way. Standards and certifications are also being developed to further increase acceptance and explainability and to give AI a seat at the big table. IDS is contributing to this: with IDS NXT ocean, an embedded AI system is already available that any user group can deploy quickly and easily as an industrial tool thanks to a comprehensive, user-friendly software environment, even without in-depth knowledge of machine learning, image processing or application programming.