A look at how AI 'sees' and understands images and video, including applications like facial recognition and autonomous vehicles.
When we look at a photograph, our brains automatically identify objects, people, and scenes. It’s a complex process we often take for granted. But how can we train machines to ‘see’ and make sense of images and videos just as we do? The answer lies in a field of artificial intelligence known as Computer Vision.
In essence, computer vision strives to replicate the capabilities of human sight, and in some cases to go beyond them: machines are trained to interpret and make sense of the visual information in digital images and video.
Let’s delve into this by taking a walk through a bustling city. As we amble, we pass countless faces, each unique. To us, recognizing a familiar face in the crowd is effortless, but for a computer this task represents a substantial challenge. Solving it gave birth to facial recognition systems. Today, these systems appear in a wide range of applications, from unlocking smartphones to identifying suspects in security footage (source: O’Toole AJ, et al., “Face Recognition Algorithms Surpass Humans Matching Faces over Changes in Illumination”, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2007).
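To make the idea concrete, here is a minimal conceptual sketch of how many modern facial recognition systems decide whether two photos show the same person: each face image is first converted by a neural network into a numeric “embedding” vector, and then the vectors are compared. The embedding values and the threshold below are made-up toy numbers for illustration, not output from any real model.

```python
import math

def cosine_similarity(a, b):
    """Similarity of two embedding vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy embeddings: two photos of the same person, and one of a stranger.
alice_photo_1 = [0.9, 0.1, 0.3]
alice_photo_2 = [0.85, 0.15, 0.28]
stranger = [0.1, 0.9, 0.2]

THRESHOLD = 0.9  # a real system would tune this on validation data

def same_person(emb_a, emb_b, threshold=THRESHOLD):
    """Decide a match if the embeddings point in nearly the same direction."""
    return cosine_similarity(emb_a, emb_b) >= threshold

print(same_person(alice_photo_1, alice_photo_2))  # True: very similar vectors
print(same_person(alice_photo_1, stranger))       # False: dissimilar vectors
```

The hard part in practice is the embedding network itself, which must be trained on huge face datasets; the comparison step, as the sketch shows, is simple geometry.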
Perhaps one of the most exhilarating applications of computer vision lies in the realm of autonomous vehicles. Imagine a car whisking through traffic, making split-second decisions, all without a human driver. That’s what autonomous vehicles aspire to do. These vehicles utilize computer vision to understand their environment. They ‘see’ the lanes, identify traffic signs, and detect other vehicles or pedestrians, all in real-time. It’s as if the car has eyes of its own (source: Chen, T., et al., “DeepDriving: Learning Affordance for Direct Perception in Autonomous Driving”, Proceedings of the IEEE International Conference on Computer Vision, 2015).
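One building block behind that kind of perception is detecting where brightness changes sharply in an image, since such edges often mark lane lines or object boundaries. The sketch below is a deliberately simplified, pure-Python illustration of the principle on a toy “road” grid; real systems use optimized libraries and filters learned from data.

```python
def vertical_edges(image):
    """Slide a simple horizontal-gradient filter ([-1, 0, +1]) along each row.

    Returns a grid where large absolute values mark spots where brightness
    changes sharply from left to right, i.e. a vertical edge in the image.
    """
    edges = []
    for row in image:
        out_row = []
        for x in range(1, len(row) - 1):
            out_row.append(row[x + 1] - row[x - 1])
        edges.append(out_row)
    return edges

# Toy road image: dark asphalt (0) with a bright lane stripe (9).
road = [
    [0, 0, 9, 9, 0, 0],
    [0, 0, 9, 9, 0, 0],
    [0, 0, 9, 9, 0, 0],
]

for row in vertical_edges(road):
    print(row)  # each row prints [9, 9, -9, -9]
```

The positive values mark the left edge of the stripe and the negative values its right edge; stacking many such learned filters, layer upon layer, is essentially how convolutional networks in a self-driving pipeline build up from edges to lanes, signs, and pedestrians.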
Despite these exciting advancements, computer vision is not without its challenges. Unlike humans who learn from a few examples, machines need vast amounts of data to understand and learn. Moreover, any bias in the training data can lead to biased algorithms, a significant concern in applications like facial recognition. Addressing these challenges is part of the ongoing research in the field (source: Buolamwini, J., Gebru, T., “Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification”, Proceedings of the 1st Conference on Fairness, Accountability and Transparency, 2018).
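How is such bias actually detected? The Gender Shades study did it by measuring accuracy separately for each demographic group rather than in aggregate. The sketch below shows that idea on made-up toy predictions, assuming a hypothetical record format of (group, predicted label, true label); it is not the study's actual methodology or data.

```python
def accuracy_by_group(records):
    """Compute per-group accuracy from (group, predicted, truth) records."""
    correct, total = {}, {}
    for group, predicted, truth in records:
        total[group] = total.get(group, 0) + 1
        if predicted == truth:
            correct[group] = correct.get(group, 0) + 1
    return {g: correct.get(g, 0) / total[g] for g in total}

# Toy results: overall accuracy is 50%, but that average hides the fact
# that the model does much worse on group "B" than on group "A".
results = [
    ("A", "cat", "cat"), ("A", "dog", "dog"),
    ("A", "cat", "cat"), ("A", "dog", "cat"),
    ("B", "cat", "dog"), ("B", "dog", "dog"),
    ("B", "cat", "dog"), ("B", "dog", "cat"),
]

print(accuracy_by_group(results))  # {'A': 0.75, 'B': 0.25}
```

Disaggregating metrics this way is a simple but essential audit step: a single headline accuracy number can look healthy while the system fails badly for some of its users.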
As we continue to refine these technologies, the possibilities are boundless. Perhaps soon, machines will ‘see’ and understand our world even better than we do. And as we stand on the precipice of this exciting future, we can only marvel at how far we’ve come and imagine where we’ll go next.