
Challenge of Image Recognition in Machine Learning

The branch of artificial intelligence that allows machines to learn, improve, and perform activities without human intervention is called machine learning. In other words, when a machine trains itself instead of being explicitly programmed by a human, the process is machine learning. We interact with it daily without actively realising it. For example, every time we read an email and are offered several suggested replies, clicking one of which generates a response automatically, the machine is doing work for us without being asked. It analyses the content of the email to determine what the suggested response should be, just as it warns us with a 'You mention an attachment in your email, but there isn't one attached. Attach a file?' dialogue box when we are about to send an email to our boss but have forgotten to attach the file. Devices like our smartphones, and even some laptops, protect our data until facial or voice recognition comes back positive, prioritising the data over the user; that preference inverts once the device is unlocked. All of these are subtle examples of machines acting of their own accord.

Machines self-learn by analysing the data available to them. They either search for and collect varied data, such as many different images of the same thing, or analyse a large number of items and keep sorting them under different headers. This is what enables them to keep getting 'smarter'. The ImageNet challenge was created with this in mind, to measure how quickly and accurately this recognition and sorting can be done. In this test, images spanning over a thousand different categories have to be sorted correctly. The diversity of the images makes the test challenging.
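To make the sorting task concrete, here is a rough sketch of the kind of metric such a challenge uses: top-k accuracy, which counts a prediction as correct if the true category appears among a model's k highest-scoring guesses. The function and the toy scores below are invented for illustration and are not the official ImageNet evaluation code:

```python
# Illustrative sketch (not the official ImageNet tooling): scoring a
# classifier with top-1 and top-5 accuracy over labelled images.

def top_k_accuracy(scores, true_labels, k=5):
    """Fraction of samples whose true label is among the k highest-scoring
    predicted classes. `scores` is a list of per-class score lists."""
    hits = 0
    for class_scores, label in zip(scores, true_labels):
        # Indices of the k classes with the highest scores.
        top_k = sorted(range(len(class_scores)),
                       key=lambda i: class_scores[i], reverse=True)[:k]
        if label in top_k:
            hits += 1
    return hits / len(true_labels)

# Toy example: 3 images, 6 hypothetical classes (flamingo, quail, ...).
scores = [
    [0.10, 0.70, 0.05, 0.05, 0.05, 0.05],  # true class 1 ranked first
    [0.30, 0.10, 0.20, 0.25, 0.10, 0.05],  # true class 4 ranked lower
    [0.50, 0.20, 0.10, 0.10, 0.05, 0.05],  # true class 5 ranked last
]
labels = [1, 4, 5]
print(top_k_accuracy(scores, labels, k=1))  # only the first image is a top-1 hit
print(top_k_accuracy(scores, labels, k=5))
```

A model can thus score well on top-5 while still ranking the exact fine-grained class below a few look-alikes, which is why the challenge reports both numbers.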

To check how well machines dealt with fine-grained, specific cases, closely related classes with low inter-class variability were introduced. Instead of simply being asked to identify and label birds, models had to sort images into specific kinds of birds such as flamingo, cock, ruffed grouse, quail, and partridge; similarly for dogs and cats. Further, there were objects of the same class that looked vastly different from one another. This intra-class variability is hard for a machine to catch, since, well, it doesn't have a human brain. For example, an image of an orange grove and an image of a single orange slice look almost nothing alike, apart from perhaps the colour orange.

This test and its results were taken as a baseline to check how far along we were on the path to deep learning in machines. Inspired by these results, various researchers came up with ways to make image recognition more efficient. One example is AlexNet, the first major step in this direction, which cut the error rate dramatically: its top-5 error of about 15% was roughly ten percentage points better than the runner-up's. It trained the model using parallel computation on two GPUs, used the non-linear, non-saturating ReLU activation instead of saturating functions such as tanh, and applied data augmentation techniques such as image translations, horizontal reflections, and mean subtraction. It also stacked successive convolution and pooling layers.
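To make the augmentation and layering ideas concrete, here is a hedged sketch in plain NumPy. The tiny image, the toy edge filter, and the layer sizes are invented for illustration and bear no relation to AlexNet's actual dimensions:

```python
import numpy as np

def augment(img):
    """Yield a few augmented views of a (H, W) grayscale image:
    the original, a horizontal reflection, and a 1-pixel translation."""
    yield img
    yield img[:, ::-1]                   # horizontal reflection
    yield np.roll(img, shift=1, axis=1)  # 1-pixel translation (wrapped)

def mean_subtract(img):
    """Centre pixel values around zero, as in mean subtraction."""
    return img - img.mean()

def conv2d_valid(img, kernel):
    """Naive 'valid' 2-D convolution (cross-correlation, as CNNs use)."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(img, size=2):
    """Non-overlapping max pooling, which shrinks the feature map."""
    h, w = img.shape
    h, w = h - h % size, w - w % size
    return img[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))

img = np.arange(36, dtype=float).reshape(6, 6)   # toy 6x6 "image"
edge = np.array([[1.0, -1.0]])                   # toy horizontal-edge filter
feature_map = conv2d_valid(mean_subtract(img), edge)  # shape (6, 5)
pooled = max_pool(feature_map)                        # shape (3, 2)
print(pooled.shape)
```

Each convolution-plus-pooling stage shrinks the spatial map while extracting features; a real network would stack several such stages with learned filters rather than this fixed one.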

The research didn't stop with AlexNet. Various researchers followed, further reducing error rates and making this technology stronger than anybody would have thought possible. Machines started performing these tests better than humans in some cases, which speaks volumes about our progress in this field.


