Santa Monica, California (CNN) -- Computers used to be blind, and now they can see.
Thanks to increasingly sophisticated algorithms, computers today can recognize and identify the Eiffel Tower, the Mona Lisa or a can of Budweiser.
Still, despite huge technological strides in the last decade or so, visual search has plenty more hurdles to clear.
At this point, it would be quicker to describe what an image-search engine can interpret than what it can't. But rapid progress, coupled with the growing number of brilliant minds taking up the challenge, is bringing intelligent robo-eyesight within reach.
Hartmut Neven, an engineering director leading visual-search initiatives for Google, predicts that near-perfection could come within the next decade. Says Neven,
"Within 10 years we can, in principle, recognize pretty much any object we're interested in. Scientific and technical progress is accelerating at an exponential pace."
Neven began his research in 1992 and, by his own forecasted timeline, is already more than halfway to meeting his goal.
Google Goggles
The product of his work, and that of a team of engineers, is a service called Goggles. It exists as a standalone application for Android phones and as a feature of the Google Mobile App for the iPhone.
With Goggles, the user snaps a picture, which is transmitted across cellular networks to Google's servers. Google's computers then tell the phone what they recognized in the photo. The round trip typically takes a second or two -- and sometimes less.
Google's algorithms, the lines of code that break down data into bits recognizable by machines, are good at picking out certain things.
Iconic buildings and artwork, products on store shelves, barcodes and magazine advertisements are a breeze. The system can recognize text on a poster and search the Web for a page with similar writing, or translate the menu at a French restaurant.
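Google does not publish the internals of Goggles' matchers, which likely rely on rich local feature descriptors. But one standard, much simpler building block in image search is perceptual hashing, which reduces an image to a short fingerprint so near-duplicates can be found by comparing bits. Below is a minimal sketch of a difference hash (dHash) in pure Python; the 8x9 grids stand in for real images that a production pipeline would first decode and shrink, and all names and sizes here are illustrative, not Google's.

```python
# Minimal difference-hash (dHash) sketch: a common perceptual-hashing
# technique for near-duplicate image search. Illustrative only.
import random

def dhash(pixels):
    """Hash an 8x9 grayscale grid into 64 bits: one bit per horizontal
    brightness gradient (is the left pixel brighter than its right
    neighbour?)."""
    bits = 0
    for row in pixels:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits

def hamming(a, b):
    """Count differing bits; a small distance means similar images."""
    return bin(a ^ b).count("1")

# Stand-ins for real images: 8 rows x 9 columns of grayscale values.
rng = random.Random(0)
textured = [[rng.randrange(256) for _ in range(9)] for _ in range(8)]

# A lightly corrupted copy (simulating noise or re-compression):
jitter = random.Random(1)
noisy = [[min(255, max(0, v + jitter.choice((-1, 0, 1)))) for v in row]
         for row in textured]

# A flat, low-texture image -- like the unmarked products Neven mentions:
flat = [[128] * 9 for _ in range(8)]

print(hamming(dhash(textured), dhash(noisy)))  # small: near-duplicate
print(hamming(dhash(textured), dhash(flat)))   # large: unrelated
```

The sketch also hints at why "visual texture" matters to systems like this: the flat image produces an all-zero hash, so every featureless object collapses onto the same fingerprint and becomes indistinguishable.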
Microsoft also has a visual-search app for Bing, though its features are more limited.
So far, these computer systems are less skilled at recognizing humans. But the Google Goggles team is working on a system that can identify faces in photos, as long as those people say it's OK for Google to include them in its database, Neven said.
Unrecognizable objects
But Google's algorithms return no results for loads of common things. Furniture, clothing, accessories, gadgets, food, animals, cars, trees and many other everyday objects go unrecognized by the system.
"Our ambition is nothing less than being able to recognize any object on the planet," Neven said. "But today, computer vision is not in that state yet. There are many things that, unfortunately, we cannot properly recognize."
The biggest obstacles fall into a single category: objects without a strong "visual texture" and with few distinct markings. These include many products that are hard to identify without colorful packaging, such as purses, shoes and cell phones. Says Neven,
"Unpackaged products is something that has been a priority for a while, but it's not easy to solve. If we get that done much better, then suddenly 90% of relevant objects are in our reach."
Google developers are hammering away at the problem. Neven is excited about the potential to enable the system to identify which species of tree a leaf fell from, or the model of car parked on the street.
In the meantime, Google lists the Goggles app in its Labs section, meaning the project is still in an experimental phase, a Google spokesman said. Presenting it this way lowers expectations and limits exposure among users who might be turned off when the app fails to return accurate results.
The app displays a quick tutorial when it's first launched. Google also showcases Goggles features that aren't all that practical but that create buzz for the technology, such as a version that can solve Sudoku puzzles.
The Goggles app can also read QR codes, those black-and-white squares found on ads and posters that, when scanned by smartphones, access videos and other interactive content. Until Goggles can recognize everything, QR codes serve an interim need: Take a picture of something and get its digital counterpart.
Other image-search uses
This underlying image-search technology is important to many Google products.
The image-recognition algorithms help to pick out cars and people in Google's Street View service so that license plates and faces can be blurred. They also help raise red flags when a photo reveals too much human skin, flagging such images for Google Images' "adult" filter.
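The article doesn't say how Google's skin detector actually works. A crude stand-in, just to make the idea concrete, is a rule-based skin-tone test over RGB pixels: the per-pixel rule below is a well-known heuristic from the computer-vision literature, while the 30% threshold and function names are assumptions invented for this sketch, not Google's method.

```python
# Toy "too much skin" flagger: count skin-tone pixels with a classic
# RGB heuristic and flag the image when their share exceeds a
# (hypothetical) threshold. Illustrative only.

def is_skin(r, g, b):
    """Classic rule-based RGB skin-tone test from the CV literature."""
    return (r > 95 and g > 40 and b > 20 and
            max(r, g, b) - min(r, g, b) > 15 and
            abs(r - g) > 15 and r > g and r > b)

def flag_for_adult_filter(pixels, threshold=0.30):
    """Return True when the fraction of skin-tone pixels in a flat
    list of (r, g, b) tuples exceeds the threshold."""
    skin = sum(1 for p in pixels if is_skin(*p))
    return skin / len(pixels) > threshold

# Synthetic "photos" as flat pixel lists (100 pixels each):
beach_photo = [(224, 172, 105)] * 70 + [(20, 60, 200)] * 30  # mostly skin tones
landscape = [(34, 139, 34)] * 90 + [(224, 172, 105)] * 10    # mostly foliage

print(flag_for_adult_filter(beach_photo))  # True
print(flag_for_adult_filter(landscape))    # False
```

Real systems pair color cues with learned classifiers, since skin-tone rules alone misfire on sand, wood and faces; the sketch only shows the flag-on-ratio shape of the task.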
Neven joined Google in 2006 when the search giant acquired his company, called Neven Vision. His former colleague, Orang Dialameh, is the CEO at IPPLEX, a holding company that also has teams of engineers working on image-recognition projects.
Dialameh's developers have employed cameras to build apps that can help identify objects, such as cash or cereal boxes, without requiring the user to snap a picture. Some of these apps are being marketed as utilities for blind people. IPPLEX's next venture, Nantworks, will allow users to tag objects using a cell phone's camera, Dialameh said.
Dialameh, who, like his colleague, is based in Southern California, faces many of the same obstacles that Google does -- not the least of which is convincing people to use the apps in their daily lives. Says Dialameh,
"How will this become a consumer behavior? We're not used to taking out a camera and showing stuff to our phone."
Others are embedding this kind of technology in more obvious applications. Face.com can examine photos on Facebook to identify people in pictures who weren't manually tagged. In the same way, Neven's technology at Google can be used to identify faces in a Picasa user's personal photo collection.
But this facial-recognition technology, which sometimes thinks your sister is actually Grandpa, has a ways to go. And not everyone is sold on its usefulness.
Facebook CEO Mark Zuckerberg said as much in an interview with several reporters after a news conference in November:
"Before people-tagging came out, I think most people would have said that the best way to figure out who's in photos was to have some face-recognition algorithm, but it actually turns out that the best way is to just have people tagged."
COMMENTARY: Google Goggles is definitely interesting visual search technology, but I think the idea of developing an all-encompassing app that can visually identify so many different things, namely text, landmarks, faces, artwork and objects, is going to be a challenge because it's just too broad. This is another example of Google trying to do too much, tackling an impossible task. I bet Google Goggles could not visually distinguish a bowl full of flour from one full of baking soda. The result would say something like, "a bowl of something..." It sort of reminds me of Data, the humanoid robot in Star Trek, saying, "Captain, there's a 90% probability that it's a Klingon battlestar." They should stay focused on one thing and be good at it. Several limitations have been pointed out that are going to be problematic and may never be solved. There are many of these cute visual identification apps out there; I have experimented with a few myself, and all of them have the same problems. If human beings can't visually identify everything in their environment, you can't expect an app and a webcam to be able to do this.
Courtesy of an article dated April 14, 2011 appearing in CNN Tech