Is there a need for a dedicated video search engine? People on the Internet usually prefer to use just one search engine.
Search comes down to relevance and accuracy. The existing search engines were developed for text search and are optimized for textual web pages. Video search engines are optimized for analyzing all the components of a video: the moving pictures and the sound. This is where the technology from Blinkx comes into play. Our technology literally watches and listens to the video, so we are able to understand, frame by frame, what is inside it. That understanding allows us to deliver a higher level of accuracy and a stronger position in video search.
But companies like Google and MSN have also added video search capabilities.
If you look at Google, for example, its video search focuses on its own video content from YouTube, plus a little external content. Yahoo has a video search engine, which it acquired 8 years ago, but that has never been pushed onto the front page because its accuracy is low. There have been products in the video space, but not true video search engines.
The only big player with a true video search engine is AOL, which got it by acquiring Truveo 3 years ago. But even AOL has not really pushed its video search capabilities, because Truveo also depends on reading and analyzing text rather than looking at the video content itself. For all these reasons, the average user of a text search engine isn’t really finding the videos they want.
How does your search engine actually crawl the videos? I mean, how is the visual analysis carried out?
We have a web spider that crawls the web. Every time it finds a web page, it analyzes the page for video. If it finds video files on that page, it goes further: it looks at how the video is positioned and analyzes the page from a visual point of view. Based on that, it extracts the words or text around the video that seem relevant to it: things like titles, descriptions, tags, etc.
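As a rough illustration of the page-level step described above, the following Python sketch finds video references in one HTML page and pulls out the title, description, and tags around them. It is not Blinkx's spider: the class name `VideoPageParser`, the function `extract_video_metadata`, the list of file extensions, and the choice of HTML elements to inspect are all assumptions made for this example, and it ignores the visual-positioning analysis mentioned in the answer.

```python
from html.parser import HTMLParser

# Assumed list of extensions that mark a link as a video file.
VIDEO_EXTENSIONS = (".mp4", ".webm", ".ogv", ".flv")


class VideoPageParser(HTMLParser):
    """Collects video URLs plus the title and meta tags of one HTML page."""

    def __init__(self):
        super().__init__()
        self.video_urls = []
        self.description = ""
        self.tags = []
        self._title_parts = []
        self._in_title = False

    @property
    def title(self):
        return "".join(self._title_parts).strip()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag in ("video", "source", "embed") and attrs.get("src"):
            # Embedded players and their sources.
            self.video_urls.append(attrs["src"])
        elif tag == "a" and (attrs.get("href") or "").lower().endswith(VIDEO_EXTENSIONS):
            # Plain links that point straight at a video file.
            self.video_urls.append(attrs["href"])
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            content = attrs.get("content") or ""
            if name == "description":
                self.description = content
            elif name == "keywords":
                self.tags = [t.strip() for t in content.split(",") if t.strip()]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self._title_parts.append(data)


def extract_video_metadata(html):
    """Return the text metadata for a page, or None if it has no video."""
    parser = VideoPageParser()
    parser.feed(html)
    parser.close()
    if not parser.video_urls:
        return None
    return {
        "video_urls": parser.video_urls,
        "title": parser.title,
        "description": parser.description,
        "tags": parser.tags,
    }
```

A real crawler would also have to fetch pages, follow links, and handle players that load video through scripts; this sketch only covers the "look at one page and collect the text around the video" part.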
Once that is done, our software plays back the video. It is similar to a human being watching it: we literally sit back and press the play button. While the video plays, the Blinkx system analyzes the visual elements. It looks at things like text on the screen, and it can also do some limited facial recognition to spot the faces of famous people. We also analyze the speech track of the video, which tells us the words spoken in it.
We take all this information (text, visuals, speech) and combine it into a single conceptual record about the video. The record captures what the video seems to be about and what its main context is, and all of that is indexed into a search engine index. So when you come along with your search, you can run an exact search against all of these conceptual records.
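To make the idea of one combined record per video more concrete, here is a minimal Python sketch that merges the page text, on-screen text, recognized faces, and speech transcript into a single record and stores it in a simple inverted index. This is only a keyword-level stand-in: the names `build_record` and `VideoIndex`, the record layout, and the all-terms-must-match search are assumptions for illustration, and the conceptual analysis that Blinkx describes is not modeled here.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_record(video_url, page_text, on_screen_text, recognized_faces, transcript):
    """Merge every signal about a video into one record of search terms."""
    terms = set()
    for source in (page_text, on_screen_text, " ".join(recognized_faces), transcript):
        terms.update(tokenize(source))
    return {"url": video_url, "terms": terms}


class VideoIndex:
    """A small inverted index: each term maps to the videos that contain it."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, record):
        for term in record["terms"]:
            self._postings[term].add(record["url"])

    def search(self, query):
        """Return the URLs of videos whose record contains every query term."""
        terms = tokenize(query)
        if not terms:
            return set()
        results = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            results &= self._postings.get(term, set())
        return results
```

With this layout, a query can match a video through any of its signals: a word that appears only in the speech transcript is found just as easily as one in the page title, which is the practical benefit of combining everything into one record.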