Mapping the TV genome.

Research at the MIT Media Lab led to Bluefin Labs and Social TV analytics.

Research origins: MIT Media Lab

As director of the MIT Media Lab's Cognitive Machines Group, Deb Roy and his team of researchers set out to understand human language development. The resulting Human Speechome Project was an ambitious undertaking: it made a complete digital record of the first years of a child's home life (with various privacy provisions in place) and analyzed that record computationally. Roy and his wife captured much of their son's life from birth to age three – over 240,000 total hours of audio and video. Buried in this massive corpus lay insights into how their son learned language, at a level of detail never before captured.

To analyze the audio-video data, Roy and his MIT team developed numerous deep machine learning algorithms designed to uncover the relationships between spoken language and context.

When Michael Fleischman (Bluefin's President & CTO) joined Roy's MIT group, he adapted the analysis concepts designed for child language acquisition and applied them to another massive video data set: broadcast video. Fleischman's PhD work garnered the attention of the National Science Foundation, which awarded Roy and Fleischman a Small Business Innovation Research grant and effectively bootstrapped the launch of Bluefin Labs.

The science behind Bluefin Labs

Just as the Speechome project created algorithms that can "ground" the meaning of a word within a larger context, Bluefin uses deep machine learning to ground the meaning of comments pulled from social media. By looking at the context in which individuals express their words, Bluefin can connect comments back to the events, people, products, brands, and viewing contexts that prompted them in the first place.

The semantic barrier – Building a technology to listen to everything that happens in social media is relatively simple. It's mostly a matter of ingesting huge streams of data and producing various reports crunched from that data. But building a technology that maps social media comments to the TV stimulus that caused those comments is much more of a challenge. For example, let's say a person posts the comment "Awesome pass!" in social media. Any person who reads this comment in the right context – let's say during a fall Sunday in New England at 2:45pm – will correctly infer that the words are about that Tom Brady pass in that Patriots football game. Making this correct inference is easy for humans, but hard for machines. That's the semantic barrier.
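To make the "Awesome pass!" example concrete, here is a minimal Python sketch of the inference a reader makes without thinking: narrow the candidates to what was airing in the same place at the same time, then see which candidate the words fit. This is a toy illustration only, not Bluefin's method. The Broadcast and Comment types, the ground function, the keyword sets, and the timestamps are all hypothetical, and simple keyword overlap stands in for the far richer models a real system would need.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Broadcast:
    """A hypothetical TV airing with a few words that describe it."""
    title: str
    region: str
    start: datetime
    end: datetime
    keywords: frozenset


@dataclass(frozen=True)
class Comment:
    """A hypothetical social media comment with where and when it was posted."""
    text: str
    region: str
    posted_at: datetime


def ground(comment, broadcasts):
    """Return the airing that best fits the comment's context, or None."""
    words = {w.strip("!?.,").lower() for w in comment.text.split()}
    best, best_score = None, 0
    for b in broadcasts:
        # Context filter: same region, and airing when the comment was posted.
        if b.region != comment.region:
            continue
        if not (b.start <= comment.posted_at <= b.end):
            continue
        # Toy relevance score: how many comment words describe this airing.
        score = len(words & b.keywords)
        if score > best_score:
            best, best_score = b, score
    return best


# Illustrative data only; none of these values come from a real schedule.
broadcasts = [
    Broadcast("Patriots game", "New England",
              datetime(2011, 10, 16, 13, 0), datetime(2011, 10, 16, 16, 0),
              frozenset({"pass", "touchdown", "tackle"})),
    Broadcast("Cooking show", "New England",
              datetime(2011, 10, 16, 14, 0), datetime(2011, 10, 16, 15, 0),
              frozenset({"recipe", "delicious"})),
]
comment = Comment("Awesome pass!", "New England", datetime(2011, 10, 16, 14, 45))
match = ground(comment, broadcasts)
print(match.title if match else "no match")
```

Even this toy version shows why context matters: the word "pass" alone says nothing about football, but the time and place of the comment shrink the candidates to a handful of airings, and only then do the words settle the question.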

Language grounding – Cognitive scientists call this process of linking words to their intended referents "language grounding". Philosophers consider grounding the second hardest problem in philosophy (consciousness gets top honors). So can machines be taught to ground language and thus break through the semantic barrier? Roy has focused on this topic at MIT for 15 years and, in the process, has spearheaded a research program to create machines that learn to link language to context by observing and modeling human communication strategies.

Deep machine learning – This is one of the underlying ideas that drove Roy's research. His MIT lab developed algorithms that learn to find connections between different modalities of data (such as video and speech) in order to capture deep semantic structure. Using this idea, the lab created some of the first robots that learn grounded language from show-and-tell interactions with humans.

For more details about the foundational research at the MIT Media Lab and the formation of Bluefin Labs as a commercial venture, please see “A Social Media Decoder” [PDF] in Technology Review and “Bluefin Mines Social Media to Improve TV Analytics” in Fast Company.

The TV Genome and its applications

Bluefin Labs has developed an automated media analysis platform to break through the semantic barrier at scale. Bluefin's core technology applies language grounding techniques to map social media commentary to mass media stimuli on TV, specifically TV shows and commercials. The data produced by this mapping of social media to TV is known as the TV Genome, and it gives rise to a number of important commercial applications.

The TV Genome is essentially a huge dataset that quantifies and organizes all social media conversations about TV. Want to find the TV networks and specific shows that drive the most social media conversation? Want to find the contexts on TV that draw the most social engagement from different audience segments, such as Moms vs. Hardcore Gamers vs. Auto Enthusiasts? This data, and more, is contained within the TV Genome.
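As a rough picture of what such questions look like once comments have been mapped to shows, the Python sketch below counts conversations per show, optionally within one audience segment. It is an assumption-laden illustration, not the TV Genome's actual schema or interface: the record layout, the top_shows function, and the network and show names are all made up for the example.

```python
from collections import Counter

# Hypothetical grounded records: (network, show, audience_segment).
# Each record stands for one comment already mapped to a show.
records = [
    ("Network A", "Show X", "Moms"),
    ("Network A", "Show X", "Auto Enthusiasts"),
    ("Network A", "Show X", "Moms"),
    ("Network B", "Show Y", "Hardcore Gamers"),
    ("Network B", "Show Y", "Hardcore Gamers"),
    ("Network B", "Show Z", "Moms"),
]


def top_shows(records, segment=None, n=3):
    """Return up to n (show, comment_count) pairs, most discussed first.

    If segment is given, only comments from that audience segment count.
    """
    counts = Counter(
        show
        for _network, show, seg in records
        if segment is None or seg == segment
    )
    return counts.most_common(n)


print(top_shows(records))
print(top_shows(records, segment="Hardcore Gamers"))
```

The same grouping idea extends to networks, commercials, or time slots by counting on a different field of each record.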

Leading brands, advertising agencies, and TV networks have partnered with Bluefin Labs to leverage the TV Genome data for media planning and buying, TV campaign optimization, TV audience insights, and TV ad sales. To learn more, visit the Solutions section of this website.