Microsoft open-sources one of the core algorithms powering Bing
Microsoft Corp. today open-sourced one of the cornerstone algorithms powering its Bing search engine in an effort to help developers build faster, more easily navigable applications.
The Space Partition Tree And Graph algorithm, or SPTAG for short, is available under the permissive MIT License. Microsoft has bundled it into a library that includes tools to help developers incorporate the code into their projects.
SPTAG is what allows Bing to instantly display relevant search results even when a user enters a query that can’t be processed by simply matching keywords to web pages. Looking up the phrase “largest lake in the United States,” for instance, brings up a panel with information about Lake Superior even though the query and the answer share only one word, “lake.”
SPTAG makes that possible by working with queries that have been transformed into data constructs known as vectors. A vector is essentially a long sequence of numbers that can encapsulate various kinds of information, from individual words to entire web pages.
Translating different records into a common numerical format has the benefit of allowing them to be compared more easily. The vector for the phrase “largest lake in the United States” will share similarities with, among others, the vector that Bing generates from the text of the Wikipedia page “List of largest lakes of the United States by area.” And that Wikipedia page has Lake Superior at the top of the ranking.
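To make the comparison concrete, here is a minimal sketch of how two vectors can be scored for similarity using cosine similarity, a common measure for this kind of data. It is not Bing's or SPTAG's actual code, and the four-number vectors are invented for illustration; real systems use vectors with hundreds of dimensions produced by machine learning models.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical vectors, made up for this example only.
query = [0.9, 0.8, 0.1, 0.0]        # "largest lake in the United States"
lake_page = [0.8, 0.9, 0.2, 0.1]    # "List of largest lakes of the United States by area"
recipe_page = [0.0, 0.1, 0.9, 0.8]  # an unrelated page

lake_score = cosine_similarity(query, lake_page)
recipe_score = cosine_similarity(query, recipe_page)

# The lake page scores far higher than the unrelated page.
print(f"lake page:   {lake_score:.3f}")
print(f"recipe page: {recipe_score:.3f}")
```

A score near 1 means the two vectors point in nearly the same direction, while a score near 0 means they have little in common.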
Bing groups the vectors representing web content based on similarity to speed up searches. “Once the numerical point has been assigned to a piece of data, vectors can be arranged, or mapped, with close numbers placed in proximity to one another to represent similarity. These proximal results get displayed to users, improving search outcomes,” Microsoft detailed in a blog post.
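The speedup from grouping can be illustrated with a simplified sketch: vectors are assigned to the nearest of a few group centers, and a query is compared only against the vectors in its closest group rather than against everything. This is a deliberately basic partitioning scheme, not SPTAG's actual tree-and-graph method, and the page names, vectors and group centers below are all hypothetical.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_center(vec, centers):
    """Index of the group center closest to vec."""
    return min(range(len(centers)), key=lambda i: distance(vec, centers[i]))

def build_partitions(vectors, centers):
    """Assign each named vector to the group of its nearest center."""
    partitions = {i: [] for i in range(len(centers))}
    for name, vec in vectors.items():
        partitions[nearest_center(vec, centers)].append(name)
    return partitions

def search(query, vectors, centers, partitions):
    """Look only inside the group nearest to the query."""
    candidates = partitions[nearest_center(query, centers)]
    if not candidates:
        return None
    return min(candidates, key=lambda name: distance(query, vectors[name]))

# Hypothetical two-dimensional vectors and hand-picked group centers.
vectors = {
    "lakes_list": [0.8, 0.9],
    "lake_superior": [0.9, 0.8],
    "pasta_recipe": [0.1, 0.2],
    "bread_recipe": [0.2, 0.1],
}
centers = [[0.85, 0.85], [0.15, 0.15]]

partitions = build_partitions(vectors, centers)
result = search([0.9, 0.85], vectors, centers, partitions)
print(result)
```

Because the query is compared against only one group, the work per search shrinks as the number of groups grows. The trade-off is that a close match sitting just across a group boundary can be missed, which is why searches of this kind are described as approximate.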
According to the company, SPTAG enables Bing to sift through billions of pieces of data in just a few milliseconds. The search engine has access to a repository of more than 150 billion vectors that is continuously expanded with new content from the web.
One obvious application for SPTAG is improving the search experience for users of collaboration services, email clients and other text-heavy applications. But the algorithm is not limited to processing written content. SPTAG can also work with vectors generated from images and audio files, which means developers can use it to build advanced capabilities such as automated photo comparison.
SPTAG is available on GitHub.