What Can You Use Hadoop For? How About a Scalable Vertical Search Engine?
What if you wanted to build your own custom search engine? It could be a public-facing vertical search engine like Indeed.com, or it could be some sort of internal search engine that helps you search your company’s own documentation or customer information. Either way, you’d probably turn to some of the usual suspects, like Lucene.
But what do you do if that information starts to get really large, really fast?
A presentation from Ivan de Prado of the services firm Datasalt explains why you might want to use Apache Hadoop along with Lucene to build a scalable search engine. According to Datasalt, this approach can create a more scalable, bug-tolerant and flexible solution.
Wikibon analyst Jeff Kelly has written about the lack of Hadoop applications at ServicesAngle:
Let’s say, for example, you’re a business analyst at a pharmaceutical maker and you’ve come up with an idea to correlate sales data with demographic data with social media data to identify new revenue opportunities. You present your idea to the CEO, who gives you the green light. “Get it done,” she says.
Fantastic. You talk to IT, spin up an inexpensive Hadoop cluster, then collect, process and store the needed data. Next you take a look at the Hadoop application market and … and you quickly realize you’re out of luck. You discover there are no compelling applications on the market to suit your innovative use case. Developing the application internally maybe isn’t an option. Your great idea for leveraging Hadoop is DOA.
A highly scalable search engine based on Hadoop and Lucene is exactly the sort of ready-made Hadoop application many enterprises will likely want. It’s the sort of thing that could easily be delivered as a virtual appliance or as a hosted service, and since it’s based on open source, could be built by many different service providers.