DataTorrent debuts free data integration service to complement its ultra-fast Spark alternative
DataTorrent Inc. is wasting no time adjusting to its new status as an open-source company. Merely two months after releasing its homegrown data crunching engine for Hadoop under an Apache 2.0 license, the analytics provider is launching a free companion tool designed to help users move their information into the analytics framework more easily.
There are already plenty of options in the open-source ecosystem and beyond for transferring large amounts of information among applications, and that abundance is the crux of the problem. The large enterprises and other tech-savvy organizations where Hadoop is finding use nowadays draw data from a wide range of sources that handle distribution in different ways.
DataTorrent dtIngest provides a high-level interface for managing the flow of information across all the protocols, messaging services and storage systems involved in the process. The list of supported technologies includes Kafka, which is often used in combination with Hadoop to handle real-time information, Amazon Inc.’s cloud-based S3 object store and the JMS messaging standard, among others.
The unified nature of dtIngest has the added benefit of facilitating centralized security in the form of encryption and compression algorithms that are automatically applied to data ingested through the system. The software also performs a number of optimizations on top of that, including fusing small files into large batches so as to avoid a situation where Hadoop runs out of memory addresses in which to keep the information.
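To make the small-file batching idea concrete, here is a minimal sketch in Python. It is not DataTorrent's implementation, which the company has not detailed here. The function name `batch_small_files`, the greedy grouping strategy and the byte threshold are all assumptions chosen for illustration.

```python
from typing import Iterable, List, Tuple


def batch_small_files(
    files: Iterable[Tuple[str, int]], target_bytes: int
) -> List[List[str]]:
    """Greedily group (name, size_in_bytes) pairs into batches.

    Hypothetical helper: a batch is closed as soon as adding the next
    file would push it past target_bytes. A file larger than the
    target ends up in a batch of its own.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for name, size in files:
        # Close the current batch if this file would overflow it.
        if current and current_size + size > target_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    # Flush whatever is left over.
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    incoming = [("a.log", 40), ("b.log", 40), ("c.log", 40), ("d.log", 200), ("e.log", 10)]
    for batch in batch_small_files(incoming, target_bytes=100):
        print(batch)
```

Under these assumptions, five incoming files become four stored objects, and the saving grows as the files get smaller relative to the target size.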
Running out of room for small files is a very serious problem in stream processing use cases where billions of individual data points can be crisscrossing the network at any given second, which happens to be one of the main applications for DataTorrent’s analytics engine. RTS, known as Project Apex in its recently introduced open-source incarnation, is described as being able to easily handle that kind of traffic with high reliability and low latency.
Both the data crunching engine and dtIngest can run on any cluster with Hadoop version 2.0 or above. The solutions are available for download immediately from DataTorrent’s site.
Photo via Geralt