UPDATED 12:59 EDT / MAY 24 2019

BIG DATA

Let’s play with particle physics! Kubernetes and Google Cloud open CERN research to everyone 

Winning the Nobel Prize for physics isn’t a goal most people can reach. But thanks to Google Cloud and Kubernetes, performing the same experiments as award-winning scientists is now possible. Open access to data from the CERN Large Hadron Collider experiments that led to the discovery of the Higgs boson elementary particle in 2012 means that anyone, anywhere can now reproduce the analysis that proved the particle’s existence.

“All this containerized infrastructure … is getting our soul together, because computing is getting much easier in terms of how to share pieces of software and even infrastructure,” said Ricardo Rocha (pictured, right), computing engineer at The European Organization for Nuclear Research, known as CERN.

Rocha and Lukas Heinrich (pictured, left), physicist at CERN, spoke with Stu Miniman (@stu), co-host of theCUBE, SiliconANGLE Media’s mobile livestreaming studio, and guest host and cloud economist Corey Quinn (@QuinnyPig) during the KubeCon + CloudNativeCon event in Barcelona, Spain. They discussed how CERN manages the massive amounts of data generated by the LHC (see the full interview with transcript here). (* Disclosure below.)

Heinrich is a member of the ATLAS research team, which, along with CERN’s CMS experiment, discovered evidence of the Higgs boson. He and Rocha recently replicated the analysis that proved the particle’s existence during their keynote address at this week’s KubeCon event.

CERN science creates super-sized data

Scale, latency and performance are concerns for any enterprise, but at CERN they take on a much larger significance. Two high-energy particle beams travel at close to the speed of light inside the 27 km ring of the LHC, with 1.7 billion particle collisions occurring per second.

“The machines can generate something around a petabyte [of data] a second,” Rocha said.

Analyzing this data is the task of the ATLAS trigger and data acquisition system. “We cannot write out all the collision data to disk; we don’t have enough disk space,” Heinrich said. Instead, the trigger system analyzes the data in real time and selects only the most interesting collisions to channel into storage.

The trigger system reduces this to around 10 gigabytes a second. “That’s what my side has to handle,” Rocha stated.
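To put the trigger’s job in perspective, here is a toy sketch in Python. It is illustrative only, not ATLAS code: the event fields, energies and threshold are invented. It computes the raw-to-stored reduction implied by the figures above and mimics a trigger-style filter that keeps only high-energy events.

```python
# Illustrative sketch only -- not ATLAS code. The event fields, energy values
# and threshold below are invented for demonstration.

RAW_BYTES_PER_SEC = 10**15          # ~1 petabyte/s generated by the detectors
STORED_BYTES_PER_SEC = 10 * 10**9   # ~10 gigabytes/s written to disk

def reduction_factor(raw_bps: int, stored_bps: int) -> float:
    """Raw bytes discarded per byte kept."""
    return raw_bps / stored_bps

def toy_trigger(events: list[dict], energy_threshold: float) -> list[dict]:
    """Keep only 'interesting' events, the way a trigger selects collisions."""
    return [e for e in events if e["energy"] > energy_threshold]

events = [{"id": 1, "energy": 12.5}, {"id": 2, "energy": 0.3}, {"id": 3, "energy": 98.1}]
print(reduction_factor(RAW_BYTES_PER_SEC, STORED_BYTES_PER_SEC))  # 100000.0
print(toy_trigger(events, energy_threshold=10.0))  # keeps events 1 and 3
```

The arithmetic shows why the trigger must run in real time: only one byte in roughly 100,000 survives to storage, so the selection cannot be deferred to disk.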

Businesses that think they have data storage problems might reconsider when they see CERN’s massive data inflow. “We’re collecting something like 70 petabytes a year,” Rocha said. “Our challenge is to make sure that all the effort physicists put into building this large machine, that in the end it’s not the computing that is breaking the world system. We have to keep up.”

Currently, CERN has one giant data center with around 300,000 cores and a capacity of around 400 petabytes. “That’s not enough,” Rocha stated.

Linking institutes and research labs around the globe has doubled the storage capacity, but with a major upgrade to the LHC underway, the pressure is on to expand. “Very soon we’ll be talking about exabytes, so the amount of computing we will need there is just going to explode,” Rocha explained.
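The scale Rocha describes can be sanity-checked with a short back-of-the-envelope calculation. This is only a sketch using the figures quoted above: at today’s roughly 70 petabytes a year, an exabyte is more than a decade of data-taking away, which is why the planned LHC upgrade makes the computing demand “explode.”

```python
# Back-of-the-envelope sketch using the figures quoted in the article.

PETABYTE = 10**15
EXABYTE = 10**18

annual_intake_pb = 70           # ~70 PB collected per year today
datacenter_capacity_pb = 400    # one data center, ~400 PB capacity

def years_to_reach(target_bytes: int, rate_pb_per_year: float) -> float:
    """Years of data-taking needed to accumulate target_bytes at a given rate."""
    return target_bytes / (rate_pb_per_year * PETABYTE)

print(round(years_to_reach(EXABYTE, annual_intake_pb), 1))   # 14.3 years at today's rate
print(round(datacenter_capacity_pb / annual_intake_pb, 1))   # 5.7 years to fill the data center
```

At the current rate, even the existing 400-petabyte data center fills in well under a decade, so reaching exabyte scale “very soon” implies data rates growing by an order of magnitude after the upgrade.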

Kubernetes to the rescue

All options are on the table to solve the problem, as the engineers at CERN tend to be results-oriented, according to Rocha. “It’s a more open-minded community than traditional IT. So we don’t care so much about which technology we use as long as the job gets done,” he said.

CERN had distributed infrastructure years before cloud adoption became widespread, but in the past its teams had to write all their own system software. Access to open-source communities now means CERN teams can focus on application development.

“If we start writing software using Kubernetes, then not only do we get this flexibility of choosing different public clouds or different infrastructures, but also we don’t have to care so much about the core infrastructure, all the monitoring. We can remove a lot of the software we were depending on for many years,” Rocha stated.

Heinrich agreed. “What’s kind of special about scientific applications is that we don’t usually just have our entire code base on one software stack. Sometimes you have a complete mix between C++, Python, Fortran, and all that stuff. So this idea that we can build the software stack as we want is pretty important.”

Watch the complete video interview below, and be sure to check out more of SiliconANGLE’s and theCUBE’s coverage of the KubeCon + CloudNativeCon event. (* Disclosure: While this segment is unsponsored, Red Hat Inc. is the headline sponsor for theCUBE’s live broadcast at KubeCon + CloudNativeCon. Neither Red Hat nor any other sponsor has editorial control over content on theCUBE or SiliconANGLE.)

Photo: SiliconANGLE
