Hartwig Medical Foundation

Massive scaling-up the hybrid way


Hartwig Medical Foundation (HMF) plays an important supporting role in the development of more personalized cancer treatments. HMF and Schuberg Philis together manage the large, complex IT pipeline that processes and stores the raw data coming from the DNA sequencers. To benefit from improved analysis methods and achieve a consistent database, they initiated a re-run of the huge amount of available patient data.

In recent years, much research has been done into more personalized cancer treatment. By looking at the individual characteristics of the patient and the illness, more tailored treatment can be offered. Mapping all cancer-related DNA abnormalities of patients is an important key to making this possible. Hartwig Medical Foundation has developed a national database which currently contains all genetic and clinical data from more than two thousand patients with metastatic cancer in the Netherlands.

Identifying anomalies

HMF offers the participating hospitals whole genome sequencing and bioinformatic analysis. The entire DNA of the cancer patient, consisting of 3.2 billion positions, is mapped after multiple scans. This is done both for the healthy DNA, which is taken from the blood, and for the tumor material. By comparing these files, the anomalies that can cause cancer can be identified. This makes it possible to identify and improve relevant biomarkers: specific characteristics of the tumor, such as a mutated gene.
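The comparison described above can be pictured, in a very simplified form, as a set difference: variants seen in the tumor sample but not in the blood sample are the candidates of interest. The sketch below only illustrates that idea. The `Variant` type, the `tumor_specific` function, and the example values are hypothetical, and it does not represent the actual HMF pipeline, which includes validation steps not shown here.

```python
from typing import NamedTuple, Set


class Variant(NamedTuple):
    """A hypothetical, minimal description of one DNA difference."""
    chrom: str  # chromosome name, e.g. "7"
    pos: int    # position on the chromosome
    ref: str    # base(s) in the reference genome
    alt: str    # base(s) observed in the sample


def tumor_specific(tumor: Set[Variant], normal: Set[Variant]) -> Set[Variant]:
    """Return variants present in the tumor sample but absent from the blood sample."""
    return tumor - normal


# Made-up example data, for illustration only.
normal_sample = {Variant("1", 1000, "A", "G")}
tumor_sample = {Variant("1", 1000, "A", "G"), Variant("7", 5000, "T", "A")}

candidates = tumor_specific(tumor_sample, normal_sample)
print(candidates)
```

The variant shared by both samples is filtered out as part of the patient's inherited DNA, leaving only the tumor-specific one.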

Practitioners receive a patient report with a complete overview of the specific cancer-related DNA abnormalities of their patient, and the possible treatments or studies/trials in which the patient could participate. In addition, researchers can use the database to conduct further scientific research, thus contributing to a learning care system.

IT pipeline

A few years ago, Hartwig Medical Foundation established an IT pipeline to process the large amount of raw data coming from the sequencers. In the pipeline, the data is structured, analyzed, and then stored in combination with the available clinical data as a patient report. The clinical data includes the anamnesis, treatment sequence, and outcomes. "We developed the software components ourselves and these have always been under our control," says Hartwig CEO Hans van Snellenberg. "For the hardware, processing and storage, we benefit from the quality, capacity, speed, and scalability of Schuberg Philis."

Re-run of data

"The tumor material, with a mixture of healthy and sick cells, is rather heterogeneous. There are various validation steps during the data processing. The algorithms that help with the validation have become more and more accurate over the years. That means that the analysis of the initial material was good, but the analyses of the most recent patients are even better. To achieve a consistent database, we wanted to put the raw data that was still available for all of the 1,600 patients through the latest version of our IT pipeline."

Smart solution

The existing infrastructure, consisting of dynamically switchable stacks of clusters of 19 linked systems, was not suitable. "The analysis involves a few hundred gigabytes of data for each patient. For all the patients combined, the analysis would take about three months, using five extra stacks," says Omar Wit, Mission Critical Engineer for Schuberg Philis. "That was over the deadline and it was too expensive. We thought outside the box and proposed using the Amazon Web Services (AWS) cloud in Frankfurt." His colleague Andy Repton adds: "Naturally with fully protected data and using their variable price mechanism (Spot Instances Pricing), which means you can offer a lower price to use their reserve capacity." Hartwig Medical Foundation was open to the idea, provided there was a clear agreement covering the processing, with the necessary legal restrictions regarding security and location. Hans van Snellenberg: "At one point during the re-run we had 4,000 cores running, which was unique even for AWS. It was a challenge for Schuberg Philis too, because they had never done anything like this, on this scale, externally." In the end the re-run was completed for a good price and in an unbelievably short time.

The Mission Critical Engineers do not foresee the present high-performance IT pipeline being shifted to the cloud for now. "But we would certainly consider it, given the right performance and lower costs per patient." Hans van Snellenberg is happy to leave the choice to the experts at Schuberg Philis: "What matters for me is that the use of our data should lead to a breakthrough in the personalized treatment of cancer."

Spot Instances Pricing

Spot Instances enable users to 'buy' unused Amazon EC2 capacity. A Spot Instance runs as soon as capacity is available and the maximum price offered by the user exceeds the Spot price. The Spot price per hour is set by Amazon EC2 and is adjusted continuously, based on demand. Amazon EC2 can interrupt an individual Spot Instance as the price or availability changes.
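The rule described above can be sketched as a small simulation. This is only an illustration of the stated condition (capacity available and the maximum price exceeding the Spot price). The function name `spot_instance_runs`, the prices, and the capacity flags are made-up assumptions; the code does not call any AWS API or reflect real Spot prices.

```python
from typing import List


def spot_instance_runs(max_price: float,
                       hourly_spot_prices: List[float],
                       capacity_available: List[bool]) -> List[bool]:
    """For each hour, report whether a Spot Instance would be running.

    The instance runs only when capacity is available AND the user's
    maximum price exceeds the Spot price for that hour; otherwise it
    is waiting or has been interrupted.
    """
    return [
        has_capacity and max_price > spot_price
        for spot_price, has_capacity in zip(hourly_spot_prices, capacity_available)
    ]


# Hypothetical example: the user offers at most 0.10 per hour.
states = spot_instance_runs(
    max_price=0.10,
    hourly_spot_prices=[0.05, 0.12, 0.08],
    capacity_available=[True, True, False],
)
print(states)
```

In this example the instance runs in the first hour, is interrupted in the second because the Spot price rises above the offer, and stays stopped in the third because no capacity is available. A large batch job such as a re-run has to tolerate this kind of interruption.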