Walter Lioen

Sponsor SURFsara opens GPGPU-day 2013 with a welcome and introduction.

An overview is given of what connects HPC in the Netherlands, Europe and the world, and of SURFsara’s role in all this.


Monique Dewanchand & Jeroen Bakker

AtMind bv

Blender Cycles & Tiles: Enhancing user experience

Blender is an open-source, community-driven 3D content suite. Cycles, a path tracer, and Tiles, the compositor, were developed last year, and both use GPGPU to improve performance and the user experience. The two solutions use different architectures: Cycles uses task parallelism and Tiles uses data parallelism.

During the session both solutions will be demonstrated.

Evghenii Gaburov PhD


XeonPhi vs K20: The fight of the titans

Last year at the Supercomputing conference, Intel released XeonPhi, which may be considered its response to the NVIDIA K20. A year later we can test first-hand what XeonPhi can do that K20 can’t. In this presentation, I will compare XeonPhi and K20 on their ability to deliver the promised results, and present common barriers to be aware of when trying to achieve performance.


Prof.dr.ir. J.A. Pinkster

PMH bv

GPU as a means to realize real-time ship-ship and ship-shore interaction effects on ship bridge simulators

Since the early 1970s, ship maneuvering simulators with fully equipped bridge structures have been used to study the real-time behavior of ships in open water and in ports, and to train ships’ crews to carry out maneuvers safely and efficiently. The core of a real-time maneuvering simulator is the mathematical model that represents the behavior of a vessel sailing at variable speeds in deep or shallow water, in current, at high and at low speed, with or without the effects of port structures, bottom irregularities and other ships. All simulators use mathematical models based on Newton’s equations of motion for a body moving in the horizontal plane or, in some cases, in all 6 degrees of freedom. Hydrodynamic forces due to flow around the hull, rudder action and variations in propeller speed are incorporated based on empirical data derived from model tests or from analysis of full-scale data.

An increasingly important effect on vessels moving in ports is due to ship-ship or ship-port structure interactions; the latter include, for instance, bank suction effects, where banks can also be submerged structures or local changes in water depth. To date it is common practice to include such effects through tabulated interaction data derived from model tests or off-line computations, using hydrodynamic models of varying complexity, ranging from strip-theory-based interaction models to double-body potential flow using panel models or even full-blown CFD computations. The present contribution concerns the development of a computational procedure for real-time ship-ship and ship-port structure interaction using a linear, double-body potential flow method, and its application in a maneuvering simulator. The aim is to dispense with the need to generate interaction databases; such an approach is expected to increase flexibility with respect to the cases studied in the simulator.

The method is applicable to multi-body cases involving ships and port structures. The flow equations are solved by discretising the wetted hull using standard zero-order panels and Rankine sources, with or without the effect of restricted water depth. A crucial aspect of applying such computational methods in real time is the computational load, governed by the number of ships, the number of port structure elements and the total number of unknowns (source strengths on the panels) that need to be solved at each time step. To this end a Fortran code has been developed to run both on a quad-core CPU (i7), in single-threaded and multi-threaded versions using OpenMP, and on a GPU, at present a single NVIDIA TITAN with 6 GB of memory on the card.

The presentation gives a short overview of the types of ship-ship interaction encountered by ships in ports, followed by an overview of the code: what is modelled and how it is solved. Some details of the GPU code will be treated, as well as the problems encountered. Examples will be given of the speed-up achieved using OpenMP and the GPU compared to using a single thread on the CPU.


Ana Balevic & Ivan Dimkovic


DigiCortex – A Hybrid (CPU/GPU) Acceleration of Biological Spiking Neural Networks on Desktop Supercomputers

Imagine that you could peek inside the human brain. That you could see the signals travelling and neurons firing. Imagine that we could not only aid neuroscientists in the war against brain diseases, such as Alzheimer’s, Huntington’s, and Parkinson’s, but also enable them to reconstruct damaged parts of the human nervous system in silico to cure vision, hearing and motor problems, to name a few. The dramatic increase in understanding of the inner workings of the neural system in the last decade and the emergence of massively parallel computing accelerators (such as GPUs) make it possible to simulate more biologically plausible nervous system models than ever before. The DigiCortex project enables computational simulation of biological nervous systems on modern computing platforms, with the aim of providing a utility vehicle for in silico experimentation. In this presentation, we give an introduction to biological neural networks, illustrate the inner workings of the DigiCortex engine on a model of the early visual system of a cat, accelerated on NVIDIA GK110 Titan GPUs using CUDA, and share our insights on using the computational power of GPU accelerators for high-speed simulation of millions of cortical neurons on a desktop supercomputer.

Anton Wijs PhD

TU Eindhoven

Efficient Reconstruction of Biological Networks via Transitive Reduction on General Purpose Graphics Processors

Techniques for the reconstruction of biological networks that are based on so-called genetic perturbation experiments often predict direct interactions between nodes that do not exist. Transitive reduction removes such relations if they can be explained by an indirect path of influences. The existing algorithms for transitive reduction are sequential and can have excessively long run times for large networks. They also exhibit the anomaly that some existing direct interactions are removed. In the Computational Biology group of the Eindhoven University of Technology, we developed efficient, scalable parallel algorithms for transitive reduction on general purpose graphics processing units (GPGPUs) for both standard (unweighted) and weighted graphs. Edge weights are regarded as uncertainties of interactions. A direct interaction is removed only if there exists an indirect interaction path between the same nodes which is strictly more certain than the direct one. This refinement of the removal condition largely avoids the erroneous elimination of edges. Our experiments show that: i) taking into account the edge weights improves the reconstruction quality compared to the unweighted case; ii) the GPGPU implementation is about 100 times faster than the sequential one when reducing graphs with 10,000 nodes.
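
The weighted removal condition can be illustrated with a small sequential sketch in Python. It is not the authors' GPU algorithm. In particular, the abstract does not say how the certainty of a path is computed; the sketch assumes a path is "strictly more certain" than a direct edge when every edge on the path has a strictly lower uncertainty than that edge. The function name `weighted_transitive_reduction` and the example graph are hypothetical.

```python
from collections import deque


def weighted_transitive_reduction(edges):
    """Return the edges that survive the weighted removal condition.

    edges maps (source, target) -> uncertainty, where lower means more certain.
    ASSUMPTION (not from the abstract): a path is strictly more certain than a
    direct edge when all of its edges have a strictly lower uncertainty.
    """
    adjacency = {}
    for (source, target), uncertainty in edges.items():
        adjacency.setdefault(source, []).append((target, uncertainty))

    def has_more_certain_path(source, target, limit):
        # Breadth-first search that only follows edges with uncertainty < limit.
        # The direct edge itself has uncertainty == limit, so it is never used
        # and any path found is necessarily indirect.
        seen = {source}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for successor, uncertainty in adjacency.get(node, []):
                if uncertainty >= limit or successor in seen:
                    continue
                if successor == target:
                    return True
                seen.add(successor)
                queue.append(successor)
        return False

    # Every decision is taken against the original graph, so the result does
    # not depend on the order in which the edges are examined.
    return {
        edge: uncertainty
        for edge, uncertainty in edges.items()
        if not has_more_certain_path(edge[0], edge[1], uncertainty)
    }


example = {
    ("a", "b"): 0.1,
    ("b", "c"): 0.2,
    ("a", "c"): 0.5,  # explained by a -> b -> c, which is more certain
    ("a", "d"): 0.1,
    ("d", "e"): 0.9,
    ("a", "e"): 0.3,  # a -> d -> e exists but is less certain, so this stays
}
reduced = weighted_transitive_reduction(example)
print(sorted(reduced))
```

In this example only the edge from `a` to `c` is removed. An unweighted reduction would also remove the edge from `a` to `e`, which is the kind of erroneous elimination the weighted condition is meant to avoid.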

Thomas Geenen


Running PETSc on GPUs with an example from fluid dynamics

The PETSc framework is a widely used linear algebra package for applications modeled by partial differential equations.

It provides (parallel) data structures and a wide range of solvers and preconditioners focused on solving large sparse systems of equations in parallel. Recently PETSc has been ported to GPUs using CUDA. The implementation uses Thrust, a C++ template library for CUDA based on the Standard Template Library, and CUSP, an open-source C++ library of generic parallel algorithms for sparse linear algebra. Using these libraries allowed the PETSc developers to transparently expose data structures on the GPU to PETSc running on the host. CUSP also gave them access to a set of efficient preconditioners and solvers for sparse systems running on the GPU. We will present the design and implementation of the CUDA port of PETSc and show how it can be applied to a fluid dynamics application written in the PETSc framework.


Jaap van de Loosdrecht

Van de Loosdrecht Machine Vision BV

NHL Centre of Expertise in Computer Vision

Connected Component Labelling, an embarrassingly sequential algorithm

Many research projects focus on a single domain-specific algorithm and compare the best sequential implementation with the best parallel implementation on a specific hardware platform. This project is distinctive because it investigates how to speed up a whole Computer Vision library by parallelizing the algorithms in an economical way and executing them on multiple platforms. The library consists of more than 100,000 lines of sequential C++ code.

Many low-level Computer Vision algorithms are embarrassingly parallel, but the important class of connectivity-based algorithms is embarrassingly sequential. An often-used connectivity-based algorithm is Connected Component Labelling. In this presentation a recently published Connected Component Labelling GPU algorithm is discussed and some improvements are suggested.
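
To make the sequential dependency concrete, here is a minimal Python sketch of the classic two-pass, union-find formulation of Connected Component Labelling. It is not the GPU algorithm discussed in the talk and not code from the library. The function name `label_components`, the use of 4-connectivity and the example image are assumptions chosen for illustration.

```python
def label_components(image):
    """Label the 4-connected foreground regions of a 2D list of 0/1 values.

    Illustrative sketch of the classic sequential two-pass algorithm.
    Returns a 2D list with 0 for background and labels 1, 2, ... otherwise.
    """
    height, width = len(image), len(image[0])
    labels = [[0] * width for _ in range(height)]
    parent = [0]  # parent[i] is the union-find parent of provisional label i

    def find(label):
        # Follow parent links to the representative, halving the path.
        while parent[label] != label:
            parent[label] = parent[parent[label]]
            label = parent[label]
        return label

    # First pass: each pixel depends on the labels already given to its upper
    # and left neighbours, which is what makes the scan inherently sequential.
    for y in range(height):
        for x in range(width):
            if not image[y][x]:
                continue
            up = labels[y - 1][x] if y > 0 else 0
            left = labels[y][x - 1] if x > 0 else 0
            if not up and not left:
                parent.append(len(parent))  # start a new provisional label
                labels[y][x] = len(parent) - 1
            elif up and left:
                root_up, root_left = find(up), find(left)
                smallest = min(root_up, root_left)
                parent[max(root_up, root_left)] = smallest  # merge the two sets
                labels[y][x] = smallest
            else:
                labels[y][x] = up or left

    # Second pass: replace provisional labels by consecutive final labels.
    final = {}
    for y in range(height):
        for x in range(width):
            if labels[y][x]:
                root = find(labels[y][x])
                labels[y][x] = final.setdefault(root, len(final) + 1)
    return labels


u_shape = [
    [1, 0, 1],
    [1, 0, 1],
    [1, 1, 1],
    [0, 0, 0],
    [1, 0, 0],
]
labelled = label_components(u_shape)
for row in labelled:
    print(row)
```

In the U-shaped region the two arms receive different provisional labels and are only merged when the scan reaches the bottom row. The label of a pixel can therefore depend on pixels arbitrarily far away, which is why a straightforward per-pixel parallelization does not work.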


Wouter Ouwens

TU – Dynamics and Control

Visualizing sound and vibrations using a GPU and a 1024 channel microphone array

Visualizing the sound waves around a product and the vibrations on it helps to understand and improve its dynamic behavior (e.g. localizing and reducing sound vibrations in consumer electronics, resulting in quieter products). Sorama, a TU Eindhoven (TU/e) spin-off, developed algorithms that can visualize sound and vibrations produced by a source with great accuracy, using a 1024 channel microphone array. Currently, TU/e and Sorama are investigating the possibility of using a GPU for acoustic algorithms (e.g. far-field beam forming and near-field acoustic holography). A GPU can potentially speed up calculations significantly and helps the user obtain information about the vibrations in a product faster, resulting in shorter development time. Real-time analysis of vibrations also opens up new applications for this technique, such as condition monitoring during production. This presentation discusses the implementation of acoustic algorithms (beam forming and near-field acoustic holography) on a GPU. The presentation ends with a live demo with a 1024 channel microphone array, demonstrating acoustic algorithms and the benefits of using a GPU for this application.


Drs. Jeroen Bédorf

Computational Astrophysics Leiden

Gravitational N-body simulations on 1 to many GPUs

In this talk we show the application of GPUs to gravitational N-body simulations. The simulations range from single-GPU simulations of small star clusters up to simulations of the Milky Way containing billions of particles and using thousands of GPUs. The Milky Way simulations are performed with the parallel GPU tree-code Bonsai running on the Titan supercomputer. We discuss the problems encountered and the steps taken to get a parallel GPU code to scale from a few parallel GPU nodes on a local cluster up to thousands of GPU nodes in large supercomputers.


Full Sponsors NL

  • StreamComputing Trainer and consultant in OpenCL and GPGPU. Makes performing GPU-code out of CPU-code.
  • SURFsara supports researchers in the Netherlands and works closely together with the academic community and industry.