In November, I attended the Supercomputing conference in Denver, CO. The theme of this year’s conference was ‘HPC Everywhere’. Hundreds of presentations were given on all aspects of high-performance computing, from traditional HPC to big data and fast networking. All the major players in the HPC space, and many of the smaller ones, had exhibits, which made for a lot to see in just a few days.
The presentation that stood out most for me was the one on quantum computing in HPC, given by a representative of D-Wave, a Canadian company that is at the forefront of quantum computing. Since quantum computers are good at solving certain types of problems that are difficult for traditional HPC, D-Wave proposes using the quantum computer to generate decent guesses (based on local minima of the problem space), which are then fed back to a traditional HPC system for verification. Using this hybrid approach, these problems can be solved in a fraction of the time a traditional system alone would need. It will be interesting to see how this technology ends up being used in the future.
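To make the division of labour in that hybrid approach a little more concrete, here is a minimal sketch in Python. It is not D-Wave's method or API: the "quantum" step is replaced by a purely classical stand-in (random starts followed by greedy bit flips) that proposes local minima, and the toy objective, the matrix `Q_EXAMPLE`, and all function names are assumptions made up for illustration.

```python
import random


def energy(x, Q):
    """Toy objective: sum of Q[i][j] * x[i] * x[j] over an upper-triangular Q."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(i, n))


def propose_candidate(Q, rng):
    """Stand-in for the quantum sampler (assumption, not a real quantum call).

    Starts from a random bit string and flips single bits as long as a flip
    lowers the energy, so the result is a local minimum of the toy objective.
    """
    n = len(Q)
    x = [rng.randint(0, 1) for _ in range(n)]
    improved = True
    while improved:
        improved = False
        for i in range(n):
            flipped = x[:i] + [1 - x[i]] + x[i + 1:]
            if energy(flipped, Q) < energy(x, Q):
                x = flipped
                improved = True
    return tuple(x)


def verify_and_select(candidates, Q):
    """Classical step: re-score every distinct guess and keep the lowest."""
    return min(set(candidates), key=lambda x: energy(x, Q))


def hybrid_solve(Q, num_samples=20, seed=0):
    """Generate guesses with the stand-in sampler, then verify classically."""
    rng = random.Random(seed)
    candidates = [propose_candidate(Q, rng) for _ in range(num_samples)]
    best = verify_and_select(candidates, Q)
    return best, energy(best, Q)


# Hypothetical 4-variable problem: each set bit earns -1, but two
# neighbouring set bits cost +2.
Q_EXAMPLE = [
    [-1, 2, 0, 0],
    [0, -1, 2, 0],
    [0, 0, -1, 2],
    [0, 0, 0, -1],
]

if __name__ == "__main__":
    best, best_energy = hybrid_solve(Q_EXAMPLE, num_samples=10, seed=1)
    print(best, best_energy)
```

The point of the sketch is only the structure: a cheap source of plausible guesses, followed by an exact check on conventional hardware. In a real setup the sampler would be the quantum device and the verification would run on the HPC system.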
Since my interest is primarily related to data and how to manage it, this is where I focussed my attention. I attended a tutorial on Globus, a web-based tool to manage data transfers to and from HPC systems. By logging in to the Globus web site, you can initiate high-bandwidth transfers between any participating HPC sites. I’ve written before about Globus in the context of WestGrid, where we have good support for it. Compute Canada will be rolling out Globus at all sites within their purview, which will allow easy transfers between HPC sites. Globus Connect Server, a new product that’s currently being tested on Silo, will allow you to share specific files with other Globus users without opening up all of your data to them. I’ll be writing more on this at a later date.
The keynote at Supercomputing this year was by an anthropologist employed by Intel, Genevieve Bell. The topic of her presentation was ‘The Secret Life of Data’, which looked at data from an anthropological perspective. Could we benefit by looking at data from a more people-centred angle? One of the more interesting parts of her talk was a historical example of ‘Big Data’, the Domesday Book. This book is a record of the survey of much of England and Wales in 1086. This information would be used for centuries to resolve disputes and determine who owned which piece of property. The fact that this book was so important for hundreds of years hints at another issue in data management, that of continued access to data. Because it was written down on parchment, we are still able to access the data even today. How long will our current data be readable?
The issue of data availability is a major one in research computing today. While the Domesday Book has been accessible for almost 1000 years, modern data storage is becoming obsolete much more quickly. Twenty years ago, for example, many people used floppy disks to store data. If you needed to access that data today, it would be a challenge to find a floppy disk drive capable of reading the disk, and there’s still the issue of data formats. Most of the files created back then would be unrecognizable to modern software. This highlights the need to plan for the future when collecting data, and I’ll be writing more about research data management strategies in the future.