Big Data Driving Evolution of HPC

Over at Scientific Computing’s blog, Gord Sissons of IBM makes an interesting case that the availability of large amounts of inexpensive storage has started to drive the evolution of HPC, much as cheap computing has changed the HPC scene over the last few decades.

“HPC is changing again, and the catalyst this time around is Big Data. As storage becomes more cost-effective, and we acquire the means to electronically gather more data faster than ever before, data architectures are being re-considered once again. What happened to compute over two decades ago is happening today with storage.”

Read the full post here.

REDCap clinical research software coming

We want to let you know that ICT will launch a new service based on the REDCap software later this summer.

REDCap (Research Electronic Data Capture) is a mature, secure web application for building and managing research data collection forms and databases, designed specifically for clinical research. See http://project-redcap.org/. It has been developed over many years and is used by about a thousand institutions around the world.

ICT has arranged an institutional license for the REDCap software from Vanderbilt University.

With this service, researchers can use the software without arranging their own license and without having to acquire equipment and software. We will make the software available on virtual machines within our server room, providing access to a shared implementation in which you can create your data collection forms, put them on the web, create the database, and analyze or export the data. The implementation will be run by UofS ICT, and your data will be stored on ICT servers at the Saskatoon campus. It should be noted that this software and our systems are probably NOT compliant with HIPA (http://www.health.gov.sk.ca/hipa).
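As one illustration of the "export the data" capability: REDCap projects can optionally expose a standard API for exporting records programmatically. The Python sketch below shows what such an export might look like, assuming API access is enabled for your project; the server URL and API token are placeholders for illustration, not the details of our installation.

    import requests  # third-party HTTP library (pip install requests)

    # Placeholder values: substitute the URL of the actual installation
    # and the API token issued for your specific project.
    REDCAP_URL = 'https://redcap.example.usask.ca/api/'
    API_TOKEN = 'YOUR_PROJECT_API_TOKEN'

    # Request an export of all project records as JSON; 'flat' is the
    # standard one-row-per-record output format.
    payload = {
        'token': API_TOKEN,
        'content': 'record',
        'format': 'json',
        'type': 'flat',
    }

    response = requests.post(REDCAP_URL, data=payload)
    response.raise_for_status()
    records = response.json()
    print('Exported %d records' % len(records))

Most users will simply work through REDCap's web interface for data entry and export; the API is an option for those who want to pull data into their own analysis pipelines.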

We anticipate no charges for use of this service.

We plan to have the REDCap service ready by the end of August. If you're interested, contact us at research_computing@usask.ca so we can keep you informed. You can also find out more at http://project-redcap.org/

Research Computing group expands Advanced Computing support

On March 17th, Dr. Juan Carlos Zuñiga-Anaya joined the Research Computing group in Client Services as our Advanced Computing Research Analyst.

Juan has worked in academic environments for several years in the fields of scientific computing, high-performance computing, and scientific software development, including time as an assistant professor in Mathematics/Computer Science at the University of Guadalajara. He has a broad education, including an MSc in Electrical Engineering with a specialization in applied Computer Science, and a PhD in Systems Theory with a specialization in computational mathematics.

Juan will be working in our group, focusing on advanced computing/HPC and scientific software. If you need help optimizing your code, working with scientific software such as MATLAB, or using HPC research infrastructure, or if you just want to chat about your research, please contact Juan and the rest of the team at research_computing@usask.ca.

Big Data Hubris

Scientific American recently published an interesting article on their Observations blog, titled Why Big Data Isn't Necessarily Better Data. It nicely highlights one of the pitfalls of implementing Big Data analytics: believing your data and/or your analysis are better than they are. The example used in the article is Google Flu Trends (GFT), which seeks to use Google's search data to track outbreaks of influenza. The results have been mixed: GFT's estimates broadly tracked the CDC's surveillance data, but overestimated the prevalence of flu in 100 of the 108 weeks from 2011-2013 that one study examined. Google itself has found that the GFT data is highly susceptible to media coverage, which may explain some of these issues.

The main takeaway from this article is to be conscious of factors that could affect the quality of both your data and your analysis. This quote sums it up nicely:

Big data hubris is the “often implicit assumption that big data are a substitute for, rather than a supplement to, traditional data collection and analysis.”

Big data is a useful tool, but like all tools, it must be used properly. Some food for thought.