I’m sure that everyone is aware that 2013 is the International Year of Statistics. What you might not know is what resources are available for statistical analysis on campus. Here are a few suggestions.

Spreadsheet programs, like Microsoft’s Excel or OpenOffice’s Calc, are often used for routine statistical calculations, like finding the mean of a set of numbers. However, some consider these programs to lack rigour in their statistical routines, and point to other shortcomings that stem from a spreadsheet’s focus on being a general-purpose tool rather than a statistical processor.

SPSS (which originally stood for “Statistical Package for the Social Sciences”) is also widely used on campus. SPSS uses a tabular data entry format similar to that of a spreadsheet. Licenses for this package can be purchased through the Campus Computing Store.

SAS is also available through the Campus Computing Store. It is a powerful programmable language for statistical analysis.

Stata is another package that is available on campus. It has strengths in a number of areas, including new support for structural equation modelling. More information about Stata can be found on its web page under Research Computing. Stata is available through the virtual lab.

There are studies comparing the strengths of these packages that can help you decide which software works best for your data and analyses. UCLA’s Statistical Consulting Group has a page (1) that shows how to use each of the above packages for different analyses. Acock’s 2005 paper (2) also reviews all three programs. A more inclusive, albeit shorter, 2009 post on Brendan O’Connor’s blog is helpful as well, and it includes a great deal of discussion that may also be useful.

Training is available for the packages above. Please see the ICT Training website for more information.

Another statistical software package, R (“The R Project for Statistical Computing”), is freely available. ICT offers it on the virtual lab, as well as on moneta, the large-memory HPC system.

If you don’t have enough data of your own to analyse, a number of open data sites can provide it. A recent example to cross my computer is Quandl, a website that its founder describes as “A Wikipedia for Time Series Data”. Many more open data sites are available; see, for example, the list on the Open Data Pilot Portal.

— References —

(1) UCLA: Statistical Consulting Group. Choosing the Correct Statistical Test in SAS, Stata and SPSS. (Accessed 11 March 2013.)

(2) Acock, Alan C. (2005). SAS, Stata, SPSS: A Comparison. Journal of Marriage and Family, Vol. 67, Issue 4 (November 2005), pp. 1093–1095. First published online 20 September 2005. DOI: 10.1111/j.1741-3737.2005.00196.x