Posted July 15, 2013 by Dr. Henri Montandon in brain experiments

R: free software programming language for statistics and graphics

Statistical analysis has been a bête noire for many students in neuroscience. And yet, of all the maths available, statistical analysis has been the real workhorse of neuroscience research. When I was doing my doctoral thesis, I learned as much statistics as I needed on my own, and that was enough to be able to ask questions of experts. Strangely, over the years, mathematics has become a deep passion of mine. This is because I now understand mathematics to be a language, the language of the universe, just as Galileo said. Statistics, because it is applied mathematics, has grown faster, more urgently even, than other maths. The site reviewed today is the gold standard for statistical tools. Even if statistics is dreadful to you, an hour spent browsing this site will give you a good sense of the breadth and depth of the subject. And, to be sure, the Open Source movement encourages questions in a way which strictly commercial web sites do not.

There are at least 48 statistical packages and environments available for researchers, of which 15 are open source. (For a comparison of the different programs, go here: https://en.wikipedia.org/wiki/Comparison_of_statistical_packages.)

With ample justification, it can be said that R is more than a software package: it is a movement, with its own web site, wiki, book library, graphics gallery, search engine, journal and certification.

From the web site: (http://www.r-project.org/)

Introduction to R

R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R.

R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, …) and graphical techniques, and is highly extensible. The S language is often the vehicle of choice for research in statistical methodology, and R provides an Open Source route to participation in that activity.
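
To give a flavour of what "linear modelling" and "classical statistical tests" look like in practice, here is a minimal sketch. It is not part of the R site's text. It assumes only a standard R installation and uses two of the small example data sets that ship with R (`cars` and `sleep`); the choice of data sets and variable names is purely illustrative.

```r
# Built-in example data: speed and stopping distance for 50 cars.
data(cars)

# Linear model: stopping distance as a linear function of speed.
fit <- lm(dist ~ speed, data = cars)
print(coef(fit))   # intercept and slope of the fitted line

# Classical test: Welch two-sample t-test on the built-in sleep data,
# comparing extra hours of sleep between the two drug groups.
tt <- t.test(extra ~ group, data = sleep)
print(tt$p.value)
```

Calling `summary(fit)` afterwards prints the usual regression table (standard errors, t values, R-squared).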

One of R’s strengths is the ease with which well-designed publication-quality plots can be produced, including mathematical symbols and formulae where needed. Great care has been taken over the defaults for the minor design choices in graphics, but the user retains full control.
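
As an illustration of plots that carry mathematical notation, the sketch below draws the standard normal density and puts its formula in the title using R's `expression()` syntax. It is an example added for this post, not taken from the R site; the output file name `normal_density.pdf` and the choice of curve are assumptions made for the demonstration.

```r
# Write the figure to a PDF in the session's temporary directory
# (the file name is arbitrary, chosen for this example).
out_file <- file.path(tempdir(), "normal_density.pdf")
pdf(out_file, width = 6, height = 4)

# Standard normal density on a grid of 200 points.
x <- seq(-4, 4, length.out = 200)
plot(x, dnorm(x), type = "l",
     xlab = expression(x),
     ylab = expression(phi(x)),
     # expression() lets titles and labels contain typeset mathematics.
     main = expression(phi(x) == frac(1, sqrt(2 * pi)) * e^{-x^2 / 2}))

# Close the device so the file is flushed to disk.
invisible(dev.off())
```

In an interactive session the `pdf()` and `dev.off()` lines can be dropped, and the plot appears in a window instead.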

R is available as Free Software under the terms of the Free Software Foundation’s GNU General Public License in source code form. It compiles and runs on a wide variety of UNIX platforms and similar systems (including FreeBSD and Linux), Windows and MacOS.

The R environment

R is an integrated suite of software facilities for data manipulation, calculation and graphical display. It includes

  • an effective data handling and storage facility,
  • a suite of operators for calculations on arrays, in particular matrices,
  • a large, coherent, integrated collection of intermediate tools for data analysis,
  • graphical facilities for data analysis and display either on-screen or on hardcopy, and
  • a well-developed, simple and effective programming language which includes conditionals, loops, user-defined recursive functions and input and output facilities.
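
Several of the facilities in the list above can be seen in a few lines. The following is a rough sketch, not from the R site: the data frame `trials`, its made-up reaction times, and the helper `factorial_rec` are all hypothetical names and values invented for illustration.

```r
# Data handling: a small data frame (values are invented for this example).
trials <- data.frame(subject = c("a", "b", "c"),
                     rt_ms   = c(412, 388, 455))
mean_rt <- mean(trials$rt_ms)

# Matrix operators: %*% is matrix multiplication, solve() inverts a matrix.
m <- matrix(c(2, 0, 1, 3), nrow = 2)   # filled column by column
identity_check <- m %*% solve(m)       # should be (numerically) the identity

# Programming language: a conditional inside a user-defined recursive function.
factorial_rec <- function(n) {
  if (n <= 1) 1 else n * factorial_rec(n - 1)
}

# A loop with simple output.
for (s in trials$subject) {
  cat("subject", s, "\n")
}
print(factorial_rec(5))
```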

The term “environment” is intended to characterize it as a fully planned and coherent system, rather than an incremental accretion of very specific and inflexible tools, as is frequently the case with other data analysis software.

R, like S, is designed around a true computer language, and it allows users to add additional functionality by defining new functions. Much of the system is itself written in the R dialect of S, which makes it easy for users to follow the algorithmic choices made. For computationally-intensive tasks, C, C++ and Fortran code can be linked and called at run time. Advanced users can write C code to manipulate R objects directly.
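
Both points, adding your own functions and reading the system's own code, are easy to try. In the sketch below (an illustration added here, not from the R site) the function name `sem` and the sample values are assumptions; `sd` is the standard deviation function that ships with R.

```r
# A user-defined function: standard error of the mean.
sem <- function(x) sd(x) / sqrt(length(x))

scores <- c(2, 4, 4, 4, 5, 5, 7, 9)   # made-up sample
print(sem(scores))

# Printing a function without calling it shows its source, which is how
# one can follow the algorithmic choices made in functions written in R.
print(sd)
```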

Many users think of R as a statistics system. We prefer to think of it as an environment within which statistical techniques are implemented. R can be extended (easily) via packages. There are about eight packages supplied with the R distribution and many more are available through the CRAN family of Internet sites covering a very wide range of modern statistics.
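
For the curious, a short sketch of how packages look from inside a session follows. It is not from the R site. The counts you see will depend on your R version, and `some_package` is a placeholder, not a real package name.

```r
# Packages that form the base R distribution on this machine.
base_pkgs <- rownames(installed.packages(priority = "base"))
print(base_pkgs)

# Packages attached automatically when a session starts.
print(getOption("defaultPackages"))

# Installing and loading an add-on package from CRAN needs a network
# connection, so these lines are left commented out. "some_package" is
# a placeholder name.
# install.packages("some_package")
# library(some_package)
```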

R has its own LaTeX-like documentation format, which is used to supply comprehensive documentation, both on-line in a number of formats and in hardcopy.

R is now among the most widely used statistical software in academic science, and it is rapidly expanding into other fields such as finance.

The history of R’s development is itself a fascinating story (http://www.computerworld.co.nz/article/489306/story_r_statistical_tale_twist/), highlighting not only the battle being waged between commercial IT and Open Source, but also the one between Open Source and Open Core.

Dr. Henri Montandon