Data scientists are much in demand these days, and almost everyone wants a piece of the pie. You might wonder whether the second edition of “R in Action: Data Analysis and Graphics with R” is up to the task. Read on to find out.
The first edition was released back in 2011 and received a warm reception among those with an interest in R programming. To put the review of the second edition in a nutshell: it does not disappoint.
The book is geared towards statisticians coming to terms with R. Programmers who simply want to get acquainted with the language, however, will be left without much to go on.
That does not mean the book lacks detail about the fundamentals and the more intricate concepts of R. It just means that those who actually have statistical work to do will derive greater benefit from it. The book is pitched at people who already know how to do a certain task in another statistical package and are wondering how to do it in R.
The book is structured into four parts according to skill level. It starts with a “Getting Started” section and follows it up with Basic, Intermediate and Advanced Methods, in logical order of increasing difficulty.
As might be assumed, Getting Started covers the rudiments of R. By the end of the section you will have a concrete idea of what R entails, along with basic installation and the plotting of data and charts. A whole chapter is dedicated to creating datasets, and it swiftly guides you through the data structures supported by R from the point of view of a user rather than a programmer. The asides pointing out the use of R terminology deserve mention. The last two chapters of this section expand on the transformation and manipulation of statistical data.
In certain parts, the book assumes that you readily recollect concepts and tasks introduced previously. It does not refer back to where those topics were discussed, so you are left to fend for yourself.
After the introductory part, the book dives into the intricacies of statistics. The second part describes how to create basic charts and conduct elementary statistical tests. This section is ideally suited to beginners in statistics who are trying to overcome a course hurdle using R.
The third part, Intermediate Methods, delves into topics that are more advanced in scope, and you learn to perform tasks like regression and ANOVA using R. This is useful because R works differently in these respects from a package such as SPSS. The last part of this section focuses on power analysis, in other words things like sample sizes. Intermediate graphs and resampling procedures are also dealt with.
The final section of the book shifts focus to linear models and to the analysis of principal components and factors. This edition introduces Chapter 15, which covers time series, a topic that was not in the first edition of the book. Classification and clustering are also covered for the first time, and these chapters teach you the rudiments of both concepts, though classical discriminant analysis is ignored.
The last two chapters of the book still manage to hold interest, shedding light on missing values as well as advanced graphics. The treatment of missing values deserves special mention.
This book is not for you if you are not well versed in statistics. However, it definitely does a good job of filling in some statistical concepts, provided you already know some of the theory and want to see how to go about your job using R. The descriptions, comments and asides make you think about particular tasks and encourage you to innovate and find your own way of doing things.
However, this book will not solve the problems you are likely to encounter while programming in R. It is meant for people who know statistics and are looking forward to learning R. It is a good companion to a course at a reputable R programming training institute.