R tutorial: Learn to crunch big data with R

A few years ago, I was the CTO and cofounder of a startup in the medical practice management software space. One of the problems we were trying to solve was how medical office visit schedules can optimize everyone’s time. Too often, office visits are scheduled to optimize the physician’s time, and patients have to wait way too long in overcrowded waiting rooms in the company of people coughing contagious diseases out their lungs.

One of my cofounders, a hospital medical director, had a multivariate linear model that could predict the required length for an office visit based on the reason for the visit, whether the patient needs a translator, the average historical visit lengths of both doctor and patient, and other possibly relevant factors. One of the subsystems I needed to build was a monthly regression task to update all of the coefficients in the model based on historical data. After exploring many options, I chose to implement this piece in R, taking advantage of the wide variety of statistical (linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering) and graphical techniques implemented in the R system.

One of the attractions for me was the R scripting language, which makes it easy to save and rerun analyses on updated data sets; another attraction was the ability to integrate R and C++. A key benefit for this project was the fact that R, unlike Microsoft Excel and other GUI analysis programs, is completely auditable.

Alas, that startup ran out of money not long after I implemented a proof-of-concept web application, at least partially because our first hospital customer had to declare Chapter 7 bankruptcy. Nevertheless, I continue to favor R for statistical analysis and data science.

Source: InfoWorld Big Data