But as the site’s membership rapidly grew into the millions, it was not uncommon for surveys to generate responses from 500,000 members. It became clear to Rudder that Excel by itself could not handle data from 500,000 responses; a more robust solution was required.
One of the most interesting aspects of the blog is the size of the samples. As Rudder noted in the blog’s inaugural post back in June 2009: Old media could only get 3,050 people to answer a poll about Obama.
“If I have some data and I have an idea for analyzing that data, the chances are good that someone in the R community has already written a program that does what I need,” says Max Shron, a data scientist at OkCupid.
“There was a tremendous amount of data,” says Rudder.
“R lets us get a ‘zoomed-out’ view of what’s going on with the data, which helps us decide quickly if the tack we’re taking with the data is yielding something interesting.
Once we figure out what we’re looking for, and we start narrowing our focus, then we can move into Excel,” says Rudder.
Unlike some of the more widely used proprietary analytic tools, programs written in R can handle the larger and more complex data sets generated by OkCupid’s growing base of users.
That makes R a good choice for data scientists who are interested in pushing the envelope of traditional statistical analysis.
R is not a panacea for every data challenge, at least not yet, says Rudder.