Big Data, Social Publishing and the Grey Lady

Today's big story, at least in Gotham, has to be Janet Robinson's surprise retirement announcement. The New York Times CEO spearheaded the paper's digital strategy, guiding it through the Times Select offering to the current paywall framework that counts even my household as a customer, albeit ipso facto, given a hard copy weekend edition that is more artifact than edifice.

Amy Chozick's piece tells us that the Grey Lady "has been particularly aggressive under Ms. Robinson’s leadership in creating new digital products and has recently built an online subscription model that has performed much better than anticipated." Chozick, a newcomer to the Times encampment, broke the story on Twitter as well as in "traditional" Web copy (i.e., hours ahead of print). Perhaps they tapped her to write Robinson's professional obit precisely because of her newbie status...no emotional attachment, etc. What loyal staffer could describe with a straight face Robinson's recent meeting with her boss, Arthur Sulzberger, Jr., who "raised the issue of installing a different type of leadership at the company"?

Ouch.

Then again, Chozick's earlier piece on another senior departure at the LA Times set her up nicely as the designated chronicler of editorial seppuku. Who better to deliver the news than a cheerful refugee from Rupert Murdoch's Wall Street Journal?

Just kidding.

All this to set up the real story. Big Data is coming, and it's taking no prisoners. Last week, I successfully completed the out-of-body experience known as the Cloudera Developer Training for Apache Hadoop.

Don't worry. Six weeks ago, it was Greek to me as well.

As an open-source applications architect (you'll have to Google that one :), I had kept Hadoop on my radar as an eventual to-do item. It is a distributed computing technology developed in the past few years and incubated at Yahoo, whose Sunnyvale campus was home to my middling development efforts around the same time.

Hadoop supports a Brobdingnagian approach to analytics, enabling data scientists to manipulate terabytes of information simultaneously, on hundreds or even thousands of computers. Problems that might take days, months or even years to solve in traditional linear fashion are now completed in minutes or hours at most. A gross simplification, but what is life without a bit of hyperbole?
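
For the curious, here is a rough sketch of the divide-and-conquer idea, using the word count exercise that most MapReduce introductions start with. It is not Hadoop code: a real job is typically written in Java and spread across a cluster, while this toy runs in a single Python process. The function names (map_phase, shuffle, reduce_phase, word_count) are my own inventions for illustration, not part of any Hadoop API.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, standing in for the framework's shuffle step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: collapse all the values seen for one key into a single total."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    headlines = ["Big Data is coming", "big data takes no prisoners"]
    print(word_count(headlines))
```

The point of the pattern is that each mapper needs only its own slice of the input and each reducer needs only the values for its own keys, which is what lets the work fan out across many machines at once.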

So after four days cooped up in a windowless room with two dozen rocket scientists from Apple, Juniper, NetApp and assorted other Silicon Valley tribes, after the truly humbling discovery that my Java programming chops had not completely vanished despite my focus on the LAMP stack (ibid., Google), I took an online certification exam that left me quaking with self-doubt until the final message that I had passed. Mirabile dictu.

To paraphrase another journalistic icon, I have seen the future and it is big data. My earlier talk about vectors and the velocity of knowledge? As we say in New York, forget about it. Vectors are a fundamental element of statistical analysis. Data scientists are the new rock stars. Newspapers will morph into social publishers, or die trying. Yes, that's the hook. In a world now sadly without Christopher Hitchens, patterns of data generated by social gestures will govern the success of journalism, and all forms of publishing. Informed analytics, statistical models driven by machine learning, will resuscitate advertising. An Orwellian fantasy? Perhaps.

To quote Hitchens, whose Why Orwell Matters joins my growing reading list, "In whatever kind of a ‘race’ life may be, I have very abruptly become a finalist." Harsh medicine for anyone, especially publishers who ignore Big Data.