(NOTE: I continue to update this list as I encounter new posts. Email me if you know of an article I haven’t listed yet.)
Several other members of the component projects of the Berkeley Data Analytics Stack (BDAS), especially Spark and Shark, and I have been trying to keep track of news articles that mention or are about the projects. I’ll discuss below why this sort of news coverage is unusual and well deserved, but first the list:
- Movie Recommendations and More With Spark. On personal blog by Nick Pentreath, April 1, 2013
- Getting Spark Setup in Eclipse. On personal blog by James Percent, March 26, 2013
- Return of the Borg: How Twitter Rebuilt Google’s Secret Weapon. Article about Mesos on wired.com by Cade Metz, March 5, 2013
- From Strata, the New Big Data on the Block. About BDAS, by Steve Miller on information-management.com, February 27, 2013
- SQL is what’s next for Hadoop: Here’s who’s doing it. By Derrick Harris on GigaOm, February 21, 2013
- On siliconANGLE
  - Top 5 Open Source Projects in Big Data. By Molly Sassmann, February 4, 2013
  - Big Data Up to 100X Faster – Researchers Crank Up the Speed Dial. By John Casaretto, November 29, 2012
- Various Spark-related articles on Quora
- On DBMS2, a blog by Curt Monash
  - Introduction to Spark, Shark, BDAS and AMPLab, December 13, 2012
  - Spark, Shark, and RDDs — technology notes, December 13, 2012
- On Quantfind’s Blog, by Imran Rashid
  - Unit testing with Spark, January 4, 2013
  - Configuring Spark’s logs, January 4, 2013
- What Makes Spark Exciting. On Bizo Development Blog by Stephen Haberman, January 21, 2013
- On O’Reilly Strata
  - The future of big data with BDAS, the Berkeley Data Analytics Stack by Andy Konwinski, Ion Stoica, Matei Zaharia, February 18, 2013
  - Five big data predictions for 2013 by Edd Dumbill, January 16, 2013
  - Shark: Real-time queries and analytics for big data by Ben Lorica, November 27, 2012
  - Spark 0.6 improves performance and accessibility by Ben Lorica, October 16, 2012
  - Seven reasons why I like Spark by Ben Lorica, August 21, 2012
- Spark: an Open Source Engine for Iterative Data Mining. On DataInformed, Ian B. Murphy, October 17, 2012
- Shark Attack on SQL and Analytics. On Datanami by Datanami Staff, December 1, 2012
- Comparison of Hadoop Frameworks – Hive, Pig, Scalding, Scoobi, Scrunch and Spark. On AI Computer Vision (personal blog) by Sami Badawi, March 26, 2012
- Hadoop for Real-Time: Spark, Shark, Spark Streaming, Bagel, etc. will be 2012’s new buzzwords. On Telruptive by Maarten Ectors, August 15, 2012
- Bring some Spark into your life. On Mawazo by Pranab Ghosh, September 27, 2012
- Taming the Big Data Beast with Hadoop and Alternatives. On Linux For You by Sandya Mannarswamy, March 25, 2012
- Using Spark and Hive to process BigData at Conviva. On Conviva company blog by Dilip Joseph, December 27, 2011
- On ByteMining (personal blog) by Ryan Rosario
  - My Review of Hadoop Summit 2011, June 30th, 2011
  - Hadoop Fatigue — Alternatives to Hadoop, August 16th, 2011
Why did we build this list? I’m a huge fan of the Spark project, and I also contribute to it (mostly documentation and community-related work). Spark was born and developed in the UC Berkeley AMPLab. The lab’s approach to releasing and supporting high-quality open source software projects, and to fostering communities around them, is unusual, and it has resulted in serious adoption of the projects coming out of the group.
Research labs traditionally spend very few resources on promoting adoption of the projects built as part of their research agenda, because the research ideas can often be tested using throw-away software prototypes. The extra energy required to turn such prototypes into production-quality projects is not obviously worth the effort. The folks in the AMPLab believe it is. Not only do graduate students spend a lot of time answering questions on developer and user mailing lists, but the lab itself is investing in the effort. For example, the AMPLab recently hired Matt Massie, Cloudera engineer #5, and he is recruiting a rock-star team charged with testing and hardening the software coming out of the lab.
If the goal is “free”, high-quality, next-generation software, then how can we measure whether we are succeeding? One way is to measure adoption of the BDAS software for production use, as well as the grassroots community activity of hundreds of BDAS enthusiasts. Another is to track discussion of the software in the media, which the above list aims to do. By each of these measures, AMPLab projects like Spark and Shark are having tremendous impact!
Feel free to email me if you know of other articles, or if you are a technology journalist or reporter and would like introductions to the key folks on the Spark, Shark, Mesos or BDAS projects.