24 Jun
2010
In general, NoSQL Boston was a solid event. Lots of interesting and vocal hackers, geeks, and companies attended. The vibe was friendly (with the exception of the HBase-Hypertable melee, but who doesn't love a good debate?). Overall I think 10gen put on a good show.
Keynote: Crossroads, Inroads, Pitfalls & Bylaws: Peering into NoSQL's Conceivable Future
The key takeaway from Tim was that the community needs to do a better job of educating others: on the movement, on the different data stores, on architectures, etc. I couldn't agree with him more. Working in an industry extremely cautious of new technologies (pharmaceuticals and the life sciences), I find the biggest impediment is often a lack of knowledge, with a lack of tooling and vendors to support said technologies being a close second. The former is in our hands and impacts the latter; where there is demand, companies will rise to fulfill the need.
Panel 1: Scaling With NoSQL
Bradford did a solid job moderating. Mark Atwood was a bit of a curmudgeon, but I feel the root of that was more the makeup of the panel and less him. Once we veered off the scaling topic, we were left with two key-value stores and three column-based stores, which wasn't very conducive to a cohesive discussion amongst the panelists. Nonetheless, a few very insightful nuggets came out. Mark's statement that "Memcached should be integrated into all nosql stores" was awesome. So was Bradford's question regarding operations, along with the panel's general answer that operations are engineer-driven (versus administrator-driven); a by-product, I suspect, of the developer being "closer" to the data. Going back to the makeup of the panel, it might have worked better had the panel been split in two: one focused on key-value stores, perhaps discussing common use cases given their ease of deployment, with representatives from at least Memcached, Project Voldemort, and Tokyo Cabinet (I was surprised by their lack of representation), and a second focused on large column stores and the scaling side of the equation (where it's a much more interesting question), with the HBase, Cassandra, and Hypertable representatives.
Panel 2: NoSQL In The Cloud
Adam's panel was the highlight of the day. The diversity and spirit of the speakers were phenomenal. Most interesting for me were, first, the debate over whether the community ought to be wrapping these stores in an ORM-like model to avoid lock-in, and second, the contrarian position John took on what data may not belong in the cloud. With regard to the first, Jonathon's remark "I see application database independence as an anti-pattern" was killer and a perspective I had never considered. The reference to the Vietnam of Computer Science was also poignant (and spawned some interesting Twitter conversations after the fact). I'm not sure where I stand on the ORM debate. I get Jonathon's point and completely agree that abstraction limits you. However, for the masses, the abstraction that an ORM provides works and works well, easing adoption. With regard to the second, John shared an interesting post that delves a bit deeper into the topic. Regardless of opinion, both points were thought-provoking.
Lightning Talks
Good stuff, albeit a bit off topic at times (Jim Wilson's talk was fantastic, but it pushed the limit on what I'd consider relevant content). I would have liked to see more talks at this intimate level, though I understand the limitations of a one-day event.
Panel 3: Schema Design With Document-Oriented Databases
While the panel itself was not superbly moderated (Durran seemed emotionless, which is not what I expected in the least), the content was solid and the panelists well-spoken, diverse, and on point. At this stage of the NoSQL movement, definitions are very important, so having the distinction between document and key-value stores made right off the bat was helpful, even if the panelists were not in complete accord. The short of it: Riak, Mongo, and CouchDB approach storage slightly differently, each with pros and cons, each with its own way of accomplishing common tasks. I especially loved the comments about using the right tool for the job (in response to a question about building an inverted index off a document-oriented database).
Panel 4: The Evolution of the Graph Data Structure from Research to Production
While modeling data in graph form is incredibly intuitive and natural, the storage end of the equation still feels academic. The panel title suggested the focus would be on real-world implementations. Unfortunately, the moderator and panelists focused elsewhere. No question about it, though: this is certainly a space to watch in the coming years.
My Guess At What The Landscape Will Look Like 12-18 Months Out
- Cassandra is poised to be the front-runner in the distributed column storage space (with HBase a close second). Maybe it's the big-name companies adopting it (Facebook, Twitter, Digg, Reddit, Rackspace), but it seems to be grabbing the most attention right now. There is lots of activity both on the core code base and, just as important if not more so in terms of adoption, on the tooling around it; lots of open source libraries are being released into the wild. I would not be surprised to see Rackspace offer Cassandra as a service. It seems like a good fit and opportunity for them. If not Rackspace, then a third party altogether, in the same vein as MongoHQ and Cloudant offering Mongo and CouchDB services respectively.
- Mongo is maturing very rapidly. 10gen is doing a fantastic job advocating for it. The Ruby community is embracing it wholeheartedly. I would not be surprised to see it become the front-runner in the document-oriented database world, if it isn't already.
- 2010 will be a very good year for document-oriented databases. The technology is mature today and will only get better. Installation and adoption costs are minimal compared with other NoSQL stores. Hosted services are emerging (Cloudant, MongoHQ), lowering the barrier to entry as well.
- Debate aside, I suspect we will see numerous ORMs surface around a lot of these stores. The Hashrocket guys showcased MongoDoc, which has an ActiveRecord feel. I'm sure we'll see at least one for Cassandra open sourced by Digg or the like.