A blog about academic research & teaching

Month: April 2014

Toward principled end-to-end tracing of distributed systems

The past 10 years or so have seen a large amount of research on how to create end-to-end traces of distributed systems’ activity. Such traces show the workflow of causally related activity (e.g., the activity required to service a request) across every component of the distributed system. For example, one end-to-end trace might show the functions executed by a request as it traverses a front-end gateway, a load balancer, a database, and the local filesystem where the requested data is stored. The trace might also show detailed timing information, such as the overall response time of the associated request and the execution time of each individual function. Some examples of tracing-related research efforts include Magpie (OSDI 2004), Stardust (Sigmetrics 2006), and X-Trace (NSDI 2007). Recently, several industry implementations have also emerged, including Google’s Dapper and Twitter’s Zipkin. This year’s NSDI included two papers that could be classified as end-to-end tracing infrastructures: NetSight and FlowTags.

Live blog of operational systems track at NSDI’14

Live blog of debugging complex systems session at NSDI’14

This is an experiment in live blogging, so beware ;). My immediate impressions: the focus on advertising systems was interesting. Large-scale analysis was a focus of many talks and, as one would expect, the answer was always to use a map-reduce-style infrastructure.
