This past year, I had the opportunity to participate in an NSF panel for the first time.  These panels are convened to evaluate academic research proposals.  Panel members review several proposals each, discuss the merits of each proposal in a group setting, and recommend to the NSF which ones should be funded. Panelists’ recommendations are not binding; the NSF makes the final funding decision.

Listed below are some of my thoughts about this experience.

The past 10 years or so have seen a large amount of research on how to create end-to-end traces of activity in distributed systems.  Such traces show the workflow of causally related activity (e.g., the activity required to service a request) across every component of the distributed system.  For example, one end-to-end trace might show the functions executed by a request as it traverses a front-end gateway, a load balancer, a database, and the local filesystem where the requested data is stored. The trace might also show detailed timing information, such as the overall response time of the associated request and the execution times of each individual function.  Some examples of tracing-related research efforts include Magpie (OSDI 2004), Stardust (SIGMETRICS 2006), and X-Trace (NSDI 2007).  Recently, several industry implementations have also emerged, including Google’s Dapper and Twitter’s Zipkin.  This year’s NSDI included two papers that could be classified as end-to-end tracing infrastructures: NetSight and FlowTags.
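
To make the idea concrete, here is a minimal sketch of what one such trace might look like as a data structure. It is not taken from Magpie, X-Trace, Dapper, or any other system mentioned above; the `Span` class, its fields, the function names, and all timing values are hypothetical and chosen only to mirror the gateway, load balancer, database, and filesystem example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Span:
    """One function execution inside one component (hypothetical structure)."""
    component: str
    function: str
    start_ms: float
    end_ms: float
    # Activity causally triggered by this span.
    children: List["Span"] = field(default_factory=list)

    @property
    def duration_ms(self) -> float:
        return self.end_ms - self.start_ms

    def add_child(self, child: "Span") -> "Span":
        self.children.append(child)
        return child


def flatten(span, depth=0):
    """Yield (depth, span) pairs in pre-order, i.e. parent before children."""
    yield depth, span
    for child in span.children:
        yield from flatten(child, depth + 1)


def build_example_trace() -> Span:
    # Made-up timings for a single request; the root span covers the
    # request's overall response time.
    root = Span("gateway", "handle_request", 0.0, 12.0)
    lb = root.add_child(Span("load_balancer", "route", 1.0, 11.0))
    db = lb.add_child(Span("database", "lookup", 2.0, 10.0))
    db.add_child(Span("filesystem", "read_block", 3.0, 8.0))
    return root


if __name__ == "__main__":
    trace = build_example_trace()
    print(f"response time: {trace.duration_ms} ms")
    for depth, span in flatten(trace):
        indent = "  " * depth
        print(f"{indent}{span.component}.{span.function}: {span.duration_ms} ms")
```

Real tracing infrastructures differ in how they record causality (for example, propagating metadata with the request versus inferring relationships after the fact), so the nested-children layout here is just one convenient way to picture the result.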

This is an experiment in live blogging, so beware ;).  My immediate impressions: the focus on advertising systems was interesting.  Large-scale analysis was a focus of many talks and, as one would expect, the answer was always to use a map-reduce-style infrastructure.

