For those who prefer Ruby over Java, I have good news: Sensorpedia scripts can now be written using JRuby. JRuby scripts use the same API as the Java interface (both a positive and a negative), so now you’ll be able to parse data sources using the convenience of Ruby regular expressions. In order to test this, I wrote a simple parser for the NOAA data sources (which consist of simple text files). Since a sensor can have multiple scripts associated with it, I also integrated Tables functionality by associating the NOAA sensors with the “Tables” tag (the Tables scripts were already written for the DASMet sources). Here’s a video demonstrating this functionality.
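To give a feel for why regular expressions are handy here, the following is a minimal sketch of parsing a plain-text report in Ruby. It is not the actual Sensorpedia parser: the sample text, the "Name: value unit" layout, and the names SAMPLE_REPORT, READING_LINE, and parse_readings are all assumptions made for illustration, and the real NOAA files may be laid out differently.

```ruby
# Hypothetical plain-text report; the real NOAA layout may differ.
SAMPLE_REPORT = <<~TEXT
  Station: DEMO
  Air Temperature: 18.5 F
  Wind Speed: 7 MPH
TEXT

# Matches lines shaped like "Name: number unit" (the unit is optional).
# Lines whose value is not numeric (e.g. the station line) are skipped.
READING_LINE = /^(?<name>[A-Za-z ]+):\s*(?<value>-?\d+(?:\.\d+)?)\s*(?<unit>[A-Za-z]+)?\s*$/

# Returns a hash of reading name => { value:, unit: }.
def parse_readings(text)
  text.each_line.with_object({}) do |line, readings|
    match = READING_LINE.match(line)
    next unless match

    readings[match[:name].strip] = {
      value: match[:value].to_f,
      unit: match[:unit]
    }
  end
end

READINGS = parse_readings(SAMPLE_REPORT)
```

Named captures keep the parser readable, and lines that don’t match the pattern are simply ignored, which is convenient for loosely structured text files.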
In the video, the user first creates a pivot table to examine the “Air Temperature”. Not all the nodes have air temperature data, but some of those that do are in pretty cold climes (temperatures are in F). The user then writes a local function that filters nodes that register less than 20 F. Afterwards, the user creates a new pivot table to view which nodes were filtered. Then, the user constructs a collective function to take the average temperature and locations of all the filtered nodes. Finally, the user requests the final averaged data.
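The filter-then-average workflow from the video can be sketched in plain Ruby. This is only an illustration of the idea, not the Tables language itself: the NodeReading struct, the sample nodes, and the method names are hypothetical, and the sketch reads “filters” as “selects the nodes below 20 F”.

```ruby
# Hypothetical node records; ids, temperatures (F), and coordinates are made up.
NodeReading = Struct.new(:id, :air_temp_f, :lat, :lon)

SAMPLE_NODES = [
  NodeReading.new("a", 12.0, 64.0, -150.0),
  NodeReading.new("b", 45.0, 36.0, -84.0),
  NodeReading.new("c", 18.0, 62.0, -148.0),
  NodeReading.new("d", nil, 40.0, -100.0) # no air temperature reported
]

# "Local function": keep nodes whose reading is below the threshold.
# Nodes without air temperature data are left out.
def cold_nodes(nodes, threshold_f = 20.0)
  nodes.select { |node| node.air_temp_f && node.air_temp_f < threshold_f }
end

# "Collective function": average temperature and location of the given nodes.
def average_of(nodes)
  return nil if nodes.empty?

  count = nodes.size.to_f
  {
    air_temp_f: nodes.sum(&:air_temp_f) / count,
    lat: nodes.sum(&:lat) / count,
    lon: nodes.sum(&:lon) / count
  }
end

COLD_AVERAGE = average_of(cold_nodes(SAMPLE_NODES))
```

With the sample data, only nodes "a" and "c" pass the filter, so the average is taken over those two.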
There’s still quite a bit of work to do with respect to the scripting layer. The most important piece is probably the key-value store I’m using to store and access all the data from the scripts. Right now it’s pretty fragile, but ultimately I want something that resembles map-reduce (both in terms of functionality and scalability).
In a previous blog post, I discussed the idea of a virtual scripting layer that operated over the Sensorpedia data sources. Each data source, instead of just exposing links to data, will also expose a data-oriented programming interface. Users will be able to write and upload scripts that can transform raw data and communicate with external applications. By associating scripts with tags, users will be able to easily share and incorporate functionality into a data source by applying the appropriate tag to the data source. So, for example, if a user has associated an HTML parsing script with the tag HTML, other users can benefit by applying the same tag to their data source.
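The tag-based sharing idea boils down to a lookup from tags to scripts. Here is a minimal sketch under that assumption; TagRegistry, its methods, and the script names are hypothetical and not part of Sensorpedia.

```ruby
# Hypothetical registry: scripts are filed under tags, and a data source
# picks up every script filed under the tags applied to it.
class TagRegistry
  def initialize
    @scripts_by_tag = Hash.new { |hash, tag| hash[tag] = [] }
  end

  # Associate a script (identified by name here) with a tag.
  def register(tag, script_name)
    @scripts_by_tag[tag] << script_name
    self
  end

  # Scripts a data source gains from its tags, in tag order, without duplicates.
  def scripts_for(tags)
    tags.flat_map { |tag| @scripts_by_tag.fetch(tag, []) }.uniq
  end
end
```

One user registers a parsing script under "HTML"; any other user who applies the "HTML" tag to a data source then gets that script through scripts_for.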
So what will these scripts actually look like? Although I still haven’t quite decided on the final syntax, I’d like users to be able to write these scripts in multiple languages. That way users comfortable with Java can program in Java while trendier users can write these scripts in Ruby. One way we can accomplish this is by exploiting the many languages that are available for the Java runtime (Java, JRuby, Jython, Clojure, etc.). The only thing we’ll impose is a few minor syntax additions, shared across all languages, that define the data-events the script can handle. For now, the scripts kind of look like this:
script ScriptName
    dep "DependantData"
    event "DependantData"
        // Language-specific code
end
Each script defines a set of data-event methods that are invoked whenever data with the appropriate name is published (a publication may be associated with some data value). Data can be published from several sources, including the network (so that scripts can react to external tools), timer mechanisms (for periodic sampling), or other methods. Once invoked, methods can, in turn, publish other data elements and thus trigger additional methods. Since the scripts work independently and are loosely coupled, the user will be able to load new scripts without affecting or even knowing about the internal operation of other scripts.
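The publish-and-react behavior described above can be sketched as a tiny dispatcher. This is an assumption-laden illustration rather than the real framework: DataBus, on, and publish are hypothetical names, and the queue is just one simple way to let handlers publish further data.

```ruby
# Hypothetical dispatcher: handlers are registered per data name and are
# invoked whenever data with that name is published.
class DataBus
  def initialize
    @handlers = Hash.new { |hash, name| hash[name] = [] }
    @queue = []
    @draining = false
  end

  # Register a data-event handler for the given data name.
  def on(name, &handler)
    @handlers[name] << handler
  end

  # Publish a named data element, optionally with a value. Handlers receive
  # the value and the bus, so they can publish more data in turn.
  def publish(name, value = nil)
    @queue << [name, value]
    return if @draining # an outer publish call is already dispatching

    @draining = true
    begin
      until @queue.empty?
        event_name, event_value = @queue.shift
        @handlers.fetch(event_name, []).each do |handler|
          handler.call(event_value, self)
        end
      end
    ensure
      @draining = false
    end
  end
end

# Usage: one handler turns a raw page into a Temperature event,
# and a second, independent handler reacts to Temperature.
bus = DataBus.new
received = []
bus.on("RawPage") { |page, b| b.publish("Temperature", page[/\d+/].to_f) }
bus.on("Temperature") { |temp, _b| received << temp }
bus.publish("RawPage", "Temp: 61 F")
```

Neither handler knows about the other; they are coupled only through the data names, which is what lets new scripts be loaded without touching existing ones.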
Although there’s still a long way to go with respect to both the specification and implementation, enough has been implemented that I’ve been able to rewrite the Tables demo using this new framework. Instead of hard-wiring Tables to interact with the DASMet tower webpages, Tables interacts with a set of scripts I wrote that are associated with the Tables and DASMet tags. These scripts are then grouped together as a virtual sensor node. Tables, in turn, communicates with these virtual sensor nodes over the internet using a simple text-based protocol (see Figure). Unlike the old demo, Tables can now interact with any data source (not just DASMet) that applies the Tables tag and includes a script that exposes sensor data.
Obviously this short description is not enough for users to begin writing their own scripts, but hopefully it gives you an idea of where I’d like to go with this. There’s still a lot of work to do, including better session support and better integration with the Sensorpedia database. Also, the final syntax is still evolving, and for now users can only write scripts using Java. Once I integrate support for some other languages, I’ll post another blog entry explaining the more technical details of how these things are written, loaded, and executed.
Besides automatically discovering new sensor data, I’m also interested in what to do after we get the data into Sensorpedia. Right now users interact with Sensorpedia via the web mapping tool. Users type in a search term that is matched against the title, textual description, or user-supplied tags, and the tool generates a list of geo-indexed placemarkers. This is pretty useful (and cool to view), but there are limitations.
Unfortunately, the data in Sensorpedia is mostly formatted to be consumed by human viewers. Click on one of the placemarkers and you may find an HTML table, an XML file, or a PNG file of a graph. This can make interacting with the actual data a bit difficult. However, if we are able to parse the data, then we may be able to do more interesting and dynamic things with it.
What sort of tool might we use? In previous (and ongoing) research, I developed Tables, a spreadsheet-inspired programming tool for sensor networks. Tables supports flexible data querying, local computation that executes on individual sensor nodes, and collective functions that can aggregate data from multiple nodes. I thought that this may be a useful tool for Sensorpedia, so I began porting a basic version.
I began by writing scripts that parse the pages provided by the DASMet weather towers at ORNL (search for “dasmet”). Next, I wrote a Java interface for a virtual “sensor node” that stores this data and implements an interpreter for the Tables spreadsheet language. Finally, I linked up the Tables interface to this virtual sensor network.
The first video shows different ways of querying the sensor network using a tool called a “pivot table”. The user first instructs the virtual sensor network to display the URLs associated with each DASMet tower along with the unique ID of each virtual sensor node. Next, the user constructs a more useful query asking for the Temperature values associated with each tower. The video progresses through several more queries that include additional metadata visually organized in different ways. The final query shows the user correlating two different kinds of sensor data.
In the second video, the user constructs a query to view the Temperature data again. Afterwards, the user writes a function that records whether the sensor node has any Temperature values greater than 60. This function is executed on each sensor node and is automatically re-executed whenever the node gets new Temperature values. Now the user can construct a query to view which sensor nodes exceeded the threshold value. Although our example does this immediately (and so we don’t have any surprises), we could leave the sensor network running longer to accumulate these values.
Afterwards, the user types in another function to average the Temperature values. Unlike the previous function, this function is typed into the “t = 1” sheet. This makes the function collect data from multiple sensor nodes (instead of executing over a single node). Whenever a sensor exceeds the Temperature threshold, it will contribute data to the average.
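The difference between a local function (re-run on a node whenever it gets new data) and a collective function (run across nodes) can be sketched as follows. This is plain Ruby rather than the Tables spreadsheet language, and SensorNode, EXCEEDS_60, and collective_average are hypothetical names. The sketch also assumes each flagged node contributes its latest reading to the average, which the description above doesn’t spell out.

```ruby
# Hypothetical virtual sensor node that re-runs its local function
# every time a new Temperature value arrives.
class SensorNode
  attr_reader :id, :temperatures, :flags

  def initialize(id, &local_function)
    @id = id
    @temperatures = []
    @flags = {}
    @local_function = local_function
  end

  # Store a new value, then re-run the local function (if any) on this node.
  def record_temperature(value)
    @temperatures << value
    @local_function&.call(self)
  end
end

# Local function: record whether the node has any value greater than 60.
EXCEEDS_60 = lambda do |node|
  node.flags[:exceeded] = node.temperatures.any? { |t| t > 60 }
end

# Collective function: average over the nodes that exceeded the threshold.
# Assumption: each contributing node supplies its latest reading.
def collective_average(nodes)
  values = nodes.select { |node| node.flags[:exceeded] }
                .map { |node| node.temperatures.last }
  return nil if values.empty?

  values.sum / values.size.to_f
end
```

A node only starts contributing once its own local function has flagged it, so the average changes as new values arrive on individual nodes.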
So we see how, using a combination of pivot tables, local functions, and collective functions, the user can write some interesting and dynamic code that runs over both real and virtual sensor networks. The key, of course, is actually creating the virtual sensor network. Currently Tables only works with the DASMet towers. In the future, I hope to find a scalable way to get Tables to work over more of the Sensorpedia data. This will be difficult since different data sources present data in different ways, but we may be able to extend the virtual sensor node concept so that users can submit “scripts” for each sensor source that convert the native data format to a more structured format.
Well, enough for this post. As always, feel free to leave suggestions, comments, and results…