Virtual Sensor Nodes
In a previous blog post, I discussed the idea of a virtual scripting layer that operated over the Sensorpedia data sources. Each data source, instead of just exposing links to data, will also expose a data-oriented programming interface. Users will be able to write and upload scripts that can transform raw data and communicate with external applications. By associating scripts with tags, users will be able to easily share and incorporate functionality into a data source by applying the appropriate tag to the data source. So, for example, if a user has associated an HTML parsing script with the tag HTML, other users can benefit by applying the same tag to their data source.
So what will these scripts actually look like? Although I still haven’t quite decided on the final syntax, I’d like users to be able to write these scripts in multiple languages. That way, users comfortable with Java can program in Java while trendier users can write their scripts in Ruby. One way to accomplish this is by exploiting the many languages available for the Java runtime (Java, JRuby, Jython, Clojure, etc.). All we’ll impose are a few minor syntax additions, shared across all languages, that define the data events a script can handle. For now, the scripts look roughly like this:
script ScriptName
dep "DependantData"
event "DependantData"
// Language-specific code
end
Each script defines a set of data-event methods that are invoked whenever data with the matching name is published (a publication may carry an associated data value). Data can be published from several sources: the network (so that scripts can react to external tools), timers (for periodic sampling), or other methods. Once invoked, a method can in turn publish other data elements and thus trigger additional methods. Since scripts work independently and are loosely coupled, a user will be able to load new scripts without affecting, or even knowing about, the internal operation of other scripts.
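The publish-and-trigger cycle described above can be sketched in plain Java. This is a minimal sketch under stated assumptions, not the framework’s actual API. The classes `DataBus` and `DataBusDemo`, the methods `on` and `publish`, and the data names `RawReading`, `Temperature`, and `HotAlert` are all hypothetical. Handlers are keyed by data name. Publishing a value runs every handler registered for that name, and a handler may publish further data, which is how one method triggers another.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical data-event bus (an assumption, not the real Sensorpedia framework).
class DataBus {
    // Data name (e.g. "Temperature") -> data-event methods that react to it.
    private final Map<String, List<BiConsumer<DataBus, Object>>> handlers = new HashMap<>();

    // Registers a data-event method for the given data name.
    void on(String dataName, BiConsumer<DataBus, Object> handler) {
        handlers.computeIfAbsent(dataName, k -> new ArrayList<>()).add(handler);
    }

    // Publishes a value: every method registered for the name runs and may publish more data.
    void publish(String dataName, Object value) {
        // Copy so a handler can register new handlers without disturbing this loop.
        List<BiConsumer<DataBus, Object>> current =
                new ArrayList<>(handlers.getOrDefault(dataName, new ArrayList<>()));
        for (BiConsumer<DataBus, Object> handler : current) {
            handler.accept(this, value);
        }
    }
}

class DataBusDemo {
    public static void main(String[] args) {
        DataBus bus = new DataBus();
        List<String> alerts = new ArrayList<>();
        // "Script" 1 (hypothetical): turns a raw text reading into a Temperature value.
        bus.on("RawReading", (b, v) -> b.publish("Temperature", Double.parseDouble((String) v)));
        // "Script" 2 (hypothetical): reacts to Temperature without knowing about script 1.
        bus.on("Temperature", (b, v) -> {
            if ((Double) v > 60.0) {
                b.publish("HotAlert", v);
            }
        });
        bus.on("HotAlert", (b, v) -> alerts.add("alert:" + v));
        bus.publish("RawReading", "61.5");
        System.out.println(alerts); // prints [alert:61.5]
    }
}
```

Each handler names only the data it consumes. Script 2 would therefore keep working if script 1 were swapped for a different converter that still publishes `Temperature`.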
Although there’s still a long way to go with respect to both the specification and the implementation, enough has been implemented that I’ve been able to rewrite the Tables demo using this new framework. Instead of being hard-wired to the DASMet tower webpages, Tables now interacts with a set of scripts I wrote and associated with the Tables and DASMet tags. These scripts are grouped together as a virtual sensor node. Tables, in turn, communicates with these virtual sensor nodes over the internet using a simple text-based protocol (see Figure). Unlike the old demo, Tables can now interact with any data source (not just DASMet) that applies the Tables tag and includes a script that exposes sensor data.
Obviously this short description is not enough for users to begin writing their own scripts, but hopefully it gives you an idea of where I’d like to go with this. There’s still a lot of work to do, including better session support and better integration with the Sensorpedia database. Also, besides the syntax still evolving, users can currently write scripts only in Java. Once I integrate support for other languages, I’ll post another blog entry explaining the more technical details of how scripts are written, loaded, and executed.
More than data
There’s a trend occurring right now in which people are interested in collating sensor data and providing a nice Web 2.0 interface to it. I can count several projects besides Sensorpedia, including Sensorbase (CENS), Sensormap (MSR), and Sensor.Networks (Sun). Sensorbase provides a nice interface to construct and query relational tables, while Sensor.Networks provides a nice interface to interact with Sun SPOT devices. Sensorpedia, in contrast, emphasizes a loosely-coupled approach in which “sensors” simply publish data in their own format and register specific URLs.
Although these tools do their job well within their original assumptions, they all lack a general way of interacting with data sources as programmable units (i.e., as computational sensor nodes). For example, Sensorpedia allows users to view sensor data, but does not provide any convenient way to manipulate that data. If a user wants to monitor water levels in a particular geographic area, he or she will need to download the relevant sensor data, write a script to parse that data, and finally execute the relevant computation. Unless the server can restrict which data sources it returns to the user, the user will most likely need to download all the data, wasting time and bandwidth. In addition, these tools provide limited interfaces for interacting with the data and no simple way to construct additional interfaces.
However, imagine if users had access to a programmable substrate and could write scripts for virtual sensor nodes. Users could then write scripts to transform the native sensor format (i.e., whatever Sensorpedia finds on the internet) into a uniform, structured format, enabling a variety of end-user interfaces, including Tables and SQL. In turn, additional scripts could perform analysis on the structured data.
So what would these scripts look like? Each script would be associated with either sensor nodes or specific tags. A user may write a script applicable to all nodes (e.g., one exposing the node ID or URL) or a script applicable only to specific nodes. By associating a script with a tag, all nodes that share that tag would inherit its functionality. This way, all nodes tagged with DASMet would automatically employ the DASMet conversion script. Since these scripts are designed to interact with sensor data, it makes sense to expose an event-driven, data-oriented interface. As scripts generate new data, they may trigger additional scripts, and so on. That way, users will be able to write relatively short scripts that interact with other scripts in a loosely-coupled, flexible manner.
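The node/tag association above can be sketched as a small registry. This is a minimal sketch, assuming scripts are identified by name only. `ScriptRegistry`, its methods, and the script names in the demo are hypothetical. A node’s effective script set is built from three sources: the scripts that apply to every node, the scripts attached to each of its tags, and any scripts attached to that node directly.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical registry: scripts are identified by name only, for illustration.
class ScriptRegistry {
    private final Set<String> globalScripts = new LinkedHashSet<>(); // apply to every node
    private final Map<String, Set<String>> byTag = new HashMap<>();   // tag -> scripts
    private final Map<String, Set<String>> byNode = new HashMap<>();  // node ID -> scripts

    void addGlobal(String script) {
        globalScripts.add(script);
    }

    void addForTag(String tag, String script) {
        byTag.computeIfAbsent(tag, k -> new LinkedHashSet<>()).add(script);
    }

    void addForNode(String nodeId, String script) {
        byNode.computeIfAbsent(nodeId, k -> new LinkedHashSet<>()).add(script);
    }

    // A node inherits the global scripts, the scripts of every tag it carries, and its own.
    Set<String> scriptsFor(String nodeId, Collection<String> tags) {
        Set<String> result = new LinkedHashSet<>(globalScripts);
        for (String tag : tags) {
            result.addAll(byTag.getOrDefault(tag, Set.of()));
        }
        result.addAll(byNode.getOrDefault(nodeId, Set.of()));
        return result;
    }
}

class ScriptRegistryDemo {
    public static void main(String[] args) {
        ScriptRegistry registry = new ScriptRegistry();
        registry.addGlobal("ExposeNodeId");
        registry.addForTag("DASMet", "DasmetConversion");
        registry.addForTag("Tables", "TablesInterface");
        registry.addForNode("tower-7", "CustomAnalysis");
        // A node tagged DASMet and Tables picks up both tag scripts automatically.
        System.out.println(registry.scriptsFor("tower-7", List.of("DASMet", "Tables")));
        // prints [ExposeNodeId, DasmetConversion, TablesInterface, CustomAnalysis]
    }
}
```

Applying a tag to a new data source is then just a matter of passing that tag to `scriptsFor`. No per-node copy of the conversion script is needed.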
Since scripts are associated with specific nodes and tags, there may be many scripts executing at any given time. It’s easy to imagine that most data sources will be associated with a basic conversion script. Many virtual sensor nodes will also be associated with some analysis scripts and one or more interface scripts. Consequently, as Sensorpedia becomes popular, scalability will become an issue. Since these are virtual sensor nodes, it makes sense to explore using virtual machines to provide this scalability. As load increases, the virtual sensor nodes could instantiate themselves on additional physical nodes; as load decreases, they could merge back onto fewer physical nodes.
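The post leaves the placement policy open. One possible policy is first-fit decreasing bin packing, which is my assumption and nothing the post specifies. Virtual nodes are sorted by load and each is placed on the first physical host with room, and a new host is brought up only when nothing fits. Loads are expressed as fractions of one host’s capacity. `NodePlacer` and `place` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical placement policy (first-fit decreasing); the post does not prescribe one.
class NodePlacer {
    // Returns one list of virtual node IDs per physical host; loads are fractions of hostCapacity.
    static List<List<String>> place(Map<String, Double> loads, double hostCapacity) {
        List<String> ids = new ArrayList<>(loads.keySet());
        // Heaviest nodes first, so large nodes are not left without room.
        ids.sort((a, b) -> Double.compare(loads.get(b), loads.get(a)));
        List<List<String>> hosts = new ArrayList<>();
        List<Double> used = new ArrayList<>();
        for (String id : ids) {
            double load = loads.get(id);
            if (load > hostCapacity) {
                throw new IllegalArgumentException(id + " does not fit on a single host");
            }
            int target = -1;
            for (int i = 0; i < hosts.size(); i++) {
                if (used.get(i) + load <= hostCapacity) { // first host with enough room
                    target = i;
                    break;
                }
            }
            if (target < 0) { // no existing host fits: bring up another physical node
                hosts.add(new ArrayList<>());
                used.add(0.0);
                target = hosts.size() - 1;
            }
            hosts.get(target).add(id);
            used.set(target, used.get(target) + load);
        }
        return hosts;
    }
}

class NodePlacerDemo {
    public static void main(String[] args) {
        Map<String, Double> busy = Map.of("a", 0.5, "b", 0.5, "c", 0.25, "d", 0.25);
        Map<String, Double> quiet = Map.of("a", 0.25, "b", 0.25, "c", 0.25, "d", 0.25);
        System.out.println(NodePlacer.place(busy, 1.0).size());  // prints 2 (load grew: spread out)
        System.out.println(NodePlacer.place(quiet, 1.0).size()); // prints 1 (load fell: merge back)
    }
}
```

Re-running `place` as measured loads change gives the grow-and-merge behavior described above. A real system would also have to weigh the cost of migrating a running virtual machine, which this sketch ignores.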
So where do we go from here? Ultimately, my goal is to implement something that resembles the embedded figure (in terms of functionality). Tables should be able to execute over this scripting layer along with other interesting querying interfaces. In the meantime, I still need to design the scripting environment, produce some examples, and implement a scalable execution environment. I plan on posting implementation details and results as time goes on. As always, please feel free to contribute ideas.
Programming tools for Sensorpedia
Besides automatically discovering new sensor data, I’m also interested in what to do after we get the data into Sensorpedia. Right now users interact with Sensorpedia via the web mapping tool. Users type in a search term, which is matched against the title, textual description, or user-supplied tags, and Sensorpedia returns a list of geo-indexed placemarkers. This is pretty useful (and cool to view), but there are limitations.
Unfortunately, the data in Sensorpedia is mostly formatted to be consumed by human viewers. Click on one of the placemarkers and you may find an HTML table, an XML file, or a PNG image of a graph. This can make interacting with the actual data a bit difficult. However, if we can parse the data, then we may be able to do more interesting and dynamic things with it.
What sort of tool might we use? In previous (and ongoing) research, I developed Tables, a spreadsheet-inspired programming tool for sensor networks. Tables supports flexible data querying, local computation that executes on individual sensor nodes, and collective functions that can aggregate data from multiple nodes. I thought that this might be a useful tool for Sensorpedia, so I began porting a basic version.
I began by writing scripts that parse the pages provided by the DASMet weather towers at ORNL (search for “dasmet”). Next, I wrote a Java interface for a virtual “sensor node” that stores this data and implements an interpreter for the Tables spreadsheet language. Finally, I linked the Tables interface up to this virtual sensor network.
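The post doesn’t describe the layout of the DASMet pages, so the following conversion step is only a sketch under an assumed format. It assumes readings appear as two-column HTML table rows such as `<tr><td>Temperature</td><td>61.2 F</td></tr>` and extracts every row whose second cell starts with a number. `TowerPageParser` is a hypothetical name. Regular expressions are brittle for general HTML and are tolerable here only because the layout is assumed fixed.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical conversion script. ASSUMED page format: two-column table rows like
// <tr><td>Temperature</td><td>61.2 F</td></tr> (not confirmed for the real DASMet pages).
class TowerPageParser {
    private static final Pattern ROW = Pattern.compile(
            "<tr>\\s*<td>\\s*([^<]+?)\\s*</td>\\s*<td>\\s*(-?\\d+(?:\\.\\d+)?)[^<]*</td>",
            Pattern.CASE_INSENSITIVE);

    // Returns label -> numeric value, in page order, for rows whose value cell starts with a number.
    static Map<String, Double> parse(String html) {
        Map<String, Double> readings = new LinkedHashMap<>();
        Matcher m = ROW.matcher(html);
        while (m.find()) {
            readings.put(m.group(1), Double.parseDouble(m.group(2)));
        }
        return readings;
    }
}

class TowerPageParserDemo {
    public static void main(String[] args) {
        String page = "<tr><td>Temperature</td><td>61.2 F</td></tr>"
                + "<tr><td>Relative Humidity</td><td>48 %</td></tr>";
        System.out.println(TowerPageParser.parse(page));
        // prints {Temperature=61.2, Relative Humidity=48.0}
    }
}
```

Units and non-numeric rows are simply dropped here. A fuller conversion script would probably keep the units as metadata.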
The first video shows different ways of querying the sensor network using a tool called a “pivot table”. First, the user instructs the virtual sensor network to display the URL associated with each DASMet tower alongside the unique ID of each virtual sensor node. Next, the user constructs a more useful query asking for the Temperature values associated with each tower. The video progresses through several more queries that include additional metadata, visually organized in different ways. The final query shows the user correlating two different types of sensor data.
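The grouping a pivot table performs can be sketched as follows. This is a simplified model I’m assuming for illustration, not Tables’ actual query engine. Readings are flat (node, attribute, value) triples, rows are node IDs, and columns are the attributes the query asks for. `PivotReading`, `Pivot`, and the example URL are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical reading: one value of one attribute reported by one virtual sensor node.
class PivotReading {
    final String nodeId;
    final String attribute;
    final String value;

    PivotReading(String nodeId, String attribute, String value) {
        this.nodeId = nodeId;
        this.attribute = attribute;
        this.value = value;
    }
}

class Pivot {
    // Rows are node IDs (sorted); columns are the requested attributes, in query order.
    static Map<String, Map<String, List<String>>> pivot(List<PivotReading> readings, List<String> columns) {
        Map<String, Map<String, List<String>>> table = new TreeMap<>();
        for (PivotReading r : readings) {
            if (!columns.contains(r.attribute)) {
                continue; // attribute not requested by this query
            }
            Map<String, List<String>> row = table.computeIfAbsent(r.nodeId, k -> {
                Map<String, List<String>> fresh = new LinkedHashMap<>();
                for (String c : columns) {
                    fresh.put(c, new ArrayList<>()); // every requested column, possibly empty
                }
                return fresh;
            });
            row.get(r.attribute).add(r.value);
        }
        return table;
    }
}

class PivotDemo {
    public static void main(String[] args) {
        List<PivotReading> readings = List.of(
                new PivotReading("tower-2", "Temperature", "63"),
                new PivotReading("tower-1", "URL", "http://example.org/tower-1"),
                new PivotReading("tower-1", "Temperature", "58"),
                new PivotReading("tower-1", "Temperature", "61"));
        System.out.println(Pivot.pivot(readings, List.of("Temperature")));
        // prints {tower-1={Temperature=[58, 61]}, tower-2={Temperature=[63]}}
    }
}
```

Asking for `List.of("URL", "Temperature")` instead would put both columns side by side for each tower, roughly like the query that pairs metadata with sensor values.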
In the second video, the user constructs a query to view the Temperature data again. Afterwards, the user writes a function that records whether the sensor node has any Temperature values greater than 60. This function runs on each sensor node and is automatically re-executed whenever the node receives new Temperature values. Now the user can construct a query to view which sensor nodes exceeded the threshold value. Although our example does this immediately (so there are no surprises), we could leave the sensor network running longer to accumulate these values.
Afterwards, the user types in another function to average the Temperature values. Unlike the previous function, this function is typed into the “t = 1” sheet, which makes it collect data from multiple sensor nodes instead of executing over a single node. Whenever a sensor exceeds the Temperature threshold, it contributes data to the average.
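The split between a local function and a collective function can be sketched outside of Tables. This is a minimal sketch under assumptions. It does not model Tables’ sheets or its spreadsheet language. The local flag is simply recomputed whenever a node receives a new value. I also assume the collective average covers every Temperature value from the nodes whose flag is set. `TempNode`, `Collective`, and `TablesDemo` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical virtual sensor node holding Temperature readings.
class TempNode {
    static final double THRESHOLD = 60.0; // the threshold used in the video
    final String id;
    final List<Double> temperatures = new ArrayList<>();
    boolean exceeded; // cached result of the local function

    TempNode(String id) {
        this.id = id;
    }

    // New Temperature data re-runs the local function, mimicking automatic re-execution.
    void addTemperature(double value) {
        temperatures.add(value);
        exceeded = anyAbove(THRESHOLD);
    }

    // Local function: does this node have any Temperature value greater than the threshold?
    boolean anyAbove(double threshold) {
        for (double t : temperatures) {
            if (t > threshold) {
                return true;
            }
        }
        return false;
    }
}

class Collective {
    // Collective function (assumed semantics): average every Temperature value
    // from the nodes whose local flag is set; NaN if no node qualifies.
    static double averageOfExceeding(List<TempNode> nodes) {
        double sum = 0.0;
        int count = 0;
        for (TempNode n : nodes) {
            if (!n.exceeded) {
                continue;
            }
            for (double t : n.temperatures) {
                sum += t;
                count++;
            }
        }
        return count == 0 ? Double.NaN : sum / count;
    }
}

class TablesDemo {
    public static void main(String[] args) {
        TempNode a = new TempNode("a");
        a.addTemperature(58);
        a.addTemperature(64);
        TempNode b = new TempNode("b");
        b.addTemperature(55);
        TempNode c = new TempNode("c");
        c.addTemperature(70);
        System.out.println(Collective.averageOfExceeding(List.of(a, b, c))); // prints 64.0
    }
}
```

Node `b` never exceeds 60, so its reading stays out of the average. Node `a` contributes both of its readings once it has crossed the threshold.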
So we see how, using a combination of pivot tables, local functions, and collective functions, the user can write some interesting and dynamic code that runs over both real and virtual sensor networks. The key, of course, is actually creating the virtual sensor network. Currently, Tables only works with the DASMet towers. In the future, I hope to find a scalable way to get Tables to work over more of the Sensorpedia data. This will be difficult since different data sources present data in different ways, but we may be able to extend the virtual sensor node concept so that users can submit “scripts” for each sensor source that convert the native data format to a more structured format.
Well, that’s enough for this post. As always, feel free to leave suggestions, comments, and results…
Discovering new sensor data
While David has been working hard on Sensorpedia’s infrastructure, I’ve been thinking about different ways to automate the process of identifying, tagging, and extracting sensor data from the internet. This would be handy for two reasons: 1) we wouldn’t have to spend valuable human time on a relatively mundane task, and 2) a sensor crawler would ensure that we discover new sensors as they come online. Overall this is a pretty ambitious task, but to get started I’ve been asking: what is sensor data anyway? For the purpose of this small experiment, I decided that sensor data is any numeric data accompanied by some text that describes it. This is probably too simple a definition, but it will do for now. Using this definition, it should be relatively straightforward to examine the proportion of numeric characters in a document to determine whether a page has “sensor” data.


For the first figure, I took the list of known sensor sources from the Sensorpedia database. The sources were filtered to include only ‘text/html’ and ‘text/plain’ content, to avoid images, video, etc. For each data source, I downloaded the page and graphed the ratio of numeric characters appearing in the main body (excluding any HTML tags and punctuation). For example, if the page contained exactly two characters (an ‘a’ and a ‘5’), the ratio would be 0.5.
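Here is one way to compute that ratio. This is a sketch, and the post doesn’t specify the details. I assume a crude regex strips tags and that only letters and digits are counted, so whitespace and punctuation are ignored. The post also doesn’t say how non-Latin scripts or text inside `<script>` elements were handled. This version counts any Unicode letter or digit and treats script text as ordinary text. `NumericRatio` is a hypothetical name.

```java
// Sketch of the numeric-character ratio (assumptions: a crude regex strips tags,
// and only letters and digits are counted, so whitespace and punctuation are ignored).
class NumericRatio {
    static double ratio(String html) {
        // Replace every tag with a space; this does not special-case <script> or <style> bodies.
        String text = html.replaceAll("<[^>]*>", " ");
        long digits = 0;
        long letters = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (Character.isDigit(cp)) {
                digits++;
            } else if (Character.isLetter(cp)) {
                letters++;
            }
            i += Character.charCount(cp);
        }
        long counted = digits + letters;
        return counted == 0 ? 0.0 : (double) digits / counted;
    }
}

class NumericRatioDemo {
    public static void main(String[] args) {
        System.out.println(NumericRatio.ratio("<p>a, 5!</p>")); // prints 0.5 (the post's example)
        System.out.println(NumericRatio.ratio("Temp: 61.2 F")); // prints 0.375 (3 digits of 8)
    }
}
```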
It’s pretty evident that most of the data sources contained between 30 and 50 percent numeric characters. The only exceptions were the first few sources and the very last source. The first few sources turned out to be PHP files that contained images of sensor graphs instead of alphanumeric content (apparently mislabeled in the Sensorpedia database). The last source supposedly contained 100% numeric data (after punctuation removal). This is a little odd, since most users would have no way of understanding this data, but presumably somebody is publishing it for their own benefit. After removing these two extreme groups, we get an average of about 37%.
For the second figure, I did exactly the same thing, except I substituted the known sensor sources with 2695 random webpages (I wrote a small Ruby crawler to collect them). It’s pretty striking how different the figures are. There appear to be two distinct groups of pages. The great majority of the webpages contained less than 1% numeric data. There’s also a smaller group that contains about 20% numeric data. Oddly enough, many of the pages with 20% numeric data seemed to point to a Japanese website discussing weather data. I can’t read Japanese, so I’m not quite sure what it’s all about. Finally, there’s at least one page with nearly 50% numeric data. Upon closer inspection, that extreme page turned out to be a UPS page containing lots of actual data (see screenshot).
Once I graphed this data, I wanted to know whether a simple threshold test would work to differentiate the two types of webpages. The first threshold I tried was the average numeric ratio of the known sensor data minus one standard deviation. This excludes the random webpages, but also excludes several of the legitimate sensor sources. Using two standard deviations (the lower brown line) still excluded most of the random pages while including all the known sensor data. For a first pass, this test seems to work pretty well!
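The threshold test can be written down in a few lines. This is a sketch under two assumptions the post doesn’t settle: it uses the population (not sample) standard deviation, and it expects the extreme sources to have been removed before the mean is taken. `RatioThreshold` is a hypothetical name. The demo ratios are made up for illustration and are not the measurements behind the figures.

```java
// Sketch of the threshold test: mean of the known sensor ratios minus k standard deviations.
// Assumptions: population (not sample) standard deviation, and outliers already removed.
class RatioThreshold {
    static double mean(double[] xs) {
        double sum = 0.0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    static double stdDev(double[] xs) {
        double m = mean(xs);
        double squares = 0.0;
        for (double x : xs) {
            squares += (x - m) * (x - m);
        }
        return Math.sqrt(squares / xs.length);
    }

    // k = 1 and k = 2 correspond to the one- and two-deviation thresholds.
    static double threshold(double[] knownRatios, double k) {
        return mean(knownRatios) - k * stdDev(knownRatios);
    }

    static boolean looksLikeSensorData(double ratio, double threshold) {
        return ratio >= threshold;
    }
}

class RatioThresholdDemo {
    public static void main(String[] args) {
        // Made-up ratios for illustration only, not the measurements behind the figures.
        double[] known = {0.32, 0.37, 0.41, 0.38, 0.35};
        double cut = RatioThreshold.threshold(known, 2.0);
        for (double candidate : new double[] {0.005, 0.2, 0.48}) {
            System.out.println(candidate + " -> " + RatioThreshold.looksLikeSensorData(candidate, cut));
        }
    }
}
```

Widening from one to two deviations lowers the cut, which trades a few more false positives among random pages for not rejecting legitimate sensor sources.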
There’s still a lot of work to do (e.g., differentiating sensor data from any old table of data), and I haven’t even thought about graphs, images, and video… Until then, please send me suggestions (or better yet, results)!
Introductions
Hello everybody! I’m new here (three weeks in Knoxville!) and I just wanted to introduce myself to the Sensorpedia community. I just started working at Oak Ridge National Lab in the Data Systems Sciences & Engineering group and will be involved with various Sensorpedia-related projects. Before arriving here, I was doing post-doctoral work at the Renaissance Computing Institute, a research institute in Chapel Hill, NC affiliated with UNC. It was fun being a postdoc, and I will miss Franklin St., but I’m pretty stoked about the exciting work happening here at the lab. Before that I was in sunny Albuquerque, NM, where I received my PhD in CS from UNM (with advisor Barney Maccabe). As time goes on, I plan on posting some of my Sensorpedia-related research, results, and ideas here on this blog. Anybody should feel free to comment, give suggestions, and of course collaborate to provide results!
Sensorpedia Sneak Peek
Sensorpedia is still in a limited beta testing phase, but we’re happy to announce a new Sneak Peek at the application. We’d love to hear your thoughts on our progress so far.
The Sneak Peek provides read-only access for non-beta users to search and explore the data currently in Sensorpedia. Contribution of data is currently still limited to beta testers. Sign up for our Sensorpedia mailing list to be notified when we move to open beta. (If you’ve got some really cool data you’d like to interface to Sensorpedia that just can’t wait, please contact us with details.)
Check out the Sensorpedia Sneak Peek!
Here are some sample searches to point you in the right direction to start exploring:
Click the + sign in the search sidebar to add a result to the active layers list and expand it into individual sensor locations.
(Note: some feeds, like the ICAO weather data for the US and some of the buoy data sets, are rather large and take some time to appear when you add them to the active layers list.)
Please send us your feedback on the Sensorpedia Sneak Peek so we can incorporate your suggestions into the final product.
UPDATE (Jul 6, 2010)
We have been incorporating beta user feedback since we released the “sneak peek” and are looking forward to releasing a new version with improved interface and more powerful API this summer! Stay tuned to the blog for all the latest news.
A look back… and forward!
Summer 2009 was a great time for the Sensorpedia program. I’m very thankful for all the new ideas, innovation, and enthusiasm shown by our seven summer interns. We had a great team that was very productive. (And we had a lot of fun too!)
If you haven’t already done so, please take a few minutes to check out the students’ blog posts detailing their efforts on everything from an iPhone application to a Sensorpedia Python library. By the end of the summer we had over 3,000 sensors registered with Sensorpedia, monitoring everything from weather to volcanic activity. The guys had a great impact on improving how we (collectively) share information. Thank you!
The students also gained experience and sharpened skills that will be valuable to them throughout their careers. I have started encouraging interns to read several books as part of their learning experience at the lab. My recommended reading list contains books in categories that I think are important for Computer Science interns. In this list you won’t find the typical CS-related titles, but rather a number of books designed to stretch your thinking a bit. If you are a student or are working in this area, I highly recommend you check out some of these titles. Contact me if you’re interested in learning more about internship opportunities at ORNL.
Looking forward…
Stay tuned for more updates on where we’re heading with Sensorpedia. We’re looking to build on the momentum of this summer as we move soon from private to open beta. Sign up to be notified when we make the switch. We’ll also be announcing here on the blog and on Twitter.
addSensor: A New API-Alternative Sensor Data Interface
The sun has set on my first summer internship with Sensorpedia. What was I involved with during these ten fast weeks? It was a summer filled with learning unexpected topics, using the Sensorpedia API to register new sensors, filming a new Sensorpedia video (and in turn, showing my ability for gaffes), helping Sensorpedia advance through its beta stages, and creating a new application for registering sensor data.

This new application is called addSensor. As the title of this post states, it is a new interface that allows users to register their sensor data with Sensorpedia without needing to learn anything about our Application Programming Interface. My addSensor application takes care of all the required programming tasks behind the scenes for the user. For more detailed information about addSensor (including screenshots and why it is important), see the poster below, which I created for the 2009 ORNL summer student poster session on August 5, 2009. In the remainder of this post, I will make two main points about this application.
First, it is dynamic. I focused on creating an interface that doesn’t overwhelm the user yet has all the features necessary to create full-fledged, data-rich sensor feeds. This was accomplished by making only the minimum required data fields immediately accessible, with the other fields appearing dynamically as the user’s needs dictate. It also has lots of tools (and tooltips) to help the user along the way. For example, a map can appear on request, with a draggable marker that generates the associated latitude and longitude. Various preview options also give the user immediate confirmation that their entries were successful. Smaller interfaces within addSensor, such as calendar and time tools, help round out the overall functionality.
Second, it is a work in progress. There is an old saying in the video production field: “With enough time and money, you can create almost anything.” I’ve found that this saying holds true in computer science as well. Creating all of addSensor’s dynamic options took time, which means that some parts of the application still need attention. However, I believe the application as a whole represents a stride in the right direction toward a great tool for Sensorpedia.
I have many people to thank for my internship. First, thank you to those who wrote letters of recommendation for me regarding this internship. Second, thank you to the organizations that fund Sensorpedia. Thank you also to the organizations that funded my internship (ORNL, ORISE, DOE CCI). Finally, thank you to the people that I worked with at ORNL: the other summer interns, project leader Bryan Gorman, lead Sensorpedia developer David Resseguie, and everyone else around the Sensorpedia lab and the CSED. Your guidance and camaraderie made it a fantastic experience!
A First Iteration
The word “iteration” is used a lot in certain fields like computer science, mathematics and music, where a repetition of steps is often necessary. An iteration is often a new or different version of something. Whether you’re already familiar with this word or not, you know about it in a basic human sense. Take for example the phrases, “A second chance”, “Learn from your mistakes”, and “There’s no substitute for experience”. These all speak to the truth of what new iterations can bring. But before new iterations can come, there has to be a first time. This summer at Sensorpedia there were several first iterations going on. Several of them were first times for me, and some of them were first times for Sensorpedia.
What kind of first times? For me, it was my first time interning in a major research environment. Also, the new addSensor application that I developed underwent its first iteration (although Chris Tomkins-Tinch first created an early predecessor to it). Sensorpedia’s first times included reaching new registered sensor milestones, advancing its main web application, and collaborating with its first group of private beta testers.
The important part about having finally done something for the first time is realizing what the next iteration can hold. The leaders of Sensorpedia are figuring out ways to formalize existing technologies to create a powerful web 2.0 site for sensor data and other types of information sharing. Now that many of their first iterations are underway, their position reminds me of an event that occurred a little more than one hundred years ago.
It was the year 1900 when two brothers from Ohio left their bicycle shop and headed to the breezy dunes of Kitty Hawk, North Carolina. They were thought to be crazy for believing that controlled, powered flight was possible, and even crazier for actually setting out to do it. Wilbur and Orville Wright’s first iteration of their “flying machine” that autumn was a glider. Flown only as a kite not far above the ground, the glider carried Wilbur while he tested wing-warping and other control techniques. This led to follow-up iterations in the summer of 1901 and the fall of 1902. Through persistent experimentation with their gliders, they discovered that controlled flight came down to three keys: pitch, roll, and yaw (the three axes). Armed with that important discovery, the Wright brothers set out the following year on their next iteration. It was then that they added another element to their flying machine: a single-speed, aluminum-cast motor. They paired the motor with a new kind of propeller (inspired by what they had seen on ships) to power the plane. Then, on December 17, 1903, the brothers successfully flew their new machine four times.
Those first controlled, powered flights weren’t much by today’s standards. The longest of the four flights that day was a mere 852 feet, and had we witnessed it, we might have said that “controlled” was stretching it. But after that day, it took less than 66 years to put men on the moon and return them safely. It all started with the Wright brothers’ first-iteration flying machine.
Like the Wright brothers before us, we now look toward the next iterations of Sensorpedia’s projects. What can we learn from this summer’s first iterations?


