
Archive for October, 2009

More than data

Tuesday, October 20th, 2009

There’s a trend right now toward collating sensor data and presenting it through a nice Web 2.0 interface. Besides Sensorpedia, I can count several such projects, including Sensorbase (CENS), Sensormap (MSR), and Sensor.Networks (Sun). Sensorbase provides a nice interface for constructing and querying relational tables, while Sensor.Networks provides a nice interface for interacting with Sun SPOT devices. Sensorpedia, in contrast, emphasizes a loosely coupled approach in which “sensors” simply publish data in their own format and register specific URLs.

Although these tools do their job well within their original assumptions, they all lack a general way of interacting with the data sources as programmable units (i.e., as computational sensor nodes). For example, Sensorpedia allows users to view sensor data, but does not provide any convenient method for manipulating that data. If a user wants to monitor water levels in a particular geographic area, he or she needs to download the relevant sensor data, write a script to parse it, and finally execute the relevant computation. Unless the server can restrict which data sources it returns to the user, the user will most likely need to download all of the data, wasting time and bandwidth. In addition, these tools provide limited interfaces for interacting with the data and no simple way to construct additional interfaces.

[Figure: script architecture]

However, imagine if users had access to a programmable substrate and could write scripts for virtual sensor nodes. Users could then write scripts that transform the native sensor format (i.e., whatever Sensorpedia finds on the internet) into a uniform, structured format, enabling a variety of end-user interfaces, including Tables and SQL. In turn, additional scripts could perform analysis on that structured data.
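As a minimal sketch of what such a conversion script might do, assuming a made-up native format of one `Name: value unit` reading per line, the Python below normalizes two sources into uniform rows and then queries them with ordinary SQL through the standard `sqlite3` module. The function `convert`, the table `readings`, and the sample text are all hypothetical, not part of Sensorpedia.

```python
import sqlite3

# Hypothetical native format: one "Name: value unit" reading per line,
# e.g. "Temperature: 61.2 F". Real Sensorpedia sources vary widely.
def convert(node_id, native_text):
    """Convert native text into uniform (node_id, name, value, unit) rows."""
    rows = []
    for line in native_text.splitlines():
        if ":" not in line:
            continue  # not a reading
        name, rest = line.split(":", 1)
        parts = rest.split()
        if not parts:
            continue  # reading with no value
        try:
            value = float(parts[0])
        except ValueError:
            continue  # non-numeric value, e.g. "Station: offline"
        unit = parts[1] if len(parts) > 1 else ""
        rows.append((node_id, name.strip(), value, unit))
    return rows

# Once every source is converted, ordinary SQL works across all of them.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE readings (node_id TEXT, name TEXT, value REAL, unit TEXT)")
for node_id, text in [("tower-a", "Temperature: 61.2 F\nHumidity: 40 %"),
                      ("tower-b", "Temperature: 55.0 F\nStation: offline")]:
    db.executemany("INSERT INTO readings VALUES (?, ?, ?, ?)", convert(node_id, text))

hot = db.execute(
    "SELECT node_id FROM readings WHERE name = 'Temperature' AND value > 60"
).fetchall()
print(hot)  # [('tower-a',)]
```

Real sources would each need their own converter; the point is only that once everything shares one schema, a single query can span all of them.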

So what would these scripts look like? Each script would be associated with either specific sensor nodes or specific tags. A user may write a script applicable to all nodes (e.g., one exposing the node ID or URL) or a script applicable only to specific nodes. By associating a script with a tag, all nodes that share that tag would inherit its functionality. This way, all nodes tagged with DASMet would automatically employ the DASMet conversion script. Since these scripts are designed to interact with sensor data, it makes sense to expose an event-driven, data-oriented interface: as scripts generate new data, they may trigger additional scripts, and so on. That way, users will be able to write relatively short scripts that interact with other scripts in a loosely coupled, flexible manner.
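The tag-inheritance and event-driven ideas can be sketched in a few lines of Python. Everything here (`ScriptRegistry`, the `(key, value)` event shape, and the Fahrenheit-to-Celsius stand-in for a DASMet conversion script) is a hypothetical illustration, not a design from the post: scripts attach to a node or a tag, a node inherits every script attached to its tags, and any data a script emits is fed back in as a new event for that node.

```python
from collections import defaultdict, deque

class ScriptRegistry:
    """Hypothetical sketch: scripts attach to node IDs or tags and run on new data."""

    def __init__(self):
        self.node_scripts = defaultdict(list)  # node ID -> scripts
        self.tag_scripts = defaultdict(list)   # tag -> scripts
        self.node_tags = defaultdict(set)      # node ID -> tags

    def tag(self, node_id, tag):
        self.node_tags[node_id].add(tag)

    def attach_to_node(self, node_id, script):
        self.node_scripts[node_id].append(script)

    def attach_to_tag(self, tag, script):
        self.tag_scripts[tag].append(script)

    def scripts_for(self, node_id):
        # Node-specific scripts first, then everything inherited through tags.
        scripts = list(self.node_scripts[node_id])
        for t in sorted(self.node_tags[node_id]):
            scripts.extend(self.tag_scripts[t])
        return scripts

    def publish(self, node_id, key, value):
        """Deliver one datum; any (key, value) pairs a script returns are published too."""
        pending = deque([(key, value)])
        while pending:
            k, v = pending.popleft()
            for script in self.scripts_for(node_id):
                pending.extend(script(k, v))


def dasmet_convert(key, value):
    # Hypothetical conversion: raw Fahrenheit text becomes a numeric Celsius reading.
    if key == "raw_temp_f":
        yield ("temp_c", round((float(value) - 32) * 5 / 9, 1))

log = []
def record(key, value):
    # Node-specific script that just logs every event it sees.
    log.append((key, value))
    return []

reg = ScriptRegistry()
reg.tag("tower-1", "dasmet")
reg.attach_to_tag("dasmet", dasmet_convert)  # inherited by every DASMet-tagged node
reg.attach_to_node("tower-1", record)        # this node only
reg.publish("tower-1", "raw_temp_f", "68")
print(log)  # [('raw_temp_f', '68'), ('temp_c', 20.0)]
```

Because outputs are re-published, a script that always responds to its own output would loop indefinitely; a real system would need some guard against such cycles.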

Since scripts are associated with specific nodes and tags, there may be many scripts executing at any given time. It’s easy to imagine that most data sources will be associated with a basic conversion script. Many virtual sensor nodes will also be associated with some analysis scripts and one or more interface scripts. Consequently, as Sensorpedia becomes popular, scalability will become an issue. Since these are virtual sensor nodes, it makes sense to explore using virtual machines to provide this scalability. As load increases, the virtual sensor nodes could instantiate themselves on additional physical nodes. As load decreases, the sensor nodes could merge back onto fewer physical nodes.
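The post leaves the placement policy open. One simple way to decide how many physical hosts the virtual nodes need is bin packing; the sketch below uses the first-fit decreasing heuristic, assuming each virtual node's load and each host's capacity can be expressed as a single number (here, percent of one host). The function name `place` is hypothetical. Re-running it as loads change yields more hosts under heavy load and fewer when load drops.

```python
def place(loads, capacity):
    """First-fit decreasing: pack virtual nodes onto as few hosts as the heuristic finds.

    loads: dict of virtual node ID -> current load (same units as capacity).
    Returns a list of hosts, each a list of virtual node IDs.
    """
    hosts = []      # node IDs placed on each host
    remaining = []  # spare capacity left on each host
    # Place the heaviest nodes first; ties keep their original order.
    for node, load in sorted(loads.items(), key=lambda kv: kv[1], reverse=True):
        if load > capacity:
            raise ValueError(f"{node} alone exceeds host capacity")
        for i, spare in enumerate(remaining):
            if load <= spare:
                hosts[i].append(node)
                remaining[i] -= load
                break
        else:
            # No existing host has room: bring up another one.
            hosts.append([node])
            remaining.append(capacity - load)
    return hosts

busy = place({"a": 60, "b": 50, "c": 40, "d": 30}, capacity=100)
quiet = place({"a": 20, "b": 20, "c": 10, "d": 10}, capacity=100)
print(busy)   # [['a', 'c'], ['b', 'd']]
print(quiet)  # [['a', 'b', 'c', 'd']]
```

Actually migrating the virtual machines between hosts is a separate problem that this sketch ignores.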

So where do we go from here? Ultimately, my goal is to implement something that resembles the embedded figure (in terms of functionality). Tables should be able to execute over this scripting layer, along with other interesting querying interfaces. In the meantime, I still need to design the scripting environment, produce some examples, and implement a scalable execution environment. I plan on posting implementation details and results as time goes on. As always, please feel free to contribute ideas.

Programming tools for Sensorpedia

Sunday, October 11th, 2009

Besides automatically discovering new sensor data, I’m also interested in what to do after we get the data into Sensorpedia. Right now users interact with Sensorpedia via the web mapping tool. Users type in a search term, and Sensorpedia generates a list of geo-indexed placemarkers for the sources whose title, textual description, or user-supplied tags match it. This is pretty useful (and cool to view), but there are limitations.
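To make the search behavior concrete, here is a minimal Python sketch assuming each registered source has a title, description, location, and tag list (hypothetical field names, not Sensorpedia's actual schema). Title and description are matched by case-insensitive substring and tags by case-insensitive exact match; the real tool's matching rules may differ, and the sample entries are made up.

```python
from dataclasses import dataclass, field

@dataclass
class Source:
    # Hypothetical registry entry; field names are illustrative only.
    title: str
    description: str
    lat: float
    lon: float
    tags: list[str] = field(default_factory=list)

def search(sources, term):
    """Return (title, lat, lon) placemarkers for sources matching the term."""
    term = term.lower()
    hits = []
    for s in sources:
        text_match = term in s.title.lower() or term in s.description.lower()
        tag_match = any(term == t.lower() for t in s.tags)
        if text_match or tag_match:
            hits.append((s.title, s.lat, s.lon))
    return hits

sources = [  # made-up sample entries
    Source("Tower A", "Meteorological tower", 35.9, -84.3, ["dasmet", "weather"]),
    Source("Buoy 42", "Ocean buoy", 27.0, -90.0, ["buoy"]),
]
print(search(sources, "dasmet"))  # [('Tower A', 35.9, -84.3)]
```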

Unfortunately, the data in Sensorpedia is mostly formatted for human viewers. Click on one of the placemarkers and you may find an HTML table, an XML file, or a PNG image of a graph. This can make interacting with the actual data a bit difficult. However, if we can parse the data, then we may be able to do more interesting and dynamic things with it.

What sort of tool might we use? In previous (and ongoing) research, I developed Tables, a spreadsheet-inspired programming tool for sensor networks. Tables supports flexible data querying, local computation that executes on individual sensor nodes, and collective functions that aggregate data from multiple nodes. I thought that it might be a useful tool for Sensorpedia, so I began porting a basic version.

I began by writing scripts that parse the pages provided by the DASMet weather towers at ORNL (search for “dasmet”). Next, I wrote a Java interface for a virtual “sensor node” that stores this data and implements an interpreter for the Tables spreadsheet language. Finally, I linked the Tables interface to this virtual sensor network.
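The post doesn't describe the DASMet page layout, so as a hedged illustration of this kind of parsing script, the Python below assumes an HTML table whose first row holds sensor names and whose second row holds the current values. It uses only the standard library's `html.parser.HTMLParser`; `TableParser`, `parse_readings`, and the sample page are hypothetical.

```python
from html.parser import HTMLParser

class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being parsed
        self._cell = None  # text fragments of the cell being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

def parse_readings(html):
    """Assumed layout: first row = sensor names, second row = latest values."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    if len(parser.rows) < 2:
        raise ValueError("expected a header row and a value row")
    names, values = parser.rows[0], parser.rows[1]
    return dict(zip(names, values))

page = ("<table><tr><th>Temperature</th><th>Humidity</th></tr>"
        "<tr><td> 61.2 </td><td><b>40</b></td></tr></table>")
print(parse_readings(page))  # {'Temperature': '61.2', 'Humidity': '40'}
```

Values are left as strings; a conversion step would typically turn them into numbers before handing them to a virtual sensor node.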

The first video shows different ways of querying the sensor network using a tool called a “pivot table”. The user first instructs the virtual sensor network to display the URLs associated with each DASMet tower, along with the unique ID of each virtual sensor node. Next, the user constructs a more useful query asking for the Temperature values associated with each tower. The video progresses through several more queries that include additional metadata, visually organized in different ways. The final query shows the user correlating two different types of sensor data.
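Conceptually, a pivot table groups values by two chosen attributes. The Python sketch below is not how Tables implements it; it only shows the grouping on made-up records, where `node`, `sensor`, and `value` are assumed field names and `pivot` is a hypothetical helper.

```python
from collections import defaultdict

def pivot(records, row_key, col_key, value_key):
    """Group record[value_key] into cells keyed by (record[row_key], record[col_key])."""
    table = defaultdict(lambda: defaultdict(list))
    for record in records:
        table[record[row_key]][record[col_key]].append(record[value_key])
    # Convert to plain dicts so the result prints cleanly.
    return {row: dict(cols) for row, cols in table.items()}

readings = [  # made-up sample readings
    {"node": 1, "sensor": "Temperature", "value": 58.0},
    {"node": 1, "sensor": "Temperature", "value": 61.5},
    {"node": 2, "sensor": "Temperature", "value": 55.2},
    {"node": 2, "sensor": "Humidity", "value": 40.0},
]
print(pivot(readings, "node", "sensor", "value"))
# {1: {'Temperature': [58.0, 61.5]}, 2: {'Temperature': [55.2], 'Humidity': [40.0]}}
```

Swapping the row and column keys, as in `pivot(readings, "sensor", "node", "value")`, regroups the same data with sensors as rows, which is roughly what reorganizing a query visually amounts to.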

In the second video, the user constructs a query to view the Temperature data again. Afterwards, the user writes a function that records whether the sensor node has any Temperature values greater than 60. This function runs on each sensor node and is automatically re-executed whenever the node gets new Temperature values. Now the user can construct a query to view which sensor nodes exceeded the threshold value. Although our example does this immediately (so there are no surprises), we could leave the sensor network running longer to accumulate these values.
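The behavior of such a local function, evaluated on one node and re-executed whenever that node receives new Temperature values, can be sketched as follows. The `VirtualNode` class and its methods are hypothetical stand-ins written in Python, not the Java interface or the Tables language described above.

```python
class VirtualNode:
    """Hypothetical virtual sensor node that reruns local functions when data arrives."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.readings = {}    # sensor name -> list of values
        self.results = {}     # local function name -> latest result
        self._functions = {}  # local function name -> (watched sensor, function)

    def define(self, name, sensor, fn):
        """Install a local function and evaluate it once on the current data."""
        self._functions[name] = (sensor, fn)
        self.results[name] = fn(self.readings.get(sensor, []))

    def add_reading(self, sensor, value):
        self.readings.setdefault(sensor, []).append(value)
        # Re-execute every local function that watches this sensor.
        for name, (watched, fn) in self._functions.items():
            if watched == sensor:
                self.results[name] = fn(self.readings[sensor])

def exceeds_60(values):
    """True if any Temperature value is greater than 60."""
    return any(v > 60 for v in values)

node = VirtualNode("tower-1")
node.define("over_threshold", "Temperature", exceeds_60)
print(node.results["over_threshold"])  # False (no data yet)
node.add_reading("Temperature", 58.0)
print(node.results["over_threshold"])  # False
node.add_reading("Temperature", 63.5)
print(node.results["over_threshold"])  # True
```

A query for which nodes exceeded the threshold then reduces to something like `[n.node_id for n in nodes if n.results["over_threshold"]]`.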

Afterwards, the user types in another function to average the Temperature values. Unlike the previous function, this function is typed into the “t = 1” sheet, so it collects data from multiple sensor nodes instead of executing over a single node. Whenever a sensor exceeds the Temperature threshold, it contributes data to the average.
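A collective function, by contrast, reads data from many nodes. As a minimal Python sketch, assuming each node's Temperature readings are available as a list and that a node contributes all of its readings once any of them exceeds 60 (the post doesn't say exactly which values contribute), the average could be computed like this; `collective_average` and the sample data are hypothetical.

```python
from statistics import mean

def collective_average(nodes, threshold=60):
    """Average the Temperature readings of every node that has exceeded the threshold.

    nodes: dict of node ID -> list of Temperature readings (hypothetical layout).
    Returns None when no node qualifies.
    """
    contributing = [v for values in nodes.values()
                    if any(x > threshold for x in values)
                    for v in values]
    return mean(contributing) if contributing else None

nodes = {"tower-1": [58.0, 66.0], "tower-2": [55.0], "tower-3": [62.0]}
print(collective_average(nodes))  # 62.0 (tower-2 never exceeded 60, so it is left out)
```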

So we see how, using a combination of pivot tables, local functions, and collective functions, the user can write some interesting and dynamic code that runs over both real and virtual sensor networks. The key, of course, is actually creating the virtual sensor network. Currently Tables works only with the DASMet towers. In the future, I hope to find a scalable way to get Tables to work over more of the Sensorpedia data. This will be difficult since different data sources present data in different ways, but we may be able to extend the virtual sensor node concept so that users can submit “scripts” for each sensor source that convert the native data format to a more structured format.

Well, that’s enough for this post. As always, feel free to leave suggestions, comments, and results…

Discovering new sensor data

Friday, October 9th, 2009

While David has been working hard on Sensorpedia’s infrastructure, I’ve been thinking about different ways to automate the process of identifying, tagging, and extracting sensor data from the internet. This would be handy for several reasons: 1) we wouldn’t have to spend valuable human time performing a relatively mundane task, and 2) a sensor crawler would ensure that we discover new sensors as they come online. Overall this is a pretty ambitious task, but to get started I’ve been asking: what is sensor data anyway? For the purpose of this small experiment, I decided that sensor data is any numeric data accompanied by some textual elements that describe that data. This is probably too simple a definition, but it will do for now. Using this definition, it should be relatively straightforward to examine the proportion of numeric characters in a document to determine whether a page has “sensor” data.

[Figures: known sensor data with static threshold; random data with static threshold]

In the first figure, I took the list of known sensor sources from the Sensorpedia database. The sources were filtered to include only ‘text/html’ and ‘text/plain’ content, to avoid images, video, etc. For each data source, I downloaded the page and graphed the ratio of numeric characters in the main body (excluding any HTML tags and punctuation). For example, if the page contained exactly two characters (an ‘a’ and a ‘5’), then the ratio would be 0.5.
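The ratio computation can be sketched in Python with the standard `html.parser` module. This is an assumption-laden reconstruction, not the code behind the figures: it skips `<script>` and `<style>` contents, removes ASCII punctuation, and also drops whitespace (consistent with the two-character example above), then divides the digit count by the remaining character count. `TextExtractor` and `numeric_ratio` are hypothetical names.

```python
import string
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect page text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # >0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)

def numeric_ratio(html):
    """Fraction of non-punctuation, non-whitespace characters that are ASCII digits."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    text = "".join(extractor.parts)
    # Assumption: whitespace is excluded along with punctuation.
    chars = [c for c in text if c not in string.punctuation and not c.isspace()]
    if not chars:
        return 0.0
    digits = sum(c in string.digits for c in chars)
    return digits / len(chars)

print(numeric_ratio("<p>a5</p>"))  # 0.5
```

Note that `string.punctuation` only covers ASCII, so punctuation from other writing systems would still count as non-numeric characters.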

It’s pretty evident that most of the data sources contained between 30 and 50 percent numeric characters. The only exceptions were the first few sources and the very last source. The first few sources turned out to be PHP files that contained images of sensor graphs instead of alphanumeric content (apparently mislabeled in the Sensorpedia database). The last source supposedly contained 100% numeric data (after punctuation removal). This is a little weird since most users would have no way of understanding this data, but presumably somebody is publishing it for their own benefit. After removing these two extreme groups, we get an average of about 37%.

As for the second figure, I did exactly the same thing, except that I substituted the known sensor sources with 2695 random webpages (I wrote a small Ruby crawler to collect them). It’s pretty striking how different the figures are. There appear to be two distinct groups of pages. The great majority of the webpages contained less than 1% numeric data, and a smaller group contained about 20% numeric data. Oddly enough, many of the pages with 20% numeric data seemed to point to a Japanese website discussing weather data. I can’t read Japanese, so I’m not quite sure what it’s all about. Finally, there’s at least one page with nearly 50% numeric data. On closer inspection, that extreme page turned out to be a UPS page that contained lots of actual data (see screenshot).

[Screenshot: random page with lots of numeric data]
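The original crawler was written in Ruby and isn't shown. As a rough Python equivalent, here is a breadth-first crawler built on the standard library (`urllib`, `html.parser`); the content-type filter mirrors the ‘text/html’/‘text/plain’ filter applied to the known sources, while the page limit and the names `LinkExtractor`, `http_fetch`, and `crawl` are hypothetical. Breadth-first crawling from a few seeds is not a truly random sample of the web, so how the 2695 pages were actually chosen may differ.

```python
import http.client
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen

class LinkExtractor(HTMLParser):
    """Collect absolute http(s) links from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links and drop "#fragment" parts.
            url, _fragment = urldefrag(urljoin(self.base_url, href))
            if urlparse(url).scheme in ("http", "https"):
                self.links.append(url)

def http_fetch(url, timeout=10):
    """Return page text for text/html or text/plain responses, else None."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            if resp.headers.get_content_type() not in ("text/html", "text/plain"):
                return None
            charset = resp.headers.get_content_charset() or "utf-8"
            return resp.read().decode(charset, errors="replace")
    except (OSError, ValueError, LookupError, http.client.HTTPException):
        return None  # network error, bad URL, unknown charset, or broken response

def crawl(seeds, fetch=http_fetch, max_pages=100):
    """Breadth-first crawl from seed URLs; returns {url: page text}."""
    seen = set(seeds)
    queue = deque(seeds)
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        text = fetch(url)
        if text is None:
            continue
        pages[url] = text
        extractor = LinkExtractor(url)
        extractor.feed(text)
        for link in extractor.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

Passing a dictionary's `get` method as `fetch` lets the crawl logic be exercised offline without touching the network.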

Once I graphed this data, I wanted to know whether a simple threshold test would differentiate the two types of webpages. The threshold I used was the average numeric ratio of the known sensor data minus one standard deviation. This excludes the random webpages, but also excludes several of the legitimate sensor sources. Using two standard deviations (the lower brown line) still excluded most of the random pages while including all of the known sensor data. For a first pass, this test seems to work pretty well!
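The threshold test itself is only a couple of lines. In this Python sketch the ratios in `known` are illustrative values, not the measured data, and `statistics.stdev` computes the sample standard deviation; the post doesn't say whether the sample or population form was used. `make_threshold` and `looks_like_sensor_data` are hypothetical names.

```python
from statistics import mean, stdev

def make_threshold(known_ratios, k):
    """Mean numeric ratio of known sensor pages minus k sample standard deviations."""
    return mean(known_ratios) - k * stdev(known_ratios)

def looks_like_sensor_data(ratio, threshold):
    """Classify a page as 'sensor data' if its numeric ratio reaches the threshold."""
    return ratio >= threshold

known = [0.30, 0.35, 0.40, 0.45]  # illustrative ratios, not the measured ones
one_sd = make_threshold(known, 1)  # about 0.310
two_sd = make_threshold(known, 2)  # about 0.246
for ratio in (0.005, 0.20, 0.30, 0.40):
    print(ratio, looks_like_sensor_data(ratio, one_sd), looks_like_sensor_data(ratio, two_sd))
```

With these made-up values, a page at 0.30 fails the one-deviation threshold but passes the two-deviation one, mirroring the trade-off described above.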

There’s still a lot of work to do (e.g., differentiating sensor data from any old table of data), and I haven’t even thought about graphs, images, and video… In the meantime, please send me suggestions (or better yet, results)!

Introductions

Friday, October 9th, 2009

Hello everybody! I’m new here (three weeks in Knoxville!) and just wanted to introduce myself to the Sensorpedia community. I just started working at Oak Ridge National Lab in the Data Systems Sciences & Engineering group and will be involved with various Sensorpedia-related projects. Before arriving here, I was doing post-doctoral work at the Renaissance Computing Institute, a research institute in Chapel Hill, NC, affiliated with UNC. It was fun being a postdoc, and I will miss Franklin St., but I’m pretty stoked about the exciting work happening here at the lab. Before then I was in sunny Albuquerque, NM, where I received my PhD in CS from UNM (with advisor Barney Maccabe). As time goes on, I plan on posting some of my Sensorpedia-related research, results, and ideas here on this blog. Anybody should feel free to comment, give suggestions, and of course collaborate to provide results!

Sensorpedia Sneak Peek

Thursday, October 1st, 2009

Sensorpedia is still in a limited beta testing phase, but we’re happy to announce a new Sneak Peek at the application. We’d love to hear your thoughts on our progress so far.

The Sneak Peek provides read-only access for non-beta users to search and explore the data currently in Sensorpedia. Contribution of data is currently still limited to beta testers. Sign up for our Sensorpedia mailing list to be notified when we move to open beta. (If you’ve got some really cool data you’d like to interface to Sensorpedia that just can’t wait, please contact us with details.)

Check out the Sensorpedia Sneak Peek!

Here are some sample searches to point you in the right direction to start exploring:

Click the + sign next to a search result in the sidebar to add it to the active layers list and expand it into individual sensor locations.

(Note: some feeds, like the ICAO weather data for the US and some of the buoy data sets, are rather large and take some time to appear when you add them to the active layers list.)

Please send us your feedback on the Sensorpedia Sneak Peek so we can incorporate your suggestions into the final product.

UPDATE (Jul 6, 2010)
We have been incorporating beta user feedback since we released the “sneak peek” and are looking forward to releasing a new version with improved interface and more powerful API this summer! Stay tuned to the blog for all the latest news.