AJAXWorld News Desk Can RSS & XML Help Us Build the Data Web?
Let's consider the most-seen AJAX powered mashup: modifications of map sites, and adding real estate pictures
By: Ric Hardacre
Jan. 8, 2008 04:15 AM
Of course one could spend all day typing random questions into a search engine, but the serious business of research lies in statistical analysis: comparing datasets for trends. One obvious example is comparing the performance of various stock markets around the globe over time; this happens so frequently that it's quite a simple task online. So is comparing currencies' performance against each other. But what if you wanted to do both? What if you wanted to compare the yen against the London Stock Exchange's closing index? At first glance that might seem like a nonsensical comparison, but in financial research trends are key, and trends can't be found without comparing datasets, no matter how far and wide you have to look. Unfortunately most, if not all, of the data on the Web is embedded in Web pages, and most often presented only in graphical form. There's just no way of looking at a chart of something over time and reliably retrieving the underlying data, at least not across the board.

Now let's step back a moment and look at another existing Web technology: RSS (or Atom if you are so inclined). Really Simple Syndication is a way for Web sites to publish a [...]

Over the past few years RSS has caught on in a big way. It's simple to write your own RSS parser and create a news ticker that watches your favorite Web sites, for example. Browser plug-ins have given way to native support, and suddenly the Web started to feel just that little bit more integrated. RSS foretold the Web 2.0 age and should not be overlooked. Before there were JavaScript APIs coming out of the woodwork there were RSS aggregators that built virtual Web sites out of the parts of other Web sites.

Now let's consider the most-seen AJAX-powered mashup: modifications of map sites, adding real estate pictures and locations to a map, for example. This sort of thing would be a lot easier and more accessible if the real estate agents published an RSS-like feed of properties, along with their GPS coordinates and prices.
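As a rough illustration of what consuming such a property feed might look like, here is a minimal Python sketch. The feed layout, the element names (feed, item, price, coordinates), and the listings themselves are made-up assumptions for this example; no real estate agent publishes this format.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed: element names, attributes and listings are
# illustrative placeholders, not an existing standard or real data.
FEED = """<feed title="Example Estates listings">
  <item>
    <title>Two-bedroom flat</title>
    <price currency="USD">250000</price>
    <coordinates type="GPS">51.5074,-0.1278</coordinates>
  </item>
  <item>
    <title>Family house</title>
    <price currency="USD">180000</price>
    <coordinates type="GPS">40.7128,-74.0060</coordinates>
  </item>
</feed>"""


def parse_listings(xml_text):
    """Return one dict per <item> with title, price, currency, lat and lon."""
    listings = []
    for item in ET.fromstring(xml_text).iter("item"):
        price = item.find("price")
        coords = item.find("coordinates")
        # Coordinates are assumed to be "latitude,longitude" in decimal degrees.
        lat, lon = (float(part) for part in coords.text.split(","))
        listings.append({
            "title": item.findtext("title"),
            "price": float(price.text),
            "currency": price.get("currency"),
            "lat": lat,
            "lon": lon,
        })
    return listings


listings = parse_listings(FEED)
# Because price is a typed field, sorting needs no knowledge of the publisher.
by_price = sorted(listings, key=lambda listing: listing["price"])
for listing in by_price:
    print(listing["title"], listing["price"], listing["currency"])
```

Because the price and the coordinates are typed, the same parsed records could feed either a map marker layer or a price-sorted table without any publisher-specific code.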
Even in this limited scope the possibilities are endless: a burger chain could publish the locations of its restaurants, or news bulletins could come with markers attached. Planes, trains, and - well, possibly - automobiles could be tracked and tacked onto maps. Want to see where the roadworks are on your journey? Just import the highway authority's official feed of roadworks into any mapping site or software of your choosing. This problem has already been solved to a certain extent, of course, by Google Earth, but its scope is limited by its narrow focus on coordinates alone.

Wouldn't it be great if a feed could contain both property locations and prices, and your browser could detect the presence of both? The browser itself would then let you tack them onto a map, or sort them by price in a spreadsheet. This too has limits. I doubt "number of bedrooms" could be generalized into a universal datatype, but price (currency="USD") and coordinates (type="GPS") are easily comparable and transposable across differing datasets. The next step would be to merge the property and burger chain feeds, selecting only those houses within x miles of the nearest drive-thru.

So let's get back to our original data problem. This time, suppose I want to compare the populations of London, New York, and Tokyo over the past century. Sounds simple enough, and a few minutes of Web searching yielded a handy "Inner London" set of census data I could use, which was encouraging. New York took a few minutes longer before it too yielded data. Tokyo, however, proved too stubborn and I gave up, having only managed to get data for a handful of random years: 2000, 2003, and 1960, hardly an extensive dataset. I got relatively good data for London and New York, one set of figures for each decade. One, however, was embedded in an HTML table and the other in a text file, formatted and aligned using spaces.
The next step is to copy and paste each snippet of data, cell by cell, into a spreadsheet and finally run the graph wizard. Then you can behold the wonder as Inner London's population stays essentially still while New York's stealthily overtakes it. Now I could be accused at this point of serial laziness, but it just seems like a lot of work, especially the extensive search engine hammering. Suppose that each site had published a standard formatted set of data, and I could run a dataset search to pull it back. Then an aggregator could look at the data and splice it together. Sounds farfetched, but is it really?

Let's now pull RSS back into this discussion and suppose we apply a similar principle to this data too. First, give the dataset a header including its title and source, then add each column of data with a strict set of criteria defining its type and formatting. This way a piece of aggregator software would look at our population sets and see two columns in each, one for the year and one for the population. Population is just a number, and in the interest of keeping it simple our columns could probably be just "year" and "number," leaving it up to the column title to say what the number is, well, counting.

Next the aggregator would look across the datasets and, recognizing that both have a major axis of "year," instantly know that they can be combined, even if the years don't match exactly (British population counts are done in years ending in 1, not 0, for example). Provided the definition of "year" is fixed and the numbers representing the year are always four digits, there can never be any ambiguity between different datasets. Now the aggregator would see that both have a second column of type number, and even though it doesn't know what the number pertains to, it would notice that their lowest and highest values fall into a similar range; hardly rocket science. The result is the ability to graph them both together.
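The aggregator steps just described can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the dictionary layout, the function names, the "overlapping range" test for similarity, and the figures for "City A" and "City B" are all invented for illustration and are not real census data.

```python
# Each dataset carries a title, its column types, and rows of (axis, value).
# All figures below are made-up placeholders.
city_a = {
    "title": "City A population",
    "columns": ["year", "number"],
    "rows": [(1901, 100), (1911, 100), (1921, 100)],  # counted in years ending in 1
}
city_b = {
    "title": "City B population",
    "columns": ["year", "number"],
    "rows": [(1900, 80), (1910, 100), (1920, 120)],  # counted in years ending in 0
}


def has_valid_years(dataset):
    """Check the fixed definition of "year": always a four-digit integer."""
    return all(isinstance(y, int) and 1000 <= y <= 9999 for y, _ in dataset["rows"])


def can_share_scale(a, b):
    """Assumed heuristic: same value type and overlapping [lowest, highest] ranges."""
    if a["columns"][1] != b["columns"][1]:
        return False
    values_a = [value for _, value in a["rows"]]
    values_b = [value for _, value in b["rows"]]
    return min(values_a) <= max(values_b) and min(values_b) <= max(values_a)


def merge_on_axis(a, b):
    """Combine two datasets sharing a major axis into (axis, a_value, b_value) rows."""
    if a["columns"][0] != b["columns"][0]:
        raise ValueError("datasets do not share a major axis type")
    rows_a, rows_b = dict(a["rows"]), dict(b["rows"])
    # Years need not match exactly: a missing value is simply None.
    return [(key, rows_a.get(key), rows_b.get(key))
            for key in sorted(rows_a.keys() | rows_b.keys())]


merged = merge_on_axis(city_a, city_b)
same_scale = can_share_scale(city_a, city_b)
for year, a_value, b_value in merged:
    print(year, a_value, b_value)
```

The merged rows keep both series on one year axis with gaps where a year is missing, which is the form a plotting tool needs to draw two lines on the same chart.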
Even if we were comparing two datasets with differing second columns then, as we do manually, we would overlay them both, scaled to facilitate easy comparison, with a label to say what units each is in. And here's a crude example, thrown together in minutes:

<?xml version="1.0" encoding="utf-8" ?>

Basically what we have here is a combination of technologies: the XML is designed to reflect RSS and to bear more than a passing resemblance to an HTML table. The latter means that any programmer familiar with browser DOM scripting can easily parse it too. It takes a different approach from a SOAP dataset in that the datatypes and column names are listed in a header block while the data table structure is fixed. It's also more lightweight than a serialized recordset but still contains the important details about the data we're publishing. Because it's a straight table with defined columns we can use it to publish serial data, as above, or discrete data, such as the list of burger restaurants and their locations. Finally, because it's simple XML it's extensible: rows or cells could have individual notes attached. Indeed, notice that in 1941 World War II was going on, so we only have an estimate to work with.

Suddenly comparing global temperature to the numbers of sea-borne pirates over time becomes so much easier - and though I'm only speculating out loud here, I for one would welcome such a standard. Imagine what this would do for business practice in general, and democracy at large, if the de facto standard of openness was to publish on one's Web site a syndication of data: incomes, expenditure, tax, contracts - all there for the public to parse, analyze, and compare at the click of a button.

Wait a minute, I hear you say, what about ODF spreadsheets? Yes, they're open, standard, and already XML formatted.
But suggesting that a data syndication format would be pointless because you could already download a spreadsheet and open it in a spreadsheet application is the same as asking of RSS: Why not just visit the Web site and check for new news items yourself? So ask yourself this instead: If that had remained the attitude for all this time, would we have a Web 2.0?
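To make the header-plus-table idea discussed in this article a little more concrete, here is one possible shape for such a dataset and a short Python parser for it. Everything in it is an assumption made for illustration: the element names (dataset, header, column, table, tr, td), the note attribute, the example.com source, and the placeholder figures are not the author's format or real data.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: column names and types live in the header,
# while the table itself is a fixed grid of rows and cells.
DATASET = """<dataset>
  <header>
    <title>Example city population</title>
    <source>http://example.com/census</source>
    <column type="year">Year</column>
    <column type="number">Population</column>
  </header>
  <table>
    <tr><td>1931</td><td>100</td></tr>
    <tr><td>1941</td><td note="estimate">100</td></tr>
    <tr><td>1951</td><td>110</td></tr>
  </table>
</dataset>"""

# How each declared column type is turned into a Python value.
CONVERTERS = {"year": int, "number": float}


def parse_dataset(xml_text):
    """Parse the hypothetical format into columns, typed rows, and cell notes."""
    root = ET.fromstring(xml_text)
    header = root.find("header")
    columns = [(col.text, col.get("type")) for col in header.findall("column")]
    rows, notes = [], {}
    for row_index, tr in enumerate(root.find("table").findall("tr")):
        cells = tr.findall("td")
        rows.append(tuple(
            CONVERTERS[col_type](td.text)
            for (_, col_type), td in zip(columns, cells)
        ))
        # Optional per-cell notes are kept separately, keyed by (row, column).
        for col_index, td in enumerate(cells):
            if td.get("note"):
                notes[(row_index, col_index)] = td.get("note")
    return {
        "title": header.findtext("title"),
        "source": header.findtext("source"),
        "columns": columns,
        "rows": rows,
        "notes": notes,
    }


parsed = parse_dataset(DATASET)
print(parsed["columns"])
print(parsed["rows"])
print(parsed["notes"])
```

Keeping the type declarations in the header means a consumer can convert and compare every cell without inspecting the column titles, while a note on an individual cell travels with the data instead of living in a footnote.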