Using Yahoo Query Language (YQL) to Monitor a Web Page

27 / Oct / 2012 by raj 0 comments

I keep on experimenting on new things. So this time I thought of experimenting on something which I can use in my daily routine. I regularly visit a deals webpage. So, I thought of making an android mobile app for my android device that can track the deals webpage and notifies me whenever a new deal is added.

The deals website does not provide any api or service by which I can accomplish this. The only way to accomplish this task was to scrap the deals page and monitor it for changes.(I know scraping of other’s webpage is illegal, but its OK if we are doing it for the purpose of learning.)

One possible solution was to use a server side script that scraps the web page and stores its hash. So, in subsequent scraps, if the hash changes, that means the web page content has been changed. If it is so, it returns true otherwise it returns false. We periodically call the script from the phonegap app using jquery ajax call. Thus, when true is returned, it notifies me of the change in web page content and hence I get to know that a new deal is posted. The only problem with this approach is, We need to have extra resources like a server of our own where our script is hosted.

The other solution is to use Yahoo Query Language(YQL). By this solution, we will neither need server of our own nor the script. We just need to write a Yahoo Query and pass it along with the url provided by Yahoo. It is executed on Yahoo server and result(which is the complete html of the page in json format) is returned. In my case, I have created Yahoo Query from the javascript code. The query returns the deals web page content in json format(we can configure it to return result in other formats like xml also). We can then calculate its hash using javascript and store it in phone’s local storage. In subsequent scraps we just need to compare the already stored hash with the newly calculated hash to know if the content has been changed or not. Quite Simple! Let’s see how to implement it in code.

        function poll(url) {
            var yql = 'http://query.yahooapis.com/v1/public/yql?q=' + encodeURIComponent('select * from html where url="' + url + '"') + '&format=json&callback=?';
            jQuery.getJSON(yql, function (data) {
                if (data) {
                    var newHashCode = JSON.stringify(data.query.results.body).hashCode(); 
                    //implementation of hashCode method is shown later
                    var oldHashCode = window.localStorage.getItem("dataHash");
                    if (oldHashCode) {
                        if (!(oldHashCode == newHashCode)) {
                            alert(url + " changed!");
                        }
                    }
                    window.localStorage.setItem("dataHash", newHashCode);
                }
            });
        }

Here, we pass the url of the web page which we want to monitor to the poll() function.
The variable yql stores the entire url which contains the address of yahoo page which will be hit and our yahoo query containing the url of webpage which we want to monitor. YQL has a simple MySQL query like syntax.

'select * from html where url="' + url + '"'

means select the entire html page of specified url. We can be as specific as we want in selecting page content.

Here is the implementation of hashCode method. The following code will inject hashCode method to every string. You can find its explanation here.

String.prototype.hashCode = function () {
            var hash = 0;
            if (this.length == 0) return hash;
            for (i = 0; i < this.length; i++) {
                ch = this.charCodeAt(i);
                hash = ((hash << 5) - hash) + ch;
                hash = hash & hash;
            }
            return hash;
};

Final step is to call the poll() function periodically to test for changes in webpage.

$(function () {
            document.addEventListener("deviceready", onDeviceReady, true);
});

function onDeviceReady() {
            setInterval("poll('www.eample.com')", 300000);
}

Here we are calling the poll() function every 5 minutes and passing it the url to be monitored.
I just used one small query of YQL. It has a lot more to offer. You can explore it in more detail here.
You can see the content returned from a particular url by writing your query at YQL Console.

I found YQL quite interesting. Hope you will find it interesting too.

Regards

Raj Gupta
raj.gupta@intelligrape.com
@rajdgreat007

FOUND THIS USEFUL? SHARE IT

Leave a comment -