Wednesday 14 January 2015

Finding the hash of a web page using Python

I wanted to detect whether a web page had changed. Comparing hashes of the page contents is a common way to do this: if the hash differs from the one computed earlier, the contents have changed. In Python, it turned out to take just four lines:

In [1]: import requests
In [2]: import hashlib
In [3]: page_contents=requests.get('http://www.google.com')
In [4]: hashlib.sha256(page_contents.text.encode('utf-8')).hexdigest()
Out[4]: 'bb5b5872d83f2f9a89912630b27dd0af145727b818f757ea86b4d6a09cadeb32'

Source (Stack Overflow): http://stackoverflow.com/questions/17159609/create-a-checksum-for-a-fetched-webpage
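
To actually detect a change, the hash has to be stored and compared against the hash of a later fetch. Here is a minimal sketch of that comparison. The helper names page_hash and has_changed are my own for illustration, and the page bodies are hard-coded stand-ins so the snippet runs without a network connection; in practice they would probably come from something like requests.get(url).content.

```python
import hashlib


def page_hash(body: bytes) -> str:
    """Return the SHA-256 hex digest of a page body given as raw bytes."""
    return hashlib.sha256(body).hexdigest()


def has_changed(previous_hash: str, body: bytes) -> bool:
    """Compare a stored digest against the digest of a freshly fetched body."""
    return page_hash(body) != previous_hash


# Stand-in bodies (assumption): real code would fetch these over HTTP.
old_body = b"<html><body>Hello</body></html>"
new_body = b"<html><body>Hello, world</body></html>"

stored = page_hash(old_body)
print(has_changed(stored, old_body))  # False: same bytes, same digest
print(has_changed(stored, new_body))  # True: the body differs
```

One caveat: pages that embed per-request values (timestamps, tokens and so on) may produce a different hash on every fetch even when the visible content is the same, so this works best on static pages or on an extracted portion of the page.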
