Apr
25

Access Deleted Web Pages with the Google Cache and the Internet Archive

Has this ever happened to you? You enter search keywords in Google for a very specific topic. On the results page, you see the title of the perfect article with exactly what you were seeking. Hopeful, you click the link and receive a 404 error saying that the page does not exist. Sadly, this happens to everyone sooner or later. Fortunately, there are two ways to view these once-accessible pages.

 

Google Cache

One of the features that sets Google apart from other search engines is the Google Cache. As the Googlebot indexes web pages into the central database, it also saves the HTML portion of each page, which is basically the text and layout without the pictures. When searching in Google, you've probably noticed the "Cached" link next to a result.

[Image: access_deleted.gif]

If you haven't clicked that link before, try it. You will be directed to the version of that web page that the Googlebot saved the last time it cached it. This is the first method to try when you can't load the actual page.

Google Cache Hacks

Some people like to "hack" the Google cache to pull up the cached copy of any page they choose. This is relatively easy to do once you look at the URL of a Google cached page. This is the URL of my website's cache:

CODE:
  http://64.233.187.104/search?q=cache:jQJ-k3RK1wMJ:www.hackernotcracker.com/+hacker+not+cracker&hl=en&ct=clnk&cd=1

It’s pretty easy to decipher the URL. The "64.233.187.104" is just an IP address for "google.com." The "search?" means that the rest of the URL passes parameters to the search application. The "q" is the variable for the query, or request. The "cache:" prefix tells the search application that you want the cached version of a web page. The text after "cache:" is a short encoded identifier ("jQJ-k3RK1wMJ"), followed by the address of the original page and the search terms joined with plus signs.
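To make the breakdown above concrete, here is a minimal sketch that pulls those pieces out of a cache URL with the standard `URL` class. The function name `parseCacheUrl` and the field names it returns are made up for illustration, and it assumes the layout seen in the example URL (a "cache:" prefix, then the target, then plus-separated search terms).

```javascript
// Hypothetical helper: split a Google cache URL into the parts described above.
// Assumes the layout of the example URL; other cache URLs may differ.
function parseCacheUrl(cacheUrl) {
  const parsed = new URL(cacheUrl);
  // searchParams decodes the query string, so "+" comes back as a space.
  const query = parsed.searchParams.get("q") || "";
  if (!query.startsWith("cache:")) {
    return null; // not a cache request
  }
  // Everything after "cache:" up to the first space is the cached target;
  // the remaining words are the search terms.
  const [target, ...terms] = query.slice("cache:".length).split(" ");
  return {
    host: parsed.hostname,        // e.g. the Google IP address
    application: parsed.pathname, // e.g. "/search"
    target: target,               // identifier plus original page address
    terms: terms.filter((t) => t !== ""),
  };
}
```

Calling it on the URL above should give the host "64.233.187.104", the application "/search", and the three search terms "hacker", "not" and "cracker".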

If we take the information from the original URL above, we can make our own customized URL for any page. Use this:

http://www.google.com/search?q=cache:URL

Just replace "URL" with the URL of the page that you want to view in its cached version. You can even create your own Google Cache Generator like this:

Enter the URL of the Page that You Want to See Cached:

  

HTML:
  <!-- On submit, send the browser to the Google cache URL for the typed address. -->
  <form name="cache_example_form" onsubmit="window.location.href = 'http://www.google.com/search?q=cache:' + encodeURIComponent(document.getElementById('cache_example').value); return false;">
  <input type="text" name="cache_example" id="cache_example"> <input type="submit" value="Cache It!">
  </form>
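If you would rather build the link in a script than in a form, the same idea fits in a small function. This is only a sketch: the name `buildCacheUrl` is made up for illustration, and encoding the page address with `encodeURIComponent` is my own precaution (so that characters like "&" in the address don't cut the query short), not something the "cache:" trick itself requires.

```javascript
// Hypothetical helper: build a Google cache URL for any page address,
// following the pattern http://www.google.com/search?q=cache:URL
function buildCacheUrl(pageUrl) {
  const trimmed = pageUrl.trim();
  if (trimmed === "") {
    throw new Error("A page URL is required");
  }
  // Encode the address so "&", "?" and "#" inside it stay part of the q value.
  return "http://www.google.com/search?q=cache:" + encodeURIComponent(trimmed);
}

// Example: print the cache link for this site.
console.log(buildCacheUrl("www.hackernotcracker.com"));
```

The example prints http://www.google.com/search?q=cache:www.hackernotcracker.com, which you can paste straight into the address bar.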

Though most pages are cached, it is practically impossible for every page on the Internet to be included. Google only saves the pages that it crawls. If a page does not show up in Google's search results, it will not be in the Google cache either.


The Internet Archive

An alternative to the Google Cache is The Internet Archive. The Internet Archive is a more extensive database of old web pages. With the Google Cache, newer copies of a page overwrite older ones. With The Internet Archive, however, the crawler keeps every copy that it archives. Sometimes it even retains the pictures and other content. The only drawback is that its crawler archives fewer pages than the Google Cache does. The Googlebot saves many pages, while the Internet Archive generally saves the main pages of noteworthy websites. Take a look at the archived websites of today's blue-chip companies. It's interesting to see the evolution of each one. Compare the first Pizza Hut homepage with the one today. The 1996 version is pretty scary!

[Images: cache_oldhut.jpg, cache_newhut.jpg]
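You can build Internet Archive links the same way as the Google cache links. The sketch below is not from the original post: it assumes the Wayback Machine address pattern "https://web.archive.org/web/TIMESTAMP/URL", where the timestamp is a run of digits such as a year, or "*" to list every saved copy. The function name `buildArchiveUrl` and the Pizza Hut address in the example are illustrative only.

```javascript
// Hypothetical helper: build a Wayback Machine URL for a page.
// timestamp: digits such as "1996" or "19961231", or "*" to list all copies.
function buildArchiveUrl(pageUrl, timestamp = "*") {
  const trimmed = pageUrl.trim();
  if (trimmed === "") {
    throw new Error("A page URL is required");
  }
  if (timestamp !== "*" && !/^\d{4,14}$/.test(timestamp)) {
    throw new Error("Timestamp must be 4 to 14 digits or '*'");
  }
  // The page address is appended as-is after the timestamp.
  return "https://web.archive.org/web/" + timestamp + "/" + trimmed;
}

// Example: a link aimed at a 1996 copy of a homepage.
console.log(buildArchiveUrl("www.pizzahut.com", "1996"));
```

With a partial timestamp like "1996", the archive should send you to the saved copy closest to that date, if one exists.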


3 Responses to “Access Deleted Web Pages with the Google Cache and the Internet Archive”

  1. thanasisk Says:

    This is like ANCIENT news...
    And you do not mention that it may not work always...

  2. Darren Cornwell Says:

    This may well be old news, but I didn't know how to just request a specific page and that little form script is genius - thanks!

© 2006 and web design of Allan Ray Barizo from [art] [⁄app].