Simple C# Screen Scraping Proxy with jQuery

I was asked today how to screen scrape an external site using jQuery. The short version is that you can’t do it with jQuery alone. Browsers enforce the same-origin policy, which stops an AJAX request from reading a response that comes from a different domain (origin) unless that site explicitly opts in via CORS.
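To make “origin” concrete, here is a small JavaScript sketch of the comparison. Two URLs share an origin only when scheme, host and port all match. `isSameOrigin` and the example.com URLs are hypothetical, used purely for illustration, and the sketch ignores any CORS opt-in a remote site might provide.

```javascript
// Sketch: two URLs share an origin only if scheme, host and port all match.
// Browsers apply a rule like this before letting an AJAX call read a response.
function isSameOrigin(pageUrl, requestUrl) {
  // new URL(relative, base) resolves relative request URLs against the page URL,
  // and .origin yields "scheme://host[:port]" (default ports are omitted).
  const page = new URL(pageUrl);
  const request = new URL(requestUrl, page);
  return page.origin === request.origin;
}

// Usage, assuming the page was served from http://example.com/default.aspx:
console.log(isSameOrigin("http://example.com/default.aspx", "Ajax/scrape.aspx"));       // true
console.log(isSameOrigin("http://example.com/default.aspx", "http://www.google.com/")); // false
console.log(isSameOrigin("http://example.com/default.aspx", "https://example.com/"));   // false (scheme differs)
```

This explains why the proxy works. A relative URL like Ajax/scrape.aspx always resolves to the page’s own origin.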

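You can achieve the effect in a number of ways. The most old-school of these is an iframe, but in most cases that just won’t cut it. You’ll usually need to manipulate the returned HTML, and the same-origin policy also stops your script from reading the content of an iframe loaded from another domain.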

A better way is to write a simple server-side proxy on your own domain that does the scrape, and then point your AJAX request at that instead. Here’s an example in C#:

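using System;
using System.Net;
using System.Text;
using System.Web.UI;

// Code-behind for Ajax/scrape.aspx (the class name here is illustrative).
public partial class Ajax_Scrape : Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        using (WebClient client = new WebClient())
        {
            string url = "http://www.google.com/";
            byte[] requestedHTML = client.DownloadData(url);

            // Assumes the remote page is UTF-8 encoded.
            string html = Encoding.UTF8.GetString(requestedHTML);

            // This writes the HTML straight back to the response, but you could
            // just as easily manipulate the string to your heart's content first.
            Response.ContentType = "text/html";
            Response.Write(html);
        }
    }
}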

Let’s say you saved that as the code-behind of an otherwise blank page called “Ajax/scrape.aspx”. You’d then just need to use jQuery’s .load() method:

$("#myDiv").load("Ajax/scrape.aspx");

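and you’re there! Note that .load() (called without data) sends a plain GET request and does nothing to stop the browser from caching the response, so you may get a stale copy. If you need more control, look up the $.ajax() method, which offers a cache option among others.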
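For finer control, a minimal sketch with $.ajax() might look like the following. It assumes jQuery is already loaded as `$` and reuses the myDiv element and the Ajax/scrape.aspx page from above; `loadScrapeUncached` is a hypothetical helper name. With `cache: false`, jQuery appends a timestamp parameter to GET requests so the browser can’t serve a cached copy.

```javascript
// Sketch: fetch the proxy page with caching disabled and insert the result.
// Assumes jQuery is loaded as $; loadScrapeUncached is a hypothetical helper.
function loadScrapeUncached(targetSelector, proxyUrl) {
  return $.ajax({
    url: proxyUrl,
    type: "GET",
    dataType: "html",
    // jQuery adds "_=<timestamp>" to the URL, so each request bypasses the cache.
    cache: false,
    success: function (html) {
      // html is a plain string here, so it can be edited before insertion.
      $(targetSelector).html(html);
    },
    error: function (xhr, status) {
      $(targetSelector).text("Scrape failed: " + status);
    }
  });
}

// Usage in the page:
// loadScrapeUncached("#myDiv", "Ajax/scrape.aspx");
```

The success callback receives the scraped HTML as a string, which makes it a convenient place to trim or rewrite it on the client.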
