Instapaper – HTML Selector


I use Instapaper a lot. I discovered it thanks to Hanselman’s blog.

The only way I’ve found to stay on top of things in technology is to do an aggressive Technology Watch. I do that by reading hundreds of web articles per week: blogs, RSS feeds, online magazines, suggestions from LinkedIn and Twitter, you name it.

Instapaper helps me manage that volume of reading. It lets me save a web page to “read it later”. A simple concept, but a powerful one!

Not only does Instapaper keep a list of web pages for me, it also trims them, keeping only the content and removing ads and other distractions.

The Problem

But like many tools out there, Instapaper isn’t perfect. It has problems with some pages. LinkedIn pages are notoriously problematic with it, but so are pages from some other sites.

For instance, try adding https://www.linkedin.com/pulse/evolving-role-chief-data-officer-ofir-shalev to Instapaper and you’ll see it choke on it!

I asked technical support if they could fix it, and they did, for a while… I’m not sure why it broke again; maybe LinkedIn changed something on their side.

On some other sites, the actual content gets cut out.

Bottom line: I can’t read some of my articles and it’s driving me mad!

The Solution

I’ve developed a very simple web solution that lets you specify the URL of a page and an XPath expression that selects an HTML element within that page.


For instance, take the URL I gave as an example: if you dig into the HTML of LinkedIn articles, you’ll find the content is within //div[@class='stream-content'] (that is, the first div, at any position in the document, whose class attribute is exactly stream-content).
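To illustrate what that expression matches, here is a minimal sketch using Python’s built-in xml.etree.ElementTree, which supports a subset of XPath (written with a leading `.//` instead of `//`). The sample page is hypothetical and assumed to be well-formed. It shows that the div is found at any depth, and that a div whose class attribute is `stream-content extra` does not match, because `@class='…'` compares the whole attribute value.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for an article page.
PAGE = """<html><body>
  <div class="header">Navigation and ads</div>
  <div class="layout">
    <div class="stream-content"><p>The article text.</p></div>
  </div>
  <div class="stream-content extra">Not matched: the class is not exactly 'stream-content'</div>
</body></html>"""

root = ET.fromstring(PAGE)

# './/' is ElementTree's spelling of XPath's '//': search at any depth.
matches = root.findall(".//div[@class='stream-content']")
first = root.find(".//div[@class='stream-content']")  # first match in document order

print(len(matches))                # 1: the nested div; the "stream-content extra" one is skipped
print("".join(first.itertext()))  # The article text.
```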


You press Select and the system spits out the new URL: http://htmlselector.azurewebsites.net/Selecting?url=https%3a%2f%2fwww.linkedin.com%2fpulse%2fevolving-role-chief-data-officer-ofir-shalev&xpath=%2f%2fdiv%5b%40class%3d%27stream-content%27%5d
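The generated link is simply the service address with the page URL and the XPath expression percent-encoded as two query-string parameters, `url` and `xpath`. Here is a minimal Python sketch of building such a link by hand. The `selector_url` helper is a hypothetical name, not part of the site. Python emits uppercase hex escapes (`%3A`), whereas the link above uses lowercase (`%3a`). RFC 3986 treats the two forms as equivalent.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Address of the selector service, as given in this post.
BASE = "http://htmlselector.azurewebsites.net/Selecting"


def selector_url(page_url: str, xpath: str) -> str:
    """Hypothetical helper: build a link asking the service for one element of a page."""
    # urlencode percent-encodes each value (':' -> %3A, '/' -> %2F, '[' -> %5B, ...)
    # so the page URL and the XPath cannot clash with the outer query string.
    return BASE + "?" + urlencode({"url": page_url, "xpath": xpath})


link = selector_url(
    "https://www.linkedin.com/pulse/evolving-role-chief-data-officer-ofir-shalev",
    "//div[@class='stream-content']",
)
print(link)

# Decoding the query string gives back the two original values.
params = parse_qs(urlsplit(link).query)
print(params["url"][0], params["xpath"][0])
```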

When you open this URL, the service simply forwards the call to the URL you specified, evaluates the XPath expression against the returned page and responds with only the content of the matching element.
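Here is a rough sketch of that request flow, assuming Python’s standard library rather than whatever the Azure site actually runs. The handler decodes the two parameters, fetches the page, selects the first matching element and returns its markup. `handle_selecting` and the injected `fetch` function are hypothetical names. The fetcher is passed in so the example runs without network access. The sketch also assumes the page is well-formed XHTML, and ElementTree only understands a subset of XPath.

```python
import xml.etree.ElementTree as ET
from typing import Callable
from urllib.parse import parse_qs, urlsplit


def handle_selecting(request_url: str, fetch: Callable[[str], str]) -> str:
    """Hypothetical handler for the /Selecting endpoint: fetch the target page,
    apply the XPath expression and return only the matching element's markup."""
    params = parse_qs(urlsplit(request_url).query)  # also percent-decodes the values
    page_url = params["url"][0]
    xpath = params["xpath"][0]

    # Assumption: the page is well-formed XHTML. Real-world HTML generally needs
    # a tolerant HTML parser first.
    root = ET.fromstring(fetch(page_url))

    # ElementTree implements only a subset of XPath and expects a relative path,
    # so an expression starting with '//' is rewritten to './/'.
    if xpath.startswith("//"):
        xpath = "." + xpath
    element = root.find(xpath)  # first match in document order, or None
    if element is None:
        raise LookupError(f"no element matches {xpath!r}")

    element.tail = None  # text after the element belongs to its parent; drop it
    return ET.tostring(element, encoding="unicode")


# Usage with a stand-in fetcher that records the URL it was asked for.
fetched = []


def fake_fetch(url: str) -> str:
    fetched.append(url)
    return ("<html><body><div class='ad'>Buy now!</div>"
            "<div class='stream-content'><p>The article text.</p></div>"
            "</body></html>")


request = ("http://htmlselector.azurewebsites.net/Selecting"
           "?url=https%3a%2f%2fwww.linkedin.com%2fpulse%2fevolving-role-chief-data-officer-ofir-shalev"
           "&xpath=%2f%2fdiv%5b%40class%3d%27stream-content%27%5d")
result = handle_selecting(request, fake_fetch)
print(result)  # <div class="stream-content"><p>The article text.</p></div>
```

The ad div never reaches the output, which is what makes the trimmed page readable in Instapaper.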

You can then pass that URL to Instapaper and it will work.  Finally!


I used XHTMLr to run the XPath query against the HTML document.
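The post doesn’t show how XHTMLr is called, so the following is not its API. The general difficulty it addresses is that real-world HTML (unquoted attributes, void elements like `<br>`, missing end tags) is usually not well-formed XML, so an XPath engine needs a tolerant parser to build the tree first. Here is a minimal sketch of that idea, assuming Python’s standard library only. `html.parser` feeds an ElementTree that can then be queried. `HtmlTreeBuilder` and `parse_html` are hypothetical names. The sketch skips most HTML5 parsing rules: for example, it nests a second `<p>` inside an unclosed first one instead of closing it.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# HTML elements that never have a closing tag, so they are never left "open".
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class HtmlTreeBuilder(HTMLParser):
    """Builds an ElementTree from loosely written HTML (simplified rules)."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.root = ET.Element("document")  # synthetic root that holds the whole page
        self.stack = [self.root]            # currently open elements

    def handle_starttag(self, tag, attrs):
        # Boolean attributes such as <input disabled> arrive with value None.
        element = ET.SubElement(self.stack[-1], tag, {k: v or "" for k, v in attrs})
        if tag not in VOID_ELEMENTS:
            self.stack.append(element)

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax (<br/>): create the element but never open it.
        ET.SubElement(self.stack[-1], tag, {k: v or "" for k, v in attrs})

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag, implicitly closing anything
        # left open inside it (e.g. a missing </p>); stray end tags are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                return

    def handle_data(self, data):
        parent = self.stack[-1]
        if len(parent):
            # ElementTree stores text that follows a child in that child's tail.
            last = parent[-1]
            last.tail = (last.tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def parse_html(markup: str) -> ET.Element:
    builder = HtmlTreeBuilder()
    builder.feed(markup)
    builder.close()
    return builder.root


# Not valid XML: unquoted attribute, <br> without a close, missing </p> tags.
page = ("<html><body><div class=header>Ads<br>more ads</div>"
        "<div class='stream-content'><p>First paragraph<p>Second paragraph</div>"
        "</body></html>")
content = parse_html(page).find(".//div[@class='stream-content']")
print("".join(content.itertext()))  # First paragraphSecond paragraph
```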


If you think it could be useful, the site is hosted on Azure as a free web site: http://bit.ly/1HBcdON. Go nuts!
