Web 3.0: When Web Sites Become Web Services - ReadWriteWeb
URL Tag Cloud
- web3.0
- web
- api
- trends
- webservices
- mashup
- web2.0
- article
- future
- data
- rss
- technology
- software_social
- web_services
- mashups
- research
- web3
- scraping
Bookmark History
Saved by 72 people (-20 private), first by anonymous user on 2007-03-19
- Janetcoolman on 2009-09-04 - Tags web3.0 , webservices
- Cvestal on 2009-08-22 - Tags no_tag
- Caseblack on 2009-05-14 - Tags Article
- Muge_akbulut on 2009-01-23 - Tags web , 3.0
- Spreadneck on 2009-01-09 - Tags google-bookmarks , Bookmarks , Web_3.0 , webservices , api , mashup , trends
Public Sticky notes
[...] This service opens access to the majority of items in Amazon's product catalog. The API is quite rich, allowing manipulation of users, wish lists and shopping carts. However, its essence is the ability to look up Amazon's products.
So how do these services get around the fact that there is no API? The answer is that they leverage standardized URLs and a technique called Web scraping. Let's understand how this works. In del.icio.us, for example, all URLs that have the tag book can be found under the URL http://del.icio.us/tag/book; all URLs tagged with the tag movie are at http://del.icio.us/tag/movie; and so on. The structure of this URL is always the same: http://del.icio.us/tag/[TAG]. So given any tag, a computer program can build the matching URL and fetch the page that contains the list of sites tagged with it. Once the page is fetched, the program can perform the scraping - the extraction of the necessary information from the page.
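The URL pattern described above can be sketched in a few lines of Python. This is only an illustration: the helper names tag_url and fetch_tag_page are hypothetical, the timeout value is an arbitrary choice, and the del.icio.us address is the one quoted in the article, which may no longer respond.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Fixed part of the URL pattern quoted in the article: http://del.icio.us/tag/[TAG]
BASE_URL = "http://del.icio.us/tag/"


def tag_url(tag: str) -> str:
    """Build the listing URL for a tag by filling in the fixed pattern."""
    # Percent-encode the tag so spaces or slashes cannot alter the path.
    return BASE_URL + quote(tag, safe="")


def fetch_tag_page(tag: str, timeout: float = 10.0) -> str:
    """Download the raw HTML of a tag page (hypothetical helper, not run here)."""
    with urlopen(tag_url(tag), timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


print(tag_url("book"))   # http://del.icio.us/tag/book
print(tag_url("movie"))  # http://del.icio.us/tag/movie
```

Only the URL construction runs here; fetch_tag_page would hand back the raw HTML that a scraper then has to pick apart.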
How Web Scraping Works
Web Scraping is essentially reverse engineering of HTML pages. It can also be thought of as parsing out chunks of information from a page. Web pages are coded in HTML, which uses a tree-like structure to represent the information. The actual data is mingled with layout and rendering information and is not readily available to a computer. Scrapers are the programs that "know" how to get the data back from a given HTML page. They work by learning the details of the particular markup and figuring out where the actual data is.
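To make the idea concrete, here is a minimal scraper sketch built on Python's standard html.parser module. The markup in SAMPLE_PAGE is invented for illustration (the real del.icio.us pages used their own structure), and the class name "link" and the helper scrape_bookmarks are assumptions, not anything the article specifies. The point is that the scraper has to "know" which tags carry the data and ignore the layout around them.

```python
from html.parser import HTMLParser

# Invented sample markup: bookmark links carry class="link" (an assumption),
# while navigation links do not.
SAMPLE_PAGE = """
<html><body>
  <div id="nav"><a href="/help">Help</a></div>
  <ul>
    <li class="post"><a class="link" href="http://example.com/a">First book site</a></li>
    <li class="post"><a class="link" href="http://example.org/b">Second <b>book</b> site</a></li>
  </ul>
</body></html>
"""


class BookmarkScraper(HTMLParser):
    """Collect (href, title) pairs from anchors marked with class="link"."""

    def __init__(self):
        super().__init__()
        self.bookmarks = []
        self._href = None   # href of the bookmark anchor currently open, if any
        self._text = []     # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "link" in (attrs.get("class") or "").split():
            self._href = attrs.get("href")
            self._text = []

    def handle_data(self, data):
        # Keep text only while inside a bookmark anchor.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.bookmarks.append((self._href, "".join(self._text).strip()))
            self._href = None


def scrape_bookmarks(html):
    """Return the list of (url, title) pairs found in the page."""
    parser = BookmarkScraper()
    parser.feed(html)
    parser.close()
    return parser.bookmarks


for url, title in scrape_bookmarks(SAMPLE_PAGE):
    print(url, "-", title)
```

A scraper like this is tied to the details of one site's markup: if the site renames the class or restructures the list, the extraction silently stops working, which is part of why an API is the sturdier option.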
There are several good reasons why Web sites (online retailers in particular) should think about offering an API. The most important reason is control. Having an API will make scrapers unnecessary, and it will also allow the site to track who is using the data - as well as how and why. Like Amazon, sites can do this in a way that fosters affiliates and drives traffic back to their sites.
[...] has recently launched. It focuses on letting people create mashups and widgets from web services and RSS. Before both of these, Dapper launched a generic scraping service for any web site. Dapper is an interesting technology that facilitates the scraping of web pages using a visual interface.