When you enter a URL, LivingWebBrowser downloads the data at that URL and checks whether it is HTML. If it is, the browser breaks the HTML down into its individual tags and counts how often each tag appears. It then classifies the page into one of several categories according to the trends in those tag statistics.
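The tag-counting step can be illustrated with a minimal sketch. This is not the LivingWebBrowser code itself; it assumes the page has already been downloaded as text, and the names TagCounter and tag_statistics are hypothetical, chosen only for this example.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts how often each start tag appears in an HTML document.

    Hypothetical helper for illustration, not part of LivingWebBrowser.
    """

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase, so "TABLE" and
        # "table" are counted together.
        self.counts[tag] += 1


def tag_statistics(html_text):
    """Return a Counter mapping each tag name to its number of occurrences."""
    parser = TagCounter()
    parser.feed(html_text)
    parser.close()
    return parser.counts


sample = (
    "<html><body><table><tr><td>a</td><td>b</td></tr></table>"
    "<img src='x.png'></body></html>"
)
stats = tag_statistics(sample)
print(stats["td"], stats["table"], stats["img"])
```

Only opening tags are counted here, which is enough to compare how heavily different pages rely on each kind of tag.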
I derived the classification by studying the tag statistics of existing HTML pages. First I downloaded almost 5,000 web pages from my own URL list and calculated the tag statistics for each page. Looking through the statistics, I could see how the pages differ from one another in their use of tags.
The classification loosely follows the history of HTML tag usage. For example, if a page uses relatively new tags, such as object or script, it is classified as a flashy visualization.
I chose these four major divisions. Each major division also has some minor divisions.
- flash, java page -> gradations
- table page -> rectangles
- image page -> circles
- list page -> lines
I also made separate classes for non-HTML pages (PDF, etc.) and Not Found pages.
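The mapping from tag statistics to the four major divisions could be sketched as follows. The rule used here (pick the division whose tags appear most often) and the tag groups are assumptions made for illustration; only object and script are named in the text, and applet, embed, and the table and list sub-tags are my guesses. The minor divisions and the non-HTML and Not Found classes are left out.

```python
# Assumed tag groups for each major division (illustrative only).
TAG_GROUPS = {
    "flash, java page": ("object", "applet", "embed", "script"),
    "table page": ("table", "tr", "td", "th"),
    "image page": ("img",),
    "list page": ("ul", "ol", "li", "dl", "dt", "dd"),
}

# Visual form used for each major division, as listed above.
SHAPES = {
    "flash, java page": "gradations",
    "table page": "rectangles",
    "image page": "circles",
    "list page": "lines",
}


def classify(tag_counts):
    """Return (division, shape) for a dict of tag counts.

    Returns None when none of the grouped tags occur, since this
    sketch has no rule for such pages.
    """
    scores = {
        division: sum(tag_counts.get(tag, 0) for tag in tags)
        for division, tags in TAG_GROUPS.items()
    }
    best = max(scores, key=scores.get)
    if scores[best] == 0:
        return None
    return best, SHAPES[best]


print(classify({"table": 1, "tr": 2, "td": 4, "img": 1}))
print(classify({"img": 5, "li": 2}))
```

A page built mostly from table cells ends up as rectangles under this rule, while a page dominated by images ends up as circles.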
Many company web pages use the table tag heavily, so they tend to be classified as table pages (rectangles). Personal pages tend to fall into the other categories. With this classification, you can see these kinds of differences.