Best way to get keywords for a domain

Hi everyone.

I am looking for a way to get keywords for a list of URLs:

e.g. www.amazon.com
Result should be (shopping, clothing, TVs, smartphones).

What would you suggest for getting the keywords? Right now I am pulling all the HTML data (HTML1 / 2 / 3, HTMLTitle etc.) and trying to find the keywords in the output. But that does not seem to be the smartest way.

I want to get a clue about what products the website offers to its customers.

Best

I guess this depends on what you mean by keywords and whether there's a logical pattern in the pages you're scraping. I would look at the Amazon (product?) pages and see if the relevant keywords appear in the same place every time. Then limit the scraping to those elements in the HTML.
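
As a rough sketch of that idea: the snippet below assumes the useful signals sit in the title, the keywords/description meta tags and the H1 to H3 headings. That is only an assumption and not something checked against Amazon's actual markup. The names KeywordParser, extract_keywords, TARGET_TAGS and STOP_WORDS are made up for this example, and fetching the pages is left out (the sample HTML is inline). It uses only the Python standard library.

```python
from collections import Counter
from html.parser import HTMLParser
import re

# Tags whose text is treated as keyword-bearing (an assumption, adjust per site).
TARGET_TAGS = {"title", "h1", "h2", "h3"}
# Tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "and", "for", "with", "our", "your"}


class KeywordParser(HTMLParser):
    """Collects text from the title, headings, and keywords/description meta tags."""

    def __init__(self):
        super().__init__()
        self._depth = 0   # > 0 while inside one of TARGET_TAGS
        self.chunks = []  # raw text fragments collected so far

    def handle_starttag(self, tag, attrs):
        if tag in TARGET_TAGS:
            self._depth += 1
        elif tag == "meta":
            attrs = dict(attrs)
            if (attrs.get("name") or "").lower() in ("keywords", "description"):
                self.chunks.append(attrs.get("content") or "")

    def handle_endtag(self, tag):
        if tag in TARGET_TAGS and self._depth > 0:
            self._depth -= 1

    def handle_data(self, data):
        # Only keep text that appears inside a targeted tag.
        if self._depth > 0:
            self.chunks.append(data)


def extract_keywords(html, top_n=5):
    """Return the top_n most frequent words found in the targeted elements."""
    parser = KeywordParser()
    parser.feed(html)
    parser.close()
    # Lowercase, keep alphabetic words of 3+ letters, drop stop words.
    words = re.findall(r"[a-z]{3,}", " ".join(parser.chunks).lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return [word for word, _ in counts.most_common(top_n)]


sample = """<html><head><title>Example Shop: TVs and Smartphones</title>
<meta name="keywords" content="shopping, smartphones, tvs"></head>
<body><h1>Smartphones</h1><p>Ignored paragraph about shipping.</p>
<h2>TVs and Audio</h2></body></html>"""

print(extract_keywords(sample, top_n=3))
```

Plain word counting like this is crude. It will not map "TVs" and "televisions" to one topic, so treat the output as raw candidates rather than a finished category list.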

For industry classification, have a look at the Webshrinker API, which offers industry classification for a web page. Otherwise the Dandelion API should help.