Help with scraping

Nosh · February 19, 2018, 3:03pm

How can I scrape these fields separately?

https://www.confianzaonline.es/empresas/decantia.htm

CIF
Datos de contacto
Sitio/s web
confianzadigital

diskborste · February 19, 2018, 6:13pm

Here are some examples for extracting the first three parts:

=XPathOnUrl(A1;"//li[@class='verified']")
=XPathOnUrl(A1;"//ul/li[3]/span[2]")
=XPathOnUrl(A1;"//ul/li[4]/span[2]")

The easiest way is to inspect the elements: in your web browser, right-click the part you are interested in and choose "Inspect", then right-click the highlighted element in the panel that opens and choose "Copy XPath". You can then use this string in your XPathOnUrl formula.
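To see what these three expressions select, here is a minimal Python sketch that runs them against a small invented page fragment. The markup and values in `SAMPLE` are assumptions made up for illustration (the real page is certainly more complex), and the standard-library `xml.etree.ElementTree` module only supports a subset of XPath, so treat this as a way to reason about the expressions rather than a replacement for XPathOnUrl.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup; the real page's structure may differ.
SAMPLE = """
<html><body>
  <ul>
    <li class="verified">Decantia</li>
    <li><span>Label</span><span>ignored</span></li>
    <li><span>CIF</span><span>B00000000</span></li>
    <li><span>Contacto</span><span>info@example.com</span></li>
  </ul>
</body></html>
"""

root = ET.fromstring(SAMPLE)


def first_text(xpath):
    """Return the text of the first node matching an XPath such as //ul/li[3]."""
    # ElementTree needs a leading "." so that "//" searches below the root.
    node = root.find("." + xpath)
    return None if node is None else "".join(node.itertext()).strip()


verified = first_text("//li[@class='verified']")  # matched by class attribute
third = first_text("//ul/li[3]/span[2]")          # 3rd <li>, 2nd <span>
fourth = first_text("//ul/li[4]/span[2]")         # 4th <li>, 2nd <span>
print(verified, third, fourth)
```

Note that the positional forms (`li[3]`, `li[4]`) depend on the order of the list items, so they are more likely to break on similar pages than the class-based form.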

Nosh · February 20, 2018, 4:22pm

Thanks a lot! On similar sites I get this error in Excel: endofstream exception.
Can you tell me what this means?

diskborste · February 20, 2018, 4:47pm

Not really, can you give me an example of a site where you get that error and the formula you're using?

Nosh · February 20, 2018, 5:06pm

Here is the screenshot: endofstream

diskborste · February 20, 2018, 6:49pm

It is easier if you post the URL and the part you wish to scrape.

Nosh · February 20, 2018, 6:59pm

OK. I only wanted to show the error message.

Here we go:
https://www.ekomi.es/testimonios-wacomestore-es.html
Dirección
Website
Representante autorizado

diskborste · February 20, 2018, 8:32pm

How about these:

Direccion:
=XPathOnUrl(A1;"//p[@class='shopAddressDetails']")

Website:
=XPathOnUrl(A1;"//a[@class='shoplink url']")

Representante
=XPathOnUrl(A1;"//ul/li[5]/div[2]")

Nosh · February 21, 2018, 11:49am

Strange. I copied and pasted them, and they do not work.
captura20 captura21

diskborste · February 21, 2018, 12:08pm

Perhaps your system uses a different list separator. Can you try replacing the semicolon with a comma?

How about the other xpaths I suggested?
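If there are many formulas to convert, the separator swap can be scripted. The following is only a sketch in Python: `swap_separator` is a hypothetical helper, not part of Excel or SeoTools, and it assumes that string arguments are delimited by straight double quotes so that a semicolon inside an XPath string is left alone.

```python
def swap_separator(formula: str, old: str = ";", new: str = ",") -> str:
    """Replace the argument separator, skipping text inside double quotes."""
    out = []
    in_string = False
    for ch in formula:
        if ch == '"':
            # Entering or leaving a quoted string argument.
            in_string = not in_string
        out.append(new if (ch == old and not in_string) else ch)
    return "".join(out)


semicolon_form = "=XPathOnUrl(A1;\"//a[@class='shoplink url']\")"
comma_form = swap_separator(semicolon_form)
print(comma_form)
```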

Nosh · February 21, 2018, 12:31pm

Like this?
None of the XPaths work. captura22

dovydasm · February 21, 2018, 3:03pm

The forum messed up @diskborste's formulas (it replaced single quote characters with different characters). Try these:

Direccion:
=XPathOnUrl(A1;"//p[@class='shopAddressDetails']")

Website:
=XPathOnUrl(A1;"//a[@class='shoplink url']")

Representante
=XPathOnUrl(A1;"//ul/li[5]/div[2]")
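For formulas copied from a page that substitutes typographic quotes, the characters can also be normalized before pasting. This is a minimal Python sketch; `straighten_quotes` is a hypothetical helper, and handling curly double quotes as well as curly single quotes is an assumption that goes beyond what was observed in this thread.

```python
# Map typographic quotes to the straight ASCII quotes that formulas expect.
QUOTE_MAP = str.maketrans({
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
})


def straighten_quotes(formula: str) -> str:
    """Return the formula with curly quotes replaced by straight ones."""
    return formula.translate(QUOTE_MAP)


# Example of a formula as it might arrive after being mangled on paste.
pasted = "=XPathOnUrl(A1;\"//p[@class=\u2018shopAddressDetails\u2019]\")"
fixed = straighten_quotes(pasted)
print(fixed)
```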

Nosh · February 21, 2018, 5:02pm

Great stuff! Thanks!