Scraping Authors from News Articles with XPath

Chivers · March 28, 2021, 6:37pm

Hi everyone,
Does anyone know of a way to scrape author/journalist names from articles on various news sites using one common approach?

I've tried using XPath with "//div[@class='author']" but it doesn't seem to work.

Here are a few URLs so you can see what I need:

https://www.theguardian.com/uk-news/2021/mar/28/racial-justice-is-key-to-effective-policing-says-npcc-chief-martin-hewitt
https://www.dailymail.co.uk/news/article-9411757/More-30million-Britons-received-Covid-jab-infections-fall-nearly-week.html
https://techcrunch.com/2021/03/27/y-combinator-demo-day-dispo-due-diligence/

diskborste · March 29, 2021, 12:15pm

It looks like each site uses a different page layout, so a universal XPath probably doesn't exist. You could do it like this: create a table with one XPath per domain, then look up the XPath based on the domain of each URL.
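
For anyone who would rather script this, here is a minimal Python sketch of the per-domain lookup idea, using only the standard library. The names AUTHOR_XPATHS, domain_of and xpath_for are made up for illustration, and the XPath strings are placeholders I have not checked against the live pages, so inspect each site's markup and swap in expressions that actually match its author element.

```python
from urllib.parse import urlparse

# Hypothetical placeholder expressions, NOT verified against the real sites.
# Inspect each page and replace these with XPaths that match its byline.
AUTHOR_XPATHS = {
    "theguardian.com": "//a[@rel='author']",
    "dailymail.co.uk": "//p[contains(@class, 'author')]//a",
    "techcrunch.com": "//div[contains(@class, 'byline')]//a",
}


def domain_of(url):
    """Return the URL's host name without a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def xpath_for(url):
    """Look up the author XPath for the URL's domain, or None if unknown."""
    return AUTHOR_XPATHS.get(domain_of(url))
```

Calling xpath_for on one of the URLs above returns the expression stored for that domain, which you can then hand to whatever tool evaluates the XPath against the downloaded page. A domain that is not in the table gives None, so you can spot sites that still need an entry.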

Chivers · March 31, 2021, 12:47pm

Cheers Victor, that's helped massively!

Alaa · October 31, 2022, 12:34pm

Hey Victor, this is extremely helpful. Thank you so much.

Could you elaborate on what to do in this case if the link is part of the HTML tag? Thank you so much in advance.

The link is: https://www.nytimes.com/1990/07/12/garden/currents-for-a-california-winery-francoamerican-design.html