Scraping Authors from News Articles with XPath

Hi everyone,
Does anyone know of a way to scrape the authors/journalists of articles from various news sites using a common pattern?

I've tried using XPath with "//div[@class='author']" but it doesn't seem to work.

Here are a few URLs so you can see what I need:

  • https://www.theguardian.com/uk-news/2021/mar/28/racial-justice-is-key-to-effective-policing-says-npcc-chief-martin-hewitt
  • https://www.dailymail.co.uk/news/article-9411757/More-30million-Britons-received-Covid-jab-infections-fall-nearly-week.html
  • https://techcrunch.com/2021/03/27/y-combinator-demo-day-dispo-due-diligence/

It looks like each site uses a different page layout, so a universal XPath probably doesn't exist. One approach: create a table that maps each domain to its own XPath, then look up the XPath for each URL based on its domain.
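A minimal Python sketch of that lookup-table approach, assuming the pages have already been fetched. The XPath expressions in `AUTHOR_XPATHS` are hypothetical placeholders, not checked against the live Guardian, Daily Mail or TechCrunch markup, and `domain_of`, `xpath_for` and `extract_authors` are illustrative names. It sticks to the standard library, whose `ElementTree` understands only a subset of XPath and needs well-formed markup; for real-world HTML a lenient parser such as `lxml.html` (with its `.xpath()` method) is the usual choice.

```python
from __future__ import annotations

from urllib.parse import urlparse
import xml.etree.ElementTree as ET

# Hypothetical per-domain XPath table: these expressions are placeholders,
# NOT verified against the live sites. Inspect each site's markup and adjust.
AUTHOR_XPATHS = {
    "theguardian.com": ".//a[@rel='author']",
    "dailymail.co.uk": ".//a[@class='author']",
    "techcrunch.com": ".//div[@class='article__byline']/a",
}


def domain_of(url: str) -> str:
    """Return the URL's host name, lowercased, without a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def xpath_for(url: str, table: dict[str, str] = AUTHOR_XPATHS) -> str | None:
    """Look up the author XPath for a URL's domain; None if the domain is unknown."""
    return table.get(domain_of(url))


def extract_authors(markup: str, xpath: str) -> list[str]:
    """Apply an ElementTree-subset XPath to well-formed markup; return each match's text."""
    root = ET.fromstring(markup)
    return ["".join(el.itertext()).strip() for el in root.findall(xpath)]


if __name__ == "__main__":
    url = "https://www.theguardian.com/uk-news/2021/mar/28/racial-justice-is-key-to-effective-policing-says-npcc-chief-martin-hewitt"
    print(xpath_for(url))  # .//a[@rel='author']
    # Made-up sample markup standing in for a fetched page.
    sample = '<html><body><p>By <a rel="author" href="/p">Jane Doe</a></p></body></html>'
    print(extract_authors(sample, xpath_for(url)))  # ['Jane Doe']
```

Keeping the expressions in one table means supporting a new site is a one-line change, and unknown domains return `None`, so they can be flagged instead of being scraped with the wrong XPath.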


Cheers Victor, that's helped massively!

Hey Victor, this is extremely helpful. Thank you so much.

Could you elaborate on what to do in a case like this, where the link is part of the HTML tag itself? Thanks in advance.


The link is: https://www.nytimes.com/1990/07/12/garden/currents-for-a-california-winery-francoamerican-design.html
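If "the link is part of the HTML tag" means the value you want sits in an attribute (such as an `href`, or a `<meta>` tag's `content`) rather than in the element's text, full XPath selects it by ending the expression with `/@attributeName`, e.g. `//meta[@name='author']/@content`. Python's standard `ElementTree` supports only a subset of XPath and can't select attributes directly, so this sketch matches the element and then reads the attribute with `Element.get()`. The `attribute_values` helper and the sample markup are hypothetical and are not based on the actual markup of the linked NYT page.

```python
import xml.etree.ElementTree as ET


def attribute_values(markup: str, element_path: str, attr: str) -> list[str]:
    """Return the `attr` value of every element matching `element_path`.

    Full XPath 1.0 would write this as e.g. //meta[@name='author']/@content.
    ElementTree's XPath subset cannot select attributes, so we match the
    element first and then read the attribute with Element.get().
    """
    root = ET.fromstring(markup)
    return [el.get(attr) for el in root.findall(element_path) if el.get(attr) is not None]


# Hypothetical sample markup: the author is stored in attributes, not only in text.
sample = """<html><head>
<meta name="author" content="Jane Doe"/>
</head><body>
<a class="byline" href="/by/jane-doe">Jane Doe</a>
</body></html>"""

print(attribute_values(sample, ".//meta[@name='author']", "content"))  # ['Jane Doe']
print(attribute_values(sample, ".//a[@class='byline']", "href"))       # ['/by/jane-doe']
```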