Welcome to datahtml
Datahtml is a library for processing and extracting data from HTML and XML content.
Datahtml lets you:
Extract ld+json data from HTML
Extract frequently used meta tags from HTML (those used for SEO and social media, among others)
Extract article data from an HTML page, usually from newspaper sites
Parse RSS feeds from sites
Crawl specific sites and social media platforms, such as Google and YouTube
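To make the first two items in the list concrete, here is a minimal sketch of what extracting ld+json data and meta tags from a page involves. It uses only Python's standard-library html.parser and does not show datahtml's own API; the names MetadataParser, extract_metadata, and SAMPLE are hypothetical and exist only for this illustration.

```python
import json
from html.parser import HTMLParser


class MetadataParser(HTMLParser):
    """Hypothetical helper: collects ld+json blocks and <meta> tags."""

    def __init__(self):
        super().__init__()
        self.ld_json = []          # parsed JSON-LD objects, in document order
        self.meta = {}             # meta name/property -> content
        self._in_ld_json = False   # True while inside an ld+json <script>
        self._buffer = []          # text chunks of the current script body

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        script_type = (attrs.get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_ld_json = True
            self._buffer = []
        elif tag == "meta":
            # Open Graph tags use "property"; classic SEO tags use "name".
            key = attrs.get("property") or attrs.get("name")
            if key and attrs.get("content") is not None:
                self.meta[key] = attrs["content"]

    def handle_data(self, data):
        if self._in_ld_json:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_ld_json:
            self._in_ld_json = False
            try:
                self.ld_json.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed ld+json blocks instead of failing


def extract_metadata(html):
    """Return the ld+json objects and meta tags found in an HTML string."""
    parser = MetadataParser()
    parser.feed(html)
    parser.close()
    return {"ld_json": parser.ld_json, "meta": parser.meta}


# Made-up sample page, used only to exercise the sketch.
SAMPLE = """<html><head>
<title>Example</title>
<meta name="description" content="A sample page">
<meta property="og:title" content="Sample article">
<script type="application/ld+json">{"@type": "NewsArticle", "headline": "Sample article"}</script>
</head><body><p>Hello</p></body></html>"""

result = extract_metadata(SAMPLE)
print(result["meta"])
print(result["ld_json"])
```

Malformed ld+json blocks are skipped rather than raised, on the assumption that real pages often contain invalid JSON and a partial result is more useful than none.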
Under the hood, datahtml uses libraries such as BeautifulSoup, newspaper3k, and feedparser, among others.
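Parsing an RSS feed, one of the tasks listed above, can be sketched in a few lines. The example below deliberately uses the standard-library xml.etree.ElementTree rather than feedparser, so it runs without extra dependencies; it handles only plain RSS 2.0 and is not datahtml's API. The names parse_rss and SAMPLE_RSS are hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_rss(xml_text):
    """Hypothetical helper: return the channel title and items of an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    channel = root.find("channel")
    if channel is None:
        return {"title": None, "entries": []}

    entries = []
    for item in channel.findall("item"):
        entries.append({
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            "published": item.findtext("pubDate"),  # None when the tag is absent
        })
    return {"title": channel.findtext("title"), "entries": entries}


# Made-up feed, used only to exercise the sketch.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <link>https://example.com/</link>
    <item>
      <title>First post</title>
      <link>https://example.com/first</link>
      <pubDate>Mon, 01 Jan 2024 10:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Second post</title>
      <link>https://example.com/second</link>
    </item>
  </channel>
</rss>"""

feed = parse_rss(SAMPLE_RSS)
for entry in feed["entries"]:
    print(entry["title"], entry["link"])
```

A dedicated feed library also copes with Atom feeds, namespaces, and malformed XML, which this sketch does not attempt.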