Parsers
parse_url
- datahtml.parsers.parse_url(url: str, socials_url=['facebook.com', 'instagram.com', 't.me', 'facebook.com', 'twitter.com', 'tiktok.com', 'youtube.com', 'spotify.com', 'wikipedia.org', 'meetup.com', 'linkedin.com', 'books.google.com', 'bit.ly', 'apps.apple.com', 'play.google.com']) → URL
Parse a URL string into a datahtml.types.URL.
- Parameters:
url – the full URL to be parsed
socials_url – a list of domains used to identify whether the URL belongs to a known social network. It may be deprecated in the future because it seems out of scope for this function.
For developers: URL_REGEX returns a tuple of 3 values: (protocol, netloc, path)
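The following is a minimal sketch of how a regex could yield the (protocol, netloc, path) tuple and how a domain list like socials_url could be checked against the netloc. The pattern EXAMPLE_URL_REGEX and the helpers split_url and is_social are hypothetical names made up for illustration; they are not the library's actual URL_REGEX or the implementation of parse_url, which may behave differently.

```python
import re
from typing import List, Tuple

# Hypothetical pattern for illustration only; NOT the library's URL_REGEX.
# Group 1: optional protocol, group 2: netloc, group 3: path.
EXAMPLE_URL_REGEX = re.compile(r"^(?:(https?)://)?([^/?#]+)([^?#]*)")


def split_url(url: str) -> Tuple[str, str, str]:
    """Return (protocol, netloc, path) for a URL string."""
    match = EXAMPLE_URL_REGEX.match(url)
    if match is None:
        raise ValueError(f"not a parsable URL: {url!r}")
    protocol, netloc, path = match.groups()
    # A missing protocol becomes "", a missing path becomes "/".
    return protocol or "", netloc, path or "/"


def is_social(netloc: str, socials_url: List[str]) -> bool:
    """True if netloc equals, or is a subdomain of, any listed domain."""
    host = netloc.lower()
    return any(host == s or host.endswith("." + s) for s in socials_url)


protocol, netloc, path = split_url("https://www.facebook.com/some/page")
print(protocol, netloc, path)
print(is_social(netloc, ["facebook.com", "t.me"]))
```

Matching on the host suffix rather than a plain substring avoids treating a domain such as notfacebook.com as a social network; whether parse_url itself does this is not stated in the documentation above.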