Parsers

parse_url

datahtml.parsers.parse_url(url: str, socials_url=['facebook.com', 'instagram.com', 't.me', 'facebook.com', 'twitter.com', 'tiktok.com', 'youtube.com', 'spotify.com', 'wikipedia.org', 'meetup.com', 'linkedin.com', 'books.google.com', 'bit.ly', 'apps.apple.com', 'play.google.com']) → URL

Parse a url string to datahtml.types.URL.

Parameters:
  • url – the full URL to be parsed

  • socials_url – a list used to identify whether the URL belongs to a known social network. It may be deprecated in the future because it seems out of scope for this function.
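
As a rough illustration of how a list like socials_url could be used, the sketch below checks a URL's host against a short list of domains. The helper is_social and the shortened SOCIALS list are hypothetical and are not part of datahtml; the library's own matching logic may differ.

```python
from urllib.parse import urlsplit

# Hypothetical, shortened list for illustration only.
SOCIALS = ["facebook.com", "instagram.com", "t.me", "twitter.com"]


def is_social(url: str, socials_url=SOCIALS) -> bool:
    """Return True if the URL's host is one of the listed domains or a subdomain of one."""
    host = urlsplit(url).hostname or ""
    # Compare against the domain itself and against ".domain" so that
    # "notfacebook.com" does not count as "facebook.com".
    return any(host == d or host.endswith("." + d) for d in socials_url)


print(is_social("https://www.facebook.com/some.page"))  # True
print(is_social("https://example.com/about"))           # False
```

Matching on the parsed host rather than on the raw string avoids false positives when a social domain only appears in the path or query of an unrelated URL.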

For developers: URL_REGEX returns a tuple with 3 values: (protocol, netloc, path).
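
To make the three-value shape concrete, here is a minimal sketch of a regular expression that splits a URL into (protocol, netloc, path). The pattern EXAMPLE_URL_REGEX and the helper split_url are assumptions written for this example; the actual URL_REGEX in datahtml is not shown here and may be defined differently.

```python
import re

# Hypothetical pattern for illustration; not the library's URL_REGEX.
# Group 1: protocol, group 2: netloc, group 3: path (query and fragment excluded).
EXAMPLE_URL_REGEX = re.compile(r"^(https?)://([^/?#]+)([^?#]*)")


def split_url(url: str):
    """Split a URL into a (protocol, netloc, path) tuple."""
    match = EXAMPLE_URL_REGEX.match(url)
    if match is None:
        raise ValueError(f"not a recognised URL: {url!r}")
    protocol, netloc, path = match.groups()
    return protocol, netloc, path


print(split_url("https://en.wikipedia.org/wiki/URL"))
# ('https', 'en.wikipedia.org', '/wiki/URL')
```

With this pattern, a URL that has no path component yields an empty string as the third value.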