How to scrape web pages from the command line using htmlq - Linux Tutorials - Learn Linux Configuration

Web scraping is the process of analyzing the structure of HTML pages and programmatically extracting data from them. In the past we saw how to scrape the web using the Python programming language and the “Beautiful Soup” library; in this tutorial, instead, we see how to perform the same operation using a command line tool written in Rust: htmlq.
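As a quick illustration, here is a minimal sketch of how htmlq is typically used: it reads HTML on standard input and filters it with CSS selectors. The URL below is just an example, and this assumes htmlq is already installed (e.g. via `cargo install htmlq`).

```shell
#!/bin/sh
# Fetch a page and print the href attribute of every <a> element.
# The target URL is only an example.
curl -s https://linuxconfig.org | htmlq 'a' --attribute href

# htmlq can also extract just the text content of matched elements;
# here we feed it an inline HTML snippet instead of a live page.
echo '<html><body><h1>Hello, htmlq</h1></body></html>' | htmlq 'h1' --text
```

The second command prints the text of the `<h1>` element, which makes it handy for quick experiments without touching the network.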


This is a companion discussion topic for the original entry at https://linuxconfig.org/how-to-scrape-web-pages-from-the-command-line-using-htmlq

Why can’t we share the code between users on the same system? I don’t see the reason behind downloading all these megabytes for every user… Probably not htmlq’s fault, but cargo doesn’t look very smart in this case.
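One possible workaround, sketched below as a suggestion rather than an official recommendation: `cargo install` accepts a `--root` option, so a root user can install the binary into a system-wide prefix that every user already has on their `PATH`. The build cache in `~/.cargo` is still per-user, but the installed binary itself is shared.

```shell
#!/bin/sh
# Install htmlq into /usr/local (run as root), so the binary lands in
# /usr/local/bin/htmlq and is available to all users on the system.
cargo install --root /usr/local htmlq
```

After that, ordinary users can run `htmlq` directly without installing their own copy.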