Web (URL) Scraping Tasks
Check out the general usage examples to learn how to use MassMine. Below is a description of the tasks available for web (URL) scraping.
Task parameters marked * are required. For parameters marked with a + choose only one.
Returns the text from one or more supplied URLs with html tags removed. If you only want the plain text from a webpage, and don't care about the underlying structure (lists, tables, etc.), then this is the quickest method for extracting webpage content.
- query*: One or more URLs, separated by spaces
# Scrape the text from a single webpage massmine --task=web-text --query=http://www.massmine.org # Scrape multiple webpages at once massmine --task=web-text --query='http://www.massmine.org http://www.call-cc.org/'