SiteCrawler

This project provides a simple web crawler with retry capabilities and the ability to distinguish between HTTP and HTTPS sites. Its biggest feature is support for plugins (CrawlerActions), which let you hook your own scripts into the crawling process. It also allows you to set "blocked" URLs; URLs that match a blocked entry or pattern are not crawled.
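The sketch below illustrates the two ideas in the description, blocked URL patterns and plugin-style hooks, in plain Java. It does not use SiteCrawler's API: the class `BlockedUrlSketch`, the `PageAction` interface, and every method name are hypothetical and exist only for illustration. Treating blocked entries as regular expressions is also an assumption. No network access is performed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class BlockedUrlSketch {

    /** Hypothetical plugin hook; NOT SiteCrawler's actual CrawlerAction type. */
    interface PageAction {
        void onPage(String url);
    }

    private final List<Pattern> blocked = new ArrayList<>();
    private final List<PageAction> actions = new ArrayList<>();

    /** Registers a blocked pattern (assumed here to be a regular expression). */
    void block(String regex) {
        blocked.add(Pattern.compile(regex));
    }

    /** Registers a hook that runs for every URL that is not blocked. */
    void addAction(PageAction action) {
        actions.add(action);
    }

    /** Returns true if any blocked pattern matches somewhere in the URL. */
    boolean isBlocked(String url) {
        for (Pattern p : blocked) {
            if (p.matcher(url).find()) {
                return true;
            }
        }
        return false;
    }

    /** Simulates visiting a URL: skips blocked URLs, otherwise runs all hooks. */
    boolean visit(String url) {
        if (isBlocked(url)) {
            return false;
        }
        for (PageAction action : actions) {
            action.onPage(url);
        }
        return true;
    }

    public static void main(String[] args) {
        BlockedUrlSketch crawler = new BlockedUrlSketch();
        crawler.block("/logout");     // block any URL containing /logout
        crawler.block("\\.pdf$");     // block URLs ending in .pdf
        crawler.addAction(url -> System.out.println("visited " + url));

        crawler.visit("https://example.com/home");        // hook runs
        crawler.visit("https://example.com/logout");      // skipped
        crawler.visit("https://example.com/manual.pdf");  // skipped
    }
}
```

Checking the blocked list before running any hook keeps plugins from ever seeing URLs that were meant to be excluded; for the real CrawlerAction contract, consult the project repository.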
https://github.com/forcedotcom/SiteCrawler
The BSD 2-Clause License
Salesforce.com
Jasper Roel
Files
SiteCrawler-1.0.0.jar
SiteCrawler-1.0.0.pom
SiteCrawler-1.0.0-sources.jar
Apache Maven
<dependency>
  <groupId>io.github.jasperroel</groupId>
  <artifactId>SiteCrawler</artifactId>
  <version>1.0.0</version>
</dependency>
Gradle Groovy
implementation 'io.github.jasperroel:SiteCrawler:1.0.0'
Gradle Kotlin
implementation("io.github.jasperroel:SiteCrawler:1.0.0")
Scala SBT
libraryDependencies += "io.github.jasperroel" % "SiteCrawler" % "1.0.0"
Groovy Grape
@Grapes(
  @Grab(group='io.github.jasperroel', module='SiteCrawler', version='1.0.0')
)
Apache Ivy
<dependency org="io.github.jasperroel" name="SiteCrawler" rev="1.0.0" />
Leiningen
[io.github.jasperroel/SiteCrawler "1.0.0"]
Apache Buildr
'io.github.jasperroel:SiteCrawler:jar:1.0.0'