Before using Heritrix, make sure that the JDK, Eclipse, and the relevant Eclipse plugins are installed on your machine. I initially made the mistake of skipping Eclipse and debugging in JBuilder instead, which failed every time.
1. Installation:
The current version is 1.12.1, and the official website is http://crawler.archive.org/. For a standard installation, unzip the package into a directory of your choice, then set the system environment variable `HERITRIX_HOME` to point to that directory (this assumes the Java environment has already been configured).
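Before moving on, it can help to confirm that the variable is actually visible to new processes. The following is a minimal sketch in Python, not part of Heritrix itself; the function name `resolve_heritrix_home` is hypothetical, and the only check it makes is that `HERITRIX_HOME` is set and names an existing directory.

```python
import os
from pathlib import Path


def resolve_heritrix_home(env=os.environ):
    """Return HERITRIX_HOME as a Path, or raise if it is unset or not a directory.

    `env` is any mapping of environment variables; it defaults to the real
    process environment and is a parameter only to make the check testable.
    """
    value = env.get("HERITRIX_HOME")
    if not value:
        raise RuntimeError("HERITRIX_HOME is not set")
    home = Path(value)
    if not home.is_dir():
        raise RuntimeError(f"HERITRIX_HOME does not point to a directory: {home}")
    return home
```

Calling `resolve_heritrix_home()` from a freshly opened console shows whether the environment variable change has taken effect.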
2. Post-Installation Steps:
Unzip `%HERITRIX_HOME%\heritrix-1.12.1.jar` to a temporary directory, then copy the `profiles` directory from the unzipped contents to `%HERITRIX_HOME%\conf\`. This step works around a bug in Heritrix related to the default Profile configuration.
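Since a jar is an ordinary zip archive, this step can also be scripted instead of unzipping by hand. The sketch below uses Python's standard `zipfile` module; the function name `extract_profiles` is hypothetical, and it assumes the `profiles` directory sits at the top level of the jar, as the step above implies.

```python
import zipfile
from pathlib import Path


def extract_profiles(jar_path, conf_dir, prefix="profiles/"):
    """Extract every jar entry under `prefix` into `conf_dir`.

    Entry paths are preserved, so `profiles/...` in the jar ends up as
    `<conf_dir>/profiles/...`. Assumption: `profiles/` is a top-level
    directory inside the jar. Returns the list of extracted entry names.
    """
    conf_dir = Path(conf_dir)
    extracted = []
    with zipfile.ZipFile(jar_path) as jar:
        for name in jar.namelist():
            if name.startswith(prefix):
                # extract() creates any missing parent directories.
                jar.extract(name, str(conf_dir))
                extracted.append(name)
    return extracted
```

With `home` set to the Heritrix directory as a `Path`, a call such as `extract_profiles(home / "heritrix-1.12.1.jar", home / "conf")` would do the copy in one pass, with no temporary directory needed.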
3. Configure Management Account:
Copy `%HERITRIX_HOME%\conf\jmxremote.password.template` to `%HERITRIX_HOME%` and rename the copy to `jmxremote.password`. Then edit the password entry in this file, replacing the `@PASSWORD@` placeholder with the password you want to use. For example, change `monitorRole @PASSWORD@` to `monitorRole admin`.
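The copy, rename, and edit can be combined into one small script. This is only a sketch under stated assumptions: the function name `write_jmx_password` is hypothetical, and it assumes that every occurrence of the `@PASSWORD@` placeholder in the template should receive the same password.

```python
from pathlib import Path


def write_jmx_password(home, password, placeholder="@PASSWORD@"):
    """Create <home>/jmxremote.password from the template under <home>/conf.

    Assumption: every `placeholder` occurrence in the template is replaced
    with the same `password`. Returns the path of the file that was written.
    """
    home = Path(home)
    template = home / "conf" / "jmxremote.password.template"
    target = home / "jmxremote.password"
    text = template.read_text()
    target.write_text(text.replace(placeholder, password))
    return target
```

For the example above, `write_jmx_password(home, "admin")` would turn the line `monitorRole @PASSWORD@` into `monitorRole admin`. Pick a stronger password than `admin` for anything beyond local testing.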