What I want to say is: Heritrix crawls specific formats!
This is the most concrete thing I have gotten out of learning about search engines so far. However, since the dormitory no longer has internet access, I have to put the crawler part on hold for now and formally start reading the Lucene source code. Whether my study of search engines ends in failure or in some small success, I think I will keep studying open-source projects. I have really fallen in love with the idea. The only thing that makes me a bit uneasy is that Java has truly become a faded hero and no longer enjoys the glory it once did...