Search engine crawlers discover web pages by following links. The two basic strategies for traversing links are breadth-first and depth-first. In practice, to limit the resource consumption on their own servers, search engines generally adopt a breadth-first strategy.
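To make the breadth-first strategy concrete, here is a minimal sketch of a BFS crawler in Python. The seed URL, the `max_pages` limit, and the `LinkExtractor` helper are illustrative assumptions for this sketch, not part of any real search engine's code; the essential point is the FIFO queue, which visits pages level by level outward from the seed.

```python
from collections import deque
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags on a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def bfs_crawl(seed_url, max_pages=50):
    """Breadth-first crawl: fetch pages level by level from the seed.

    The FIFO queue guarantees that pages closer to the seed are fetched
    first, so shallow, well-linked pages are crawled earliest.
    """
    visited = set()
    queue = deque([seed_url])
    while queue and len(visited) < max_pages:
        url = queue.popleft()  # FIFO dequeue -> breadth-first order
        if url in visited:
            continue
        visited.add(url)
        try:
            with urlopen(url, timeout=5) as resp:
                html = resp.read().decode("utf-8", errors="replace")
        except Exception:
            continue  # skip unreachable or non-HTML pages
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)
            if urlparse(absolute).scheme in ("http", "https"):
                queue.append(absolute)  # enqueued behind the current level
    return visited


if __name__ == "__main__":
    pages = bfs_crawl("https://example.com", max_pages=10)
    print(f"Crawled {len(pages)} pages")
```

Swapping the `deque` for a stack (appending and popping from the same end) would turn this into a depth-first crawl, which tends to burrow deep into one site before returning; that contrast is why the queue discipline, not the fetching code, defines the strategy.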
Search engines also avoid storing large amounts of content that holds little value for users on their own servers. First, this reduces the consumption of server resources; second, it significantly improves the user experience of the search results. Consequently, even if a heavily reprinted article is crawled successfully (recorded as a 200 status code in the web server's logs), it will still be filtered out during the preprocessing stage.
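This preprocessing filter can be sketched as a duplicate-detection pass over pages the crawler has already fetched. The `fingerprint` helper and the exact-hash comparison below are simplifying assumptions; production systems use near-duplicate techniques such as shingling or SimHash, but the flow is the same: the crawl succeeds with a 200 response, yet the reprint is dropped before it ever reaches the index.

```python
import hashlib
import re


def fingerprint(page_text):
    """Reduce a page to a fingerprint of its normalized body text.

    An exact hash is the simplest illustration; real engines detect
    near-duplicates with shingling or SimHash rather than exact matches.
    """
    normalized = re.sub(r"\s+", " ", page_text).strip().lower()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def filter_duplicates(crawled_pages):
    """Keep only the first copy of each distinct body of text.

    crawled_pages: iterable of (url, page_text) pairs, all of which
    returned 200 and were stored by the crawler. Reprints are dropped
    here, at preprocessing time, even though the crawl itself succeeded.
    """
    seen = set()
    kept = []
    for url, text in crawled_pages:
        fp = fingerprint(text)
        if fp in seen:
            continue  # a reprint of content already kept for indexing
        seen.add(fp)
        kept.append((url, text))
    return kept


if __name__ == "__main__":
    pages = [
        ("https://origin.example/post", "An original article."),
        ("https://mirror.example/copy", "An original article."),  # reprint
    ]
    for url, _ in filter_duplicates(pages):
        print("indexed:", url)
```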