https://github.com/paperless-ngx/paperless-ngx
https://github.com/kangvcar/InfoSpider
https://github.com/unclecode/crawl4ai