Improve web crawler, document ingestion, and screening workflows

Multilingual Web Crawling

The web crawler now supports multiple languages, allowing it to navigate non-English websites and locate relevant pages such as sustainability reports and investor relations sections.
A penalty system prevents the crawler from drifting to external domains when following links from a company's website.
Download links are now detected during crawling and pre-fetched to confirm that they serve actual documents before they are added to the collection.
Duplicate URL detection prevents processing the same page or document more than once.
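
As a rough illustration of how the off-domain penalty and duplicate URL detection could fit together, here is a minimal Python sketch. It is not the crawler's actual code: the names `normalize_url`, `score_link`, and `select_links`, the `OFF_DOMAIN_PENALTY` weight, and the normalization rules are all assumptions made for this example.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit

OFF_DOMAIN_PENALTY = 5.0  # hypothetical weight, not taken from the crawler


def normalize_url(url: str) -> str:
    """Canonical form for duplicate checks: lowercase the network location,
    drop the fragment, and strip trailing slashes from the path."""
    parts = urlsplit(url)  # urlsplit already lowercases the scheme
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, parts.query, ""))


def same_site(url: str, home_host: str) -> bool:
    """True if the URL's host is the company host or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    return host == home_host or host.endswith("." + home_host)


def score_link(url: str, base_score: float, home_host: str) -> float:
    """Subtract a fixed penalty from links that leave the company's domain."""
    if same_site(url, home_host):
        return base_score
    return base_score - OFF_DOMAIN_PENALTY


def select_links(page_url, hrefs, home_host, seen, min_score=0.0):
    """Return unseen links whose penalized score is at least min_score, best first.

    hrefs is a list of (href, base_score) pairs; seen is a set of normalized
    URLs that is updated in place so later pages skip the same targets.
    """
    picked = []
    for href, base_score in hrefs:
        url = normalize_url(urljoin(page_url, href))
        if url in seen:
            continue  # duplicate page or document
        seen.add(url)
        score = score_link(url, base_score, home_host)
        if score >= min_score:
            picked.append((score, url))
    return [url for _, url in sorted(picked, key=lambda item: -item[0])]
```

In this sketch an external link is only followed if its relevance score is high enough to survive the penalty, which is one simple way to keep a crawl anchored to the company's own site.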

Document Download & Ingestion

The download timeout now adjusts dynamically based on file size, preventing timeouts on larger documents.
PDF conversion now has a retry strategy: if the initial conversion fails, the system fetches the page content directly and retries.
File extensions are now preserved instead of defaulting to .pdf, which previously caused Excel and other non-PDF files to fail processing.
Added support for older Excel files (.xls format).
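
The size-based timeout and the extension handling can be sketched in Python as follows. This is an illustration under stated assumptions, not the shipped implementation: the function names `timeout_for` and `extension_for`, the constants, the list of recognized extensions, and the choice to fall back to .pdf only when no recognized extension is present are all hypothetical.

```python
import posixpath
from urllib.parse import urlsplit

# All values below are hypothetical and chosen only for illustration.
BASE_TIMEOUT_S = 30.0   # floor used for small files or unknown sizes
SECONDS_PER_MB = 2.0    # extra time granted per megabyte
MAX_TIMEOUT_S = 600.0   # upper bound so a bad size header cannot stall a worker
KNOWN_EXTENSIONS = {".pdf", ".xls", ".xlsx", ".doc", ".docx", ".csv"}


def timeout_for(content_length):
    """Scale the download timeout with the reported size in bytes."""
    if content_length is None or content_length <= 0:
        return BASE_TIMEOUT_S
    size_mb = content_length / (1024 * 1024)
    return min(MAX_TIMEOUT_S, BASE_TIMEOUT_S + size_mb * SECONDS_PER_MB)


def extension_for(url, default=".pdf"):
    """Keep the extension found in the URL path; use the default only when
    the path has no recognized extension."""
    ext = posixpath.splitext(urlsplit(url).path)[1].lower()
    return ext if ext in KNOWN_EXTENSIONS else default
```

With these assumed values, a 10 MB file gets 50 seconds instead of the 30 second floor, and a URL ending in `Data.XLS` keeps `.xls` rather than being saved as a PDF.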

Screening Workflows

Mentioning a user in a comment now automatically subscribes them to that screening response for future notifications.
Comments, mentions, subscriptions, assignees, and due dates are now preserved when regenerating a response or cloning a screening.
Locked responses now properly block edits and regeneration with a clear error message.
Added analytics tracking for comment actions, subscriptions, and mentions to provide better insight into team collaboration.
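
A minimal Python sketch of the mention, lock, and regeneration behavior described above. The `ScreeningResponse` class, the `ResponseLockedError` exception, the `@username` mention syntax, and the error text are assumptions made for this example and may not match the real data model.

```python
import re
from dataclasses import dataclass, field

# Assumes mentions are written as @username; the lookbehind skips email addresses.
MENTION_RE = re.compile(r"(?<!\w)@(\w+)")


class ResponseLockedError(Exception):
    """Raised when an edit or regeneration targets a locked response."""


@dataclass
class ScreeningResponse:
    text: str = ""
    locked: bool = False
    comments: list = field(default_factory=list)
    subscribers: set = field(default_factory=set)

    def add_comment(self, author, body):
        """Store the comment and subscribe every user mentioned in it."""
        self.comments.append((author, body))
        self.subscribers.update(MENTION_RE.findall(body))

    def _ensure_unlocked(self):
        if self.locked:
            raise ResponseLockedError(
                "This response is locked and cannot be edited or regenerated."
            )

    def edit(self, new_text):
        self._ensure_unlocked()
        self.text = new_text

    def regenerate(self, new_text):
        """Replace only the text; comments and subscribers are left untouched."""
        self._ensure_unlocked()
        self.text = new_text
```

In this sketch, regeneration swaps the response text in place rather than creating a new object, which is one straightforward way to keep the collaboration data attached to it.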