Improved

Improvements to the web crawler, document ingestion, and screening workflows

Multilingual Web Crawling

  • The web crawler now supports multiple languages, allowing it to navigate non-English websites and locate relevant pages like sustainability reports and investor relations sections.
  • A penalty system prevents the crawler from drifting to external domains when following links from a company's website.
  • Download links are now detected and pre-fetched during crawling to confirm they serve actual documents before adding them to the collection.
  • Duplicate URL detection prevents processing the same page or document more than once.
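
As a rough illustration of how the off-domain penalty and duplicate URL detection described above could fit together, here is a minimal Python sketch. It is not the crawler's actual code: the function names, the penalty value, and the normalization rules (lowercased scheme and host, fragment and trailing slash dropped) are all assumptions made for the example.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical penalty subtracted from the score of links that leave the company's site.
EXTERNAL_PENALTY = 5.0


def normalize_url(url: str) -> str:
    """Build an assumed canonical form of a URL for duplicate detection."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    # Lowercase scheme and host, keep the query, drop the fragment.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def link_score(relevance: float, link: str, home_host: str) -> float:
    """Penalize links whose host is neither the home host nor one of its subdomains."""
    host = urlsplit(link).hostname or ""
    same_site = host == home_host or host.endswith("." + home_host)
    return relevance if same_site else relevance - EXTERNAL_PENALTY


def select_links(links, home_host: str, seen: set) -> list:
    """Rank (url, relevance) pairs by penalized score, skipping URLs already seen."""
    ranked = []
    for url, relevance in links:
        key = normalize_url(url)
        if key in seen:
            continue  # duplicate page or document: do not process again
        seen.add(key)
        ranked.append((link_score(relevance, url, home_host), key))
    ranked.sort(key=lambda item: item[0], reverse=True)
    return ranked
```

In this sketch an external link is not banned outright. It simply ranks below on-site links unless its relevance is high enough to outweigh the penalty, which is one plausible way to keep a crawl from drifting.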

Document Download & Ingestion

  • Download timeouts now adjust dynamically based on file size, preventing timeouts on larger documents.
  • PDF conversion now retries on failure: if the initial conversion fails, the system fetches the page content directly and retries.
  • File extensions are now preserved instead of defaulting to .pdf, which previously caused Excel and other non-PDF files to fail processing.
  • Added support for older Excel files (.xls format).
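
The size-based timeout and extension handling above could look roughly like the following Python sketch. The constants, the function names, and the choice to read the size from a `Content-Length` value are assumptions for illustration only. The real implementation may derive them differently.

```python
import os
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical tuning values, not taken from the release.
BASE_TIMEOUT_S = 30.0
SECONDS_PER_MB = 2.0
MAX_TIMEOUT_S = 600.0


def download_timeout(content_length: Optional[int]) -> float:
    """Scale the timeout with the reported file size, within a fixed ceiling."""
    if content_length is None or content_length <= 0:
        return BASE_TIMEOUT_S  # size unknown: fall back to the base timeout
    size_mb = content_length / (1024 * 1024)
    return min(BASE_TIMEOUT_S + size_mb * SECONDS_PER_MB, MAX_TIMEOUT_S)


def target_extension(url: str, default: str = ".pdf") -> str:
    """Keep the extension found in the URL path; use the default only if there is none."""
    ext = os.path.splitext(urlsplit(url).path)[1].lower()
    return ext or default
```

For example, under these assumed values a 50 MB file would get a 130-second timeout, and a URL ending in `Data.XLS` would be stored with the `.xls` extension rather than `.pdf`.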

Screening Workflows

  • Mentioning a user in a comment now automatically subscribes them to that screening response for future notifications.
  • Comments, mentions, subscriptions, assignees, and due dates are now preserved when regenerating a response or cloning a screening.
  • Locked responses now properly block edits and regeneration with a clear error message.
  • Added analytics tracking for comment actions, subscriptions, and mentions to provide better insight into team collaboration.
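
To make the mention and lock behavior above concrete, here is a small Python sketch. The class, the `@handle` mention syntax, the error type, and the error wording are all hypothetical. They illustrate the described behavior and are not the product's data model.

```python
import re
from dataclasses import dataclass, field

# Assumed mention syntax: "@" followed by letters, digits, or underscores.
MENTION_RE = re.compile(r"@(\w+)")


class ResponseLockedError(Exception):
    """Raised when an edit or regeneration is attempted on a locked response."""


@dataclass
class ScreeningResponse:
    text: str
    locked: bool = False
    subscribers: set = field(default_factory=set)
    comments: list = field(default_factory=list)

    def add_comment(self, author: str, body: str) -> set:
        """Store the comment and subscribe every user mentioned in it."""
        self.comments.append((author, body))
        mentioned = set(MENTION_RE.findall(body))
        self.subscribers |= mentioned
        return mentioned

    def edit(self, new_text: str) -> None:
        """Replace the response text unless the response is locked."""
        if self.locked:
            raise ResponseLockedError(
                "This response is locked and cannot be edited or regenerated."
            )
        self.text = new_text
```

In this sketch, adding a comment that contains `@maria` puts `maria` in `subscribers`, and calling `edit` on a locked response raises the error without changing the text.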