Improved
Improve web crawler, document ingestion, and screening workflows
15 days ago
Multilingual Web Crawling
- The web crawler now supports multiple languages, allowing it to navigate non-English websites and locate relevant pages like sustainability reports and investor relations sections.
- A penalty system prevents the crawler from drifting to external domains when following links from a company's website.
- Download links are now detected and pre-fetched during crawling to confirm they serve actual documents before adding them to the collection.
- Duplicate URL detection prevents processing the same page or document more than once.
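The domain penalty and duplicate detection described above can be sketched roughly as follows. This is a minimal illustration, not the crawler's actual code: the names `normalize`, `score_link`, and `plan_visits`, the base score, and the penalty value are all assumptions made for the example.

```python
from urllib.parse import urldefrag, urlsplit

EXTERNAL_PENALTY = 5.0  # assumed value, chosen only for illustration


def normalize(url: str) -> str:
    """Drop the fragment, lowercase the host, and strip a trailing slash
    so that trivially different URLs compare equal."""
    url, _fragment = urldefrag(url)
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    query = f"?{parts.query}" if parts.query else ""
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{path}{query}"


def score_link(url: str, home_host: str, base_score: float = 10.0) -> float:
    """Score a link; links that leave the company's domain are penalized."""
    host = urlsplit(url).netloc.lower()
    on_site = host == home_host or host.endswith("." + home_host)
    return base_score if on_site else base_score - EXTERNAL_PENALTY


def plan_visits(links, home_host):
    """Return unique links ordered by score, highest first."""
    seen = set()
    scored = []
    for link in links:
        key = normalize(link)
        if key in seen:  # duplicate URL: skip instead of processing twice
            continue
        seen.add(key)
        scored.append((score_link(key, home_host), key))
    scored.sort(key=lambda pair: -pair[0])  # stable sort keeps discovery order on ties
    return [url for _score, url in scored]
```

In this sketch external links are ranked lower rather than dropped, so the crawler can still follow one when nothing better is available on the company's own site.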
Document Download & Ingestion
- Download timeouts now scale with file size, so larger documents no longer time out.
- PDF conversion now retries on failure: if the initial conversion fails, the system fetches the page content directly and tries again.
- File extensions are now preserved instead of defaulting to `.pdf`, which previously caused Excel and other non-PDF files to fail processing.
- Added support for older Excel files (`.xls` format).
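As a rough illustration of the size-based timeout and extension handling listed above, the sketch below shows one way such logic could look. The constants, the function names `download_timeout` and `target_filename`, and the `.pdf` fallback for URLs with no extension at all are assumptions for the example, not the values or behavior of the actual system.

```python
import os
from urllib.parse import urlsplit

BASE_TIMEOUT_S = 30.0   # assumed minimum timeout
SECONDS_PER_MB = 2.0    # assumed extra time budget per megabyte
MAX_TIMEOUT_S = 600.0   # assumed upper bound


def download_timeout(size_bytes):
    """Scale the timeout with the reported file size, within fixed bounds.

    Falls back to the base timeout when the size is unknown.
    """
    if size_bytes is None or size_bytes < 0:
        return BASE_TIMEOUT_S
    size_mb = size_bytes / (1024 * 1024)
    return min(MAX_TIMEOUT_S, BASE_TIMEOUT_S + SECONDS_PER_MB * size_mb)


def target_filename(url, stem):
    """Keep the extension found in the URL path instead of forcing .pdf.

    The .pdf fallback applies only when the URL has no extension (assumption).
    """
    path = urlsplit(url).path  # ignores the query string and fragment
    ext = os.path.splitext(path)[1].lower()
    return stem + (ext if ext else ".pdf")
```

The size would typically come from a `Content-Length` response header, which a server may omit; that is why the sketch accepts `None`.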
Screening Workflows
- Mentioning a user in a comment now automatically subscribes them to that screening response for future notifications.
- Comments, mentions, subscriptions, assignees, and due dates are now preserved when regenerating a response or cloning a screening.
- Locked responses now properly block edits and regeneration with a clear error message.
- Added analytics tracking for comment actions, subscriptions, and mentions to provide better insight into team collaboration.
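
The mention, preservation, and locking behavior above can be modeled in a few lines. This is a hypothetical sketch: the `ScreeningResponse` class, the `@name` mention syntax, and the error wording are assumptions for illustration and do not reflect the product's real data model.

```python
import re
from dataclasses import dataclass, field

MENTION_RE = re.compile(r"@(\w+)")  # assumed mention syntax: @username


class ResponseLockedError(Exception):
    """Raised when a locked response is edited or regenerated."""


@dataclass
class ScreeningResponse:
    text: str = ""
    locked: bool = False
    subscribers: set = field(default_factory=set)
    comments: list = field(default_factory=list)

    def add_comment(self, author, body):
        """Store the comment and subscribe every mentioned user."""
        self.comments.append((author, body))
        self.subscribers.update(MENTION_RE.findall(body))

    def edit(self, new_text):
        self._require_unlocked("edited")
        self.text = new_text

    def regenerate(self, new_text):
        """Replace the text only; comments and subscribers are left intact."""
        self._require_unlocked("regenerated")
        self.text = new_text

    def _require_unlocked(self, action):
        if self.locked:
            raise ResponseLockedError(
                f"This response is locked and cannot be {action}."
            )
```

Routing both `edit` and `regenerate` through a single guard keeps the lock check and its error message consistent across the two actions.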