Name | Modified | Size | Downloads / Week |
---|---|---|---|
README.md | 2025-05-12 | 1.7 kB | |
v0.6.3 source code.tar.gz | 2025-05-12 | 6.0 MB | |
v0.6.3 source code.zip | 2025-05-12 | 6.1 MB | |
Totals: 3 Items | | 12.1 MB | 0 |

Release 0.6.3 (unreleased)
Features
- extraction: add `RegexExtractionStrategy` for pattern-based extraction, including built-in patterns for emails, URLs, phones, and dates, support for custom regexes, an LLM-assisted pattern generator, optimized HTML preprocessing via `fit_html`, and enhanced network response body capture (9b5ccac)
- docker-api: introduce job-based polling endpoints (`POST /crawl/job` & `GET /crawl/job/{task_id}` for crawls, `POST /llm/job` & `GET /llm/job/{task_id}` for LLM tasks), backed by Redis task management with configurable TTL; move schemas to `schemas.py` and add a `demo_docker_polling.py` example (94e9959)
- browser: improve profile management and cleanup: add process cleanup for existing Chromium instances on Windows/Unix, fix profile creation by passing the full browser config, ship detailed browser/CLI docs and an initial profile-creation test, and bump the version to 0.6.3 (9499164)
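
The submit-then-poll flow behind the job endpoints can be sketched as follows. This is a minimal illustration, not the project's client. The release notes do not describe the request or response schema, so `poll_job`, `submit`, and `fetch` are hypothetical names, and the two callables stand in for whatever HTTP calls wrap `POST /crawl/job` and `GET /crawl/job/{task_id}`. The sketch assumes `fetch` returns `None` while the job is still running.

```python
import time
from typing import Any, Callable, Optional


def poll_job(
    submit: Callable[[], str],
    fetch: Callable[[str], Optional[Any]],
    interval: float = 1.0,
    timeout: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Submit a job, then poll until fetch() returns a non-None result.

    submit: hypothetical wrapper around POST /crawl/job (or POST /llm/job)
            that returns the task id.
    fetch:  hypothetical wrapper around GET /crawl/job/{task_id} that returns
            None while the job is pending and the result once it is done.
    """
    task_id = submit()
    deadline = time.monotonic() + timeout
    while True:
        result = fetch(task_id)
        if result is not None:
            return result
        # Give up once the time budget is spent instead of polling forever.
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {task_id} did not finish within {timeout}s")
        sleep(interval)
```

Passing the HTTP calls in as callables keeps the loop independent of any particular client library. Because tasks are stored with a configurable TTL, a client would presumably want to fetch the result before the entry expires.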
Fixes
- crawler: remove automatic page closure in `take_screenshot` and `take_screenshot_naive`, preventing premature teardown; callers must now explicitly close pages (BREAKING CHANGE) (a3e9ef9)
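
Since the screenshot helpers no longer close the page, calling code has to do it. The sketch below shows one way to make the close unconditional with `try`/`finally`. `StubPage`, the `take_screenshot` stand-in, and `capture_and_close` are hypothetical and exist only so the example runs on its own. The real helpers' signatures are not given in these notes.

```python
import asyncio


class StubPage:
    """Hypothetical stand-in for a browser page object."""

    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        self.closed = True


async def take_screenshot(page: StubPage) -> bytes:
    """Stand-in for the crawler helper: returns data and leaves the page open."""
    if page.closed:
        raise RuntimeError("page is already closed")
    return b"fake-png-bytes"


async def capture_and_close(page: StubPage) -> bytes:
    """Take a screenshot and close the page even if the capture raises."""
    try:
        return await take_screenshot(page)
    finally:
        await page.close()


if __name__ == "__main__":
    page = StubPage()
    data = asyncio.run(capture_and_close(page))
    print(len(data), page.closed)
```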
Documentation
- format bash scripts in `docs/apps/linkdin/README.md` so examples copy & paste cleanly (87d4b0f)
- update the same README with full `litellm` argument details for correct script usage (bd5a9ac)
Refactoring
- logger: centralize color codes behind an `Enum` in `async_logger`, `browser_profiler`, `content_filter_strategy` and related modules for cleaner, type-safe formatting (cd2b490)
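
As an illustration of the idea behind this refactor, the sketch below keeps ANSI color codes in a single `Enum` instead of scattering string literals across modules. `LogColor` and `colorize` are hypothetical names chosen for the example. The member names and layout used in the project may differ.

```python
from enum import Enum


class LogColor(Enum):
    """Hypothetical central registry of ANSI color escape codes."""

    RED = "\033[31m"
    GREEN = "\033[32m"
    YELLOW = "\033[33m"
    RESET = "\033[0m"


def colorize(text: str, color: LogColor) -> str:
    """Wrap text in the given color and reset the terminal afterwards."""
    if not isinstance(color, LogColor):
        # Rejecting raw strings is what makes the Enum approach type-safe.
        raise TypeError(f"expected LogColor, got {type(color).__name__}")
    return f"{color.value}{text}{LogColor.RESET.value}"


if __name__ == "__main__":
    print(colorize("crawl finished", LogColor.GREEN))
```

Callers pass `LogColor.GREEN` rather than a raw escape string, so a misspelled color fails at attribute lookup instead of silently printing garbage.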
Experimental
- start migration of the logging stack to `rich` (work in progress) (b2f3cb0)