changing to 2-clause BSD license
some cleanups
sync, preparing for move to GH
download without threads
download without threads
making (potentially buggy) cache optional
sync
finishing touches to clarax before first produc...
finishing touches to clarax before first produc...
adaptation of German tokenization scripts for E...
bugs, bugs, bugs
adding external gzip option to tender
still chasing a bug
still chasing a bug
chasing a bug
minimal fixes to cwb2noske
sync
adding marmot and ner wrappers
adding cwb2noske
add langfilter
sync annotation
some bug fixes in random walker
improvement in link extraction/selection
basic random walker stable
adding cache mechanism to clarax, bugs fixed, o...
major progress on walker, still with bugs and w...
towards basic random walk functionality
finished data writing refactoring
progress on refactoring formatted writers to wo...
refactoring formatted writers to work with ClaraX
finished politeness manager
snapshot of politeness manager
adding clarax.ini
snapshot of work on politeness manager etc.
moving towards the first ClaraX walker
integrated texrex processors into ClaraX
adding external modified(!) libraries
adding trwalkers.pas
Adding skeleton of ClaraX rewrite
fixed problem with host meta extraction
fixed illegal codepoint cleanup
fixed the most annoying bug in TTrNormalizer
fixed two stupid loop counter bugs in TrUtf8Filter
minimal improvement in TrTextAssessmentMulti
sync
host and TLD extraction plus some final touches
sync
finished TODO for behindthecow
almost done with Unicode/UTF-8 cleanups
finished normalizer/UTF-8 cleaner
bug fixes
adding ISO boilerplate MLP
NFC normalizer working
adding minimal ICU normalization wrapper
fixed a major bug in arc position recording (in...
cleanup
refactoring TTrArcReader done; starting TTrWarc...
more towards the end of refactoring TTrArcReade...
in the middle of refactoring TTrArcReader for b...
modified reader architecture for alternative re...
polishing text assessment
TTrTextAssessmentMulti complete, but untested
begin multi-language changes for CommonCrawl data
add EN depbigram extractor
fixes to calf
minor Calf stuff
minor Calf stuff
adding setup.py for Calf
adding Calf
sync
syncing german annotation
annotation scripts updates
adding raw ngram extractor
adding sv cwbify script
syncing annotation dir
scripts: mate and malt for german
adding TigerCOW-DE MATL model
adding German MALT model
adding NER models
more work on German abbreviations
almost done with German tagging
almost done with german tagging
German annotation
update German annotation
COW12 cleanup scripts
DE tokenization
more COW12 scripts
adding cleanup scripts for COW12
German annotation WiP
initial commit of german annotation scripts
sync
fix cow-cwb-en + cleanup annotation scripts
updated CWB scripts for ENCOW14
sync
adding cow-slice-builder
adding en malt scripts
adding en malt scripts
changing CWB encode wrappers
add simple XML check wrapper for gzipped COW XML...