perlwikibot-svn Mailing List for Perl MediaWiki Robot (Page 2)
Status: Pre-Alpha
Brought to you by:
rotemliss
From: <am...@us...> - 2008-08-12 07:53:55
|
Revision: 68
http://perlwikibot.svn.sourceforge.net/perlwikibot/?rev=68&view=rev
Author: amire80
Date: 2008-08-12 07:54:04 +0000 (Tue, 12 Aug 2008)

Log Message:
-----------
Nynorsk string files and a very cosmetic change to the script.

Modified Paths:
--------------
    trunk/no-interwiki/prepare_noiw_list.pl

Added Paths:
-----------
    trunk/no-interwiki/nn.language_codes.txt
    trunk/no-interwiki/nn.strings.txt

Added: trunk/no-interwiki/nn.language_codes.txt
===================================================================
--- trunk/no-interwiki/nn.language_codes.txt (rev 0)
+++ trunk/no-interwiki/nn.language_codes.txt 2008-08-12 07:54:04 UTC (rev 68)
@@ -0,0 +1 @@
+link language_codes.txt
\ No newline at end of file

Property changes on: trunk/no-interwiki/nn.language_codes.txt
___________________________________________________________________
Added: svn:special
   + *

Added: trunk/no-interwiki/nn.strings.txt
===================================================================
--- trunk/no-interwiki/nn.strings.txt (rev 0)
+++ trunk/no-interwiki/nn.strings.txt 2008-08-12 07:54:04 UTC (rev 68)
@@ -0,0 +1,39 @@
+# months
+January Jan
+February Feb
+March Mar
+April Apr
+May Mag
+June Jun
+July Jul
+August Aug
+September Sep
+October Oct
+November Nov
+December Dec
+
+in d'
+
+no_iw sense iw
+disambig Fleirtyding
+template Mal
+
+date Dato
+type type
+
+# MW specials
+REDIRECT OMDIRIGER
+
+# Namespaces
+User Brukar
+User talk Brukardiskusjon
+Image Fil
+Portal Tema
+Category Kategori
+article space Hovud
+
+# Other
+other other
+rlm ‏
+exclude_lowercase ß
+

Modified: trunk/no-interwiki/prepare_noiw_list.pl
===================================================================
--- trunk/no-interwiki/prepare_noiw_list.pl 2008-08-12 07:22:30 UTC (rev 67)
+++ trunk/no-interwiki/prepare_noiw_list.pl 2008-08-12 07:54:04 UTC (rev 68)
@@ -324,9 +324,7 @@

 my $page_counter;

-my %statistics = (
-    count_iw => [],
-);
+my %statistics = (count_iw => []);
 my %namespace_count;
 my %type_count;
 my %found_links;

This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site.
From: <am...@us...> - 2008-08-12 07:22:21
|
Revision: 67 http://perlwikibot.svn.sourceforge.net/perlwikibot/?rev=67&view=rev Author: amire80 Date: 2008-08-12 07:22:30 +0000 (Tue, 12 Aug 2008) Log Message: ----------- Adding string files for fr + some cosmetics in the main script. Modified Paths: -------------- trunk/no-interwiki/prepare_noiw_list.pl Added Paths: ----------- trunk/no-interwiki/fr.language_codes.txt trunk/no-interwiki/fr.strings.txt Added: trunk/no-interwiki/fr.language_codes.txt =================================================================== --- trunk/no-interwiki/fr.language_codes.txt (rev 0) +++ trunk/no-interwiki/fr.language_codes.txt 2008-08-12 07:22:30 UTC (rev 67) @@ -0,0 +1 @@ +link language_codes.txt \ No newline at end of file Property changes on: trunk/no-interwiki/fr.language_codes.txt ___________________________________________________________________ Added: svn:special + * Added: trunk/no-interwiki/fr.strings.txt =================================================================== --- trunk/no-interwiki/fr.strings.txt (rev 0) +++ trunk/no-interwiki/fr.strings.txt 2008-08-12 07:22:30 UTC (rev 67) @@ -0,0 +1,39 @@ +# months +January Jan +February Feb +March Mar +April Apr +May Mag +June Jun +July Jul +August Aug +September Sep +October Oct +November Nov +December Dec + +in d' + +no_iw sense iw +disambig Homonymie +template Modèle + +date date +type type + +# MW specials +REDIRECT REDIRECT + +# Namespaces +User Utilisateur +User talk Discussion Utilisateur +Image Image +Portal Portail +Category Catégorie +article space Principal + +# Other +other autre +rlm ‏ +exclude_lowercase ß + Modified: trunk/no-interwiki/prepare_noiw_list.pl =================================================================== --- trunk/no-interwiki/prepare_noiw_list.pl 2008-08-11 13:29:30 UTC (rev 66) +++ trunk/no-interwiki/prepare_noiw_list.pl 2008-08-12 07:22:30 UTC (rev 67) @@ -50,7 +50,7 @@ ); #>>> our $VERSION = ($SVN_PROPS{Revision} =~ /\A\$Revision:\ (?<revision_num>\d+)\ \$\z/xms) - ? 
"0.1.$+{revision_num}" + ? "0.1.9.$+{revision_num}" : croak(q(Something is wrong with SVN revision number)); my %PATTERN; @@ -324,8 +324,9 @@ my $page_counter; -my %statistics; -$statistics{count_iw} = []; +my %statistics = ( + count_iw => [], +); my %namespace_count; my %type_count; my %found_links; @@ -1163,7 +1164,7 @@ sub print_multi_links_by_foreign { LANG_CODE: foreach my $lang_code (sort keys %found_links) { - my $filename = "$MULTI_DIR/$lang_code.txt"; + my $filename = "$MULTI_DIR/$lang_code.$WIKITEXT_EXT"; my @foreign_articles = sort keys %{ $found_links{$lang_code} }; FOREIGN_ARTICLE: foreach my $foreign_article (@foreign_articles) { @@ -1209,7 +1210,7 @@ } } - my $filename = "$MULTI_DIR/LOCAL.txt"; + my $filename = "$MULTI_DIR/LOCAL.$WIKITEXT_EXT"; foreach my $local_multi_article (sort keys %local_multi_links) { append_to_file($filename, '* ' . mw_bold(make_link($local_multi_article))); This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
From: <am...@us...> - 2008-08-11 13:29:22
|
Revision: 66 http://perlwikibot.svn.sourceforge.net/perlwikibot/?rev=66&view=rev Author: amire80 Date: 2008-08-11 13:29:30 +0000 (Mon, 11 Aug 2008) Log Message: ----------- Moved multi links to separate files. Modified Paths: -------------- trunk/no-interwiki/prepare_noiw_list.pl Modified: trunk/no-interwiki/prepare_noiw_list.pl =================================================================== --- trunk/no-interwiki/prepare_noiw_list.pl 2008-08-11 13:28:09 UTC (rev 65) +++ trunk/no-interwiki/prepare_noiw_list.pl 2008-08-11 13:29:30 UTC (rev 66) @@ -57,6 +57,7 @@ Readonly my $WIKITEXT_EXT => 'wiki.txt'; Readonly my $OUT_DIR => 'out'; Readonly my $UNSORTED_DIR => "$OUT_DIR/unsorted"; +Readonly my $MULTI_DIR => "$OUT_DIR/multilinks"; Readonly my $ALT_SEP => q{|}; Readonly my $FIELD_SEP => qq{\t}; Readonly my $LINK_SEP => q{|}; @@ -291,7 +292,8 @@ } # TODO: Make smarter, configurable, whatever -foreach my $out_dir ($OUT_DIR, $UNSORTED_DIR) { +# $OUT_DIR must be first, because it's the parent +foreach my $out_dir ($OUT_DIR, $UNSORTED_DIR, $MULTI_DIR) { if (-d $out_dir) { unlink glob "$out_dir/*$WIKITEXT_EXT"; } @@ -1137,7 +1139,7 @@ open my $file, '>>:utf8', $fn or croak(file_error('opening', $fn, 'appending')); - say {$file} $line; + say {$file} ($line // q{}); close $file or croak(file_error('closing', $fn, 'appeding')); @@ -1161,6 +1163,7 @@ sub print_multi_links_by_foreign { LANG_CODE: foreach my $lang_code (sort keys %found_links) { + my $filename = "$MULTI_DIR/$lang_code.txt"; my @foreign_articles = sort keys %{ $found_links{$lang_code} }; FOREIGN_ARTICLE: foreach my $foreign_article (@foreign_articles) { @@ -1174,7 +1177,7 @@ make_link($lang_code . $MW_SYNTAX{namespace_sep} . 
$foreign_article); - INFO("* '''$foreign_title''' - $links\n"); + append_to_file($filename, "* '''$foreign_title''' - $links"); } } } @@ -1186,8 +1189,6 @@ my %local_multi_links; LANG_CODE: foreach my $lang_code (sort keys %found_links) { - - # my @foreign_articles = map { make_link($lang_code . $MW_SYNTAX{namespace_sep} . $_) } sort keys %{ $found_links{$lang_code} }; my @foreign_articles = sort keys %{ $found_links{$lang_code} }; FOREIGN_ARTICLE: foreach my $foreign_article (@foreign_articles) { @@ -1208,19 +1209,23 @@ } } + my $filename = "$MULTI_DIR/LOCAL.txt"; foreach my $local_multi_article (sort keys %local_multi_links) { - INFO('* ' . mw_bold(make_link($local_multi_article))); + append_to_file($filename, + '* ' . mw_bold(make_link($local_multi_article))); foreach my $other_local_article ( sort keys %{ $local_multi_links{$local_multi_article} }) { - INFO('** ' . make_link($other_local_article)); + append_to_file($filename, + '** ' . make_link($other_local_article)); my $foreign_articles = join_links( $local_multi_links{$local_multi_article} ->{$other_local_article}, 0 ); - INFO("*** $foreign_articles"); + append_to_file($filename, "*** $foreign_articles"); } + append_to_file($filename); } return; This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
From: <am...@us...> - 2008-08-11 13:28:00
|
Revision: 65
http://perlwikibot.svn.sourceforge.net/perlwikibot/?rev=65&view=rev
Author: amire80
Date: 2008-08-11 13:28:09 +0000 (Mon, 11 Aug 2008)

Log Message:
-----------
Adding strings file for oc - Occitan. It's not correct, but enough for the first run.

Added Paths:
-----------
    trunk/no-interwiki/oc.strings.txt

Added: trunk/no-interwiki/oc.strings.txt
===================================================================
--- trunk/no-interwiki/oc.strings.txt (rev 0)
+++ trunk/no-interwiki/oc.strings.txt 2008-08-11 13:28:09 UTC (rev 65)
@@ -0,0 +1,39 @@
+# months
+January Jan
+February Feb
+March Mar
+April Apr
+May Mag
+June Jun
+July Jul
+August Aug
+September Sep
+October Oct
+November Nov
+December Dec
+
+in d'
+
+no_iw sense iw
+disambig Omonimia
+template Modèl
+
+date data
+type tip
+
+# MW specials
+REDIRECT REDIRECT
+
+# Namespaces
+User Utilizaire
+User talk Discussion Utilizaire
+Image Imatge
+Portal Portal
+Category Categoria
+article space Principal
+
+# Other
+other altre
+rlm ‏
+exclude_lowercase ß
+
From: <am...@us...> - 2008-08-04 16:08:11
|
Revision: 64
http://perlwikibot.svn.sourceforge.net/perlwikibot/?rev=64&view=rev
Author: amire80
Date: 2008-08-04 16:08:14 +0000 (Mon, 04 Aug 2008)

Log Message:
-----------
Fixed a bug in template parser. Added percentage of iwless pages to stats.

Modified Paths:
--------------
    trunk/no-interwiki/prepare_noiw_list.pl

Modified: trunk/no-interwiki/prepare_noiw_list.pl
===================================================================
--- trunk/no-interwiki/prepare_noiw_list.pl 2008-08-04 13:32:26 UTC (rev 63)
+++ trunk/no-interwiki/prepare_noiw_list.pl 2008-08-04 16:08:14 UTC (rev 64)
@@ -373,9 +373,7 @@

         next PAGE if ($page_counter < $option{start_from});

-        my $namespace = namespace($page);
-        $namespace_count{$namespace}++;
-
+        my $namespace = namespace($page);
         my $page_title = $page->title();

         # Skipping cases:
@@ -384,11 +382,13 @@
             is_redirect($page)
             or not is_in_namespace($page, @INCLUDE_NAMESPACES)

+            # TODO: Be more precise here.
             # Portal pages which have a '/' in their name are probably
             # internal and do not need interwiki links.
             or (is_in_namespace($page, 'Portal') and $page_title =~ m{/}xms)
             );

+        $namespace_count{$namespace}++;
         INFO("\n* processing $page_counter - ", $page_title);

         my $page_text_ref = $page->text();
@@ -573,7 +573,8 @@
             foreach my $next_filter (@{$filter}) {

                 # N.B. - case-insensitive. Wrong, but kinda useful.
-                if ($next_match =~ /\A\Q$next_filter/xmsi) {
+                if ($next_match =~ /\A\Q$MW_SYNTAX{'start_tmpl'}$next_filter/xmsi)
+                {

                     # N.B.: parse_template calls find_templates() recursively
                     my $parsed_template =
@@ -1273,8 +1274,14 @@

     INFO('pages without interwiki links per namespace');
     foreach my $namespace (keys %{ $statistics{'has no interwiki link'} }) {
-        INFO(
-            "$namespace: $statistics{'has no interwiki link'}->{$namespace}");
+        my $iwless_in_namespace =
+            $statistics{'has no interwiki link'}->{$namespace};
+        ## no critic ValuesAndExpressions::ProhibitMagicNumbers
+        no integer;
+        my $percentage = sprintf '%.2f',
+            100 * $iwless_in_namespace / $namespace_count{$namespace};
+        use integer;
+        INFO("$namespace: $iwless_in_namespace, $percentage%");
     }

     INFO("\nNAMESPACES");
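The percentage calculation added in rev 64 can be tried in isolation. The following is only a sketch: the helper name iwless_percentage() and the zero-total guard are assumptions and do not appear in the committed script. It also suggests why the commit moves the $namespace_count increment below the skip checks: the denominator then counts only pages that were actually processed.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of prepare_noiw_list.pl: the share of
# pages without interwiki links in one namespace, with two decimals.
sub iwless_percentage {
    my ($iwless_count, $namespace_total) = @_;

    # Assumed guard against an empty namespace; the diff has no such check.
    return '0.00' if !$namespace_total;

    # "no integer" only matters when "use integer" is in effect in the
    # enclosing scope, as the diff suggests; otherwise it is harmless.
    no integer;
    return sprintf '%.2f', 100 * $iwless_count / $namespace_total;
}

print iwless_percentage(3, 8), "\n";    # prints 37.50
```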
From: <am...@us...> - 2008-08-04 13:32:18
|
Revision: 63
http://perlwikibot.svn.sourceforge.net/perlwikibot/?rev=63&view=rev
Author: amire80
Date: 2008-08-04 13:32:26 +0000 (Mon, 04 Aug 2008)

Log Message:
-----------
Nicer diff.

Modified Paths:
--------------
    trunk/no-interwiki/tidy.sh

Modified: trunk/no-interwiki/tidy.sh
===================================================================
--- trunk/no-interwiki/tidy.sh 2008-08-04 10:42:35 UTC (rev 62)
+++ trunk/no-interwiki/tidy.sh 2008-08-04 13:32:26 UTC (rev 63)
@@ -14,7 +14,7 @@
     exit 1
 fi

-diff $FN.bak ${FN}
+diff $FN ${FN}.bak
 if [ $? -eq 2 ]; then
     exit 1
 fi
From: <am...@us...> - 2008-08-04 10:42:27
|
Revision: 62 http://perlwikibot.svn.sourceforge.net/perlwikibot/?rev=62&view=rev Author: amire80 Date: 2008-08-04 10:42:35 +0000 (Mon, 04 Aug 2008) Log Message: ----------- Mostly cosmetic documentation update. Modified Paths: -------------- trunk/no-interwiki/prepare_noiw_list.pl Modified: trunk/no-interwiki/prepare_noiw_list.pl =================================================================== --- trunk/no-interwiki/prepare_noiw_list.pl 2008-08-04 10:35:32 UTC (rev 61) +++ trunk/no-interwiki/prepare_noiw_list.pl 2008-08-04 10:42:35 UTC (rev 62) @@ -197,7 +197,9 @@ \Q$STRING{no_iw}\E # The string may have spaces }xmsi; -# XXX HACK Until i get a better regex for matching balancing {{}} ... +# A simplistic template just for testing. +# Quite possibly it is not needed anymore. +# Until i get a better regex for matching balancing {{}} ... $PATTERN{template} = qr{ \A # beginning of string \Q$MW_SYNTAX{start_tmpl}\E # {{ @@ -396,7 +398,7 @@ (${$page_text_ref} =~ $PATTERN{simple_no_iw_check}); # Does the page have interwiki links? - # XXX Actually checks only for English + # BIG XXX Actually checks only for English my $has_iw = has_interwiki($page); if ($has_iw) { @@ -570,11 +572,8 @@ foreach my $next_filter (@{$filter}) { - # XXX Matches anywhere in the template. - # It probably should match the template name. - # Also - it is case-insensitive which is very wrong - # but kinda useful. - if ($next_match =~ /\Q$next_filter/xmsi) { + # N.B. - case-insensitive. Wrong, but kinda useful. 
+ if ($next_match =~ /\A\Q$next_filter/xmsi) { # N.B.: parse_template calls find_templates() recursively my $parsed_template = @@ -769,7 +768,7 @@ } } - # XXX Still very stupid, but getting better + # BIG XXX Still very stupid, but getting better if (defined $iw_links{en}) { return 'en'; } @@ -942,7 +941,7 @@ sub write_sorted_pages { my ($type_name, $type_tree_ref) = @_; - my $type_fn = make_type_fn($type_name); # XXX + my $type_fn = make_type_fn($type_name); my $section_counter = 0; my $page = q{}; @@ -963,7 +962,9 @@ if ($section_counter == $option{max_sections_per_page}) { write_page(\$page, \$type_fn, $file_number++); $section_counter = 0; - undef $page; # XXX Trying to free memory + + # N.B. Trying to free memory, not guaranteed + undef $page; $page = q{}; } elsif ($section_counter) { @@ -985,7 +986,9 @@ [ @all_links_in_letter[ $first_link .. $last_link ] ]); $page .= $links; } - undef @all_links_in_letter; # XXX Trying to free memory + + # N.B. Trying to free memory, not guaranteed + undef @all_links_in_letter; } # The page may be empty at this point This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
From: <am...@us...> - 2008-08-04 10:35:23
|
Revision: 61 http://perlwikibot.svn.sourceforge.net/perlwikibot/?rev=61&view=rev Author: amire80 Date: 2008-08-04 10:35:32 +0000 (Mon, 04 Aug 2008) Log Message: ----------- All language names. Modified Paths: -------------- trunk/no-interwiki/language_codes.txt Modified: trunk/no-interwiki/language_codes.txt =================================================================== --- trunk/no-interwiki/language_codes.txt 2008-08-04 10:20:39 UTC (rev 60) +++ trunk/no-interwiki/language_codes.txt 2008-08-04 10:35:32 UTC (rev 61) @@ -138,116 +138,116 @@ wuu Wuu bar Bavarian lad Ladino -gu +gu Gujarati fiu-vro Voro -gv +gv Manx pdc Pennsylvania German csb Kashubian mn Mongolian kw Cornish to Tongan haw Hawaii -gan +gan Gan km Khmer ps Pashto -ang -ie +ang Anglo-Saxon +ie Interlingue tk Turkmen ln Lingala -gn -bcl -tpi -si -wo -crh -ty -srn -zea -sc -cbk-zam -jbo -ay -ky -eml -myv -szl -ig -my -mg -or -stq -kg -glk -arc -rmy -pap -kab -so -ba -ks -sah -mzn -ce -lo -pa -udm -tet -hak -cu -hif -sd -ext -iu -kaa -na -got -dsb -bo -sm -bm -cdo -chr -om -ee -ug -as -ti -av -zu -mdf -kv -nv -ss -pih -cr -ts -ve -ch -bi -xh -rw -dz -tn -kl -ik -bug -bxr -xal -ny -st -tw -ak -ab -fj -ha -ff -lbe -ki -za -lg -sn -tum -sg -rn -chy -ng +gn Guarani +bcl Central Bicolano +tpi Tok Pisin +si Sinhalese +wo Wolof +crh Crimean Tatar +ty Tahitian +srn Sranan +zea Zealandic +sc Sardinian +cbk-zam Zamboanga Chavacano +jbo Lojban +ay Aymara +ky Kirghiz +eml Emilian-Romagnol +myv Erzya +szl Silesian +ig Igbo +my Burmese +mg Malagasy +or Oriya +stq Saterland Frisian +kg Kongo +glk Gilaki +arc Assyrian Neo-Aramaic +rmy Romani +pap Papiamentu +kab Kabyle +so Somali +ba Bashkir +ks Kashmiri +sah Sakha +mzn Mazandarani +ce Chechen +lo Lao +pa Punjabi +udm Udmurt +tet Tetum +hak Hakka +cu Old Chirch Slavonic +hif Fiji Hindi +sd Sindhi +ext Extremaduran +iu Inuktitut +kaa Karakalpak +na Nauruan +got Gothic +dsb Lower Sorbian +bo Tibetan +sm Samoan +bm Bambara +cdo Mindong +chr Cherokee +om Oromo +ee Ewe 
+ug Uyghur +as Assamese +ti Tigrinya +av Avar +zu Zulu +mdf Moksha +kv Komi +nv Navajo +ss Swati +pih Norfolk +cr Cree +ts Tsonga +ve Venda +ch Chamorro +bi Bislama +xh Xhosa +rw Kinyarwanda +dz Dzongkha +tn Tswana +kl Greenlandic +ik Inupiak +bug Biginese +bxr Buryat +xal Kalmyk +ny Chichewa +st Sesotho +tw Twi +ak Akan +ab Abkhazian +fj Fijian +ha Hausa +ff Fula +lbe Lak +ki Kikuyu +za Zhuang +lg Luganda +sn Shona +tum Tumbuka +sg Sango +rn Kirundi +chy Cheyenne +ng Ndonga This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
From: <am...@us...> - 2008-08-04 10:20:33
|
Revision: 60 http://perlwikibot.svn.sourceforge.net/perlwikibot/?rev=60&view=rev Author: amire80 Date: 2008-08-04 10:20:39 +0000 (Mon, 04 Aug 2008) Log Message: ----------- Updated language list, thanks to rotemliss. Now properly reading only the needed languages. Modified Paths: -------------- trunk/no-interwiki/eo.language_codes.txt trunk/no-interwiki/he.language_codes.txt trunk/no-interwiki/oc.language_codes.txt trunk/no-interwiki/prepare_noiw_list.pl Added Paths: ----------- trunk/no-interwiki/language_codes.txt Property Changed: ---------------- trunk/no-interwiki/he.language_codes.txt Modified: trunk/no-interwiki/eo.language_codes.txt =================================================================== --- trunk/no-interwiki/eo.language_codes.txt 2008-08-03 14:38:23 UTC (rev 59) +++ trunk/no-interwiki/eo.language_codes.txt 2008-08-04 10:20:39 UTC (rev 60) @@ -1 +1 @@ -link he.language_codes.txt \ No newline at end of file +link language_codes.txt \ No newline at end of file Modified: trunk/no-interwiki/he.language_codes.txt =================================================================== --- trunk/no-interwiki/he.language_codes.txt 2008-08-03 14:38:23 UTC (rev 59) +++ trunk/no-interwiki/he.language_codes.txt 2008-08-04 10:20:39 UTC (rev 60) @@ -1,150 +1 @@ -en English -de German -fr French -pl Polish -ja Japanese -it Italian -ru Russian -nl Dutch -pt Portuguese -es Spanish -sv Swedish -ru Russian -zh Chinese -no Norwegian Bokmal -fi Finnish -vo Volapuk -ca Catalan -ro Romanian -tr Turkish -uk Ukrainian -eo Esperanto -cs Czech -hu Hungarian -sk Slovak -da Danish -id Indonesian -he Hebrew -lt Lithuanian -sr Serbian -sl Slovenian -ko Korean -ar Arabic -bg Bulgarian -et Estonian -hr Croatian -new Newari -te Telugu -vi Vietnamese -nn Norwegian Nynorsk -th Thai -fa Persian -ga Galician -ceb Cebuano -el Greek -ms Malay -simple Simple English -eu Basque -bpy Bishnupriya Manipuri -bs Bosnian -lb Luxembourgish -is Icelandic -ka Georgian -sq Albanian -la Latin -br 
Breton -hi Hindi -az Azeri -bn Bengali -mk Macedonian -mr Marathi -sh Serbocroatian -tl Tagalog -cy Welsh -io Ido -pms Piedmontese -lv Latvian -su Sundanese -ta Tamil -jv Javanese -nap Neapolitan -oc Occitan -nds Low German -scn Sicilian -ast Asturian -ku Kurdish -be Belarusian (modern) -be-x-old Belarusian (tarashkevitsa) -tg Tajik -an Aragonese -ksh Ripuarian -fy Frisian -vec Venetian -roa-tara Tarantino -cv Chuvash -zh-yue Cantonese -ur Urdu -qu Quechua -sw Swahili -uz Uzbek -bat-smg Samogitian -ga Irish Gaelic -mi Maori -ml Malayalam -gd Scottish Gaelic -yo Yoruba -co Corsican -kn Kannada -pam Kapampangan -yi Yiddish -hsb Upper Sorbian -nah Nahuatl -ia Interlingua -li Limburg -sa Sanskrit -hy Armenian -als Alemannic -tt Tatar -roa-rup Aromanian -map-bms Banyumasan -pag Pangasinan -am Amharic -zh-min-nan Min Nan -nrm Norman -wuu Wuu -fo Faroese -vls West Flemish -lmo Lombard -nds-nl Dutch Low Saxon -se Northern Sami -rm Romansh -ne Nepali -war Waray-Waray -fur Friulian -lij Ligurian -nov Novial -sco Scots -bh Bihari -dv Divehi -pi Pali -diq Zazaki -ilo Ilokano -kk Kazakh -os Ossetian -zh-classical Classical Chinese -frp Franco Provencal -mt Maltese -lad Ladino -fiu-vro Voro -pdc Pennsylvania German -csb Kashubian -kw Cornish -bar Bavarian -to Tongan -haw Hawaii -mn Mongolian -ps Pashto -km Khmer -gv Manx -tk Turkmen -ln Lingala +link language_codes.txt \ No newline at end of file Property changes on: trunk/no-interwiki/he.language_codes.txt ___________________________________________________________________ Added: svn:special + 1 Added: trunk/no-interwiki/language_codes.txt =================================================================== --- trunk/no-interwiki/language_codes.txt (rev 0) +++ trunk/no-interwiki/language_codes.txt 2008-08-04 10:20:39 UTC (rev 60) @@ -0,0 +1,253 @@ +en English +de German +fr French +pl Polish +ja Japanese +it Italian +nl Dutch +pt Portuguese +es Spanish +ru Russian +sv Swedish +zh Chinese +no Norwegian Bokmal +fi Finnish +ca 
Catalan +uk Ukrainian +vo Volapuk +ro Romanian +tr Turkish +cs Czech +eo Esperanto +hu Hungarian +sk Slovak +da Danish +id Indonesian +he Hebrew +ko Korean +lt Lithuanian +ar Arabic +sr Serbian +sl Slovenian +bg Bulgarian +et Estonian +hr Croatian +vi Vietnamese +new Newari +fa Persian +te Telugu +nn Norwegian Nynorsk +gl Galician +th Thai +el Greek +ceb Cebuano +simple Simple English +ms Malay +eu Basque +ht Haitian +bs Bosnian +lb Luxembourgish +bpy Bishnupriya Manipuri +ka Georgian +is Icelandic +la Latin +sq Albanian +hi Hindi +br Breton +az Azeri +mr Marathi +mk Macedonian +sh Serbocroatian +tl Tagalog +bn Bengali +cy Welsh +lv Latvian +pms Piedmontese +io Ido +ta Tamil +oc Occitan +su Sundanese +jv Javanese +be Belarusian (modern) +nap Neapolitan +nds Low German +scn Sicilian +be-x-old Belarusian (tarashkevitsa) +ku Kurdish +ast Asturian +wa Walloon +af Afrikaans +an Aragonese +ksh Ripuarian +fy Frisian +tg Tajik +zh-yue Cantonese +cv Chuvash +ur Urdu +roa-tara Tarantino +vec Venetian +qu Quechua +sw Swahili +bat-smg Samogitian +ml Malayalam +ga Irish Gaelic +uz Uzbek +gd Scottish Gaelic +mi Maori +yo Yoruba +kn Kannada +pam Kapampangan +co Corsican +yi Yiddish +hsb Upper Sorbian +nah Nahuatl +ia Interlingua +li Limburg +als Alemannic +hy Armenian +sa Sanskrit +tt Tatar +roa-rup Aromanian +am Amharic +fo Faroese +zh-min-nan Min Nan +pag Pangasinan +map-bms Banyumasan +nds-nl Dutch Low Saxon +nrm Norman +lmo Lombard +vls West Flemish +rm Romansh +diq Zazaki +se Northern Sami +ne Nepali +fur Friulian +dv Divehi +war Waray-Waray +kk Kazakh +lij Ligurian +sco Scots +nov Novial +bh Bihari +pi Pali +ilo Ilokano +mt Maltese +zh-classical Classical Chinese +os Ossetian +frp Franco Provencal +wuu Wuu +bar Bavarian +lad Ladino +gu +fiu-vro Voro +gv +pdc Pennsylvania German +csb Kashubian +mn Mongolian +kw Cornish +to Tongan +haw Hawaii +gan +km Khmer +ps Pashto +ang +ie +tk Turkmen +ln Lingala +gn +bcl +tpi +si +wo +crh +ty +srn +zea +sc +cbk-zam +jbo +ay +ky +eml +myv 
+szl +ig +my +mg +or +stq +kg +glk +arc +rmy +pap +kab +so +ba +ks +sah +mzn +ce +lo +pa +udm +tet +hak +cu +hif +sd +ext +iu +kaa +na +got +dsb +bo +sm +bm +cdo +chr +om +ee +ug +as +ti +av +zu +mdf +kv +nv +ss +pih +cr +ts +ve +ch +bi +xh +rw +dz +tn +kl +ik +bug +bxr +xal +ny +st +tw +ak +ab +fj +ha +ff +lbe +ki +za +lg +sn +tum +sg +rn +chy +ng Modified: trunk/no-interwiki/oc.language_codes.txt =================================================================== --- trunk/no-interwiki/oc.language_codes.txt 2008-08-03 14:38:23 UTC (rev 59) +++ trunk/no-interwiki/oc.language_codes.txt 2008-08-04 10:20:39 UTC (rev 60) @@ -1 +1 @@ -link he.language_codes.txt \ No newline at end of file +link language_codes.txt \ No newline at end of file Modified: trunk/no-interwiki/prepare_noiw_list.pl =================================================================== --- trunk/no-interwiki/prepare_noiw_list.pl 2008-08-03 14:38:23 UTC (rev 59) +++ trunk/no-interwiki/prepare_noiw_list.pl 2008-08-04 10:20:39 UTC (rev 60) @@ -257,20 +257,18 @@ while (my $line = <$lang_code_file>) { chomp $line; my ($code, $name) = split /\t/xms, $line; - $LANG_CODE{$code} = $name; + $LANG_CODE{$code} = $name // $code; } + close $lang_code_file or croak(file_error('closing', $LANG_CODE_FN, 'reading')); Readonly my $ALT_LANGS => join $ALT_SEP, keys %LANG_CODE; -# XXX Should use ALT_LANGS, but an efficient way is needed to update -# lang codes list, so in the meantime it is loose. $PATTERN{interwiki_link} = qr{ \Q$MW_SYNTAX{start_link}\E (?<lang_code> -# $ALT_LANGS - [a-zA-Z-]+ + $ALT_LANGS ) : (?<foreign_article> @@ -344,7 +342,7 @@ say 'looking for multi links'; my $begin_multi_links_time = time; -# print_multi_links_by_foreign(); +print_multi_links_by_foreign(); print_multi_links_by_local(); This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
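Rev 60 does two things in the script: a language code with no name now falls back to the code itself, and the interwiki pattern is built from the known codes instead of the loose [a-zA-Z-]+. A minimal sketch of both follows. The sub names are hypothetical, and the quotemeta call and longest-first sort are additions for safety; the committed code simply joins the hash keys with $ALT_SEP.

```perl
use 5.010;
use strict;
use warnings;

# Hypothetical stand-in for reading language_codes.txt: each line is
# "code<TAB>name", and the name may be missing.
sub read_lang_codes {
    my (@lines) = @_;
    my %lang_code;
    foreach my $line (@lines) {
        my ($code, $name) = split /\t/xms, $line;
        $lang_code{$code} = $name // $code;    # fall back to the code
    }
    return \%lang_code;
}

# Build a pattern that accepts only known codes. Sorting and quotemeta
# are assumptions of this sketch, not taken from the commit.
sub build_interwiki_pattern {
    my ($lang_code_ref) = @_;
    my $alt_langs = join q{|},
        map { quotemeta($_) }
        sort { length($b) <=> length($a) or $a cmp $b }
        keys %{$lang_code_ref};
    return qr{
        \[\[
        (?<lang_code> $alt_langs )
        :
        (?<foreign_article> [^\]|]+ )
        \]\]
    }xms;
}

# Returns (code, article) for the first interwiki link, or an empty list.
sub parse_interwiki {
    my ($pattern, $text) = @_;
    return if $text !~ $pattern;
    return ($+{lang_code}, $+{foreign_article});
}

my $codes   = read_lang_codes("en\tEnglish", "zh-yue\tCantonese", 'gu');
my $pattern = build_interwiki_pattern($codes);
my ($code, $article) = parse_interwiki($pattern, '[[zh-yue:Example]]');
say "$code -> $article ($codes->{$code})";
```

One consequence of the stricter pattern is that a link with an unlisted prefix is no longer counted as an interwiki link, so the code list has to stay complete.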
From: <am...@us...> - 2008-08-03 14:38:13
|
Revision: 59 http://perlwikibot.svn.sourceforge.net/perlwikibot/?rev=59&view=rev Author: amire80 Date: 2008-08-03 14:38:23 +0000 (Sun, 03 Aug 2008) Log Message: ----------- Added print_multi_links_by_local(). Modified Paths: -------------- trunk/no-interwiki/prepare_noiw_list.pl Modified: trunk/no-interwiki/prepare_noiw_list.pl =================================================================== --- trunk/no-interwiki/prepare_noiw_list.pl 2008-08-03 13:30:21 UTC (rev 58) +++ trunk/no-interwiki/prepare_noiw_list.pl 2008-08-03 14:38:23 UTC (rev 59) @@ -337,9 +337,17 @@ create_no_iw_pages(); INFO(q{}); + +# my @found_lang_codes = sort keys %found_links; +# INFO("found lang_codes: @found_lang_codes"); + say 'looking for multi links'; my $begin_multi_links_time = time; -print_multi_links_by_foreign(); + +# print_multi_links_by_foreign(); + +print_multi_links_by_local(); + my $total_multi_links_time = time - $begin_multi_links_time; say "total multi links time: $total_multi_links_time"; @@ -1151,14 +1159,20 @@ sub print_multi_links_by_foreign { LANG_CODE: foreach my $lang_code (sort keys %found_links) { - my $lang_fn = "$lang_code.multi_links.txt"; my @foreign_articles = sort keys %{ $found_links{$lang_code} }; FOREIGN_ARTICLE: foreach my $foreign_article (@foreign_articles) { my @local_articles = keys %{ $found_links{$lang_code}->{$foreign_article} }; if (scalar @local_articles > 1) { - handle_multi_link($lang_code, $foreign_article); + my $links = join q{ | }, sort map { make_link($_) } + keys %{ $found_links{$lang_code}->{$foreign_article} }; + + my $foreign_title = + make_link($lang_code + . $MW_SYNTAX{namespace_sep} + . 
$foreign_article); + INFO("* '''$foreign_title''' - $links\n"); } } } @@ -1166,17 +1180,68 @@ return; } -sub handle_multi_link { - my ($lang_code, $foreign_article) = @_; - my $links = join q{ | }, sort map { make_link($_) } - keys %{ $found_links{$lang_code}->{$foreign_article} }; +sub print_multi_links_by_local { + my %local_multi_links; + LANG_CODE: + foreach my $lang_code (sort keys %found_links) { - my $foreign_title = - make_link($lang_code . $MW_SYNTAX{namespace_sep} . $foreign_article); - INFO("* '''$foreign_title''' - $links\n"); + # my @foreign_articles = map { make_link($lang_code . $MW_SYNTAX{namespace_sep} . $_) } sort keys %{ $found_links{$lang_code} }; + my @foreign_articles = sort keys %{ $found_links{$lang_code} }; + FOREIGN_ARTICLE: + foreach my $foreign_article (@foreign_articles) { + my @local_articles = + keys %{ $found_links{$lang_code}->{$foreign_article} }; + + if (scalar @local_articles > 1) { + add_local_multi( + \%local_multi_links, + make_link( + $lang_code + . $MW_SYNTAX{namespace_sep} + . $foreign_article + ), + @local_articles + ); + } + } + } + + foreach my $local_multi_article (sort keys %local_multi_links) { + INFO('* ' . mw_bold(make_link($local_multi_article))); + foreach my $other_local_article ( + sort keys %{ $local_multi_links{$local_multi_article} }) + { + INFO('** ' . 
make_link($other_local_article)); + my $foreign_articles = join_links( + $local_multi_links{$local_multi_article} + ->{$other_local_article}, + 0 + ); + INFO("*** $foreign_articles"); + } + } + return; } +sub add_local_multi { + my ( + $local_multi_links_ref, $foreign_link, + $first_local_article, @other_local_articles + ) = @_; + + $local_multi_links_ref->{$first_local_article} //= {}; + + foreach my $other_local_article (@other_local_articles) { + $local_multi_links_ref->{$first_local_article} + ->{$other_local_article} //= []; + push @{ $local_multi_links_ref->{$first_local_article} + ->{$other_local_article} }, $foreign_link; + } + + return; +} + sub join_links { my ($links_ref, $line_end) = @_; $line_end //= 1; # / This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
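The structure that add_local_multi() fills in rev 59 is a two-level hash of arrays: first local article, then every other local article pointing to the same foreign page, then the list of shared foreign links. The sketch below reproduces that shape with made-up article names. It relies on autovivification instead of the explicit //= initializers, so unlike the committed sub it creates no empty entry when there are no other local articles.

```perl
use 5.010;
use strict;
use warnings;

# Same argument order as add_local_multi() in the diff.
sub add_local_multi {
    my ($multi_ref, $foreign_link, $first_local, @other_locals) = @_;

    foreach my $other_local (@other_locals) {
        # Autovivification creates the inner hash and array on demand.
        push @{ $multi_ref->{$first_local}{$other_local} }, $foreign_link;
    }

    return;
}

# Made-up data: two local pages share two foreign targets.
my %multi;
add_local_multi(\%multi, '[[en:Mercury]]', 'Local A', 'Local B');
add_local_multi(\%multi, '[[fr:Mercure]]', 'Local A', 'Local B');

foreach my $first (sort keys %multi) {
    foreach my $other (sort keys %{ $multi{$first} }) {
        say "* $first";
        say "** $other";
        say "*** @{ $multi{$first}{$other} }";
    }
}
```

In the committed code the list of local articles comes straight from keys, so which article ends up as the top-level key is not deterministic; sorting the list before the call would make the output stable.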
From: <am...@us...> - 2008-08-03 13:30:11
|
Revision: 58
http://perlwikibot.svn.sourceforge.net/perlwikibot/?rev=58&view=rev
Author: amire80
Date: 2008-08-03 13:30:21 +0000 (Sun, 03 Aug 2008)

Log Message:
-----------
Added mw_bold(); renamed create_multi_links_pages() to print_multi_links_by_foreign()

Modified Paths:
--------------
    trunk/no-interwiki/prepare_noiw_list.pl

Modified: trunk/no-interwiki/prepare_noiw_list.pl
===================================================================
--- trunk/no-interwiki/prepare_noiw_list.pl 2008-07-31 15:45:53 UTC (rev 57)
+++ trunk/no-interwiki/prepare_noiw_list.pl 2008-08-03 13:30:21 UTC (rev 58)
@@ -339,7 +339,7 @@
 INFO(q{});
 say 'looking for multi links';
 my $begin_multi_links_time = time;
-create_multi_links_pages();
+print_multi_links_by_foreign();
 my $total_multi_links_time = time - $begin_multi_links_time;
 say "total multi links time: $total_multi_links_time";

@@ -1018,6 +1018,11 @@
     return "$level_marker $text $level_marker\n";
 }

+sub mw_bold {
+    my ($text) = @_;
+    return "'''$text'''";
+}
+
 # Custom Unicode character property for finding characters.
 # The custom is to give those subroutines CamelCase names.
 sub IsLeftToRight { ## no critic NamingConventions::ProhibitMixedCaseSubs
@@ -1143,7 +1148,7 @@
     return $string;
 }

-sub create_multi_links_pages {
+sub print_multi_links_by_foreign {
   LANG_CODE:
     foreach my $lang_code (sort keys %found_links) {
         my $lang_fn = "$lang_code.multi_links.txt";
From: <am...@us...> - 2008-07-31 15:45:47
|
Revision: 57
          http://perlwikibot.svn.sourceforge.net/perlwikibot/?rev=57&view=rev
Author:   amire80
Date:     2008-07-31 15:45:53 +0000 (Thu, 31 Jul 2008)

Log Message:
-----------
Exclude portal subpages.

Modified Paths:
--------------
    trunk/no-interwiki/prepare_noiw_list.pl

Modified: trunk/no-interwiki/prepare_noiw_list.pl
===================================================================
--- trunk/no-interwiki/prepare_noiw_list.pl 2008-07-31 11:16:45 UTC (rev 56)
+++ trunk/no-interwiki/prepare_noiw_list.pl 2008-07-31 15:45:53 UTC (rev 57)
@@ -368,13 +368,21 @@
         my $namespace = namespace($page);
         $namespace_count{$namespace}++;
 
+        my $page_title = $page->title();
+
         # Skipping cases:
         next PAGE
-            if (is_redirect($page)
-            or not is_in_namespace($page, @INCLUDE_NAMESPACES));
+            if (
+            is_redirect($page)
+            or not is_in_namespace($page, @INCLUDE_NAMESPACES)
 
-        INFO("\n* processing $page_counter - ", $page->title());
+            # Portal pages which have a '/' in their name are probably
+            # internal and do not need interwiki links.
+            or (is_in_namespace($page, 'Portal') and $page_title =~ m{/}xms)
+            );
 
+        INFO("\n* processing $page_counter - ", $page_title);
+
         my $page_text_ref = $page->text();
 
         # A simple sanity check: is the no_iw template anywhere around here?
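The portal-subpage condition from revision 57 can be sketched in isolation. This is only an illustration: is_portal_subpage() is a hypothetical helper that takes plain strings, whereas the script tests a Parse::MediaWikiDump page object through is_in_namespace(). The sample titles are invented.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the skip condition: a page in the Portal
# namespace whose title contains a slash is treated as an internal subpage.
sub is_portal_subpage {
    my ($namespace, $page_title) = @_;
    return ($namespace eq 'Portal' and $page_title =~ m{/}xms) ? 1 : 0;
}

# Invented sample pages: [ namespace, title ].
foreach my $page (
    [ 'Portal', 'Portal:Science' ],
    [ 'Portal', 'Portal:Science/Selected article' ],
    [ q{},      'Science/Fiction' ],
  )
{
    my ($namespace, $page_title) = @{$page};
    my $verdict = is_portal_subpage($namespace, $page_title) ? 'skip' : 'keep';
    print "$verdict: $page_title\n";
}
```

Only the namespace test keeps an article whose own title contains a slash from being skipped, which is why the two conditions are joined with "and".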
From: <am...@us...> - 2008-07-31 11:16:37
|
Revision: 56
          http://perlwikibot.svn.sourceforge.net/perlwikibot/?rev=56&view=rev
Author:   amire80
Date:     2008-07-31 11:16:45 +0000 (Thu, 31 Jul 2008)

Log Message:
-----------
Improved no-interwiki template regex and got rid of a lot of false positives.

Modified Paths:
--------------
    trunk/no-interwiki/prepare_noiw_list.pl

Modified: trunk/no-interwiki/prepare_noiw_list.pl
===================================================================
--- trunk/no-interwiki/prepare_noiw_list.pl 2008-07-31 10:57:47 UTC (rev 55)
+++ trunk/no-interwiki/prepare_noiw_list.pl 2008-07-31 11:16:45 UTC (rev 56)
@@ -224,7 +224,7 @@
 }xmsi;
 
 $PATTERN{ltr_char} = qr/\P{IsLeftToRight}/xms;
-$PATTERN{true_template} = qr{$RE{balanced}{-parens=>'{}'}}xms; # XXX very bad
+$PATTERN{true_template} = qr/\{ $RE{balanced}{-parens=>'{}'} \}/xms;
 $PATTERN{section_link} = qr{(?<!&)\#}xms;
 $PATTERN{lowercase_link} = qr{\A[[:lower:]]}xms;
 
@@ -236,7 +236,7 @@
     (?: _ \d*)?
     \.$WIKITEXT_EXT
 }xms;
-$PATTERN{invalid_filename_char} = qr{[\\\n/:*?"<>|]}xms; # "
+$PATTERN{invalid_filename_char} = qr{[\\\n/:*?"<>|]}xms; # "
 
 # TODO: Check whether it is Neapolitan with its ''
 $PATTERN{character_code_in_link} = qr{
From: <am...@us...> - 2008-07-31 10:57:39
|
Revision: 55 http://perlwikibot.svn.sourceforge.net/perlwikibot/?rev=55&view=rev Author: amire80 Date: 2008-07-31 10:57:47 +0000 (Thu, 31 Jul 2008) Log Message: ----------- Simple support for portals. Modified Paths: -------------- trunk/no-interwiki/eo.strings.txt trunk/no-interwiki/he.strings.txt trunk/no-interwiki/prepare_noiw_list.pl Modified: trunk/no-interwiki/eo.strings.txt =================================================================== --- trunk/no-interwiki/eo.strings.txt 2008-07-31 10:35:08 UTC (rev 54) +++ trunk/no-interwiki/eo.strings.txt 2008-07-31 10:57:47 UTC (rev 55) @@ -15,7 +15,6 @@ in no_iw no estas interviki -category kategorio disambig apartigilo template Sxablono @@ -29,6 +28,8 @@ User Vikipediisto User talk Vikipediista diskuto Image Dosiero +Portal Portalo +Category kategorio article space (nomspace de artikoloj) # Other Modified: trunk/no-interwiki/he.strings.txt =================================================================== --- trunk/no-interwiki/he.strings.txt 2008-07-31 10:35:08 UTC (rev 54) +++ trunk/no-interwiki/he.strings.txt 2008-07-31 10:57:47 UTC (rev 55) @@ -15,7 +15,6 @@ in ב no_iw אין בינוויקי -category קטגוריה disambig פירושונים template תבנית @@ -29,6 +28,8 @@ User משתמש User talk שיחת משתמש Image תמונה +Portal פורטל +Category קטגוריה article space (מרחב ערכים) # Other Modified: trunk/no-interwiki/prepare_noiw_list.pl =================================================================== --- trunk/no-interwiki/prepare_noiw_list.pl 2008-07-31 10:35:08 UTC (rev 54) +++ trunk/no-interwiki/prepare_noiw_list.pl 2008-07-31 10:57:47 UTC (rev 55) @@ -100,6 +100,7 @@ croak('Invalid command line options.'); } +# XXX Too coupled to Wikipedia, won't work for other projects. 
$PATTERN{dump_fn} = qr{ \A # Begin string (?<wiki_lang>\w+) # Lang code @@ -146,7 +147,7 @@ # This monstrosity basically says: | and optional spaces $PATTERN{param_sep} = qr{\s*\Q$MW_SYNTAX{param_sep}\E\s*}xms; -Readonly my @INCLUDE_NAMESPACES => ('article space', 'category',); +Readonly my @INCLUDE_NAMESPACES => ('article space', 'Category', 'Portal'); # # Constants for date processing @@ -498,10 +499,16 @@ my $page_title = $page->title(); if (is_category($page)) { INFO("$page_title is a category"); - push @all_types, get_string('category'); + push @all_types, get_string('Category'); $statistics{'categories'}++; } + if (is_in_namespace($page, 'Portal')) { + INFO("$page_title is a portal"); + push @all_types, get_string('Portal'); + $statistics{'portal'}++; + } + if (is_disambig($page)) { INFO("$page_title is a disambiguation"); push @all_types, get_string('disambig'); @@ -1037,7 +1044,7 @@ sub is_category { my ($page) = @_; - return is_in_namespace($page, 'category'); + return is_in_namespace($page, 'Category'); } sub is_disambig { @@ -1115,8 +1122,8 @@ return; } -# It appears simple, but non-alphabetic languages such as Chinese it must be -# different, so it will sit here ready for better i18n. +# It appears simple, but in non-alphabetic languages such as Chinese +# it may be different, so it will sit here ready for better i18n. sub get_sort_letter { my ($string) = @_; return substr $string, 0, 1; This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
From: <am...@us...> - 2008-07-31 10:34:59
|
Revision: 54 http://perlwikibot.svn.sourceforge.net/perlwikibot/?rev=54&view=rev Author: amire80 Date: 2008-07-31 10:35:08 +0000 (Thu, 31 Jul 2008) Log Message: ----------- Partial refactoring of $STRING/get_string() Modified Paths: -------------- trunk/no-interwiki/prepare_noiw_list.pl Modified: trunk/no-interwiki/prepare_noiw_list.pl =================================================================== --- trunk/no-interwiki/prepare_noiw_list.pl 2008-07-31 10:09:44 UTC (rev 53) +++ trunk/no-interwiki/prepare_noiw_list.pl 2008-07-31 10:35:08 UTC (rev 54) @@ -173,6 +173,7 @@ (?<min>\d{2}),\s # minute (?<mday>\d{1,2})\s # day of month $STRING{in}? # This preposition appears sometimes + # It should have been get_string() (?<mon>$ALT_MONTHS)\s # A name of a month (?<year>\d+?)\s # Year \([A-Z]{3}\) # Three letters in brackets - timezone @@ -190,8 +191,9 @@ (?<value>.*) # value }xms; +# XXX It should use get_string() $PATTERN{simple_no_iw_check} = qr{ - \Q$STRING{no_iw}\E # the string may have spaces + \Q$STRING{no_iw}\E # The string may have spaces }xmsi; # XXX HACK Until i get a better regex for matching balancing {{}} ... 
@@ -216,6 +218,7 @@ \A # Beginning of string (page) \# # a # character $STRING{REDIRECT} # Redirect keyword in local language + # XXX It should use get_string() \s*:?\s*\[\[([^\]]*)\]\] # the link after the redirect }xmsi; @@ -223,6 +226,8 @@ $PATTERN{true_template} = qr{$RE{balanced}{-parens=>'{}'}}xms; # XXX very bad $PATTERN{section_link} = qr{(?<!&)\#}xms; $PATTERN{lowercase_link} = qr{\A[[:lower:]]}xms; + +# XXX get_string() cannot be used here if ($STRING{exclude_lowercase}) { $PATTERN{exclude_lowercase} = qr{\A[$STRING{exclude_lowercase}]}xms; } @@ -341,7 +346,7 @@ sub namespace { my ($page) = @_; - return $page->namespace() || $STRING{'article space'}; + return $page->namespace() || get_string('article space'); } sub find_iwless { @@ -710,6 +715,8 @@ and ($foreign_article =~ $PATTERN{lowercase_link})) { my $include_lowercase_link = 1; + + # XXX get_string() cannot be used here if (defined $STRING{exclude_lowercase} and $foreign_article =~ $PATTERN{exclude_lowercase}) { @@ -873,6 +880,8 @@ if ($option{rtl}) { if ($page_title =~ $PATTERN{ltr_char}) { + + # XXX get_string() cannot be used here $link_to_page = $STRING{rlm} . $link_to_page . $STRING{rlm}; } } @@ -1422,8 +1431,8 @@ Unicode-friendly. This program was also tested on Windows XP and Vista with ActivePerl 5.10 -and Cygwin Perl 5.10. In these Unicode-related issues caused filenames -and clipboard text to become jumbled. You have been warned. +and Cygwin Perl 5.10. In these environments Unicode-related issues caused +filenames and clipboard text to become jumbled. You have been warned. =head1 BUGS AND LIMITATIONS This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
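The get_string() fallback that revision 54 starts moving callers to can be sketched as below. The sub body matches the one shown in the revision 46 diff; the %STRING contents here are a one-entry sample (the Esperanto "Portal" string from revision 55), not the real strings file.

```perl
use strict;
use warnings;
use 5.010;    # for the defined-or assignment operator //=

# Sample table only; the script fills %STRING from LANG.strings.txt.
my %STRING = (Portal => 'Portalo');

# Return the localized string, falling back to the English key.
# Note the side effect: //= stores the fallback in %STRING.
sub get_string {
    my ($english) = @_;
    return $STRING{$english} //= $english;
}

say get_string('Portal');
say get_string('article space');
```

Because the fallback is stored on first use, a later direct read of $STRING{...} for the same key sees the English text instead of undef. That may be one reason the places marked "get_string() cannot be used here" keep testing $STRING{exclude_lowercase} directly; this is a guess and is not stated in the commit.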
From: <am...@us...> - 2008-07-31 10:09:36
|
Revision: 53 http://perlwikibot.svn.sourceforge.net/perlwikibot/?rev=53&view=rev Author: amire80 Date: 2008-07-31 10:09:44 +0000 (Thu, 31 Jul 2008) Log Message: ----------- POD updates. Modified Paths: -------------- trunk/no-interwiki/prepare_noiw_list.pl Modified: trunk/no-interwiki/prepare_noiw_list.pl =================================================================== --- trunk/no-interwiki/prepare_noiw_list.pl 2008-07-31 10:08:34 UTC (rev 52) +++ trunk/no-interwiki/prepare_noiw_list.pl 2008-07-31 10:09:44 UTC (rev 53) @@ -1217,13 +1217,15 @@ =item * C<prepare_noiw_list.pl --rtl ./big-files/hewiki-20080420-pages-meta-current.xml> +=item * C<prepare_noiw_list.pl --stop_after=20000 ./big-files/hewiki-20080420-pages-meta-current.xml> + =back =head1 REQUIRED ARGUMENTS =over -=item * MediaWiki dump file name is obligatory. +=item * MediaWiki dump file name is required =back @@ -1248,11 +1250,14 @@ =item * --max_sections_per_page Maximum number of sections per output page. Default is 20. +=item * --max_iw_places Number of places to print in the statistics of +pages with the most interlanguage links. + =back =head1 DESCRIPTION -The main goal of this searching is to find pages which do not have +The main goal of this program is to find pages which do not have interwiki (interlanguage) links to certain languages. This program scans a MediaWiki XML dump file. It searches every page for @@ -1261,22 +1266,22 @@ =over -=item * If the page contains links to the defined languages and contains -no "no interwiki" template, its processing stops. +=item * If the page contains links to the defined languages and does not +contain the "no interwiki" template, its processing stops. =item * If the page contains links to the defined languages and contains this template, it is logged, so the template can be removed. (It is planned that it will be removed automatically in the future.) 
-=item * If the page contains no links to the defined languagesm but no -template, it is automatically added to type "other". +=item * If the page contains no links to the defined languages and does not +comtain the template, it is automatically added to type "other". =item * If the page contains no links to the defined languages and a template with types, it is added to the defined types. =back -Pages without links are added to nicely formatted lists +Pages without links are added to nicely formatted lists according to their type. This program also collects some information on the way about problematic @@ -1304,7 +1309,8 @@ =head2 unable to handle any case setting besides 'first-letter' -Something is weird with the dump. +Something is weird with the dump. See the documentation of +L<Parse::MediaWikiDump> and MediaWiki. =head2 A page has no pure title @@ -1315,7 +1321,7 @@ STRING is supposed to be a parameter in a template, but it does not look like one. It could be an error in the template, and also a bug in this program -(the parser that this program employs is rather limited). +(the parser that this program employs is rather stupid). =head2 Unicode character 0xNUMBER is illegal @@ -1326,7 +1332,8 @@ supposed to be in the page and should be fixed, but otherwise this issue is not supposed to affect the functionality of this program significantly. -This was reported as a MediaWiki bug: L<https://bugzilla.wikimedia.org/show_bug.cgi?id=14600> +This was reported as a MediaWiki bug: +L<https://bugzilla.wikimedia.org/show_bug.cgi?id=14600> =head1 EXIT STATUS @@ -1353,7 +1360,7 @@ =head1 DEPENDENCIES -This module depends on these CPAN modules: +This module requires these CPAN modules: =over @@ -1374,28 +1381,35 @@ This module is used for transliterating filenames to ASCII. +=item * C<Readonly> + +To make Perl::Critic happy :) + =back =head1 HACKING =head2 Perl 5.10 -This program needs Perl 5.10. It has clean, new and useful syntax, which +This program requires Perl 5.10. 
It has new clean and useful syntax, which makes the programs easier to hack, maintain and debug. It is useless to try and run it on an older version, unless you want to waste your time backporting. Please upgrade your Perl installation if you still have 5.8 or -something older. +(horrors!) something older. -=head2 Perl Best Practices and Perl::Critic +=head2 Perl Best Practices, Perl::Critic and perltidy Great effort has been put into making this source code pass as cleanly as -possible the Perl::Critic tests in the 'brutal' mode. If you modify it, do -yourself a favor, install Perl::Critic and regularly test it using this command: +possible the Perl::Critic tests in the 'brutal' mode. It also uses perltidy +for automatic code formatting. If you modify it, do yourself a favor, install +Perl::Critic and regularly test it using this command: -perlcritic -brutal prepare_noiw_list.pl +./tidy.sh -All places where P::C has been disabled using "# no critic" are explained. +It checks the syntax, runs perltidy on the code and runs Perl::Critic. +All the places where P::C has been disabled using "# no critic" are explained. + The time invested in making the code P::C-friendly will be returned as time saved on debugging. Also consider reading the book "Perl Best Practices" by Damian Conway if you have not already. @@ -1407,9 +1421,9 @@ This program works best on GNU/Linux, where Perl and the filesystem are Unicode-friendly. -This program was tested on Windows with ActivePerl 5.10 and Cygwin Perl 5.10. -In both cases Unicode-related issues cause filenames and clipboard text -to become jumbled. +This program was also tested on Windows XP and Vista with ActivePerl 5.10 +and Cygwin Perl 5.10. In these Unicode-related issues caused filenames +and clipboard text to become jumbled. You have been warned. =head1 BUGS AND LIMITATIONS This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
From: <am...@us...> - 2008-07-31 10:08:24
|
Revision: 52 http://perlwikibot.svn.sourceforge.net/perlwikibot/?rev=52&view=rev Author: amire80 Date: 2008-07-31 10:08:34 +0000 (Thu, 31 Jul 2008) Log Message: ----------- Refactored date handling + creating a list of pages with an invalid date. Modified Paths: -------------- trunk/no-interwiki/prepare_noiw_list.pl Modified: trunk/no-interwiki/prepare_noiw_list.pl =================================================================== --- trunk/no-interwiki/prepare_noiw_list.pl 2008-07-30 18:08:45 UTC (rev 51) +++ trunk/no-interwiki/prepare_noiw_list.pl 2008-07-31 10:08:34 UTC (rev 52) @@ -385,7 +385,7 @@ INFO('has template no_iw. trying to remove ...'); remove_tmpl_no_iw($page_text_ref); $statistics{'has both valid interwiki and template'}++; - special_cases_file('outdated_template', {}, $page); + special_cases_file('outdated_template', $page); } } else { # does not have iw @@ -432,7 +432,7 @@ if ($found_templates_count > 1) { WARN('many templates were found'); $statistics{'many templates'}++; - special_cases_file('many_templates', {}, $page); + special_cases_file('many_templates', $page); } else { INFO('good, found one template'); @@ -445,20 +445,30 @@ } if (defined $template) { + INFO('has template no_iw'); my $date_str = $template->{params}->{date}; - - INFO('has template no_iw. checking cooling date ... 
'); - if (not defined $date_str - or cooling_date_passed($date_str)) - { - INFO('cooling date passed, updating to today ...'); - update_cooling_date($page_text_ref); - $statistics{'cooling date passed'}++; + if (defined $date_str) { + INFO('checking cooling date'); + my $date_ref = parse_date($date_str); + if (not defined $date_ref) { + INFO("invalid date: '$date_str'"); + $statistics{'invalid date'}++; + special_cases_file('invalid_date', $page); + } + elsif (cooling_date_passed($date_ref)) { + INFO('cooling date passed, updating to today ...'); + update_cooling_date($page_text_ref); + $statistics{'cooling date passed'}++; + } + else { + INFO(q(cooling date did not pass.)); + $statistics{q(cooling date did not pass)}++; + } } else { - INFO(q(cooling date did not pass.)); - $statistics{q(cooling date did not pass)}++; + INFO('date not defined'); } + } my @all_types = get_all_types($template->{params}->{type}, $page); @@ -635,9 +645,6 @@ } return \%parsed_date; } - else { - INFO("invalid date: $date_str"); - } # Returns undef for an invalid date return; @@ -729,8 +736,8 @@ for my $special_case_name (keys %special_cases) { if (scalar %{ $special_cases{$special_case_name} }) { - special_cases_file($special_case_name, - $special_cases{$special_case_name}, $page); + special_cases_file($special_case_name, $page, + $special_cases{$special_case_name}); } } @@ -743,7 +750,8 @@ } sub special_cases_file { - my ($special_case_name, $special_cases_ref, $page) = @_; + my ($special_case_name, $page, $special_cases_ref) = @_; + $special_cases_ref //= {}; # / my $special_case_langs = join q{, }, sort keys %{$special_cases_ref}; if ($special_case_langs) { $special_case_langs = " ($special_case_langs)"; @@ -773,15 +781,8 @@ } sub cooling_date_passed { - my ($date_string) = @_; + my ($date_ref) = @_; - # $date is a hash ref - my $date_ref = parse_date($date_string); - if (not defined $date_ref) { - INFO('in cooling_date_passed invalid date'); - return 1; - } - my @page_times = 
@{$date_ref}{qw(sec min hour mday mon year)}; INFO("page times: @page_times"); This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
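The control flow introduced in revision 52 (parse first, then branch on undefined, invalid, passed, or not passed) can be sketched as follows. Everything here is simplified and hypothetical: this parse_date() accepts only YYYY-MM-DD, and this cooling_date_passed() compares against a threshold passed in by the caller, whereas the script parses localized wiki signatures and computes times.

```perl
use strict;
use warnings;

# Hypothetical parser: accepts only YYYY-MM-DD.
# Returns a hash ref, or undef for an invalid date (as in the script).
sub parse_date {
    my ($date_str) = @_;
    if ($date_str =~ m{\A (\d{4}) - (\d{2}) - (\d{2}) \z}xms) {
        return { year => $1, mon => $2, mday => $3 };
    }
    return;
}

# Hypothetical check: the date has passed if it is earlier than the
# threshold, both taken as YYYYMMDD numbers.
sub cooling_date_passed {
    my ($date_ref, $threshold) = @_;
    my $page_date = "$date_ref->{year}$date_ref->{mon}$date_ref->{mday}";
    return $page_date < $threshold;
}

# Mirrors the branches of the refactored code as a single status string.
sub classify_date {
    my ($date_str, $threshold) = @_;
    return 'date not defined' if not defined $date_str;
    my $date_ref = parse_date($date_str);
    return 'invalid date' if not defined $date_ref;
    return cooling_date_passed($date_ref, $threshold)
        ? 'cooling date passed'
        : 'cooling date did not pass';
}

print classify_date('2008-01-01', 20_080_731), "\n";
```

Parsing in the caller is what lets an invalid date be counted and listed separately, instead of being treated as a passed cooling date as it was before the change.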
From: <am...@us...> - 2008-07-30 18:08:37
|
Revision: 51 http://perlwikibot.svn.sourceforge.net/perlwikibot/?rev=51&view=rev Author: amire80 Date: 2008-07-30 18:08:45 +0000 (Wed, 30 Jul 2008) Log Message: ----------- Count pages without interwiki per namespace. Modified Paths: -------------- trunk/no-interwiki/eo.strings.txt trunk/no-interwiki/he.strings.txt trunk/no-interwiki/prepare_noiw_list.pl Modified: trunk/no-interwiki/eo.strings.txt =================================================================== --- trunk/no-interwiki/eo.strings.txt 2008-07-30 17:48:29 UTC (rev 50) +++ trunk/no-interwiki/eo.strings.txt 2008-07-30 18:08:45 UTC (rev 51) @@ -29,6 +29,7 @@ User Vikipediisto User talk Vikipediista diskuto Image Dosiero +article space (nomspace de artikoloj) # Other other alia Modified: trunk/no-interwiki/he.strings.txt =================================================================== --- trunk/no-interwiki/he.strings.txt 2008-07-30 17:48:29 UTC (rev 50) +++ trunk/no-interwiki/he.strings.txt 2008-07-30 18:08:45 UTC (rev 51) @@ -29,6 +29,7 @@ User משתמש User talk שיחת משתמש Image תמונה +article space (מרחב ערכים) # Other other אחר Modified: trunk/no-interwiki/prepare_noiw_list.pl =================================================================== --- trunk/no-interwiki/prepare_noiw_list.pl 2008-07-30 17:48:29 UTC (rev 50) +++ trunk/no-interwiki/prepare_noiw_list.pl 2008-07-30 18:08:45 UTC (rev 51) @@ -146,10 +146,7 @@ # This monstrosity basically says: | and optional spaces $PATTERN{param_sep} = qr{\s*\Q$MW_SYNTAX{param_sep}\E\s*}xms; -Readonly my @INCLUDE_NAMESPACES => ( - q{}, # Empty is a specific case - 'category', -); +Readonly my @INCLUDE_NAMESPACES => ('article space', 'category',); # # Constants for date processing @@ -342,6 +339,11 @@ exit; +sub namespace { + my ($page) = @_; + return $page->namespace() || $STRING{'article space'}; +} + sub find_iwless { PAGE: while (my $page = $dump->page()) { @@ -357,7 +359,7 @@ next PAGE if ($page_counter < $option{start_from}); - my $namespace = 
$page->namespace() || 'main'; + my $namespace = namespace($page); $namespace_count{$namespace}++; # Skipping cases: @@ -402,7 +404,7 @@ ) = @_; INFO(q(does not have iw link.)); - $statistics{'has no interwiki link'}++; + $statistics{'has no interwiki link'}->{ namespace($page) }++; # Now we need to search for no_iw templates # and parse their parameters - date and type @@ -1020,7 +1022,7 @@ sub is_in_namespace { my ($page, @namespaces) = @_; - return $page->namespace() ~~ [ map { get_string($_) } @namespaces ]; + return namespace($page) ~~ [ map { get_string($_) } @namespaces ]; } sub is_category { @@ -1173,6 +1175,12 @@ while (not defined $statistics{count_iw}->[ --$max_iw_index ]) { } } + INFO('pages without interwiki links per namespace'); + foreach my $namespace (keys %{ $statistics{'has no interwiki link'} }) { + INFO( + "$namespace: $statistics{'has no interwiki link'}->{$namespace}"); + } + INFO("\nNAMESPACES"); foreach my $namespace (sort keys %namespace_count) { INFO("$namespace: $namespace_count{$namespace}"); This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
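A small sketch of the per-namespace counting from revision 51. namespace_or_default() is a hypothetical stand-in for the script's namespace($page), which calls $page->namespace() and looks the fallback up in %STRING; the input list is invented.

```perl
use strict;
use warnings;

my %statistics;

# Hypothetical stand-in for namespace($page): an empty namespace name
# means the article space, following the || fallback in the commit.
sub namespace_or_default {
    my ($namespace) = @_;
    return $namespace || 'article space';
}

# Invented sample: namespace names of four pages without interwiki links.
foreach my $namespace (q{}, 'Category', q{}, 'Portal') {
    $statistics{'has no interwiki link'}->{ namespace_or_default($namespace) }++;
}

# Sorted here for stable output; the loop in the commit does not sort.
foreach my $namespace (sort keys %{ $statistics{'has no interwiki link'} }) {
    print "$namespace: $statistics{'has no interwiki link'}->{$namespace}\n";
}
```

The counter changes from a plain number to a hash reference, so it is no longer printed by the summary loop that skips references. This is why the commit adds a separate per-namespace report.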
From: <am...@us...> - 2008-07-30 17:48:23
|
Revision: 50
          http://perlwikibot.svn.sourceforge.net/perlwikibot/?rev=50&view=rev
Author:   amire80
Date:     2008-07-30 17:48:29 +0000 (Wed, 30 Jul 2008)

Log Message:
-----------
Updating docs. Creating a list of pages with many templates.

Modified Paths:
--------------
    trunk/no-interwiki/prepare_noiw_list.pl

Modified: trunk/no-interwiki/prepare_noiw_list.pl
===================================================================
--- trunk/no-interwiki/prepare_noiw_list.pl 2008-07-30 17:31:19 UTC (rev 49)
+++ trunk/no-interwiki/prepare_noiw_list.pl 2008-07-30 17:48:29 UTC (rev 50)
@@ -430,6 +430,7 @@
         if ($found_templates_count > 1) {
             WARN('many templates were found');
             $statistics{'many templates'}++;
+            special_cases_file('many_templates', {}, $page);
         }
         else {
             INFO('good, found one template');
@@ -1428,12 +1429,13 @@
 
 Goal: None at the moment, it works well enough.
 
-=head2 Templates must be removed manually
+=head2 Templates are removed semi-automatically
 
-Templates on pages which already have the needed link are not removed
-automatically.
+Templates on pages which already have needed links are not removed
+automatically. A list of them is created and a bot can run on it and remove
+the outdated templates. This can be done automatically.
 
-Goal: v0.2 Noa
+Goal: None at the moment, it works well enough.
 
 =head2 Cooling date
 
From: <am...@us...> - 2008-07-30 17:31:10
|
Revision: 49
          http://perlwikibot.svn.sourceforge.net/perlwikibot/?rev=49&view=rev
Author:   amire80
Date:     2008-07-30 17:31:19 +0000 (Wed, 30 Jul 2008)

Log Message:
-----------
Preparing a list of pages which have valid links and a template, so it can be deleted.

Modified Paths:
--------------
    trunk/no-interwiki/prepare_noiw_list.pl

Modified: trunk/no-interwiki/prepare_noiw_list.pl
===================================================================
--- trunk/no-interwiki/prepare_noiw_list.pl 2008-07-30 16:18:57 UTC (rev 48)
+++ trunk/no-interwiki/prepare_noiw_list.pl 2008-07-30 17:31:19 UTC (rev 49)
@@ -383,6 +383,7 @@
                 INFO('has template no_iw. trying to remove ...');
                 remove_tmpl_no_iw($page_text_ref);
                 $statistics{'has both valid interwiki and template'}++;
+                special_cases_file('outdated_template', {}, $page);
             }
         }
         else {    # does not have iw
@@ -741,6 +742,9 @@
 sub special_cases_file {
     my ($special_case_name, $special_cases_ref, $page) = @_;
     my $special_case_langs = join q{, }, sort keys %{$special_cases_ref};
+    if ($special_case_langs) {
+        $special_case_langs = " ($special_case_langs)";
+    }
     my $special_case_fn = make_type_fn($special_case_name, 1);
     if (not -e $special_case_fn) {
         append_to_file($special_case_fn, $special_case_name);
@@ -749,7 +753,7 @@
     my $link = make_link($page_title);
     my $line =
         $link
-      . " ($special_case_langs)"
+      . $special_case_langs
       . $FIELD_SEP
       . get_sort_title($page_title);
 
From: <am...@us...> - 2008-07-30 16:18:49
|
Revision: 48 http://perlwikibot.svn.sourceforge.net/perlwikibot/?rev=48&view=rev Author: amire80 Date: 2008-07-30 16:18:57 +0000 (Wed, 30 Jul 2008) Log Message: ----------- Started refactoring statistics. Modified Paths: -------------- trunk/no-interwiki/prepare_noiw_list.pl Modified: trunk/no-interwiki/prepare_noiw_list.pl =================================================================== --- trunk/no-interwiki/prepare_noiw_list.pl 2008-07-30 16:14:36 UTC (rev 47) +++ trunk/no-interwiki/prepare_noiw_list.pl 2008-07-30 16:18:57 UTC (rev 48) @@ -328,35 +328,9 @@ my $begin_time = time; find_iwless(); my $total_time = time - $begin_time; -say "total time: $total_time"; -INFO("\nSUMMARY"); -foreach my $stat_type (sort keys %statistics) { - if (not ref $statistics{$stat_type}) { - INFO("$stat_type: $statistics{$stat_type}"); - } -} +print_stats(); -my $max_iw_index = $#{ $statistics{count_iw} }; -INFO("max_iw_index: $max_iw_index"); -for my $max_iw_place (0 .. $option{max_iw_places}) { - my @links = - map { make_link($_) } @{ $statistics{count_iw}->[$max_iw_index] }; - INFO("# $max_iw_index: " . join_links(\@links, 0)); - - # Do nothing, just count down to the next index with a defined list - while (not defined $statistics{count_iw}->[ --$max_iw_index ]) { } -} - -INFO("\nNAMESPACES"); -foreach my $namespace (sort keys %namespace_count) { - INFO("$namespace: $namespace_count{$namespace}"); -} -INFO("\nTYPES"); -foreach my $type (sort keys %type_count) { - INFO("$type: $type_count{$type}"); -} - create_no_iw_pages(); INFO(q{}); @@ -1174,6 +1148,38 @@ return join $link_sep, @{$links_ref}; } +sub print_stats { + INFO("\nSUMMARY"); + say "total time: $total_time"; + foreach my $stat_type (sort keys %statistics) { + if (not ref $statistics{$stat_type}) { + INFO("$stat_type: $statistics{$stat_type}"); + } + } + + my $max_iw_index = $#{ $statistics{count_iw} }; + INFO("max_iw_index: $max_iw_index"); + for my $max_iw_place (0 .. 
$option{max_iw_places}) { + my @links = + map { make_link($_) } @{ $statistics{count_iw}->[$max_iw_index] }; + INFO("# $max_iw_index: " . join_links(\@links, 0)); + + # Do nothing, just count down to the next index with a defined list + while (not defined $statistics{count_iw}->[ --$max_iw_index ]) { } + } + + INFO("\nNAMESPACES"); + foreach my $namespace (sort keys %namespace_count) { + INFO("$namespace: $namespace_count{$namespace}"); + } + INFO("\nTYPES"); + foreach my $type (sort keys %type_count) { + INFO("$type: $type_count{$type}"); + } + + return; +} + __END__ =head1 NAME This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
From: <am...@us...> - 2008-07-30 16:14:27
|
Revision: 47 http://perlwikibot.svn.sourceforge.net/perlwikibot/?rev=47&view=rev Author: amire80 Date: 2008-07-30 16:14:36 +0000 (Wed, 30 Jul 2008) Log Message: ----------- Unsorted lists go to a separate dir. Modified Paths: -------------- trunk/no-interwiki/prepare_noiw_list.pl Modified: trunk/no-interwiki/prepare_noiw_list.pl =================================================================== --- trunk/no-interwiki/prepare_noiw_list.pl 2008-07-29 08:56:15 UTC (rev 46) +++ trunk/no-interwiki/prepare_noiw_list.pl 2008-07-30 16:14:36 UTC (rev 47) @@ -55,7 +55,8 @@ my %PATTERN; Readonly my $WIKITEXT_EXT => 'wiki.txt'; -Readonly my $UNSORTED_EXT => "unsorted.$WIKITEXT_EXT"; +Readonly my $OUT_DIR => 'out'; +Readonly my $UNSORTED_DIR => "$OUT_DIR/unsorted"; Readonly my $ALT_SEP => q{|}; Readonly my $FIELD_SEP => qq{\t}; Readonly my $LINK_SEP => q{|}; @@ -287,13 +288,14 @@ } # TODO: Make smarter, configurable, whatever -Readonly my $OUT_DIR => 'out'; -if (-d $OUT_DIR) { - unlink glob "$OUT_DIR/*"; +foreach my $out_dir ($OUT_DIR, $UNSORTED_DIR) { + if (-d $out_dir) { + unlink glob "$out_dir/*$WIKITEXT_EXT"; + } + else { + mkdir $out_dir; + } } -else { - mkdir $OUT_DIR; -} my $dump = Parse::MediaWikiDump::Pages->new($dump_fn); @@ -903,7 +905,7 @@ # Run over page types UNSORTED_TYPE_FN: - foreach my $unsorted_type_fn (glob "$OUT_DIR/*$UNSORTED_EXT") { + foreach my $unsorted_type_fn (glob "$UNSORTED_DIR/*") { my %all_pages_in_type = (); open my $unsorted_type_file, '<', $unsorted_type_fn or croak(file_error('opening', $unsorted_type_fn, 'reading')); @@ -1063,7 +1065,6 @@ my $STRINGS_FN = "$lang.strings.txt"; - # TODO: Refactor or upgrade to Locale::Maketext open my $STRINGS_FILE, '<:utf8', $STRINGS_FN or croak(file_error('opening', $STRINGS_FN, 'reading')); my @strings_file_lines = <$STRINGS_FILE>; @@ -1100,11 +1101,11 @@ #my $transliterated_type = $TRANSLITERATOR->translit($type); my $transliterated_type = $type; - my $ext = $unsorted ? 
$UNSORTED_EXT : $WIKITEXT_EXT; - my $type_fn = "$transliterated_type.$ext"; + my $type_fn = "$transliterated_type.$WIKITEXT_EXT"; $type_fn =~ s{$PATTERN{invalid_filename_char}}{-}xmsgo; - $type_fn = "$OUT_DIR/$type_fn"; + my $dir = $unsorted ? $UNSORTED_DIR : $OUT_DIR; + $type_fn = "$dir/$type_fn"; return $type_fn; } This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
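A standalone sketch of how make_type_fn() picks the output directory after revision 47. It omits the commented-out transliteration step and uses plain lexical variables instead of Readonly so that it runs without CPAN modules; the sample type names are invented.

```perl
use strict;
use warnings;
use 5.010;    # for //=

my $WIKITEXT_EXT = 'wiki.txt';
my $OUT_DIR      = 'out';
my $UNSORTED_DIR = "$OUT_DIR/unsorted";

# Build the output filename for a page type. Unsorted lists now differ
# from sorted ones by directory, not by file extension.
sub make_type_fn {
    my ($type, $unsorted) = @_;
    $unsorted //= 0;

    my $type_fn = "$type.$WIKITEXT_EXT";

    # Replace characters that are invalid in filenames on some systems.
    $type_fn =~ s{[\\\n/:*?"<>|]}{-}xmsg;

    my $dir = $unsorted ? $UNSORTED_DIR : $OUT_DIR;
    return "$dir/$type_fn";
}

say make_type_fn('other');
say make_type_fn('film/tv', 1);
```

Since both kinds of file end in the same extension, the cleanup loop in the same commit can remove old output from either directory with a single "*$WIKITEXT_EXT" glob.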
From: <am...@us...> - 2008-07-29 08:56:07
|
Revision: 46 http://perlwikibot.svn.sourceforge.net/perlwikibot/?rev=46&view=rev Author: amire80 Date: 2008-07-29 08:56:15 +0000 (Tue, 29 Jul 2008) Log Message: ----------- More cosmetics to fix broken syntax highlighting. Modified Paths: -------------- trunk/no-interwiki/prepare_noiw_list.pl Modified: trunk/no-interwiki/prepare_noiw_list.pl =================================================================== --- trunk/no-interwiki/prepare_noiw_list.pl 2008-07-29 08:43:51 UTC (rev 45) +++ trunk/no-interwiki/prepare_noiw_list.pl 2008-07-29 08:56:15 UTC (rev 46) @@ -261,7 +261,7 @@ Readonly my $ALT_LANGS => join $ALT_SEP, keys %LANG_CODE; # XXX Should use ALT_LANGS, but an efficient way is needed to update -# lang codes list, so in the meantime it's loose. +# lang codes list, so in the meantime it is loose. $PATTERN{interwiki_link} = qr{ \Q$MW_SYNTAX{start_link}\E (?<lang_code> @@ -341,10 +341,9 @@ my @links = map { make_link($_) } @{ $statistics{count_iw}->[$max_iw_index] }; INFO("# $max_iw_index: " . 
join_links(\@links, 0)); - while (not defined $statistics{count_iw}->[ --$max_iw_index ]) { - # Do nothing, just count down to the next index with a defined list - } + # Do nothing, just count down to the next index with a defined list + while (not defined $statistics{count_iw}->[ --$max_iw_index ]) { } } INFO("\nNAMESPACES"); @@ -410,7 +409,7 @@ $statistics{'has both valid interwiki and template'}++; } } - else { # doesn't have iw + else { # does not have iw process_iwless_page($page, $has_tmpl_no_iw, $has_iw); } } @@ -425,7 +424,7 @@ $has_iw # scalar bool ) = @_; - INFO(q(doesn't have iw link.)); # ' + INFO(q(does not have iw link.)); $statistics{'has no interwiki link'}++; # Now we need to search for no_iw templates @@ -436,8 +435,8 @@ my $page_text_ref = $page->text(); my $page_title = $page->title(); - # Optimized - doesn't start searching, - # if we already know that it's not there + # Optimized - does not start searching, + # if we already know that it is not there if ($has_tmpl_no_iw) { find_templates($page_text_ref, \@found_templates, [ get_string('no_iw') ]); @@ -477,8 +476,8 @@ $statistics{'cooling date passed'}++; } else { - INFO(q(cooling date didn't pass.)); # ' - $statistics{q(cooling date didn't pass)}++; # ' + INFO(q(cooling date did not pass.)); + $statistics{q(cooling date did not pass)}++; } } @@ -517,7 +516,7 @@ # Still nothing? 
if (not scalar @all_types) { my $other_type = get_string('other'); - INFO("$page_title doesn't have any type, adding to $other_type"); + INFO("$page_title does not have any type, adding to $other_type"); @all_types = ($other_type); $statistics{'automatically added to type other'}++; } @@ -542,7 +541,7 @@ MATCH: foreach my $next_match (@matches) { if ($next_match !~ $PATTERN{template}) { - INFO(q(i thought that it's a template, but it was:)); # ' + INFO(q(i thought that it is a template, but it was:)); if ($next_match =~ $PATTERN{wikitable}) { INFO('a wikitable'); } @@ -557,7 +556,7 @@ # XXX Matches anywhere in the template. # It probably should match the template name. - # Also - it's case-insensitive which is very wrong + # Also - it is case-insensitive which is very wrong # but kinda useful. if ($next_match =~ /\Q$next_filter/xmsi) { @@ -614,7 +613,7 @@ $parsed_params{$name} = $value; } else { - my $error_msg = "Weird - $clause doesn't look a param"; + my $error_msg = "Weird - $clause does not look a param"; INFO($error_msg); cluck($error_msg); $statistics{'weird param'}++; @@ -708,12 +707,12 @@ } # A # sign not after an &. - # After an & it's probably a character number. + # After an & it is probably a character number. 
if ($foreign_article =~ $PATTERN{section_link}) { $special_cases{section_links}->{$lang_code} = q{}; } - # Char codes are common in section links, so there's no + # Char codes are common in section links, so there is no # need to show them again elsif ($foreign_article =~ $PATTERN{character_code_in_link}) { $special_cases{charnumber_links}{$lang_code} = q{}; @@ -1004,7 +1003,7 @@ $level # number ) = @_; - $level //= 2; + $level //= 2; # / my $level_marker = q{=} x $level; # Line ending is mandatory @@ -1083,7 +1082,7 @@ my ($english, $target) = split $PATTERN{field_sep}, $next_string_line; # Fallback to English if no target language string was supplied - $STRING{$english} = $target // $english; + $STRING{$english} = $target // $english; # / } return %STRING; @@ -1091,12 +1090,12 @@ sub get_string { my ($english) = @_; - return $STRING{$english} //= $english; + return $STRING{$english} //= $english; # / } sub make_type_fn { my ($type, $unsorted) = @_; - $unsorted //= 0; + $unsorted //= 0; # / #my $transliterated_type = $TRANSLITERATOR->translit($type); my $transliterated_type = $type; @@ -1168,7 +1167,7 @@ sub join_links { my ($links_ref, $line_end) = @_; - $line_end //= 1; + $line_end //= 1; # / my $link_sep = q{ } . $LINK_SEP . ($line_end ? "\n" : q{ }); return join $link_sep, @{$links_ref}; @@ -1232,7 +1231,7 @@ =head1 DESCRIPTION -The main goal of this searching is to find pages which don't have +The main goal of this searching is to find pages which do not have interwiki (interlanguage) links to certain languages. This program scans a MediaWiki XML dump file. It searches every page for @@ -1273,7 +1272,7 @@ =head2 FILENAME is a weird dump file name. -The dump file doesn't appear to have a standard name that appears +The dump file does not appear to have a standard name that appears at L<http://download.wikimedia.org/>. =head2 error opening FILENAME ... 
@@ -1288,21 +1287,21 @@ =head2 A page has no pure title -Something is particularly weird with the name of a page. The program can't +Something is particularly weird with the name of a page. The program cannot separate its name from its namespace. It can also be a bug in this program. -=head2 Some weirdness happened - STRING doesn't look a param +=head2 Some weirdness happened - STRING does not look a param -STRING is supposed to be a parameter in a template, but it doesn't look like +STRING is supposed to be a parameter in a template, but it does not look like one. It could be an error in the template, and also a bug in this program (the parser that this program employs is rather limited). =head2 Unicode character 0xNUMBER is illegal This is a standard Perl warning. It may appear if a page or its title have -funky Unicode characters which shouldn't be there according to the Unicode +funky Unicode characters which should not be there according to the Unicode standard (to be more precise, according to the implementation of this -standard in your version of perl). Most probably these characters aren't +standard in your version of perl). Most probably these characters are not supposed to be in the page and should be fixed, but otherwise this issue is not supposed to affect the functionality of this program significantly. @@ -1361,7 +1360,7 @@ =head2 Perl 5.10 This program needs Perl 5.10. It has clean, new and useful syntax, which -makes the programs easier to hack, maintain and debug. It's useless to try +makes the programs easier to hack, maintain and debug. It is useless to try and run it on an older version, unless you want to waste your time backporting. Please upgrade your Perl installation if you still have 5.8 or something older. @@ -1378,7 +1377,7 @@ The time invested in making the code P::C-friendly will be returned as time saved on debugging. Also consider reading the book "Perl Best Practices" by -Damian Conway if you haven't already. 
+Damian Conway if you have not already. =head1 INCOMPATIBILITIES @@ -1396,9 +1395,9 @@ Please report all bugs, features requests and other comments to Amir E. Aharoni (ami...@gm...). -=head2 There's no equality between languages +=head2 There is no equality between languages -Currently this program actually only lists pages which don't have +Currently this program actually only lists pages which do not have an interwiki link to the English Wikipedia. This is obviously not useful on the English Wikipedia and is conceptually problematic on other Wikipedias, too. This is being fixed, but it is not simple to do it Right. @@ -1451,7 +1450,7 @@ Goal: v0.8 Moshe -=head2 There's no test suite +=head2 There is no test suite That can be done after proper modularization. Also, a local test MediaWiki server would be needed. @@ -1490,7 +1489,7 @@ =back -=item * It's (roughly) based on another bot by Guy Shaked (Costello). +=item * It is (roughly) based on another bot by Guy Shaked (Costello). =over This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
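Editorial note on revision 46: several hunks in this diff append `# /` after lines that use `//=`, apparently so that editors which misread `//` as the start of a regex recover their highlighting. The operator itself is the defined-or assignment that arrived with Perl 5.10. Below is a minimal sketch of how it behaves, assuming a standalone script; `heading_marker` is a hypothetical helper written for this note, and `get_string` is reduced to the single line visible in the diff with an invented one-entry `%STRING`.

```perl
use 5.010;
use strict;
use warnings;

# Hypothetical helper for illustration; it is not a sub from the script.
sub heading_marker {
    my ($level) = @_;
    $level //= 2;    # default applies only when $level is undef; 0 is kept
    return q{=} x $level;
}

# Invented sample table; the real script loads its strings from a file.
my %STRING = (template => 'Mal');

# Same fallback as in the diff: an unknown key maps to itself and is cached.
sub get_string {
    my ($english) = @_;
    return $STRING{$english} //= $english;
}

say heading_marker();          # ==
say heading_marker(3);         # ===
say get_string('template');    # Mal
say get_string('other');       # other
```

With `||=` instead of `//=`, an explicit level of 0 would be replaced by the default, which is probably why the script prefers defined-or for its optional arguments.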
From: <am...@us...> - 2008-07-29 08:43:41
|
Revision: 45 http://perlwikibot.svn.sourceforge.net/perlwikibot/?rev=45&view=rev Author: amire80 Date: 2008-07-29 08:43:51 +0000 (Tue, 29 Jul 2008) Log Message: ----------- Cosmetics to fix broken syntax highlighting. Modified Paths: -------------- trunk/no-interwiki/prepare_noiw_list.pl Modified: trunk/no-interwiki/prepare_noiw_list.pl =================================================================== --- trunk/no-interwiki/prepare_noiw_list.pl 2008-07-28 16:47:22 UTC (rev 44) +++ trunk/no-interwiki/prepare_noiw_list.pl 2008-07-29 08:43:51 UTC (rev 45) @@ -1,7 +1,7 @@ #!/usr/bin/perl # prepare_noiw_list.pl -# version Noa - development +# version 0.2 Noa - development # See the POD documentation at the end of the file or run # perldoc prepare_noiw_list.pl @@ -51,7 +51,7 @@ #>>> our $VERSION = ($SVN_PROPS{Revision} =~ /\A\$Revision:\ (?<revision_num>\d+)\ \$\z/xms) ? "0.1.$+{revision_num}" - : croak(q(Something's wrong with SVN revision number)); + : croak(q(Something is wrong with SVN revision number)); my %PATTERN; Readonly my $WIKITEXT_EXT => 'wiki.txt'; @@ -234,7 +234,7 @@ }xms; $PATTERN{invalid_filename_char} = qr{[\\\n/:*?"<>|]}xms; # " -# TODO: Check whether it's Neapolitan with its '' +# TODO: Check whether it is Neapolitan with its '' $PATTERN{character_code_in_link} = qr{ (?: [%.] # There are both %C4%B0 and .AA.E0 @@ -1013,7 +1013,7 @@ # Custom Unicode character property for finding characters. # The custom is to give those subroutines CamelCase names. -sub IsLeftToRight { ## no critic (NamingConventions::ProhibitMixedCaseSubs) +sub IsLeftToRight { ## no critic NamingConventions::ProhibitMixedCaseSubs return <<'END'; +utf8::InHebrew +utf8::IsSpace |
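Editorial note on revision 45: the second hunk touches the code that derives `$VERSION` from the expanded SVN `Revision` keyword by way of a named capture and the `%+` hash (both Perl 5.10 features). The sketch below isolates that step, assuming the keyword arrives as a plain string; `version_from_keyword` is a hypothetical wrapper written for this note, whereas the script itself reads the value from `%SVN_PROPS`.

```perl
use 5.010;
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical wrapper around the expression shown in the diff.
sub version_from_keyword {
    my ($keyword) = @_;

    # Under /x, whitespace in the pattern is ignored, so the literal
    # spaces of the keyword have to be written as "\ ".
    return ($keyword =~ /\A\$Revision:\ (?<revision_num>\d+)\ \$\z/xms)
        ? "0.1.$+{revision_num}"
        : croak(q(Something is wrong with SVN revision number));
}

say version_from_keyword('$Revision: 45 $');    # 0.1.45
```

An unexpanded keyword such as `$Revision$` does not match the pattern, so the call croaks instead of yielding a malformed version string.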
From: <am...@us...> - 2008-07-28 16:47:12
|
Revision: 44 http://perlwikibot.svn.sourceforge.net/perlwikibot/?rev=44&view=rev Author: amire80 Date: 2008-07-28 16:47:22 +0000 (Mon, 28 Jul 2008) Log Message: ----------- Updated POD. Modified Paths: -------------- trunk/no-interwiki/prepare_noiw_list.pl Modified: trunk/no-interwiki/prepare_noiw_list.pl =================================================================== --- trunk/no-interwiki/prepare_noiw_list.pl 2008-07-28 16:26:20 UTC (rev 43) +++ trunk/no-interwiki/prepare_noiw_list.pl 2008-07-28 16:47:22 UTC (rev 44) @@ -1396,43 +1396,68 @@ Please report all bugs, features requests and other comments to Amir E. Aharoni (ami...@gm...). -=head2 English is king +=head2 There's no equality between languages -In the meantime this program can actually only find pages which don't have +Currently this program actually only lists pages which don't have an interwiki link to the English Wikipedia. This is obviously not useful on the English Wikipedia and is conceptually problematic on other Wikipedias, too. This is being fixed, but it is not simple to do it Right. +Goal: 0.4 Reut + =head2 Internationalization is far from perfect Date handling and strings localization is very primitive. There are plans to upgrade it to smarter modules such as Locale::Maketext. +Goal: 0.6 Itay + =head2 MediaWiki parsing is ad hoc This program only does very rudimentary and ad hoc MediaWiki syntax parsing. +Goal: None at the moment, it works well enough. + =head2 Templates must be removed manually -Templates on pages which already have the needed are not removed -automatically. This is a stub; you can help Wikipedia by writing these -functions! +Templates on pages which already have the needed link are not removed +automatically. -The actual reason for this is that the author doesn't want to write a bot -that touches the live online Wikipedia until he is very sure that the rest -of the script is very stable. Besides, manual work tends to make articles -Wikipedia articles better! 
Quality is more important than quantity and speed. +Goal: v0.2 Noa =head2 Cooling date The implementation of the cooling date is very rudimentary. -=head2 No separation of searching and formatting +Goal version: v0.4 Reut -There are two main function here: C<find_iwless()> and -C<create_no_iw_pages()>. They are doing separate things and should run -from different programs. +=head2 Major refactoring is needed +=over + +=item * The main code is on the brink of passing the threshold for complexity that +P::C accepts. + +=item * There is no separation of searching and formatting. There are two main +function here: C<find_iwless()> and C<create_no_iw_pages()>. They are doing +separate things and should run from different programs. + +=item * Statistics and multi links are just slapped to the log. + +=item * At least some of the code can be rewritten as classes that inherit +from L<Parse::MediaWikiDump>. + +=back + +Goal: v0.8 Moshe + +=head2 There's no test suite + +That can be done after proper modularization. Also, a local test MediaWiki +server would be needed. + +Goal: v1.0 Drora + =head1 HISTORY =over @@ -1441,7 +1466,8 @@ categories sorting. Memory usage optimization - accumulating information in files. More generic, but far-from-perfect handling of links to languages other than English. Translitetaion with Lingua::Translit. Logging with -Log::Log4perl. +Log::Log4perl. Brutal Perl::Critic 1.90. Started using Readonly. Not finished: +complete statistics, removal of templates from pages which already have links. =item * B<0.1 - First and unnamed Amir E. Aharoni's version>: Types introduced. Conceptual l10n, but only tested on he.wiki. Still en links |