You can subscribe to this list here.

| Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
|------|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|
| 2008 |     |     |     |     |     |     |     |     |     | 2   |     | 3   |
| 2009 | 1   |     | 14  | 2   |     | 2   | 5   | 3   | 2   | 18  | 5   | 1   |
| 2010 | 2   | 4   | 3   | 2   |     |     |     |     | 2   |     |     |     |
| 2011 | 4   | 3   |     |     |     |     |     |     | 3   | 5   |     |     |
| 2012 |     |     |     |     | 2   |     | 2   | 1   | 2   |     | 4   |     |
| 2013 |     |     | 3   |     |     |     |     |     |     |     |     |     |
| 2014 |     |     |     |     | 1   |     |     |     |     |     |     |     |
From: Tomaž Š. <tom...@ta...> - 2014-05-20 08:09:21
Dear all,

To avoid losing content I've moved the Wikiprep documentation from MediaWiki to the new SourceForge wiki:

http://sourceforge.net/p/wikiprep/pwiki/Main%20Page/

Best regards
Tomaž

From: Maximilien D. <max...@gm...> - 2013-03-23 15:26:14
Thank you for your answer. The line

    wikiprep -format composite -compress -f enwiki-latest-pages-articles.xml.bz2

with the file enwiki-latest-pages-articles.xml.bz2 downloaded from
http://en.wikipedia.org/wiki/Wikipedia:Database_download gives me the same error:

    (in cleanup) Only version 0.3 and 0.4 dump files are supported at /usr/share/perl5/Parse/MediaWikiDump/Revisions.pm line 266.

Any other suggestion?

Best regards,
MD

On 03/22/2013 05:36 PM, Tomaž Šolc wrote:
> Wikiprep is used for parsing full text article dumps (*-pages-articles.xml.bz2).
>
> There is no support for processing meta-data dumps.

--
Home Page: http://perso.crans.org/danisch/max/home

From: Tomaž Š. <tom...@ta...> - 2013-03-22 16:54:57
On 03/22/2013 04:53 PM, Maximilien Danisch wrote:
> I downloaded the file enwiki-20130304-pages-meta-current.xml.bz2 from
> the page http://dumps.wikimedia.org/enwiki/20130304/.

Wikiprep is used for parsing full text article dumps (*-pages-articles.xml.bz2).

There is no support for processing meta-data dumps.

Best regards
Tomaž

From: Maximilien D. <max...@gm...> - 2013-03-22 15:53:54
Hi everyone,

I downloaded the file enwiki-20130304-pages-meta-current.xml.bz2 from the page
http://dumps.wikimedia.org/enwiki/20130304/. With the aim of parsing the file, I downloaded and installed the Perl software "version 3.04 of wikiprep" from the page http://sourceforge.net/projects/wikiprep/?source=dlp. However, it seems that this software, or maybe the "MediaWikiDump" library, is retired. In fact, entering the line

    wikiprep -format composite -compress -f enwiki-20130304-pages-meta-current.xml.bz2

gives me the error:

    (in cleanup) Only version 0.3 and 0.4 dump files are supported at /usr/share/perl5/Parse/MediaWikiDump/Revisions.pm line 266.

Do you have any suggestion?

Best,
MD

--
Maximilien Danisch
http://perso.crans.org/danisch/max/home

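For anyone hitting the same "Only version 0.3 and 0.4 dump files are supported" error, a minimal sketch of a helper that prints which export schema version a dump declares before it is fed to wikiprep. This script is not part of Wikiprep; the script name and approach are assumptions, and it expects a bzip2-compressed *-pages-articles dump.

```perl
#!/usr/bin/perl
# Hypothetical helper (not part of Wikiprep): print the version="..." attribute
# of the <mediawiki> root element from the first lines of a .xml.bz2 dump, so
# you can see whether it is a 0.3, 0.4 or 0.5 export before running wikiprep.
use strict;
use warnings;
use IO::Uncompress::Bunzip2 qw($Bunzip2Error);

my $file = shift or die "usage: $0 <pages-articles.xml.bz2>\n";
my $z = IO::Uncompress::Bunzip2->new($file)
    or die "cannot open $file: $Bunzip2Error\n";

my $checked = 0;
while (defined(my $line = $z->getline())) {
    if ($line =~ /<mediawiki\b[^>]*\bversion="([^"]+)"/) {
        print "export schema version: $1\n";
        exit 0;
    }
    last if ++$checked >= 50;   # the root element is near the top of the dump
}
die "no <mediawiki> element found in the first lines of $file\n";
```
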
From: Markus M. <mar...@gm...> - 2012-11-16 04:06:28
We wrote a classifier for searches and webpages that is small enough to run on Hadoop nodes. We found that a Gabrilovich-like wiki classifier did pretty well. But to create that we needed the wikiprep resource files - we had trouble creating those for recent wikimedia dumps and it also took a long while. We were looking for a solution that we can run in a few hours.

On Thu, Nov 15, 2012 at 5:37 PM, Uilton Dutra <uil...@gm...> wrote:
> Nice to see other implementations of wikiprep! I will definitely try it out.
>
> Which kind of applications are you developing there?

From: Uilton D. <uil...@gm...> - 2012-11-15 22:38:18
Nice to see other implementations of wikiprep! I will definitely try it out.

Which kind of applications are you developing there?

Cheers,

- Uilton

On Fri, Nov 9, 2012 at 7:50 PM, Markus Mobius <mar...@gm...> wrote:
> Microsoft Research New England developed a project called Wikiprep# that
> processes wikimedia files similarly to wikiprep. The output files are very
> similar to wikiprep.
>
> Source code and setup program are here:
> http://wikiprepsharp.codeplex.com/

From: Markus M. <mar...@gm...> - 2012-11-09 21:50:36
Hi,

Microsoft Research New England developed a project called Wikiprep# that processes wikimedia files similarly to wikiprep. The output files are very similar to wikiprep's.

It's written in C# and based on .NET 4.0 but might work in Mono. It's licensed under Apache 2.0.

Source code and setup program are here:

http://wikiprepsharp.codeplex.com/

Markus Mobius
Senior Researcher
MSR New England

From: Tomaž Š. <tom...@ta...> - 2012-09-03 07:23:57
On 03. 09. 2012 06:26, Erik Ward wrote:
> Then running wikiprep (I installed all the debian packages listed in the readme)
>
>     wikiprep -format composite -compress -nourl -parallel -f enwiki-latest-pages-articles.xml.0000.gz
>
>     gzip: enwiki-latest-pages-articles.xml.0000.gz: unexpected end of file

This error makes me think your "enwiki-latest-pages-articles.xml.0000.gz" is corrupted. Try running

    gzip -dc enwiki-latest-pages-articles.xml.0000.gz > /dev/null

and see if it produces any errors. If it does, something went wrong when splitwiki created this file.

Best regards
Tomaž

From: Erik W. <Eri...@st...> - 2012-09-03 04:51:44
Hello!

I have been trying to run wikiprep over the weekend but saw that it crashes.

First I tried using splitwiki and then running it using four cores. Compilation of splitwiki:

    splitwiki: splitwiki.o
        gcc splitwiki.o -O2 -lz -o splitwiki
    splitwiki.o: splitwiki.c
        gcc -c -Wall splitwiki.c

Then running wikiprep (I installed all the debian packages listed in the readme):

    wikiprep -format composite -compress -nourl -parallel -f enwiki-latest-pages-articles.xml.0000.gz

    gzip: enwiki-latest-pages-articles.xml.0000.gz: unexpected end of file
    no element found at line 113211447, column 749, byte -1537634522 at /usr/share/perl5/Parse/MediaWikiDump/Revisions.pm line 233
    No such file or directory at /usr/local/bin/wikiprep line 358.
    ./enwiki-latest-pages-articles.title2id.db: No such file or directory at /usr/local/bin/wikiprep line 479.
    ./enwiki-latest-pages-articles.title2id.db: No such file or directory at /usr/local/bin/wikiprep line 479.
    ./enwiki-latest-pages-articles.title2id.db: No such file or directory at /usr/local/bin/wikiprep line 479.
    ./enwiki-latest-pages-articles.title2id.db: No such file or directory at /usr/local/bin/wikiprep line 479.

I thought that I would try and see if it would crash using only one core before looking into details:

    wikiprep -format composite -compress -nourl -f enwiki-latest-pages-articles.xml.bz2

    Use of qw(...) as parentheses is deprecated at /usr/local/share/perl/5.14.2/Wikiprep/Disambig.pm line 9.
    Use of qw(...) as parentheses is deprecated at /usr/local/bin/wikiprep line 134.
    Sep 03 09:48:12 [WARNING] title Ss (ID 354283) already encountered before (ID 198274)
    Sep 03 10:42:47 [WARNING] title T? (ID 13066537) already encountered before (ID 3406617)
    Sep 03 11:04:29 [WARNING] title ? (ID 19185171) already encountered before (ID 18984678)
    Sep 03 11:08:26 [WARNING] title ? (ID 20363161) already encountered before (ID 16504503)
    Sep 03 12:09:10 [NOTICE] total 12584750 pages (31631083782 bytes)
    Sep 03 12:09:15 [NOTICE] Loaded 6125016 titles
    Sep 03 12:09:15 [NOTICE] Loaded 5608108 redirects
    Sep 03 12:09:15 [NOTICE] Loaded 362394 templates
    Out of memory!

So what are the best steps to getting wikiprep running? I would rather not learn Perl details (I am unfamiliar with the language).

Best regards,
Erik

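If splitwiki produced many chunk files, a small helper like the following can run the gzip integrity check suggested in the reply above over every chunk at once. This is a sketch, not Wikiprep code; the glob pattern is an assumption based on the chunk name shown in this thread.

```perl
#!/usr/bin/perl
# Hypothetical helper: run "gzip -t" over every chunk produced by splitwiki
# to find truncated or corrupted files before starting a -parallel wikiprep run.
use strict;
use warnings;

my @chunks = glob 'enwiki-latest-pages-articles.xml.*.gz';
die "no chunk files found\n" unless @chunks;

my $bad = 0;
for my $chunk (@chunks) {
    # gzip -t only tests integrity; it writes nothing and exits non-zero on error
    if (system('gzip', '-t', $chunk) != 0) {
        warn "CORRUPT: $chunk\n";
        $bad++;
    }
}
print $bad ? "$bad corrupted chunk(s) found\n" : "all chunks OK\n";
exit($bad ? 1 : 0);
```
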
From: Karsten J. <je...@we...> - 2012-08-31 13:27:32
Hi,

I have problems prescanning the Wikipedia dump from August 3rd.

Strawberry Perl 64-bit crashes on Windows 7 right at the start.

On Ubuntu 12.04 64-bit it runs for roughly 3 hours and crashes without an error. I am running it with debug logging now. Can anyone suggest how I should tackle the problem?

Also: I noticed that development has stopped. Did anyone continue it, or does anyone know an alternative?

Thx,
Karsten

From: Tomaž Š. <tom...@ta...> - 2012-07-16 08:13:45
Dear all,

Later this year SourceForge will be shutting down the hosted MediaWiki installation that serves the web pages accessible at http://wikiprep.sf.net.

http://sourceforge.net/blog/hosted-apps-retirement/

Those pages contain useful information that is not present anywhere else. SourceForge suggests installing a new MediaWiki installation in the project's web space. As I'm no longer actively involved with Wikiprep, I don't want to maintain this separate MediaWiki installation. However, I would hate to see that content lost.

Would anyone here like to take care of moving the content away from MediaWiki to some other web location, either on SourceForge or somewhere else (for instance, the Wikiprep code is also mirrored on GitHub, which also provides a wiki)? If not, I will move it all into text files in the source repository.

Best regards
Tomaž

From: Alireza N. <ali...@gm...> - 2012-07-01 18:49:23
Hi,

I've used wikiprep on a 2012 dump of Wikipedia (8 GB) on my 4 GB RAM machine running Ubuntu Linux. I didn't split the dump file, and after the prescan phase the wikiprep process crashed with an "Out of memory!" error. I know that the prescan phase was completed because my log file ends with these lines:

    Jul 01 08:31:52 [NOTICE] total 12300930 pages (30654051304 bytes)
    Jul 01 08:32:00 [NOTICE] Loaded 5992645 titles
    Jul 01 08:32:00 [NOTICE] Loaded 5471056 redirects
    Jul 01 08:32:01 [NOTICE] Loaded 346913 templates

Is it possible to continue the process from the transform phase? I also tested the -transform option, and that resulted in this error:

    Can't call method "filter_fetch_value" on an undefined value at /usr/local/bin/wikiprep line 483

I am using the Wikiprep-3.04 version.

From: Tomaž Š. <tom...@ta...> - 2012-05-16 10:12:33
You don't need to change Wikiprep for that. Categories are just like normal pages in Wikipedia, so to find their names, just find the title of the page with that ID - either in the original XML dump or in the Wikiprep output (gum.xml). There are not that many categories, so you can simply hold the mapping in memory and apply it in your indexer.

Regards
Tomaž

On 05/15/2012 01:20 PM, vineet yadav wrote:
> Wikiprep gives category id instead of category names. Can you point me
> out what changes I need to make to get category names?

From: vineet y. <vin...@gm...> - 2012-05-15 11:20:10
Hi,

I want to create a Lucene index of Wikipedia. I want to index Wikipedia category names and store them in a separate field, so I want to store the category names in the hgw.xml file and use it for indexing. Wikiprep gives the category ID instead of the category name. Can you point out what changes I need to make to get category names?

Thanks
Vineet Yadav

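A minimal sketch of the approach suggested in the reply above: build an in-memory category-ID to category-name map from the original XML dump, so an indexer can translate the IDs that Wikiprep emits. This is not Wikiprep or indexer code; the script and variable names are made up, it assumes Parse::MediaWikiDump is installed (as elsewhere in this archive), an uncompressed dump, and the English "Category" namespace name.

```perl
#!/usr/bin/perl
# Hypothetical sketch: collect page-ID -> "Category:..." title pairs from a dump.
use strict;
use warnings;
use Parse::MediaWikiDump;

my $dump = shift or die "usage: $0 <pages-articles.xml>\n";
my $pages = Parse::MediaWikiDump::Pages->new($dump);

my %category_name;   # page ID -> category page title
while (defined(my $page = $pages->next)) {
    # 'Category' is the English namespace name; other languages differ
    next unless defined $page->namespace && $page->namespace eq 'Category';
    $category_name{ $page->id } = $page->title;
}

printf "loaded %d category names\n", scalar keys %category_name;

# Example lookup an indexer might do for a category ID taken from hgw.xml/gum.xml:
# my $name = $category_name{$some_category_id};
```
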
From: Tomaž Š. <tom...@ta...> - 2011-10-18 10:21:22
On 17. 10. 2011 20:31, Uilton Dutra wrote:
> Hi Tomaž,
>
> I like Github because it's easier to manage pull requests and issues.
>
> Would you please create a repo there?

Here it is:

https://github.com/avian2/wikiprep

Regards
Tomaž

From: Uilton D. <uil...@gm...> - 2011-10-17 18:31:51
Hi Tomaž,

I like Github because it's easier to manage pull requests and issues.

Would you please create a repo there?

Thanks!

- Uilton

On Mon, Oct 10, 2011 at 11:47 AM, Tomaž Šolc <tom...@ta...> wrote:
> I see you already have your own Wikiprep repository on Github. If you
> think it will help, I can open an account and create another clone of
> the repository.
>
> I think there's already a confusion about repositories, because there's
> a very out-dated CVS on SourceForge. It might be best then to move
> everything to Github to not complicate things further.

From: Tomaž Š. <tom...@ta...> - 2011-10-10 14:47:17
Hi Uilton,

I'm glad you found Wikiprep useful.

Short of cloning some repositories from Github I don't have much experience with it, so I don't know what you gain from hosting code there. People have forked some other projects of mine on Github and I've pulled their changes back into my repository without problems.

I see you already have your own Wikiprep repository on Github. If you think it will help, I can open an account and create another clone of the repository.

I think there's already some confusion about repositories, because there's a very out-dated CVS on SourceForge. It might be best then to move everything to Github to not complicate things further.

Best regards
Tomaž

On 10. 10. 2011 00:32, Uilton Dutra wrote:
> Have you considered moving the source code to Github? I think such a
> platform makes it easier to collaborate. I don't have the time to be
> the official maintainer but I can help coding some stuff or testing the
> dump processing.

From: Uilton D. <uil...@gm...> - 2011-10-09 22:33:25
Hi Tomaž,

First of all I wish to thank you for the work and support you gave to Wikiprep. I came across the script after reading Evgeniy's papers and I was very pleased to find out that the package was maintained. Saved me a bunch of time!

Have you considered moving the source code to Github? I think such a platform makes it easier to collaborate. I don't have the time to be the official maintainer, but I can help coding some stuff or testing the dump processing.

Thanks again, and hope everything is great in Slovenia!

Cheers from Brazil.

- Uilton

2011/10/4 Tomaž Šolc <tom...@ta...>:
> Until I find someone who will take over the maintenance of this project
> I will do my best to fix any bugs that get reported and review and
> commit any submitted patches. But the way things currently look you
> can't expect any major new Wikiprep development from me.
>
> For the time being I have moved the git repository to my own server:
>
> http://www.tablix.org/~avian/git/wikiprep.git/

From: Tomaž Š. <tom...@ta...> - 2011-10-04 10:10:19
Hi everyone,

I believe you should know that I am no longer employed at Zemanta.

This means that I no longer have access to machines needed to process Wikipedia dumps and have neither the means nor the motivation to further develop Wikiprep.

Until I find someone who will take over the maintenance of this project I will do my best to fix any bugs that get reported and review and commit any submitted patches. But the way things currently look you can't expect any major new Wikiprep development from me.

For the time being I have moved the git repository to my own server:

http://www.tablix.org/~avian/git/wikiprep.git/

Best regards
Tomaž

From: Gašper Š. <gas...@ze...> - 2011-09-08 10:32:13
On Thu, Sep 8, 2011 at 10:38 AM, Tomaž Šolc <tom...@ta...> wrote:
> Run tests with "make test TEST_VERBOSE=1". You should see a diff
> displayed for tests that fail. If just the order of the attributes in
> XML tags differ, then you can safely ignore these test failures.
>
> XML::Writer 0.611 is known to produce no test failures.

As you have said, this was only a matter of XML attribute order.

> As far as I know recent versions of Parse::MediaWikiDump ignore the dump
> version number altogether, so I'm not sure what made you think 0.5 isn't
> supported.

I was having problems because I failed to notice that I was not using the latest version of the Parse::MediaWikiDump library. Upgrading the library resolved this issue.

> It appears that module is a drop-in replacement for Parse::MediaWikiDump,
> so trying it out should be easy.

I tried to install MediaWiki::DumpFile::Compat but failed miserably. It seems there is an issue with dependencies, but I was not able to resolve them.

Thank you for all your help, I really appreciated your swift response.

Best,
Gašper

From: Tomaž Š. <tom...@ta...> - 2011-09-08 08:54:56
Hi,

> 1) 'make test' fails on some tests, I have attached the command output below.

This is most likely because of the XML::Writer module you are using. Some versions order the XML attributes differently, and the tests that are failing just perform a dumb diff on the XML files.

Run tests with "make test TEST_VERBOSE=1". You should see a diff displayed for tests that fail. If just the order of the attributes in XML tags differs, then you can safely ignore these test failures.

XML::Writer 0.611 is known to produce no test failures.

> 2) It seems this version of wikiprep can only parse MediaWiki dumps of
> versions up to 0.4. Current dump version is 0.5. Are there any plans to
> support this version of dumps in the nearby future?

Parsing the XML format is purely the responsibility of the Parse::MediaWikiDump library. Differences between format 0.4 and 0.5 are minimal (two attributes have been added). I see no reason why version 0.5 couldn't be supported. As far as I know, recent versions of Parse::MediaWikiDump ignore the dump version number altogether, so I'm not sure what made you think 0.5 isn't supported.

> After a little bit of researching I think that wikiprep should probably
> migrate to MediaWiki::DumpFile::Compat as the current library
> Parse::MediaWikiDump is retired. Would this be a step in the right direction?

Probably. It appears that module is a drop-in replacement for Parse::MediaWikiDump, so trying it out should be easy.

Regards
Tomaž

From: Gašper Š. <gas...@ze...> - 2011-09-07 09:14:38
Hello,

I downloaded the latest version of wikiprep (3.04) and I have two issues running it:

1) 'make test' fails on some tests; I have attached the command output below.

2) It seems this version of wikiprep can only parse MediaWiki dumps of versions up to 0.4. The current dump version is 0.5. Are there any plans to support this version of dumps in the near future?

After a little bit of researching I think that wikiprep should probably migrate to MediaWiki::DumpFile::Compat, as the current library Parse::MediaWikiDump is retired (see http://search.cpan.org/~triddle/Parse-MediaWikiDump-1.0.6/lib/Parse/MediaWikiDump.pm#Migration). Would this be a step in the right direction?

Thank you for your time and considerations.

Regards,
Gašper

    $ make test
    PERL_DL_NONLAZY=1 /usr/bin/perl "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
    t/cases.........NOK 12/248
    #   Failed test 'check t/cases/asse.gum.xml'
    #   at t/cases.t line 69.
    #          got: '1'
    #     expected: '0'
    [the same failure repeats for t/cases/barzilla.gum.xml, gallery.gum.xml,
     images.gum.xml, interwiki-new.gum.xml, microsoft-new.gum.xml and stub.gum.xml,
     along with many "Unicode character 0x10ffff is illegal" warnings from
     bin/wikiprep, Wikiprep/Disambig.pm and Wikiprep/css.pm]
    t/cases.........dubious
        Test returned status 7 (wstat 1792, 0x700)
    DIED. FAILED tests 12, 18, 75, 103, 116, 142, 208
        Failed 7/248 tests, 97.18% okay
    t/css...........ok
    t/ctemplates....ok
    t/images........ok
    t/languages.....ok
    t/namespace.....ok
    t/nowiki........ok
    t/revision......ok
    t/templates.....ok
    t/utils.........ok
    Failed Test Stat Wstat Total Fail  List of Failed
    -------------------------------------------------------------------------------
    t/cases.t        7  1792   248    7  12 18 75 103 116 142 208
    Failed 1/10 test scripts. 7/462 subtests failed.
    Files=10, Tests=462, 36 wallclock secs (30.46 cusr +  4.00 csys = 34.46 CPU)
    Failed 1/10 test programs. 7/462 subtests failed.
    make: *** [test_dynamic] Error 255

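For anyone attempting the migration discussed in this thread, MediaWiki::DumpFile::Compat is documented as a compatibility layer for the old API. The following is a minimal, untested sketch of what the switch might look like in a script that currently uses Parse::MediaWikiDump; the dump filename is a placeholder and the version accessor is assumed to behave as in the old module.

```perl
#!/usr/bin/perl
# Sketch of the migration discussed above: load the compatibility layer and
# keep using the old Parse::MediaWikiDump API unchanged. Not Wikiprep code.
use strict;
use warnings;
use MediaWiki::DumpFile::Compat;   # provides the Parse::MediaWikiDump::* classes

my $pages = Parse::MediaWikiDump::Pages->new('enwiki-pages-articles.xml');

# Assumption: same accessor as the old module, returning the export schema version
print "dump schema version: ", $pages->version, "\n";

while (defined(my $page = $pages->next)) {
    # Pages offer the same accessors as before: title, id, namespace, text, ...
    print $page->id, "\t", $page->title, "\n";
}
```
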
From: Tomaž Š. <tom...@ta...> - 2011-02-11 08:39:34
On 11. 02. 2011 06:15, Tony Plate wrote:
> The second and subsequent errors you show about
> "...pages-articles.title2id.db: No such file or directory" are from the
> transform stage. What was happening for me was that the pre-processing
> stage was crashing, and hence not creating its output, the .db files.
> Then when the transform stage tried to run, it couldn't find the .db
> files and printed messages just like what you show.

Thanks for pointing this out. I just committed a fix that implements better diagnostics of forked worker processes and stops Wikiprep at the first signs of trouble instead of trying to push on.

Best regards
Tomaž

From: Tony P. <tp...@ac...> - 2011-02-11 05:15:56
Jose,

I was seeing some errors like this, which turned out to be due to a corrupted original file (due to apparent hardware errors in a VM running on physical hardware that 24 hours of diagnostics can find no problem with... go figure...).

The second and subsequent errors you show about "...pages-articles.title2id.db: No such file or directory" are from the transform stage. What was happening for me was that the pre-processing stage was crashing, and hence not creating its output, the .db files. Then when the transform stage tried to run, it couldn't find the .db files and printed messages just like what you show.

You can see if your original file is legal XML using xmllint on it, like this:

    xmllint --stream --noout dewiki-20101013-pages-articles.xml

(using the appropriate filename)

You should see no output unless there are XML errors. If you see XML errors, then you'll probably need to fix them before proceeding.

-- Tony Plate

(thanks to Tomaz Solc, who helped me track down my similar problems after I mailed this list a few weeks ago)

On 2/10/2011 1:30 PM, Jose Quesada wrote:
> I preprocessed the .fr and .es wikis with the latest wikiprep. But when I run
> the same thing on the .de one I get:
>
>     no element found at line 22451642, column 0, byte 1471064466 at /usr/lib64/perl5/site_perl/5.12.2/Parse/MediaWikiDump/Revisions.pm line 233
>     ./dewiki-20101013-pages-articles.title2id.db: No such file or directory at /home/quesada/projIfollow/wikiprep/lib/wikiprep line 476.
>     No such file or directory at /home/quesada/projIfollow/wikiprep/lib/wikiprep line 355.
>     ./dewiki-20101013-pages-articles.title2id.db: No such file or directory at /home/quesada/projIfollow/wikiprep/lib/wikiprep line 476.
>
> Any idea why this is?