Does anyone have suggestions on the best/easiest way to download the Media files included in a page (i.e. the files referenced in [[Media:...]] tags)?
For downloading images I use getImages()/downloadImage() but for Media files I am not too sure about the best way to proceed.
Any help appreciated.
Currently there is no special method to get files included in [[Media:...]] tags, so it has to be done manually. Just add "Media" to the regular expression on line 443 of the DotNetWikiBot.cs file (as of version 2.65), like this:
wikiImageRE = new Regex(@"\[\[(?i)((File|Image|Media" +
After that change, and after recompiling DotNetWikiBot.cs, the getImages() function will "see" files in [[Media:...]] tags.
The DownloadImage() function downloads audio/video files correctly.
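To illustrate the idea behind that change, here is a minimal standalone sketch. The pattern, the class name MediaLinkDemo and the method FindFileLinks are made up for this example and are not the expression or API used inside DotNetWikiBot.cs; the sketch only shows how adding "Media" to the alternation makes such links match.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MediaLinkDemo
{
    // Simplified, hypothetical pattern: NOT the expression from DotNetWikiBot.cs.
    // It captures the target of [[File:...]], [[Image:...]] and [[Media:...]] links,
    // stopping at the first '|' (link options) or ']' (end of link).
    private static readonly Regex MediaLinkRE = new Regex(
        @"\[\[\s*(?:File|Image|Media)\s*:\s*([^\]\|]+)",
        RegexOptions.IgnoreCase);

    // Returns the file names referenced by the matching links, in order of appearance.
    public static List<string> FindFileLinks(string wikiText)
    {
        var names = new List<string>();
        foreach (Match m in MediaLinkRE.Matches(wikiText))
            names.Add(m.Groups[1].Value.Trim());
        return names;
    }

    public static void Main()
    {
        string text = "[[Image:Logo.png|thumb]] and [[Media:Talk.ogg]] and [[Help:Contents]]";
        foreach (string name in FindFileLinks(text))
            Console.WriteLine(name);
    }
}
```

Without "Media" in the alternation, the [[Media:Talk.ogg]] link in the sample text would simply be skipped, which is the behaviour described above.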
Thanks a lot, this tip really helped and now all the media files are downloaded.
Nonetheless I am experiencing 2 problems:
1) The downloaded PDF files are all corrupted: when I open them in a regular text editor, their content turns out to be an XHTML document.
2) Sometimes there are weird special characters at the end of the downloaded file name.
Any ideas what could be going wrong?
It seems that the DownloadImage() function needs an upgrade too in order to handle PDF files properly. Please download the new version of that function from CVS: http://dotnetwikibot.cvs.sourceforge.net/viewvc/*checkout*/dotnetwikibot/framework/DotNetWikiBot.cs
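As a quick sanity check after downloading, one could verify that a saved file really starts with the PDF signature rather than with HTML markup. The sketch below is independent of DotNetWikiBot; the class DownloadCheck and the method LooksLikePdf are hypothetical names, and it assumes the "%PDF-" header sits at the very start of the file.

```csharp
using System;
using System.IO;
using System.Text;

public static class DownloadCheck
{
    // Returns true when the data begins with the ASCII signature "%PDF-".
    // Assumption: the header is at offset 0, as in a normally produced PDF.
    public static bool LooksLikePdf(byte[] data)
    {
        byte[] magic = Encoding.ASCII.GetBytes("%PDF-");
        if (data == null || data.Length < magic.Length)
            return false;
        for (int i = 0; i < magic.Length; i++)
        {
            if (data[i] != magic[i])
                return false;
        }
        return true;
    }

    public static void Main(string[] args)
    {
        if (args.Length == 0)
        {
            Console.WriteLine("usage: DownloadCheck <file>");
            return;
        }
        byte[] data = File.ReadAllBytes(args[0]);
        Console.WriteLine(LooksLikePdf(data)
            ? "PDF signature found"
            : "Not a PDF (possibly an HTML page saved under a .pdf name)");
    }
}
```

A file that fails this check and opens as markup in a text editor is most likely the wiki's description page rather than the document itself.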
The downloaded PDF files are OK now, many thanks!
2 small issues:
1) For this to work I had to add "|Media" on line 443 again. Would it be possible to keep this in the new release, or maybe offer it as an optional parameter somewhere?
2) Sometimes (in around 10% of downloaded files) a weird special character is still appended to the end of the file name. When I copy/paste this character into WordPad it looks like a vertical bar with a very small right arrow at its top.
Any ideas where this special character could come from?
1) I'll try to find some way to implement that.
2) I have no idea why that happens. Maybe I could say more if I saw your code.
1) OK many thanks for that :-)
2) After further investigation I found out that the special characters were in the wikitext itself. I didn't spot them before because the Firefox textbox doesn't display them, but when I copied/pasted the text from the Firefox textbox into WordPad they suddenly appeared. I removed these special characters from the wikitext and now it works fine, so it was a data problem, not a code problem. How these characters ended up in the wikitext in the first place remains a mystery for now; maybe some copy/paste operation went pear-shaped.
Anyway, thank you for your help!
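For anyone hitting the same thing, stray invisible characters can also be stripped from a file name in code before saving. This is only a sketch: TitleCleaner and StripInvisible are hypothetical names, and it assumes the stray character was a Unicode control or format character (for example a left-to-right mark, U+200E), which the thread does not confirm.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class TitleCleaner
{
    // Removes characters in the Unicode categories Control (Cc) and Format (Cf),
    // such as U+200E LEFT-TO-RIGHT MARK. All other characters are kept unchanged.
    public static string StripInvisible(string s)
    {
        var sb = new StringBuilder(s.Length);
        foreach (char c in s)
        {
            UnicodeCategory cat = CharUnicodeInfo.GetUnicodeCategory(c);
            if (cat != UnicodeCategory.Control && cat != UnicodeCategory.Format)
                sb.Append(c);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        string dirty = "Report.pdf\u200E"; // trailing invisible mark, assumed for the demo
        string clean = StripInvisible(dirty);
        Console.WriteLine(dirty.Length + " -> " + clean.Length);
    }
}
```

Cleaning the wikitext itself, as done above, is still the better fix, since the link target on the wiki then matches the real file name.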
A new function was added in version 2.7:
GetImagesEx(bool withNameSpacePrefix, bool includeFileLinks)
I've tested version 2.7 and the includeFileLinks parameter in GetImagesEx() works great indeed, many thanks for that :-)
Also, I previously had a problem with downloading a file which had an ampersand in its name but now it works OK with version 2.7. Thank you for this fix as well :-)
All the best,
Glad to be of assistance.