How to compare links containing space or underscore

Help
2015-02-24
2015-02-25
  • Juergen Thomas

    Juergen Thomas - 2015-02-24

    What is the best way to look for a given link among all links of a page when the link contains spaces or underscores? An example from the German Wikibooks, https://de.wikibooks.org :

    1- PageList.FillFromAllPages("Wikijunior Europa" ...);
    2- Manually remove all pages that do not start with "Wikijunior Europa/"
    3- The task is to find all pages in this page list that contain [[File:Flag of Armenia.svg]].
    3a- string s = "Datei:Flag of Armenia.svg"; // with spaces
    3b- foreach(Page p in PageList)
    3c- p.Load()
    3d- listOfLinks = p.GetAllLinks();
    3e- if(listOfLinks.Contains(s)) >> OK, 2 pages are found

    You can ignore the fact that s has to cover four namespace prefixes: Datei/Bild/File/Image. But if you use "Datei:Flag_of_Armenia.svg" with underscores, no page is found at all. What would be the best way to handle such a situation?

    • replace spaces with underscores in both listOfLinks and s before the Contains call
    • apply Bot.UrlEncode to both listOfLinks and s before the Contains call
    • use a RegEx or LINQ feature instead of Contains (I have little knowledge of RegEx and none of LINQ)
    • add a feature to the Bot framework to standardize links and return all links by that feature (optionally by an additional bool parameter)

    I know that a workaround is better in this specific situation: first call PageList.FillFromPagesUsingImage, then remove all pages that do not match the requested page name. But my bot first collects all pages that match a general condition (page name, category, etc.) and then removes all pages that don't match a more specific condition. Moreover, the space vs. underscore problem may appear in many other situations. That is why I am asking for a more general solution.

    Thanks in advance for hints, Juergen

     
  • CodeDriller

    CodeDriller - 2015-02-24

    You can do just:

    if(listOfLinks.Contains(s) || listOfLinks.Contains(s.Replace(' ','_')))
    

    I'll add this correction to FillFromPageLinks().
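
    A more general variant of the check above could normalize both sides before comparing, so that it does not matter which form either the page or the search string uses. This is only a minimal sketch: LinkCompare, Normalize and ContainsLink are hypothetical names, not part of DotNetWikiBot, and the helper only unifies underscores and spaces (it does not handle the Datei/Bild/File/Image prefixes or letter case).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the DotNetWikiBot framework.
static class LinkCompare
{
    // Treat '_' and ' ' as the same character and drop surrounding whitespace.
    public static string Normalize(string title)
    {
        return title.Replace('_', ' ').Trim();
    }

    // True if any link equals the wanted title after normalizing both sides.
    public static bool ContainsLink(IEnumerable<string> links, string wanted)
    {
        string target = Normalize(wanted);
        return links.Any(link => string.Equals(Normalize(link), target, StringComparison.Ordinal));
    }
}
```

    With this, step 3e would become something like if(LinkCompare.ContainsLink(listOfLinks, s)), and the result should be the same whether s is written with spaces or with underscores.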

     
  • Juergen Thomas

    Juergen Thomas - 2015-02-25

    Thank you for your help and for extending FillFromPageLinks(). Juergen

    This problem is resolved.

     
