Questions on Parser and Fields

Help
spabhat
2010-05-05
2012-09-13
  • spabhat

    spabhat - 2010-05-05

    Hi Again,

    Thanks for your previous answers; they were quite helpful.

    I have some questions related to web page parsing.

    1. Is it possible to index only the content specified between certain tags, like googlemini does:
      ex: <GoogleOff> .... </GoogleOff> --- unwanted content

    <GoogleOn> ... <GoogleOn> -- On

    SOMETHING HERE
    

    ly this will be parsed

    1. In the returned fields is it possible to have extra fields from Meta tags.
      Like:

    --- meta name="zipcode" content="45212,45208,45218"
    --- meta name="keywords" content="opensearch, search server"
    --- meta name="author" content="kim"
    if possible could you please guide how?

    Can this be made automatic? Like converting all meta tags into return fields?

     
  • spabhat

    spabhat - 2010-05-05

    Sorry, strangely something went wrong and some of my questions were displayed
    incorrectly.

    Here are my questions again:

    1. Is it possible to index only the content specified between certain tags, like Google Mini does:
      ex: <GoogleOff> .... </GoogleOff> --- unwanted content; Google Mini ignores
      this content

    <GoogleOn> ... </GoogleOn> -- Only the content inside this will be
    parsed by Google Mini

    2. In the returned fields, is it possible to have extra fields from meta tags?
      Like:

    --- <meta name="zipcode" content="45212,45208,45218"/>
    --- <meta name="keywords" content="opensearch, search server"/>
    --- <meta name="author" content="kim"/>
    If possible, could you please explain how?

    Can this be made automatic, for example by converting all meta tags into returned fields?

     
  • Emmanuel Keller

    Emmanuel Keller - 2010-05-06

    Currently, the 1.1 branch does not provide this.

    I think that both of your questions are very interesting features. I will add
    <OssOn/> and <OssOff/> tags support in the 1.2 branch.

    We plan to release the first 1.2 beta version next week. Perhaps we will have
    time to implement it.

    Extracting the meta information requires implementing a dynamic way to create
    fields in the HTML parser. It's a good idea too...
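
As a rough illustration of the idea discussed above (turning meta tags into candidate fields), here is a minimal Python sketch. It is not the OpenSearchServer implementation; the names `MetaFieldParser` and `extract_meta_fields` are hypothetical, and it only uses the standard library `html.parser` module.

```python
from html.parser import HTMLParser


class MetaFieldParser(HTMLParser):
    """Collect <meta name="..." content="..."> pairs as candidate fields."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta .../> tags also end up here by default.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            # Several meta tags may share one name, so keep a list per field.
            self.fields.setdefault(name, []).append(content)


def extract_meta_fields(html):
    """Return a dict mapping each meta name to the list of its contents."""
    parser = MetaFieldParser()
    parser.feed(html)
    parser.close()
    return parser.fields
```

Fed the three meta tags from the question, this would yield one entry each for `zipcode`, `keywords`, and `author`. Whether a comma-separated content such as the zip codes should be split into several values is a design choice left open here.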

     
  • spabhat

    spabhat - 2010-05-06

    Hi Emmanuel,

    This is really great news. I am keenly looking forward to these
    features. It would also be interesting to have the ability to preserve some HTML
    tags that are within some specific tag. This might be very useful if we
    wish to show some of the images from the specific page being searched :)

    I am keenly looking forward to the 1.2 beta.

    I have tried almost all of its features, and I really love this product. Thank
    you for your great work.

     
  • Emmanuel Keller

    Emmanuel Keller - 2010-05-15

    Thank you for your support.

    The first 1.2 developer release is available. We have implemented the <oss ignore="yes"> feature.

    http://sourceforge.net/projects/opensearchserve/files/Developer_release/1.2/

    Let us know if it works for you.

     
  • spabhat

    spabhat - 2010-05-17

    Hi Emmanuel,

    I tested it today and this feature works perfectly. I also noticed
    two important new options: the new Privileges tab for security, and the
    option to erase an index. These are really great, and they already work
    perfectly.

    I am also looking forward to testing the new meta tag to fields implementation :)

    Thank you for your efforts.

     
  • spabhat

    spabhat - 2010-05-17

    By the way,

    It seems the PHP API does not have the updated code to support the new
    privileges, or perhaps I am looking in the wrong location. I am not sure yet.

    I also tried a simple query like the following URL:

    http://192.168.1.4:8080/select?use=pbtest&query=mission

    This returned the following exception:

    com.jaeksoft.searchlib.web.ServletException:
    com.jaeksoft.searchlib.SearchLibException: Bad credential

    So how do I make a query, and how should the API key be added? Could you let me
    know?

    Thanking you in advance.

     
  • Pascal MERCIER

    Pascal MERCIER - 2010-05-17

    Hello,

    Credential support is not implemented in the PHP API for the moment.

    I'll have time to implement all the changes next week.

     
  • spabhat

    spabhat - 2010-05-21

    Thanks for the information. Yes, I am able to search using this way.

     
  • Emmanuel Keller

    Emmanuel Keller - 2010-05-26

    We now provide an alternative to the <oss ignore="yes"> feature. The goal
    is to preserve the validity of XHTML pages.

    You can now use:

    <div class="opensearchserver.ignore">
        <p>This text should not be indexed.</p>
    </div>
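
To make the effect of this marker concrete, here is a minimal Python sketch of how a parser could skip such blocks when collecting text. It is only an illustration under simple assumptions (well-nested `div` tags, no special handling of scripts or styles), not the actual OpenSearchServer parser; `IndexableTextParser` and `indexable_text` are hypothetical names.

```python
from html.parser import HTMLParser

IGNORE_CLASS = "opensearchserver.ignore"


class IndexableTextParser(HTMLParser):
    """Collect text, skipping any <div> marked with the ignore class."""

    def __init__(self):
        super().__init__()
        self.ignore_depth = 0  # > 0 while inside an ignored <div>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.ignore_depth:
            # Nested <div> inside an ignored block: track it so the
            # matching </div> does not end the ignored region too early.
            self.ignore_depth += 1
        elif IGNORE_CLASS in (dict(attrs).get("class") or "").split():
            self.ignore_depth = 1

    def handle_endtag(self, tag):
        if tag == "div" and self.ignore_depth:
            self.ignore_depth -= 1

    def handle_data(self, data):
        if not self.ignore_depth and data.strip():
            self.chunks.append(data.strip())


def indexable_text(html):
    """Return the text outside ignored blocks, joined with spaces."""
    parser = IndexableTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

Only `div` elements are counted for nesting, so void elements such as `<br>` or `<img>` inside an ignored block do not disturb the depth tracking.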
    
     
  • Emmanuel Keller

    Emmanuel Keller - 2010-05-27

    The meta tag to fields implementation is available for testing (1.2 revision
    759).

    You can see an HTML sample here:

    http://www.open-search-server.com/test/

     
  • Pascal MERCIER

    Pascal MERCIER - 2010-05-27

    Hello,

    Credentials have been added to OSS_Search.class.php.

    OSS_API.class.php will be committed once I've added support for the Schema API.

     
  • spabhat

    spabhat - 2010-05-28

    Hello,

    This is truly great news. Thank you for this information.

    I hope to test these features soon and I will surely update you on the same.

    Thank you once again

     
  • spabhat

    spabhat - 2010-06-18

    Today I ran a test to check the automated field generation from the meta
    tags. However, I could not figure out where these fields are
    listed!

    I used exactly the same content as given in this URL:
    http://www.open-search-server.com/test/oss-field.html

    I even checked the XML output that we get from the following query:

    http://localhost:8080/select?use=myIndex&query=text&login=oss&key=8f35dce...

    I also checked the "Returned Fields" tab inside the "Query" tab in the backend.

    I still could not find the field named "category".

    Could you please let me know how and where to find these fields and how to get
    the field values?

    Thanking you.

     
  • Emmanuel Keller

    Emmanuel Keller - 2010-06-20

    The fields are not created dynamically. Did you create the field in the
    schema?

     
  • spabhat

    spabhat - 2010-06-21

    I went to the schema tab and tried to add the "category" field as follows:

    Name: category

    Indexed: yes

    Stored: yes

    TermVector: positions_offsets

    Analyzer: StandardAnalyzer

    I got an error saying "Unknown exception: java.lang.NullPointerException."
    What could be the problem? Could you please let me know the exact procedure?

     
  • spabhat

    spabhat - 2010-06-21

    Strange, just now I was able to add the category field. However, to my surprise,
    I now see the "NullPointerException" for every operation I do in the backend!

    What could be going wrong? Any idea?

     
  • spabhat

    spabhat - 2010-07-08

    Hi Emmanuel,

    These features are working really well.

    By the way, I have a small suggestion, if you don't mind.

    It would be nice if we could omit the login and the key being passed
    along with the URL. Otherwise, when we start querying for different fields, the
    URL length might exceed the limit.

     
  • Emmanuel Keller

    Emmanuel Keller - 2010-07-11

    Good suggestion.

    A first workaround is to use the POST method. OSS handles both HTTP GET and
    POST.

    I will add a "no password" option for users. You will still need to pass
    login=username, but no longer the API key.
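
For readers who want to try the POST workaround, here is a minimal Python sketch that moves the query parameters out of the URL and into the request body. The parameter names (`use`, `query`, `login`, `key`) are taken from the URLs quoted earlier in this thread; the helper name `build_select_request` is hypothetical, and it is an assumption that the server accepts these parameters as a form-encoded body.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_select_request(base_url, index, query, login, key, extra=None):
    """Build a POST request for the /select servlet.

    `extra` is an optional dict of additional parameters (hypothetical here).
    """
    params = {"use": index, "query": query, "login": login, "key": key}
    params.update(extra or {})
    body = urlencode(params).encode("utf-8")
    # Supplying a body makes urllib issue a POST instead of a GET,
    # so the parameters no longer count against the URL length.
    return Request(
        f"{base_url}/select",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```

The request could then be sent with `urllib.request.urlopen(request)`, assuming a server is reachable at the given base URL.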

     
  • spabhat

    spabhat - 2010-07-15

    Hi Emmanuel,

    Thank you for the information.

    By the way, I noticed another issue. Today I updated to the latest revision,
    822.

    In the backend I went to Query > Faceted Fields, selected the new field "title",
    and clicked "Add field".

    Then, when I hit the Search button, I get a message box titled "ZK" with the
    message "3"!

     
  • spabhat

    spabhat - 2010-07-15

    Well I found the issue.

    For a field to be a Facet I had to ensure that TermVector is set to "No".

     
  • Emmanuel Keller

    Emmanuel Keller - 2010-07-19

    Let me correct that. For a field to be a facet, it only needs to be indexed
    ("Indexation of content" checked in the schema tab panel).

     
