#68 Can we stop inserting "." in deformatters? (Or make it optional)

Status: closed
Owner: nobody
Updated: 2015-09-30
Created: 2015-04-16
Private: No
$ echo hi | apertium-destxt
hi.[][
]

apertium-destxt inserts text that wasn't there: a period before EOF and at empty lines. Similarly, apertium-deshtml puts periods after <br/>'s and such.

Maybe this helps with handling certain headlines, but it makes too many assumptions: headlines often aren't marked in such a way, or already have punctuation, or the language doesn't even use "." as an end-of-sentence marker. It can be a real annoyance. Can we remove it?

1 Attachments

Related

Tickets: #68

Discussion

  • Mikel L. Forcada

    Kevin,
    how can I comment on this ticket through SF?
    Mikel



    • Jim O'Regan

      Jim O'Regan - 2015-04-16

      Through the link to the ticket: http://sourceforge.net/p/apertium/tickets/68
      but replying to the email seems to work, too.

       
    • Kevin Brubeck Unhammer

      You just did :-)

       
  • Jim O'Regan

    Jim O'Regan - 2015-04-16

    I'm more for making it a (non-default) option.

     
    • Kevin Brubeck Unhammer

      We need to insert the [] unconditionally though; I tried running without that, but it messes up other tools down the line.

       
      • Kevin Brubeck Unhammer

        maybe

         
  • Jim O'Regan

    Jim O'Regan - 2015-04-16

    Ah, ok. Updated.

     
    • Kevin Brubeck Unhammer

      :-) you commit? I see no downsides …

       
  • Jim O'Regan

    Jim O'Regan - 2015-04-16

    Mikel clearly had a comment to add; I'm going to wait to read it.

     
    • Sergio Ortiz

      Sergio Ortiz - 2015-04-16

      For some languages, removing it will degrade the performance of the POS tagger and cause the wrong rules to be applied to sentences when translating XML-based document types, because titles, list items and similar sentences would lack explicit boundaries.

      As you can see, the “.[]” is not always inserted, only where it makes sense.

      The string being inserted is .[]. Anyone interested can remove the string with a simple sed line.

      Sergio
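
A minimal sketch of the sed approach mentioned above, for illustration only. It assumes the inserted string is exactly `.[]` and, following Kevin's remark that the `[]` has to stay, it drops only the period. The function name `drop_inserted_dot` is hypothetical and not part of Apertium. The sample inputs are the deformatter outputs quoted in this ticket, fed in with printf so the example runs without Apertium installed.

```shell
# Hypothetical helper (not an Apertium tool): turn every ".[]" into "[]",
# i.e. remove the period the deformatter inserted but keep the empty
# superblank. Caveat: a ".[]" that was present for other reasons is
# rewritten as well.
drop_inserted_dot() {
  sed 's/\.\[\]/[]/g'
}

# Output of `echo hi | apertium-destxt` as quoted in the ticket.
printf 'hi.[][\n]\n' | drop_inserted_dot
# prints:
# hi[][
# ]

# Output of the apertium-deshtml example: only the inserted period goes,
# the period from the original text stays.
printf 'This is a single period: ..[][<br\\/>\n]\n' | drop_inserted_dot
# prints:
# This is a single period: .[][<br\/>
# ]
```

In a real pipeline the filter would sit directly after the deformatter, e.g. `apertium-destxt | drop_inserted_dot | ...`.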


      • Kevin Brubeck Unhammer

        $ echo "This is a single period: .<br/>" | apertium-deshtml
        This is a single period: ..[][<br\/>
        ]
        
         
        • Kevin Brubeck Unhammer

          but yeah, if we only remove it when the user specifies it, everyone should be ok?

           
          • Sergio Ortiz

            Sergio Ortiz - 2015-04-16

            A new option is always welcome :)


            • Jim O'Regan

              Jim O'Regan - 2015-04-16

              So... you're ok with my patch? (It adds a new option, -n, to remove the dot; it is not enabled by default)

               
              • Sergio Ortiz

                Sergio Ortiz - 2015-04-17

                Ok


                • Jim O'Regan

                  Jim O'Regan - 2015-04-17

                  Great, committed in r59917.

                   
  • Jim O'Regan

    Jim O'Regan - 2015-04-17
    • Description has changed:

    Diff:

    --- old
    +++ new
    @@ -1,4 +1,3 @@
    -
         $ echo hi | apertium-destxt
         hi.[][
         ]
    
    • status: open --> closed
     
