From: <Ala...@up...> - 2008-03-26 13:22:09

Congratulations, Aarne, for all the work you did during this period (and
still now, of course!). But... what about the bottle of Champagne?

Alain

According to Aarne Ranta <aa...@cs...>:

> Dear GF Pioneers, and others as well,
>
> It was on 20 March, 1998, that the first public talk on GF was given at
> INRIA Nancy. Many of you were present, and probably many others, too.
> Some of you were not present but had already contributed to GF.
>
> To celebrate this occasion, I managed to find the Nancy slides and even
> the source code of GF version 0.1. Both documents now appear on the GF
> homepage, digitalgrammars.com/gf, right in the beginning under the title
> "News". They already seem to have the kind of patina that obsolete code
> and documentation should have...
>
> Thanks for all contributions, support, and comments during all these
> years - and happy Easter, of course!
>
> Regards
>
> Aarne.

----------------------------------------------------------------
This message was sent using IMP, the Internet Messaging Program.

From: Aarne R. <aa...@cs...> - 2008-03-20 10:29:24

Dear GF Pioneers, and others as well,

It was on 20 March, 1998, that the first public talk on GF was given at
INRIA Nancy. Many of you were present, and probably many others, too. Some
of you were not present but had already contributed to GF.

To celebrate this occasion, I managed to find the Nancy slides and even
the source code of GF version 0.1. Both documents now appear on the GF
homepage, digitalgrammars.com/gf, right in the beginning under the title
"News". They already seem to have the kind of patina that obsolete code
and documentation should have...

Thanks for all contributions, support, and comments during all these years
- and happy Easter, of course!

Regards

Aarne.

From: Henning T. <le...@he...> - 2007-04-27 16:22:03

Hi Aarne,

On Fri, 27 Apr 2007, Aarne Ranta wrote:

>> this one
>> http://saxophone.jpberlin.de/Loschka?source=http://www.cs.chalmers.se/%7Eaarne/GF/
>
> Looks like Swedish children's "rövarspråk", robber language, which was
> maybe invented by Astrid Lindgren. It makes a ghastly effect when all
> the links you click are also translated. The parallel web idea looks
> quite powerful!

I should note that this is not my idea. :-) However, it's probably the
first implementation in Haskell.

> I meant: "is this stuff wine (that is) older than 10 years"
> In German maybe "ist dies Wein älterer als 10 Jahre"?

I see. Indeed, in German there is only an equivalent to "is this stuff
wine that is older than 10 years", which is "Ist dieses Zeug Wein, der
älter als 10 Jahre ist". An abbreviated form without "that is" does not
exist.

Best regards,
Henning

From: Aarne R. <aa...@cs...> - 2007-04-27 16:12:51

Dear Henning,

Thanks for the corrected lexicon - as you will see, I've now pushed it to
darcs, and also given the authors' names, including yours.

On Fri, 27 Apr 2007, Henning Thielemann wrote:

> I had the same experience. I don't want to complain, since it is already
> great that this complicated task can be performed within a reasonable
> time at all. However, if I imagine translating a real-world document,
> this will certainly take too much time. Maybe you know my toy
> translators, like

Yes. We don't envisage using these grammars to parse real-world documents,
but to encode linguistic rules, which can sometimes be quite abstract. The
grammars in example/shallow are a first experiment in showing that they
can easily be converted to more efficient grammars. But more work is
needed, of course.

> this one
> http://saxophone.jpberlin.de/Loschka?source=http://www.cs.chalmers.se/%7Eaarne/GF/

Looks like Swedish children's "rövarspråk", robber language, which was
maybe invented by Astrid Lindgren. It makes a ghastly effect when all the
links you click are also translated. The parallel web idea looks quite
powerful!

> Now imagine it would translate texts by GF, or do some grammar checking,
> with several users accessing it in parallel. Maybe I should wait for the
> next generation of CPUs or Haskell compilers. ;-)

This has certainly solved some problems we had before ;-) We are also
working on the next generation of the GF grammar compiler. Unfortunately,
some grammars are inherently complex, and their parsing complexity can
grow to any exponent in the polynomial realm (the result of Peter
Ljunglöf's thesis). Therefore grammar writers will not be freed from
thinking about complexity.

> Is there a tutorial on how to resolve such "conflicts"?

Not really, yet.

> GF> p -cat=Text "I run, because I am a sheep."
>
> However,
>
> GF> p -cat=Text "I run because I am a sheep." | linearize -lang=German
>
> Ich laufe weil ich ein Schaf bin.
>
> is almost correct, except for the missing comma between 'laufe' and
> 'weil'.

OK, we should add some commas to the German resource.

> I want to note that GF is here already more precise than many Germans,
> who incorrectly say
> "Ich laufe, weil ich bin ein Schaf"
> following English word order!

>>> This one works
>>>
>>>> parse -lang=LangEng "He hunts the sheep" | linearize -multi
>>>
>>> but this one does not:
>>>
>>>> parse -lang=LangEng "He does not hunt the sheep" | linearize -multi
>>>
>>> 457 msec
>>
>> LangEng has no contracted negations, but english/English.gf does.
>
> Nice.
>
> GF> parse -lang=English "He does not hunt the sheep" | linearize -lang=German
>
> EnglishAbs.UncNegCl: S EnglishAbs.TPres: Tense EnglishAbs.ASimul: Ant
> (EnglishAbs.PredVP: Cl (EnglishAbs.UsePron: NP EnglishAbs.he_Pron: Pron)
> (EnglishAbs.ComplV2: VP EnglishAbs.hunt_V2: V2 (EnglishAbs.DetCN: NP
> (EnglishAbs.DetPl: Det (EnglishAbs.PlQuant: QuantPl EnglishAbs.DefArt:
> Quant) EnglishAbs.NoNum: Num EnglishAbs.NoOrd: Ord) (EnglishAbs.UseN: CN
> EnglishAbs.sheep_N: N))))
>
> EnglishAbs.UncNegCl: S EnglishAbs.TPres: Tense EnglishAbs.ASimul: Ant
> (EnglishAbs.PredVP: Cl (EnglishAbs.UsePron: NP EnglishAbs.he_Pron: Pron)
> (EnglishAbs.ComplV2: VP EnglishAbs.hunt_V2: V2 (EnglishAbs.DetCN: NP
> (EnglishAbs.DetSg: Det (EnglishAbs.SgQuant: QuantSg EnglishAbs.DefArt:
> Quant) EnglishAbs.NoOrd: Ord) (EnglishAbs.UseN: CN EnglishAbs.sheep_N:
> N))))
>
> I assume that this means that there are some things not implemented in
> German.gf which are supported by English.gf.

Yes, right. Each language has extensions not supported by all other
languages. We should at some point write translations for such fragments.

> Summarized, there seems to be more interesting stuff for beginners than
> what I learned from the tutorial. I admit that when reading
> http://www.cs.chalmers.se/~aarne/GF/doc/tutorial/gf-tutorial2.html
> I no longer read carefully when it came to the details of grammars.

The library documentation includes some more material. In a longer
perspective, there should be something like a book.

>>> must be "neunhundertachzig"
>>
>> You are right. We should place suitable unlexers in the numeral
>> grammars. The main problem is to obtain a correct parsing again - i.e.
>> to send neun, hundert, achzig as separate tokens to the parser.
>
> I see. That seems to be an ugly problem, because you need
> language-sensitive lexers.

Exactly. We have no nice solutions to this yet.

>>> Strange sentences
>>> -----------------
>>>
>>>> p -lang=LangEng -fcfg "Is this man older than a tree" | l -multi
>>>
>>> Cet homme est plus vieux qu' un arbre
>>> Is this man older than a tree
>>> Ist dieser Mann älter als ein Baum
>>>
>>> --> this is correct German
>>>
>>> Ceci est de l' homme plus vieux qu' un arbre
>>> Is this man older than a tree
>>> Ist dies älterer als ein Baum Mann
>>>
>>> --> this sounds like Chinese German
>>
>> Do you mean it is completely ungrammatical or just strange?
>
> Completely ungrammatical, and I don't know what is meant.
>
>> The structure rendered here is the same as in
>>
>> sie will einen älteren als John Mann verheiraten
>
> I don't understand that one. :-(
>
> Do you mean
> sie will einen älteren Mann als John heiraten
> ?
>
>> which would perhaps be changed to
>>
>> sie will einen Mann älteren als John verheiraten
>>
>>> I don't understand the grammar tree of the above sentence,
>>> so I can't judge whether the parsing went wrong or the linearization.
>>
>> Yes, it's a very strange parse, which is meant for mass terms:
>>
>> ist dies älterer als ... Wein
>
> ist dies älter als 2 Liter Wein ?

I meant: "is this stuff wine (that is) older than 10 years". In German
maybe "ist dies Wein älterer als 10 Jahre"?

>>> Ein Mann läuft zu dem Haus
>>
>> Exactly. It is the translation equivalence again.
>
> How to resolve that?

By a more fine-grained semantic classification of nouns, and/or the use of
parameters regulating the use of prepositions.

>> Yes. There is a time-out that applies. You can also try
>>
>> gr -cf
>>
>> which is a more efficient (but less general) random generation
>> strategy.
>
> I see. In this case it would be nice to have a note about this in the
> tutorial or README. Because when I compile the package on my machine and
> several heap overflows arise, I am uncertain whether this is a build
> problem or a known problem.

Good point. Maybe there should be a separate web document on known
performance issues: how much memory each operation is known to consume,
etc.

> Thanks for your answer, and for GF in general!
> Henning

Thanks for good questions and the German patches!

Aarne.

From: Henning T. <le...@he...> - 2007-04-27 15:31:28

Dear Aarne,

On Tue, 24 Apr 2007, Aarne Ranta wrote:

>> like exchanging words by synonyms with the correct flexion.
>> However, the speed of the standard parser
>> currently seems not to allow the processing of real-sized texts.
>> (Parsing short sentences needs several seconds in English,
>> and exceeds the heap in German.)
>
> You seem to refer to parsing in the resource grammar library. Parsing
> is slow because the GF grammars expand to large parser grammars (the
> MCFG format). On a typical modern computer, the first English sentence
> takes 5 to 10 seconds, which includes building the MCFG. After that,
> later sentences should be parsed in less than a second. German takes
> much longer, and the Romance languages cannot be parsed on my computer
> at all.

I had the same experience. I don't want to complain, since it is already
great that this complicated task can be performed within a reasonable time
at all. However, if I imagine translating a real-world document, this will
certainly take too much time. Maybe you know my toy translators, like this
one:

http://saxophone.jpberlin.de/Loschka?source=http://www.cs.chalmers.se/%7Eaarne/GF/

Now imagine it would translate texts by GF, or do some grammar checking,
with several users accessing it in parallel. Maybe I should wait for the
next generation of CPUs or Haskell compilers. ;-)

> This is of course something that should be improved. But the resource
> grammars have not been designed with parsing in mind. They are intended
> as libraries, in application grammars with which parsing is much more
> efficient. Even big grammars, such as examples/big/BigShallowEng.gf, can
> be parsed efficiently if they eliminate some of the disjoint-constituent
> structures of the resource library (esp. the VP category).

Is there a tutorial on how to resolve such "conflicts"?

>> I'm also irritated that punctuation
>> like periods, exclamation and question marks is not supported.
>
> They are included in the category Text. So try
>
> p -cat=Text "What is this? A cat!"

Works, great!

>> Also the complexity of sentences seems to be limited,
>> that is, I'm not able to construct compound sentences with commas.
>> Is this right, or am I using the functions the wrong way?
>
> Do you mean conjunctions?
>
> p -cat=S "I sleep, we sleep and they sleep"
>
> This works.

GF> p -cat=Text "I run."
GF> p -cat=Text "I am a sheep."

work, but not

GF> p -cat=Text "I run, because I am a sheep."

However,

GF> p -cat=Text "I run because I am a sheep." | linearize -lang=German

Ich laufe weil ich ein Schaf bin.

is almost correct, except for the missing comma between 'laufe' and
'weil'. I want to note that GF is here already more precise than many
Germans, who incorrectly say
"Ich laufe, weil ich bin ein Schaf"
following English word order!

>> This one works
>>
>>> parse -lang=LangEng "He hunts the sheep" | linearize -multi
>>
>> but this one does not:
>>
>>> parse -lang=LangEng "He does not hunt the sheep" | linearize -multi
>>
>> 457 msec
>
> LangEng has no contracted negations, but english/English.gf does.

Nice.

GF> parse -lang=English "He does not hunt the sheep" | linearize -lang=German

EnglishAbs.UncNegCl: S EnglishAbs.TPres: Tense EnglishAbs.ASimul: Ant
(EnglishAbs.PredVP: Cl (EnglishAbs.UsePron: NP EnglishAbs.he_Pron: Pron)
(EnglishAbs.ComplV2: VP EnglishAbs.hunt_V2: V2 (EnglishAbs.DetCN: NP
(EnglishAbs.DetPl: Det (EnglishAbs.PlQuant: QuantPl EnglishAbs.DefArt:
Quant) EnglishAbs.NoNum: Num EnglishAbs.NoOrd: Ord) (EnglishAbs.UseN: CN
EnglishAbs.sheep_N: N))))

EnglishAbs.UncNegCl: S EnglishAbs.TPres: Tense EnglishAbs.ASimul: Ant
(EnglishAbs.PredVP: Cl (EnglishAbs.UsePron: NP EnglishAbs.he_Pron: Pron)
(EnglishAbs.ComplV2: VP EnglishAbs.hunt_V2: V2 (EnglishAbs.DetCN: NP
(EnglishAbs.DetSg: Det (EnglishAbs.SgQuant: QuantSg EnglishAbs.DefArt:
Quant) EnglishAbs.NoOrd: Ord) (EnglishAbs.UseN: CN EnglishAbs.sheep_N:
N))))

I assume that this means that there are some things not implemented in
German.gf which are supported by English.gf.

Summarized, there seems to be more interesting stuff for beginners than
what I learned from the tutorial. I admit that when reading

http://www.cs.chalmers.se/~aarne/GF/doc/tutorial/gf-tutorial2.html

I no longer read carefully when it came to the details of grammars.

>> In GF/lib/resource-1.0/Makefile the closing -RTS option is missing.
>
> This has never been a problem since the RTS block is the last on the
> command line - but sure, closing it would be more robust.

I added some path options and then it became a problem.

>> Numerals
>> --------
>>
>> Numerals below one million are written without spaces in the German
>> language. (Duden 1982, rule 282)
>>
>>> gr -cat=Sub1000 | l -lang=LangGer
>> Neun hundert achzig
>>
>> must be "neunhundertachzig"
>
> You are right. We should place suitable unlexers in the numeral
> grammars. The main problem is to obtain a correct parsing again - i.e.
> to send neun, hundert, achzig as separate tokens to the parser.

I see. That seems to be an ugly problem, because you need
language-sensitive lexers.

>> Strange sentences
>> -----------------
>>
>>> p -lang=LangEng -fcfg "Is this man older than a tree" | l -multi
>>
>> Cet homme est plus vieux qu' un arbre
>> Is this man older than a tree
>> Ist dieser Mann älter als ein Baum
>>
>> --> this is correct German
>>
>> Ceci est de l' homme plus vieux qu' un arbre
>> Is this man older than a tree
>> Ist dies älterer als ein Baum Mann
>>
>> --> this sounds like Chinese German
>
> Do you mean it is completely ungrammatical or just strange?

Completely ungrammatical, and I don't know what is meant.

> The structure rendered here is the same as in
>
> sie will einen älteren als John Mann verheiraten

I don't understand that one. :-(

Do you mean
sie will einen älteren Mann als John heiraten
?

> which would perhaps be changed to
>
> sie will einen Mann älteren als John verheiraten
>
>> I don't understand the grammar tree of the above sentence,
>> so I can't judge whether the parsing went wrong or the linearization.
>
> Yes, it's a very strange parse, which is meant for mass terms:
>
> ist dies älterer als ... Wein

ist dies älter als 2 Liter Wein ?

Given that in my example 'Is this man older than a tree' the article 'a'
is considered a numeral, just like in 'Is this man older than two tons of
tree', then the German translation would equally be:
"Ist dieser Mann älter als zwei Tonnen Baum"

>> Progressive forms
>> -----------------
>>
>>> p -lang=LangEng -fcfg "A man is running" | l -lang=LangGer
>> Ein Mann läuft eben
>>
>> This sounds unnatural.
>>
>> You could replace 'eben' by
>> Ein Mann läuft gerade
>> this is ambiguous, because 'gerade' means both 'now' and 'straight'
>> Ein Mann läuft zurzeit
>> Ein Mann läuft derzeit
>> both of the last ones sound artificial
>>
>> You could drop the distinction between progressive and normal form and
>> just write:
>> Ein Mann läuft
>
> I agree: other languages don't make as much use of progressives as
> English. But once again, there is a restriction in what the resource
> grammars can do: they do not guarantee translation equivalence, which
> can only be granted in more limited application grammars.

Of course, I cannot expect that translating from German back to English
yields what I translated from English to German. However, when I translate
something with rich information (English: progressive vs. no progressive)
to something with poor information (German: no progressive at all),
shouldn't this yield something reasonable?

>> Prepositions
>> ------------
>>
>> I assume that correct prepositions are a non-trivial and ubiquitous
>> problem. Here is one instance:
>>
>>> p -lang=LangEng -fcfg "A man runs to the house" | l -lang=LangGer
>>
>> Ein Mann läuft nach dem Haus
>>
>> must be
>>
>> Ein Mann läuft zu dem Haus
>
> Exactly. It is the translation equivalence again.

How to resolve that?

>> Random trees
>> ------------
>>
>> When generating random phrases, I sometimes get
>>
>>> gr -cat=S
>> not completed
>> no tree found
>> 1887 msec
>>
>> Does it mean that this command does not always succeed
>> in generating valid trees?
>
> Yes. There is a time-out that applies. You can also try
>
> gr -cf
>
> which is a more efficient (but less general) random generation
> strategy.

I see. In this case it would be nice to have a note about this in the
tutorial or README. Because when I compile the package on my machine and
several heap overflows arise, I am uncertain whether this is a build
problem or a known problem.

Thanks for your answer, and for GF in general!

Henning

From: Aarne R. <aa...@cs...> - 2007-04-24 15:47:50

Dear Henning,

Thanks for the very valuable comments on GF. Some of them will no doubt
lead to fixes in the GF implementation, grammars, and documents. Let me
briefly comment on some of your points.

With best regards,
Aarne.

On Tue, 24 Apr 2007, Henning Thielemann wrote:

> I've recently tested GF and would like to share some experiences and
> ideas with you.
>
> Applications
> ============
>
> It is already very impressive what the Grammar Framework can do.
> The translated sentences often sound very natural.
> I'm looking forward to maintaining multilingual documents
> (Is there some tutorial on how to do that?
> Also in connection with HTML or LaTeX?),
> doing grammar checking of documents,
> or doing some automatic text processing,

There is some work and also software on generating LaTeX and HTML
documents in the GF-KeY project,

http://www.key-project.org/oclnl/

particularly in David Burke's MSc thesis

http://www.key-project.org/oclnl/burke/

(the full thesis seems currently unreachable, but there is an article).

> like exchanging words by synonyms with the correct flexion.
> However, the speed of the standard parser
> currently seems not to allow the processing of real-sized texts.
> (Parsing short sentences needs several seconds in English,
> and exceeds the heap in German.)

You seem to refer to parsing in the resource grammar library. Parsing is
slow because the GF grammars expand to large parser grammars (the MCFG
format). On a typical modern computer, the first English sentence takes 5
to 10 seconds, which includes building the MCFG. After that, later
sentences should be parsed in less than a second. German takes much
longer, and the Romance languages cannot be parsed on my computer at all.

This is of course something that should be improved. But the resource
grammars have not been designed with parsing in mind. They are intended as
libraries, in application grammars with which parsing is much more
efficient. Even big grammars, such as examples/big/BigShallowEng.gf, can
be parsed efficiently if they eliminate some of the disjoint-constituent
structures of the resource library (esp. the VP category).

> I'm also irritated that punctuation
> like periods, exclamation and question marks is not supported.

They are included in the category Text. So try

p -cat=Text "What is this? A cat!"

> Also the complexity of sentences seems to be limited,
> that is, I'm not able to construct compound sentences with commas.
> Is this right, or am I using the functions the wrong way?

Do you mean conjunctions?

p -cat=S "I sleep, we sleep and they sleep"

This works.

> This one works
>
>> parse -lang=LangEng "He hunts the sheep" | linearize -multi
>
> but this one does not:
>
>> parse -lang=LangEng "He does not hunt the sheep" | linearize -multi
>
> 457 msec

LangEng has no contracted negations, but english/English.gf does.

> When I found a Music directory in the GF package
> I expected a grammar describing music,
> which would allow generation of random music, say in Haskore,
> by randomly generated trees from a music grammar.
> I found that my fantasy did not come true, but nonetheless:
> Do you know if something along these lines has been done by someone?

Sounds like a nice idea! Maybe a way to get a hold of phrasing in music is
to use a grammar.

> When it comes to the processing of unknown texts with unknown words -
> is it possible for GF to determine properties of words in a sentence
> that are not part of a dictionary?

No. Only words recognized by the current grammar can be analysed.

> Can GF derive the kind of word (noun, verb, adjective, its flexion)
> from the context of the sentence?
> Can GF be told to simply accept unknown words
> and linearize them as they are?
> This is the usual way humans handle unknown words when translating texts.

There is a lexer devoted to this task: it yields a tree of type String for
each unknown word. With other lexers, string literals must be entered as
quoted tokens. See 'h -lexer' for the currently available lexers.

> Installation
> ============
>
> In examples/numerals/README it is explained how to generate the program
> 'gft', but:
>
> GF/src> make gft
> "/usr/bin/ghc" --make -fglasgow-exts -package readline -DUSE_READLINE
> -DUSE_INTERRUPT -itranslate translate/GFT.hs -o gft-bin
> Chasing modules from: translate/GFT.hs
> ghc-6.4.1: can't find file `translate/GFT.hs'
> make: *** [gft] Fehler 1
>
> It seems to have been moved to GF/Translate/GFT.hs

Thanks - we'll have to fix this.

> Then I start
>
> GF/examples/numerals> gf <mkNumerals.gfs
>
> and get lots of "parsing old ..." messages and finally a "See you."
> Then I run
>
> GF/examples/numerals> gf numerals.gfcm
>
> gf: numerals.gfcm: openFile: does not exist (No such file or directory)
>
> I can only find numerals.Abs.gf .
> I guess that numerals.gfcm must be generated by 'gf <mkNumerals',
> but it was not.

Strange - on my computer the file is generated. I see that the file
mkNumerals.gfs does not end with a newline: maybe this is a problem on
some platforms?

> In GF/lib/resource-1.0/Makefile the closing -RTS option is missing.

This has never been a problem since the RTS block is the last on the
command line - but sure, closing it would be more robust.

> I wanted to run 'make' in GF/lib/resource-1.0/,
> but this didn't succeed due to infinite swapping.
> I added heap limiting options to the GF variable in the Makefile,
> which are appropriate for my machine (+RTS -M384M -c30 -RTS).
> This failed, too, due to heap overflow.

Yes, you may need as much as 1000M.

> I then switched back to the files from compiled.tar.gz.
> However I felt that it is not good to be restricted
> to the data shipped with GF.
> lib/resource-1.0/README says that I shall start
> 'gf -nocf langs.gfcm'
> but the file langs.gfcm does not exist.
> I hoped it would be in compiled.tar.gz.
> I guess it must be built by 'make langs', but this fails due to heap
> overflow.

Good idea. Using it is much cheaper than building it.

> I'm also afraid that 'make' used the files that I could not compile
> instead of the pre-compiled ones.
>
> I then tried to play without langs.gfcm.
> I started
>
> gf +RTS -M384M -c30 -RTS -path=prelude:present present/LangGer.gfc
> present/LangEng.gfc present/LangFre.gfc
>
> and
>
> gf +RTS -M384M -c30 -RTS -path=prelude:alltenses alltenses/LangGer.gfc
> alltenses/LangEng.gfc alltenses/LangFre.gfc
>
> and this works.
> I think this should be noted in the README
> for people who have problems with building the whole resource package.
> It's also useful if you encounter errors in the grammar files
> that you would like to fix quickly.
> It is certainly not a good idea to rebuild langs.gfcm every time.

On the other hand, if anything else changes, then langs.gfcm should change
also.

> Text processing
> ===============
>
> Numerals
> --------
>
> Numerals below one million are written without spaces in the German
> language. (Duden 1982, rule 282)
>
>> gr -cat=Sub1000 | l -lang=LangGer
> Neun hundert achzig
>
> must be "neunhundertachzig"
>
>> gr -cat=Sub1000000 | l -lang=LangGer
> Ein hundert ein tausend neunzehn
>
> must be "einhunderteintausendneunzehn"

You are right. We should place suitable unlexers in the numeral grammars.
The main problem is to obtain a correct parsing again - i.e. to send neun,
hundert, achzig as separate tokens to the parser.

> Irregular verbs
> ---------------
>
> In LexiconGer there are several verbs marked as regular
> which are actually irregular.
> IrregGer seems to be more precise in this respect.
> Can I send you a corrected LexiconGer as a file or darcs patch?

You are most welcome to do this!

> Strange sentences
> -----------------
>
>> p -lang=LangEng -fcfg "Is this man older than a tree" | l -multi
>
> Cet homme est plus vieux qu' un arbre
> Is this man older than a tree
> Ist dieser Mann älter als ein Baum
>
> --> this is correct German
>
> Ceci est de l' homme plus vieux qu' un arbre
> Is this man older than a tree
> Ist dies älterer als ein Baum Mann
>
> --> this sounds like Chinese German

Do you mean it is completely ungrammatical or just strange? The structure
rendered here is the same as in

sie will einen älteren als John Mann verheiraten

which would perhaps be changed to

sie will einen Mann älteren als John verheiraten

> I don't understand the grammar tree of the above sentence,
> so I can't judge whether the parsing went wrong or the linearization.

Yes, it's a very strange parse, which is meant for mass terms:

ist dies älterer als ... Wein

> Progressive forms
> -----------------
>
>> p -lang=LangEng -fcfg "A man is running" | l -lang=LangGer
> Ein Mann läuft eben
>
> This sounds unnatural.
>
> You could replace 'eben' by
> Ein Mann läuft gerade
> this is ambiguous, because 'gerade' means both 'now' and 'straight'
> Ein Mann läuft zurzeit
> Ein Mann läuft derzeit
> both of the last ones sound artificial
>
> You could drop the distinction between progressive and normal form and
> just write:
> Ein Mann läuft

I agree: other languages don't make as much use of progressives as
English. But once again, there is a restriction in what the resource
grammars can do: they do not guarantee translation equivalence, which can
only be granted in more limited application grammars.

> As noted by Bastian Sick ("Der Dativ ist dem Genitiv sein Tod"),
> there is a progressive form in German slang using the preposition 'am':
> Ein Mann ist am Laufen

Nice - this should be included somewhere in the German extensions.

> Prepositions
> ------------
>
> I assume that correct prepositions are a non-trivial and ubiquitous
> problem. Here is one instance:
>
>> p -lang=LangEng -fcfg "A man runs to the house" | l -lang=LangGer
>
> Ein Mann läuft nach dem Haus
>
> must be
>
> Ein Mann läuft zu dem Haus

Exactly. It is the translation equivalence again.

> Random trees
> ------------
>
> When generating random phrases, I sometimes get
>
>> gr -cat=S
> not completed
> no tree found
> 1887 msec
>
> Does it mean that this command does not always succeed
> in generating valid trees?

Yes. There is a time-out that applies. You can also try

gr -cf

which is a more efficient (but less general) random generation strategy.

From: Henning T. <le...@he...> - 2007-04-24 12:17:51

I've recently tested GF and would like to share some experiences and ideas with you.
Applications
============
It is already very impressive what the Grammar Framework can do.
The translated sentences often sound very natural.
I'm looking forward to maintaining multilingual documents
(Is there some tutorial on how to do that?
Also in connection with HTML or LaTeX?),
doing grammar checking of documents
or doing some automatic text processing,
like exchanging words by synonyms with the correct flexion.
However, the speed of the standard parser
currently seems not to allow the processing of real-sized texts.
(Parsing short sentences needs several seconds in English,
and exceeds the heap in German.)
I'm also irritated that punctuation
like periods, exclamation and question marks is not supported.
Also the complexity of sentences seems to be limited,
that is, I'm not able to construct compound sentences with commas.
Is this right, or am I using the functions the wrong way?
This one works
> parse -lang=LangEng "He hunts the sheep" | linearize -multi
Il chasse les moutons
He hunts the sheep
Er jagt die Schafe
Il chasse le mouton
He hunts the sheep
Er jagt das Schaf
390 msec
but this one does not:
> parse -lang=LangEng "He does not hunt the sheep" | linearize -multi
457 msec
When I found a Music directory in the GF package
I expected a grammar describing music,
which would allow generation of random music, say in Haskore,
by randomly generated trees from a music grammar.
I found that my fantasy did not come true, but nonetheless:
Do you know if something along these lines has been done by someone?
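
[Editorial aside: the idea above - generating random music or text from randomly generated grammar trees - is essentially random derivation from a context-free grammar. A hypothetical illustration in Python (not GF code; the toy grammar below is invented for this sketch):]

```python
import random

# Toy CFG in the spirit of the thread's examples; non-terminals map to
# lists of alternative right-hand sides, anything else is a terminal.
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["he"], ["the", "N"]],
    "N":  [["sheep"], ["man"]],
    "VP": [["runs"], ["hunts", "NP"]],
}

def generate(symbol="S", rng=random):
    """Expand a symbol into a random list of terminal words."""
    if symbol not in GRAMMAR:   # terminal: emit it as-is
        return [symbol]
    expansion = rng.choice(GRAMMAR[symbol])
    return [word for s in expansion for word in generate(s, rng)]

print(" ".join(generate("S", random.Random(0))))
```

The same tree-first scheme would work for music by making the terminals notes or phrases instead of words.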
When it comes to the processing of unknown texts with unknown words -
is it possible for GF to determine properties of words in a sentence
that are not part of a dictionary?
Can GF derive the kind of word (noun, verb, adjective, its flexion)
from the context of the sentence?
Can GF be told to simply accept unknown words
and linearize them as they are?
This is the usual way humans handle unknown words when translating texts.
Installation
============
In examples/numerals/README it is explained how to generate the program
'gft',
but:
GF/src> make gft
"/usr/bin/ghc" --make -fglasgow-exts -package readline -DUSE_READLINE
-DUSE_INTERRUPT -itranslate translate/GFT.hs -o gft-bin
Chasing modules from: translate/GFT.hs
ghc-6.4.1: can't find file `translate/GFT.hs'
make: *** [gft] Fehler 1
It seems to have been moved to GF/Translate/GFT.hs
Then I start
GF/examples/numerals> gf <mkNumerals.gfs
and get lots of "parsing old ..." messages and finally a "See you."
Then I run
GF/examples/numerals> gf numerals.gfcm
gf: numerals.gfcm: openFile: does not exist (No such file or directory)
I can only find numerals.Abs.gf .
I guess that numerals.gfcm must be generated by 'gf <mkNumerals',
but it was not.
In GF/lib/resource-1.0/Makefile the closing -RTS option is missing.
I wanted to run 'make' in GF/lib/resource-1.0/,
but this didn't succeed due to infinite swapping.
I added heap limiting options to the GF variable in the Makefile,
which are appropriate for my machine (+RTS -M384M -c30 -RTS).
This failed, too, due to heap overflow.
I then switched back to the files from compiled.tar.gz.
However I felt that it is not good to be restricted
to the data shipped with GF.
lib/resource-1.0/README says that I should start
'gf -nocf langs.gfcm',
but the file langs.gfcm does not exist.
I had hoped it would be in compiled.tar.gz.
I guess it must be built by 'make langs', but this fails due to heap
overflow.
I am also afraid that 'make' used the files that I could not compile
instead of the pre-compiled ones.
I then tried to play without langs.gfcm.
I started
gf +RTS -M384M -c30 -RTS -path=prelude:present present/LangGer.gfc
present/LangEng.gfc present/LangFre.gfc
and
gf +RTS -M384M -c30 -RTS -path=prelude:alltenses alltenses/LangGer.gfc
alltenses/LangEng.gfc alltenses/LangFre.gfc
and this works.
I think this should be noted in the README
for people who have problems with building the whole resource package.
It is also useful if you encounter errors in the grammar files
that you would like to fix quickly.
It is certainly not a good idea to rebuild langs.gfcm every time.
Text processing
===============
Numerals
--------
In German, numerals below one million are written without spaces
(Duden 1982, rule 282).
> gr -cat=Sub1000 | l -lang=LangGer
Neun hundert achzig
must be "neunhundertachtzig"
> gr -cat=Sub1000000 | l -lang=LangGer
Ein hundert ein tausend neunzehn
must be "einhunderteintausendneunzehn"
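For reference, the compounding that the Duden rule asks for can be sketched like this (plain Python, not GF code; simplified to numbers below 1000, and the function name is mine):

```python
# Spell 1..999 as a single German compound word (simplified sketch).
ONES = ["", "ein", "zwei", "drei", "vier", "fünf",
        "sechs", "sieben", "acht", "neun"]
TEENS = ["zehn", "elf", "zwölf", "dreizehn", "vierzehn", "fünfzehn",
         "sechzehn", "siebzehn", "achtzehn", "neunzehn"]
TENS = ["", "", "zwanzig", "dreißig", "vierzig", "fünfzig",
        "sechzig", "siebzig", "achtzig", "neunzig"]

def german_numeral(n):
    """Write n (1..999) as one word, per Duden rule 282."""
    hundreds, rest = divmod(n, 100)
    word = ONES[hundreds] + "hundert" if hundreds else ""
    if rest == 0:
        return word
    if rest < 10:
        # a final 1 is "eins", but "ein" inside compounds
        return word + ("eins" if rest == 1 else ONES[rest])
    if rest < 20:
        return word + TEENS[rest - 10]
    tens, units = divmod(rest, 10)
    return word + (ONES[units] + "und" if units else "") + TENS[tens]

print(german_numeral(980))  # prints: neunhundertachtzig
```

Note the unit-before-ten order (21 is "einundzwanzig"), which is exactly the kind of thing a space-separated linearization gets wrong.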
Irregular verbs
---------------
In LexiconGer there are several verbs marked as regular,
which are actually irregular.
IrregGer seems to be more precise in this respect.
Can I send you a corrected LexiconGer as a file or a darcs patch?
Strange sentences
-----------------
> p -lang=LangEng -fcfg "Is this man older than a tree" | l -multi
Cet homme est plus vieux qu' un arbre
Is this man older than a tree
Ist dieser Mann älter als ein Baum
--> this is correct German
Ceci est de l' homme plus vieux qu' un arbre
Is this man older than a tree
Ist dies älterer als ein Baum Mann
--> this sounds like Chinese German
866 msec
I don't understand the grammar tree of the above sentence,
so I can't judge whether the parsing went wrong or the linearization.
> p -lang=LangEng -fcfg "Is this man older than a tree"
PhrUtt NoPConj (UttQS (UseQCl TPres ASimul PPos (QuestCl (PredVP (DetCN
(DetSg (SgQuant this_Quant) NoOrd) (UseN man_N)) (UseComp (CompAP (ComparA
old_A (DetCN (DetSg (SgQuant IndefArt) NoOrd) (UseN tree_N))))))))) NoVoc
PhrUtt NoPConj (UttQS (UseQCl TPres ASimul PPos (QuestCl (PredVP this_NP
(UseComp (CompNP (DetCN (DetSgMassDet NoOrd) (AdjCN (ComparA old_A (DetCN
(DetSg (SgQuant IndefArt) NoOrd) (UseN tree_N))) (UseN man_N)))))))))
NoVoc
445 msec
Progressive forms
-----------------
> p -lang=LangEng -fcfg "A man is running" | l -lang=LangGer
Ein Mann läuft eben
This sounds unnatural.
You could replace 'eben' by
Ein Mann läuft gerade
but this is ambiguous, because 'gerade' means both 'now' and 'straight'.
Ein Mann läuft zurzeit
Ein Mann läuft derzeit
Both of the last two sound artificial.
You could drop the distinction between the progressive and the normal form and
just write:
Ein Mann läuft
As noted by Bastian Sick ("Der Dativ ist dem Genitiv sein Tod"),
there is a progressive form in colloquial German using the preposition 'am':
Ein Mann ist am Laufen
Prepositions
------------
I assume that correct prepositions are a non-trivial and ubiquitous problem.
Here is one instance:
> p -lang=LangEng -fcfg "A man runs to the house" | l -lang=LangGer
Ein Mann läuft nach dem Haus
must be
Ein Mann läuft zu dem Haus
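Just to illustrate the kind of lexical selection this needs, a toy Python sketch (all names invented; the real choice of German preposition depends on much more than this one distinction):

```python
# Toy rule for directional English "to" in German:
# "nach" before bare geographic names, "zu" + dative article otherwise.
def german_to(noun, bare_place_name=False):
    """Pick a German preposition phrase for directional 'to' (toy rule)."""
    if bare_place_name:
        return "nach " + noun    # e.g. "nach Berlin"
    return "zu dem " + noun      # e.g. "zu dem Haus" (colloquially "zum Haus")

print(german_to("Haus"))                          # prints: zu dem Haus
print(german_to("Berlin", bare_place_name=True))  # prints: nach Berlin
```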
Random trees
------------
When generating random phrases, I sometimes get
> gr -cat=S
not completed
no tree found
1887 msec
Does this mean that this command does not always succeed
in generating valid trees?
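My guess (only a guess, not a description of GF's implementation) is that random generation expands the tree top-down under a size or depth budget and reports failure when the budget runs out, roughly like this toy sketch (Python; the grammar is invented):

```python
import random

# Toy top-down random generation with a depth budget. The recursive
# NP and VP rules can make an unlucky derivation exceed the budget,
# in which case we give up, analogous to GF's "no tree found".
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["he"], ["the", "N"], ["NP", "and", "NP"]],
    "N":  [["man"], ["sheep"]],
    "VP": [["runs"], ["hunts", "NP"], ["VP", "slowly"]],
}

def gen(sym, rng, depth):
    """Expand sym randomly; return a word list, or None if out of budget."""
    if depth < 0:
        return None
    if sym not in GRAMMAR:
        return [sym]                     # terminal symbol
    words = []
    for child in rng.choice(GRAMMAR[sym]):
        sub = gen(child, rng, depth - 1)
        if sub is None:
            return None
        words += sub
    return words

print(gen("S", random.Random(3), 8))
```

Under this reading, "no tree found" would just mean the random walk recursed past the budget, not that the grammar has no valid trees.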
|
|
From: Aarne R. <aa...@cs...> - 2006-12-22 15:22:53
|
Dear All, GF Version 2.7 has been released. Some highlights: - JavaScript and VoiceXML generation. These together support the generation of a complete dialogue system from grammar. - Overloading and a new library API. - GFCC format. - C code generation for ultimate efficiency with the GFCC format. - Resource library version 1.1: extensions and bug fixes to 1.0. Merry Christmas! Aarne and Björn. |
|
From: Aarne R. <aa...@cs...> - 2006-06-23 08:18:43
|
Now available at Source Forge. Highlights: fast parser (FCFG) by Krasimir Angelov resource grammar library version 1.0 bug fixes Regards Aarne. |
|
From: <bri...@cs...> - 2006-03-23 15:33:41
|
We are pleased to announce the release of version 2.5 of Grammatical
Framework.
Some highlights of version 2.5:
* Treebank generation and reuse.
* Regular expression patterns.
* Resource Grammar Library v. 1.0 beta release.
See http://www.cs.chalmers.se/~aarne/GF/doc/gf-history.html for more
details.
You can download the new version at
http://sourceforge.net/project/showfiles.php?group_id=132285.
/Björn
|
|
From: Aarne R. <aa...@cs...> - 2005-12-22 20:32:46
|
Some highlights: * Speech input. * Transfer modules. * Probabilistic grammars. See http://www.cs.chalmers.se/~aarne/GF/ Merry Christmas! Aarne. |
|
From: <bri...@cs...> - 2005-11-30 13:41:42
|
Hi, GF has now moved to a world-readable darcs repository. Instructions on how to use it are available at: http://www.cs.chalmers.se/Cs/Research/Language-technology/darcs/GF/doc/darcs.html Happy hacking! /Björn |
|
From: David H. <dh...@li...> - 2005-06-23 13:22:03
|
I would like to write a newline character to a file using the GF "wf" command. Is that possible? /David |
|
From: Bjorn B. <bri...@cs...> - 2005-05-13 09:27:14
|
David Hjelm wrote: > So are there plans for putting GF sources up on sourceforge as well? We are planning to put the downloads on SourceForge, but the CVS repository will probably remain on internal Chalmers servers because we have more control over it there, and it seems more reliable than the SF CVS service, which has been a little unreliable in the past. However, to provide regular access to the development version of GF, there is an experimental darcs [1] repo at http://www.cs.chalmers.se/~bringert/darcs/GF/ which is synced every night to the CVS repository. You need to install darcs in order to check out the code. If you want to view individual files, you can simply look at the repo in a web browser. To check out the first time, use: $ darcs get http://www.cs.chalmers.se/~bringert/darcs/GF/ To update your copy to the current version, enter the GF directory and run: $ darcs pull -a The procedure for building is the same as when building from CVS: you have to run autoconf before running configure. /Björn [1] http://www.darcs.net/ |
|
From: David H. <dh...@li...> - 2005-05-13 08:42:52
|
So are there plans for putting GF sources up on sourceforge as well? /David |