multiword-expressions Mailing List for Multiword Expressions

Brought to you by: ceramisch, jelenamitrovic, preslav, schtepf, stukot

multiword-expressions — OLD mailing list for the Multiword Expression community

This list is closed; nobody may subscribe to it.

[SIGLEX-MWE-OLD] LAST REMINDER: Mailing list shutting down in 3, 2, 1...

From: Carlos R. <car...@li...> - 2021-03-08 18:49:41

Dear all,

This list has been the main communication tool of the MWE community for 14
years or so.
We will proceed to *shut down this list in the next couple of days*.
The archive should still be available, but the list will no longer be
active.

The new mailing list <https://groups.google.com/g/siglex-mwe-members> now
has more than 240 members: thanks for having re-registered.
If you have not subscribed yet, feel free to do it at any time to avoid
missing the latest news.
To do so, simply register as a SIGLEX member <https://siglex.org/members>;
it's simple and free.

All the best
Carlos, on behalf of the SIGLEX-MWE Section

On Thu, Feb 18, 2021 at 9:24 AM Carlos Ramisch <car...@li...>
wrote:

> Dear all,
>
> Please remember to re-register to SIGLEX <https://siglex.org/members.html>
> and check the MWE Section checkbox, if you have not already done so.
> The new mailing list <https://multiword.org/mailinglist> is already
> active; this mailing list at sourceforge will be *shut down on March 1st.*
>
> Best regards,
> Carlos
>
> On Thu, Jan 28, 2021 at 12:11 PM Carlos Ramisch <car...@li...>
> wrote:
>
>> Dear all,
>>
>> We are happy to announce that the SIGLEX-MWE Section is upgrading its
>> infrastructure.
>> We have a new website <https://multiword.org/> and our members list was
>> integrated with the SIGLEX members <https://siglex.org/members>
>> directory.
>> We have also created a new mailing list
>> <https://multiword.org/mailinglist> for registered members in the SIGLEX
>> directory.
>>
>> Therefore, we will be shortly *shutting down* this list
>> mul...@li...
>> We kindly ask you, if you have not already done so, to register as a
>> SIGLEX member <https://siglex.org/members>.
>> Do not forget to tick the "MWE" checkbox so that we can add you to the
>> new MWE mailing list.
>>
>> Please, take action in the next 30 days. *The list will be definitely
>> shut down on March 1, 2021*.
>> If you have any trouble registering to the new directory and mailing
>> list, just drop a line:
>> sig...@go...
>>
>> Best
>> Carlos, on behalf of the MWE Section's standing committee
>> <https://multiword.org/organization/standingcommittee>
>>
>

[SIGLEX-MWE-OLD] ACTION REQUIRED - Mailing list shutdown on March 1, 2021 - Reminder

From: Carlos R. <car...@li...> - 2021-02-18 08:24:56

Dear all,

Please remember to re-register with SIGLEX <https://siglex.org/members.html>
and check the MWE Section checkbox, if you have not already done so.
The new mailing list <https://multiword.org/mailinglist> is already active;
this mailing list at sourceforge will be *shut down on March 1st.*

Best regards,
Carlos

On Thu, Jan 28, 2021 at 12:11 PM Carlos Ramisch <car...@li...>
wrote:

> Dear all,
>
> We are happy to announce that the SIGLEX-MWE Section is upgrading its
> infrastructure.
> We have a new website <https://multiword.org/> and our members list was
> integrated with the SIGLEX members <https://siglex.org/members> directory.
> We have also created a new mailing list
> <https://multiword.org/mailinglist> for registered members in the SIGLEX
> directory.
>
> Therefore, we will be shortly *shutting down* this list
> mul...@li...
> We kindly ask you, if you have not already done so, to register as a
> SIGLEX member <https://siglex.org/members>.
> Do not forget to tick the "MWE" checkbox so that we can add you to the new
> MWE mailing list.
>
> Please, take action in the next 30 days. *The list will be definitely
> shut down on March 1, 2021*.
> If you have any trouble registering to the new directory and mailing list,
> just drop a line:
> sig...@go...
>
> Best
> Carlos, on behalf of the MWE Section's standing committee
> <https://multiword.org/organization/standingcommittee>
>

[SIGLEX-MWE] ACTION REQUIRED - Mailing list shutdown on March 1, 2021

From: Carlos R. <car...@li...> - 2021-01-28 11:12:02

Dear all,

We are happy to announce that the SIGLEX-MWE Section is upgrading its
infrastructure.
We have a new website <https://multiword.org/> and our members list was
integrated with the SIGLEX members <https://siglex.org/members> directory.
We have also created a new mailing list <https://multiword.org/mailinglist>
for registered members in the SIGLEX directory.

Therefore, we will shortly be *shutting down* this list
mul...@li...
We kindly ask you, if you have not already done so, to register as a SIGLEX
member <https://siglex.org/members>.
Do not forget to tick the "MWE" checkbox so that we can add you to the new
MWE mailing list.

Please take action in the next 30 days. *The list will be definitively shut
down on March 1, 2021*.
If you have any trouble registering with the new directory and mailing list,
just drop us a line at:
sig...@go...

Best
Carlos, on behalf of the MWE Section's standing committee
<https://multiword.org/organization/standingcommittee>

[SIGLEX-MWE] MWE 2021: 17th Workshop on Multiword Expressions - first CfP

From: Shiva T. <sh....@gm...> - 2021-01-12 14:49:36

----------------------------------------

[apologies for any cross-posting]
----------------------------------------

*17th Workshop on Multiword Expressions (MWE 2021)*

Co-located with ACL-IJCNLP 2021 (Bangkok, Thailand), 5 or 6 August 2021

*Deadline: April 19, 2021*
https://multiword.org/mwe2021/

*Organised and sponsored by: SIGLEX, the Special Interest Group on the
Lexicon of the ACL*



*** FIRST CALL FOR PAPERS ***

Multiword expressions (MWEs) are word combinations which exhibit lexical,
syntactic, semantic, pragmatic and/or statistical idiosyncrasies (Baldwin &
Kim 2010), such as by and large, hot dog, pay a visit and pull one's leg.
The notion encompasses closely related phenomena: idioms, compounds,
light-verb constructions, rhetorical figures, institutionalised phrases,
collocations, etc. The behaviour of MWEs is often unpredictable; in
particular, their meanings are not regularly composed of the meanings of
their parts. Thus, MWEs are a major challenge in computational linguistics
(Constant et al. 2017), including linguistic modelling (e.g. treebanking),
computational modelling (e.g. parsing), and end user NLP applications (e.g.
natural language understanding, machine translation, and social media
mining).



Modelling and processing MWEs for NLP has been the topic of the MWE
workshop organised by the MWE section <http://multiword.org/> of SIGLEX
<https://siglex.org/> in conjunction with major NLP conferences since 2003.
Although much progress has been made in the field, MWE processing in
end-user NLP tasks is currently under-explored, and most studies still
introduce MWEs as future work. Nonetheless, there are recent studies in
which MWEs gained particular attention in end-user applications, including
machine translation (Zaninello & Birch 2020), text simplification (Kochmar
et al. 2020, Liu & Hwa 2016), language learning and assessment (Paquot et
al. 2019, Christiansen & Arnon 2017), social media mining (Maisto et al.
2017), and abusive language detection (Zampieri et al. 2020, Caselli et al.
2020).



The special focus for this 17th edition of the workshop is on *MWE
processing in end-user applications* such as those listed above. On the one
hand, the PARSEME shared tasks (Ramisch et al. 2020, Ramisch et al. 2018,
Savary et al. 2017), among others, fostered significant progress in MWE
identification, providing datasets, evaluation measures and tools that now
allow fully integrating MWE identification into end-user applications. On
the other hand, NLP seems to be shifting towards end-to-end neural models
capable of solving complex end-user tasks with little or no intermediary
linguistic symbols, questioning the extent to which MWEs should be
implicitly or explicitly modelled. Therefore, one goal of this workshop is
to bring together and encourage researchers in various NLP subfields to
submit MWE-related research, so that approaches that deal with MWEs in
various applications could benefit from each other.



Following the success of previous joint workshops LAW-MWE-CxG 2018
<http://multiword.sourceforge.net/lawmwecxg2018/>, MWE-WN 2019
<http://multiword.sourceforge.net/mwewn2019/> and MWE-LEX 2020
<http://multiword.sourceforge.net/mwelex2020/>, we further extend the scope
of the workshop to MWEs in e-lexicons and WordNets, MWE annotation, as well
as grammatical constructions.



The 17th Workshop on MWEs invites submissions on (but not limited to) the
following topics:



*Traditional MWE topics:*

   - Computationally-applicable theoretical work on MWEs and constructions
   in psycholinguistics and corpus linguistics
   - MWE and construction annotation and representation in resources such
   as corpora, treebanks, e-lexicons and WordNets
   - Processing of MWEs and constructions in syntactic and semantic
   frameworks (e.g. CCG, CxG, HPSG, LFG, TAG, UD, etc.)
   - Discovery and identification methods for MWEs and constructions
   - MWEs and constructions in language acquisition, language learning, and
   non-standard language (e.g. tweets, speech)
   - Evaluation of annotation and processing techniques for MWEs and
   constructions
   - Retrospective comparative analyses from the PARSEME shared tasks on
   automatic identification of MWEs

*Topics on MWEs and end-user applications:*

   - Processing of MWEs and constructions in end-user applications (e.g.
   MT, NLU, summarisation, social media mining, computer assisted language
   learning)
   - Implicit and explicit representation of MWEs and constructions in
   end-user applications
   - Evaluation of end-user applications concerning MWEs and constructions
   - Resources and tools for MWEs and constructions (e.g. lexicons,
   identifiers) in end-user applications

*** JOINT SESSION WITH WOAH WORKSHOP ***

Pursuing the MWE Section's tradition of synergies with other communities
and in accordance with ACL-IJCNLP 2021's theme track on NLP for social
good, we will organise a joint session with the Workshop on Online Abuse
and Harm (WOAH) <https://www.workshopononlineabuse.com/>. We believe that
MWEs are important in online abuse detection, and that the latter can
provide an interesting testbed for MWE processing technology. The main goal
is to pave the way towards the creation of data for a shared task involving
both communities. The format of the session is under discussion, and we
welcome suggestions from the community. Submissions describing research on
MWEs and abusive language, especially introducing new datasets, are also
welcome.

*** SUBMISSION MODALITIES ***

   - Long papers (8 content pages + references) should report on solid and
   finished research including new experimental results, resources and/or
   techniques.
   - Short papers (4 content pages + references) should report on small
   experiments, focused contributions, ongoing research, negative results
   and/or philosophical discussion.


In regular research papers, the reported research should be substantially
original. Papers available as preprints can also be submitted provided that
they fulfil the conditions defined by the ACL Policies for Submission,
Review and Citation
<https://www.aclweb.org/portal/content/new-policies-submission-review-and-citation>.
Note that dual submission to the ACL-IJCNLP 2021 main conference and MWE
2021 is allowed but must be declared at submission time, as per the
ACL-IJCNLP 2021 call for papers
<https://2021.aclweb.org/calls/papers/#multiple-submission-policy>: "[...]
papers can be dual-submitted to both ACL-IJCNLP 2021 and an ACL-IJCNLP 2021
workshop which has its submission deadline falling before our notification
date of May 5, 2021."

Submission is ***double-blind*** as per the ACL-IJCNLP 2021 guidelines
<https://2021.aclweb.org/calls/papers/#paper-submission-information>. For
all types of submission, the ACL-IJCNLP 2021 templates
<https://2021.aclweb.org/calls/papers/#paper-submission-and-templates> must
be used. There is no limit on the number of reference pages. An extra page
will be allowed to take the reviewers' comments into account in the final
versions of accepted papers (long = 9 content pages, short = 5 content
pages).



The decisions as to oral or poster presentations of the selected papers
will be taken by the PC chairs, depending on the available infrastructure
for participation (in-person and/or virtual). No distinction between
papers presented orally and as posters is made in the workshop proceedings.



All papers should be submitted via the workshop's START space, available
soon. Please choose the appropriate submission modality (long/short).


*** CONTACT ***

For any inquiries regarding the workshop please send an email to
mwe...@gm...



*** IMPORTANT DATES ***

All deadlines are at 23:59 UTC-12 (anywhere in the world).

   - April 19, 2021: Paper Submission Deadline
   - May 28, 2021: Notification of Acceptance
   - June 7, 2021: Camera-ready papers due
   - August 5 or 6, 2021: Workshop (Date TBD)

*** ORGANIZERS ***

   - *Program chairs:* Paul Cook, Jelena Mitrović, Carla Parra Escartín and
   Ashwini Vaidya
   - *Publication chairs:* Petya Osenova and Shiva Taslimipoor
   - *Communication chair:* Carlos Ramisch

Re: [SIGLEX-MWE] [MWE] Re: 2020 report of the SIGLEX-MWE section

From: Israel C. <coh...@gm...> - 2021-01-07 13:47:54

The actual link is
http://multiword.sourceforge.net/MWE_MAIN/2020-SIGLEX-MWE-yearly-report.pdf

It was published on 8 January 2021.

As an alternative, you can copy and paste the link that Carlos sent us into
your browser's URL area and manually change 2019 to 2020. That's how I
retrieved the current Report.

Moral of this story: Don't believe everything you see.

Regards to all,
Izzy
coh...@gm...

On Thu, Jan 7, 2021 at 2:59 PM Carlos Ramisch <car...@gm...>
wrote:

> Dear all,
> Sorry the link to the report was pointing to the 2019 report.
> Here's the correct link for 2020:
> http://multiword.sourceforge.net/MWE_MAIN/2020-SIGLEX-MWE-yearly-report.pdf
> Carlos
>
> On Thu, Jan 7, 2021 at 1:37 AM Carlos Ramisch <car...@gm...>
> wrote:
>
>> Dear SIGLEX-MWE Section members,
>>
>> Welcome to the new MWE Section mailing list!
>>
>> According to the Section's constitution
>> <http://multiword.sourceforge.net/MWE_MAIN/SIGLEX-MWE-section-constitution-2020-11-21.pdf>,
>> the MWE Section's Standing Committee should report yearly on the
>> Section's activities to its members.
>>
>> Please, find the 2020 report at:
>>
>> http://multiword.sourceforge.net/MWE_MAIN/2020-SIGLEX-MWE-yearly-report.pdf
>> <http://multiword.sourceforge.net/MWE_MAIN/2019-SIGLEX-MWE-yearly-report.pdf>
>>
>> Best regards,
>>
>> Carlos Ramisch (also on behalf of Agata Savary, Paul Cook, Jelena
>> Mitrović, Petya Osenova, Carla Parra Escartín, Shiva Taslimipoor, Ashwini
>> Vaidya)
>>
>> P.S.: if you want to modify your subscription to this mailing list,
>> please update your SIGLEX membership information in the SIGLEX members
>> directory <https://siglex.org/members.html>
>>
> --
> You received this message because you are subscribed to the Google Groups
> "SIGLEX-MWE Section members" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to sig...@go....
>

Re: [SIGLEX-MWE] 2020 report of the SIGLEX-MWE section

From: Carlos R. <car...@gm...> - 2021-01-07 12:59:28

Dear all,
Sorry, the link to the report was pointing to the 2019 report.
Here's the correct link for 2020:
http://multiword.sourceforge.net/MWE_MAIN/2020-SIGLEX-MWE-yearly-report.pdf
Carlos

On Thu, Jan 7, 2021 at 1:37 AM Carlos Ramisch <car...@gm...>
wrote:

> Dear SIGLEX-MWE Section members,
>
> Welcome to the new MWE Section mailing list!
>
> According to the Section's constitution
> <http://multiword.sourceforge.net/MWE_MAIN/SIGLEX-MWE-section-constitution-2020-11-21.pdf>,
> the MWE Section's Standing Committee should report yearly on the
> Section's activities to its members.
>
> Please, find the 2020 report at:
> http://multiword.sourceforge.net/MWE_MAIN/2020-SIGLEX-MWE-yearly-report.pdf
> <http://multiword.sourceforge.net/MWE_MAIN/2019-SIGLEX-MWE-yearly-report.pdf>
>
> Best regards,
>
> Carlos Ramisch (also on behalf of Agata Savary, Paul Cook, Jelena
> Mitrović, Petya Osenova, Carla Parra Escartín, Shiva Taslimipoor, Ashwini
> Vaidya)
>
> P.S.: if you want to modify your subscription to this mailing list, please
> update your SIGLEX membership information in the SIGLEX members directory
> <https://siglex.org/members.html>
>

Re: [SIGLEX-MWE] [MWE] 2020 report of the SIGLEX-MWE section

From: Francis B. <bo...@ie...> - 2021-01-07 03:06:19

Thank you!

On Thu, Jan 7, 2021 at 8:37 AM Carlos Ramisch <car...@gm...>
wrote:

> Dear SIGLEX-MWE Section members,
>
> Welcome to the new MWE Section mailing list!
>
> According to the Section's constitution
> <http://multiword.sourceforge.net/MWE_MAIN/SIGLEX-MWE-section-constitution-2020-11-21.pdf>,
> the MWE Section's Standing Committee should report yearly on the
> Section's activities to its members.
>
> Please, find the 2020 report at:
> http://multiword.sourceforge.net/MWE_MAIN/2020-SIGLEX-MWE-yearly-report.pdf
> <http://multiword.sourceforge.net/MWE_MAIN/2019-SIGLEX-MWE-yearly-report.pdf>
>
> Best regards,
>
> Carlos Ramisch (also on behalf of Agata Savary, Paul Cook, Jelena
> Mitrović, Petya Osenova, Carla Parra Escartín, Shiva Taslimipoor, Ashwini
> Vaidya)
>
> P.S.: if you want to modify your subscription to this mailing list, please
> update your SIGLEX membership information in the SIGLEX members directory
> <https://siglex.org/members.html>
>


-- 
Francis Bond <http://www3.ntu.edu.sg/home/fcbond/>
Division of Linguistics and Multilingual Studies
Nanyang Technological University

[SIGLEX-MWE] 2020 report of the SIGLEX-MWE section

From: Carlos R. <car...@gm...> - 2021-01-07 00:37:27

Dear SIGLEX-MWE Section members,

Welcome to the new MWE Section mailing list!

According to the Section's constitution
<http://multiword.sourceforge.net/MWE_MAIN/SIGLEX-MWE-section-constitution-2020-11-21.pdf>,
the MWE Section's Standing Committee should report yearly on the Section's
activities to its members.

Please find the 2020 report at:
http://multiword.sourceforge.net/MWE_MAIN/2020-SIGLEX-MWE-yearly-report.pdf
<http://multiword.sourceforge.net/MWE_MAIN/2019-SIGLEX-MWE-yearly-report.pdf>

Best regards,

Carlos Ramisch (also on behalf of Agata Savary, Paul Cook, Jelena Mitrović,
Petya Osenova, Carla Parra Escartín, Shiva Taslimipoor, Ashwini Vaidya)

P.S.: if you want to modify your subscription to this mailing list, please
update your SIGLEX membership information in the SIGLEX members directory
<https://siglex.org/members.html>

[SIGLEX-MWE] First Call for Papers: ACL-IJCNLP2021 Workshop on "Benchmarking: Past, Present and Future"

From: Valia K. <eva...@an...> - 2020-12-23 11:44:31

Apologies for cross-postings
------------------------------------------------------
Call for Papers

Workshop on "Benchmarking: Past, Present and Future"

Co-located with ACL-IJCNLP 2021 to be held in Bangkok, on August 5-6, 2021

Webpage:
https://github.com/kwchurch/Benchmarking_past_present_future/blob/master/README.md

Important Dates

* April 26, 2021: Paper submission deadline
* May 28, 2021: Notification of acceptance
* June 7, 2021: Camera-ready papers due
* August 5-6, 2021: Workshop dates
--------------------------------------------------------

It is easier to talk about the past than the future. These days,
benchmarks evolve more bottom up (such as papers with code). There used to
be more top-down leadership from government (and industry, in the case of
systems, with benchmarks such as SPEC). Going forward, there may be more
top-down leadership from organizations like MLPerf and/or influencers like
David Ferrucci, who was responsible for IBM’s success with Jeopardy, and
has recently written a paper suggesting how the community should think
about benchmarking for machine comprehension (To Test Machine
Comprehension, Start by Defining Comprehension). Tasks such as reading
comprehension become even more interesting as we move beyond English.
Multilinguality introduces many challenges, and even more opportunities.

Keynote Talks

We have an amazing collection of invited talks, many with direct
first-hand knowledge of the history, and many insights for the future:

1. Past
a. John Makhoul
b. Mark Liberman: Reproducible Research and the Common Task Method
c. Ellen Voorhees

2. Present
a. Ming Zhou (Microsoft)
b. Hua Wu and Jing Liu (Baidu)
c. Neville Ryant (DIHARD)
d. Brian MacWhinney and Saturnino Haider (Dementia Challenge)
e. Samuel Bowman (GLUE)
f. Douwe Kiela (https://dynabench.org/)
g. Eunsol Choi

3. Future
a. Greg Diamos (MLPerf): The 2021 SpeechNet Challenge
b. David Ferrucci
c. Ido Dagan

Submissions

We accept three types of submissions: long papers, short papers and
abstracts, all following the ACL 2021 style and the ACL submission policy:
https://www.aclweb.org/adminwiki/index.php?title=ACL_Policies_for_Submission,_Review_and_Citation

Long papers may consist of up to eight (8) pages of content, plus
unlimited references; short papers may consist of up to four (4) pages of
content. Final versions will be given one additional page of content so
that reviewers' comments can be taken into account. Abstracts may consist
of up to two (2) pages of content, plus unlimited references but will not
be given any additional page upon acceptance. Submissions should be made
in electronic form, using the Softconf START conference management
system. The submission site will be announced on the workshop page once
available.

We invite original research papers from a wide range of topics, including
but not limited to:

1. What important technologies and underlying sciences need to be
fostered, now and in the future?
2. In each case, are there existing tasks/benchmarks that move the field
in the right direction?
3. Where are there gaps?
4. For the gaps, are there initial steps that are accessible, attractive,
and cost effective?
5. How large should a benchmark be?
a. How much data do we need to measure significant differences?
b. How much data do machines need to obtain good performance?
c. How much data do babies need to learn language?

Submissions are open to all, and are to be submitted anonymously. All
papers will be refereed through a double-blind peer review process by at
least three reviewers with final acceptance decisions made by the workshop
organizers.

The workshop is scheduled to last for one day, either August 5th or 6th. If
you have any questions, contact us at
pc-...@go...

Workshop organizers

Kenneth Church (Baidu USA)
Mark Liberman (University of Pennsylvania)
Valia Kordoni (Humboldt-Universität zu Berlin)

[SIGLEX-MWE] Joint Workshop on Multiword Expressions and Electronic Lexicons (MWE-LEX 2020)

From: Carlos R. <car...@li...> - 2020-12-01 10:53:17

Joint Workshop on Multiword Expressions and Electronic Lexicons (MWE-LEX
2020)

*Workshop at COLING 2020 <http://coling2020.org/>, December 13, 2020.*

Organized and sponsored by:
Special Interest Group on the Lexicon (SIGLEX <http://www.siglex.org>) of
the Association for Computational Linguistics (ACL
<https://www.aclweb.org/portal/>)
ELEXIS <https://elex.is/> - European Lexicographic Infrastructure.

This joint event is the 16th edition of the *Workshop on Multiword
Expressions (MWE)*
<http://multiword.sourceforge.net/PHITE.php?sitesig=CONF>.

*CALL FOR PARTICIPATION*



We would like to inform you that the MWE-LEX 2020 program can be accessed
at:

http://multiword.sourceforge.net/PHITE.php?sitesig=CONF&page=CONF_02_MWE-LEX_2020___lb__COLING__rb__&subpage=CONF_20_Program



We are happy to announce that Prof. Roberto Navigli will give the Keynote
Speech:

http://multiword.sourceforge.net/PHITE.php?sitesig=CONF&page=CONF_02_MWE-LEX_2020___lb__COLING__rb__&subpage=CONF_10_Keynote_Speaker



You can access MWE-LEX 2020 home page at:

http://multiword.sourceforge.net/mwelex2020/

We would like to let you know that in order to participate you should
register at https://coling2020.org/pages/registration.html (if you have
not done so already).

Looking forward to seeing you at the workshop

MWE-LEX 2020 organizers

[SIGLEX-MWE] [IMPORTANT!] SIGLEX website re-registration needed!

From: Carlos R. <car...@li...> - 2020-11-19 13:25:28

Dear MWE Section members,

We would like to ask for 5 minutes of your time to (re-)register as a SIGLEX
(and MWE) member:
https://docs.google.com/forms/d/e/1FAIpQLSfldnrynfsqwMu_xwI-c8nxajUUeALJd9INhEPcSb8zCD-GBQ/viewform?usp=sf_link

Membership is free and open to anyone interested in MWE research.
Registered members can vote for the SIGLEX board, including the Section
representative.
Don't forget to tick the MWE Section box ;-)

Thanks
Carlos

---------- Forwarded message ---------
From: Preslav Nakov <pre...@gm...>
Date: Thu, Nov 19, 2020 at 6:57 AM
Subject: SIGLEX website re-registration needed! (also MWE and SemEval)
To: Preslav Nakov <pre...@gm...>


(Apologies for the spam)

Dear all,

The SIGLEX community (and its MWE and SemEval sections) has migrated to a
new website:

https://siglex.org/

Now, we would like to kindly ask all members to re-register using the
registration form:

https://docs.google.com/forms/d/e/1FAIpQLSfldnrynfsqwMu_xwI-c8nxajUUeALJd9INhEPcSb8zCD-GBQ/viewform?usp=sf_link

Many thanks to Steven for taking care of this!

Regards,
Preslav

[SIGLEX-MWE] Support MWE 2021 proposal

From: Carlos R. <car...@li...> - 2020-10-28 14:15:57

Dear MWE community,

We have sent a proposal for the MWE workshop in 2021:
http://multiword.sourceforge.net/mwe2021/

Please, fill in the following survey to support our proposal:
https://docs.google.com/forms/d/e/1FAIpQLScYZ7vPmIo72mGP1-reuvuRGM835DdMeX9zSg6qx7iuwyGWPQ/formResponse

MWE 2021 appears in the "Applications, including bioNLP and finance"
section.

Best,
Carlos, on behalf of the SIGLEX-MWE Standing Committee

-- 
Carlos RAMISCH
http://pageperso.lis-lab.fr/carlos.ramisch
Assistant professor at LIS/TALEP <https://www.lis-lab.fr/talep/> and Aix
Marseille University, France

[SIGLEX-MWE] Final CfP - *Extended deadline*: Semantic Web Journal Special Issue "Latest Advancements in Linguistic Linked Data"

From: Julia B. G. <jb...@un...> - 2020-10-28 06:15:18

 Apologies for cross-posting

======
Final Call for Papers: Special Issue on
*Latest Advancements in Linguistic Linked Data*
http://www.semantic-web-journal.net/blog/call-papers-special-issue-latest-advancements-linguistic-linked-data
Contact email: swj...@go...
*Deadline: 25th of January, 2021 (extended) *
======

In recent years, various efforts have arisen with regard to the
representation and publication of linguistic resources such as lexicons,
dictionaries, corpora, terminologies and linguistic ontologies. These
efforts have exploited Semantic Web technologies and the Linguistic Linked
Data (LLD) publication paradigm to facilitate and enhance the discovery,
interoperability, integration and reusability of language resources.
Initiatives such as the H2020 projects ELEXIS and Prêt-à-LLOD and the COST
Action NexusLinguarum aim at developing robust ecosystems and networks of
experts to address the LLD lifecycle, from identifying the requirements
concerning the representation of linguistic resources to their exploitation
by natural language processing (NLP) applications. With the rapid growth of
the Linguistic Linked Open Data (LLOD) cloud and the increasing interest in
the use of linked data for NLP, new challenges emerge concerning particular
use cases and domain applications, language-specific features and quality
dimensions, the evolution of LLD resources over time and the leveraging
of linguistic resources alongside LD technologies in NLP research, among other
diverse aspects.

This special issue on the latest advancements in LLD invites high-quality
contributions, supported by a robust evaluation, which present an
advancement in the state-of-the-art in the field of LLD methodologies and
technologies and their use for NLP and provide insights into the new
challenges ahead. The list of topics includes, but is not limited to, the
following:

   - Knowledge Representation for Linguistic Data
      - Ontologies, vocabularies and linguistic category registries for
      linguistic data
      - Representation languages for linguistic data as LLD
      - Modelling challenges with state-of-the-art LLD models (e.g.
      OntoLex-Lemon)
      - Use case-based representation requirements for LLD
      - Ontology engineering for linguistic data representation: building,
      evaluation, evolution, alignment and reuse of ontologies for
      computational linguistics and NLP
   - LLD Generation and Evolution
      - Methodologies and workflows for LLD generation
      - Diachronic and sociolinguistic approaches to LLD generation and
      evolution
      - Innovative approaches to automatic LLD generation
      - Technically robust and systematically evaluated LLD resources
      - LLD for under-resourced and underrepresented languages and domains
      - Linking LLD sets across multiple dimensions and levels of
      linguistic description
      - LLD quality evaluation and resource curation
      - LLD extension, enrichment and evolution
   - LLD Publication, Querying and Visualization
      - Publication and metadata
      - IPR, licensing and privacy issues
      - LLD specific query techniques and languages
      - Supporting interfaces for different steps of the LLD lifecycle
      - Visualization of LLD
   - LLD and NLP research
      - LLD for NLP and NLP for LLD
      - Integration, exploitation and added value of LLD technologies and
      interoperable linguistic resources in NLP systems
      - LLD in Deep Learning-based NLP approaches
      - LLD in Big Data contexts
   - Applications and Use Cases
      - Automatic approaches for different steps of the LLD lifecycle
      - Knowledge extraction and representation from linguistic resources
      - LLD for research in specific domains (e.g. linguistics, digital
      humanities, life sciences, law, journalism, etc.)
      - LLD specific features and requirements from domain experts

Deadline

Submission deadline: 25th of January, 2021 (extended from 20th of November,
2020). Papers submitted before the deadline will be reviewed upon receipt.

Guest editors

The guest editors can be reached at swj...@go... .

Julia Bosque-Gil, University of Zaragoza, Spain
Milan Dojchinovski, Czech Technical University in Prague, Czech Republic
Marieke van Erp, KNAW Humanities Cluster, Amsterdam, Netherlands
Christian Chiarcos, Goethe Universität Frankfurt, Germany
Philipp Cimiano, Bielefeld University, Germany


-- 
Julia Bosque-Gil
Aragon Institute of Engineering Research (I3A)
University of Zaragoza
Pronouns: she/her


[SIGLEX-MWE] MWE-LEX 2020 invites papers accepted at "Findings of EMNLP"

From: Stella M. <sti...@gm...> - 2020-10-06 19:00:31

 The MWE-LEX 2020 workshop is inviting authors of accepted papers at
"Findings of EMNLP" to present their work at our workshop.
To submit your paper for a presentation slot, please send an email to
mwe...@gm... by Friday October 9, 2020 with:
  *   Your paper
  *   One or two sentences explaining why it would be a good fit for the
scope of MWE-LEX 2020

The MWE-LEX 2020 Organizing Committee

[SIGLEX-MWE] 2nd Call for Papers: Semantic Web Journal Special Issue "Latest Advancements in Linguistic Linked Data"

From: Julia B. G. <jb...@un...> - 2020-09-23 12:04:45

 Apologies for cross-posting

======
2nd Call for Papers: Special Issue on
*Latest Advancements in Linguistic Linked Data*
http://www.semantic-web-journal.net/blog/call-papers-special-issue-latest-advancements-linguistic-linked-data
Contact email: swj...@go...
*Deadline: 20th of November, 2020 *
======

In recent years, various efforts have arisen with regard to the
representation and publication of linguistic resources such as lexicons,
dictionaries, corpora, terminologies and linguistic ontologies. These
efforts have exploited Semantic Web technologies and the Linguistic Linked
Data (LLD) publication paradigm to facilitate and enhance the discovery,
interoperability, integration and reusability of language resources.
Initiatives such as the H2020 projects ELEXIS and Prêt-à-LLOD and the COST
Action NexusLinguarum aim at developing robust ecosystems and networks of
experts to address the LLD lifecycle, from identifying the requirements
concerning the representation of linguistic resources to their exploitation
by natural language processing (NLP) applications. With the rapid growth of
the Linguistic Linked Open Data (LLOD) cloud and the increasing interest in
the use of linked data for NLP, new challenges emerge concerning particular
use cases and domain applications, language-specific features and quality
dimensions, the evolution of LLD resources throughout time and the leverage
of linguistic resources along LD technologies in NLP research, among other
diverse aspects.

This special issue on the latest advancements in LLD invites high-quality
contributions, supported by a robust evaluation, which present an
advancement in the state-of-the-art in the field of LLD methodologies and
technologies and their use for NLP and provide insights into the new
challenges ahead. The list of topics includes, but is not limited to, the
following:

   - Knowledge Representation for Linguistic Data
      - Ontologies, vocabularies and linguistic category registries for
      linguistic data
      - Representation languages for linguistic data as LLD
      - Modelling challenges with state-of-the-art LLD models (e.g.
      OntoLex-Lemon)
      - Use case-based representation requirements for LLD
      - Ontology engineering for linguistic data representation: building,
      evaluation, evolution, alignment and reuse of ontologies for
      computational linguistics and NLP
   - LLD Generation and Evolution
      - Methodologies and workflows for LLD generation
      - Diachronic and sociolinguistic approaches to LLD generation and
      evolution
      - Innovative approaches to automatic LLD generation
      - Technically robust and systematically evaluated LLD resources
      - LLD for under-resourced and underrepresented languages and domains
      - Linking LLD sets across multiple dimensions and levels of
      linguistic description
      - LLD quality evaluation and resource curation
      - LLD extension, enrichment and evolution
   - LLD Publication, Querying and Visualization
      - Publication and metadata
      - IPR, licensing and privacy issues
      - LLD specific query techniques and languages
      - Supporting interfaces for different steps of the LLD lifecycle
      - Visualization of LLD
   - LLD and NLP research
      - LLD for NLP and NLP for LLD
      - Integration, exploitation and added value of LLD technologies and
      interoperable linguistic resources in NLP systems
      - LLD in Deep Learning-based NLP approaches
      - LLD in Big Data contexts
   - Applications and Use Cases
      - Automatic approaches for different steps of the LLD lifecycle
      - Knowledge extraction and representation from linguistic resources
      - LLD for research in specific domains (e.g. linguistics, digital
      humanities, life sciences, law, journalism, etc.)
      - LLD specific features and requirements from domain experts

Deadline

Submission deadline: 20th of November, 2020. Papers submitted
before the deadline will be reviewed upon receipt.

Guest editors

The guest editors can be reached at swj...@go... .

Julia Bosque-Gil, University of Zaragoza, Spain
Milan Dojchinovski, Czech Technical University in Prague, Czech Republic
Marieke van Erp, KNAW Humanities Cluster, Amsterdam, Netherlands
Christian Chiarcos, Goethe Universität Frankfurt, Germany
Philipp Cimiano, Bielefeld University, Germany

-- 
Julia Bosque-Gil
Aragon Institute of Engineering Research (I3A)
University of Zaragoza
Pronouns: she/her

Re: [SIGLEX-MWE] SIGLEX-MWE: call for SC officers

From: Carlos R. <car...@li...> - 2020-09-02 18:11:52

Dear all,

We remind you that the deadline to apply for the SIGLEX-MWE section's
Standing Committee
<http://multiword.sourceforge.net/PHITE.php?sitesig=MWE#sc> is this *Friday,
September 4.*
If you have any questions about the committee's functions, do not hesitate
to contact me and/or the current SC members.
Any member of the MWE-SIGLEX section can send an expression of interest via
the online form <https://forms.gle/nHj8UrvwNRUSiezK9>.

We are looking forward to your submissions and to working together for the
MWE community!

Carlos Ramisch
SIGLEX-MWE section representative
on behalf of the Standing Committee

On Fri, Jul 31, 2020 at 1:00 PM Carlos Ramisch <car...@li...>
wrote:

> Dear SIGLEX-MWE Section members,
>
> This is a call for officers of the SIGLEX-MWE Section Standing Committee
> <http://multiword.sourceforge.net/PHITE.php?sitesig=MWE#sc> (SC).
>
> According to the Section's constitution
> <http://multiword.sourceforge.net/MWE_MAIN/SIGLEX-MWE-section-constitution-2017-08-23.pdf>,
> the SC consists of one elected representative and 4 nominated officers.
> The nominated officers are selected by the SIGLEX <http://www.siglex.org/>
> board from a list proposed by the Section representative.
> The duration of the term of an SC nominated officer is *2 years*.
> The SC officers must be members of the Section (and of SIGLEX) and have
> published research work in topics related to multiword expressions.
> The duties of the SC are defined by the constitution.
>
> This year, 2 officers are stepping down, and 2 new officers will be
> nominated.
> If you are interested in becoming one of them, and influencing future
> developments of the MWE community, please, submit your expression of
> interest via the web form <https://forms.gle/nHj8UrvwNRUSiezK9> until *September
> 4, 2020*.
>
> We are looking forward to your submissions and to working together for the
> MWE community!
>
> Carlos Ramisch
> SIGLEX-MWE section representative
> on behalf of the Standing Committee
>

[SIGLEX-MWE] SIGLEX-MWE: call for SC officers

From: Carlos R. <car...@li...> - 2020-07-31 11:02:21

Dear SIGLEX-MWE Section members,

This is a call for officers of the SIGLEX-MWE Section Standing Committee
<http://multiword.sourceforge.net/PHITE.php?sitesig=MWE#sc> (SC).

According to the Section's constitution
<http://multiword.sourceforge.net/MWE_MAIN/SIGLEX-MWE-section-constitution-2017-08-23.pdf>,
the SC consists of one elected representative and 4 nominated officers.
The nominated officers are selected by the SIGLEX <http://www.siglex.org/>
board from a list proposed by the Section representative.
The duration of the term of an SC nominated officer is *2 years*.
The SC officers must be members of the Section (and of SIGLEX) and have
published research work in topics related to multiword expressions.
The duties of the SC are defined by the constitution.

This year, 2 officers are stepping down, and 2 new officers will be
nominated.
If you are interested in becoming one of them, and influencing future
developments of the MWE community, please, submit your expression of
interest via the web form <https://forms.gle/nHj8UrvwNRUSiezK9> until
*September
4, 2020*.

We are looking forward to your submissions and to working together for the
MWE community!

Carlos Ramisch
SIGLEX-MWE section representative
on behalf of the Standing Committee

[SIGLEX-MWE] PARSEME shared task 1.2 - evaluation phase starting

From: Carlos R. <car...@li...> - 2020-07-01 07:57:27

Dear PARSEMErs,

*The evaluation phase of the PARSEME shared task 1.2 on semi-supervised
identification of verbal MWEs has just started!*

We have released the blind test data for all 14 languages on our public
Gitlab repo:
https://gitlab.com/parseme/sharedtask-data/
You can also use the larger unannotated corpora available here (also in
closed track):
https://gitlab.com/parseme/corpora/-/wikis/Raw-corpora-for-the-PARSEME-1.2-shared-task
This year's focus is on unseen VMWEs: the general ranking will emphasize
results on unseen VMWEs.

The *deadline* for the submission of results was *extended to July 6*
(anywhere in the world).

Results submission is to be made on the MWE-LEX softconf page:
https://www.softconf.com/coling2020/MWE-LEX/
Results must be a single compressed archive ("*zip*") with one folder per
language, named according to the 2-letter language code (e.g. GA/ for
Irish).
Each output must be named *test.system.cupt* and conform to the *.cupt*
format <http://multiword.sourceforge.net/cupt-format>.
Before submitting, please, download the format validation script
<https://gitlab.com/parseme/sharedtask-data/blob/master/1.2/bin/validate_cupt.py>
and check the format as follows:
./validate_cupt.py --input test.system.cupt
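
The archive layout described above can be assembled with a short script. The
following is a minimal Python sketch, not part of the official shared task
tooling: the function name package_results and the local layout (one
sub-folder per language under a results directory) are assumptions made for
illustration, as is the check that folder names are two uppercase letters.

```python
import zipfile
from pathlib import Path


def package_results(results_dir, archive_path):
    """Zip every <LANG>/test.system.cupt found under results_dir.

    Returns the list of archive entries that were written.
    """
    results_dir = Path(results_dir)
    added = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for lang_dir in sorted(p for p in results_dir.iterdir() if p.is_dir()):
            # Assumption: language folders are 2-letter uppercase codes (e.g. GA).
            if len(lang_dir.name) != 2 or not lang_dir.name.isupper():
                continue
            output = lang_dir / "test.system.cupt"
            if output.is_file():
                arcname = f"{lang_dir.name}/test.system.cupt"
                archive.write(output, arcname)
                added.append(arcname)
    return added
```

Calling package_results("results", "submission.zip") would write entries such
as GA/test.system.cupt into the archive. Each file should still be checked
with validate_cupt.py before uploading.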

If you participate in both the closed and open tracks, please make
distinct submissions for each.
Each team can submit 2 results per track, i.e. at most 4 in total (with one
result per language in each submission).
It is not mandatory to cover all languages, but then the macro-averages
will not be comparable to other systems.

Subscribe and use the participants' mailing list if you find a bug or if
you have questions:
ver...@go...
To reach the organizers, you can write to Par...@nl...

Best
Agata, Ashwini, Bruno, Carlos, Jakub, Marie

[SIGLEX-MWE] PARSEME Shared Task 1.2 - Final Call for Participation

From: Agata S. <aga...@un...> - 2020-06-19 11:18:32

*

PARSEME shared task 1.2 on semi-supervised 
identification of verbal multiword expressions

http://multiword.sourceforge.net/sharedtask2020

Final call for participation

(Apologies for cross-posting)


The third edition of the PARSEME shared task on 
automatic identification of verbal multiword 
expressions (VMWEs) aims at identifying **verbal 
MWEs** in running texts.  Verbal MWEs include, 
among others, verbal idioms (to let the cat out of 
the bag), light-verb constructions (to make a 
decision), verb-particle constructions (to give 
up), multi-verb constructions (to make do) and 
inherently reflexive verbs (s'évanouir 'to 
faint' in French).  Their identification is a 
well-known challenge for NLP applications, due to 
their complex characteristics including 
discontinuity, overlaps, non-compositionality, 
heterogeneity and syntactic variability.


Editions 1.0 
<http://multiword.sourceforge.net/sharedtask2017/> (2017) 
and 1.1 
<http://multiword.sourceforge.net/sharedtask2018/> (2018) 
have shown that, while some systems reach high 
performance (F1>0.7) for identifying VMWEs that 
were seen in the training corpus, performance on 
unseen VMWEs is very low (F1<0.2). Hence for this 
third edition, **emphasis will be put on 
discovering VMWEs that were not seen in the 
training corpus**.


We kindly ask potential participant teams to 
register using the expression of interest form:

https://docs.google.com/forms/d/e/1FAIpQLSfcmbd6MmKjFuBxCoaTWGCPGqoH5FoJ-th8IAZk3kh_ECDaZQ/viewform?usp=sf_link


Task updates and questions will be posted on the 
shared task website:

http://multiword.sourceforge.net/sharedtask2020

and announced on our public mailing list (anyone 
can join):

http://groups.google.com/group/verbalmwe



#### Publication and workshop


Shared task participants will be invited to submit 
a system description paper to a special track of 
the Joint Workshop on Multiword Expressions and 
Electronic Lexicons (MWE-LEX 2020), at COLING 
2020, to be held on December 13, 2020, in 
Barcelona, Spain (postponed):

http://multiword.sourceforge.net/mwelex2020


Submitted system description papers must follow 
the workshop submission instructions and will go 
through double-blind peer reviewing.  Their 
acceptance depends on the quality of the paper 
rather than on the ranking in the shared task.  
Authors of the accepted papers will present their 
work as posters/demos in a dedicated session of 
the MWE-LEX 2020 workshop.  The submission of a 
system description paper is not mandatory.


Due to double blind review, participants are asked 
to provide a nickname (i.e. a name that does not 
identify authors, universities, research groups 
etc.) for their systems when submitting results 
and system description papers.



#### Provided corpora


The PARSEME team has prepared corpora in which 
VMWEs were manually annotated: 
https://gitlab.com/parseme/corpora/wikis/home. The 
provided annotations follow the PARSEME 1.2 
guidelines: 
https://parsemefr.lis-lab.fr/parseme-st-guidelines/1.2/.


On March 23, 2020, we released, for each language:

* a training corpus manually annotated for VMWEs;

* a development corpus to tune/optimize the 
systems' parameters; and

* a syntactically parsed raw corpus, not annotated 
for VMWEs, to support semi- and unsupervised 
methods for VMWE discovery (for each language, the 
size is between 12 million tokens and 2.5 billion 
tokens)


On July 1, 2020, we will release, for each language:

* A blind test corpus to be used as input to the 
systems during the evaluation phase, during which 
the VMWE annotations will be kept secret.


On July 3, 2020, participants will have to upload 
their annotated version of the test corpus at

https://www.softconf.com/coling2020/MWE-LEX/


Morphosyntactic annotations (parts of speech, 
lemmas, morphological features, and syntactic 
dependencies) are also provided, both for 
annotated and raw corpora.  Depending on the 
language, the information comes from treebanks 
(mostly Universal Dependencies v2) or from 
automatic parsers trained on UD v2 treebanks 
(e.g., UDPipe).


The annotated training and development corpora are 
released in the CUPT format 
<http://multiword.sourceforge.net/cupt-format/> (which 
is the CoNLL-U format with an extra column for the 
MWE annotations). The raw corpora are released in 
the CoNLL-U format 
<https://universaldependencies.org/format>. The 
blind test corpus will be released in the CUPT 
format, with an underspecified 11th column to be 
predicted. Reference annotations for the test 
corpus will be released after the evaluation phase.
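
Since a CUPT file is a CoNLL-U file with one extra column, the MWE column can 
be read with plain string splitting. A minimal Python sketch follows; the 
function name read_mwe_column and the sample line in the usage note are 
hypothetical, and the sketch assumes tab-separated columns with '#' comment 
lines and blank lines between sentences, as in CoNLL-U. It makes no 
assumption about the tag syntax inside the 11th column.

```python
def read_mwe_column(lines):
    """Yield (token_id, form, mwe_tag) for each token line of a CUPT file."""
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank sentence separators and '#' comment/metadata lines.
        if not line or line.startswith("#"):
            continue
        columns = line.split("\t")
        if len(columns) != 11:
            raise ValueError(
                f"expected 11 columns, got {len(columns)}: {line!r}"
            )
        # Column 1 is the token ID, column 2 the form, column 11 the MWE tag.
        yield columns[0], columns[1], columns[10]
```

For a made-up blind test line whose 11th column is the underspecified 
placeholder "_", the generator yields the token ID, the form and "_". A 
system would replace that last value with its prediction before writing the 
file back out.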


The trial data, training and dev sets are 
available on the shared task's release repository: 
https://gitlab.com/parseme/sharedtask-data/tree/master/1.2

The raw corpus is available on the corpus 
initiative website:

https://gitlab.com/parseme/corpora/wikis/Raw-corpora-for-the-PARSEME-1.2-shared-task


Corpora are available for the following languages: 
German (DE), Greek (EL), Basque (EU), French (FR), 
Irish (GA), Hebrew (HE), Hindi (HI), Italian (IT), 
Polish (PL), Brazilian Portuguese (PT), Romanian 
(RO), Swedish (SV), Turkish (TR), Chinese (ZH).


The amount of annotated data in the training, 
development, test, and raw corpus depends on the 
language.



#### Tracks


System results can be submitted in two tracks:

   * Closed track: Systems using only the provided 
training and development corpora (with VMWE and 
morpho-syntactic annotations) + provided raw corpora.

   * Open track: Systems that may or may not use the provided 
training corpus, plus any additional resources 
deemed useful (MWE lexicons, symbolic grammars, 
wordnets, other raw corpora, word embeddings and 
language models trained on external data, etc.). 
This track includes notably purely symbolic and 
rule-based systems.


In both tracks, the use of the corpora from the 
previous PARSEME shared tasks, and from the 
PARSEME source repositories 
<https://gitlab.com/parseme/corpora/-/wikis/home#active-languages>, 
is strictly forbidden, as material may have moved 
during corpus splits.


Teams submitting systems in the open track will be 
requested to describe and provide references to 
all resources used at submission time. Teams are 
encouraged to favor freely available resources for 
better reproducibility of their results.



#### Evaluation metrics


Participants will provide the output produced by 
their systems on the test corpus in the CUPT 
format, with the 11th column containing their 
predictions. This output will be compared with the 
gold standard (ground truth) using both generic 
and specialised precision, recall and F1 scores.


The evaluation metrics will be the same as for the 
1.1 edition, as described in:

http://multiword.sourceforge.net/PHITE.php?sitesig=CONF&page=CONF_04_LAW-MWE-CxG_2018___lb__COLING__rb__&subpage=CONF_50_Evaluation_metrics


Note that for the 1.2 edition the published 
general ranking will emphasize 3 metrics:

    * global MWE-based

    * global Token-based

    * unseen MWE-based


A VMWE from the test corpus is considered seen if 
a VMWE with the same (multi-)set of lemmas is 
annotated at least once in the training or 
development corpus.
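
The seen/unseen definition above amounts to comparing multisets of lemmas. 
As a rough Python sketch (the function names lemma_key and split_seen_unseen 
are hypothetical, each VMWE is assumed to be already available as a list of 
its lemmas, and this is not the official evaluation script):

```python
from collections import Counter


def lemma_key(lemmas):
    """Order-independent key that still keeps repeated lemmas apart."""
    return frozenset(Counter(lemmas).items())


def split_seen_unseen(train_dev_vmwes, test_vmwes):
    """Partition test VMWEs by whether their lemma multiset occurs in train/dev."""
    seen_keys = {lemma_key(vmwe) for vmwe in train_dev_vmwes}
    seen, unseen = [], []
    for vmwe in test_vmwes:
        if lemma_key(vmwe) in seen_keys:
            seen.append(vmwe)
        else:
            unseen.append(vmwe)
    return seen, unseen
```

With this key, ["decision", "make"] in the test set counts as seen if 
["make", "decision"] was annotated in training, whatever the word order. The 
proportion of unseen VMWEs is then len(unseen) / len(test_vmwes).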


#### Corpus split


For each language, the annotated sentences are 
shuffled and split, in a way which ensures that 
there is a minimum of 300 VMWEs in the test set 
which are unseen in the training + dev sets. This 
means that the natural sequence of sentences in a 
document will not be respected in the proposed 
corpus split. Note that the unseen ratio, that is, the 
proportion of unseen VMWEs with respect to all VMWEs in the 
test set, may vary across languages. To guide 
participants on this hard task, the number and 
rate of unseen VMWEs for the dev corpora are 
available on the shared task website. In both 
tracks, the use of corpora from previous shared task 
editions, and from the PARSEME source repositories 
<https://gitlab.com/parseme/corpora/-/wikis/home#active-languages>, 
is strictly forbidden, as material may have moved 
during corpus splits.



#### Important dates (updated)

   * Feb 19, 2020: trial data and evaluation 
script released

   * Mar 23, 2020: training and development corpus 
+ raw corpus released

   * Jul 01, 2020: blind test corpus released

   * Jul 03, 2020: submission of system results

   * Jul 09, 2020: announcement of results

   * Sep 02, 2020: shared task system description 
papers due (same as regular papers)

   * Oct 16, 2020: notification of acceptance

   * Nov 01, 2020: camera-ready system description 
papers due

   * Dec 13, 2020: shared task session at 
the MWE-LEX 2020 
<http://multiword.sourceforge.net/mwelex2020> workshop 
at Coling 2020



#### Organizing team


Carlos Ramisch, Marie Candito, Bruno Guillaume, 
Agata Savary, Ashwini Vaidya, and Jakub Waszczuk

Contact: par...@nl... 
<mailto:par...@nl...>*

[SIGLEX-MWE] Call for Participation: LDL-2020 Online Workshop

From: John P. M. <jo...@mc...> - 2020-06-13 07:19:53

Apologies for cross-posting
------

Due to the global situation and cancellation of the LREC 2020 conference,
our workshop, Linked Data in Linguistics, will take place virtually on
June 22nd and 23rd. The program and details about participation can be
found here: http://ldl2020.linguistic-lod.org/program.html soon. Meanwhile,
the proceedings are already published on the LREC website:
https://lrec2020.lrec-conf.org/en/workshops-and-tutorials/2020-workshops/

In order to participate in the workshop, please register in advance; a link
to the registration form can be found here:
https://forms.gle/cK8TqpiqDBEWQRk7A

Information about the workshop:

Since its establishment in 2012, the Linked Data in Linguistics (LDL)
workshop series has become the major forum for presenting, discussing and
disseminating technologies, vocabularies, resources and experiences
regarding the application of semantic technologies and the Linked Open Data
(LOD) paradigm to language resources in order to facilitate their
visibility, accessibility, interoperability, reusability, enrichment,
combined evaluation and integration. The LDL workshop series is organized
by the Open Linguistics Working Group of the Open Knowledge Foundation and
has contributed greatly to the emergence and growth of the Linguistic
Linked Open Data (LLOD) cloud. LDL workshops contribute to the discussion,
dissemination and establishment of community standards that drive this
development, most notably the OntoLex-lemon model for lexical resources, as
well as standards for other types of language resources still under
development.

Past years have seen a growing interest in the application of knowledge
graphs and Semantic Web technologies to language resources, and their
publication as linked data on the Web. As of today, a large number of
language resources have been either converted or created natively as linked data
on the basis of data models specifically designed for the representation of
linguistic content. Examples are wordnets, dictionaries, corpora — research
papers describing the creation of these resources were presented at the
previous editions of both LREC and LDL. At the same time, the growth of the
LLOD cloud is far from over: new use-cases call for new data models and new
resources to be created or converted.

However, even though a critical mass of LLOD is already in place, there is
still a pressing need for a robust ecosystem of tools that consume
linguistic linked data. Recently started research networks and European
projects, such as NexusLinguarum, ELEXIS, and Prêt-à-LLOD are working in
the direction of building sustainable infrastructures around LRs, with
linked data as one of the core technologies.

By co-locating the 7th edition of the workshop series with LREC, we
encourage this interdisciplinary community to participate in the dialogue
on these issues, to present and to discuss use cases, experiences, best
practices, recommendations and technologies among each other and in
interaction with the language resource community.

The LDL workshop series has a general focus on LOD-based resources,
vocabularies, infrastructures and technologies as means for managing,
improving and using language resources on the Web. As technology and
resources increasingly converge towards a LOD-based ecosystem, we
particularly encourage submissions on Linked-Data Aware Tools and Services
and Linked Language Resources Infrastructure, i.e. managing, curating and
applying LLOD technologies and resources in a reliable and reproducible way
for the needs of linguistics, NLP and digital humanities.

[SIGLEX-MWE] ACL 2020 Workshop on Figurative Language Processing: Deadline extension

From: Ekaterina S. <ka...@ic...> - 2020-04-13 18:32:26

FINAL CALL FOR PAPERS: Note the *new submission deadline* due to the
COVID-19 situation


ACL 2020 Workshop on Figurative Language Processing

July 9, 2020

https://sites.google.com/view/figlang2020/

Submission deadline: April 23, 2020


WORKSHOP DESCRIPTION

Figurative language processing is a rapidly growing area in NLP,
including processing of metaphors, idioms, puns, irony, sarcasm, as
well as other figures. Characteristic to all areas of human activity
(from poetic to ordinary to scientific) and, thus, to all types of
discourse, figurative language becomes an important problem for NLP
systems. Its ubiquity in language has been established in a number of
corpus studies and the role it plays in human reasoning has been
confirmed in psychological experiments. This makes figurative language
an important research area for computational and cognitive
linguistics, and its automatic identification and interpretation
indispensable for any semantics-oriented NLP application.

The main focus of the workshop will be on computational modelling of
figurative language using state-of-the-art NLP techniques. However,
papers on cognitive, linguistic, social, rhetorical, and applied
aspects are also of interest, provided that they are presented within
a computational, a formal, or a quantitative framework. In addition,
we will also conduct two shared tasks on metaphor and sarcasm
detection.

The workshop invites both full papers and short papers for either oral
or poster presentation.

Submission site: https://www.softconf.com/acl2020/flp/


IMPORTANT DATES

April 23, 2020 Paper submissions due (23:59 West Coast USA time)

May 15, 2020 Notification of acceptance

May 25, 2020 Camera-ready papers due

July 9, 2020 Workshop (taking place virtually, alongside ACL 2020)


WORKSHOP CO-CHAIRS

Beata Beigman Klebanov, Educational Testing Service, USA

Ekaterina Shutova, University of Amsterdam, The Netherlands

Smaranda Muresan, Columbia University, USA

Patricia Lichtenstein, University of California, Merced, USA

Ben Leong, Educational Testing Service, USA

Anna Feldman, Montclair State University, USA

Debanjan Ghosh, Educational Testing Service, USA

[SIGLEX-MWE] CLiC-it 2020, Conference Announcement and First Call for Papers

From: <jm...@un...> - 2020-04-08 17:07:44

*** Apologies for cross postings ***

-------------------------------------------------------------------------

                             CLiC-it 2020
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        Seventh Italian Conference on Computational Linguistics
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

                   November 30th – December 2nd, 2020
                               Bologna

           Conference Announcement and First Call for Papers
                     http://clic2020.ilc.cnr.it

                              ---------

The Italian Conference on Computational Linguistics, CLiC-it, aims at
establishing a reference forum for the Italian community of researchers
working in the fields of Computational Linguistics (CL) and Natural
Language Processing (NLP). CLiC-it promotes and disseminates high-level,
original research on all aspects of automatic language processing, both
written and spoken, and targets state-of-the-art theoretical results,
experimental methodologies, technologies, as well as application
perspectives, which may contribute to the advancement of the CL and NLP
fields.

The spirit of the conference is inclusive. In the conviction that the
complexity of language phenomena needs cross-disciplinary competences,
CLiC-it intends to bring together researchers of related disciplines
such as Computational Linguistics, Natural Language Processing,
Linguistics, Cognitive Science, Machine Learning, Computer Science,
Knowledge Representation, Information Retrieval, and Digital Humanities.
CLiC-it is open to contributions on all languages, with a particular
emphasis on Italian.

The seventh edition of CLiC-it will be held in Bologna, on November 30th
– December 2nd, 2020. The conference will be followed by EVALITA 2020
(http://www.evalita.it/2020), the 7th evaluation campaign of Natural
Language Processing and speech tools for the Italian language. Both
CLiC-it and EVALITA are initiatives of the Italian Association of
Computational Linguistics (AILC — http://www.ai-lc.it).

We know that, due to the COVID-19 pandemic, the situation is uncertain, but
we have to think positively, about rebuilding and reinforcing the
community. We therefore believe that we can all see each other in
December in Bologna and enjoy the event. If we eventually cannot do
so, we will set up a technical solution to hold it electronically and
publish the proceedings. This important moment for exchanging thoughts,
works, solutions and affection will be preserved, in some way.


Requirements
---------

The conference invites the submission of papers on all aspects of
automated language processing. Relevant topics for the conference
include, but are not limited to, the following areas:

     Dialogue, Discourse and Natural Language Generation
     Information Extraction, Information Retrieval and Question Answering
     Language Resources and Evaluation
     Language and Cognition
     Linguistic Issues in CL and NLP
     Machine Learning for NLP
     Machine Translation and Multilinguality
     Morphology and Syntax Processing
     NLP for Digital Humanities
     NLP for Web and Social Media
     Pragmatics and Creativity
     Research and Industrial NLP Applications
     Semantics and Knowledge Representation
     Spoken Language Processing and Automatic Speech Understanding
     Vision, Robotics, Multimodal and Grounding

CLiC-it 2020 has the goal of a broad technical program. We invite papers
in theoretical computational linguistics, empirical/data-driven
approaches, resources and their evaluation, as well as NLP applications
and tools. We also invite papers describing a challenge in the field,
position papers, survey papers, and papers that describe a negative
result.

We are also favouring a parallel submission policy for outstanding
papers that have been submitted and accepted elsewhere in 2020. If you
are the author of a paper accepted at a major international CL
conference or journal in 2020, you can present your work at CLiC-it 2020
in the form of a short research communication, within a dedicated
session at the conference. Research communications will not be published
in the proceedings, but are mostly intended to foster the dissemination of
excellent research within the Italian CL community.


Submission Format
---------

Papers may consist of up to four (4) pages of content, and two (2)
additional pages of references. Papers can be either in English or
Italian, with the abstract both in English and Italian. Accepted papers
will be published on-line and will be presented at the conference either
orally or as a poster. For research communications (see above) a two (2)
page abstract is required. The deadline for all types of submissions is
July 15, 2020.

Submissions should follow the ACL two-column format. We strongly
recommend the use of LaTeX style files or Microsoft Word style files
according to the ACL format, which will be available on the conference
website under “Information for Authors”. Submission must be electronic
in PDF, using the Easychair submission software.

Reviewing will NOT be blind, so there is no need to remove author
information from manuscripts.


Important Dates
---------

15/07/2020: Paper submission deadline
23/09/2020: Notification to authors of reviewing outcome
15/10/2020: Camera-ready version of accepted papers
30/11 - 2/12/2020: CLiC-it Conference, Bologna


People
---------

Program co-chairs:
  Felice Dell’Orletta (Istituto di Linguistica Computazionale
                       “A.Zampolli” – CNR)
  Johanna Monti (Università di Napoli “L’Orientale”)
  Fabio Tamburini (Università di Bologna)


Further information
---------

Conference website:
http://clic2020.ilc.cnr.it/

Mail:
cli...@gm...

[SIGLEX-MWE] New book: The role of constituents in multiword expressions: An interdisciplinary, cross-lingual perspective

From: Agata S. <aga...@gm...> - 2020-03-31 14:33:21


The role of constituents in multiword expressions: 
An interdisciplinary, cross-lingual perspective

Sabine Schulte im Walde and Eva Smolka (eds.)

Volume 4 of Phraseology and Multiword 
Expressions (PMWE), a book series at Language 
Science Press (LSP)

Book URL: http://langsci-press.org/catalog/book/239

Electronic ISBN: 978-3-96110-184-9

Pages: 209

Price: Europe EURO 0

Comment: Open Access


Synopsis

Multiword expressions (MWEs), including noun 
compounds (such as nickname in English and 
Ohrwurm in German), complex verbs (such as give 
up in English and aufgeben in German) and idioms 
(such as break the ice in English and das Eis 
brechen in German), may be interpreted literally 
but often undergo meaning shifts with respect to 
their constituents. Theoretical, psycholinguistic 
as well as computational linguistic research 
remains puzzled by when and how MWEs receive 
literal vs. meaning-shifted interpretations, what 
the contributions of the MWE constituents are to 
the degree of semantic transparency (i.e., meaning 
compositionality) of the MWE, and how literal vs. 
meaning-shifted MWEs are processed and computed.


This edited volume presents an interdisciplinary 
selection of seven papers on recent findings 
across linguistic, psycholinguistic, corpus-based 
and computational research fields and 
perspectives, discussing the interaction of 
constituent properties and MWE meanings, and how 
the constituents contribute to the processing and 
representation of MWEs. The collection is based on 
a workshop at the 2017 annual conference of the 
German Linguistic Society (DGfS) that took place 
at Saarland University in Saarbrücken, Germany.


Available material

Language Science Press, as a fully Open Access 
publisher, provides the following on-line material 
with the volume:

  * PDF files of each chapter and of the whole book
    <https://langsci-press.org/catalog/view/239/1886/1761-1>

  * LaTeX source code <https://github.com/langsci/239> of the
    whole volume (on GitHub)

  * The bibliography
    <https://langsci-press.org/catalog/download/239/1887/1760-1>
    of the whole volume in .bib format

All this comes to the readers for free!

Chapters

  * Constituents in multiword expressions: What is
    their role, and why do we care?
    <https://langsci-press.org/catalog/view/239/1888/1762-1>
    Sabine Schulte im Walde & Eva Smolka

  * Aiming with → arrows ← at particles: Towards a
    conceptual analysis of directional meaning
    components in German particle verbs
    <https://langsci-press.org/catalog/view/239/1889/1763-1>
    Sylvia Springorum & Sabine Schulte im Walde

  * Do semantic features capture a syntactic
    classification of compounds? Insights from
    compositional distributional semantics
    <https://langsci-press.org/catalog/view/239/1890/1764-1>
    Sandro Pezzelle & Marco Marelli

  * Compositionality in English deverbal
    compounds: The role of the head
    <https://langsci-press.org/catalog/view/239/1891/1765-1>
    Gianina Iordăchioaia, Lonneke van der Plas &
    Glorianna Jagfeld

  * What can we learn from novel compounds?
    <https://langsci-press.org/catalog/view/239/1892/1766-1>
    Gary Libben

  * Internal constituent variability and semantic
    transparency in N Prep N constructions in
    Romance languages
    <https://langsci-press.org/catalog/view/239/1893/1767-1>
    Inga Hennecke

  * Production of multiword referential phrases:
    Inclusion of over-specifying information and a
    preference for modifier-noun phrases
    <https://langsci-press.org/catalog/view/239/1894/1768-1>
    Christina L. Gagné, Thomas L. Spalding, J. Claire
    Burry & Jessica Tellis Adams

  * Can you reach for the planets or grasp at the
    stars? – Modified noun, verb, or preposition
    constituents in idiom processing
    <https://langsci-press.org/catalog/view/239/1895/1769-1>
    Eva Smolka & Carsten Eulitz


[SIGLEX-MWE] PhD position in computational linguistics: Tours/Orléans, France

From: Agata S. <aga...@un...> - 2020-03-26 09:47:49

The Universities of Tours and Orléans in France 
offer a PhD position in computational linguistics:

*Design and automatic induction of a multiword 
expression lexicon at the service of linguistic 
diversity*

Application deadline: 14 May 2020 (or until filled)

More details can be found at:
http://www.info.univ-tours.fr/ICVL/doc/jobs/2020-PhD-topic-MWE-lexicon-induction.pdf

-- 
Agata Savary
Associate Professor
University of Tours
3 place Jean-Jaurès, 41029 Blois, France
phone: +33 (0)2 54 55 21 47
aga...@un...
http://www.info.univ-tours.fr/~savary/
PMWE book series: https://langsci-press.org/catalog/series/pmwe
ICVL federation: http://www.info.univ-tours.fr/ICVL

[SIGLEX-MWE] PARSEME ST1.2 train/dev data released!

From: Carlos R. <car...@li...> - 2020-03-24 00:20:24

Dear all,

We are happy to announce the release of the **training and development
data** for the PARSEME shared task 1.2
<http://multiword.sourceforge.net/PHITE.php?sitesig=CONF&page=CONF_02_MWE-LEX_2020___lb__COLING__rb__&subpage=CONF_40_Shared_Task>
on semi-supervised identification of verbal multiword expressions (VMWEs):

https://gitlab.com/parseme/sharedtask-data/tree/master/1.2


### Languages

We provide full training and development sets for 14 languages: German
(DE), Greek (EL), Basque (EU), French (FR), Irish (GA), Hebrew (HE), Hindi
(HI), Italian (IT), Polish (PL), Brazilian Portuguese (PT), Romanian (RO),
Swedish (SV), Turkish (TR) and Chinese (ZH).


### Annotated data

We provide .cupt files <http://multiword.sourceforge.net/cupt-format> that
contain VMWE annotations and morphosyntactic data. The annotation
guidelines for VMWEs were slightly extended with respect to previous
editions to accommodate Chinese and Swedish phenomena, and to fix minor
issues in Hindi-specific tests, leading to PARSEME guidelines 1.2
<https://parsemefr.lis-lab.fr/parseme-st-guidelines/1.2/>.
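As a rough illustration of how the VMWE annotations can be read, here is a minimal Python sketch. It assumes the layout described on the cupt-format page: the ten tab-separated CoNLL-U columns followed by a final PARSEME:MWE column, where `*` means no VMWE, `_` means underspecified, the first token of a VMWE is marked `id:CATEGORY`, later tokens carry the bare `id`, and several codes on one token are joined with `;`. The function name `read_vmwes` is ours, not part of any PARSEME tool, and the official documentation remains the reference.

```python
def read_vmwes(cupt_text):
    """Return, for each sentence, a list of (category, [lemmas]) pairs.

    Assumption: LEMMA is the 3rd column and PARSEME:MWE is the last column.
    """
    sentences = []
    current = None  # maps VMWE id -> [category, lemmas]; None outside a sentence
    for line in cupt_text.splitlines() + [""]:
        if not line.strip():
            # A blank line closes the current sentence, if any.
            if current is not None:
                sentences.append([(cat, lemmas) for cat, lemmas in current.values()])
                current = None
            continue
        if line.startswith("#"):
            continue  # metadata/comment line
        if current is None:
            current = {}
        cols = line.split("\t")
        lemma, mwe = cols[2], cols[-1]
        if mwe in ("*", "_"):
            continue  # token belongs to no VMWE, or annotation is underspecified
        for code in mwe.split(";"):
            mwe_id, _, category = code.partition(":")
            entry = current.setdefault(mwe_id, [None, []])
            if category:
                entry[0] = category
            entry[1].append(lemma)
    return sentences
```

Called on the text of a .cupt file, this yields one list per sentence (empty when the sentence has no annotated VMWE), which is enough to collect lemma lists for further processing.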

The accompanying morphosyntactic information (POS tags, lemmas,
morphological features and/or syntactic dependencies) uses the UD v2 scheme
<http://universaldependencies.org/> (the exact version of the UD-based data
depends on the language). Depending on the language, the morphosyntactic
information was manually or automatically annotated. All annotations are
available under open licenses, notably various flavors of the Creative
Commons license.

We remind you that the blind test data will be released on April 28, and
the submission of system results is due on April 30.


### Additional raw corpora

We also provide "raw" corpora, meant to help identify VMWEs that were
unseen at training time. Here are the instructions for downloading these
raw corpora
<https://gitlab.com/parseme/corpora/-/wikis/Raw-corpora-for-the-PARSEME-1.2-shared-task>
.

The raw corpora were automatically parsed with UD v2 tools (the exact
version depending on the language) and are provided in the CoNLL-U
<https://universaldependencies.org/format.html> format. Their sizes vary
from language to language, see the raw corpora page
<https://gitlab.com/parseme/corpora/-/wikis/Raw-corpora-for-the-PARSEME-1.2-shared-task>
for statistics.


### Split of the annotated data

We provide training (train), development (dev) and test sets for each
language. The test set will be released later, after the evaluation phase
is over. The data split was designed with a focus on the identification of
unseen VMWEs. The split is random, but we controlled the following factors
for each language:

  - Test contains about 300 VMWEs which are unseen in train+dev

  - Dev contains about 100 VMWEs which are unseen in train

  - The ratio of unseen VMWEs in test with respect to train+dev (resp. dev
with respect to train) is as close as possible to an average (see below for
details)

Unseen VMWEs are defined as in the evaluation script, that is, a VMWE in test
(resp. dev) is considered unseen in train+dev (resp. train) if its
multi-set of lemmas does not occur as an annotated VMWE, with the same
multi-set of lemmas, in train+dev (resp. train).
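The definition above can be sketched in a few lines of Python. This is only an illustration of the multi-set comparison, not the evaluation script itself: the function names are ours, each VMWE is assumed to be given as a plain list of lemmas, and any normalisation the official script may apply (for example case folding) is not modelled.

```python
from collections import Counter


def lemma_multiset(lemmas):
    """Hashable multi-set of lemmas: order is ignored, repetitions are kept."""
    return frozenset(Counter(lemmas).items())


def unseen_vmwes(reference_vmwes, candidate_vmwes):
    """Return the candidate VMWEs whose lemma multi-set never occurs
    among the reference VMWEs (e.g. reference = train, candidates = dev)."""
    seen = {lemma_multiset(v) for v in reference_vmwes}
    return [v for v in candidate_vmwes if lemma_multiset(v) not in seen]
```

With this reading, a dev VMWE with lemmas `["decision", "take"]` counts as seen if train contains `["take", "decision"]`, because only the multi-set matters, not the word order.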

The ratios of unseen VMWEs vary from language to language.  For most
languages, the ratios of unseen VMWEs in test (with respect to train+dev)
and in dev (with respect to train) are comparable, but this was not
possible for languages with little data.

To choose the final split, we first estimated the number of sentences in
test (resp. dev) needed to provide 300 (resp. 100) VMWEs unseen in train+dev
(resp. train). Then, we ran several random splits and selected one for
which the unseen ratio is as close as possible to the average.
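The selection step can be pictured with the following Python sketch. It is a simplified reconstruction under stated assumptions, not the organisers' actual script: each sentence is represented as a list of VMWEs, each VMWE as a list of lemmas, the test size is taken as already estimated, and the names `choose_split`, `test_size`, `n_trials` and `seed` are hypothetical.

```python
import random
from collections import Counter


def _key(lemmas):
    # Multi-set of lemmas as a hashable value.
    return frozenset(Counter(lemmas).items())


def unseen_ratio(train_sents, test_sents):
    """Share of test VMWEs whose lemma multi-set does not occur in train."""
    seen = {_key(v) for sent in train_sents for v in sent}
    test_vmwes = [v for sent in test_sents for v in sent]
    if not test_vmwes:
        return 0.0
    unseen = sum(1 for v in test_vmwes if _key(v) not in seen)
    return unseen / len(test_vmwes)


def choose_split(sentences, test_size, n_trials=20, seed=0):
    """Run n_trials random splits and return (ratio, train, test) for the
    split whose unseen ratio is closest to the average over all trials."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        shuffled = sentences[:]
        rng.shuffle(shuffled)
        test, train = shuffled[:test_size], shuffled[test_size:]
        trials.append((unseen_ratio(train, test), train, test))
    average = sum(ratio for ratio, _, _ in trials) / len(trials)
    return min(trials, key=lambda t: abs(t[0] - average))
```

Picking the split nearest to the average, rather than the one with the most unseen VMWEs, keeps the released split representative of what a random split of that size typically looks like.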


### Guidelines to participants

During the system development phase, and for computing the results on the
test sets, the participants are free to use train+dev in any way. In other
words, the dev set can be added to the train set for machine learning
purposes.

In both tracks, **no data from the previous editions should be used**.

The evaluation metrics
<http://multiword.sourceforge.net/PHITE.php?sitesig=CONF&page=CONF_04_LAW-MWE-CxG_2018___lb__COLING__rb__&subpage=CONF_50_Evaluation_metrics>
will be the same as for edition 1.1. However, for edition 1.2, the
published general ranking will emphasize 3 metrics:

* global MWE-based,

* global token-based,

* unseen MWE-based.
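For orientation, the idea behind an MWE-based score can be sketched as follows. This is a simplified reading in which a predicted VMWE counts as correct only if its set of token positions exactly matches a gold VMWE in the same sentence; VMWE categories are ignored, the token-based measure is not covered, and the official evaluation script linked above remains authoritative. The function name and the data representation are our own assumptions.

```python
def mwe_based_scores(gold, predicted):
    """Precision, recall and F1 over exactly matching VMWEs.

    gold / predicted: sets of (sentence_id, frozenset_of_token_ids).
    """
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Under this reading, the unseen MWE-based variant would apply the same computation after restricting both sets to VMWEs that are unseen in the training data.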

Do not forget to subscribe to the participants' mailing list
<https://groups.google.com/forum/#!forum/verbalmwe>. We will also post the
latest updates on the shared task 1.2 website
<http://multiword.sourceforge.net/sharedtask2020/>.

As seen in the previous PARSEME shared task editions, supervised VMWE
identifiers are rather effective for seen VMWEs, but perform very poorly on
unseen ones. We hope that this highly multilingual dataset will foster the
development of systems with increased ability to identify VMWEs unseen at
training time.

This has been a tremendous collective effort, possible only with the strong
commitment of many annotators, language leaders, organizers and technical
support experts. We would like to thank all contributors for the time and
enthusiasm they invested in the creation of this resource. In particular,
the following people helped us by managing language-specific annotations
and preparing the raw corpora: Abigail Walsh, Archna Bhatia, Chaya
Liebeskind, Federico Sangati, Johanna Monti, Menghan Jiang, Hongzhi Xu,
Rafael Ehren, Renata Ramisch, Sara Stymne, Timm Lichte, Tunga Güngör, Uxoa
Iñurrieta, Verginica Barbu Mititelu, Voula Giouli, Zeynep Yirmibeşoğlu.


All the best,

Carlos Ramisch, Bruno Guillaume, Agata Savary, Jakub Waszczuk, Marie
Candito and Ashwini Vaidya
