International Conference on the Computational Processing of the Portuguese Language – Conference Programme – PROPOR 2016, 13–15 July 2016, Tomar, Portugal

Conference Programme - PROPOR 2016propor2016.di.fc.ul.pt/wp-content/uploads/2016/07/Conference-Progr… · Conference Programme PROPOR 2016 – 15, July , 2016 Tomar, Portugal. 1


International Conference on the Computational Processing of the Portuguese Language

Conference Programme

PROPOR 2016, 13–15 July 2016, Tomar, Portugal


VENUE

Hotel dos Templários

Conference Rooms: Infante I, Gualdim & Rio.

Largo Cândido dos Reis, 1 – Apartado 91

2304-909 Tomar, Portugal

(LAT) 36.461, (LONG) 24.850

Tel. + 351 249 310 100

Fax + 351 249 322 191

[email protected]

http://www.hoteldostemplarios.pt


COMMITTEES

General chair

António Branco (Univ Lisboa, Portugal)

Program chair (language)

Paulo Quaresma (Univ Évora, Portugal)

Program chair (speech)

André Adami (Univ Caxias do Sul, Brazil)

Editorial chairs

João Silva (Univ Lisboa, Portugal)

Ricardo Ribeiro (ISCTE-IUL, Portugal)

Demos committee

Vládia Pinheiro (Univ Fortaleza, Brazil), chairperson

Hugo Oliveira (Univ Coimbra, Portugal)

PROPOR 2016 Student Research Workshop chairs

Pedro Balage (Univ São Paulo, in São Carlos, Brazil)

Fernando Batista (ISCTE-IUL, Portugal)

Co-located Workshops committee

Pablo Gamallo (Univ Santiago Compostela, Galiza, Spain), chairperson

Maria das Graças Volpe Nunes (USP-SC, Brazil)

Amália Mendes (Univ Lisboa, Portugal)

Renata Vieira (PUC-RS, Brazil)

José Ramom Pichel (Imaxin|Software, Galiza, Spain)

Alberto Simões (Univ Minho, Portugal)


Tutorials chairs

Fernando Perdigão (Univ Coimbra, Portugal)

Margarita Correia (Univ Lisboa, Portugal)

Lúcia Specia (Univ Sheffield, UK)

Chair of Best Dissertation on Language Technology for Portuguese Contest

Aline Villavicencio (UFRGS, Brazil)

Chair of Jobshop and Innovation Forum

Rosa Del Gaudio (Higher Functions, Portugal)

Publicity and Sponsorship committee

Daniela Braga (defined crowd, USA), chairperson

Ana Tavares (Univ Lisboa, Portugal)

Organization committee

António Branco (Univ Lisboa, Portugal), chairperson

Daniel Pereira (Univ Lisboa, Portugal)

Ana Tavares (Univ Lisboa, Portugal)

Steering committee

Thiago Pardo (Univ São Paulo – São Carlos, Brazil), chairperson

António Branco (Univ Lisboa, Portugal)

Sara Candeias (Microsoft, Portugal)

Nuno Mamede (Univ Lisboa, Portugal)

Cláudia Freitas (PUC-Rio de Janeiro, Brazil)


Program committee

Alberto Abad (Univ Lisboa, Portugal)

Alberto Simões (Univ Minho, Portugal)

Alexandre Rademaker (IBM, Brazil)

Aline Villavicencio (UFRGS, Brazil)

Amália Andrade (Univ Lisboa, Portugal)

Amália Mendes (Univ Lisboa, Portugal)

Ana Luís (Univ Coimbra, Portugal)

Anabela Barreiro (INESC-ID, Portugal)

André Adami (Univ Caxias do Sul, Brazil)

Andreia Bonfante (UFMT, Brazil)

Andreia Rauber (UCPEL, Brazil)

António Bonafonte (UPC, Spain)

António Branco (Univ Lisboa, Portugal)

António Serralheiro (Academia Militar, Portugal)

António Teixeira (Univ Aveiro, Portugal)

Ariani Di Felippo (UFSCAR, Brazil)

Augusto Soares Silva (Univ Católica, Portugal)

Bento da Silva (UNESP, Brazil)

Berthold Crysmann (CNRS, France)

Brett Drury (USP-SC, Brazil)

Carlos Prolo (UFRN, Brazil)

Cícero Nogueira dos Santos (IBM Watson, USA)

Daniela Braga (defined crowd, USA)

David Martins de Matos (Univ Lisboa, Portugal)

Derek Wong (Univ Macau, China)

Diamantino Freitas (Univ Porto, Portugal)

Eraldo Rezende Fernandes (UFMS, Brazil)

Eric Laporte (Univ Paris Est, France)

Fábio Kepler (UNIPAMPA, Brazil)

Fábio Violaro (UNICAMP, Brazil)

Fernando Batista (ISCTE-IUL, Portugal)

Fernando Perdigão (Univ Coimbra, Portugal)

Fernando Resende Junior (UFRJ, Brazil)

Gladis Almeida (UFSCAR, Brazil)

Helena Caseli (UFSCAR, Brazil)

Helena Moniz (INESC-ID, Portugal)

Hugo Meinedo (Microsoft, Portugal)

Hugo Oliveira (Univ Coimbra, Portugal)

Irene Rodrigues (Univ Évora, Portugal)

Isabel Falé (Univ Aberta, Portugal)


Isabel Trancoso (Univ Lisboa, Portugal)

Ivandré Paraboni (USP, Brazil)

João Balsa (Univ Lisboa, Portugal)

João Luís Rosa (USP-SC, Brazil)

João Silva (Univ Lisboa, Portugal)

Joaquim Llisterri (Univ Autonoma de Barcelona, Spain)

Jorge Baptista (Univ Algarve, Portugal)

José João Almeida (Univ Minho, Portugal)

Laura Alonso Alemany (Univ Cordoba, Argentina)

Leandro Oliveira (EMBRAPA, Brazil)

Lúcia Specia (Univ Sheffield, UK)

Magali Sanches Duran (USP-SC, Brazil)

Margarita Correia (Univ Lisboa, Portugal)

Maria das Graças Volpe Nunes (USP-SC, Brazil)

Maria José Finatto (UFRGS, Brazil)

Mário Silva (Univ Lisboa, Portugal)

Nelson Neto (UFP, Brazil)

Norton Roman (USP-EACH, Brazil)

Nuno Cavalheiro Marques (Univ Nova Lisboa, Portugal)

Nuno Mamede (Univ Lisboa, Portugal)

Pablo Gamallo (Univ Santiago Compostela, Galiza, Spain)

Palmira Marrafa (Univ Lisboa, Portugal)

Paulo Gomes (Univ Coimbra, Portugal)

Paulo Quaresma (Univ Évora, Portugal)

Plínio Barbosa (UNICAMP, Brazil)

Renata Vieira (PUC-RS, Brazil)

Ricardo Ribeiro (ISCTE-IUL, Portugal)

Rosa Del Gaudio (Higher Functions, Portugal)

Ruy Luiz Milidiú (PUC-Rio, Brazil)

Sandra Aluísio (USP-SC, Brazil)

Sara Candeias (Microsoft, Portugal)

Ted Pedersen (Univ Minnesota, USA)

Teresa Gonçalves (Univ Évora, Portugal)

Thiago Pardo (USP-SC, Brazil)

Thomas Pellegrini (Univ Toulouse III – Paul Sabatier, France)

Valeria de Paiva (Nuance Communications, USA)

Valéria Feltrim (UEM, Brazil)

Vera Strube de Lima (PUC-RS, Brazil)

Vítor Rocio (Univ Aberta, Portugal)

Vládia Pinheiro (Univ Fortaleza, Brazil)


PROGRAM AT A GLANCE

13 July, Wed

09h00 – tutorial T1; student workshop + 3 workshops (WS + W1, 2, 3)

lunch

13h30 – tutorial T2

coffee break

opening address (OA)

invited talk (I1)

welcome reception

14 July, Thu

09h00 – 3 oral presentations per track (O1a, O1b)

coffee break

11h00 – 3 oral presentations per track (O2a, O2b)

lunch

networking event

conference dinner

awards ceremony

15 July, Fri

09h00 – invited talk (I2)

10h00 – 2 oral presentations per track (O3a, O3b)

coffee break

11h30 – 3 oral presentations per track (O4a, O4b)

lunch

jobshop; posters

15h30 – demos

16h00 – coffee served

community meeting

farewell


Tutorial 1: Gaussian Processes for NLP

Day 1, Wednesday, 13 July, 9h00 – 12h30

Room: Rio

T1: Gaussian Processes for Natural Language Processing

Instructor: Daniel Beck (University of Sheffield, UK)

Abstract: As a case study, the tutorial will cover the basics of Quality Estimation (QE) [Blatz et al., 2004, Specia et al., 2009], where the goal is to predict a quality metric for unseen Machine Translation outputs. Since this metric is usually a real-valued score, regression models are common for this task, and GPs are the state of the art. The tutorial will start with a simple feature-based GP and then gradually improve on this model by incorporating different properties of GPs, including Automatic Relevance Determination (ARD), multi-output models [Cohn and Specia, 2013] and structural kernels [Beck et al., 2015]. The GPy toolkit will be used to demonstrate these models in practice. The final part of the tutorial will cover advanced topics and highlight other previous work in GPs for NLP applications.

Outline:

1. GP Regression Fundamentals (60 mins)
– From Gaussian Distributions to Processes
– Kernels
– Inference
– Model Selection

2. NLP case study: Quality Estimation (50 mins)
– Task Overview
– Single Output
– (coffee break)
– Multi Output
– Structural Kernels

3. Further Topics (50 mins)
– Other NLP Applications
– Advanced Models
– Scalability
– Optimization
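As a rough illustration of the feature-based GP regression the tutorial starts from, the sketch below computes the GP posterior mean and variance with an RBF kernel that has one lengthscale per feature (the ARD idea). It is not the tutorial's material and does not use GPy: it is plain Python, and the feature vectors, scores, and hyperparameter values are made-up assumptions for illustration only.

```python
import math

def rbf_ard(x, y, variance, lengthscales):
    """RBF kernel with one lengthscale per input dimension (ARD)."""
    sq = sum(((a - b) / l) ** 2 for a, b, l in zip(x, y, lengthscales))
    return variance * math.exp(-0.5 * sq)

def cholesky(A):
    """Lower-triangular L with L L^T = A, for a symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_lower(L, b):
    """Forward substitution: solve L z = b."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z

def solve_lower_transposed(L, z):
    """Back substitution: solve L^T a = z."""
    n = len(z)
    a = [0.0] * n
    for i in reversed(range(n)):
        a[i] = (z[i] - sum(L[k][i] * a[k] for k in range(i + 1, n))) / L[i][i]
    return a

def gp_predict(X, y, x_star, variance=1.0, lengthscales=None, noise=1e-2):
    """Posterior mean and latent variance of a zero-mean GP at x_star."""
    if lengthscales is None:
        lengthscales = [1.0] * len(X[0])
    n = len(X)
    # Kernel matrix of the training inputs plus observation noise on the diagonal.
    K = [[rbf_ard(X[i], X[j], variance, lengthscales) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = solve_lower_transposed(L, solve_lower(L, y))
    k_star = [rbf_ard(x, x_star, variance, lengthscales) for x in X]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = rbf_ard(x_star, x_star, variance, lengthscales) - sum(t * t for t in v)
    return mean, var

if __name__ == "__main__":
    # Hypothetical QE data: two made-up features per translation, one quality score each.
    features = [[0.2, 1.0], [0.5, 0.3], [0.9, 0.8]]
    scores = [0.1, 0.4, 0.7]
    # A long lengthscale on the second feature makes it matter less (ARD).
    print(gp_predict(features, scores, [0.6, 0.5], lengthscales=[0.5, 10.0]))
```

In practice the hyperparameters (variance, lengthscales, noise) would be fitted to the data, which corresponds to the "Model Selection" item in the outline; here they are fixed by hand.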

Schedule:

09h00 – Start

10h30 – 11h00 – Coffee break, lobby Bar

12h30 – End

12h30 – 13h30 – Lunch break


Tutorial 2: Translation Quality Estimation

Day 1, Wednesday, 13 July, 13h30 – 16h00

Room: Rio

T2: Translation Quality Estimation

Instructors: Lúcia Specia (University of Sheffield, UK) and Carolina Scarton (University of Sheffield, UK)

Abstract: In this tutorial we will introduce the background and state of the art on translation quality estimation at different levels of granularity (word, sentence and document) and discuss open challenges in the area. We will then demonstrate QuEst++ (https://github.com/ghpaetzold/questplusplus), a framework for quality estimation, including how to set up an experiment and run the code and how to implement new features and add other machine learning algorithms to the pipeline.

Tutorial Structure:

The tutorial runs for 2h30, structured as follows:

• 1h: theoretical part, presenting the QE task, its levels of feature extraction and prediction, and its challenges.

• 1h30: hands-on QuEst++: how to install, run, add a new feature, and add a new machine learning algorithm.

Participants will be asked to install tools and dependencies on their laptops before the tutorial (e.g. SRILM, TreeTagger).

Schedule:

13h30 – Start

16h00 – End

16h00 – 16h30 – Coffee break, lobby Bar


Student Workshop

Day 1, Wednesday, 13 July, 9h00 – 12h30

Foyer Infante I - Gualdim

WS: Student Research Workshop

Co-chairs: Pedro Balage (Priberam / University of São Paulo) and Fernando Batista (ISCTE-IUL, Portugal)

Abstract: The Student Research Workshop (SRW) is designed to provide a venue for student researchers in Computational Linguistics and Natural Language Processing to present their work and receive feedback from an experienced researcher in the field.

Poster Presentations:

ArgMine: A Framework for Argumentation Mining – Gil Rocha, Henrique Lopes Cardoso and Jorge Teixeira

A simple but potentially powerful approach for multilingual parsing – Pablo Botton Da Costa, Helena de Medeiros Caseli and Fabio Natanael Kepler

BooViews: Aspect-based Sentiment Analysis on Product Reviews combining SVM and CRF in Portuguese – Guilherme Nobre, Alan Justino, Fernando Tadao, Danilo Nunes, Daniel Takabayashi and Rayssa Küllian

Mapping Grammatical Structures onto Proficiency Levels – Rui Talhadas

NEPAL: A Tool for Never-Ending Paraphrase Learning – Paulo César Polastri, Helena De Medeiros Caseli and Eloize Rossi Marques Seno

Question Answering Based on Distributional Semantics – Vladislav Maraev

Uncovering differences between synonyms and antonyms in Word Space Models – Bruna Thalenberg

Schedule:

09h00 – Start

10h30 – 11h00 – Coffee break, lobby Bar

12h30 – End

12h30 – 13h30 – Lunch break


Workshop: ASSIN

Day 1, Wednesday, 13 July, 9h00 – 12h30

Room: Gualdim

W1: ASSIN: Avaliação de Similaridade Semântica e Inferência Textual

Chairs: Erick Fonseca (ICMC/University of São Paulo, Brazil), Sandra Aluísio (ICMC/University of São Paulo, Brazil), Marcelo Criscuolo (ICMC/University of São Paulo, Brazil), Leandro Santos (ICMC/University of São Paulo, Brazil)

Abstract: ASSIN (Avaliação de Similaridade Semântica e Inferência Textual) is an evaluation forum for two related and relevant tasks: semantic similarity and textual entailment recognition. It introduces a large-scale dataset annotated for both phenomena in Portuguese, allowing the development of machine learning-based NLP systems capable of solving them.

The task of measuring semantic similarity was introduced at SemEval 2012 in the Semantic Textual Similarity (STS) track, and textual entailment recognition first appeared in the RTE Challenges. The SICK shared task brought both together at SemEval 2014, and now ASSIN presents both tasks with Portuguese data.

The ASSIN workshop will promote discussions on the subject, its difficulties and their possible solutions, as well as a comparison of the contribution of different computational techniques, tools and linguistic resources.

Program:

09h00 - 09h30 - Overview of the Shared Task and Corpus ASSIN

09h30 - 09h50 - Solo Queue

09h50 - 10h10 - Reciclagem

10h10 - 10h30 - Blue Man Group

11h00 - 11h20 - ASAPP

11h20 - 11h40 - LEC-UNIFOR

11h40 - 12h00 - L2F/INESC-ID

12h00 - 12h30 - Talk by Hugo Gonçalo Oliveira (University of Coimbra, Portugal): Portuguese Lexical Knowledge Bases

Schedule:

09h00 – Start

10h30 – 11h00 – Coffee break, lobby Bar

12h30 – End

12h30 – 13h30 – Lunch break


Workshop: LexSem+Logics

Day 1, Wednesday, 13 July, 12h00 – 16h00

Room: Gualdim

W2: LexSem+Logics Workshop: Third Workshop on Logics and Ontologies / First Workshop on Lexical Semantics for Lesser-Resourced Languages

Chairs: Steven Neale (University of Lisbon, Portugal), Valeria de Paiva (Nuance Communications, USA), Arantxa Otegi (University of the Basque Country, Spain), Alexandre Rademaker (IBM Research Lab & FGV/EMAp, Brazil)

Abstract: LexSem+Logics 2016 combines the 1st Workshop on Lexical Semantics for Lesser-Resourced Languages and the 3rd Workshop on Logics and Ontologies. Its aim is to bring together researchers interested in a range of topics across these two areas: advances in the recognition and disambiguation of lexical semantic units, exploiting new or existing tools for resolving lexical semantic issues in NLP, formal approaches to dealing with semantics, the combination of logical and statistical methods for acquiring and using ontologies for computational semantics, and many more.

Program:

12h00 – 12h30 - Invited Speaker: Hugo Gonçalo Oliveira (University of Coimbra, Portugal) – Portuguese Lexical Knowledge Bases (in conjunction with the ASSIN workshop).

13h30 - 14h00 - Plurality in Wordnets. Livy Real and Valeria de Paiva.

14h00 - 14h30 - Dicionário Creativo: The Construction of a Fuzzy Onomasiological Thesaurus from Multiple Sources. Felipe Islaji de Albuquerque and Hugo Gonçalo Oliveira.

14h30 - 15h00 - Universal POS Tagging for Portuguese: Issues and Opportunities. Valeria de Paiva and Livy Real.

15h00 - 16h00 - Invited Speaker: Pablo Gamallo (University of Santiago de Compostela, Spain) – Strategies for Open Information Extraction.

Schedule:

12h00 – Start

12h30 – 13h30 – Lunch break

16h00 – End

16h00 – 16h30 – Coffee break, lobby Bar


Workshop: Corpora and Tools for Processing Corpora

Day 1, Wednesday, 13 July, 9h00 – 16h00

Room: Infante I

W3: Corpora and Tools for Processing Corpora

The workshop is co-organized by the QTLeap project.

Chair: Hilário Leal Fontes (DGT – European Commission)

Abstract: Much of the popularity of statistical machine translation is due to the availability of software packages that make it increasingly easy and fast to train a working machine translation system. For this deployment to take place, these packages have been seen as requiring only a sufficiently large volume of data, including some form of parallel corpora of raw text.

The present workshop seeks to improve on this state of affairs by helping to map both the available parallel datasets suitable for statistical machine translation systems and the available language processing tools useful for their preparation. In pursuing this goal, the workshop also seeks to exchange ideas and disseminate best practices that help to foster the ELRC and CEF.AT (http://www.lr-coordination.eu) initiatives.

Program:

Welcome and Introduction

09h00 - ELRC and CEF.AT initiatives and MT@EC — António Branco and Hilário Leal Fontes

09h30 - Processing of EU Multilingual Corpora — M.T. Carrasco

10h00 - Language processing in MT@EC — Hilário Leal Fontes

Multilingual resources

11h00 - CM2News: Towards a Corpus for Multilingual Multi-document Summarization — Ariani Di Felippo

11h30 - Language Resources and Processing Tools at the University of Lisbon in the NLX Group Collection — António Branco et al

12h00 - Language resources for information extraction and semantic computing – NLP at PUCRS — Renata Vieira et al


Task-specific resources

14h00 - MWE-aware corpus processing with the mwetoolkit and word embeddings — Aline Villavicencio et al

14h30 - ZAC: Zero Anaphora Corpus, A Corpus for Zero Anaphora Resolution in Portuguese — Jorge Baptista et al

Beyond machine translation

15h00 - Resources for Monolingual Translation: a case study of Text Simplification for Portuguese — Rodrigo Wilkens et al

15h30 - Building a Brazilian Portuguese – Brazilian Sign Language Parallel Corpus using Motion Capture Data — José Mario De Martino et al

16h00 - Discussion and wrap up

Schedule:

09h00 – Start (morning session)

10h30 – 11h00 – Coffee break, lobby Bar

12h30 – 13h30 – Lunch break

14h00 – Start (afternoon session)

16h00 – End

16h00 – 16h30 – Coffee break, lobby Bar


Opening address: Camões

Day 1, Wednesday, 13 July, 16h30

Room: Infante I

OA: Opening address

Ana Paula Laborinho, President, Camões – Institute for Cooperation and Language

Schedule:

16h30 – Start

17h00 – End


Invited talk 1: Dealing with Unwanted Information in Speech

Day 1, Wednesday, 13 July, 17h00 – 18h30

Room: Infante I

I1: Dealing with Unwanted Information in Speech

Invited Speaker: Hynek Hermansky (Johns Hopkins University, Baltimore, Maryland, USA and Brno University of Technology, Czech Republic)

Abstract: Besides a message, speech carries information from a number of additional sources, which introduce irrelevant variability (noise). As discussed in the talk, such noise comes in several distinct forms. Noise with predictable effects can often be suppressed analytically, and we discuss some techniques for doing so. Unpredictable effects of expected noise are typically dealt with successfully by extensive training of a machine using noisy data. Such multi-style training is currently the technique of choice in most practical situations and is often hard to beat. However, we argue for alternative adaptive multi-stream approaches, which could in principle also deal with noises that are unexpected. Our current efforts in this direction are discussed.

Schedule:

17h00 – Start

18h30 – End


Oral Presentations O1a: Speech processing or O1b: Corpora

Day 2, Thursday, 14 July, 09h00 – 10h30

Rooms: Infante I / Gualdim

O1a: Speech processing

An Automatic Phonetic Aligner for Brazilian Portuguese with a Praat Interface – Gleidson Sousa and Nelson Neto

Evaluating Phonetic Spellers for User-generated Content in Brazilian Portuguese – Gustavo Mendonça, Lucas Avanço, Magali Duran, Erick Fonseca, Maria Das Graças Nunes and Sandra Aluísio

Design and Analysis of a Database to Evaluate Children’s Reading Aloud Performance – Jorge Proença, Dirce Celorico, Carla Lopes, Sales Dias Miguel, Michael Tjalve, Andreas Stolcke, Sara Candeias and Fernando Perdigão

O1b: Corpora

Lexical Semantics Annotation for Enriched Portuguese Corpora – Steven Neale, Rita Pereira, João Silva and António Branco

Crawling by Readability Level – Jorge Wagner Filho, Rodrigo Wilkens, Leonardo Zilio, Marco Idiart and Aline Villavicencio

Towards a Statistical-enriched Corpus Containing Portuguese Collocations in Use: reviewing possible extraction tools – Ângela Costa and Luísa Coheur

Schedule:

09h00 – Start

10h30 – End

10h30 – 11h00 – Coffee break, lobby Bar


Oral Presentations O2a: Lexical resources or O2b: Language processing applications

Day 2, Thursday, 14 July, 11h00 – 12h30

Rooms: Infante I / Gualdim

O2a: Lexical resources

LX-DSemVectors: Word Embeddings Resources for the Portuguese Language – João Rodrigues, António Branco, Steven Neale and João Silva

Making a Virtue of Necessity: a Verb Lexicon – Valeria De Paiva, Fabricio Chalub, Livy Real and Alexandre Rademaker

Groundwork for the Automatic Creation of a Fuzzy Portuguese Wordnet – Hugo Oliveira

O2b: Language processing applications

Improving Question-Answering for Portuguese using Triples Extracted from Corpora – Ricardo Rodrigues and Paulo Gomes

Analysis of Temporal Adverbial Phrases for Portuguese-Chinese Machine Translation – Siyou Liu and Ana Luísa Varani Leal

Applying Lexical-Conceptual Knowledge for Multilingual Multi-Document Summarization – Ariani Di Felippo, Fabrício Tosta and Thiago Pardo

Schedule:

11h00 – Start

12h30 – End

12h30 – 13h30 – Lunch break


Networking, Conference Dinner & Awards Ceremony

Day 2, Thursday, 14 July, 14h00 – 24h00

Networking event

Conference dinner

Awards ceremony

Camões Prize

The Camões Institute awards the “Camões Prize 2016 for the Technologies for the Portuguese Language” to the best paper in PROPOR 2016.

IILP Prize

The International Institute for the Portuguese Language (IILP) awards a prize to the best PhD dissertation in PROPOR 2016.

Best MSc Dissertation

Schedule:

14h00 – Start

18h00 – End

18h00 – 18h30 – Coffee break, lobby Bar

20h00 – Conference dinner

21h30 – Awards ceremony

24h00 – End


Invited talk 2: Natural Language Understanding

Day 3, Friday, 15 July, 09h00 – 10h00

Room: Infante I

I2: Natural Language Understanding using Knowledge Bases and Random Walks

Invited Speaker: Eneko Agirre (University of the Basque Country, Spain)

Abstract: One of the key challenges for creating the semantic representation of a text is mapping words found in a natural language text to their meanings. This task, Word Sense Disambiguation (WSD), is confounded by the fact that words have multiple meanings, or senses, dictated by their use in a sentence and the domain. We present an algorithm that employs random walks over the graph structure of knowledge bases, yielding state-of-the-art results for WSD on both general and biomedical texts, as well as in Named-Entity Disambiguation. We also show that the same algorithm can be successfully applied to Word Similarity and to enrich texts with related concepts, yielding improvements in Information Retrieval. Finally, we argue that knowledge-based approaches are complementary to other approaches, like supervised machine learning or unsupervised distributional embeddings.
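The idea of disambiguating a word by random walks over a knowledge-base graph can be sketched with personalized PageRank, one standard way to run such walks from the context words. Whether this matches the speaker's algorithm in detail is an assumption; the tiny graph, the sense labels, and the parameter values below are invented for illustration only.

```python
def build_graph(edges):
    """Undirected adjacency lists from a list of (node, node) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)
    return graph

def personalized_pagerank(graph, seeds, damping=0.85, iterations=50):
    """Random walk with restart: the walker jumps back to the seed nodes
    with probability 1 - damping at every step."""
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in graph}
    rank = dict(restart)
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) * restart[n] for n in graph}
        for node, neighbours in graph.items():
            share = damping * rank[node] / len(neighbours)
            for m in neighbours:
                new_rank[m] += share
        rank = new_rank
    return rank

def disambiguate(graph, candidate_senses, context_nodes):
    """Pick the candidate sense that receives the most random-walk mass
    when the walk restarts from the context words."""
    rank = personalized_pagerank(graph, set(context_nodes))
    return max(candidate_senses, key=lambda sense: rank[sense])

# Hypothetical miniature knowledge base around two senses of "bank".
EDGES = [
    ("bank#finance", "money"), ("bank#finance", "loan"), ("money", "loan"),
    ("bank#river", "water"), ("bank#river", "shore"), ("water", "shore"),
    ("bank#finance", "place"), ("bank#river", "place"),
]
GRAPH = build_graph(EDGES)

if __name__ == "__main__":
    # Context words of an invented sentence such as "the bank approved the loan".
    print(disambiguate(GRAPH, ["bank#finance", "bank#river"], ["money", "loan"]))
```

The same ranking could be reused for word similarity by comparing the rank distributions obtained from two different seed words, which is in the spirit of the applications the abstract lists.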

Schedule:

09h00 – Start

10h00 – End


Oral Presentations O3a: Deep language processing or O3b: Lexical processing

Day 3, Friday, 15 July, 10h00 – 11h00

Rooms: Infante I / Gualdim

O3a: Deep language processing

Automatic Semantic Role Labeling on Non-revised Syntactic Trees of Journalistic Texts – Nathan Hartmann, Magali Duran and Sandra Aluísio

Syntax Deep Explorer – José Correia, Jorge Baptista and Nuno Mamede

O3b: Lexical processing

Entity Linking with Distributional Semantics – Pablo Gamallo and Marcos Garcia

Extracting and Structuring Open Relations from Portuguese Text – Sandra Collovini, Gabriel Machado and Renata Vieira

Schedule:

10h00 – Start

11h00 – End

11h00 – 11h30 – Coffee break, lobby Bar


Oral Presentations O4a: Semantic processing or O4b: Sentiment analysis

Day 3, Friday, 15 July, 11h30 – 13h00

Rooms: Infante I / Gualdim

O4a: Semantic processing

Towards Keyphrase Assignment for Texts in Portuguese Language – Raquel Silveira, Vládia Pinheiro and Vasco Furtado

Automatic Classification of the Complexity of Nonfiction Texts in Portuguese for Early School Years – Nathan Hartmann, Livia Cucatto, Danielle Brants and Sandra Aluísio

Improving Coreference Resolution with Semantic Knowledge – Evandro Fonseca, Renata Vieira and Aline Vanin

O4b: Sentiment analysis

Comparing Approaches for Subjectivity Classification: a Study on Portuguese Tweets – Silvia Moraes, André Santos, Matheus Redecker, Rackel Machado and Felipe Meneguzzi

Determining the Level of Clients’ Dissatisfaction from their Commentaries – Ana Forte and Pavel Brazdil

TOPIE: an Open-source Opinion Mining Pipeline to Analyze Consumers’ Sentiment in Brazilian Portuguese – Ellen Souza, Tiago Alves, Ingryd Teles, Adriano Oliveira and Cristine Gusmão

Schedule:

11h30 – Start

13h00 – End

13h00 – 14h00 – Lunch break


Jobshop and innovation forum

Day 3, Friday, 15 July, 14h30 – 17h30

Room: Infante I

JIF: Jobshop and innovation forum

Chair: Rosa Del Gaudio (HF – Higher Functions, Portugal)

The 1st Jobshop and Innovation Forum (JIF) aims to create a fruitful environment to promote the exchange of results, talent and partnerships between researchers in the field of Language Technologies and companies using and developing these technologies. The overall goal is to support the development of common methodologies, resources, tools, applications and projects that can be shared among researchers and practitioners. The Innovation Forum offers companies and organizations the opportunity to showcase their most innovative ideas to a motivated audience looking for new market ideas, collaborators, and partners.

Participating companies at the 1st JIF - Jobshop and Innovation Forum:

Camões’ mission is to propose and implement the Portuguese cooperation policy and to coordinate activities undertaken by other public entities, to disseminate the Portuguese language and culture and to manage the foreign Portuguese teaching network.

Defined crowd is an intelligent data platform for Artificial Intelligence and Machine Learning, offering efficient data pipelines to collect, process and enrich training data.

PC Medic excels in technical support for end users of information technology systems, thanks to Question Answering algorithms and Machine Translation.

Unbabel enables fast and scalable multilingual Customer Service by combining AI and a crowd of Human translators, helping the world communicate in multiple languages.

Voice Interaction develops speech processing technology which is used in a variety of areas and in a number of countries.

Microsoft and ISCTE present their common research projects, in particular a new project called ACP Street Libraries.

Schedule:

14h30 – Start

16h00 – Coffee break

17h30 – End


Posters Session

Day 3, Friday, 15 July, 14h30 – 17h30

Room: Infante I

Posters session

A Construction Grammar Approach for Pronominal Clitics in European Portuguese Tânia Marques and Katrien Beuls

A Comparative Evaluation of QA Systems over List Questions Patricia Gonçalves and António Branco

A Model for Textual Entailment Based on Linguistic Rules Sandro Rigo and Evandro Flores

Automatic Generation of Internet Memes from Portuguese News Titles Hugo Oliveira, Diogo Costa and Alexandre Pinto

Building a Question-Answering Corpus using Social Media and News Articles Paulo Cavalin, Flavio Figueiredo, Maira Bayser, Luis Moyano, Heloisa Candello, Ana Appel and Renan Souza

Characterizing Opinion Mining: a Systematic Mapping Study of the Portuguese Language Ellen Souza, Douglas Vitório, Dayvid Castro, Adriano Oliveira and Cristine Gusmão

Domain-Specific Hybrid Machine Translation from English to Portuguese João Rodrigues, Luís Gomes, Steven Neale, Andreia Querido, Nuno Rendeiro, Sanja Štajner, João Silva and António Branco

Evaluating Progression of Alzheimer’s Disease by Regression and Classification Methods in a Narrative Language Test in Portuguese Sandra Aluisio, André Cunha and Carolina Scarton

Finding Compositional Rules for Determining the Semantic Orientation of Phrases António Santos, Carlos Ramos and Nuno Marques

FrameNet-Based Automatic Suggestion of Translation Equivalents Simone Peron-Corrêa, Alexandre Diniz, Meire Lara, Ely Matos and Tiago Torrent

Improving POS Tagging Across Portuguese Variants with Word Embeddings Erick Fonseca and Sandra Aluisio

Investigating Machine Learning Approaches for Sentence Compression in Different Application Contexts for Portuguese Fernando Nóbrega and Thiago Pardo

Joining Forces for Multiword Expression Identification Leonardo Zilio, Rodrigo Wilkens, Luís Santos, Marco Idiart, Eric Wehrli and Aline Villavicencio

Sentiment Analysis for Brazilian Portuguese over a Skewed Class Corpora Henrico Brum, Filipe Araujo and Fabio Kepler

The Portuguese B2SG: a semantic test for distributional thesaurus Rodrigo Wilkens, Leonardo Zilio, Eduardo Ferreira and Aline Villavicencio

Schedule:

14h30 – Start 16h00 – Coffee break 17h30 – End

Demos Session Day 3, Friday, 15 July, 15h30 – 17h30

Room: Infante I

Demos Session

Chairs: Vládia Pinheiro (UNIFOR, Brazil) and Hugo Gonçalo Oliveira (CISUC, University of Coimbra, Portugal)

Accepted Demos:

A Computational Tool for Automated Language Production Analysis Aimed at Dementia Diagnosis – Sandra Aluísio, Andre Cunha, Cintia Toledo and Carolina Scarton

Annotating Portuguese Corpora with Word Senses Using LX-SenseAnnotator – Steven Neale and António Branco

CORP: Coreference Resolution for Portuguese – Evandro Fonseca, Renata Vieira and Aline Vanin

Hookit: natural language processing in a semantic based platform for social commerce – Sandro José Rigo, Vinicius Dambros Andrade and Denis Andrei Araújo

LetsRead – Tool to Automatically Evaluate Children’s Reading Aloud Performance – Jorge Proença, Dirce Celorico, Carla Lopes, Sara Candeias and Fernando Perdigão

NILC-WISE: An Easy-to-use Web Interface for Summary Evaluation with the ROUGE Metric – Fernando Antônio Asevedo Nóbrega and Thiago Alexandre Salgueiro Pardo

OpenWordnet-PT – Fabricio Chalub, Livy Real, Valeria de Paiva and Alexandre Rademaker

Poe, now you can TryMe: Interacting with a Poetry Generation System – Hugo Gonçalo Oliveira

Syntax Deep Explorer – José Correia, Jorge Baptista and Nuno Mamede

VITHEA-Kids: Improving the Linguistic Skills of Children with Autism Spectrum Disorder – Vânia Mendonça, Cláudia Filipe, Luísa Coheur and Alberto Sardinha

XCrimes: Information Extractor for the Public Safety and National Defense Areas – Daniel Sullivan, Vládia Pinheiro, Rafael Pontes and Vasco Furtado

Schedule:

15h30 – Start 16h00 – Coffee break 17h15 – Best Demo Award 17h30 – End

Community meeting & Farewell Day 3, Friday, 15 July, 17h30 – 18h30

Room: Gualdim

Community meeting & Farewell

Schedule:

17h30 – Start 18h30 – End

Workshop Abstracts Student Workshop

Day 1, Wednesday, 13 July

ArgMine: A Framework for Argumentation Mining Gil Rocha, Henrique Lopes Cardoso and Jorge Teixeira

Abstract: The aim of argumentation mining is the automatic detection and identification of the argumentative structure contained within a piece of natural language text. In this paper we present the ArgMine Framework: an alignment of tools and processes that facilitate and partially automate argumentation mining research. We also report on a preliminary exploitation of the framework, where we address argumentative zoning, a sub-task of argumentation mining, whose aim is to automatically select the zones of the text that contain argumentative content. The target corpus used to train the supervised machine learning algorithms was manually annotated and is composed of Portuguese news articles, a domain where argumentation mining does not seem to have been applied before. The results of our experiments are presented and critically analyzed.

A simple but potentially powerful approach for multilingual parsing Pablo Botton Da Costa, Helena de Medeiros Caseli and Fabio Natanael Kepler

Abstract: Current approaches to multilingual dependency parsing do not exploit any support from sister languages. This paper presents a very different approach to dealing with data sparsity, language transfer and related problems. Our approach draws on sister-language theory, combining resources from such languages, with the aim of obtaining a model that can parse many different languages. We present a new architecture that does not use any type of oracle to make good parsing decisions. We hope that this type of approach can make dynamic parsing decisions.

BooViews: Aspect-based Sentiment Analysis on Product Reviews combining SVM and CRF in Portuguese Guilherme Nobre, Alan Justino, Fernando Tadao, Danilo Nunes, Daniel Takabayashi and Rayssa Küllian

Abstract: Customers use product reviews to gather opinions regarding a product before making a purchase decision. Reviews are available in several e-commerce businesses and are written by real customers. Companies can use these reviews to harness consumer information with artificial intelligence algorithms and automatically extract product information, such as review polarity and product aspects. In this paper, we cover techniques to classify review polarity, extract product aspects and classify them. The resulting SVM classifier achieved 91.7% precision in classifying the sentiment of the reviews, a 74.2% F1-score using CRF to extract the product aspects and 79.9% precision in classifying aspect polarity.

Mapping Grammatical Structures onto Proficiency Levels

Rui Talhadas

Abstract: In the development of scientifically validated curricula that promote a consistent and appropriate learning process of progressive complexity, it is necessary to determine at what stage of this process students of Portuguese as a Foreign Language (PFL) are linguistically prepared to learn and use the different language structures. This project intends to map the use of various grammatical and lexical structures, namely: (i) vocabulary; (ii) the use of verbal tenses and moods; (iii) the use of conjunctive adverbs, conjunctions and other discourse connectors; (iv) the internal sentence structures; and (v) the passive construction; in correlation with the learning levels defined in the Common European Framework of Reference for Languages (CEFR), and the evolution in the learning process of Portuguese as a Foreign Language.

NEPAL: A Tool for Never-Ending Paraphrase Learning Paulo César Polastri, Helena De Medeiros Caseli and Eloize Rossi Marques Seno

Abstract: Using different words to express or convey the same message is an ordinary task in any language, and one of the best ways to do so is using paraphrases. As a consequence, a proper treatment of paraphrases is crucial for several NLP applications, such as Machine Translation, Multidocument Summarization and Natural Language Generation. This paper describes NEPAL, an automatic system capable of learning paraphrases involving single words in Brazilian Portuguese. The extraction is made from a bilingual parallel corpus composed of news texts that are available online. To do so, NEPAL applies the never-ending machine learning approach [5] together with [2]. The experiment described in this paper shows promising results, with 86% of paraphrases correctly extracted.

Question Answering Based on Distributional Semantics Vladislav Maraev

Abstract: An NLP application for question answering provides an insight into a computer's understanding of human language. Many areas of NLP have recently obtained a number of improvements from deep learning and distributional semantic representations. This paper offers a view into the current work that considers replication of state-of-the-art results of applying distributional semantic models and convolutional neural networks to the question answering task.

Uncovering differences between synonyms and antonyms in Word Space Models Bruna Thalenberg

Abstract: This research will investigate means to uncover differences between synonym and antonym pairs in vector space models of semantics. One of the main criticisms of such models is that they would be incapable of distinguishing between related words, such as antonyms, and similar ones, namely synonyms and hypernyms. However, Scheible et al. (2013) proposed a method that could establish this difference, with promising but improvable results. We will adapt their study to Brazilian Portuguese, verifying whether the results are similar, and try to improve its accuracy. We will use the CETENFolha corpus, made available through Linguateca by the NILC research group, and the vectors will be created with gensim, a Python implementation of word2vec, which was originally created by Mikolov et al. (2013).

Workshop Abstracts ASSIN: Avaliação de Similaridade Semântica e Inferência Textual

Day 1, Wednesday, 13 July

ASSIN: Evaluation of Semantic Similarity and Textual Inference Erick Rocha Fonseca, Leandro Borges dos Santos, Marcelo Criscuolo, Sandra Maria Aluísio

Abstract: Recognizing Textual Entailment (RTE) and Semantic Textual Similarity (STS) are two related natural language processing tasks dealing with pairs of text passages. The former aims to determine whether the meaning of one passage entails the other, while the latter assigns a similarity score to the pair. This paper presents the results of the ASSIN shared task and its corpus, annotated for both tasks in the two major varieties of the Portuguese language (Brazilian and European). The corpus differs from similar ones in the literature in that its RTE classes are Entailment, Neutral and Paraphrase, and in the fact that it is composed of sentences extracted from newswire texts. Six teams took part in the shared task, exploring different strategies for the tasks.

Solo Queue at ASSIN: Mix of a Traditional and an Emerging Approaches Nathan Hartmann

Abstract: In this paper we present a proposal to automatically label the similarity between a pair of sentences and the results obtained on the ASSIN 2016 sentence similarity shared task. Our proposal consists of using a classical bag-of-words feature, the TF-IDF model, and an emergent feature, word embeddings. TF-IDF is used to relate texts which share words. Word embeddings are known to capture the syntax and semantics of a word. Following Mikolov et al. (2013), the sum of embedding vectors can compose the meaning of a sentence. Using both features, we are able to capture the words shared between sentences and their semantics. We use linear regression to solve this problem, since the dataset is labeled with real numbers between 1 and 5. Our results are promising. Although the usage of embeddings alone did not surpass our baseline system, when we combined it with TF-IDF, our system achieved better results than using only TF-IDF as a feature. Our results are the best of the ASSIN 2016 sentence similarity shared task.
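
As an illustration only, the two kinds of feature named in this abstract can be sketched as follows. This is a minimal sketch and not the authors' system: the smoothed IDF formula, the toy two-dimensional embeddings and the function names are all assumptions made for the example, and the regression step that would map the two features to a 1-5 score is left out.

```python
import math
from collections import Counter


def tfidf_vectors(sentences):
    """Return one sparse TF-IDF vector (a dict) per tokenised sentence."""
    n = len(sentences)
    df = Counter(tok for sent in sentences for tok in set(sent))
    # Smoothed IDF (an assumption): terms found in every sentence keep weight 1.
    idf = {tok: math.log((1 + n) / (1 + d)) + 1.0 for tok, d in df.items()}
    return [{tok: tf * idf[tok] for tok, tf in Counter(sent).items()}
            for sent in sentences]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def summed_embedding(tokens, embeddings, dim):
    """Sum the vectors of the known tokens; unknown tokens contribute zeros."""
    total = [0.0] * dim
    for tok in tokens:
        for i, x in enumerate(embeddings.get(tok, [0.0] * dim)):
            total[i] += x
    return dict(enumerate(total))


# Hypothetical 2-dimensional embeddings, invented for this example.
TOY_EMBEDDINGS = {
    "gato": [1.0, 0.0], "felino": [0.9, 0.1],
    "dorme": [0.0, 1.0], "descansa": [0.1, 0.9],
}


def pair_features(s1, s2, corpus):
    """Return (TF-IDF cosine, summed-embedding cosine) for a sentence pair."""
    vecs = tfidf_vectors(corpus + [s1, s2])
    tfidf_sim = cosine(vecs[-2], vecs[-1])
    emb_sim = cosine(summed_embedding(s1, TOY_EMBEDDINGS, 2),
                     summed_embedding(s2, TOY_EMBEDDINGS, 2))
    return tfidf_sim, emb_sim
```

For the pair "o gato dorme" / "o felino descansa", the TF-IDF feature is low because only "o" is shared, while the embedding feature is high because the toy vectors place the paraphrased words close together. A regressor trained on gold similarity scores would then combine the two values.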

Blue Man Group at ASSIN: Using Distributed Representations for Semantic Similarity and Entailment Recognition Luciano Barbosa, Paulo Cavalin, Victor Guimarães and Matthias Kormaksson

Abstract: In this paper, we present the methodology and the results obtained by our team, dubbed Blue Man Group, in the ASSIN (from the Portuguese Avaliação de Similaridade Semântica e Inferência Textual) competition, held at PROPOR 2016. Our team’s strategy consisted of evaluating methods based on semantic word vectors, following two distinct directions: 1) to make use of low-dimensional, compact, feature sets, and 2) deep learning-based strategies dealing with high-dimensional feature vectors. Evaluation results demonstrated that the first strategy was more promising, so that the results from the second strategy have been discarded. As a result, by considering the best run of each of the six teams, we have been able to achieve the best accuracy and F1 values in entailment recognition, in the Brazilian Portuguese set, and the best F1 score overall. In the semantic similarity task, our team was ranked second in the Brazilian Portuguese set, and third considering both sets.

ASAPP at ASSIN: Automatic Semantic Alignment for Phrases applied to Portuguese Ana Alves, Hugo Oliveira and Ricardo Rodrigues

Abstract: We present two distinct approaches to the ASSIN shared evaluation task, which, given a collection of pairs of sentences in Portuguese, poses the following challenges: (a) computing the semantic similarity between the sentences of each pair; and (b) testing whether one sentence paraphrases or entails the other. The first approach, dubbed Reciclagem, is exclusively based on heuristics computed on Portuguese semantic networks. The second, dubbed ASAPP, is based on supervised machine learning. The results of Reciclagem enable an indirect comparison of Portuguese semantic networks. They were then used as features of the ASAPP approach, together with lexical and syntactic features. After comparing our results with those in the gold collection, it is clear that ASAPP consistently outperforms Reciclagem. This happens both for European Portuguese and Brazilian Portuguese, where the entailment performance reaches an accuracy of 80.28% +/- 0.019, and the semantic similarity scores are 66.5% +/- 0.021 correlated with those given by humans.

LEC_UNIFOR at ASSIN: FlexSTS - A Framework for Semantic Textual Similarity Janio Freire, Vládia Pinheiro and David Feitosa

Abstract: Since 2012, the Semantic Evaluation series (SemEval) has proposed the task of Semantic Textual Similarity (STS) as an evaluation theme, demonstrating the relevance of this research topic. In 2016, the task was first proposed for the Portuguese language, in the Workshop of Semantic Textual Similarity and Inference Evaluation (ASSIN), held during the conference PROPOR 2016. In this paper, we present FlexSTS, a flexible framework for STS combining several components such as morphological and syntactic parsers, knowledge and lexical databases, machine learning algorithms, and algorithms for alignment and similarity. For ASSIN, FlexSTS was instantiated into three STS systems for Portuguese. The results were compared with a baseline approach that uses the Dice coefficient.
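
For readers unfamiliar with the baseline mentioned here, the Dice coefficient between two sets A and B is 2|A ∩ B| / (|A| + |B|). The sketch below applies it to sets of word tokens, which is an assumption: the abstract does not say which units (tokens, lemmas or character n-grams) the baseline compares, and the function name is hypothetical.

```python
def dice_coefficient(tokens_a, tokens_b):
    """Dice coefficient over the sets of tokens of two sentences."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    if not set_a and not set_b:
        # Two empty sentences are treated as identical (a design choice).
        return 1.0
    return 2 * len(set_a & set_b) / (len(set_a) + len(set_b))


# Two sentences sharing two of their three distinct tokens.
score = dice_coefficient(["a", "b", "c"], ["b", "c", "d"])
```

The score ranges from 0.0 for sentences with no token in common to 1.0 for sentences with identical token sets.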

INESC-ID at ASSIN: Measuring Semantic Similarity and Recognizing Textual Entailment Pedro Fialho, Ricardo Marques, Bruno Martins, Luísa Coheur and Paulo Quaresma

Abstract: In this work we present INESC-ID@ASSIN, the system from INESC-ID that competed in the 2016 joint evaluation effort entitled “Avaliação de Similaridade Semântica e Inferência Textual” (ASSIN) in the tasks of semantic similarity and textual entailment recognition. INESC-ID@ASSIN addresses the problem of detecting sentence similarity as a regression task, and it addresses textual entailment as a classification task. Although INESC-ID@ASSIN relies mainly on simple lexical features for detecting paraphrases and recognizing textual entailment, promising results were achieved.

Workshop Abstracts LexSem+Logics Workshop: Third Workshop on Logics and Ontologies First Workshop on Lexical Semantics for Lesser-Resourced Languages

Day 1, Wednesday, 13 July

Plurality in Wordnets Livy Real and Valeria de Paiva

Abstract: We investigate the features of Princeton WordNet associated with nouns that are essentially plural. This means exploring the Princeton WordNet feature classifiedByUsage: plural that labels synsets and words commonly used in the plural. We decided to investigate how this feature works for Portuguese and here we discuss the best way to encode this kind of lexical information in OpenWordnet-PT, an open wordnet for Portuguese.

Dicionário Criativo: The Construction of a Fuzzy Onomasiological Thesaurus from Multiple Sources Felipe Iszlaji de Albuquerque and Hugo Gonçalo Oliveira

Abstract: This paper reports on the construction of a fuzzy onomasiological thesaurus for Dicionário Criativo, a website specialized in creative writing. In order to build this thesaurus, we merged distinct thesauri using similarity metrics and measuring the importance of each word inside the concept represented. To shape the concepts, we first used a clustering algorithm and then a graph-based technique. This process generates larger semantic groups, the core for our thesaurus and a helpful tool for creative writing. To rank the position of words inside a group, we relied on the frequency of each word in each semantic group.

Universal POS Tagging for Portuguese: Issues and Opportunities Valeria de Paiva and Livy Real

Abstract: Part-of-Speech (POS) tagging consists of labeling every token of a text with its correct morphosyntactic category and is considered by many a solved task in NLP. However, there are many tag systems in use, tags are not very easy to compare, there is no official gold standard and hence comparing the performance of different systems is a nightmare, even for English. This is even more the case for less-resourced languages. Recently a collective of researchers decided to tackle this issue and there is a new initiative, the Universal Dependencies project, which is developing cross-linguistically consistent treebanks and annotations for many languages. We look at how the coarse categories of POS tags defined by the Universal Dependencies project would work for Portuguese and describe the issues of aligning them with the POS tags produced by FreeLing, the open source NLP system we use.

Workshop Abstracts Workshop on Corpora and Tools for Processing Corpora

Day 1, Wednesday, 13 July

CM2News: Towards a Corpus for Multilingual Multi-document Summarization Ariani Di Felippo.

Abstract: This paper describes the ongoing construction of CM2News, a semantic-annotated corpus for fostering research on multilingual multi-document summarization. The corpus comprises 20 clusters of news texts in English and Brazilian Portuguese languages and a set of multi-document manual and automatic summaries. All the source texts have a layer of semantic annotation at lexical level. Some clusters also have annotation at sentence level, as well as alignment of texts and human summaries. The corpus is a result delivered within the context of the Sustento Project, which aims at generating linguistic knowledge for multi-document summarization. The corpus design and the manual annotation tasks are detailed in this paper.

Language Resources and Processing Tools at the University of Lisbon in the NLX Group Collection António Branco, João Silva, João Rodrigues, Francisco Costa, Pedro Martins, Eduardo Ferreira, Filipe Nunes, Sérgio Castro, Catarina Carvalheiro, Sílvia Pereira, Mariana Avelãs, Clara Pinto, Steven Neale, Andreia Querido, Rita de Carvalho, Marisa Campos, Nuno Rendeiro, Catarina Correia, Patrícia Gomes, Diana Amaral and Rita Valadas Pereira

Abstract: In this paper we present many of the language resources and processing tools developed and made available at the University of Lisbon by the NLX - Natural Language and Speech Group. These were developed over the years to support the development of a wide array of natural language applications, including machine translation.

Language resources for information extraction and semantic computing - NLP at PUCRS Renata Vieira, Daniela Do Amaral, Sandra Collovini, Evandro Fonseca, Artur Freitas, Larissa Freitas, Roger Leitzke Granada, Lucas Hilgert, Lucelene Lopes, Daniela Schmidt, Bernardo Severo, Marlo Souza and Cassia Trojahn.

Abstract: In this paper we present an overview of the language resources developed at the Natural Language Processing Lab at PUCRS, making them available to the research community.

MWE-aware corpus processing with the mwetoolkit and word embeddings Silvio Ricardo Cordeiro, Carlos Ramisch, Marco Idiart and Aline Villavicencio

Abstract: Multiword expressions (MWEs) are an integral part of language whose importance has long been recognized. However, their heterogeneous characteristics have proved a challenge to computational tasks and applications, including machine translation. In this paper we discuss how MWEs can be dealt with using the mwetoolkit, a language-independent platform for MWE related tasks. In particular, we concentrate on 3 tasks: (1) corpus processing for type identification from corpora, (2) token identification and corpus annotation, and (3) the construction of semantic distributional models for compositionality detection based on word embeddings. The mwetoolkit provides a uniform platform for creating MWE resources, and we discuss its use for both English and Portuguese MWE processing.

ZAC: Zero Anaphora Corpus, A Corpus for Zero Anaphora Resolution in Portuguese Jorge Baptista, Simone Pereira and Nuno Mamede

Abstract: This paper describes a corpus of Brazilian Portuguese texts built in view of the construction of an Anaphora Resolution system, which is part of a fully-fledged Natural Language Processing system (STRING). The ZAC corpus is aimed at the resolution of the so-called zero-anaphora, that is, an anaphora relation where the anaphoric expression (or anaphor) has been zeroed. The paper briefly discusses the linguistic issues in the process of zero anaphora resolution, and describes the annotation process in detail, as well as the main aspects of the anaphoric relations thus annotated.

Resources for Monolingual Translation: a case study of Text Simplification for Portuguese Rodrigo Wilkens, Leonardo Zilio, Marco Idiart, Jorge Wagner Filho, Eduardo Ferreira, Luís Santos, Bianca Pasqualini and Aline Villavicencio

Abstract: Text simplification can be seen as a monolingual translation task, and for precise results various resources are needed. Thus, in this paper we examine resources available for Portuguese and English. Among them, we discuss simple and general corpora, dictionaries of simple words, thesauri, lists of multiword expressions, and resources containing semantic role labeling. The difference in terms of quantity and coverage of manually constructed resources for these two languages reveals the gap that still needs to be addressed for Portuguese.

Building a Brazilian Portuguese - Brazilian Sign Language Parallel Corpus using Motion Capture Data José Mario De Martino, Paula D. Paro Costa, Ângelo Benetti, Luciana Aguera Rosa, Kate Mamhy Oliveira Kumada and Ivani Rodrigues Silva

Abstract: Brazilian Sign Language, or Libras, is the language officially recognized as the first language of the Brazilian deaf community by a federal law. Nevertheless, deaf Brazilians still face considerable challenges in accessing public services or advancing their studies, since most basic and advanced information is still only available in written Brazilian Portuguese (BP). In general, the knowledge of written BP by deaf citizens is far from satisfactory. In this context, automatic machine translation from BP into Libras is a promising approach to help deaf individuals leverage their knowledge and represents a valuable option to reduce communication barriers, especially in situations where a sign language interpreter is not available. This paper describes our approach to building a comprehensive BP-Libras parallel corpus. The approach combines a methodology based on the translation of school textbooks with a thorough description of sign gestures and facial expressions based on motion capture data. The methodology also seeks to handle the challenges of working with a sign language that still lacks school vocabulary.

Main Conference Abstracts 01a Speech processing Day 2, Thursday, 14 July

An Automatic Phonetic Aligner for Brazilian Portuguese with a Praat Interface Gleidson Sousa and Nelson Neto

Abstract: The analysis of the phonetic entities of speech nearly always requires the alignment of an audio file with its phonetic transcription. However, it is an extremely labor-intensive task. An automatic alignment tool has modules that depend on the language and, while there are many public resources for some languages (e.g., English and French), the resources for Brazilian Portuguese (BP) are still limited. This work describes the development of an automatic phonetic alignment tool for BP, consisting of a grapheme-to-phone converter, a syllabification system and HTK-based acoustic models. This aligner is implemented and freely distributed as a plug-in for Praat. Performance tests are presented, comparing the current proposal with an existing tool.

Evaluating Phonetic Spellers for User-generated Content in Brazilian Portuguese Gustavo Mendonça, Lucas Avanço, Magali Duran, Erick Fonseca, Maria Das Graças Nunes and Sandra Aluísio

Abstract: Recently, spell checking (or spelling correction systems) has regained attention due to the need of normalizing user-generated content (UGC) on the web. UGC presents new challenges to spellers, as its register is much more informal and contains much more variability than traditional spelling correction systems can handle. This paper proposes two new approaches to deal with spelling correction of UGC in Brazilian Portuguese (BP), both of which take into account phonetic errors. The first approach is based on three phonetic modules running in a pipeline. The second one is based on machine learning, with soft decision making, and considers context-sensitive misspellings. We compared our methods with others on a human annotated UGC corpus of reviews of products. The machine learning approach surpassed all other methods, with 78.0% correction rate, very low false positive (0.7%) and false negative rate (21.9%).

Design and Analysis of a Database to Evaluate Children’s Reading Aloud Performance Jorge Proença, Dirce Celorico, Carla Lopes, Miguel Sales Dias, Michael Tjalve, Andreas Stolcke, Sara Candeias and Fernando Perdigão

Abstract: To evaluate the reading performance of children, human assessment is usually involved, where a teacher or tutor has to take time to individually estimate the performance in terms of fluency (speed, accuracy and expression). Automatic estimation of reading ability can be an important alternative or complement to the usual methods, and can improve other applications such as e-learning. Techniques must be developed to analyse audio recordings of read utterances by children and detect the deviations from the intended correct reading i.e. disfluencies. For that goal, a database of 284 European Portuguese children from 6 to 10 years old (1st-4th grades) reading aloud amounting to 20 hours was collected in private and public Portuguese schools. This paper describes the design of the reading tasks as well as the data collection procedure. The presence of different types of disfluencies is analysed as well as reading performance compared to known curricular goals.

Main Conference Abstracts 01b Corpora

Day 2, Thursday, 14 July

Lexical Semantics Annotation for Enriched Portuguese Corpora Steven Neale, Rita Pereira, João Silva and António Branco

Abstract: The semantic annotation of corpora has an important role to play in ensuring that sentences occurring in natural language texts are correctly understood based on their intended context. Two examples of lexical semantic units that contribute to this knowledge are word senses – which allow words with multiple meanings to be understood based on the context in which they are used – and named entities – which can be disambiguated and linked back to the specific encyclopedic resources that describe them. In this paper, we describe the construction of lexical semantically-annotated corpora for Portuguese, annotated with both word senses linked to senses in a Portuguese wordnet and named entities linked to Portuguese Wikipedia entries using DBpedia. The result is a gold-standard lexical semantically-annotated resource that is useful in supporting the training and evaluation of tools for the disambiguation of these lexical units in Portuguese.

Crawling by Readability Level Jorge Wagner Filho, Rodrigo Wilkens, Leonardo Zilio, Marco Idiart and Aline Villavicencio

Abstract: Despite important applications and a large number of publications, the availability of annotated corpora for research in the area of Readability Assessment is still very limited. On the other hand, the Web is increasingly being used by researchers as a source of written content to build very large and rich corpora, in the Web as Corpus (WaC) initiative. This paper proposes a framework for automatic generation of large corpora classified by readability. It adopts a supervised learning method to incorporate a readability filter, based on features with low computational cost, into a crawler, in order to collect texts targeted at a specific reading level. We evaluate this framework by comparing a readability-assessed web crawled corpus to a reference corpus. The results obtained indicate that these features are good at separating texts from level 1 (initial grades) from other levels. As a result of this work two corpora were constructed: the Wikilivros Readability Corpus, classified by grade level, and a crawled WaC classified by readability level.


Main Conference Abstracts 01b Corpora

Day 2, Thursday, 14 July

Towards a Statistical-enriched Corpus Containing Portuguese Collocations in Use: reviewing possible extraction tools

Ângela Costa and Luísa Coheur

Abstract: Collocations are a major problem for any natural language processing task, from machine translation to summarization. With the goal of building a corpus with collocations, enriched with statistical information about them, we survey, in this paper, four tools for extracting collocations. These tools allowed us to collect sentences with collocations, and also to gather statistics on this particular type of co-occurrence, such as Mutual Information and log-likelihood values.
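As an illustration of the kind of statistic mentioned in this abstract, the following minimal sketch computes pointwise mutual information (PMI), a common collocation variant of Mutual Information, for adjacent word pairs. It is not taken from the paper or from any of the surveyed tools; the function name `pmi_scores` and the toy sentence are assumptions made only for this example.

```python
import math
from collections import Counter


def pmi_scores(tokens):
    """Return the PMI (in bits) of every adjacent word pair in `tokens`.

    Hypothetical helper for illustration only: probabilities are plain
    relative frequencies, with no smoothing or frequency threshold.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_unigrams = len(tokens)
    n_bigrams = len(tokens) - 1
    scores = {}
    for (w1, w2), count in bigrams.items():
        p_pair = count / n_bigrams
        p_w1 = unigrams[w1] / n_unigrams
        p_w2 = unigrams[w2] / n_unigrams
        # PMI compares the observed pair probability with the probability
        # expected if the two words occurred independently.
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


# Toy input (assumed, not from any corpus described in the programme).
tokens = "ele tomou uma decisão e ela tomou uma decisão difícil".split()
scores = pmi_scores(tokens)
```

A positive score for a pair such as ("tomou", "uma") means the pair occurs more often than independence would predict. On real corpora, a minimum frequency threshold is usually applied first, because PMI overrates rare pairs.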


Main Conference Abstracts 02a Lexical resources

Day 2, Thursday, 14 July

LX-DSemVectors: Word Embeddings Resources for the Portuguese Language

João Rodrigues, António Branco, Steven Neale and João Silva

Abstract: In this article we describe the creation of the first publicly available distributional semantic models for Portuguese. These are evaluated and compared against the original English models on the well-known analogies task. We gather a large collection of Portuguese corpora, totalling 1.7 billion tokens, put together the first Portuguese distributional semantic analogies test set, and pave the way for the first parametrization and evaluation of Portuguese word embedding models.

Making a Virtue of Necessity: a Verb Lexicon

Valeria De Paiva, Fabricio Chalub, Livy Real and Alexandre Rademaker

Abstract: We describe the verb lexicon of OpenWordNet-PT, a wordnet-like resource for (mostly Brazilian) Portuguese, and a series of experiments that we designed to extend its coverage. These experiments include checking online lists of the most common verbs, checking freely available corpora such as the Bosque-Universal Dependencies corpus and, especially, checking a dictionary of Brazilian politicians' biographies (the DHBB) that we consider the ideal corpus for the kind of information extraction we are after. We certainly succeeded in extending the coverage of the verb lexicon; however, it remains to be seen whether this new coverage is enough for the original application.

Groundwork for the Automatic Creation of a Fuzzy Portuguese Wordnet

Hugo Oliveira

Abstract: There are currently several lexical-semantic resources available for the computational processing of Portuguese, following different approaches and with different limitations.

This paper presents the first experiments towards the exploitation of seven of those resources in the automatic creation of a large wordnet, where numerical scores are assigned to each decision taken, namely the inclusion of words in synsets and the connection of synsets by semantic relations.

Experiments show that a large wordnet can indeed be created and that, to some extent, the computed scores can be used as a confidence measure. This will enable users to select only a portion of the resource, depending on their application's needs regarding the quantity and quality of lexical-semantic knowledge.


Main Conference Abstracts 02b Language processing applications

Day 2, Thursday, 14 July

Improving Question-Answering for Portuguese using Triples Extracted from Corpora

Ricardo Rodrigues and Paulo Gomes

Abstract: We present here an evolution of a QA system for Portuguese that uses subject-predicate-object triples extracted from sentences in a corpus. The system is supported by indices that store those triples, related sentences and documents. It processes the questions and retrieves answers based on the triples. For purposes of testing and evaluation, we have used the CHAVE corpus, used in multiple editions of the CLEF QA tracks. The questions from those editions were used to query and benchmark our system. Currently, the system manages to answer up to 42% of those questions. This document describes the modules that compose the system and how they are combined, providing a brief analysis of them, as well as current results and some expectations regarding future work.

Analysis of Temporal Adverbial Phrases for Portuguese-Chinese Machine Translation

Siyou Liu and Ana Luísa Varani Leal

Abstract: Adverbial phrases (AdPs) contain rich and indispensable information; however, translating them properly is one of the big challenges for machine translation (MT). In this paper, we systematically present a contrastive analysis of the MT of AdPs from Chinese to Portuguese. The study is conducted on The International Chinese Newsweekly corpus, which consists of 46 Chinese texts (ST) with their respective Portuguese translations given by a state-of-the-art Portuguese-Chinese MT system and by human beings (HT). By comparing the syntactic structures of these texts on the ST, PCT, and HT sides, we found that nearly 90% of MT outputs suffer from structural inconsistency, with lower translation quality. Therefore, we discuss and propose a series of grammatical rules to address the problem. Finally, our oracle experiment indicates that integrating the proposed approach into MT systems leaves considerable room for improvement. We believe that this work could inspire both MT researchers and industry to boost the performance of Portuguese-Chinese MT systems.


Applying Lexical-Conceptual Knowledge for Multilingual Multi-Document Summarization

Ariani Di Felippo, Fabrício Tosta and Thiago Pardo

Abstract: We define Multilingual Multi-Document Summarization (MMDS) as the process of identifying the main information of a cluster with (at least) two texts, one in the user’s language and one in a foreign language, and presenting it as a summary in the user’s language. Although it is a relevant task due to the increasing amount of on-line information in different languages, there are only baselines for (Brazilian) Portuguese, which apply machine translation to obtain a monolingual input and superficial features for sentence extraction. We report our investigation into the application of a conceptual frequency measure to build a summary in Portuguese from a bilingual cluster (Portuguese and English). The methods tackle two additional challenges: using Princeton WordNet for noun annotation and applying MT to translate selected English sentences into Portuguese. The experiments were performed using a corpus of 20 clusters, and show that lexical-conceptual knowledge improves the linguistic quality and informativeness of extracts.


Main Conference Abstracts 03a Deep language processing

Day 3, Friday, 15 July

Automatic Semantic Role Labeling on Non-revised Syntactic Trees of Journalistic Texts

Nathan Hartmann, Magali Duran and Sandra Aluísio

Abstract: Semantic Role Labeling (SRL) is a Natural Language Processing task that enables the detection of events described in sentences and of the participants in these events. For Brazilian Portuguese (BP), two recently concluded studies perform SRL on journalistic texts. [1] obtained an F1-measure of 79.6 using the PropBank.Br corpus, whose syntactic trees were manually revised; [7], without using a treebank for training, obtained an F1-measure of 68.0 on the same corpus. However, the use of manually revised syntactic trees for this task does not represent a real application scenario. The goal of this paper is two-fold: to evaluate the performance of SRL on revised and on non-revised syntactic trees, using a larger and balanced corpus of BP journalistic texts. First, we show that [1]’s system also performs better than [7]’s system on the larger corpus. Second, the SRL system trained on non-revised syntactic trees performs better over non-revised trees than a system trained on gold-standard data.

Syntax Deep Explorer

José Correia, Jorge Baptista and Nuno Mamede

Abstract: The analysis of the co-occurrence patterns between words allows for a better understanding of the use (and meaning) of words, and its most straightforward applications are lexicography and linguistic description in general. Some tools already produce co-occurrence information about words taken from Portuguese corpora, but few can use lemmata or syntactic dependency information. Syntax Deep Explorer is a new tool that uses several association measures to quantify several co-occurrence types, defined on the syntactic dependencies (e.g. subject, complement, modifier) between a target word lemma and its co-locates. The resulting co-occurrence statistics are represented in lex-grams, that is, a synopsis of the syntactically-based co-occurrence patterns of a word's distribution within a given corpus. These lex-grams are obtained from a large-sized Portuguese corpus processed by STRING [19] and are presented in a user-friendly way through a graphical interface. The Syntax Deep Explorer will allow the development of finer lexical resources and the improvement of STRING processing in general, as well as providing public access to co-occurrence information derived from parsed corpora.


Main Conference Abstracts 03b Lexical processing

Day 3, Friday, 15 July

Entity Linking with Distributional Semantics

Pablo Gamallo and Marcos Garcia

Abstract: Entity Linking (EL) consists in linking name mentions in a given text with the entities they refer to in external knowledge bases such as DBpedia/Wikipedia. In this paper, we propose an EL approach whose main contribution is to make use of a knowledge base built by means of distributional similarity. More precisely, Wikipedia is transformed into a manageable database structured with similarity relations between entities. Our EL method is focused on a specific task, namely the semantic annotation of documents by extracting those relevant terms that are linked to nodes in DBpedia/Wikipedia. The method currently works for four languages: English, Portuguese, Spanish, and Galician. The Portuguese and English versions have been evaluated and compared against other EL systems, showing competitive performance, close to that of the best systems.

Extracting and Structuring Open Relations from Portuguese Text

Sandra Collovini, Gabriel Machado and Renata Vieira

Abstract: This paper presents the extraction and mining of open relations between named entities from Portuguese texts. We apply the Conditional Random Fields model for the extraction of any relation descriptor between named entities belonging to Person, Place and Organisation categories. We also show that the extracted relation descriptors can be better structured using data mining techniques.


Main Conference Abstracts 04a Semantic processing

Day 3, Friday, 15 July

Towards Keyphrase Assignment for Texts in Portuguese Language

Raquel Silveira, Vládia Pinheiro and Vasco Furtado

Abstract: Keyphrase assignment has often been confounded with keyphrase extraction, since the basic hypothesis is that a keyphrase of a text must be extracted from this text. Typically, keyphrase extraction approaches use a training set restricted to textual terms, reducing the learning capabilities of any inductive algorithm. Our research investigates ways to improve the accuracy of keyphrase assignment systems for texts in Portuguese by allowing classification algorithms to learn from non-textual terms as well. The basic assumption we have followed is that non-textual terms can be included in the training set by inference from a possible semantic relationship with textual terms. In order to discover the latent relationship between non-textual and textual terms, we apply deductive strategies to Portuguese common-sense bases such as Wikipedia and InferenceNet. We show that algorithms that follow our approach outperform others that do not use the methods introduced here.

Automatic Classification of the Complexity of Nonfiction Texts in Portuguese for Early School Years

Nathan Hartmann, Livia Cucatto, Danielle Brants and Sandra Aluísio

Abstract: Recent research shows that most Brazilian students have serious problems regarding their reading skills. The full development of this skill is key for the academic and professional future of every citizen. Tools for classifying the complexity of reading materials for children aim to improve the quality of the model of teaching reading and text comprehension. For English, Feng’s work [12] is considered the state of the art in grade level prediction and achieved 74% accuracy in automatically classifying 4 levels of textual complexity for close school grades. There are no classifiers for nonfiction texts for close grades in Portuguese. In this article, we propose a scheme for the manual annotation of texts in 5 grade levels, which will be used for customized reading to avoid the lack of interest by students who are more advanced in reading and the blocking of those that still need to make further progress. We obtained 52% accuracy in classifying texts into 5 levels and 74% into 3 levels. The results prove to be promising when compared to the state-of-the-art work.


Improving Coreference Resolution with Semantic Knowledge

Evandro Fonseca, Renata Vieira and Aline Vanin

Abstract: This paper evaluates the impact of semantic features on coreference resolution. These features are based on currently available semantic resources for the Portuguese language. We show a significant improvement in Precision, Recall and F-measure.


Main Conference Abstracts 04b Sentiment analysis

Day 3, Friday, 15 July

Comparing Approaches for Subjectivity Classification: a Study on Portuguese Tweets

Silvia Moraes, André Santos, Matheus Redecker, Rackel Machado and Felipe Meneguzzi

Abstract: In this paper, we compare lexicon-based and machine learning-based approaches to determining the subjectivity of tweets in Portuguese. We tested the SentiLex and WordAffectBR lexicons, and the SMO and Naive Bayes algorithms, for this task. In our study, we used the Computer-BR corpus, which contains messages about the technology domain. Our best results were obtained using the CMFS method for feature selection and the SMO algorithm as the classifier. We achieved an accuracy of 78.51% when we included the polarities of words in the vectorial representation of tweets.

Determining the Level of Clients’ Dissatisfaction from their Commentaries

Ana Forte and Pavel Brazdil

Abstract: We present a study in the area of sentiment analysis of clients’ commentaries transcribed by assistants of a help-desk service of a Portuguese telecommunications company. We have adopted a lexicon-based approach to determine the polarity of the sentiment of each commentary, based on the so-called opinion words. This task was by no means easy, as not many tools are available for the Portuguese language. The initial results with the off-the-shelf solutions were rather poor. This has motivated us to carry out a number of enhancements, including, for instance, enriching the given lexicon with domain-specific terms and formulating specific rules for negation and amplifiers. Quite surprisingly, automatic pruning of some of the lexicon terms has led to a significant improvement in performance. As our final system achieved a very good performance, our work should be of interest to others working on domain-specific solutions for languages where ready-made solutions are not available.


TOPIE: an Open-source Opinion Mining Pipeline to Analyze Consumers’ Sentiment in Brazilian Portuguese

Ellen Souza, Tiago Alves, Ingryd Teles, Adriano Oliveira and Cristine Gusmão

Abstract: The growth of social media and user-generated content (UGC) on the Internet provides a huge quantity of information that allows discovering the experiences, opinions, and feelings of users or customers. These electronic Word of Mouth statements expressed on the web are prevalent in business and the service industry, enabling customers to share their point of view. However, it is impossible for humans to fully understand UGC in a reasonable amount of time. Opinion mining (also known as Sentiment Analysis) is a sub-field of text mining in which the main task is to extract opinions from UGC. Thus, this work presents an open source pipeline to analyze customers’ opinions or sentiment on Twitter about products and services offered by Brazilian companies. The pipeline is based on the GATE framework, and the proposed hybrid method combines lexicon-based, supervised learning, and rule-based approaches. Case studies performed on real Twitter data achieved a precision of almost 70%.


Main Conference Abstracts Posters session

Day 3, Friday, 15 July

A Construction Grammar Approach for Pronominal Clitics in European Portuguese

Tânia Marques and Katrien Beuls

Abstract: Cliticization in European Portuguese (EP) differs markedly from that of other Romance languages in Europe (e.g. Spanish, Italian, French). While preverbal and postverbal placement of clitics is present in other Romance languages, it is defined by the finiteness of the verb. In EP, however, the placement of pronouns postverbally does not depend on this characteristic of the verb, and it is instead triggered by specific words and phrases that appear before it. Finding a common theory capable of explaining all these triggers has been shown to be hard due to their heterogeneous nature. In this paper, we look at this problem from a computational perspective, by presenting a fully functional fragment of the Portuguese grammar in Fluid Construction Grammar (FCG), able to ascertain whether the clitic placement is valid and to produce grammatically correct sentences.

A Comparative Evaluation of QA Systems over List Questions

Patricia Gonçalves and António Branco

Abstract: The evaluation of a Question Answering system is a challenging task. In this paper we evaluate our system, Self-reference, against other QA systems. It is a Web-based QA system that focuses on answering list questions, whose answers are extracted and composed from several documents retrieved from the Web. For the comparison, the results were analysed in two ways: (i) a quantitative evaluation of answers, which provides recall, precision and F-measure, and (ii) question coverage, which indicates the usefulness of the system to the user by counting the number of questions for which the system provides at least one correct answer. The evaluation brings interesting results that point to a certain degree of complementarity between the different approaches.
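The two evaluation views named in this abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function name `evaluate_list_questions`, the micro-averaging over all answers, and the toy questions are assumptions made only for this example.

```python
def evaluate_list_questions(gold, predicted):
    """Score list-question answers.

    `gold` and `predicted` map each question id to a set of answers.
    Precision, recall and F-measure are micro-averaged over all answers
    (an assumption); coverage is the share of questions with at least
    one correct answer.
    """
    true_pos = false_pos = false_neg = covered = 0
    for question, gold_answers in gold.items():
        answers = predicted.get(question, set())
        correct = answers & gold_answers
        true_pos += len(correct)
        false_pos += len(answers - gold_answers)
        false_neg += len(gold_answers - answers)
        if correct:
            covered += 1
    returned = true_pos + false_pos
    expected = true_pos + false_neg
    precision = true_pos / returned if returned else 0.0
    recall = true_pos / expected if expected else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    coverage = covered / len(gold) if gold else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "coverage": coverage,
    }


# Toy data (assumed): two list questions and one system's answers.
gold = {"q1": {"Lisboa", "Porto"}, "q2": {"Tejo"}}
predicted = {"q1": {"Lisboa", "Faro"}, "q2": {"Douro"}}
result = evaluate_list_questions(gold, predicted)
```

In this toy case, one of the three returned answers is correct and one of the two questions receives at least one correct answer. Precision, recall and F-measure are therefore 1/3 each, while question coverage is 0.5, which shows how the two views can diverge.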


A Model for Textual Entailment Based on Linguistic Rules

Sandro Rigo and Evandro Flores

Abstract: This work proposes a model for the recognition of textual entailment, presenting a new approach that combines syntactic analysis, morphology, linguistic rules, detection of voice inflection, treatment of negation and the use of synonyms. A prototype was developed to evaluate the proposed model. The results, which are promising, allow textual entailment between different text samples to be identified accurately and flexibly.

Automatic Generation of Internet Memes from Portuguese News Titles

Hugo Oliveira, Diogo Costa and Alexandre Pinto

Abstract: This paper presents MemeGera, a prototype tool that generates image-based memes from Portuguese news titles. Everything is done automatically, with the help of computational linguistic resources, which are described here along with the rules for selecting images and adapting the text.

Building a Question-Answering Corpus using Social Media and News Articles

Paulo Cavalin, Flavio Figueiredo, Maira Bayser, Luis Moyano, Heloisa Candello, Ana Appel and Renan Souza

Abstract: Is it possible to develop a reliable QA-Corpus using social media data? What are the challenges faced when attempting such a task? In this paper, we discuss these questions and present our findings when developing a QA-Corpus on the topic of Brazilian finance. In order to populate our corpus, we relied on opinions from experts on Brazilian finance who are active on the Twitter application. From these experts, we extract information from news websites, which is then used to populate answers in the corpus. Moreover, to effectively provide rankings of answers to questions, we employ novel deep-learning based similarity measures between short sentences (which account for both questions and Tweets). We validated the employed methods on a recently released dataset of similarity between short Portuguese sentences. More importantly, we also discuss how we can use word vector representations to match questions from real users to social media information, as well as to rank answers to the provided questions based on news websites shared on Twitter.


Characterizing Opinion Mining: a Systematic Mapping Study of the Portuguese Language

Ellen Souza, Douglas Vitório, Dayvid Castro, Adriano Oliveira and Cristine Gusmão

Abstract: Background: The growth of social media and user-generated content (UGC) on the Internet provides a huge quantity of information that allows discovering the experiences, opinions, and feelings of users or customers. However, it is impossible for humans to fully understand UGC in a reasonable amount of time. Opinion Mining (also known as Sentiment Analysis) is a sub-field of text mining in which the main task is to extract opinions from UGC. Aim: Given that Portuguese is one of the most commonly spoken languages in the world, and is also the second most frequent language on Twitter, the goal of this work is to plot the landscape of current studies that relate to the application of Opinion Mining to the Portuguese language. Method: A systematic mapping review method was applied to search, select and extract data from the included studies. Results: Our manual and automated searches retrieved 6075 studies up to the year 2014, from which 25 articles were included. Almost 70% of all approaches focus on the Brazilian Portuguese variant. Naïve Bayes and Support Vector Machine were the main classifiers and SentiLex-PT was the most used lexical resource. Conclusion: The number of publications has grown, research interest in Portuguese processing is shared mainly between Portugal and Brazil, and few authors have employed benchmark datasets to evaluate their approaches, so it is not always possible to compare their results.

Domain-Specific Hybrid Machine Translation from English to Portuguese

João Rodrigues, Luís Gomes, Steven Neale, Andreia Querido, Nuno Rendeiro, Sanja Štajner, João Silva and António Branco

Abstract: Machine translation (MT) from English to Portuguese has not typically received much attention in existing research. In this paper, we focus on MT from English to Portuguese for a specific domain (IT): we build a small in-domain parallel corpus to address the lack of publicly available IT-specific parallel corpora, and then adapt an existing hybrid MT system to the new language pair (English to Portuguese). We further improve the initial version of the EN-PT hybrid system by adding various modules to address its most frequently occurring errors. In order to assess the improvements achieved by each of these dedicated modules, we compare all versions of our MT system automatically. In addition, we conduct and report on a detailed error analysis of the initial and final versions of our system.


Evaluating Progression of Alzheimer’s Disease by Regression and Classification Methods in a Narrative Language Test in Portuguese

Sandra Aluisio, André Cunha and Carolina Scarton

Abstract: Automated discourse analysis tools aimed at the diagnosis of language-impairing dementias already exist for the English language, but no such work has been done for Portuguese. Here, we describe the results of creating a unified environment, entitled X, based on a previous tool to analyze discourse, named Coh-Metrix-Port. After adding 25 new metrics for measuring syntactical complexity, idea density, and text cohesion through latent semantics, X extracts 73 features from narratives of normal aging (CTL), Alzheimer's Disease (AD), and Mild Cognitive Impairment (MCI) patients. This paper presents initial experiments in automatically diagnosing CTL, AD, and MCI patients from a narrative language test based on sequenced pictures and textual analysis of the resulting transcriptions. In order to train regression and classification models, the large set of features in X must be reduced in size. Three feature selection methods are compared. In our experiments with classification, it was possible to separate CTL, AD, and MCI with an F1 score of 0.817, and to separate CTL and MCI with an F1 score of 0.900. As for regression, the best results for MAE were 0.238 and 0.120 for scenarios with three and two classes, respectively.

Finding Compositional Rules for Determining the Semantic Orientation of Phrases

António Santos, Carlos Ramos and Nuno Marques

Abstract: The semantic compositionality principle states that the meaning of an expression can be determined by its parts and the way they are put together. Based on that principle, this paper presents a method for finding the set of compositional rules that best explain the positive, negative, and neutral semantic orientation (SO) of two-word phrases, in terms of the SO of their words. For instance, the phrase “fake contract” has a negative SO. A corpus was built for evaluating the proposed method and several experiments are reported. We also use the conditional probability as a reliability measure of the compositional rules. The reliability of the learned rules ranges from 60.44% for verb-noun phrases to 100% for adjective-adjective phrases.


FrameNet-Based Automatic Suggestion of Translation Equivalents

Simone Peron-Corrêa, Alexandre Diniz, Meire Lara, Ely Matos and Tiago Torrent

Abstract: This paper presents an application developed for automatically suggesting translation equivalents in a frame-based, domain-specific trilingual electronic dictionary covering the domains of the World Cup and Tourism. By comparing the syntactic and semantic affordances of a lexical unit in the source language with those shown by all lexical units evoking the same frame in a target language, the application suggests which of them is the best-fit translation equivalent. The application contributes to the purpose of bringing to scale the development of frame-based multilingual lexical databases. We also discuss the current limitations of the application, especially those concerned with nouns denoting entities. We then propose to enrich FrameNet with ontologies and qualia structure as a means of enhancing machine translation for entity nouns.

Improving POS Tagging Across Portuguese Variants with Word Embeddings

Erick Fonseca and Sandra Aluisio

Abstract: Brazilian Portuguese (BP) and European Portuguese (EP) have specific NLP resources and tools for many tasks. It is generally agreed upon that applying them to the variant other than their intended one results in a performance drop; however, very little research has measured it. We evaluated a POS tagger in a cross-variant setting under multiple combinations of word embeddings, train and test corpora, and found that (i) BP is easier than EP, (ii) word embeddings help increase tagger performance significantly, but not enough to close the accuracy gap in a cross-variant setting and (iii) embeddings generated from a corpus with both variants are useful in cross-variant scenarios. While we cannot generalize observations from POS tagging to any NLP task, this is an important first step for such evaluations.

Investigating Machine Learning Approaches for Sentence Compression in Different Application Contexts for Portuguese

Fernando Nóbrega and Thiago Pardo

Abstract: Sentence Compression aims to produce a shorter version of an input sentence and is very useful for many Natural Language applications. However, investigations in this field are frequently task-focused and target the English language. In this paper, we report on machine learning approaches to compressing sentences in Portuguese. We also analyze different application contexts and the respective available features. Our experiments show good results that are better than those of previous investigations in this field.


Joining Forces for Multiword Expression Identification

Leonardo Zilio, Rodrigo Wilkens, Luís Santos, Marco Idiart, Eric Wehrli and Aline Villavicencio

Abstract: Multiword Expressions (MWEs) display syntactic and statistical markedness, among other characteristics, that may influence the effectiveness of techniques that automatically identify them in texts. While parsing-based techniques for MWE identification are considered to be better at handling long-distance dependencies, passivization and internal modification, statistics-based techniques use association measures to detect statistical markedness regardless of syntactic form. In this paper we compare these two approaches focusing on nominal compounds in Portuguese. We compare the accuracy of each method and propose that combining the strengths of both increases precision.

Sentiment Analysis for Brazilian Portuguese over a Skewed Class Corpora Henrico Brum, Filipe Araujo and Fabio Kepler

Abstract: With the growth of social networks and internet services, companies increasingly need Sentiment Analysis to learn what users are saying about a product or service. This paper therefore compares techniques used in sentiment analysis tasks, such as Naive Bayes, Doc2Vec and Recursive Neural Tensor Network models, using a Brazilian Portuguese corpus. Naive Bayes showed the highest accuracy for the Pos/Neg classification when applying the under-sampling technique.
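As a rough illustration of the under-sampling technique mentioned in this abstract, the sketch below randomly discards examples from the larger classes until every class is as small as the rarest one. It is not the authors' implementation: the function name `undersample`, the fixed seed and the toy data are assumptions made for illustration only.

```python
import random
from collections import defaultdict


def undersample(examples, labels, seed=0):
    """Randomly keep, for every class, as many examples as the rarest class has."""
    by_label = defaultdict(list)
    for example, label in zip(examples, labels):
        by_label[label].append(example)

    # Size of the smallest class sets the target size for all classes.
    target = min(len(items) for items in by_label.values())

    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    balanced = []
    for label, items in by_label.items():
        for example in rng.sample(items, target):
            balanced.append((example, label))
    rng.shuffle(balanced)
    return balanced


# Toy, made-up data: four positive and two negative examples.
texts = ["otimo", "adorei", "bom", "excelente", "ruim", "pessimo"]
tags = ["pos", "pos", "pos", "pos", "neg", "neg"]
balanced = undersample(texts, tags)
```

After the call, `balanced` holds two positive and two negative examples, so a classifier trained on it sees both classes equally often, at the cost of discarding some data.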

The Portuguese B2SG: a semantic test for distributional thesaurus Rodrigo Wilkens, Leonardo Zilio, Eduardo Ferreira and Aline Villavicencio

Abstract: The lack of availability of gold standards for evaluation of distributional thesauri is a stumbling block that prevents a direct comparison of alternative approaches in a uniform way. In this paper, we present B2SG, a TOEFL-like task for Portuguese that contains 2875 test items including various semantic relations like synonyms, antonyms and hypernyms for nouns and verbs. The resource is evaluated intrinsically by comparing it with Onto.PT and by collecting human judgments about its quality. Additionally, we used it in the evaluation of distributional thesauri. We built two thesauri: one from surface forms and a second from lemmatized forms. The results show that the thesaurus generated from lemmas is more accurate, even if smaller, than the one from surface forms.

Demos Abstracts Demos Session

Day 3, Friday, 15 July

A Computational Tool for Automated Language Production Analysis Aimed at Dementia Diagnosis

Sandra Aluísio, Andre Cunha, Cintia Toledo and Carolina Scarton

Abstract: Coh-Metrix-Dementia is a unified computational environment that allows for the automated analysis of language production in clinical (and similar) settings. In its current version, the tool is composed of an underlying Python library – built on top of several Natural Language Processing (NLP) tools for the Portuguese language – that can be used as a component inside other applications, and a web interface, suitable for use by health professionals. From a transcribed or written text sample produced by a subject, Coh-Metrix-Dementia can extract 73 textual metrics, comprising several levels of linguistic analysis from word counts to semantics and discourse, which can be used in language evaluation. The tool has been evaluated in automated classification tasks involving dementia patients, where it demonstrated high classification accuracy, and is freely available under the GPLv3 license.

Annotating Portuguese Corpora with Word Senses Using LX-SenseAnnotator Steven Neale and António Branco

Abstract: This paper describes LX-SenseAnnotator, an accessible and easy-to-use interface tool for manually annotating text with word senses. We demonstrate how the tool was used to manually annotate the CINTIL-WordSenses corpus, outlining the loading and browsing of Portuguese texts and how word senses themselves are selected and assigned. We also describe the potential for LX-SenseAnnotator to be adapted for other languages besides Portuguese.

CORP: Coreference Resolution for Portuguese Evandro Fonseca, Renata Vieira and Aline Vanin

Abstract: This paper describes CORP, an open source, off-the-shelf noun phrase coreference resolver for Portuguese with a web interface.

Hookit: natural language processing in a semantic based platform for social commerce Sandro José Rigo, Vinicius Dambros Andrade and Denis Andrei Araújo

Abstract: This demo will exhibit the practical results of the application of Natural Language Processing resources in a social e-commerce platform called "Hookit". The platform itself comprises several modules that allow, in the first place, the integration of several products in a friendly and resourceful environment. The users' interaction with this environment is processed in modules that implement machine learning algorithms to support personalized recommendation of products.

LetsRead – Tool to Automatically Evaluate Children’s Reading Aloud Performance Jorge Proença, Dirce Celorico, Carla Lopes, Sara Candeias and Fernando Perdigão

Abstract: This demo presents a web-based platform that analyzes speech of read utterances of children, aged 6-10 years old, from the 1st to 4th grades, to automatically evaluate their reading aloud performance. It operates by detecting and analyzing errors and disfluencies in speech. It provides some metrics that are used for computing a reading ability index and shows how close it is to the index given by expert evaluators for that child. Although this demo is not targeted to the participation of children, as pre-recorded utterances are used, the same methods will be applied to live reading tasks with microphone input. A fully developed application will be useful in aiding and complementing the current manual and subjective methods for evaluation of overall reading ability in schools.

NILC-WISE: An Easy-to-use Web Interface for Summary Evaluation with the ROUGE Metric Fernando Antônio Asevedo Nóbrega and Thiago Alexandre Salgueiro Pardo

Abstract: NILC-WISE is an easy-to-use web application for summary evaluation that provides resources and tools to evaluate summaries written in Portuguese, using the ROUGE metric. Its purpose is to be a default experiment environment, contributing to the summarization area.
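For readers unfamiliar with the ROUGE metric named in this abstract, the sketch below computes a simplified ROUGE-N recall: the share of reference n-grams that also occur in the candidate summary, with counts clipped. It is only an illustration and does not reflect how NILC-WISE or the official ROUGE package is implemented; the whitespace tokenization, the function names and the example sentences are assumptions.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """Simplified ROUGE-N recall with naive lowercase whitespace tokenization."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Clip each match by how often the n-gram appears in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


# Made-up example: the candidate covers 3 of the 5 reference unigrams.
reference = "o gato dorme no sofá"
candidate = "o gato dorme"
score = rouge_n_recall(candidate, reference, n=1)
```

Here `score` is 0.6 for unigrams, and the same pair gives 0.5 for bigrams (`n=2`), since two of the four reference bigrams appear in the candidate.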

OpenWordnet-PT Fabricio Chalub, Livy Real, Valeria de Paiva and Alexandre Rademaker

Abstract: This demo introduces OpenWordnet-PT (OWN-PT), an open wordnet for Portuguese. We will explore its web interface which offers an easy way for regular Internet users to utilise it. We also give a quick introduction to how to use RDF and SPARQL in the context of the OWN-PT.

Poe, now you can TryMe: Interacting with a Poetry Generation System Hugo Gonçalo Oliveira

Abstract: This demo presents two ways of interacting with PoeTryMe, a poetry generation system. The TryMe web interface communicates with a REST API for a simpler version of PoeTryMe that enables the generation of poems according to four parameters. A Portuguese instantiation of PoeTryMe is also continuously running as a Twitterbot.

Syntax Deep Explorer José Correia, Jorge Baptista and Nuno Mamede

Abstract: Syntax Deep Explorer is a new tool that uses several association measures to quantify several co-occurrence types, defined on the syntactic dependencies (e.g. subject, complement, modifier) between a target word lemma and its co-locates. The resulting co-occurrence statistics are represented in lex-grams, that is, a synopsis of the syntactically-based co-occurrence patterns of a word's distribution within a given corpus. These lex-grams are obtained from a large-sized Portuguese corpus processed by STRING and are presented in a user-friendly way through a graphical interface. The Syntax Deep Explorer will allow the development of finer lexical resources and the improvement of STRING processing in general, as well as providing public access to dependency-based, co-occurrence information derived from parsed corpora.
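To make the idea of an association measure over dependency-based co-occurrences concrete, the sketch below scores (relation, target lemma, co-locate lemma) triples with pointwise mutual information (PMI), computed separately for each dependency relation. PMI is used here only as one common association measure; the abstract does not say which measures the tool implements, and the function name `pmi_table` and the toy triples are assumptions.

```python
import math
from collections import Counter


def pmi_table(triples):
    """Score (relation, target, collocate) triples with PMI, per relation."""
    pair_counts = Counter(triples)
    target_counts = Counter()
    colloc_counts = Counter()
    rel_totals = Counter()
    for (rel, target, colloc), count in pair_counts.items():
        target_counts[(rel, target)] += count
        colloc_counts[(rel, colloc)] += count
        rel_totals[rel] += count

    scores = {}
    for (rel, target, colloc), count in pair_counts.items():
        total = rel_totals[rel]
        p_pair = count / total
        p_target = target_counts[(rel, target)] / total
        p_colloc = colloc_counts[(rel, colloc)] / total
        # PMI compares the observed pair probability with independence.
        scores[(rel, target, colloc)] = math.log2(p_pair / (p_target * p_colloc))
    return scores


# Made-up dependency triples, as a parser might produce them.
triples = [
    ("subject", "latir", "cão"),
    ("subject", "latir", "cão"),
    ("subject", "correr", "cão"),
    ("subject", "correr", "menino"),
]
scores = pmi_table(triples)
```

In this toy sample, "menino" occurs only as the subject of "correr", so that pair gets the highest score, while "cão" is more strongly associated with "latir" than with "correr".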

VITHEA-Kids: Improving the Linguistic Skills of Children with Autism Spectrum Disorder Vânia Mendonça, Cláudia Filipe, Luísa Coheur and Alberto Sardinha

Abstract: Each child with Autism Spectrum Disorder (ASD) has a unique set of abilities, symptoms and needs; hence, educational applications should allow the content and options of exercises to be tailored. However, most existing applications do not take this requirement into account. In this work, we present VITHEA-Kids, a platform that takes advantage of language and speech technology tools to allow children with ASD to benefit from a customized learning experience.

XCrimes: Information Extractor for the Public Safety and National Defense Areas Daniel Sullivan, Vládia Pinheiro, Rafael Pontes and Vasco Furtado

Abstract: The increased volume of notifications about crimes or attempted crimes provides vast textual material to support public safety policies, but the reading, mining and analysis of the full volume of police reports are very time-consuming tasks. In this scenario, supported by a research and innovation project of the Department of Science and Technology of the State of Ceará (SECITECE), we are developing the XCrimes tool, which allows automatic extraction of information about crimes from textual public safety reports. XCrimes uses InferenceNet, a semantic knowledge base for Portuguese, and the Semantic-Inferentialist Analyzer (SIA) to reason about the text and draw conclusions in order to leverage human expertise, automating the extraction of information about the characteristics of crimes. In this software demonstration, we will present how XCrimes extracts information about the type of crime and the crime scene.

We thank our Supporters and Institutional partners

Supporters:

Camões Prize 2016 for the Technologies for the Portuguese Language:

Best dissertation award sponsor:

Institutional partners: