Exploring language technologies to provide support to WCAG 2.0 and E2R guidelines. Lourdes Moreno,...

Exploring language technologies to provide support to WCAG 2.0 and E2R guidelines
Lourdes Moreno (*), Paloma Martínez, Isabel Segura-Bedmar and Ricardo Revert
Grupo LaBDA, Departamento de Informática, Universidad Carlos III de Madrid
(*) [email protected]
Vilanova i la Geltrú (Universitat Politècnica de Catalunya), September 2015
Reference, ACM Digital Library: http://dl.acm.org/citation.cfm?id=2829927&CFID=573822944&CFTOKEN=54544041


Page 1: Exploring language technologies to provide support to WCAG 2.0 and E2R guidelines. Lourdes Moreno, Paloma Martínez, Isabel Segura-Bedmar, and Ricardo Revert. 2015. Universidad Carlos



Contents

• Motivation and introduction

• EASY-TO-READ (E2R) Guidelines

• WCAG 2.0: readability and understandability

• Natural language processing (NLP) approaches for text simplification

• Proof of Concept: Lexical Simplification of Drug Package Leaflets

• Conclusions

LaBDA, Universidad Carlos III de Madrid


MOTIVATION

• Some citizens face accessibility barriers when texts contain:

long sentences

unusual words

complex linguistic structures

• Environment: web content

• Readability and understanding should be considered when texts are created


INTRODUCTION Target groups

• People with cognitive or learning disabilities

• Also:

Prelingually deaf people

Older people (Individual cognitive abilities such as attention span and memory)

Illiterate people

Immigrants (different native language)

People with aphasia, dyslexia, autism


INTRODUCTION Initiatives

• Easy-to-Read (E2R)

Inclusion Europe 2009

Guidelines of IFLA 2010

• Web Content Accessibility Guidelines (WCAG) 2.0

Regulatory framework

Hard Success criteria

Conformance level AA


EASY-TO-READ (E2R) Guidelines

• In general terms these guidelines are:

Use simplest and most common words

Avoid long words

Avoid abbreviations

Use the same term to refer to the same concept

Use short sentences

Avoid complex sentences with dependent clauses

Use active language and avoid passive voice


EASY-TO-READ (E2R) Guidelines What can be done?

• To make online texts more accessible and readable

• Complex words or phrases are replaced with more commonly used words

• These adaptations are carried out with the use of text simplification techniques:

www.noticiasfacil.es www.e-include.info/ simple.wikipedia.org/

www.simplext.es/

• Manual process? In some cases it is unfeasible

• Support Technology


EASY-TO-READ (E2R) Guidelines

• These E2R guidelines address only text content.

• In addition: page structure, presentation, …

=> For this reason, accessibility requirements of WCAG 2.0 must be taken into account


WCAG 2.0: READABILITY AND UNDERSTANDABILITY

Understandability vs. readability

“a text could be highly readable, since the syntax is extremely simple, but extremely hard to understand because of the lexicon used”

Readability evaluates the structure of sentences (it concerns syntax and consequently requires syntactic simplification approaches)

Understandability captures the lexical aspects, so lexical simplification approaches are required


WCAG success criteria concerning text

• 3.1 (Readable: Make text content readable and understandable)

Readability - 3.1.5 (Reading Level)

Understandability - 3.1.3 (Unusual Words) and 3.1.4 (Abbreviations)

Code (conformance level): Description

1.1.1 Non-text Content (Level A): All non-text content that is presented to the user has a text alternative that serves the equivalent purpose.

2.4.2 Page Titled (Level A): Web pages have titles that describe topic or purpose.

2.4.4 Link Purpose (In Context) (Level A) (text type): The purpose of each link can be determined from the link text alone or from the link text together with its programmatically determined link context.

2.4.6 Headings and Labels (Level AA): Headings and labels describe topic or purpose.

2.4.9 Link Purpose (Link Only) (Level AAA) (text type): A mechanism is available to allow the purpose of each link to be identified from link text alone, except where the purpose of the link would be ambiguous to users in general.

2.4.10 Section Headings (Level AAA): Section headings are used to organize the content.

3.1.1 Language of Page (Level A): The default human language of each Web page can be programmatically determined.

3.1.2 Language of Parts (Level AA): The human language of each passage or phrase in the content can be programmatically determined.

3.1.3 Unusual Words (Level AAA): A mechanism is available for identifying specific definitions of words or phrases used in an unusual or restricted way, including idioms and jargon.

3.1.4 Abbreviations (Level AAA): A mechanism for identifying the expanded form or meaning of abbreviations is available.

3.1.5 Reading Level (Level AAA): When text requires reading ability more advanced than the lower secondary education level after removal of proper names and titles, supplemental content, or a version that does not require reading ability more advanced than the lower secondary education level, is available.


WCAG 2.0: READABILITY AND UNDERSTANDABILITY Additional accessibility requirements

• The WCAG 2.0 document does not specify guidelines on these matters in as much detail as it does for visual or auditory accessibility

• A set of additional WCAG 2.0 success criteria has been identified regarding presentation, navigation, structure, cognitive aspects of user tasks, …

• Some of these additional success criteria are:

1.4.8 (Visual Presentation)

2.2.3 (No Timing)

2.4.5 (Multiple Ways)

3.2.3 (Consistent Navigation)

3.2.4 (Consistent Identification)

3.3.1 (Error Identification)

3.3.2 (Labels or Instructions)

3.3.5 (Help)


WCAG 2.0: READABILITY AND UNDERSTANDABILITY Discussion and conclusions

• No correspondence between concepts in E2R guidelines and success criteria of WCAG 2.0

=> Professionals who work on WCAG accessibility conformance do not know how to meet the E2R requirements

• Aside from WCAG 2.0 regarding the text, further accessibility features should be considered

• WCAG 2.0 support is not enough

• Technology supporting the authorship of texts is required


WCAG 2.0: READABILITY AND UNDERSTANDABILITY Discussion and conclusions

• Proposal:

NLP approaches that use E2R and WCAG 2.0 resources can provide semi-automatic support

Different NLP strategies are used to simplify texts depending on whether understandability or readability is being analysed


Natural language processing (NLP)

• The discipline devoted to developing technology to understand natural language

• Applications:

Machine translation

Information retrieval

Information extraction from unstructured data

Summarization

Question answering

….


NLP APPROACHES FOR TEXT SIMPLIFICATION Support to accessibility

• NLP processes are applied with the objective of transforming a text into an equivalent one that is more accessible to people with any kind of cognitive disability

• Three NLP processes that could be applied to text simplification tasks are described:

Language detection

Abbreviations detection

Topic detection


NLP APPROACHES FOR TEXT SIMPLIFICATION Language detection

• Language detection consists of identifying the language of a text

• It is helpful, for example, when screen readers are used

• Approaches:

Check for language-specific strings or characters (e.g. Dutch if the string “ik” appears, German if “ich” or “ß” is used, Polish if “czy” or “ń”, “Ł”, “ź” appear in words)

Use n-gram frequency distributions. All languages have words that occur more frequently than others (Zipf's Law)

• If two texts in the same language are compared, they should have similar n-gram frequency distributions
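
The n-gram idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the character trigram size, the distance measure (sum of absolute frequency differences) and the tiny reference texts are all hypothetical choices, not taken from the slides. A real detector would build its profiles from large corpora.

```python
from collections import Counter


def ngram_profile(text, n=3):
    """Relative frequencies of the character n-grams in a text."""
    cleaned = " ".join(text.lower().split())
    grams = [cleaned[i:i + n] for i in range(len(cleaned) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    return {gram: count / total for gram, count in counts.items()}


def profile_distance(p, q):
    """Sum of absolute frequency differences (0 = identical, 2 = disjoint)."""
    return sum(abs(p.get(g, 0.0) - q.get(g, 0.0)) for g in set(p) | set(q))


def detect_language(text, reference_profiles, n=3):
    """Return the language whose reference profile is closest to the text."""
    profile = ngram_profile(text, n)
    return min(
        reference_profiles,
        key=lambda lang: profile_distance(profile, reference_profiles[lang]),
    )


# Toy reference texts (assumption: far too small for real use).
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and the cat is on the table with the other things",
    "es": "el perro come en la casa y el gato duerme en la cama con los otros animales de la familia",
    "de": "ich habe einen hund und eine katze und der hund ist nicht in dem haus mit den anderen",
}
REFERENCE = {lang: ngram_profile(sample) for lang, sample in SAMPLES.items()}

print(detect_language("el gato duerme en la casa", REFERENCE))
```

Texts in the same language share many frequent n-grams, so their profiles end up close to each other, which is the property the slide relies on.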


NLP APPROACHES FOR TEXT SIMPLIFICATION Abbreviations

• Approaches to recognize abbreviations and their corresponding expansions:

Pattern-matching methods based on rules and heuristics to detect uppercase alphanumeric strings

• To identify the patterns “long form (short form)” or “short form (long form)”

If a sequence of words co-occurs frequently with an abbreviation, and the sequence does not occur with other nearby words, it is an “abbreviation-definition” relationship.
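
A minimal sketch of the pattern-matching approach for the “long form (short form)” case follows. The regular expression and the initial-letter heuristic are assumptions made for illustration; the slides do not specify the actual rules. The heuristic only accepts a candidate when the initials of the preceding words spell the short form, so it misses cases such as “Easy-to-Read (E2R)”.

```python
import re

# Uppercase alphanumeric string of 2 to 10 characters inside parentheses.
SHORT_FORM = re.compile(r"\(([A-Z][A-Z0-9]{1,9})\)")


def find_abbreviations(text):
    """Find 'long form (SF)' pairs whose word initials spell the short form."""
    pairs = {}
    for match in SHORT_FORM.finditer(text):
        short = match.group(1)
        # Words that precede the opening parenthesis.
        words = re.findall(r"[\w-]+", text[:match.start()])
        candidate = words[-len(short):]
        initials = "".join(word[0] for word in candidate).upper()
        if len(candidate) == len(short) and initials == short:
            pairs[short] = " ".join(candidate)
    return pairs


text = (
    "Natural language processing (NLP) supports the "
    "Web Content Accessibility Guidelines (WCAG)."
)
print(find_abbreviations(text))
```

The detected pairs could then feed a glossary or expansion mechanism of the kind success criterion 3.1.4 asks for.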


NLP APPROACHES FOR TEXT SIMPLIFICATION Text summarization or topic detection

• Goal : to obtain a set of sentences that reflects the content

• This technique offers accessibility support to editors of web contents to create:

Titles of paragraphs and sections that faithfully represent the content

• Approach:

Automatic text extraction: the relevant sentences of a text are taken to be those that contain a large number of important words

The importance of a word is calculated with a measure that relies on how frequent the word is in a document and on how many documents in a collection the word appears in.
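
The word-importance measure described above matches the usual TF-IDF weighting, so the extraction step can be sketched as follows. This is an assumption-laden illustration: the function names, the tokenizer, the plain `tf * log(N / df)` formula and the sum-of-weights sentence score are choices made here, and the document is assumed to belong to the collection.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def tfidf_weights(document, collection):
    """Weight of a word = frequency in the document * log(N / document frequency)."""
    term_frequency = Counter(tokenize(document))
    token_sets = [set(tokenize(doc)) for doc in collection]
    weights = {}
    for word, count in term_frequency.items():
        # Guard against a word missing from the collection.
        doc_frequency = max(1, sum(1 for tokens in token_sets if word in tokens))
        weights[word] = count * math.log(len(collection) / doc_frequency)
    return weights


def top_sentences(document, collection, k=1):
    """Return the k highest-scoring sentences, in their original order."""
    weights = tfidf_weights(document, collection)
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    ranked = sorted(
        sentences,
        key=lambda s: sum(weights.get(w, 0.0) for w in tokenize(s)),
        reverse=True,
    )
    chosen = set(ranked[:k])
    return [s for s in sentences if s in chosen]


leaflet = "The leaflet is long. Headache and nausea are common side effects of the drug."
collection = [
    leaflet,
    "The weather is nice today. The sun is out.",
    "The train is late. The station is crowded.",
]
print(top_sentences(leaflet, collection, k=1))
```

Summing the weights favours longer sentences; dividing the score by sentence length is a common alternative when short titles are wanted.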


NLP APPROACHES FOR TEXT SIMPLIFICATION Text Simplification

• It is essential in several types of texts: News, Government and administrative information, laws and rights, etc.

• There are three subtasks of text simplification

1 Syntactic simplification, which divides complex sentences into simpler sentences

2 Lexical simplification, whose objective is to replace complex vocabulary with common vocabulary

3 Clarification, which provides definitions and explanations.

These tasks are not completely automatic; in some cases the results have to be manually reviewed.


NLP APPROACHES FOR TEXT SIMPLIFICATION Text Simplification

Lexical simplification:

• Replacing complex words and utterances with easier words or phrases, taking the context into account.

• Heuristic: complex words have a low frequency

• Proposals based on frequency give better results than other, more sophisticated systems [SemEval 2012]

• Resource: lexical resources such as WordNet are used to extract synonyms as candidates to replace a complex or difficult word.


NLP APPROACHES FOR TEXT SIMPLIFICATION Text Simplification

Lexical simplification

• Complexity measures: frequency of words in texts as well as the length of phrases

Fog index

Flesch-Kincaid

These indexes have to be validated by end users
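
For reference, the two indexes can be computed from simple counts. The sketch below assumes that the first index refers to the Gunning fog index and uses the standard published formulas for it and for the Flesch-Kincaid grade level. Both were calibrated for English, so applying them to Spanish text is an assumption that would need the validation mentioned above. The function and parameter names are illustrative.

```python
def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid grade level from raw counts."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def gunning_fog(words, sentences, complex_words):
    """Gunning fog index; complex_words = words with three or more syllables."""
    return 0.4 * ((words / sentences) + 100 * (complex_words / words))


# Example: a 100-word passage with 5 sentences, 150 syllables and 10 complex words.
print(round(flesch_kincaid_grade(100, 5, 150), 2))
print(round(gunning_fog(100, 5, 10), 2))
```

Both scores rise with sentence length and with word length, which is why they are used as rough proxies for reading level.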


NLP APPROACHES FOR TEXT SIMPLIFICATION

WCAG 2.0 success criteria: NLP approach

2.4.2 (Page Titled), 2.4.6 (Headings and Labels), 2.4.10 (Section Headings): Text summarization

3.1.4 (Abbreviations): Abbreviations

3.1.3 (Unusual Words): Dictionaries with definitions

3.1.5 (Reading Level): Syntactic simplification


PROOF OF CONCEPT Lexical Simplification of Drug Package Leaflets

• The drug package leaflet is the principal textual source of information for patients

• This document provides information about a drug: its appearance, actions, side effects and drug interactions, contraindications, special warnings

• It is difficult for patients to understand:

Vocabulary is specific and technical.

Long paragraphs, especially those containing lists of side effects.

Small font size (9 points).

• Problems: Patient misunderstanding could be a potential source of medication errors and adverse drug reactions.


PROOF OF CONCEPT Lexical Simplification of Drug Package Leaflets

• Goal of the system:

Provide information in a way that is easy and clear to read.

• Medical terms (in particular, drug effects) are translated into lay terms that patients can understand.


PROOF OF CONCEPT Lexical Simplification of Drug Package Leaflets

FIRST Module:

Named Entity Recognition (NER)

• Detects the mentions of drug effects

• Uses MedDRA (a multilingual medical terminology dictionary about events associated with drugs)

• MeaningCloud integrates MedDRA into GATE


PROOF OF CONCEPT Lexical Simplification of Drug Package Leaflets

SECOND module:

Lexical Simplifier

• Identifies the effects whose names are considered complex, with the objective of replacing them with a simpler synonym

• Two different strategies: preferred term substitution and most frequent term substitution.


PROOF OF CONCEPT Lexical Simplification of Drug Package Leaflets

SECOND module. Lexical Simplifier

• Preferred Term Substitution

MedDRA defines sets of synonyms and provides a preferred term for each set

• Cefalalgia (cephalalgia) would be replaced by cefalea (headache)
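
Preferred term substitution amounts to a dictionary lookup. The sketch below uses a hand-written, hypothetical synonym table in place of MedDRA (whose actual access interface is not shown in the slides) and replaces each known synonym with the preferred term of its set.

```python
import re

# Hypothetical synonym sets keyed by preferred term; in the slides this
# information comes from MedDRA.
PREFERRED_TERMS = {"cefalea": {"cefalalgia", "cefalea"}}


def substitute_preferred(text, preferred_terms):
    """Replace every known synonym with the preferred term of its set."""
    lookup = {syn: pt for pt, syns in preferred_terms.items() for syn in syns}
    if not lookup:
        return text
    # Try longer synonyms first so multi-word terms are matched before shorter ones.
    alternatives = "|".join(re.escape(s) for s in sorted(lookup, key=len, reverse=True))
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: lookup[m.group(0).lower()], text)


print(substitute_preferred("Frecuentes: cefalalgia, náuseas.", PREFERRED_TERMS))
```

Note that the replacement is always lowercase, so capitalisation at the start of a sentence would need separate handling.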


PROOF OF CONCEPT Lexical Simplification of Drug Package Leaflets

SECOND module. Lexical Simplifier

• Most Frequent Term Substitution

Corpus of MedlinePlus website documents (1,536 documents)

• 939 belonging to drug package leaflets

• 597 general health-related articles about diseases, effects and diagnoses

Elasticsearch is used to index the MedlinePlus documents

Hypothesis: complex terms should be less frequent than simpler terms in the corpus

1) The frequency of each effect in the corpus is calculated

2) An effect is replaced by its synonym with the highest frequency in the corpus (unless the effect itself is the most frequent)


PROOF OF CONCEPT Lexical Simplification of Drug Package Leaflets

SECOND module. Lexical Simplifier

Synonyms from MedDRA that appear in the MedlinePlus corpus, with their corpus frequencies:

catarro (nasopharyngitis): 12

resfriado (cold): 48

resfriado común (common cold): 7

síntomas de resfriado (cold symptoms): 6

The complex term is replaced by resfriado (cold)
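
The selection rule can be illustrated with the frequencies from this slide. In the sketch below a plain dictionary stands in for the Elasticsearch index, and the function name is hypothetical.

```python
def most_frequent_synonym(term, synonyms, frequency):
    """Pick the term or synonym with the highest corpus frequency.

    The original term comes first in the candidate list, so it is kept on ties.
    """
    candidates = [term, *synonyms]
    return max(candidates, key=lambda t: frequency.get(t, 0))


# Corpus frequencies taken from the slide.
CORPUS_FREQUENCY = {
    "catarro": 12,
    "resfriado": 48,
    "resfriado común": 7,
    "síntomas de resfriado": 6,
}

synonyms = ["resfriado", "resfriado común", "síntomas de resfriado"]
print(most_frequent_synonym("catarro", synonyms, CORPUS_FREQUENCY))  # resfriado
```

A term that is already the most frequent member of its synonym set is returned unchanged, which corresponds to the "if it is not itself" condition in step 2.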


PROOF OF CONCEPT Lexical Simplification of Drug Package Leaflets

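SECOND module. Lexical Simplifier

Original: Muy frecuentes: diarrea e indigestión. Frecuentes: náuseas, vómitos, dolor abdominal. Poco frecuentes: hemorragia. Raros: perforación gástrica, flatulencia, estreñimiento

(English: Very common: diarrhea and indigestion. Common: nausea, vomiting, abdominal pain. Uncommon: hemorrhage. Rare: gastric perforation, flatulence, constipation)

PT (preferred term substitution): Muy frecuentes: diarrea e dispepsia. Frecuentes: náuseas, vómitos, dolor abdominal. Poco frecuentes: hemorragia. Raros: perforación gástrica, flatulencia, estreñimiento

(Change: indigestión is replaced by dispepsia (dyspepsia))

freq (most frequent term substitution): Muy frecuentes: diarrea e pirosis. Frecuentes: náuseas, vómitos, dolor abdominal. Poco frecuentes: sangrado. Raros: perforación gástrica, gases, estreñimiento

(Changes: indigestión is replaced by pirosis (heartburn), hemorragia by sangrado (bleeding), flatulencia by gases (gas))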


CONCLUSIONS

• For some people, it is difficult to infer the meaning of an unusual word or phrase from context

• Long sentences and complex linguistic structures can create barriers to accessing text content, as indicated in the WCAG and E2R guidelines

However, these guidelines do not provide precise methods or (semi-)automatic support for addressing these accessibility issues of text readability and understandability

• NLP approaches that use E2R and WCAG 2.0 resources can provide semi-automatic support

Proof of concept: a prototype to simplify drug package leaflets that implements a component for lexical simplification


CONCLUSIONS Work in progress

• New approaches to offer support: abbreviations, summaries, definitions of unusual words, etc.

• Evaluations by users (in addition to evaluations by experts)

• Taking into account other important issues such as:

Presentation elements

Page structure

Navigation structures


REFERENCE

Lourdes Moreno, Paloma Martínez, Isabel Segura-Bedmar, and Ricardo Revert. 2015. Exploring language technologies to provide support to WCAG 2.0 and E2R guidelines. In Proceedings of the XVI International Conference on Human Computer Interaction (Interacción '15). ACM, New York, NY, USA, Article 57, 8 pages. DOI=http://dx.doi.org/10.1145/2829875.2829927