De-identification is the task of detecting privacy-related entities in text, such as person names, email addresses and other contact data. It has been well studied within the medical domain. The need for de-identification technology is increasing, as …
This paper introduces DAN+, a multi-domain resource for nested named entities (NEs) and lexical normalization for Danish, a less-resourced language. We empirically assess three strategies to model the two-layer NE annotations, cross-lingual …
Because reviews differ across product categories, creating a general model for cross-domain sentiment classification can be difficult. This paper proposes an architecture that incorporates domain knowledge into a neural …