Full metadata record
DC Element | Value | Language
dc.contributor.author | Asaadi, Shima | -
dc.contributor.author | Kolagar, Zahra | -
dc.contributor.author | Liebel, Alina | -
dc.contributor.author | Zarcone, Alessandra | -
dc.date.accessioned | 2022-12-15T10:45:35Z | -
dc.date.available | 2022-12-15T10:45:35Z | -
dc.date.issued | 2022-10-31 | -
dc.identifier.uri | https://fordatis.fraunhofer.de/handle/fordatis/293 | -
dc.identifier.uri | http://dx.doi.org/10.24406/fordatis/226 | -
dc.description.abstract | The semantic textual similarity (STS) task is commonly used to evaluate the semantic representations that language models (LMs) learn from texts, under the assumption that good-quality representations will yield accurate similarity estimates. When it comes to estimating the similarity of two utterances in a dialogue, however, the conversational context plays a particularly important role. We argue for the need for benchmarks specifically created using conversational data in order to evaluate conversational LMs in the STS task. We introduce GiCCS, a first conversational STS evaluation benchmark for German. We collected the similarity annotations for GiCCS using best-worst scaling and presenting the target items in context, in order to obtain highly reliable context-dependent similarity scores. We present benchmarking experiments for evaluating LMs on capturing the similarity of utterances. Results suggest that pretraining LMs on conversational data and providing conversational context can be useful for capturing similarity of utterances in dialogues. GiCCS will be publicly available to encourage benchmarking of conversational LMs. | en
dc.language.iso | de | en
dc.rights.uri | https://creativecommons.org/licenses/by-nc-nd/3.0/ | en
dc.subject | STS | en
dc.subject | semantic textual similarity | en
dc.subject | conversational dataset | en
dc.subject | STS benchmark | en
dc.title | GiCCS: A German in-Context Conversational Similarity Benchmark | en
dc.type | Textual Data | en
dc.contributor.funder | Bundesministerium für Wirtschaft und Klimaschutz BMWK (Deutschland) | en
fordatis.bibliographicCitation.doi | 10.5281/zenodo.7266256 | en
fordatis.bibliographicCitation.issued | 2022-10-31 | -
fordatis.bibliographicCitation.place | The GEM 💎 Workshop at EMNLP 2022 | en
fordatis.bibliographicCitation.uri | https://zenodo.org/record/7266256#.Y2OmG8HMKbs | en
fordatis.institute | IIS Fraunhofer-Institut für Integrierte Schaltungen | en
fordatis.rawdata | false | en
fordatis.sponsorship.projectid | FKZ 01MK19011 | en
fordatis.sponsorship.projectname | SPEAKER | en
Appears in collections: Fraunhofer-Institut für Integrierte Schaltungen IIS

Files in this item:
File | Description | Size | Format
GiCCS.zip | A German in-Context Conversational Similarity Benchmark | 26.35 kB | ZIP


This item is published under the following copyright terms: Creative Commons license