CROWDSS: A crowdsourced, ecologically-valid dialogue dataset for German

Frommherz, Yannick; Zarcone, Alessandra

Full metadata record

DC Field	Value	Language
dc.contributor.author	Frommherz, Yannick	-
dc.contributor.author	Zarcone, Alessandra	-
dc.date.accessioned	2021-06-10T14:26:22Z	-
dc.date.available	2021-06-10T14:26:22Z	-
dc.date.issued	2021-06	-
dc.identifier.uri	https://fordatis.fraunhofer.de/handle/fordatis/198	-
dc.identifier.uri	http://dx.doi.org/10.24406/fordatis/124	-
dc.description.abstract	The CROWDSS dataset (Crowdsourced Wizard of Oz Dialogue dataset based on Situated Scenarios) contains 113 German dialogues collected in a Wizard-of-Oz fashion (i.e., simulating human-machine interaction). To refer to CROWDSS in any publication, please cite the following paper: Frommherz, Y. and Zarcone, A. (2021). Crowdsourcing ecologically-valid dialogue data for German. In Frontiers in Computer Science, Vol 3, doi: 10.3389/fcomp.2021.686050	en
dc.language.iso	de	en
dc.relation.ispartof	10.3389/fcomp.2021.686050	-
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/	en
dc.subject	dialogue data	en
dc.subject	voice assistants	en
dc.subject	crowdsourcing	en
dc.subject	Wizard-of-Oz	en
dc.subject	German	en
dc.subject	ecological validity	en
dc.subject	situated knowledge	en
dc.subject.ddc	DDC::400 Sprache	en
dc.subject.ddc	DDC::000 Informatik, Informationswissenschaft, allgemeine Werke	en
dc.title	CROWDSS: A crowdsourced, ecologically-valid dialogue dataset for German	en
dc.type	Textual Data	en
dc.contributor.funder	Bundesministerium fur Wirtschaft und Energie BMWi (Deutschland)	en
dc.description.technicalinformation	The dataset is structured as follows: Each dialogue is saved as a dictionary (with the dialogue id as key) containing 1) the scenario which was used for eliciting the corresponding dialogue and 2) the log. The log is a list of turns made by user and assistant, where each turn again is a dictionary containing the actual turn ("text"), who uttered it ("role") as well as the corresponding dialogue act annotations, following the scheme in Pareti and Lando (2019) but with some modifications (see annotation guidelines). The dialogue acts are saved as a list with the label as well as the start and end indices in the text. The dialogues were collected on a turn-by-turn basis and using a one-to-many ratio (see paper). The dialogue ids consist of numbers separated by dots. The first number corresponds to the the 30 dialogue beginnings that where collected in batch 1 (see paper). Since we assigned each of these dialogues to multiple participants in batch 2, dialogues sharing the first number in their id share both the same scenario and the first turn, etc.	en
fordatis.institute	IIS Fraunhofer-Institut für Integrierte Schaltungen	en
fordatis.project.fhgid	210011	en
fordatis.rawdata	false	en
fordatis.sponsorship.FundingProgramme	Innovationswettbewerb "Künstliche Intelligenz als Treiber für volkswirtschaftlich relevante Ökosysteme"	en
fordatis.sponsorship.projectid	FKZ 01MK20011A	en
fordatis.sponsorship.projectname	SPEAKER - Aufbau einer führenden Sprachassistenzplattform ”Made in Germany”	en
fordatis.sponsorship.projectacronym	SPEAKER	en
fordatis.date.start	2020-06	-
fordatis.date.end	2021-03	-
Appears in Collections:	Fraunhofer-Institut für Integrierte Schaltungen IIS

Files in This Item:

File	Description	Size	Format
Annotation guidelines.pdf	Guidelines for the dialogue act annotation	222,15 kB	Adobe PDF	Download/Open
CROWDSS.json	CROWDSS dataset	458,87 kB	Unknown	Download/Open

Show simple item record

This item is licensed under a Creative Commons License