Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Frommherz, Yannick | - |
dc.contributor.author | Zarcone, Alessandra | - |
dc.date.accessioned | 2021-06-10T14:26:22Z | - |
dc.date.available | 2021-06-10T14:26:22Z | - |
dc.date.issued | 2021-06 | - |
dc.identifier.uri | https://fordatis.fraunhofer.de/handle/fordatis/198 | - |
dc.identifier.uri | http://dx.doi.org/10.24406/fordatis/124 | - |
dc.description.abstract | The CROWDSS dataset (Crowdsourced Wizard of Oz Dialogue dataset based on Situated Scenarios) contains 113 German dialogues collected in a Wizard-of-Oz fashion (i.e., simulating human-machine interaction). To refer to CROWDSS in any publication, please cite the following paper: Frommherz, Y. and Zarcone, A. (2021). Crowdsourcing ecologically-valid dialogue data for German. In Frontiers in Computer Science, Vol 3, doi: 10.3389/fcomp.2021.686050 | en |
dc.language.iso | de | en |
dc.relation.ispartof | 10.3389/fcomp.2021.686050 | - |
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ | en |
dc.subject | dialogue data | en |
dc.subject | voice assistants | en |
dc.subject | crowdsourcing | en |
dc.subject | Wizard-of-Oz | en |
dc.subject | German | en |
dc.subject | ecological validity | en |
dc.subject | situated knowledge | en |
dc.subject.ddc | DDC::400 Sprache | en |
dc.subject.ddc | DDC::000 Informatik, Informationswissenschaft, allgemeine Werke | en |
dc.title | CROWDSS: A crowdsourced, ecologically-valid dialogue dataset for German | en |
dc.type | Textual Data | en |
dc.contributor.funder | Bundesministerium fur Wirtschaft und Energie BMWi (Deutschland) | en |
dc.description.technicalinformation | The dataset is structured as follows: Each dialogue is saved as a dictionary (with the dialogue id as key) containing 1) the scenario which was used for eliciting the corresponding dialogue and 2) the log. The log is a list of turns made by user and assistant, where each turn again is a dictionary containing the actual turn ("text"), who uttered it ("role") as well as the corresponding dialogue act annotations, following the scheme in Pareti and Lando (2019) but with some modifications (see annotation guidelines). The dialogue acts are saved as a list with the label as well as the start and end indices in the text. The dialogues were collected on a turn-by-turn basis and using a one-to-many ratio (see paper). The dialogue ids consist of numbers separated by dots. The first number corresponds to the the 30 dialogue beginnings that where collected in batch 1 (see paper). Since we assigned each of these dialogues to multiple participants in batch 2, dialogues sharing the first number in their id share both the same scenario and the first turn, etc. | en |
fordatis.institute | IIS Fraunhofer-Institut für Integrierte Schaltungen | en |
fordatis.project.fhgid | 210011 | en |
fordatis.rawdata | false | en |
fordatis.sponsorship.FundingProgramme | Innovationswettbewerb "Künstliche Intelligenz als Treiber für volkswirtschaftlich relevante Ökosysteme" | en |
fordatis.sponsorship.projectid | FKZ 01MK20011A | en |
fordatis.sponsorship.projectname | SPEAKER - Aufbau einer führenden Sprachassistenzplattform ”Made in Germany” | en |
fordatis.sponsorship.projectacronym | SPEAKER | en |
fordatis.date.start | 2020-06 | - |
fordatis.date.end | 2021-03 | - |
Appears in Collections: | Fraunhofer-Institut für Integrierte Schaltungen IIS |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
Annotation guidelines.pdf | Guidelines for the dialogue act annotation | 222,15 kB | Adobe PDF | Download/Open |
CROWDSS.json | CROWDSS dataset | 458,87 kB | Unknown | Download/Open |
This item is licensed under a Creative Commons License