Vanlancker-Sidtis J. Kreiman, D., and Gerratt, B.

Author of Defining and measuring voice quality

Reviews

Kreiman et al. note that voice quality (VQ) is important for understanding speech sounds, yet reliable definitions of different phonation types remain elusive. They argue for the ANSI (1960) definition of VQ as “that attribute of auditory sensation in terms of which a listener can judge that two sounds similarly presented and having the same loudness and pitch are dissimilar” (1), because it treats VQ as the result of specific perceptual processes and goals, that is, as a phenomenon that becomes linguistically relevant at the perceptual stage. As they note in that context, however, it is unclear how that definition might generalize to other tasks, such as the evaluation of a single stimulus (2). A related problem is that one person’s modal phonation may be another’s non-modal phonation (analogous to Ladefoged’s observation that one person’s pathological phonation is another’s normal non-modal phonation).


The ANSI definition is unavoidably perceptual and multidimensional, but also essentially negative: it defines VQ by excluding pitch and loudness without specifying what it does include. As Kreiman et al. note, it also ignores important evidence that pitch and loudness are essential elements of perceptual VQ judgments (see, e.g., Melara and Marks 1990).


Following from this, Kreiman et al. propose defining VQ both as “the perceptual impression created by the vibration of the vocal folds” and as “the perceived result of the coordinated action of the respiratory system, vocal folds, tongue, jaw, lips, and soft palate” (2). These definitions would seemingly make it difficult to separate VQ from other timbral qualities (e.g., nasality) and, if read literally, possibly even from the articulatory action of the filter (the “tongue, jaw, lips, and soft palate”). Still, as Kreiman et al. assert, physiologically based definitions undeniably cannot accommodate perceptual factors such as context, attention, listener background, and listening task.


For voice quality measurement, this implies the need for a standard framework with which to describe perceptual VQ judgments. The traditional framework is a vocabulary of descriptive labels, and as the authors note, “This approach to measuring voice quality depends on descriptive traditions rather than theory, and has changed very little in nearly 2000 years. Many common terms have been in use for centuries. Familiar labels like harsh, clear, bright, smooth, weak, shrill, deep, dull, and hoarse can be found in Roman writings on oratory (Austin, 1806), as well as in modern studies of voice quality” (2). The authors dismiss such schemes, as well as their modern successors based on factor analysis, as hopelessly redundant and ambiguous.


Ultimately, the authors note, a perceptual scale for VQ variation depends on an interpersonally uniform perceptual space, which is unlikely to exist. They argue that in the absence of a universally valid set of descriptors, it is unclear why some descriptors should be treated as more valid than others. They cite studies showing that the chances of two listeners rating even the “simplest” VQ differences (presumably, e.g., breathy vs. modal, or modal vs. creaky) identically on a 7-point scale are slim (.21, where chance was .14 [Kreiman and Gerratt 1998]).
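
The .14 chance level is presumably what results if each of two listeners picks one of the seven scale points at random, every point equally likely: the probability that both land on the same point is 7 × (1/7 × 1/7) = 1/7 ≈ .143. A minimal Python sketch of that arithmetic, plus a simple exact-agreement rate, follows. The uniform-guessing model and the helper names `exact_agreement` and `chance_agreement` are illustrative assumptions, not necessarily the statistics the cited study computed.

```python
from fractions import Fraction


def exact_agreement(ratings_a, ratings_b):
    """Proportion of stimuli on which two listeners gave the identical scale rating."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    matches = sum(1 for a, b in zip(ratings_a, ratings_b) if a == b)
    return matches / len(ratings_a)


def chance_agreement(scale_points):
    """Chance of an exact match, ASSUMING both listeners pick points uniformly at random.

    P(match) = sum over the n points of (1/n) * (1/n) = n / n**2 = 1/n.
    """
    return Fraction(1, scale_points)


print(round(float(chance_agreement(7)), 2))  # 0.14
```

On this model, the reported .21 exact agreement is only about seven percentage points above what two listeners guessing at random would achieve.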


As a promising way to circumvent these difficulties, the authors suggest applying speech synthesis to a method-of-adjustment task, in which naïve listeners adjust a synthesized voice to match a benchmark token. This method obviates the need to model inaccessible mental representations (e.g., exemplars) and makes it possible to measure which aspects of the speech signal listeners use to decide whether VQ variants sound the same or different. In a preliminary assessment of this method (Gerratt and Kreiman 2001), interlistener agreement jumped from 22 percent to 97 percent.
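
To make the logic of a method-of-adjustment task concrete, here is a toy Python simulation. It is a sketch under loudly stated assumptions, not the authors' procedure: the synthesized voice is reduced to ONE hypothetical numeric setting, each simulated listener is a hypothetical model with a step size and a just-noticeable difference (`jnd`), and "agreement" is a hypothetical tolerance on the two listeners' final settings.

```python
def adjust_to_match(target, start, step, jnd, max_trials=1000):
    """Toy method-of-adjustment run for ONE hypothetical synthesis parameter.

    A simulated listener repeatedly nudges the synthesizer setting toward the
    target voice and stops as soon as the remaining difference falls below
    their just-noticeable difference (jnd), i.e. when the voices sound the same.
    """
    value = start
    for _ in range(max_trials):
        diff = target - value
        if abs(diff) < jnd:
            return value  # listener judges the synthetic and target voices a match
        # Move toward the target by one step, or by the remaining gap if smaller.
        move = min(step, abs(diff))
        value += move if diff > 0 else -move
    return value  # gave up after max_trials adjustments


def settings_agree(setting_a, setting_b, tolerance):
    """Hypothetical criterion: two listeners agree if their final settings are close."""
    return abs(setting_a - setting_b) <= tolerance


# Two hypothetical listeners matching the same target setting (arbitrary units).
a = adjust_to_match(target=12, start=0, step=2, jnd=1)
b = adjust_to_match(target=12, start=30, step=5, jnd=4)
print(a, b, settings_agree(a, b, tolerance=2))  # 12 15 False
```

The point of the toy is that where each listener stops depends on their own sensitivity as well as on the target; the agreement figures in the review come from the authors' study, not from this model.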


The synthesis-based perceptual-evaluation methods explored by the authors seem beyond my ability to exploit, in terms of both equipment and experience, but they look generally promising and provide a welcome reminder that VQ is a perceptual phenomenon based on the interaction between a speaker and a listener, and cannot ultimately be completely described or understood in articulatory or acoustic terms alone. The paper was presented at From Sound to Sense at MIT.
MeditationesMartini | Apr 21, 2010
