Transcription Bank: Annika Falthin

Where the transcript was published

Falthin, A. (2011). Musik som nav i skolredovisningar. Stockholm: KMH-Förlaget (English title: Music as a hub in school presentations); Falthin (ongoing doctoral thesis).

Link to transcript

How the transcript was made

The transcription model was designed to grasp aspects of the multimodal nature of the research objects, that is, presentations in which pupils in lower secondary school presented a subject matter while playing an instrument and/or singing. In order to design the scripts like a musical score and to be able to notate music, the work was realized in Finale, software for musical notation (Make Music Inc.).

Three questions directed the selection of modes to be represented in the scores: what modes do the pupils work in and how are these modes intertwined; how do the pupils structure their presentations, dealing with form and content; and how do they interact with each other and the audience during the events? These issues led to a score design where the participants, between two and four pupils in each presentation, were represented by two staves each in the scores: one stave for playing or singing, and an additional stave for speech. Each participant was also represented by a stave depicting posture, gesture, facial expression and gaze. The performed music was transcribed in great detail. The speech was notated in the same way, but the pitch was represented approximately. Direction of gaze, body positions, gestures and facial expressions were marked with abbreviations or words.

At this stage, the scores gave an overview and visualization of the presentations’ texture and their multimodal nature (fig. 1). To facilitate the analysis, the scores were transferred to Pages (Apple Inc.). The use of colouring and graphics to mark and follow different events made it possible to unfold aspects of the pupils’ work; for example, arrows helped to show the course of pupils’ multimodal communication, with colouring and/or arrows indicating the cause of the communication (fig. 2).


Figure 1 (Falthin, 2011)
Excerpt from a score representing three boys’ interpretation of the 1st book of Moses, Chap. 6. The actions of the participants (three boys, F, S and O, and the audience [teacher and classmates]) are represented in different staves as parts. The boy O, who plays the guitar, has, in addition to a stave for the guitar, a stave where his body position, direction of gaze and facial expression are notated. Abbreviations or words describe the type of action (e.g., B.S means gaze at S, skratt is Swedish for “laugh”, ler means “smiles”). The other two boys’ speech is notated in music; the pitch is represented approximately. Their bodily movements, like the guitarist’s, are notated in separate staves. The sound of the audience is represented in an additional stave. The audience was not video recorded; therefore only the sound is given.
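
For readers who think more readily in data structures than in scores, the stave layout described above can be loosely sketched in code. This is only an illustration under stated assumptions and not the method used in the study, which was realized in Finale and Pages. The class names, the function `simultaneous`, the use of seconds as time unit, and all onsets, durations and the labels "guitar chord" and "spoken phrase" are invented placeholders. Only the participant letters O and S and the labels B.S and ler are taken from the caption to Figure 1.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Event:
    onset: float      # start time in seconds (hypothetical unit)
    duration: float   # length in seconds (hypothetical unit)
    label: str        # e.g. an abbreviation such as "B.S" (gaze at S)


@dataclass
class Stave:
    mode: str                                   # "music", "speech" or "body"
    events: List[Event] = field(default_factory=list)


@dataclass
class Part:
    """One participant, holding one stave per mode."""
    name: str
    staves: Dict[str, Stave] = field(default_factory=dict)

    def add(self, mode: str, onset: float, duration: float, label: str) -> None:
        # Create the stave for this mode on first use, then append the event.
        stave = self.staves.setdefault(mode, Stave(mode))
        stave.events.append(Event(onset, duration, label))


def simultaneous(parts: List[Part], time: float) -> List[Tuple[str, str, str]]:
    """Return (participant, mode, label) for every event active at `time`."""
    hits = []
    for part in parts:
        for stave in part.staves.values():
            for ev in stave.events:
                if ev.onset <= time < ev.onset + ev.duration:
                    hits.append((part.name, stave.mode, ev.label))
    return hits


# Invented example values, for illustration only.
o = Part("O")
o.add("music", 0.0, 4.0, "guitar chord")
o.add("body", 1.0, 2.0, "B.S")            # gaze at S
s = Part("S")
s.add("speech", 1.5, 1.0, "spoken phrase")
s.add("body", 2.0, 1.0, "ler")            # smiles

for hit in simultaneous([o, s], 2.0):
    print(hit)
```

Reading a vertical slice of the score, as `simultaneous` does here, corresponds to asking which modes are active at the same moment, which is the temporal intertwining the score design is meant to make visible.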

Rationale for the design

The open and experimental approaches to transcription conveyed by Jewitt (2006) and Flewitt et al. (2009), among others, were inspirational in creating a method appropriate to my research needs; van Leeuwen’s (1999) visual representations of sound and music also gave ideas. Personal experience of reading and writing musical scores was influential, inasmuch as a musical score setup provides several possibilities.

Figure 2

Purpose of the transcript

The transcriptions facilitated interpretation of the meaning of the subjects’ actions. In the thesis (Falthin, 2011), excerpts of transcriptions from different school presentations represent a multimodal communicational perspective, e.g. gaze direction and bodily movement in music making (pp. 72-73), phrasing and prosody in speech and music (p. 74), and dramaturgy in speech and music (attached).

Other issues in making the transcript

The design facilitates analysis of how semiotic resources are intertwined in the participants’ sign-making, primarily with regard to temporal aspects. To a lesser extent, the design unfolds spatial aspects. Photos, sketches or Laban notation could be added. A combination of these could be the next step, but that would risk creating too many layers to interpret in the analysis.

A key benefit of the model is the possibility to hear the music, speech and sounds within oneself while imagining the participants’ actions. The experience of the event is not lost, as the scores provide opportunities to analyse how and when different modes are involved, in particular the timing, and to unfold what semiotic resources the participants use. Moreover, one can infer what impact this may have on meaning-making.

In an ongoing study, I develop the model further and use the musical symbols with the most apt corresponding meaning when transcribing bodily movements and oral sounds (other than music) (fig. 3-5). The aim of the transcription is to elucidate the multimodality in musical interaction and musical meaning-making. This work is planned to be published in 2014.


Flewitt, R., Hampel, R., Hauck, M., & Lancaster, L. (2009). What are multimodal data and transcription? In C. Jewitt (Ed.), The Routledge Handbook of Multimodal Analysis. London & New York: Routledge.

Jewitt, C. (2006). Technology, Literacy, Learning: A Multimodal Approach. London & New York: Routledge.

van Leeuwen, T. (1999). Speech, Music, Sound. London: Macmillan.