Integrating Text, Voice, and Visual Inputs for a Cohesive Multimodal Conversational Experience: Iranian EFL Intermediate Students in Focus
Hossein Vahid Dastjerdi
- English Department, Najafabad Branch, Islamic Azad University, Najafabad, Iran
Received: 2024-12-21
Revised: 2025-02-14
Accepted: 2025-01-23
Published in Issue: 2025-04-23
Copyright (c) 2025 Hossein Vahid Dastjerdi (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.
Abstract
This study examined the integration of text, voice, and visual inputs to create a seamless, cohesive multimodal conversational experience within AI chatbots, focusing on Iranian EFL intermediate high school students aged 15 to 19. Existing conversational AI systems typically rely on a single mode of interaction, which limits their effectiveness and usability. By combining text, voice, and visual elements, this research aimed to increase user engagement and satisfaction among the participating students. A convenience sample of 200 male and female students interacted repeatedly with custom-developed AI chatbots over a period of three months, with each participant experiencing text-only, voice-only, visual-only, and multimodal interactions in random order. Data collection included interaction duration, interaction frequency, and satisfaction surveys, supplemented by focus-group discussions. Quantitative analysis using MANOVA, together with qualitative thematic analysis, showed that multimodal interactions significantly enhanced the overall user experience, suggesting new directions for the development of conversational AI technologies.
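The abstract describes a design with one categorical factor (interaction mode: text, voice, visual, multimodal) and three dependent measures analyzed with MANOVA. As a minimal illustrative sketch only, not the authors' actual analysis code, the following Python snippet shows how such a test could be run with statsmodels; the file name `interactions.csv` and the column names `condition`, `duration`, `frequency`, and `satisfaction` are hypothetical.

```python
# Minimal sketch of the reported analysis: a MANOVA testing whether
# interaction mode (text / voice / visual / multimodal) jointly affects
# the three dependent measures. All data and column names are assumed.
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

# One row per participant-condition observation (hypothetical file).
df = pd.read_csv("interactions.csv")

# Dependent variables on the left of ~, categorical predictor on the right.
fit = MANOVA.from_formula(
    "duration + frequency + satisfaction ~ condition",
    data=df,
)

# Prints Wilks' lambda, Pillai's trace, Hotelling-Lawley, and Roy's tests.
print(fit.mv_test())
```

Because each participant experienced all four conditions, a full analysis would also need to account for the repeated-measures structure (e.g., a repeated-measures MANOVA or participant-level aggregation); the sketch above shows only the basic formula interface.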
Keywords
- AI chatbots
- Conversational AI
- Multimodal interaction
- User engagement
- User satisfaction
- Text
- Voice
- Visual inputs
DOI: 10.57647/jntell.2025.0401.05
