DOI: 10.57647/jntell.2025.0401.05

Integrating Text, Voice, and Visual Inputs for a Cohesive Multimodal Conversational Experience: Iranian EFL Intermediate Students in Focus

H. Vahid Dastjerdi 1

  1. English Department, Najafabad Branch, Islamic Azad University, Najafabad, Iran

Received: 2024-12-21

Revised: 2025-02-14

Accepted: 2025-01-23

Published in issue: 2025-04-23

How to Cite

Vahid Dastjerdi, H. (2025). Integrating Text, Voice, and Visual Inputs for a Cohesive Multimodal Conversational Experience: Iranian EFL Intermediate Students in Focus. Journal of New Trends in English Language Learning (JNTELL), 4(1). https://doi.org/10.57647/jntell.2025.0401.05


Abstract

This study examined the integration of multiple modes of communication, specifically text, voice, and visual inputs, to create a seamless and cohesive multimodal conversational experience within AI chatbots. The research focused on Iranian EFL intermediate high school students aged 15 to 19. Existing conversational AI systems typically rely on a single mode of interaction, which limits their overall effectiveness and usability. By integrating text, voice, and visual elements, this research aimed to increase user engagement and satisfaction among the participating students. A convenience sample of 200 male and female students interacted repeatedly with custom-developed AI chatbots over a period of three months. Each participant experienced text-only, voice-only, visual-only, and multimodal interactions in a random order. Data collection included interaction duration, interaction frequency, and satisfaction surveys, supplemented by focus-group discussions. Quantitative analysis using MANOVA and qualitative thematic analysis together showed that multimodal interaction significantly enhanced the overall user experience, pointing to new directions for the development of conversational AI technologies.
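The design described above assigns each participant the four interaction conditions in an independently randomized order. A minimal sketch of such an assignment procedure is shown below; the function name `assign_orders` and the seeding are illustrative assumptions, not part of the study's actual materials.

```python
import random

MODES = ["text-only", "voice-only", "visual-only", "multimodal"]

def assign_orders(n_participants, seed=0):
    """Return a dict mapping participant ID to an independently
    shuffled order of the four interaction conditions."""
    rng = random.Random(seed)  # fixed seed so the schedule is reproducible
    orders = {}
    for pid in range(1, n_participants + 1):
        order = MODES[:]       # copy so each participant gets a fresh list
        rng.shuffle(order)     # random order per participant
        orders[pid] = order
    return orders

schedule = assign_orders(200)  # 200 participants, as in the study
print(schedule[1])             # one participant's condition order
```

Randomizing the order per participant, rather than using a single fixed sequence, helps distribute practice and fatigue effects evenly across the four conditions.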

Keywords

  • AI chatbots
  • Conversational AI
  • Multimodal interaction
  • User engagement
  • User satisfaction
  • Text
  • Voice
  • Visual inputs
