<Conference Paper>
Sentiment Analysis of Noisy Malay Text using a Large Language Model

Author
Language
Publisher
Date Issued
Source Title
Start Page
End Page
Conference Information
Publication Type
Access Rights
Rights
Abstract Sentiment analysis of Malay user-generated content is difficult because social media text is informal, and existing methods struggle to capture cultural and contextual details. This study proposes publishing an open-source annotated dataset, fine-tuning an open-source large language model (LLM), and using an open-source chatbot interface to create a robust sentiment analysis model for noisy Malay text. The research addresses three main issues: the lack of labelled Malay social media data, the inadequacy of generic Malay language models, and the lack of practical sentiment analysis tools. Its three goals are to create a diverse dataset with accurate sentiment labels, to fine-tune an LLM parameter-efficiently, and to export the model for an interactive chatbot. The process involves collecting social media data using Contextual Lexical Adaptation, preprocessing and analysing it, fine-tuning the TinyLlama LLM using LoRA, and comparing the result with traditional models. Real-world applications, such as sentiment analysis of Malaysian tweets, will be demonstrated through a locally deployed chatbot interface that runs inference on the fine-tuned model. This study lays the groundwork for practical sentiment analysis, benefiting businesses, researchers, and politicians seeking data-driven insights. This research aims to revolutionise open-source Malay sentiment analysis by addressing current limitations through an integrated approach.

Full-Text File

pdf 2025_p1904 pdf 3.53 MB 3  

Details

EISSN
Record ID
Subject
Date Registered 2025.12.15
Date Updated 2025.12.15