Research or Study Details | International Science and Technology Journal

Author(s):	Hussien A Elharati, Nasser B. Ekreem, Khaled A Marghani, Ahmed Ali Alsoukni
Institution:	Higher Institute of Science and Technology, Sok Aljum’aa, Libya; Azzaytuna University; Faculty of Science, University of Tripoli, Libya; Higher Institute of Science and Technology, Sok Aljum’aa, Libya
Field:	Electrical and Electronic Engineering and Communications Engineering
Published in:	Issue 32 - April 2023

Abstract (translated from the Arabic)

Feature extraction, classification, and evaluation are among the main concerns in speech recognition systems and remain one of the most active research areas today. This paper examines the performance of a new algorithm for a hybrid speech feature extraction system that uses four techniques: Linear Prediction Cepstrum Coefficient (LPCC), Perceptual Linear Prediction (PLP), Mel Frequency Cepstrum Coefficient (MFCC), and RASTA-PLP. The data extracted with this hybrid technique were classified and evaluated using a Hidden Markov Model (HMM). The proposed algorithm was evaluated on a large speech data set consisting of eleven English words (zero to nine plus "oh"), recorded from 4558 adult speakers, with each word recorded twice per speaker. The sampling rate was 8 kHz, and the recordings were saved in 4558 separate WAV files and divided into a training set and a test set. The hybrid MFCC+RASTA technique gave the best result, a recognition rate of 99.43%, with 14 errors out of 2472 utterances in the sample.
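The reported rate can be checked directly from the two counts given above (14 errors out of 2472 test utterances). The short sketch below does only that arithmetic; the function name `recognition_rate` is illustrative and does not come from the paper.

```python
def recognition_rate(num_errors: int, num_utterances: int) -> float:
    """Return the percentage of correctly recognized utterances."""
    if num_utterances <= 0:
        raise ValueError("num_utterances must be positive")
    if not 0 <= num_errors <= num_utterances:
        raise ValueError("num_errors must lie between 0 and num_utterances")
    return 100.0 * (num_utterances - num_errors) / num_utterances


# Counts taken from the abstract: 14 errors in 2472 test utterances.
rate = recognition_rate(14, 2472)
print(f"{rate:.2f}%")  # prints "99.43%"
```

The result agrees with the 99.43% figure quoted for the MFCC+RASTA combination.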

Abstract

Feature extraction, classification, and evaluation are the main concerns in speech recognition systems and remain among the most active areas of research. This paper examines the performance of a new hybrid feature extraction algorithm that uses Linear Prediction Cepstrum Coefficient (LPCC), Perceptual Linear Prediction (PLP), Mel Frequency Cepstrum Coefficient (MFCC), and RASTA-PLP. The extracted feature vectors are classified and evaluated using a Hidden Markov Model (HMM). The proposed hybrid algorithm is assessed on a data set of human speech consisting of eleven words (zero to nine plus "oh") recorded from 4558 adult speakers, with each speaker uttering each word twice. The collected data are sampled at 8 kHz and saved in 4558 WAV files, which are divided into training and testing sets. According to the final results, the proposed system achieved a recognition rate of 99.43% using the combination of MFCC and RASTA.
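The abstract does not say how the individual feature sets are merged into a hybrid vector. One common approach, assumed here purely for illustration, is frame-wise concatenation: for each analysis frame, the coefficients of one method are appended to those of another. The sketch below shows that idea on toy numbers; the function name `concat_features` and the sample values are hypothetical and are not taken from the paper.

```python
from typing import List

Frame = List[float]  # one feature vector per analysis frame


def concat_features(first: List[Frame], second: List[Frame]) -> List[Frame]:
    """Join two per-frame feature sequences frame by frame.

    Assumption: both sequences were computed with the same framing, so
    frame i of `first` and frame i of `second` cover the same audio.
    """
    if len(first) != len(second):
        raise ValueError("feature sequences must have the same number of frames")
    return [frame_a + frame_b for frame_a, frame_b in zip(first, second)]


# Toy stand-ins for MFCC and RASTA-PLP output: 2 frames each.
mfcc_like = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
rasta_like = [[0.1, 0.2], [0.3, 0.4]]

hybrid = concat_features(mfcc_like, rasta_like)
print(len(hybrid), len(hybrid[0]))  # prints "2 5"
```

Under this assumption, the hybrid sequence keeps the same number of frames while the per-frame dimension is the sum of the two inputs, so it can be passed to a classifier such as an HMM in the same way as a single feature set.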

International Science and Technology Journal

Performance Evaluation of Hybrid Features in ASR System