Development of Dialogue Systems for Serbian and Other South Slavic Languages is a technology project (TR-32035) of the Ministry of Science and Technological Development of the Republic of Serbia (2011-2016), aiming to establish more flexible speech communication between humans and machines. The project gathers 38 researchers from 6 scientific research institutions in Serbia, as well as 5 researchers from abroad.

The project is a continuation of previous technology projects of the Faculty of Technical Sciences financed by the Ministry: Human-machine speech communication (2008-2010) and Development of Speech Technologies in Serbian and their application in "Telekom Srbija" (2005-2007), during which a continuous speech recognition system and a high-quality text-to-speech system for Serbian and some other kindred South Slavic languages were developed. Within the ongoing project and the previous ones, a number of speech databases and language resources have been developed, and more than 100 scientific papers have been published at renowned international conferences and in journals.

IVG10tf100n is a database designed for research in speaker identification and verification, namely speaker recognition on the basis of digits spoken over the phone. Recording was repeated once a month with approximately 100 speakers. It was carried out over the telephone network, using a Dialogic CTI card. Samples were recorded to hard disk in mono PCM format, 16 bits/sample, 8000 samples/second. In each session the caller's name, the calling phone number, two fixed sequences and ten random sequences of four digits were recorded. Some of the callers participated in the recording process every month (their voices can be used for system training), and some of them called only once (their voices can be used for testing the probability of false identification).

SpeechDat II is a database compliant with the SpeechDat standard. The database is of telephone quality and currently contains 500 speakers. Every speaker pronounced 50 utterances containing names (people, cities, companies), digits, amounts, dates, isolated phonemes, application words and phrases, phonetically rich sentences, etc. The recording format was mono A-law, 8 bits/sample, 8000 samples/second. The whole database is labeled and documented in accordance with the standard. Database inspection involved labeling every noise and poorly pronounced phoneme, as well as positioning phoneme boundaries. It is used for training the system for phoneme-based speech recognition over the telephone line.

AN_CASR is a database still in the recording phase. It is being recorded under criteria similar to the SpeechDat standard, but with a microphone covering the full audible range. It currently contains 30 speakers. Every speaker pronounced 120 sequences containing names (people, cities, companies), digits, amounts, dates, isolated phonemes, application words and phrases, phonetically rich sentences, etc. The recording format is mono PCM, 16 bits/sample, 22050 samples/second. The recorded part of the database is fully inspected and labeled. The database should be used together with S70W100s120 for training a large-vocabulary continuous ASR system.

TTSlsMarina is a database in Serbian, containing two hours of text chosen to suit a TTS system which takes speech segments from a large database (see AlfaNumTTS.pdf). It was recorded in the studio in mono PCM format, 16 bits/sample, 22050 samples/second. The database has been inspected, labeled and pitch-marked. Inspection involved marking the degree of impairment for every phoneme, open/closed types of vowels, as well as places where disturbances in glottal activity occurred.

TTSlsMarica is a database in Croatian, containing two hours of text chosen to suit a TTS system which takes speech segments from a large database (see AlfaNumTTS.pdf). It was recorded in the studio in mono PCM format, 16 bits/sample, 22050 samples/second. The database has been inspected, labeled and pitch-marked. Inspection involved marking the degree of impairment for every phoneme, open/closed types of vowels, as well as places where disturbances in glottal activity occurred.

TTSlsMarija is a database which contains two hours of text chosen to suit a TTS system which takes speech segments from a large database (see AlfaNumTTS.pdf). It was recorded in the studio in mono PCM format, 16 bits/sample, 22050 samples/second. The database has been inspected, labeled and pitch-marked. Inspection involved marking the degree of impairment for every phoneme, open/closed types of vowels, as well as places where disturbances in glottal activity occurred. Besides using a different speaker, the speech rate was somewhat slower in order to minimize the impairment of both vowels and consonants. A professional speaker, selected unanimously from among 5 candidates, was engaged for the recording (see ETRAN2003.pdf). The voice of the database was also automatically converted to a male one, enabling speech synthesis using a male voice.

TTSlsSnezana is a database which contains ten hours of text chosen to suit a TTS system which takes speech segments from a large database (see AlfaNumTTS.pdf). It was recorded in the studio in mono PCM format, 16 bits/sample, 44100 samples/second. The database has been inspected, labeled and pitch-marked. Inspection involved marking the degree of impairment for every phoneme, open/closed types of vowels, as well as places where disturbances in glottal activity occurred. Each word was part-of-speech tagged and marked for the values of particular morphological categories as well as accentuation. A portion of the database (about 40%) has been marked for phrase breaks and sentence focus, which also makes the database suitable for automatic prediction of prosodic features of speech. A professional speaker, selected unanimously from among 10 candidates, was engaged for the recording (see ETRAN2003.pdf).
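
To give a rough sense of scale, the recording formats listed above translate into data rates and storage sizes as in the following minimal sketch. The function names are illustrative only, and the durations are the nominal "two hours" and "ten hours" quoted above; the actual amount of recorded audio may differ.

```python
def bytes_per_second(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Uncompressed data rate of an audio stream, in bytes per second."""
    return sample_rate_hz * bits_per_sample * channels // 8


def storage_bytes(hours: float, sample_rate_hz: int, bits_per_sample: int,
                  channels: int = 1) -> int:
    """Approximate size of `hours` of audio, ignoring any file headers."""
    return round(hours * 3600 * bytes_per_second(sample_rate_hz, bits_per_sample, channels))


# (sample rate in Hz, bits per sample) for the mono formats named in the descriptions
FORMATS = {
    "telephone PCM": (8000, 16),
    "telephone A-law": (8000, 8),
    "studio PCM, 22050 Hz": (22050, 16),
    "studio PCM, 44100 Hz": (44100, 16),
}

if __name__ == "__main__":
    for name, (rate, bits) in FORMATS.items():
        print(f"{name}: {bytes_per_second(rate, bits)} bytes/s")
    # Nominal durations, assumed to equal the amount of recorded audio.
    print("2 h at 22050 Hz:", storage_bytes(2, 22050, 16), "bytes")
    print("10 h at 44100 Hz:", storage_bytes(10, 44100, 16), "bytes")
```

Under these assumptions, a two-hour studio database at 22050 samples/second occupies a little over 300 MB, and the ten-hour database at 44100 samples/second about 3.2 GB.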


Among the more than 100 scientific papers published at regional and international conferences and in scientific journals and books, the following stand out:

Relevance of the Types and the Statistical Properties of Features in the Recognition of Basic Emotions in Speech
Milana Bojanić, Vlado Delić, Milan Sečujski
Facta Universitatis, University of Niš, 2014
2014_facta_esr.pdf

User-Awareness and Adaptation in Conversational Agents
Vlado Delić, Milan Gnjatović, Nikša Jakovljević, Branislav Popović, Ivan Jokić, Milana Bojanić
Facta Universitatis, University of Niš, 2014
2014_facta_agents.pdf

Automatic Prosody Generation in a Text-to-Speech System for Hebrew
Branislav Popović, Dragan Knežević, Milan Sečujski, Darko Pekar
Facta Universitatis, University of Niš, 2014
2014_facta_hebrew.pdf

Speaker Detection Using Phoneme Specific Hidden Markov Models
Edvin Pakoci, Nikša Jakovljević, Branislav Popović, Dragiša Mišković, Darko Pekar
SPECOM 2014
Novi Sad, Serbia, September 5th-9th, 2014
2014_specom_spk_detect.pdf

Comparison of Linear Discriminant Analysis Approaches in Automatic Speech Recognition
Nikša Jakovljević, Dragiša Mišković, Marko Janev, Milan Sečujski, Vlado Delić
Elektronika ir Elektrotechnika, Kaunas University of Technology, 2013
2013_eie_lda.pdf

Discrimination Capability of Prosodic and Spectral Features for Emotional Speech Recognition
Vlado Delić, Milana Bojanić, Milan Gnjatović, Milan Sečujski, Slobodan Jovičić
Elektronika ir Elektrotechnika, Kaunas University of Technology, 2013
2013_eie_esr.pdf

Influence of the Number of Principal Components used to the Automatic Speaker Recognition Accuracy
Ivan Jokić, Stevan Jokić, Zoran Perić, Milan Gnjatović, Vlado Delić
Elektronika ir Elektrotechnika, Kaunas University of Technology, 2012
2012_eie_pc.pdf

A Novel Split-and-Merge Algorithm for Hierarchical Clustering of Gaussian Mixture Models
Branislav Popović, Marko Janev, Darko Pekar, Nikša Jakovljević, Milan Gnjatović, Milan Sečujski, Vlado Delić
Applied Intelligence, Springer, 2012
2012_ai.pdf

Automatic Prosody Generation for Serbo-Croatian Speech Synthesis Based on Regression Trees
Milan Sečujski, Darko Pekar, Nikša Jakovljević
INTERSPEECH 2011
Florence, Italy, August 28th-31st, 2011
2011_interspeech.pdf

Speech Technologies for Serbian and Kindred South Slavic Languages
Vlado Delić, Milan Sečujski, Nikša Jakovljević, Marko Janev, Radovan Obradović, Darko Pekar
Advances in Speech Recognition (chapter in the book), SCIYO, 2010
(link to IntechOpen)

Applications of Speech Technologies in Western Balkan Countries
Darko Pekar, Dragiša Mišković, Dragan Knežević, Nataša Vujnović Sedlar, Milan Sečujski, Vlado Delić
Advances in Speech Recognition (chapter in the book), SCIYO, 2010
(link to IntechOpen)

Transformation-Based Part-of-Speech Tagging for Serbian Language
Vlado Delić, Milan Sečujski, Aleksandar Kupusinac
CIMMACS 2009
Puerto de la Cruz, Spain, December 14th-16th, 2009
CIMMACS2009.pdf

Eigenvalues Driven Gaussian Selection in Continuous Speech Recognition Using HMMs with Full Covariance Matrices
Marko Janev, Nikša Jakovljević, Darko Pekar, Vlado Delić
Applied Intelligence, Springer, 2009
AI2009.pdf

Part-of-Speech Tagging Based on Combining Markov Models and Machine Learning
Aleksandar Kupusinac, Milan Sečujski
Speech and Language 2009
Belgrade, November 13th-14th, 2009
SL2009.pdf

Energy Normalization in Automatic Speech Recognition
Nikša Jakovljević, Marko Janev, Darko Pekar, Dragiša Mišković
Lecture Notes in Computer Science, Vol. 5246, 2008
LNCS2008.pdf

An Overview of the AlfaNum Text-to-Speech Synthesis System
Milan Sečujski, Vlado Delić, Darko Pekar, Radovan Obradović, Dragan Knežević
SPECOM 2007
Moscow, Russia, October 15th-18th, 2007
SPECOM2007.pdf

A Review of R&D of Speech Technologies in Serbian and Their Applications in Western Balkan Countries
Vlado Delić
SPECOM 2007
Moscow, Russia, October 15th-18th, 2007
SPECOM_WBC2007.pdf

Speech-Enabled Computers as a Tool for Serbian-Speaking Blind Persons
Vlado Delić, Nataša Vujnović, Milan Sečujski
EUROCON 2005
Belgrade, November 22nd-24th, 2005
EUROCON2005.pdf

Description of training procedure for AlfaNum continuous speech recognition system
Nikša Jakovljević, Darko Pekar
EUROCON 2005
Belgrade, November 22nd-24th, 2005
EUROCON_CASR2005.pdf