
International Journal of New Technology and Research

Impact Factor 3.953

(An ISO 9001:2008 Certified Online Journal)
India | Germany | France | Japan

System for Identification and Analysis of Reduplication Words in Hindi Corpus

(Volume 2, Issue 4, April 2016) OPEN ACCESS

Dr. Kamlesh Dutta, Anshul Jindal


Reduplication words are a class of multiword expressions (MWEs) that is rapidly expanding due to the continuous need to coin new terms for new concepts, such as multi word expression, gold standard, and web page. Identifying reduplication words can particularly help parsing and dictionary-based applications such as machine translation and cross-lingual information retrieval, since such word sequences should be treated as a single unit. The purpose of our work is to produce a list of potential reduplication MWEs that a lexicographer can review to decide whether a given word sequence should be added to the lexicon. This will aid the construction of a quality lexicon that incorporates MWE entries.

A system is to be developed that first extracts reduplication words from the given text and then classifies them into different categories based on their semantic and syntactic analysis. The system should store the words of each category in a separate file. To identify reduplication words, the two hyphen-separated words are first translated into English and then compared starting from the end of each word. Depending on the degree of similarity, they are classified into different categories of reduplication.
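
The pipeline described above (extract hyphen-separated pairs, compare the two words from the end, classify by degree of similarity) might be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the category names ("full", "partial", "other"), the 0.5 similarity threshold, and all function names are hypothetical, and the translation step is left as a pluggable callable that defaults to the identity function because the abstract does not say which translator is used.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Two runs of non-space, non-hyphen characters joined by one hyphen.
PAIR_RE = re.compile(r"([^\s\-]+)-([^\s\-]+)")


def extract_pairs(text: str) -> List[Tuple[str, str]]:
    """Return every hyphen-separated word pair found in the text."""
    return PAIR_RE.findall(text)


def common_suffix_len(a: str, b: str) -> int:
    """Count matching characters when comparing both words from the end."""
    n = 0
    for x, y in zip(reversed(a), reversed(b)):
        if x != y:
            break
        n += 1
    return n


def classify(
    w1: str,
    w2: str,
    translate: Callable[[str], str] = lambda w: w,  # assumption: identity stand-in
    threshold: float = 0.5,  # assumption: hypothetical cut-off
) -> str:
    """Assign a hypothetical category based on end-aligned similarity."""
    t1, t2 = translate(w1).lower(), translate(w2).lower()
    if t1 == t2:
        return "full"
    ratio = common_suffix_len(t1, t2) / max(len(t1), len(t2), 1)
    return "partial" if ratio >= threshold else "other"


def group_by_category(
    text: str, translate: Callable[[str], str] = lambda w: w
) -> Dict[str, List[str]]:
    """Group the extracted pairs by their assigned category."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for w1, w2 in extract_pairs(text):
        groups[classify(w1, w2, translate)].append(f"{w1}-{w2}")
    return dict(groups)


if __name__ == "__main__":
    sample = "vah dhire-dhire chala aur khana-vana khaya din-raat"
    for category, pairs in group_by_category(sample).items():
        print(category, pairs)
```

Each group could then be written to its own file, matching the per-category storage the abstract describes. A real translator would be passed in through the `translate` argument.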

Page No: 18-21
