Large organizations have to deal with a plethora of text documents on a daily basis, and these documents require flexible yet customized classification. A classification system for this purpose should be able to tune itself to the dynamically changing requirements of the predefined categories.
File classification is the task of assigning one or more predefined categories to an electronic text document based on its content. There are two basic types of categorization: supervised and unsupervised. Supervised categorization methods use machine learning techniques to train a classifier on a training set of pre-classified documents, and the trained classifier then categorizes previously unseen files. During training, the classifier accumulates the knowledge needed to distinguish categories based on document features. Unsupervised methods do not rely on any external information for classification. Instead, they look for document features that allow similar documents to be gathered into consistent groups, usually referred to as clusters.
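To make the supervised setting concrete, here is a minimal sketch of one classic supervised method, multinomial Naive Bayes, trained on a toy set of pre-classified documents. The whitespace tokenizer, the category names, and the sample documents are illustrative assumptions. This is not part of the DTS system, which avoids a training step altogether.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; real systems use richer features."""
    return text.lower().split()


def train_naive_bayes(labeled_docs):
    """Train a multinomial Naive Bayes model from (text, category) pairs.

    The per-category word counts are the "knowledge" the classifier
    accumulates during training.
    """
    doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, category in labeled_docs:
        doc_counts[category] += 1
        tokens = tokenize(text)
        word_counts[category].update(tokens)
        vocabulary.update(tokens)
    return doc_counts, word_counts, vocabulary


def classify(model, text):
    """Return the category with the highest log posterior for an unseen text."""
    doc_counts, word_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_category, best_score = None, float("-inf")
    for category in doc_counts:
        # Log prior: fraction of training documents in this category.
        score = math.log(doc_counts[category] / total_docs)
        total_words = sum(word_counts[category].values())
        for token in tokenize(text):
            # Laplace (add-one) smoothing so unseen words do not zero the score.
            count = word_counts[category][token] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category


training_set = [
    ("invoice payment due amount", "finance"),
    ("quarterly budget and expenses", "finance"),
    ("employee leave and hiring policy", "hr"),
    ("new hiring interview schedule", "hr"),
]
model = train_naive_bayes(training_set)
print(classify(model, "payment of the invoice amount"))  # finance
```

Every change to the category set in this approach means collecting new labeled examples and retraining, which is the dependency the training-less approach described below is meant to remove.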
DTS has researched and implemented autonomic file categorization that uses ontological knowledge for text categorization. Ontologies contain formal descriptions of the concepts and relationships that model a certain domain. An ontology defines the knowledge and properties of a given domain in a form that machines can read and interpret.
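As a rough illustration of what "concepts and relationships" look like in machine-readable form, the sketch below models a tiny ontology in plain Python: each concept carries lexical labels and is-a links to broader concepts. The `Concept` and `Ontology` classes, the `subconcepts` and `terms` helpers, and the sample finance concepts are hypothetical. Production ontologies are usually expressed in standards such as OWL or RDF rather than ad hoc classes.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A concept in a toy ontology: a name, lexical labels, and is-a parents."""
    name: str
    labels: set = field(default_factory=set)
    parents: set = field(default_factory=set)  # names of broader concepts


class Ontology:
    def __init__(self):
        self.concepts = {}

    def add(self, name, labels=(), parents=()):
        self.concepts[name] = Concept(name, set(labels), set(parents))

    def subconcepts(self, name):
        """Return the concept and every concept that is (transitively) a kind of it."""
        result = {name}
        changed = True
        while changed:
            changed = False
            for concept in self.concepts.values():
                if concept.name not in result and concept.parents & result:
                    result.add(concept.name)
                    changed = True
        return result

    def terms(self, name):
        """Collect the labels of a concept and all of its subconcepts."""
        return set().union(*(self.concepts[c].labels for c in self.subconcepts(name)))


onto = Ontology()
onto.add("Finance", {"finance"})
onto.add("Invoice", {"invoice", "bill"}, {"Finance"})
onto.add("Payment", {"payment", "remittance"}, {"Finance"})
onto.add("WireTransfer", {"wire transfer"}, {"Payment"})
print(sorted(onto.terms("Finance")))
# ['bill', 'finance', 'invoice', 'payment', 'remittance', 'wire transfer']
```

Following the is-a relationships lets a program infer that a "wire transfer" is a kind of payment and therefore evidence for the broader Finance concept, even though no document ever stated that link.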
The DTS Ontology Based Training-Less File Categorization System reads text from a given set of files and categorizes them into predefined categories using its semantics-based intelligence.
In contrast to earlier approaches to text categorization, which enhance known methods with ontological knowledge, we use ontological knowledge itself as the basis for text categorization. Our solution does not require training a categorizer, so no training set is needed; the knowledge encoded in the ontology is applied directly to categorize text.
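The training-less idea can be sketched as follows, assuming each predefined category is anchored to a concept in the ontology and a document is scored by how many of that concept's terms (including the terms of its subconcepts) it mentions. The `ONTOLOGY` fragment, the `CATEGORIES` mapping, the `categorize` function, the term-counting score, and the `min_score` threshold are all illustrative assumptions, not a description of how the DTS system actually weighs semantic evidence.

```python
import re
from collections import Counter

# Hypothetical ontology fragment: each concept has lexical labels and
# optional broader concepts (is-a links).
ONTOLOGY = {
    "Finance": {"labels": {"finance", "budget"}, "parents": set()},
    "Invoice": {"labels": {"invoice", "bill"}, "parents": {"Finance"}},
    "Payment": {"labels": {"payment", "wire transfer"}, "parents": {"Finance"}},
    "HumanResources": {"labels": {"employee", "payroll"}, "parents": set()},
    "Recruitment": {"labels": {"hiring", "interview"}, "parents": {"HumanResources"}},
}

# Each predefined category is anchored to one top-level ontology concept.
CATEGORIES = {"finance": "Finance", "hr": "HumanResources"}


def concept_terms(root):
    """Labels of `root` plus those of all concepts transitively below it."""
    members, frontier = {root}, [root]
    while frontier:
        current = frontier.pop()
        for name, concept in ONTOLOGY.items():
            if current in concept["parents"] and name not in members:
                members.add(name)
                frontier.append(name)
    return set().union(*(ONTOLOGY[m]["labels"] for m in members))


def categorize(text, min_score=1):
    """Score each category by counting ontology terms found in the text.

    No training step is involved: all evidence for a category comes from
    the ontology. Returns every category meeting `min_score`, best first,
    so a document can receive more than one category.
    """
    normalized = " ".join(re.findall(r"[a-z]+", text.lower()))
    scores = Counter()
    for category, root in CATEGORIES.items():
        for term in concept_terms(root):
            # Match whole words or phrases only.
            scores[category] += len(re.findall(rf"\b{re.escape(term)}\b", normalized))
    return [c for c, s in scores.most_common() if s >= min_score]


print(categorize("Please send the invoice and confirm the wire transfer."))  # ['finance']
print(categorize("The payroll budget for hiring new staff"))  # ['hr', 'finance']
```

In this sketch, adapting to new or changed categories means editing the ontology or the category-to-concept mapping rather than collecting and labeling new training documents.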