- Enterprise Content Categorization – The Business Strategy for a Semantic Infrastructure
- Enterprise Content Categorization – How to Successfully Choose, Develop and Implement a Semantic Strategy
Enterprise categorization, he explains in the first article, is one element of the text analytics field (along with text mining, ontologies and sentiment analysis). While these can work together, content categorization done at the enterprise level reveals and supports the semantic infrastructure of the enterprise. In the paper he presents a set of core capabilities and features for ECC:
- categorization
- entity extraction
- fact and event extraction
- summarization
- clustering
Taxonomy and controlled vocabulary fit as "related semantic elements" along with metadata and ontologies.
The mission is to reveal the semantic infrastructure. Many organizations, Reamy says, "do not have a clear idea of what content and/or content structure they have".
There are three dimensions to a semantic infrastructure:
- Content and content structure, such as metadata standards, taxonomies and other structures.
- Technology, which can include applications such as search and content management.
- A team of people who are dedicated to maintaining, refining and facilitating the application of the infrastructure’s various elements.
Enterprise content categorization would lead to cost savings in search and improved content management. Reamy describes ways to accomplish this by adding structure to unstructured content (through categorization, noun phrase extraction and ontologies) and by using faceted navigation.
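To make "adding structure to unstructured content" concrete, here is a minimal Python sketch of term-based categorization feeding faceted navigation. It is only an illustration, not Reamy's method or any vendor's API: the `TAXONOMY` terms and the function names `categorize`, `facet_counts` and `filter_by_facet` are hypothetical, and real ECC software uses far richer rules than simple word matching.

```python
from collections import Counter

# Hypothetical toy taxonomy: category -> trigger terms (illustrative only,
# not taken from the papers).
TAXONOMY = {
    "finance": {"invoice", "budget", "revenue"},
    "hr": {"hiring", "payroll", "benefits"},
}


def categorize(text):
    """Return the sorted categories whose trigger terms occur in the text."""
    words = {w.strip(".,;:!?").lower() for w in text.split()}
    return sorted(cat for cat, terms in TAXONOMY.items() if words & terms)


def facet_counts(docs):
    """Count documents per category, as a faceted-navigation sidebar might."""
    counts = Counter()
    for text in docs:
        counts.update(categorize(text))
    return dict(counts)


def filter_by_facet(docs, category):
    """Keep only the documents assigned to the selected category."""
    return [d for d in docs if category in categorize(d)]
```

Once every document carries category labels, a search interface can show the counts per category and let users narrow results by picking one, which is the basic idea behind faceted navigation.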
The second paper asks organizations to have a "deep strategic understanding" of that semantic infrastructure before they start employing text analytics and enterprise content categorization.
Text analytics, especially enterprise content categorization software, differs from most other information technologies in that its core capabilities of auto-categorization, auto-summarization and entity extraction have more to do with meaning and semantics than with technology. Furthermore, this software is not designed to be used by itself; rather, it is meant to be used with other technologies such as search and content management.
Reamy summarizes the types of software, and makes points about standalone taxonomy management software, enterprise search, and content management. He next guides the reader - in some detail - through the considerations in undertaking the project from proof of concept, through development
Together, these two papers form a full guide, or even a course, on how to do it.