Thursday, November 12, 2009

Text Analytics to Help in Classification

Rise of the Machines: The Role of Text Analytics in Record Classification and Disposition by James Santangelo, Information Management, ARMA (Nov/Dec 2009)

Classification is essential but may be overwhelming to staff. Because of the volume of records, automated classification is needed, and text analytics software can help.

"The latest advancements in text analytics use sophisticated techniques to determine the conceptual meanings within each file to compensate for shortcomings and extend the functionality of the applications that use policy rule engines. Use of text analytics greatly increases the accuracy of the classification by interpreting the meaning of terms in their context instead of being limited by the character strings inherent in policy rule engines."

Text Analytics

Text Analytics Gains a Broader Audience in the Enterprise by Paula J. Hane, IT Newslinks (Nov 2)

Text analytics is becoming more important to search. As this article explains:

"Text analytics extracts key information from unstructured text and helps to retrieve otherwise hidden information. It is a key component of many customer relationship management (CRM) applications, as well as for media and publishing, competitive intelligence, reputation monitoring, e-discovery, compliance, and financial analysis. Because of this, we've seen a number of acquisitions of text analytics firms by larger search companies (Business Objects acquired Inxight, Reuters acquired ClearForest, SAS acquired Teragram, and IBM acquired SPSS) and an increased pace of product and service rollouts."

It's really automatic tagging. Susan Feldman said of one vendor, TEMIS:

"It's clear that text analytics has taken off as a hot market, and TEMIS' expansion of its US business underlines this fact." "As the volume and flow of information increases, publishers and corporations are turning to automation to tag their content to make it findable, to understand what their customers are saying, to monitor trends and opinions about their products and their companies. That's impossible, given the exponential growth of information that needs to be processed, unless the process is automated."

Wednesday, November 11, 2009

OpenCalais is amazing

Learn About and Try OpenCalais (a Free Service from Thomson Reuters), ResourceShelf (Nov 6)

OpenCalais is making headway in using semantic technology to extract entities and topics from text.

"In a nutshell, OpenCalais uses semantic technology and natural language processing to analyze text and add metadata by drawing out entities from documents, blog posts, news stories, etc. In some cases, this type of data can identify or help identify relationships between people, businesses, etc."

This post gives an example of what it can do, and points to the OpenCalais viewerbox where we can try it for ourselves - take a substantial story from an online news site and see the types of data that Calais can extract and organize.

Explore to see the power of the tool. Will we need taxonomies if we have tools like OpenCalais?

Further, we can have this at our fingertips for content we follow with Feedly, a Firefox plugin.

Feed(ly)ing The Enterprise, by Jennifer Zaino, Semantic Web (Nov 9)

"For one thing, it’s the semantic technology embedded within Feedly, which uses the OpenCalais web service to get a clean representation of metadata behind content. That gives power to enterprise users such as marketing professionals, who might be subscribed to various blogs and feeds and services and different content that’s relevant to their brand."

Friday, October 30, 2009

Endeca - Aiming for integrated search/BI

Endeca Stresses Simplicity With New Partnerships by Theresa Cramer, Newsbreaks (Oct 29)

Endeca is partnering with Informatica and SAP in a bid to integrate search with business intelligence.

"Under the terms of the OEM agreement, Endeca will integrate Xcelsius software, SAP BusinessObjects Intelligent Search software, PowerCenter, and PowerExchange into the Endeca Information Access Platform. These partnerships will address two very different issues for Endeca and its customers, says Sonderegger."

Inxight, now part of SAP, turns up in this too.

"There is also a little something in this new partnership for Endeca's intelligence customers, namely, the licensing of SAP's query federation tool, Inxight SmartDiscovery Awareness Server. Intelligence customers will be able to send off a query about "uranium enrichment," for example, to multiple information sources such as Endeca applications, proprietary databases, The New York Times, and The Washington Post and expect to see results in one screen. "This comes straight from the traditional search world," says Sonderegger. "We've added our own twist. ... When results come back, the analyst will be able to do faceted browsing across search results.""
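
The idea in the quote above (one query sent to several sources, results shown on one screen, then faceted browsing across them) can be sketched in a few lines. This is only an illustration under stated assumptions: the in-memory `SOURCES` dictionary, the sample documents and the function names are all hypothetical, and nothing here reflects how Endeca or Inxight actually implement query federation.

```python
from collections import Counter

# Hypothetical in-memory "sources"; a real deployment would query remote systems.
SOURCES = {
    "internal_db": [
        {"title": "Uranium enrichment capacity report", "type": "report", "year": 2008},
        {"title": "Staff directory", "type": "list", "year": 2009},
    ],
    "news_archive": [
        {"title": "Talks stall over uranium enrichment", "type": "news", "year": 2009},
        {"title": "Uranium enrichment explained", "type": "news", "year": 2008},
    ],
}


def federated_search(query, sources):
    """Send one query to every source and merge the hits into one list."""
    q = query.lower()
    hits = []
    for source_name, documents in sources.items():
        for doc in documents:
            # Naive substring match on the title stands in for each source's own search.
            if q in doc["title"].lower():
                hits.append({**doc, "source": source_name})
    return hits


def facet_counts(hits, facet):
    """Count how many hits fall under each value of a facet."""
    return Counter(hit[facet] for hit in hits)


def refine(hits, facet, value):
    """Narrow the merged result list to one facet value."""
    return [hit for hit in hits if hit[facet] == value]


results = federated_search("uranium enrichment", SOURCES)
print(facet_counts(results, "source"))
print([hit["title"] for hit in refine(results, "year", 2009)])
```

The facet counts are what a faceted browsing screen would show beside the merged results, and `refine` is the step that happens when the analyst clicks one of them.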

The Taxonomist's Career

Becoming a Taxonomist: Real Life Stories by Karen Loasby, FUMSI (Oct 2009)

Four information taxonomists tell their career stories - Heather Hedden, Helen Lippell, Dorothy Tuma and Stephen D'Arcy. We see a mix of indexing, information architecture and information management. They come from many backgrounds, as Dorothy's story shows: "In addition to abetting my creative avocations, getting my MFA taught me to think about concepts, meaning, semantics, aboutness, and implicit versus explicit meaning."

Information Access Technology from Gartner

Gartner has issued its 2009 Magic Quadrant for Information Access Technology (Sept 2009)

The overall description is a fair representation of enterprise search technology today. Selecting the vendor is only one step. A much larger one is to make the technology meet an organization's information needs - an endeavour that takes much planning and much adaptation of both the technology and the organization.

"This Magic Quadrant assesses vendors with capabilities that go beyond enterprise search to encompass a range of technologies. Their capabilities include: search; federated search, content analytics, such as content classification, categorization and clustering, fact and entity extraction, taxonomy creation and management, information presentation (for example, visualization) to support analysis and understanding; and desktop search to address user-controlled repositories to locate and "invoke" documents, data and e-mail."

Gartner included vendors that have search as the foundation piece. It notes that "Many include other capabilities such as autocategorization, taxonomy functions and clustering, but we excluded those that offer only these capabilities, with no search."

Companies to note: Autonomy, Endeca, Exalead, IBM, Microsoft, Oracle, Recommind, Vivisimo. (There are others.)

Unfortunately the report does not describe the specific capabilities of the vendors.

Not Otherwise Categorized

Add this blog - Not Otherwise Categorized - to the reading list. This comes from Seth Earley and Associates and has several contributors. Categories include content management, taxonomy, semantic web, SharePoint (MOSS) and much else. Not a high-volume blog, but it has some very good reading.

Wednesday, October 28, 2009

Taxonomy Jargon Explained

What’s the difference between Taxonomies and Ontologies? - Ask Dr. Search at New Idea Engineering (June 2009)

Excellent question - excellent answer. These words are often used interchangeably, but it depends on the person, as Dr Search explains. The casual user might use either term, but the "deep researcher" might prefer ontology. Dr Search suggests that an ontology is the big sweeping picture of knowledge, and a taxonomy covers a more specific subject domain. The terms are also understood differently in a computer science context.



"Beyond academic precision, ontologies try to represent knowledge in a form so carefully that even computers can derive meaning by traversing the various relationships. If a computer were actually relying on this data you can understand that the “is-a” relationship in “Obama is-a president” and “my boss is-a huge pain” have slightly different meanings, the former conferring a job function, the latter a behavioral attribute. Unless you are a researcher or vendor of this technology, most people don’t need to worry about this.

Taxonomies can also be read and used in computer software, for example Verity’s Topic Sets were a form of taxonomy, and could be loaded into a profiler to classify incoming documents; many other companies have had this idea as well. But the linkages between parent and child branches were much simpler in nature, and were designed to simply combine fulltext search terms in various ways. There was no hint of “understanding” in the relationship between a parent and child, beyond simple fulltext matching. This was still very advanced for its time (the late 1980s), but it didn’t attempt to encode meaning."
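
To make the description above concrete, here is a minimal sketch of a taxonomy used as a classifier, where a parent branch does nothing more than combine the fulltext terms of its children. The `Topic` class, the `classify` function and the sample terms are hypothetical and chosen for illustration; this is not Verity's Topic Set format.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """A taxonomy node: a name, its own search terms, and child topics."""
    name: str
    terms: list[str] = field(default_factory=list)
    children: list["Topic"] = field(default_factory=list)

    def all_terms(self) -> set[str]:
        # A parent simply combines its own terms with those of its children.
        combined = {t.lower() for t in self.terms}
        for child in self.children:
            combined |= child.all_terms()
        return combined


def classify(text: str, topics: list[Topic]) -> list[str]:
    """Return the names of topics (at any depth) with a term found in the text."""
    lowered = text.lower()
    matches = []
    for topic in topics:
        # Plain substring matching: no attempt to encode meaning.
        if any(term in lowered for term in topic.all_terms()):
            matches.append(topic.name)
        matches.extend(classify(text, topic.children))
    return matches


taxonomy = [
    Topic("Energy", ["energy"], [
        Topic("Nuclear", ["uranium", "reactor"]),
        Topic("Solar", ["photovoltaic", "solar panel"]),
    ]),
]

print(classify("The plant began uranium enrichment last year.", taxonomy))
```

The document is filed under Nuclear, and under Energy only because Energy inherits its child's terms. The parent-child link carries no notion of what "is-a" means here, which is the contrast with an ontology that the quote draws.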



Many other terms related to the use of taxonomies may come into the discussion, such as topic trees, knowledge base, folksonomy (not the same at all), tagging, and sometimes natural language processing (which might be used to help create a taxonomy) and metadata.

Taxonomy is also discussed in Do You Need A Taxonomy? where it is explained that there are three types: Subject based (a subject domain), Content based (derived from the content), and Behaviour (not as clear - might be usage).

Friday, October 23, 2009

Transparent semantic search at LexisNexis

Semantic technology is increasingly being adopted for search processing. LexisNexis, known for legal research tools, is enhancing those tools with a greater understanding of the meaning of content and, according to this press release, is also revealing to the searcher what it has done. Most semantic search engines seem to work like magic - they just "do" and the searcher must accept the results on faith.


LexisNexis Introduces Transparent Semantic Search Technology for Patent Research, Business Wire (Oct 12)

"Through a development alliance with Dallas-based Pure Discovery, LexisNexis has become the first provider of legal information services to integrate the power of semantic search technology with familiar Boolean search technology, giving the user greater control over the patent research process via a simple, streamlined user interface that matches their typical daily workflow. "

Thursday, October 22, 2009

Intute Thesaurus Removed

Intute has removed its thesaurus for social science. The announcement - The Thesaurus Engine service has been withdrawn - suggests that it did not "align" with UK higher education courses. Odd - it would be helpful for students to learn about and use thesauri.