sethuiyer.com | sethu iyer on business, strategy, leadership, content management, collaboration, document management
Sethu Iyer is an enterprise architect at Vignette for web content collaboration and social media. He specializes in web content management, knowledge management, collaboration, records management, ERP (enterprise resource planning), digital marketing and search.

A primer on taxonomy

What is Taxonomy?

Taxonomy originated in biology as a way to organize or classify information hierarchically. The terms taxonomy, classification and categorization are often used synonymously. A hierarchy usually has a tree-like structure that starts from a single parent node, with multiple child nodes beneath it. In the content management world, taxonomy refers to the way content is categorized in this tree-like structure. Windows Explorer, for example, organizes content hierarchically into folders and sub-folders, and is a good example of the physical categorization of content. Such a hierarchical organization is also known as a "containment taxonomy", whereby the content is physically stored within the hierarchical structure. Think of these as physical file folders. In a real-world scenario, however, a content item (or an electronic document) may fall under several heads of classification. In such cases a single content item may be associated with multiple nodes within the hierarchical structure. This is known as an "associative taxonomy". For example, a single content item may be filed under "Travel Policy" and under "Expense Policy" as well. Sometimes a single term has equivalent terms (synonyms). In such cases the taxonomy relationship is known as an "equivalence relationship".
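The three relationships described above can be sketched in a few lines of Python. This is only an illustration under assumed names: the `Node` class, the `nodes_for` and `preferred_term` helpers, and the sample file name are hypothetical and do not come from any particular content management system.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One node in the taxonomy tree (hypothetical structure)."""
    name: str
    children: List["Node"] = field(default_factory=list)
    contained: List[str] = field(default_factory=list)   # items physically stored here
    associated: List[str] = field(default_factory=list)  # items only referenced here

    def add_child(self, name: str) -> "Node":
        child = Node(name)
        self.children.append(child)
        return child


def nodes_for(root: Node, item: str) -> List[str]:
    """Return the names of every node that contains or references the item."""
    found = []
    stack = [root]
    while stack:
        node = stack.pop()
        if item in node.contained or item in node.associated:
            found.append(node.name)
        stack.extend(node.children)
    return sorted(found)


# Equivalence relationship: a synonym resolves to one preferred term.
SYNONYMS = {"Travel Guidelines": "Travel Policy"}


def preferred_term(term: str) -> str:
    return SYNONYMS.get(term, term)


root = Node("Policies")
travel = root.add_child("Travel Policy")
expense = root.add_child("Expense Policy")

travel.contained.append("reimbursement.doc")    # containment: stored under this node
expense.associated.append("reimbursement.doc")  # association: also filed under this node
```

Calling `nodes_for(root, "reimbursement.doc")` returns both policy nodes, which is the associative case: one item, several places in the tree.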

Metadata: In simple terms, metadata may be defined as data about data. Think of the way library cards are physically organized. These cards are metadata about the books in a library and may contain pertinent information about a book such as the "Author's Name", "Book's Title", "Publisher's Name" and "Physical Location". This information helps us easily locate a book by its title, its author's name and so on. Such metadata is applied to content in electronic formats as well. The way in which it is applied depends on the capability of the content management system. Some systems store the actual content in a file system and wrap metadata around it, while the metadata itself may be stored in a database. Content in HTML and PDF formats may carry metadata within the content item itself, though additional information may also be stored in a datastore outside the content item. Enterprise-class CMS systems usually offer extensive metadata management capabilities, including the ability to apply metadata from controlled vocabularies and to nest metadata when required. Such metadata management is hugely important in R&D systems in life sciences and pharmaceutical companies.
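As a rough illustration of the ideas above, a metadata record can be modelled as a dictionary that points at the content's location, carries one nested field, and is checked against a controlled vocabulary. The field names, the vocabulary values and the `validate_metadata` helper are all assumptions made for this sketch, not the schema of any real CMS.

```python
# Hypothetical controlled vocabulary: the only values allowed for "department".
CONTROLLED_VOCABULARY = {"department": {"Finance", "HR", "R&D"}}


def validate_metadata(metadata: dict) -> list:
    """Return a list of error messages; an empty list means the record is valid."""
    errors = []
    for field_name, allowed in CONTROLLED_VOCABULARY.items():
        value = metadata.get(field_name)
        if value is not None and value not in allowed:
            errors.append(f"{field_name}: {value!r} is not in the controlled vocabulary")
    return errors


# The content itself stays on the file system; the record only points to it.
record = {
    "title": "Travel Policy",
    "author": "J. Smith",
    "location": "/content/policies/travel.pdf",
    "department": "HR",
    # Nested metadata: a field whose value is itself a set of fields.
    "review": {"reviewer": "A. Jones", "status": "approved"},
}
```

Here `validate_metadata(record)` returns an empty list, while a record with `"department": "Sales"` would produce one error, because "Sales" is not in the assumed vocabulary.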

Difference between storage taxonomy and navigation taxonomy: Taxonomies may be used for several purposes, and there are many companies engaged in the business of creating industry-specific taxonomies. A limited number of parent-child hierarchical relationships may be enough for content storage and retrieval, but a single hierarchy becomes very complex and unmanageable in the context of an enterprise system. Hence many systems separate the "content storage taxonomy" from the "content display taxonomy". For example, the "navigation tabs" we see on a site can be the content display taxonomy, which is different from the content storage taxonomy.

There are different ways in which the "content storage taxonomy" can be associated with the "content display taxonomy". These include techniques such as simple content filtering rules, intelligent templating mechanisms and dynamic content targeting capabilities. Some search solutions are also capable of creating dynamic navigation hierarchies for content display purposes.
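The simplest of these techniques, a content filtering rule, might look like the sketch below. Each navigation tab in the display taxonomy is defined by a rule over the metadata of stored items, so the storage paths never need to match the site navigation. The sample items, tab names and the `items_for_tab` helper are hypothetical.

```python
# Storage taxonomy: where each item lives, plus its metadata (hypothetical data).
STORED_CONTENT = [
    {"path": "/hr/policies/travel.doc", "type": "policy"},
    {"path": "/finance/policies/expense.doc", "type": "policy"},
    {"path": "/marketing/press/launch.html", "type": "press-release"},
]

# Display taxonomy: each navigation tab is defined by a filtering rule.
NAVIGATION_TABS = {
    "Policies": lambda item: item["type"] == "policy",
    "News": lambda item: item["type"] == "press-release",
}


def items_for_tab(tab: str) -> list:
    """Return the storage paths of the items that a navigation tab displays."""
    rule = NAVIGATION_TABS[tab]
    return [item["path"] for item in STORED_CONTENT if rule(item)]
```

In this sketch the "Policies" tab gathers items from two different storage folders, which shows how the display taxonomy can cut across the storage taxonomy.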

Auto Categorization: Also known as auto-classification, this is done using a content mining and analysis tool that identifies patterns of words in documents and classifies the documents into separate categories based on pattern similarities. Vendors offering these capabilities use several different algorithms, similar to those of search engines. Though useful from a content taxonomy standpoint, such solutions are also used for text mining, pattern matching, concept matching, dynamic cataloging and content clustering.
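To make the idea concrete, here is a deliberately simple sketch that assigns a document to the category whose seed words appear in it most often. It is a toy keyword-counting approach chosen for illustration, not the algorithm of any vendor; the categories, seed words and function names are assumptions.

```python
import re
from collections import Counter

# Hypothetical categories, each described by a few seed words.
CATEGORY_KEYWORDS = {
    "Travel Policy": {"flight", "hotel", "travel", "itinerary"},
    "Expense Policy": {"receipt", "reimbursement", "expense", "invoice"},
}


def tokenize(text: str) -> list:
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def classify(text: str) -> str:
    """Return the category whose seed words occur most often, or 'Uncategorized'."""
    counts = Counter(tokenize(text))
    scores = {
        category: sum(counts[word] for word in keywords)
        for category, keywords in CATEGORY_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "Uncategorized"
```

A document that mentions flights and hotels lands in "Travel Policy", and one that matches no seed words is left uncategorized. Real tools are likely to weigh terms statistically rather than count exact matches.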

Indexing: This may fall under the category of search, but it is also related to taxonomy because many search engines that offer auto-classification capabilities also provide search and indexing services. Indexing is the process by which a search engine (or an indexer) parses and scans content items, including their metadata, identifies the key patterns and words, and stores them separately in an index. When an end user executes a search, the engine looks up the content attributes in the index and returns pointers to the location of the content.
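The process described above can be sketched as a small inverted index: each word found in an item's metadata or body maps to the locations of the items that contain it, and a search returns those locations rather than the content. The sample documents and the `build_index` and `search` functions are hypothetical, and real indexers do far more (stemming, ranking, stop words).

```python
import re
from collections import defaultdict

# Hypothetical content items: location -> metadata (title) and body text.
DOCUMENTS = {
    "/policies/travel.doc": {"title": "Travel Policy", "body": "Book flights early."},
    "/policies/expense.doc": {"title": "Expense Policy", "body": "Keep every receipt."},
}


def build_index(documents: dict) -> dict:
    """Map each word found in the metadata or body to the locations containing it."""
    index = defaultdict(set)
    for location, item in documents.items():
        text = f"{item['title']} {item['body']}"
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].add(location)
    return index


def search(index: dict, query: str) -> list:
    """Return the locations whose indexed words include every query word."""
    words = re.findall(r"[a-z]+", query.lower())
    if not words:
        return []
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)


INDEX = build_index(DOCUMENTS)
```

Searching for "policy" returns both locations, while "travel policy" narrows the result to the travel document only.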
