What is Taxonomy?
Taxonomy originated in biology as a
hierarchical organization or classification of
information. The terms taxonomy, classification, and categorization are
often used synonymously. A hierarchy usually has a tree-like
structure that starts from a single parent node with multiple child nodes
beneath it. In the content management world, taxonomy refers to the way in
which content is categorized in this tree-like structure. For example,
Windows Explorer presents a hierarchical organization of content with folders and
sub-folders, and is a good example of physical categorization of content. Such a
hierarchical organization of content is also known as a "containment taxonomy",
whereby the content is physically stored within the hierarchical structure. Think
of these as physical file folders. In a real-world scenario, however, a content
item (or an electronic document) may fall under several classification headings.
In such cases a single content item may be associated with multiple nodes
within the hierarchical structure. This is known as an "associative taxonomy".
For example, a single content item may be filed under both "Travel Policy" and
"Expense Policy". Sometimes different terms have equivalent meanings (synonyms).
In such cases the taxonomy relationship is known as an "equivalence
relationship".
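
The three relationships described above can be sketched in a few lines of
Python. This is only an illustration, not how any particular content
management system implements taxonomy: the Node class, the find helper, the
"Policies" root, the "Reimbursement Policy" synonym, and the file name are all
hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """One term in the taxonomy tree."""
    name: str
    children: list["Node"] = field(default_factory=list)
    synonyms: set[str] = field(default_factory=set)  # equivalence relationship

    def add_child(self, name: str) -> "Node":
        child = Node(name)
        self.children.append(child)
        return child


# Hypothetical tree: a single parent node with child nodes beneath it.
root = Node("Policies")
travel = root.add_child("Travel Policy")
expense = root.add_child("Expense Policy")
expense.synonyms.add("Reimbursement Policy")  # assumed synonym, for illustration

# Containment taxonomy: each item is stored in exactly one node,
# like a file that lives in exactly one folder.
stored_in = {"travel-expenses.pdf": "Travel Policy"}

# Associative taxonomy: the same item may be linked to several nodes.
associated_with = {"travel-expenses.pdf": {"Travel Policy", "Expense Policy"}}


def find(node: Node, term: str) -> Optional[Node]:
    """Depth-first lookup of a node by its name or by one of its synonyms."""
    if term == node.name or term in node.synonyms:
        return node
    for child in node.children:
        match = find(child, term)
        if match is not None:
            return match
    return None
```

With this sketch, find(root, "Reimbursement Policy") returns the same node as
find(root, "Expense Policy"), which is the practical effect of an equivalence
relationship.
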
Metadata: in simple terms, metadata may be
defined as data about data. Think of the way library cards are physically
organized. These cards are metadata about the physical location of books in a
library and may contain pertinent information about a book such as the "Author's
Name", "Book's Title", "Publisher's Name", "Physical Location", etc. This
information helps us easily locate a book by
its title, its author's name, and so on. Such metadata is applied to content in
electronic formats as well. The way in which it is applied depends
on the capabilities of the content management system. Some systems store
the actual content in a file system and wrap metadata around the
content, while the metadata itself may be stored in a database.
Content in HTML and PDF formats may carry the metadata within the content item
itself, though additional information may also be stored in a datastore outside
the content item. Enterprise-class content management systems usually offer
extensive metadata management capabilities, including the ability to apply
metadata from controlled vocabularies and to nest metadata when required.
Such metadata management is especially important in R&D systems in life sciences
and pharmaceutical companies.
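
As a rough illustration of metadata that wraps a stored file, draws on a
controlled vocabulary, and nests where needed, consider the Python sketch
below. The field names, the vocabulary terms, the file path, and the validate
function are assumptions made for this example and do not describe any
specific product.

```python
# Hypothetical controlled vocabulary: the only values allowed for each field.
CONTROLLED_VOCABULARY = {
    "document_type": {"Policy", "Protocol", "Report"},
    "department": {"Finance", "R&D", "HR"},
}


def validate(metadata: dict) -> list[str]:
    """Return a list of problems; an empty list means the record is valid."""
    problems = []
    for field_name, allowed in CONTROLLED_VOCABULARY.items():
        value = metadata.get(field_name)
        if value not in allowed:
            problems.append(f"{field_name}: {value!r} is not a controlled term")
    return problems


# The content itself stays in the file system; this record points at it and
# could be stored as a database row. "author" shows nested metadata.
record = {
    "location": "/content/policies/travel.pdf",  # hypothetical path
    "title": "Travel Policy",
    "document_type": "Policy",
    "department": "Finance",
    "author": {"name": "A. Author", "affiliation": "Finance"},
}
```

Here validate(record) returns an empty list, while a record whose
document_type is "Memo" would be reported, because "Memo" is not in the
assumed vocabulary.
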
Difference between storage taxonomy and navigation
taxonomy: Taxonomies may
be used for several purposes, and many companies are engaged in the
business of creating industry-specific taxonomies. Though a limited number of
parent-child hierarchical relationships may be used for both content storage and
retrieval, a single hierarchy becomes complex and unmanageable in the context
of an enterprise system. Hence many systems separate the taxonomies
into a "content storage taxonomy" and a "content display taxonomy" (or
navigation taxonomy). For example, the "navigation tabs" we see on a site can
form the content display taxonomy, which is
different from the content storage taxonomy.
There are
different ways in which the "content
storage taxonomy" can be associated with the "content display taxonomy". These
include techniques such as
simple content filtering rules, intelligent templating mechanisms, and dynamic
content targeting capabilities. Some search
solutions are also capable of creating dynamic navigation hierarchies for
content display purposes.
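
The simplest of these techniques, content filtering rules, might look like the
Python sketch below. The storage paths, tags, tab names, and the
build_navigation function are hypothetical; real systems would typically
express such rules in configuration or templates rather than in code.

```python
# Storage taxonomy: where each item physically lives (hypothetical paths).
items = [
    {"path": "/hr/policies/travel.pdf", "tags": {"policy", "travel"}},
    {"path": "/finance/policies/expenses.pdf", "tags": {"policy", "expense"}},
    {"path": "/hr/forms/leave-request.pdf", "tags": {"form"}},
]

# Display taxonomy: navigation tabs, each defined by a filtering rule
# rather than by a storage folder.
navigation_tabs = {
    "Policies": lambda item: "policy" in item["tags"],
    "HR": lambda item: item["path"].startswith("/hr/"),
}


def build_navigation(items, tabs):
    """Map each tab name to the paths of the items its rule selects."""
    return {
        tab: [item["path"] for item in items if rule(item)]
        for tab, rule in tabs.items()
    }


navigation = build_navigation(items, navigation_tabs)
```

Note that the travel policy appears under both tabs even though it is stored
in only one folder, so the display taxonomy can change without moving any
content.
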
Auto Categorization: also known as auto-classification, this
is done using a content mining and analysis tool that identifies patterns of
words in documents and classifies the documents into separate categories based
on pattern similarities. Vendors offering these capabilities use several different
algorithms similar to those of search engines. Though useful from a content
taxonomy standpoint, such solutions are also used for text mining, pattern
matching, concept matching, dynamic cataloging, and content clustering.
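
To make the idea of classifying by word-pattern similarity concrete, here is a
deliberately simple Python sketch. It compares word counts using cosine
similarity, which is one common textbook measure and is swapped in here for
illustration; commercial tools use their own, more sophisticated algorithms.
The category names, the example texts, and the function names are assumptions.

```python
import math
import re
from collections import Counter


def word_counts(text: str) -> Counter:
    """Lower-case the text and count its words."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical example text for each category.
CATEGORY_EXAMPLES = {
    "Travel Policy": "flight hotel booking travel itinerary airfare",
    "Expense Policy": "expense receipt reimbursement claim approval",
}


def categorize(text: str) -> str:
    """Return the category whose example text is most similar to the input."""
    counts = word_counts(text)
    return max(
        CATEGORY_EXAMPLES,
        key=lambda name: cosine(counts, word_counts(CATEGORY_EXAMPLES[name])),
    )
```

This sketch always picks some category, even when nothing matches; a practical
tool would also need a similarity threshold below which a document is left
unclassified.
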
Indexing: this may fall under the category of
search, but it is also related to taxonomy because many search engines that offer
auto-classification capabilities also provide search and indexing services.
Indexing is the process by which a search engine (or an indexer) parses and
scans content items, including their metadata, identifies the key
patterns and words, and stores them separately in an index. When an end user
executes a search, the engine locates the matching content attributes in the
index and provides pointers to the location of the content.
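
The indexing process described above can be sketched as a small inverted
index in Python. This is a minimal illustration under simplifying assumptions
(plain word matching, no ranking, no stemming); the document paths, texts, and
function names are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical content items: location -> (body text, metadata).
documents = {
    "/policies/travel.pdf": (
        "Book flights through the travel desk.",
        {"title": "Travel Policy"},
    ),
    "/policies/expenses.pdf": (
        "Submit receipts for travel expenses.",
        {"title": "Expense Policy"},
    ),
}


def tokenize(text: str) -> list[str]:
    """Split text into lower-case words."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(docs):
    """Scan body text and metadata; map each word to the content locations."""
    index = defaultdict(set)
    for location, (body, metadata) in docs.items():
        for word in tokenize(body + " " + " ".join(metadata.values())):
            index[word].add(location)
    return index


def search(index, query: str) -> list[str]:
    """Return locations containing every query word, using only the index."""
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)


index = build_index(documents)
```

A query never touches the documents themselves: search(index, "expense policy")
consults only the index and returns the pointer "/policies/expenses.pdf".
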