Features – Columbus doc

Data and documents acquisition

The system acquires information from a variety of sources specified by the user
(or by the system administrator), provided the following conditions are met:

– The rightful owner has granted the necessary access authorization

– Access is supported either by the source itself, through an API, or by Columbus, through a custom connector written and authorized specifically for the purpose

Transformation

This process converts all acquired items, originally captured in their native formats, into searchable PDF documents. Depending on the format of the acquired item, the transformation may involve different tasks, such as rendering, scanning, OCR, conversion, and export.
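As a rough illustration of how the set of tasks might vary with the item format, the sketch below maps a few formats to an ordered task list. The format names, the task lists, and the `plan_transformation` function are assumptions made for this example; they are not taken from Columbus.

```python
from typing import Dict, List

# Hypothetical mapping from item format to the tasks needed to reach a
# searchable PDF. Both the format names and the task lists are assumptions.
TRANSFORMATION_STEPS: Dict[str, List[str]] = {
    "pdf_text": [],                         # already a searchable PDF
    "pdf_image": ["ocr"],                   # scanned PDF without a text layer
    "image": ["conversion", "ocr"],         # e.g. a photographed page
    "paper": ["scanning", "ocr"],           # physical document
    "office": ["rendering", "conversion"],  # word-processor or spreadsheet file
    "email": ["export", "rendering", "conversion"],
}


def plan_transformation(item_format: str) -> List[str]:
    """Return the ordered tasks for a format, or raise for unknown formats."""
    try:
        return list(TRANSFORMATION_STEPS[item_format])
    except KeyError:
        raise ValueError(f"no transformation defined for {item_format!r}") from None


if __name__ == "__main__":
    for fmt in ("pdf_text", "image", "email"):
        print(fmt, "->", plan_transformation(fmt) or ["(nothing to do)"])
```

Keeping the plan as data rather than code makes it easy to see at a glance which formats need OCR and which need only a conversion.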

Optimized storage

The system stores all processed items in dedicated storage designed to maximize access performance during consultation.

– While respecting access privacy and security, the system resolves duplicate content, which often arises in communications between members of the same organization (consider an email attachment that appears in dozens of items but always has the same content). The system stores a single copy of each piece of content, identified by a hash function, which saves both storage space and the time needed to process content that has already been transformed;

– All searchable PDF documents are also stored in paged mode: the system stores a conversion that lets you consult a single page without accessing the entire document. For example, when the item being searched for is mentioned on only a few pages of a document of one hundred pages or more, paged storage lets you locate and access those pages immediately without downloading the entire document.
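The two storage ideas above can be sketched together as a small in-memory store: content is keyed by its hash so identical items share one copy, and each copy is kept as separate pages. The `ContentStore` class, the choice of SHA-256, and the form-feed page splitter are assumptions made for illustration, not a description of the actual Columbus storage.

```python
import hashlib
from typing import Callable, Dict, List


class ContentStore:
    """In-memory stand-in for deduplicated, paged storage (illustrative only)."""

    def __init__(self) -> None:
        self._pages: Dict[str, List[bytes]] = {}  # content hash -> pages
        self._references: Dict[str, str] = {}     # item id -> content hash

    def put(self, item_id: str, content: bytes,
            split_pages: Callable[[bytes], List[bytes]]) -> bool:
        """Register an item; return True only if the content had to be processed."""
        digest = hashlib.sha256(content).hexdigest()
        self._references[item_id] = digest
        if digest in self._pages:
            return False  # identical content already stored: nothing to redo
        self._pages[digest] = split_pages(content)
        return True

    def get_page(self, item_id: str, page_number: int) -> bytes:
        """Fetch one page (1-based) without reading the rest of the document."""
        return self._pages[self._references[item_id]][page_number - 1]

    def stored_copies(self) -> int:
        return len(self._pages)


def split_on_form_feed(content: bytes) -> List[bytes]:
    """Toy page splitter: treats the form-feed byte as a page break."""
    return content.split(b"\f")


if __name__ == "__main__":
    store = ContentStore()
    attachment = b"page one\fpage two\fpage three"
    print(store.put("email-1/attachment", attachment, split_on_form_feed))
    print(store.put("email-2/attachment", attachment, split_on_form_feed))
    print(store.stored_copies(), store.get_page("email-2/attachment", 2))
```

The second `put` skips the page split entirely, which is where the processing-time saving mentioned above would come from.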

Classification and indexing

The classification process applies a set of automatic algorithms to all acquired documents. These algorithms extract and annotate structured information derived from both the content and the source from which each document was extracted.

The automatic information extraction algorithms can be combined according to customer needs: you can select which algorithms to install, how much computational capacity they may use, and the order in which they run. You can also write and install custom algorithms that apply specific logic to identify and extract structured data. Finally, you can install algorithms that use external artificial intelligence services to further extend the system’s automatic analysis capabilities.
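A minimal sketch of what a combinable set of extraction algorithms could look like: each algorithm is a function from text to metadata, and the configuration is simply the ordered list of functions to run. The extractor names, the regular expressions, and `run_pipeline` are hypothetical and only illustrate the idea of selecting and ordering algorithms.

```python
import re
from typing import Callable, Dict, List

Extractor = Callable[[str], Dict[str, List[str]]]


def extract_years(text: str) -> Dict[str, List[str]]:
    """Annotate four-digit years between 1900 and 2099."""
    return {"year": sorted(set(re.findall(r"\b(?:19|20)\d{2}\b", text)))}


def extract_emails(text: str) -> Dict[str, List[str]]:
    """Annotate e-mail addresses using a deliberately simple pattern."""
    pattern = r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"
    return {"email": sorted(set(re.findall(pattern, text)))}


def run_pipeline(text: str, extractors: List[Extractor]) -> Dict[str, List[str]]:
    """Run the selected extractors in the configured order and merge their output."""
    metadata: Dict[str, List[str]] = {}
    for extractor in extractors:
        for field_name, values in extractor(text).items():
            merged = metadata.setdefault(field_name, [])
            for value in values:
                if value not in merged:
                    merged.append(value)
    return metadata


if __name__ == "__main__":
    sample = "Contact anna@example.com about the 2019 and 2021 reports."
    print(run_pipeline(sample, [extract_years, extract_emails]))
```

A custom algorithm, or one that calls an external service, would fit the same shape as long as it returns a dictionary of metadata fields.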

This metadata is then used to enrich the indexing phase, so that the system can search information assets by combining full-text search with structured-data search. Structured information is also fundamental to the system’s support for exploratory search, because it provides different semantic paths for refining the search results.

During the indexing phase, the system also records the information needed to secure access to the information assets, using partitioning and sharing mechanisms that guarantee each user can search only within the assets they are authorized to access.
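One way to picture this is an index in which every entry carries the partitions it belongs to, and every search intersects those partitions with the ones granted to the user. The sketch below assumes a simple set-based model; the `IndexedDocument` class, the partition names, and the `search` function are invented for the example.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class IndexedDocument:
    doc_id: str
    text: str
    # Partitions are written at indexing time (assumed model, not Columbus's own).
    partitions: Set[str] = field(default_factory=set)


def search(index: List[IndexedDocument], query: str,
           user_partitions: Set[str]) -> List[str]:
    """Return the ids of matching documents the user is allowed to see."""
    needle = query.lower()
    return [
        doc.doc_id
        for doc in index
        if doc.partitions & user_partitions  # security filter comes first
        and needle in doc.text.lower()       # then the full-text match
    ]


if __name__ == "__main__":
    index = [
        IndexedDocument("a", "Budget 2021", {"finance"}),
        IndexedDocument("b", "Budget draft", {"hr"}),
        IndexedDocument("c", "Holiday plan", {"hr", "finance"}),
    ]
    print(search(index, "budget", {"finance"}))
```

Because the access check is part of the query itself, a document outside the user's partitions never appears in the results, not even as a count.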

Search and consulting

Once the classification and indexing processes are complete, the system offers several ways to search, which differ both in function and in architecture.

Search through the Columbus client: the dedicated client, available in desktop mode (Windows PC) and, optionally, in mobile mode (smartphone / tablet on iOS / Android / Windows), provides all the available search functions.

Searches combine full-text terms found in the indexed document text with filters on the various metadata sets extracted by the system.

The results can then be refined through a faceting system based entirely on the extracted metadata. Facet selection supports both inclusions and exclusions, providing a powerful and intuitive way to locate the content of interest quickly.
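The combination of a full-text query with facet inclusions and exclusions can be sketched as a single filter function. The document layout (`id`, `text`, `meta`), the field names, and the `refine` function are assumptions made for this example; a real index would not scan documents one by one.

```python
from typing import Dict, List, Optional, Set

# Sample documents; ids, texts, and metadata values are invented.
DOCS = [
    {"id": "d1", "text": "Supplier contract renewal", "meta": {"type": "email", "year": "2021"}},
    {"id": "d2", "text": "Contract draft for review", "meta": {"type": "pdf", "year": "2021"}},
    {"id": "d3", "text": "Contract archive", "meta": {"type": "pdf", "year": "2019"}},
    {"id": "d4", "text": "Team lunch", "meta": {"type": "email", "year": "2021"}},
]


def refine(docs: List[dict], query: str = "",
           include: Optional[Dict[str, Set[str]]] = None,
           exclude: Optional[Dict[str, Set[str]]] = None) -> List[str]:
    """Apply a full-text query, then facet inclusions and exclusions."""
    include = include or {}
    exclude = exclude or {}
    hits = []
    for doc in docs:
        meta = doc["meta"]
        if query and query.lower() not in doc["text"].lower():
            continue  # fails the full-text condition
        if any(meta.get(f) not in allowed for f, allowed in include.items()):
            continue  # misses a value the user asked to include
        if any(meta.get(f) in banned for f, banned in exclude.items()):
            continue  # carries a value the user asked to exclude
        hits.append(doc["id"])
    return hits


if __name__ == "__main__":
    print(refine(DOCS, "contract"))
    print(refine(DOCS, "contract", include={"year": {"2021"}}))
    print(refine(DOCS, "contract", include={"year": {"2021"}},
                 exclude={"type": {"email"}}))
```

Each call narrows the previous result, which mirrors how a user would click through facets after an initial query.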

The system also supports so-called ‘exploratory search’, which lets you explore the entire set of information assets without entering an initial query, using only the faceting mechanism. The system architecture supports this feature while maintaining the expected performance.
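Exploratory search starts from facet counts computed over everything the user can see, with no query at all. A minimal sketch, assuming each document carries a flat `meta` dictionary (the sample data and the `facet_counts` function are invented for illustration):

```python
from collections import Counter
from typing import Dict, List


def facet_counts(docs: List[dict]) -> Dict[str, Counter]:
    """Count how many documents carry each value of each metadata field."""
    counts: Dict[str, Counter] = {}
    for doc in docs:
        for field_name, value in doc["meta"].items():
            counts.setdefault(field_name, Counter())[value] += 1
    return counts


if __name__ == "__main__":
    # Sample documents; the metadata fields and values are assumptions.
    documents = [
        {"id": "d1", "meta": {"type": "email", "year": "2021"}},
        {"id": "d2", "meta": {"type": "pdf", "year": "2021"}},
        {"id": "d3", "meta": {"type": "pdf", "year": "2019"}},
    ]
    for field_name, counter in facet_counts(documents).items():
        print(field_name, dict(counter))
```

Presenting these counts gives the user a starting map of the assets; selecting a value and recomputing the counts on the remaining documents is the next step of the exploration.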

Search through the API: the search functions described above can also be integrated with other systems through the available API and the appropriate security mechanisms, so that those systems can query Columbus and obtain data in a standard format for their own use.

Sharing

Each user, if properly authorized through the administrator settings, can share their own search results with other users.

In particular, through the ‘smart sharing’ functions, a user can share information according to a functional logic that is not strictly tied to the typical ACL security mechanisms (users and groups). Sharing can be based on data classification, on full-text search results, or on both. This makes sharing more intuitive and lets it rely on functional rules that, over time, also cover new items matching those rules. For example, if a user shares all the documents from a given year that deal with a specific topic, any items indexed later that match the same criteria are shared automatically by the system.
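The rule-based behaviour described above can be sketched by storing the sharing criteria rather than a fixed list of documents, and evaluating the rules whenever a new item is indexed. The `SharingRule` and `SharedIndex` classes, the field names, and the user names are hypothetical; they illustrate the idea only.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set


@dataclass
class SharingRule:
    owner: str
    recipients: Set[str]
    criteria: Dict[str, str]  # metadata values the document must match
    query: str = ""           # optional full-text condition

    def matches(self, doc: dict) -> bool:
        meta_ok = all(doc["meta"].get(k) == v for k, v in self.criteria.items())
        text_ok = self.query.lower() in doc["text"].lower()  # "" always matches
        return meta_ok and text_ok


class SharedIndex:
    """Toy index that applies every sharing rule at indexing time."""

    def __init__(self, rules: Iterable[SharingRule]) -> None:
        self.rules = list(rules)
        self.shared_with: Dict[str, Set[str]] = {}  # doc id -> users

    def add(self, doc: dict) -> None:
        recipients: Set[str] = set()
        for rule in self.rules:
            if rule.matches(doc):
                recipients |= rule.recipients
        self.shared_with[doc["id"]] = recipients


if __name__ == "__main__":
    rule = SharingRule("alice", {"bob"}, {"year": "2021"}, query="contract")
    index = SharedIndex([rule])
    # Indexed after the rule was created, yet shared automatically.
    index.add({"id": "d1", "text": "Contract renewal", "meta": {"year": "2021"}})
    index.add({"id": "d2", "text": "Contract archive", "meta": {"year": "2019"}})
    print(index.shared_with)
```

Because the rule is evaluated at indexing time, the owner does not need to revisit the share when matching documents arrive later.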