Organizing documents

Documents are stored in repositories formally known as document stores. A document store keeps all of its documents organized together, in the same way as a filing cabinet. It groups them and defines a basic set of common configurations to ease their administration.
One implication is that configurations such as access rules and metadata configuration are not shared across document stores. Similarly, a document placed in one document store cannot be accessed from another store, nor can it share any configuration with documents in other stores.

Document stores

Document stores work as the basic unit of organization. Under the hood, BigContent makes sure all data is securely stored in different databases. By default, a document store can hold an unlimited number of documents alongside their configuration.

Settings

Here is a brief summary of the configurations that can be applied to a document store:

  • Access permissions: the user groups that are granted read-write privileges, administrative privileges, and others.

  • Metadata configuration (categories): the categories available in the document store, and the properties each one has.

  • Default deletion policy: the default deletion method applied to delete operations (on documents or versions): metadata flagging, metadata deletion or physical deletion.

  • Audit configuration: which events are tracked and registered in the audit trails.

  • Access-User header policy: whether the Access-User header is mandatory for the document store. Access-User lets callers add a custom user identifier that is registered in the audit trails, for example to record the actual user performing each action.

Each one of these configurations is kept independently for each document store to help you organize documents in a way that fits your needs.

Default deletion policy

There are three supported types of deletion: metadata_flagging, metadata_deletion and physical_deletion.

Any of them can be set as the default for the document store. metadata_flagging is set automatically as the default value.

  • metadata_flagging: doesn’t actually delete any information about the document. Instead, it sets the deleted document or content as hidden, so only users with administration privileges are able to manipulate them.
    Hidden documents are flagged with the _hidden property, which is visible only to administrators. Once hidden, a document or content can easily be restored by admins using the appropriate operations (restore document, restore content).

  • metadata_deletion: deletes only the metadata, without deleting the binary contents. The binary files will be unreachable, even for admins, but won’t be lost. If a restore is needed, make a request to the BigContent support team.

  • physical_deletion: deletes the whole document permanently, including both metadata and content files. Restoration WILL NOT be possible under ANY circumstances.

The default deletion type is applied to every deletion operation. Admin users, however, can override it.
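The rule above can be sketched in a few lines of Python. This is only an illustration of the described behavior, not BigContent code: the function name, its parameters, and the choice to raise an error when a non-admin user requests an override are assumptions.

```python
from enum import Enum
from typing import Optional


class DeletionType(Enum):
    """The three deletion types named in this section."""
    METADATA_FLAGGING = "metadata_flagging"
    METADATA_DELETION = "metadata_deletion"
    PHYSICAL_DELETION = "physical_deletion"


def effective_deletion_type(
    store_default: DeletionType,
    requested: Optional[DeletionType],
    is_admin: bool,
) -> DeletionType:
    """Return the deletion type to apply to a single delete operation.

    Hypothetical helper: the store default applies unless an admin user
    explicitly requests a different type.
    """
    if requested is None or requested == store_default:
        return store_default
    if not is_admin:
        # Assumption: a non-admin override is rejected rather than ignored.
        raise PermissionError("only admin users can override the default deletion type")
    return requested


# A regular user always gets the store default.
print(effective_deletion_type(DeletionType.METADATA_FLAGGING, None, is_admin=False))
# An admin can ask for a permanent deletion on a single operation.
print(effective_deletion_type(
    DeletionType.METADATA_FLAGGING, DeletionType.PHYSICAL_DELETION, is_admin=True))
```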

Audit configuration

BigContent can keep track of operations to comply with security and regulatory needs. For each tracked operation, the following information is kept:

  • Unique id of the audit trail

  • Domain or document store affected

  • Date and time of the operation

  • Id of the authorized user used for the operation

  • User information as provided in Access-User header

  • Operation name as shown in the table below.

  • Id of the affected object. Usually it will match a document id

  • Extended information, additional information

All information is stored in a dedicated database repository and is easily accessible with the search trails API operation.
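To make the list of fields concrete, an audit trail can be pictured as a simple record. The sketch below is an assumption for illustration: apart from the values taken from the examples in this document, the attribute names and the sample trail id are hypothetical and do not describe the actual JSON returned by the API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class AuditTrail:
    """Hypothetical in-memory shape of one audit trail entry."""
    trail_id: str                 # unique id of the audit trail
    scope: str                    # domain or document store affected
    timestamp: datetime           # date and time of the operation
    authorized_user: str          # id of the authorized user used for the operation
    access_user: Optional[str]    # value of the Access-User header, if provided
    operation: str                # event name, e.g. DOCUMENT_CREATE
    object_id: str                # id of the affected object, usually a document id
    extended_info: dict = field(default_factory=dict)  # additional information


example = AuditTrail(
    trail_id="hypothetical-trail-id",
    scope="invoice_store",
    timestamp=datetime(2018, 10, 24, 7, 10, 51, tzinfo=timezone.utc),
    authorized_user="john@company.com",
    access_user=None,
    operation="DOCUMENT_CREATE",
    object_id="qga8r5hsy9rw5weggf93lopa58",
)
print(example.operation, example.scope)
```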

The following table lists the events that can be registered for each document store and domain.

Table 1. Audit events
Entity | Event | Description
Document | DOCUMENT_CREATE | Creation of a new document
Document | DOCUMENT_DELETE | Deletion of a document
Metadata | METADATA_GET | Retrieval of a document’s metadata
Metadata | METADATA_UPDATE | Update of any field in the document metadata
Metadata | CATEGORY_ADD | Addition of a category instance to a document’s metadata
Metadata | CATEGORY_DELETE | Removal of a category instance from a document’s metadata
Metadata | CATEGORY_GET | Retrieval of all of a document’s categories
Version | VERSION_GET | Retrieval of a version of a document
Version | DOWNLOAD_VERSION | Transfer of a specific document version’s content to a client
Version | VERSION_NEW | Creation of a content as a new version of a document
Version | VERSION_UPDATE | Override of a document’s content by a new one
Version | VERSION_DELETE | Removal of a version of a document
Version | VERSION_SET_CURRENT | Modification of the current version of a document, that is, setting one of the already existing versions as current. No content is added, modified or removed
Search | SEARCH | Retrieval of a document metadata list by searching document content
Links | LINK_CREATE | Creation of a link to download the document version’s content
Users | USER_GET_DETAILS | Retrieval of user information
Login | TOKEN_CREATE | Token generation
Login | TOKEN_REFRESH | Token refresh

Access-User header policy

The Access-User header is an optional header whose value is recorded in the audit trails. It allows recording the real user who performed the action in the audit trails, under the accessUser field.

For each document store you can decide whether this information is required for all operations. Note that even when the header is set as required, its value is never validated against an AD or any other kind of user list.
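The policy can be sketched as a small check on the incoming headers. The Access-User header name comes from this document; the function name, the `required` flag, and the error raised are assumptions made for the example. Note that the value is only checked for presence, never validated.

```python
from typing import Mapping, Optional

ACCESS_USER_HEADER = "Access-User"


def resolve_access_user(headers: Mapping[str, str], required: bool) -> Optional[str]:
    """Return the value to record under accessUser, or None if absent.

    Hypothetical helper: it enforces presence when the document store
    requires the header, but it never checks the value against a user list.
    """
    # HTTP header names are case-insensitive, so compare in lower case.
    normalized = {name.lower(): value for name, value in headers.items()}
    value = normalized.get(ACCESS_USER_HEADER.lower())
    if required and not value:
        raise ValueError("Access-User header is required for this document store")
    return value or None


# Any non-empty value is accepted as-is.
print(resolve_access_user({"access-user": "mary@company.com"}, required=True))
# When the header is optional, a missing value is simply recorded as None.
print(resolve_access_user({}, required=False))
```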

Domains

Just as document stores group documents, domains group document stores together.

Each domain has its own access configuration, separate from that of the document stores it contains. We recommend defining different groups, which is the default configuration, but you are free to use the same groups to keep the configuration simpler.

Future plans for domains include enabling the definition of common configurations across different document stores to simplify management. Currently, however, the only practical implication relates to audit configuration. Events that are not related to a specific document store (e.g. authentication token creation and refresh) are considered domain events and are treated separately. Audit trails related to such events require domain-level permissions.
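As a rough illustration of the split between domain events and document store events, the sketch below maps an event name to the level of permission needed to read its audit trails. Only the two token events are named as domain events in this section, so the set may be incomplete; the function and the returned labels are hypothetical.

```python
# Events that this section explicitly describes as domain events.
# Assumption: the real list may contain more entries.
DOMAIN_EVENTS = frozenset({"TOKEN_CREATE", "TOKEN_REFRESH"})


def audit_permission_level(event: str) -> str:
    """Return which level of permission is needed to read trails of this event."""
    return "domain" if event in DOMAIN_EVENTS else "document_store"


print(audit_permission_level("TOKEN_REFRESH"))
print(audit_permission_level("DOCUMENT_CREATE"))
```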

Documents

The document is the basic unit of information and the key element of BigContent.

Each document belongs to only one document store, and is composed of two elements:

  • The different versions of its binary contents: they contain the actual files that users work with. For example, a document can be composed of a version 1.0 containing a working document in MS-Word format and a second, final version 2.0 in PDF format.

  • The metadata: it holds, on one side, basic information such as the author and date of creation and, on the other, the customized business information in categories.

    The JSON used by the API to represent the metadata of a document has an element that is also named content. It contains information (e.g. version, filename, size) about the different versions of the document and must not be confused with the actual binary contents. The latter are accessible through the specific operations under the /contents endpoint.

Keep in mind that a document can be created simply by uploading a content with no additional metadata. The opposite is not possible, though: a document must always have at least one version attached.

Metadata

BigContent provides a flexible metadata model that is closer to a tagging system than to the classical class-and-type models found in ECM solutions like IBM’s FileNet or OpenText’s Documentum. Businesses change quickly, and data needs to adapt just as quickly.

Instead of requiring the definition of a rigid and hierarchical model before everything else, we offer an open dynamic model based on Categories. Just start adding documents from day one. Metadata can be added or modified later with ease. When a new business need arises, just update the information on-the-go.

Document metadata example
{
  "id" : "qga8r5hsy9rw5weggf93lopa58",
  "description" : "Invoice template",
  "_hidden" : false,
  "author" : "john@company.com",
  "dateCreated" : "2018-10-24T07:10:51.709Z",
  "lastModifier" : "john@company.com",
  "dateModified" : "2018-10-24T07:10:51.709Z",
  "currentVersion" : "1.0",
  "content" : [
    {
      "id" : "g8q9ppihd58art2f9op1sq28gh",
      "name" : "invoice.pdf",
      "type" : "application/pdf",
      "size" : 58943,
      "majorVersion" : 1,
      "minorVersion" : 0,
      "_hidden" : false,
      "author" : "john@company.com",
      "dateCreated" : "2018-10-24T07:10:51.709Z",
      "lastModifier" : "john@company.com",
      "dateModified" : "2018-10-26T09:15:55.980Z"
    }
  ],
  "categories" : [
    {
      "_name": "template",
      "Type": "invoice",
      "date": "2018-10-24T07:10:51.709Z",
      "boolean": true
    }
  ],
  "documentStore" : "invoice_store"
}
Table 2. Document metadata fields
Name | Value | Detail
id | qga8r5hsy9rw5weggf93lopa58 | An auto-generated unique ID that identifies the document
description | Invoice template | A description provided by the author when the document was created
_hidden | false | Flag that indicates whether the document has been logically deleted. Only administrators can see this flag; non-admin users can see the document only when it is false
author | john@company.com | The user that uploaded the document to BigContent
dateCreated | 2018-10-24T07:10:51.709Z | Date on which the document was uploaded, in yyyy-MM-ddTHH:mm:ss.SSSZ format. The "T" separates date and time, and the "Z" is the zone designator for the zero UTC offset
lastModifier | john@company.com | The user that made the last modification to the document metadata
dateModified | 2018-10-24T07:10:51.709Z | Date on which the document metadata was last modified, in yyyy-MM-ddTHH:mm:ss.SSSZ format. The "T" separates date and time, and the "Z" is the zone designator for the zero UTC offset
currentVersion | 1.0 | Indicates which one of the versions is downloaded by the download current version operation (contents.adoc#download-current-version)
content | [{ "id" : "g8q9ppihd58art2f9op1sq28gh", "name" : "invoice.pdf", "type" : "application/pdf", "size" : 58943, "majorVersion" : 1, "minorVersion" : 0, "_hidden" : false, "author" : "john@company.com", "dateCreated" : "2018-10-24T07:10:51.709Z", "lastModifier" : "john@company.com", "dateModified" : "2018-10-26T09:15:55.980Z" }] | Array with the metadata of the associated files
categories | [{ "_name": "template", "Type": "invoice", "date": "2018-10-24T07:10:51.709Z", "boolean": true }] | List of all the categories assigned to this document. To edit them, use the add categories (metadata.adoc#add-categories) or delete categories (metadata.adoc#delete-categories) operations
documentStore | invoice_store | The document store in which the document is stored
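The sketch below shows one way a client could read these fields in Python: parsing the timestamp format and locating the content entry that corresponds to currentVersion. It assumes that currentVersion is the string "majorVersion.minorVersion", which matches the example but is not stated explicitly here; the helper names are hypothetical and the JSON is a trimmed copy of the example document.

```python
import json
from datetime import datetime


def parse_timestamp(value: str) -> datetime:
    """Parse a yyyy-MM-ddTHH:mm:ss.SSSZ timestamp into an aware datetime."""
    # Since Python 3.7, %z also accepts the literal "Z" as UTC.
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%f%z")


def current_content(document: dict) -> dict:
    """Return the visible content entry that matches currentVersion.

    Assumption: currentVersion is formatted as "<majorVersion>.<minorVersion>".
    """
    for content in document["content"]:
        version = f'{content["majorVersion"]}.{content["minorVersion"]}'
        if version == document["currentVersion"] and not content.get("_hidden", False):
            return content
    raise LookupError("no visible content matches currentVersion")


# Trimmed copy of the example document metadata.
document = json.loads("""
{
  "currentVersion": "1.0",
  "content": [
    {"id": "g8q9ppihd58art2f9op1sq28gh", "name": "invoice.pdf",
     "majorVersion": 1, "minorVersion": 0, "_hidden": false,
     "dateCreated": "2018-10-24T07:10:51.709Z"}
  ]
}
""")

content = current_content(document)
created = parse_timestamp(content["dateCreated"])
print(content["name"], created.isoformat())
```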

Categories

Categories are one of the most important features of the BigContent platform.

A category is a set of custom metadata that can be freely added to or removed from a document. Every category has a name and a set of fields with custom values.

Categories make it possible to search documents precisely. We can search for all the documents that have a specific category assigned, or even for all the documents with a specific value in one of the fields of their categories.
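The two kinds of search can be illustrated with an in-memory filter over document metadata. This is not the search API: the function name and its parameters are assumptions, and the sketch only mirrors the matching logic on dictionaries shaped like the example document.

```python
from typing import Any, Iterable, Iterator, Optional


def find_by_category(
    documents: Iterable[dict],
    name: str,
    field: Optional[str] = None,
    value: Any = None,
) -> Iterator[dict]:
    """Yield documents that carry the named category.

    When `field` is given, the category must also hold `value` in that field.
    """
    for doc in documents:
        for category in doc.get("categories", []):
            if category.get("_name") != name:
                continue
            if field is None or category.get(field) == value:
                yield doc
                break  # one matching category is enough for this document


docs = [
    {"id": "a", "categories": [{"_name": "template", "Type": "invoice"}]},
    {"id": "b", "categories": [{"_name": "template", "Type": "receipt"}]},
    {"id": "c", "categories": []},
]

# All documents with the "template" category assigned.
print([d["id"] for d in find_by_category(docs, "template")])
# Only those whose "Type" field is "invoice".
print([d["id"] for d in find_by_category(docs, "template", "Type", "invoice")])
```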

However, it is important to note the difference between a category and a category definition.

Categories are what we assign to a document. A category contains specific values related to a single document. Once it is assigned, for example, we can search documents by these values.

The category definition is the template used to create a category. Category definitions are defined at the document store level, and are composed of:

  • Name: distinctive name of the category. It cannot contain spaces or special characters.

  • Set of metadata properties: the different properties that the category holds. Each one has its own name, its type (String, Date, Number or Boolean) and optional restrictions (whether it is mandatory and which values are allowed).
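The relationship between a category definition and a category can be sketched as a validation step. The shape of the definition dictionary below (the "properties", "mandatory" and "allowed" keys) is invented for the example and is not the format used by BigContent; only the four type names and the two kinds of restriction come from this section. The Date check assumes the timestamp format used elsewhere in this document.

```python
from datetime import datetime


def _is_date(value) -> bool:
    """Accept strings in the yyyy-MM-ddTHH:mm:ss.SSSZ format (assumption)."""
    if not isinstance(value, str):
        return False
    try:
        datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%f%z")
    except ValueError:
        return False
    return True


TYPE_CHECKS = {
    "String": lambda v: isinstance(v, str),
    "Date": _is_date,
    # bool is a subclass of int in Python, so exclude it explicitly.
    "Number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "Boolean": lambda v: isinstance(v, bool),
}


def validate_category(category: dict, definition: dict) -> list:
    """Return a list of problems found; an empty list means the category fits."""
    errors = []
    if category.get("_name") != definition["name"]:
        errors.append(f"category name does not match definition {definition['name']!r}")
    for prop, rules in definition["properties"].items():
        if prop not in category:
            if rules.get("mandatory", False):
                errors.append(f"missing mandatory property {prop!r}")
            continue
        value = category[prop]
        if not TYPE_CHECKS[rules["type"]](value):
            errors.append(f"property {prop!r} is not a valid {rules['type']}")
        elif "allowed" in rules and value not in rules["allowed"]:
            errors.append(f"property {prop!r} has a value outside the allowed set")
    return errors


# Hypothetical definition for the "template" category of the example document.
definition = {
    "name": "template",
    "properties": {
        "Type": {"type": "String", "mandatory": True, "allowed": ["invoice", "receipt"]},
        "date": {"type": "Date"},
        "boolean": {"type": "Boolean"},
    },
}
category = {"_name": "template", "Type": "invoice",
            "date": "2018-10-24T07:10:51.709Z", "boolean": True}
print(validate_category(category, definition))
```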

Content

The content is the part of the metadata that represents the files.

Every time a new version of the document is uploaded, a new content is added to the document’s metadata. Note that the currentVersion field of the document points to the most recent one, unless another version is explicitly set as current.

A version number can be associated with only one active content at a time.
Several contents can share the same version number only if all but one of them are hidden (that is, have been logically deleted).
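This uniqueness rule can be expressed as a check over the content array of a document. The function below is a hypothetical helper for illustration; it relies only on the majorVersion, minorVersion and _hidden fields described in this document.

```python
from collections import Counter


def duplicated_active_versions(contents: list) -> list:
    """Return the (major, minor) pairs held by more than one non-hidden content."""
    active = Counter(
        (c["majorVersion"], c["minorVersion"])
        for c in contents
        if not c.get("_hidden", False)
    )
    return sorted(version for version, count in active.items() if count > 1)


contents = [
    {"majorVersion": 1, "minorVersion": 0, "_hidden": True},   # logically deleted
    {"majorVersion": 1, "minorVersion": 0, "_hidden": False},  # the active 1.0
    {"majorVersion": 2, "minorVersion": 0, "_hidden": False},
]
# Valid: version 1.0 appears twice, but only one of the contents is active.
print(duplicated_active_versions(contents))
```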

Table 3. Content fields
Name | Value | Detail
id | g8q9ppihd58art2f9op1sq28gh | Unique, non-sequential, auto-generated identifier of the content
name | invoice11.pdf | Filename provided at the creation of the document
type | application/pdf | Content type of the associated file
majorVersion | 2 | The major part of the version number, which defines the creation order of the version
minorVersion | 0 | The minor part of the version number, which defines the creation order of the version
size | 58943 | The size of the uploaded file, in bytes
_hidden | false | Flag that indicates whether the content has been logically deleted. Only administrators can see this flag; non-admin users can see the content only when it is false
author | john@company.com | User account that uploaded the content to BigContent
dateCreated | 2018-10-24T07:10:51.709Z | Date on which the version was uploaded, in yyyy-MM-ddTHH:mm:ss.SSSZ format. The "T" separates date and time, and the "Z" is the zone designator for the zero UTC offset
lastModifier | john@company.com | The user that made the last modification to the version metadata
dateModified | 2018-10-24T07:10:51.709Z | Date on which the version metadata was last modified, in yyyy-MM-ddTHH:mm:ss.SSSZ format. The "T" separates date and time, and the "Z" is the zone designator for the zero UTC offset