Organizing documents
Documents are stored in repositories formally known as Document stores.
A document store keeps all of its documents organized together, much like a filing cabinet.
It groups them and defines the basic set of common configurations to ease their administration.
One implication is that configurations such as access rules and metadata configuration are not shared across document stores.
Similarly, a document placed in one document store cannot be accessed from another store, nor can it share any configuration with documents in other stores.
Document stores
Document stores work as the basic unit of organization.
Under the hood BigContent makes sure all data is securely stored in different databases.
By default, a document store can hold an unlimited number of documents alongside their configuration.
Settings
Here is a brief summary of the configurations that can be applied to a document store:
- Access permissions: definition of the user groups that grant read-write access, administrative privileges and other rights.
- Metadata configuration (categories): list of the categories available in the document store and the properties each one has.
- Default deletion policy: default deletion method applied to delete operations (on documents or versions): metadata flagging, metadata deletion or physical deletion.
- Audit configuration: which events are tracked and registered in the audit trails.
- Access-User header policy: whether the Access-User header is mandatory for the document store. Access-User allows adding a custom user identifier that is registered in the audit trails, for example to store the actual user performing each action.
Each one of these configurations is kept independently for each document store to help you organize documents in a way that fits your needs.
Default deletion policy
There are three supported types of deletion: metadata_flagging, metadata_deletion and physical_deletion.
Any of them can be set as the default for the document store. metadata_flagging is set automatically as the default value.
- metadata_flagging: doesn’t actually delete any information about the document. Instead, it marks the deleted document or content as hidden, so only users with administration privileges can manipulate it. Hidden documents are flagged with the _hidden property, which is only visible to administrators. Once hidden, a document or content can be easily restored by admins using the appropriate operations (restore document, restore content).
- metadata_deletion: deletes only the metadata, without deleting the binary contents. The binary files become unreachable, even for admins, but are not lost. If a restore is needed, make a request to the BigContent support team.
- physical_deletion: deletes the whole document permanently, including both metadata and content files. Restoration WILL NOT be possible under ANY circumstances.
The default deletion type is applied to every deletion operation. Admin users can, however, override it.
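The rules above can be sketched in a few lines of Python. This is only an illustration of the policy logic, not BigContent client code: the names `DeletionType`, `RESTORABLE_BY` and `effective_deletion_type` are hypothetical, and only the three policy values come from this page.

```python
from enum import Enum


class DeletionType(Enum):
    """The three deletion types described above."""
    METADATA_FLAGGING = "metadata_flagging"
    METADATA_DELETION = "metadata_deletion"
    PHYSICAL_DELETION = "physical_deletion"


# Value a document store uses when no other default has been configured.
DEFAULT_DELETION_TYPE = DeletionType.METADATA_FLAGGING

# Who can bring a deleted item back (hypothetical labels, None = nobody).
RESTORABLE_BY = {
    DeletionType.METADATA_FLAGGING: "admin",
    DeletionType.METADATA_DELETION: "support",
    DeletionType.PHYSICAL_DELETION: None,
}


def effective_deletion_type(store_default=DEFAULT_DELETION_TYPE,
                            override=None, is_admin=False):
    """Return the deletion type that applies to a single delete operation."""
    if override is not None:
        # Only admin users may override the store default.
        if not is_admin:
            raise PermissionError("only admin users can override the deletion type")
        return override
    return store_default
```

Calling `effective_deletion_type()` with no arguments yields the flagging policy, while an admin can pass `override=DeletionType.PHYSICAL_DELETION` for a single operation.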
Audit configuration
BigContent keeps track of operations to help comply with security and regulatory needs. For each tracked operation, the following information is kept:
- Unique id of the audit trail
- Domain or document store affected
- Date and time of the operation
- Id of the authorized user used for the operation
- User information as provided in the Access-User header
- Operation name, as shown in the table below
- Id of the affected object (usually a document id)
- Extended information: any additional information
All information is stored in a dedicated database repository and is accessible through the search trails API operation.
The following table lists the events that can be registered for each document store and domain.
Entity | Event | Description |
---|---|---|
Document | DOCUMENT_CREATE | Creation of a new document |
Document | DOCUMENT_DELETE | Deletion of a document |
Metadata | METADATA_GET | Retrieval of a document’s metadata |
Metadata | METADATA_UPDATE | Update of any field in the document metadata |
Metadata | CATEGORY_ADD | Addition of a category instance to a document’s metadata |
Metadata | CATEGORY_DELETE | Removal of a category instance from a document’s metadata |
Metadata | CATEGORY_GET | Retrieval of all of a document’s categories |
Version | VERSION_GET | Retrieval of a version of a document |
Version | DOWNLOAD_VERSION | Transfer of a specific document version’s content to a client |
Version | VERSION_NEW | Creation of a content as a new version of a document |
Version | VERSION_UPDATE | Override of a document’s content by a new one |
Version | VERSION_DELETE | Removal of a version of a document |
Version | VERSION_SET_CURRENT | Modification of the current version of a document, that is, setting one of the already existing versions as current. No content is added, modified or removed |
Search | SEARCH | Retrieval of a document metadata list by searching on document content |
Links | LINK_CREATE | Creation of a link to download the document version’s content |
Users | USER_GET_DETAILS | Retrieval of user information |
Login | TOKEN_CREATE | Token generation |
Login | TOKEN_REFRESH | Token refresh |
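To make the shape of an audit trail more concrete, here is a minimal Python sketch of a record holding the fields listed earlier, plus a helper that filters records by event name. The class name, attribute names and the helper are assumptions made for illustration; the actual JSON returned by the search trails operation may be structured differently.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class AuditTrail:
    """One tracked operation (attribute names are hypothetical)."""
    id: str                      # unique id of the audit trail
    scope: str                   # affected domain or document store
    timestamp: datetime          # date and time of the operation
    user_id: str                 # authorized user used for the operation
    access_user: Optional[str]   # value of the Access-User header, if any
    operation: str               # event name, e.g. "DOCUMENT_CREATE"
    object_id: Optional[str] = None               # usually a document id
    extended_info: dict = field(default_factory=dict)


def trails_for_operations(trails, operations):
    """Keep only the trails whose event name is in `operations`."""
    wanted = set(operations)
    return [trail for trail in trails if trail.operation in wanted]
```

For example, `trails_for_operations(trails, ["VERSION_DELETE", "DOCUMENT_DELETE"])` would keep only the deletion events from a list of records that has already been retrieved.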
Access-User header policy
The Access-User header is an optional header whose value is recorded in the audit trails.
It allows recording the real user who performed the action in the audit trails, under the accessUser field.
For each document store, you can decide whether this information is required for all operations. Note that even when the header is set as required, its value is never validated against an AD or any other kind of user list.
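A client can mirror this policy before sending a request. The sketch below builds a header dictionary and fails early when a store requires Access-User and none was given. The function name, its parameters and the `Authorization: Bearer` scheme are assumptions for illustration; only the Access-User header name comes from this page.

```python
def build_headers(token, access_user=None, access_user_required=False):
    """Build request headers, checking the Access-User policy on the client side."""
    if access_user_required and not access_user:
        raise ValueError("this document store requires the Access-User header")

    # Assumption: the API token is sent as a bearer token.
    headers = {"Authorization": f"Bearer {token}"}

    # The value is free text: it is recorded in the audit trail, never validated.
    if access_user:
        headers["Access-User"] = access_user
    return headers
```

With `access_user_required=True`, calling `build_headers(token)` without a user raises an error locally instead of waiting for the server to reject the request.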
Domains
Just as document stores group documents, Domains group document stores together.
Each domain has its own access configuration, separate from that of the document stores it contains. We recommend defining different groups, which is the default configuration, but you are free to use the same groups to keep the configuration simpler.
Future plans for domains include enabling the definition of common configurations across different document stores to simplify management. Currently, however, the only practical implication is related to audit configuration. Events that are not related to a specific document store (e.g. authentication token creation and refresh) are considered domain events and are treated separately. Audit trails related to such events require domain-level permissions.
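The split between domain events and document store events can be pictured with a small lookup. This sketch assumes that only the two token events named above are domain events and that every other event belongs to a document store; the page does not give the full list, so treat the set as an assumption. The function name is hypothetical.

```python
# Assumption: only the token events mentioned above are domain-level events.
DOMAIN_EVENTS = frozenset({"TOKEN_CREATE", "TOKEN_REFRESH"})


def audit_scope(event):
    """Return which level of permissions is needed to read trails for `event`."""
    return "domain" if event in DOMAIN_EVENTS else "document_store"
```

Under these assumptions, `audit_scope("TOKEN_REFRESH")` returns `"domain"`, while `audit_scope("DOCUMENT_CREATE")` returns `"document_store"`.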
Documents
The Document is the basic unit of information and the key element of BigContent.
Each document belongs to only one document store and is composed of two elements:
- The different versions of its binary contents: they contain the actual files that users work with. For example, a document can be composed of a version 1.0 containing a working document in MS Word format, and a second version, 2.0, containing the final document in PDF format.
- The metadata: it holds, on one side, basic information such as the author and the date of creation and, on the other, the customized business information in categories.
The JSON used by the API to represent the metadata of a document has an element also named content. This element contains information (e.g. version, filename, size) about the different versions of the document and must not be confused with the actual binary contents. The latter are accessible using the specific operations under the /contents endpoint.
Keep in mind that a document can be created by simply uploading a content with no additional metadata. The opposite is not possible, though: a document must always have at least one version attached.
Metadata
BigContent provides a flexible metadata model that is closer to a tagging system than to the classical class-and-type models found in ECM solutions like IBM’s FileNet or OpenText’s Documentum. Businesses change quickly, and data needs to adapt just as quickly.
Instead of requiring the definition of a rigid, hierarchical model before everything else, we offer an open, dynamic model based on Categories.
Just start adding documents from day one. Metadata can be added or modified later with ease.
When a new business need arises, just update the information on the go.
{
  "id" : "qga8r5hsy9rw5weggf93lopa58",
  "description" : "Invoice template",
  "_hidden" : false,
  "author" : "john@company.com",
  "dateCreated" : "2018-10-24T07:10:51.709Z",
  "lastModifier" : "john@company.com",
  "dateModified" : "2018-10-24T07:10:51.709Z",
  "currentVersion" : "1.0",
  "content" : [
    {
      "id" : "g8q9ppihd58art2f9op1sq28gh",
      "name" : "invoice.pdf",
      "type" : "application/pdf",
      "size" : 58943,
      "majorVersion" : 1,
      "minorVersion" : 0,
      "_hidden" : false,
      "author" : "john@company.com",
      "dateCreated" : "2018-10-24T07:10:51.709Z",
      "lastModifier" : "john@company.com",
      "dateModified" : "2018-10-26T09:15:55.980Z"
    }
  ],
  "categories" : [
    {
      "_name": "template",
      "Type": "invoice",
      "date": "2018-10-24T07:10:51.709Z",
      "boolean": true
    }
  ],
  "documentStore" : "invoice_store"
}
Name | Value | Detail |
---|---|---|
id | | A unique, auto-generated ID that identifies the document |
description | | A description provided by the author of the document when it was created |
_hidden | | Flag that indicates whether the document has been logically deleted. Only admins can see this flag; non-admin users can see the document only when it is set to false |
author | | The user that uploaded the document to BigContent |
dateCreated | | Date on which the document was uploaded. The format is yyyy-MM-ddTHH:mm:ss.SSSZ. The "T" separates date and time, and the "Z" is the zone designator for the zero UTC offset |
lastModifier | | The user that made the last modification to the document metadata |
dateModified | | Date on which the document metadata was last modified. The format is yyyy-MM-ddTHH:mm:ss.SSSZ. The "T" separates date and time, and the "Z" is the zone designator for the zero UTC offset |
currentVersion | | Indicates which version is downloaded by the download current version operation (contents.adoc#download-current-version) |
content | | Array with the metadata of the associated files |
categories | | List of all the categories assigned to this document. To edit them, use the add categories (metadata.adoc#add-categories) or delete categories (metadata.adoc#delete-categories) operations |
documentStore | | The document store in which the document is stored |
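The timestamp format used by dateCreated and dateModified can be parsed with the Python standard library. The following is a minimal sketch; the helper names `parse_timestamp` and `format_timestamp` are hypothetical, and the code assumes timestamps always carry milliseconds and the "Z" designator, as in the sample above.

```python
from datetime import datetime, timezone

# yyyy-MM-ddTHH:mm:ss.SSSZ expressed as a strptime pattern.
TIMESTAMP_PATTERN = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_timestamp(value):
    """Parse a metadata timestamp into a timezone-aware UTC datetime."""
    parsed = datetime.strptime(value, TIMESTAMP_PATTERN)
    # "Z" means zero UTC offset, so attach the UTC timezone explicitly.
    return parsed.replace(tzinfo=timezone.utc)


def format_timestamp(moment):
    """Format an aware datetime back to the millisecond-precision form."""
    utc = moment.astimezone(timezone.utc)
    millis = utc.microsecond // 1000
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{millis:03d}Z"
```

Parsing into an aware datetime makes comparisons such as "modified after creation" safe regardless of the local timezone of the client.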
Categories
Categories are one of the most important features of the BigContent platform.
A category is a set of custom metadata that can be freely added or removed from a document. Every category has a name, and a set of fields with custom values.
The categories allow searching through the documents in a very powerful way. We can search for all the documents with a specific category assigned. We can even search for all the documents with a specific value on one of the fields of their categories.
However, it is important to note the difference between a category and a category definition.
Categories are what we assign to a document. A category contains specific values related to a single document. For example, once assigned we can search documents by these values.
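As an illustration of what searching by category values means, the sketch below filters document metadata that has already been loaded into memory. It is not the BigContent search operation: the function `find_by_category` is hypothetical and only relies on the `categories` array and the `_name` key shown in the sample JSON.

```python
def find_by_category(documents, category_name, **field_values):
    """Return the documents holding a category with the given name and field values."""
    matches = []
    for document in documents:
        for category in document.get("categories", []):
            if category.get("_name") != category_name:
                continue
            # Every requested field must be present with exactly that value.
            if all(category.get(key) == value for key, value in field_values.items()):
                matches.append(document)
                break  # one matching category is enough for this document
    return matches
```

Calling `find_by_category(docs, "template")` returns every document tagged with the `template` category, and `find_by_category(docs, "template", Type="invoice")` narrows the result to those whose `Type` field equals `invoice`.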
The category definition is the template used to create a category. Category definitions are defined at the document store level and are composed of:
- Name: distinctive name of the category. It cannot contain spaces or special characters.
- Set of metadata properties: the different properties that the category holds. Each one has its own name, a type (String, Date, Number or Boolean) and possible restrictions (mandatoriness and possible values).
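The relation between a category and its definition can be sketched as a validation step. The dictionary layout used for the definition (`name`, `properties`, `type`, `mandatory`, `allowed`) is an assumption made for this example and is not the format used by the API. The sketch also assumes that Date values are strings in the yyyy-MM-ddTHH:mm:ss.SSSZ format.

```python
from datetime import datetime
from numbers import Number


def _is_timestamp(value):
    """True when the string matches yyyy-MM-ddTHH:mm:ss.SSSZ."""
    try:
        datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
        return True
    except ValueError:
        return False


# One check per supported property type.
TYPE_CHECKS = {
    "String": lambda v: isinstance(v, str),
    "Number": lambda v: isinstance(v, Number) and not isinstance(v, bool),
    "Boolean": lambda v: isinstance(v, bool),
    "Date": lambda v: isinstance(v, str) and _is_timestamp(v),
}


def validate_category(category, definition):
    """Return a list of problems found; an empty list means the category is valid."""
    errors = []
    if category.get("_name") != definition["name"]:
        errors.append("category name does not match the definition")
    properties = definition["properties"]
    for prop, rules in properties.items():
        if prop not in category:
            if rules.get("mandatory", False):
                errors.append(f"missing mandatory property: {prop}")
            continue
        value = category[prop]
        if not TYPE_CHECKS[rules["type"]](value):
            errors.append(f"{prop} is not a valid {rules['type']}")
        elif "allowed" in rules and value not in rules["allowed"]:
            errors.append(f"{prop} has a value outside the allowed set")
    for prop in category:
        if prop != "_name" and prop not in properties:
            errors.append(f"unknown property: {prop}")
    return errors
```

Returning a list of problems rather than raising on the first one lets a caller report every issue of a category in a single pass.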
Content
The content is the part of the metadata that represents the files.
Every time a new version of the document is uploaded, a new content is added to the document’s metadata.
Note that the currentVersion field of the document points to the most recent version, unless another version is explicitly set as the current one.
A version number can be associated with only one (active) content at a time.
Name | Value | Detail |
---|---|---|
id | | Unique, non-sequential, auto-generated identifier of the content |
name | | Filename provided at the creation of the document |
type | application/pdf | Content type of the associated file |
majorVersion | 2 | The major number of the version, which defines its creation order |
minorVersion | 0 | The minor number of the version, which defines its creation order within the major version |
size | 58943 | The size of the uploaded file, in bytes |
_hidden | | Flag that indicates whether the content has been logically deleted. Only admins can see this flag; non-admin users can see the content only when it is set to false |
author | | User account that uploaded the content to BigContent |
dateCreated | | Date on which the version was uploaded. The format is yyyy-MM-ddTHH:mm:ss.SSSZ. The "T" separates date and time, and the "Z" is the zone designator for the zero UTC offset |
lastModifier | | The user that made the last modification to the version metadata |
dateModified | | Date on which the version metadata was last modified. The format is yyyy-MM-ddTHH:mm:ss.SSSZ. The "T" separates date and time, and the "Z" is the zone designator for the zero UTC offset |
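To tie currentVersion back to the content array, the sketch below looks up the content entry that the current version points to. It assumes that currentVersion is the string "majorVersion.minorVersion", which matches the sample document ("1.0" with majorVersion 1 and minorVersion 0) but is not stated explicitly. The function name is hypothetical.

```python
def current_content(document):
    """Return the active content entry matching currentVersion, or None."""
    for entry in document.get("content", []):
        label = f"{entry['majorVersion']}.{entry['minorVersion']}"
        # Hidden entries are skipped: a version number maps to one active content.
        if label == document["currentVersion"] and not entry.get("_hidden", False):
            return entry
    return None
```

For the sample document shown earlier, `current_content(document)["name"]` would be `"invoice.pdf"`.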