Content storage and structure
Magnolia stores all content (web pages, images, documents, configuration, data) in a content repository. The repository implementation we have chosen, Apache Jackrabbit, adheres to the Java Content Repository standard (JCR).
A content repository is designed to store, search and retrieve hierarchical data. The data forms a tree of nodes, and each node carries properties that hold the actual values. Properties may store simple values such as numbers and strings, or binary data (images, documents) of arbitrary length. Nodes may optionally have one or more types associated with them, which in turn dictate the types of their properties, the number and type of their child nodes, and certain behavioral characteristics.
Example: A, B, C and D are nodes. The boxes represent properties with Boolean, numerical, string and binary values.
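The node and property model described above can be sketched in a few lines of plain Java. This is a toy model only, not the javax.jcr API: the class names ContentNode and ContentTreeSketch, the property names, and the layout of nodes A to D are assumptions made for illustration. In a real repository the corresponding operations go through the JCR Node and Session interfaces.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of a content tree. NOT the javax.jcr API; all names are illustrative.
final class ContentNode {
    private final String name;
    private final ContentNode parent;
    private final Map<String, ContentNode> children = new LinkedHashMap<>();
    private final Map<String, Object> properties = new LinkedHashMap<>();

    ContentNode(String name, ContentNode parent) {
        this.name = name;
        this.parent = parent;
    }

    // Creates a child node. Unlike JCR, this toy model rejects same-name siblings.
    ContentNode addNode(String childName) {
        if (children.containsKey(childName)) {
            throw new IllegalArgumentException("Child already exists: " + childName);
        }
        ContentNode child = new ContentNode(childName, this);
        children.put(childName, child);
        return child;
    }

    // Values live in properties: Boolean, number, String or binary (byte[]).
    void setProperty(String key, Object value) {
        properties.put(key, value);
    }

    Object getProperty(String key) {
        return properties.get(key);
    }

    // Builds the absolute path by walking up to the root, which has the path "/".
    String getPath() {
        if (parent == null) {
            return "/";
        }
        String parentPath = parent.getPath();
        return parentPath.equals("/") ? "/" + name : parentPath + "/" + name;
    }

    // Resolves a relative path such as "A/C/D"; returns null if any segment is missing.
    ContentNode getNode(String relPath) {
        ContentNode current = this;
        for (String segment : relPath.split("/")) {
            if (segment.isEmpty()) {
                continue;
            }
            current = current.children.get(segment);
            if (current == null) {
                return null;
            }
        }
        return current;
    }
}

final class ContentTreeSketch {
    // Assumed layout: root -> A -> (B, C), C -> D. The original figure may differ.
    static ContentNode buildExample() {
        ContentNode root = new ContentNode("", null);
        ContentNode a = root.addNode("A");
        a.setProperty("published", Boolean.TRUE);          // Boolean value
        ContentNode b = a.addNode("B");
        b.setProperty("weight", 42L);                      // numerical value
        ContentNode c = a.addNode("C");
        c.setProperty("title", "Hello");                   // string value
        ContentNode d = c.addNode("D");
        d.setProperty("thumbnail",
                "fake image bytes".getBytes(StandardCharsets.UTF_8)); // binary value
        return root;
    }

    public static void main(String[] args) {
        ContentNode root = buildExample();
        ContentNode d = root.getNode("A/C/D");
        System.out.println(d.getPath());
        System.out.println(root.getNode("A").getProperty("published"));
    }
}
```

Every value hangs off a node, and every node is reachable by a path from the root, which is what makes hierarchical lookup and search straightforward.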
Java Content Repository (JCR) is a standard interface for accessing content repositories. JCR version 1.0 was specified in Java Specification Request 170 (JSR-170). Version 2.0, specified in JSR-283, is also final. JCR specifies a hierarchical content store with support for structured and unstructured content.
Magnolia was the first open-source content management system built specifically to leverage JCR. The standard decouples the responsibilities of content storage from content management and provides a common API that enables standardized content reuse across the enterprise and between applications. Magnolia uses the open-source Jackrabbit reference implementation.
Application developers benefit from standardization as they don’t need to learn several vendor-specific APIs. Learning one standard API allows them to work with any compliant repository and write code against it.
Businesses enjoy freedom of choice. Open standards like JCR are the best insurance against vendor lock-in; any CMS that supports the JCR standard becomes a viable alternative. The costs of switching vendors are lower when your content is already in the correct format.
A persistence manager (PM) is an internal Jackrabbit component that handles the persistent storage of content nodes and properties. Each workspace of a Jackrabbit content repository can use a separate persistence manager to store content for that workspace. The persistence manager sits at the bottom layer of the Jackrabbit system architecture. Reliability, integrity and performance of the PM are crucial to the overall stability and performance of the repository.
To avoid integrity issues and to benefit from services such as observation, clustering and indexing, you should always access the content through the JCR API. Changing the data directly in the underlying store (bypassing the API) risks integrity problems and leaves those services unaware of the change. This may sound restrictive, but the API is actually quite versatile. You can even access the content repository from external applications using the API.
The choice of persistence managers includes:
Database: Magnolia uses a database as its persistence manager by default. This is the most common option. We ship WAR files and operating-system-specific bundles with the H2 database. H2 is an embedded database that allows us to package a fully operational Magnolia example into a single download, including configuration details and demonstration websites, so it requires minimal installation effort from users. For production environments, however, we recommend an enterprise-scale database such as MySQL, PostgreSQL or Oracle. All of them work with JCR. Database connections are based on JDBC, involve zero deployment, and run fast. Note: Magnolia supports the MySQL InnoDB storage engine but not the MyISAM engine. InnoDB is the default engine in MySQL 5.5 and later.
File system: This kind of data store is typically not meant to run in production environments, except in read-only cases, but it can be very fast.
In-memory: This is a great persistence manager for testing and for small workspaces. All content is kept in memory and is lost as soon as the repository is closed. It is even faster than a file system persistence manager. Again, it is not for production use.
Magnolia DX Core allows you to switch between persistence managers without losing any content.
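The idea of pluggable persistence can be illustrated with a minimal Java sketch. It assumes a hypothetical StoreSketch interface with an in-memory and a file-based implementation, one store per workspace, and a small copy helper. None of these names come from Jackrabbit or Magnolia: Jackrabbit's real PersistenceManager interface is considerably richer, and the copy helper is not how Magnolia DX Core switches persistence managers. The workspace names are also illustrative.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical storage contract; not Jackrabbit's PersistenceManager interface.
interface StoreSketch {
    void store(String nodeId, String serializedNode);
    String load(String nodeId); // returns null when the node is not stored
}

// Keeps everything in a map: fast, but the content is gone when the JVM exits.
final class InMemoryStoreSketch implements StoreSketch {
    private final Map<String, String> data = new HashMap<>();

    public void store(String nodeId, String serializedNode) {
        data.put(nodeId, serializedNode);
    }

    public String load(String nodeId) {
        return data.get(nodeId);
    }
}

// Writes one file per node into a directory. Node ids must be file-name safe.
final class FileStoreSketch implements StoreSketch {
    private final Path dir;

    FileStoreSketch(Path dir) {
        this.dir = dir;
    }

    public void store(String nodeId, String serializedNode) {
        try {
            Files.write(dir.resolve(nodeId + ".txt"),
                    serializedNode.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public String load(String nodeId) {
        Path file = dir.resolve(nodeId + ".txt");
        if (!Files.exists(file)) {
            return null;
        }
        try {
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

final class WorkspaceStoresSketch {
    // Each workspace gets its own store; the workspace names are illustrative.
    static Map<String, StoreSketch> configure() {
        try {
            Path dir = Files.createTempDirectory("store-sketch");
            Map<String, StoreSketch> stores = new HashMap<>();
            stores.put("website", new FileStoreSketch(dir));
            stores.put("scratch", new InMemoryStoreSketch());
            return stores;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Copies the given nodes from one store to another without altering the source.
    static void copy(StoreSketch from, StoreSketch to, Iterable<String> nodeIds) {
        for (String id : nodeIds) {
            String node = from.load(id);
            if (node != null) {
                to.store(id, node);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, StoreSketch> stores = configure();
        stores.get("scratch").store("node-1", "title=Draft");
        copy(stores.get("scratch"), stores.get("website"),
                java.util.Collections.singletonList("node-1"));
        System.out.println(stores.get("website").load("node-1"));
    }
}
```

Because callers only see the interface, the code that reads and writes content stays the same whichever store backs a workspace. This is the same separation that lets the JCR API stay stable while the persistence manager underneath changes.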