Repository configuration

Magnolia uses the Apache Jackrabbit content repository, which conforms to the JSR 283 specification. The configuration options provided by Jackrabbit are defined in the RepositoryConfig API. For information on Apache Jackrabbit configuration, see the Jackrabbit configuration documentation.

The repository configuration file specifies global options like security, datasources, and versioning. A default workspace configuration template is also included as part of the repository configuration file. For each workspace, there is a workspace.xml file inside the workspace home directory. This allows for specific configuration to be applied per workspace.

Configuration file

Magnolia ships with a few examples of popular repository configurations that customers might use as-is or as the basis for a customized configuration.

At a high level, the file is structured as follows.

<!DOCTYPE Repository PUBLIC "-//The Apache Software Foundation//DTD Jackrabbit 2.0//EN"
"http://jackrabbit.apache.org/dtd/repository-2.0.dtd">
<Repository>
  <DataSources .../>
  <FileSystem .../>
  <Security .../>
  <DataStore .../>
  <Workspaces .../>
  <Workspace> <!-- Blueprint for all new workspaces -->
    <FileSystem .../>
    <PersistenceManager .../>
    <SearchIndex .../>
    <WorkspaceSecurity .../>
  </Workspace>
  <Versioning .../>
</Repository>

The repository configuration folders are located here:

/magnolia-x.x.x/apache-tomcat-x.x.x/webapps/magnoliaAuthor/WEB-INF/config/repo-conf
/magnolia-x.x.x/apache-tomcat-x.x.x/webapps/magnoliaPublic/WEB-INF/config/repo-conf

Out-of-the-box examples

jackrabbit-bundle-derby-search.xml
jackrabbit-bundle-h2-search.xml
jackrabbit-bundle-ingres-search.xml
jackrabbit-bundle-mysql-search.xml
jackrabbit-bundle-postgres-search.xml
jackrabbit-memory-search.xml

There are several other examples of configuration files available in our git repository.

Two properties are required for the content repository (see Configuration management).

Repository home directory: specified in the magnolia.properties file as magnolia.repositories.home
Repository configuration file: specified in the magnolia.properties file as magnolia.repositories.jackrabbit.config

Datasources

One or more datasources can be defined within the DataSources element. It’s common to configure one datasource per repository, so that every workspace uses the same datasource connection. Technically, however, each workspace can use a different configuration: some workspaces could connect to an embedded datasource, while others connect to a database datasource.
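
As a minimal sketch of that flexibility, the fragment below defines two datasources side by side. The second datasource name (magnolia_archive), the hosts, schemas, and credentials are hypothetical placeholders, and the PostgreSQL driver class is only one possible choice.

```xml
<DataSources>
    <!-- Datasource shared by most workspaces (all values are placeholders) -->
    <DataSource name="magnolia">
        <param name="driver" value="com.mysql.jdbc.Driver"/>
        <param name="url" value="jdbc:mysql://localhost:3306/magnolia"/>
        <param name="user" value="magnolia"/>
        <param name="password" value="changeit"/>
        <param name="databaseType" value="mysql"/>
    </DataSource>
    <!-- Hypothetical second datasource that selected workspaces could reference -->
    <DataSource name="magnolia_archive">
        <param name="driver" value="org.postgresql.Driver"/>
        <param name="url" value="jdbc:postgresql://localhost:5432/magnolia_archive"/>
        <param name="user" value="magnolia"/>
        <param name="password" value="changeit"/>
        <param name="databaseType" value="postgresql"/>
    </DataSource>
</DataSources>
```

A workspace would then select one of them through the dataSourceName parameter of its persistence manager in that workspace's workspace.xml file.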

Database

The configuration options are outlined in the DataSourceConfig API.

<DataSources>
    <DataSource name="magnolia">
        <param name="driver" value="com.mysql.jdbc.Driver" /> <!-- (1) -->
        <param name="url" value="jdbc:mysql://localhost:3306/magnolia" /> <!-- (2) -->
        <param name="user" value="root" />
        <param name="password" value="password" /> <!-- (3) -->
        <param name="databaseType" value="mysql"/> <!-- (4) -->
        <param name="validationQuery" value="select 1"/> <!-- (5) -->
    </DataSource>
</DataSources>

Item	Parameter	Required	Description
1	`driver`	required	Depending on which database you choose to work with, make sure you include the JAR file with the appropriate driver in the classpath. Drivers include MySQL Connector/J (the official JDBC driver for MySQL), the Ingres JDBC driver, the PostgreSQL JDBC driver, the Oracle JDBC drivers, and the Microsoft JDBC drivers.
2	`url`	required	The connection URL of the database.
3	`password`	required	The password associated with the schema.
4	`databaseType`	optional	Use the appropriate setting for the database, for example postgresql, mysql, mssql, azure, or oracle.
5	`validationQuery`	optional	The SQL query that’s used to validate connections from this pool before returning them to the caller. The query depends on the database type: select 1 for MySQL and MSSQL, select 1 from dual for Oracle.
not shown in sample	`maxPoolSize`	optional	Restrict the number of connections in the pool to a maximum value.
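
Because maxPoolSize is not shown in the sample, here is a hedged sketch of how it could be added alongside the other parameters. The value 20 is an arbitrary illustration, not a recommendation, and the connection details are placeholders.

```xml
<DataSources>
    <DataSource name="magnolia">
        <param name="driver" value="com.mysql.jdbc.Driver"/>
        <param name="url" value="jdbc:mysql://localhost:3306/magnolia"/>
        <param name="user" value="magnolia"/>
        <param name="password" value="changeit"/>
        <param name="databaseType" value="mysql"/>
        <param name="validationQuery" value="select 1"/>
        <!-- Assumed value: cap the pool at 20 connections -->
        <param name="maxPoolSize" value="20"/>
    </DataSource>
</DataSources>
```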

JNDI

Jackrabbit supports JNDI datasources. The container you use determines how you set up your JNDI datasource.

Note that you can use datasources from JNDI in both the configuration of the datasources and in configuration of the components. Jackrabbit uses these JNDI datasources as-is and does not wrap pools around them. See ConnectionPooling.

Tomcat: JNDI Resources HOW-TO Tomcat 9.0.
JBoss: JBoss AS JNDI Datasource Setup.
WebLogic: JNDI for Oracle WebLogic Server.
WebSphere: JNDI namespace bindings for WebSphere.

See How to use a JNDI DataSource.
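
A datasource backed by JNDI might look like the following sketch. It follows the same pattern as the JNDI data store sample in this document, where the driver parameter names a JNDI context class and the url parameter holds the JNDI name. The JNDI name java:comp/env/jdbc/magnolia is a hypothetical example and depends on how your container binds the resource.

```xml
<DataSources>
    <DataSource name="magnolia">
        <!-- A JNDI context class instead of a JDBC driver class -->
        <param name="driver" value="javax.naming.InitialContext"/>
        <!-- Hypothetical JNDI name; use the name bound in your container -->
        <param name="url" value="java:comp/env/jdbc/magnolia"/>
        <param name="databaseType" value="mysql"/>
    </DataSource>
</DataSources>
```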

Embedded

Jackrabbit provides persistence manager implementations for both H2 and Derby databases. Using these databases doesn’t require a separate datasource configuration. Instead, you provide the connection URL in the persistence manager configuration.
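
For illustration, an embedded H2 persistence manager could be configured roughly as follows. This is a sketch that assumes the pooled H2 implementation; the database location under ${wsp.home} and the schema prefix are assumed values.

```xml
<PersistenceManager class="org.apache.jackrabbit.core.persistence.pool.H2PersistenceManager">
    <!-- The connection URL is given directly; no DataSource element is referenced -->
    <param name="url" value="jdbc:h2:${wsp.home}/db"/>
    <!-- Assumed prefix for the tables created by this persistence manager -->
    <param name="schemaObjectPrefix" value="pm_${wsp.name}_"/>
</PersistenceManager>
```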

File system

The virtual file system is used by the repository to store things like registered namespaces and node types.

The configuration options are outlined in the FileSystem API.

<FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
    <param name="path" value="${rep.home}/repository" />
</FileSystem>

Jackrabbit provides many options for configuring the file system. Choose the class that best fits your use case and see its documentation for the implementation-specific configuration options.

org.apache.jackrabbit.core.fs.local.LocalFileSystem: The file system is created in the location specified by magnolia.repositories.home.
org.apache.jackrabbit.core.fs.db.DbFileSystem: Persists file system entries in a database table.
org.apache.jackrabbit.core.fs.db.DerbyFileSystem: Persists file system entries in an embedded Derby database.
org.apache.jackrabbit.core.fs.db.MSSqlFileSystem: Persists file system entries in an MS SQL database.
org.apache.jackrabbit.core.fs.db.OracleFileSystem: Persists file system entries in an Oracle database.
org.apache.jackrabbit.core.fs.mem.MemoryFileSystem: An in-memory file system implementation (can be useful for development).

See Jackrabbit File System Configuration.
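
As an alternative to the local file system shown above, a database-backed file system might be configured along these lines. This is a sketch only: the connection details and the rep_ prefix are placeholders, and you should check the DbFileSystem documentation for the parameters supported by your Jackrabbit version.

```xml
<FileSystem class="org.apache.jackrabbit.core.fs.db.DbFileSystem">
    <param name="driver" value="com.mysql.jdbc.Driver"/>
    <param name="url" value="jdbc:mysql://localhost:3306/magnolia"/>
    <param name="user" value="magnolia"/>
    <param name="password" value="changeit"/>
    <!-- Selects the database-specific DDL used to create the file system table -->
    <param name="schema" value="mysql"/>
    <!-- Assumed prefix for the file system table -->
    <param name="schemaObjectPrefix" value="rep_"/>
</FileSystem>
```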

Security

The security configuration element is used to specify authentication and authorization settings for the repository.

<Security appName="magnolia">
    <SecurityManager class="org.apache.jackrabbit.core.DefaultSecurityManager"/>
    <AccessManager class="org.apache.jackrabbit.core.security.DefaultAccessManager"></AccessManager>
    <!--
         The login module defined here is used by the repository to authenticate every request.
         It is not used by the webapp to authenticate users against the webapp context
         (that authentication has to pass before this module is invoked).
    -->
    <LoginModule class="info.magnolia.jaas.sp.jcr.JackrabbitAuthenticationModule"></LoginModule>
</Security>

Jackrabbit uses the Java Authentication and Authorization Service (JAAS) to authenticate users who try to access the repository. The appName parameter in the <Security/> element is used as the JAAS application name of the repository.

Once a user has been authenticated, Jackrabbit uses the configured AccessManager to control what parts of the repository content the user is allowed to access and modify.

Jackrabbit also provides the SimpleJBossAccessManager class, which is designed for use with the JBoss Application Server. It maps JBoss roles to Jackrabbit permissions.

See Jackrabbit Security Configuration.

Data store

The data store is optionally used to store large binary values. Typically, all node and property data is stored in a persistence manager, but for large binaries such as files, special treatment can improve performance and reduce disk usage.

The main features of the data store are:

Space saving: only one copy per unique object is kept
Fast copy: only the identifier is copied
Storing and reading doesn’t block others
Multiple repositories can use the same data store
Objects in the data store are immutable
Garbage collection is used to purge unused objects
Hot backup is supported

File data store

The file data store stores each binary in a file. The file name is the hash code of the content. When reading, the data is streamed directly from the file (no local or temporary copy of the file is created). The file data store doesn’t use any local cache, which means content is directly read from the files as needed. New content is first stored in a temporary file, and later renamed or moved to the right place.

The configuration options are outlined in the FileDataStore API.

<DataStore class="org.apache.jackrabbit.core.data.FileDataStore">
    <param name="path" value="${rep.home}/repository/datastore"/>
    <param name="minRecordLength" value="1024"/> <!-- default is 100 bytes -->
</DataStore>

See Jackrabbit File Data Store.

Database data store

The database data store keeps data in a relational database. All content is stored in one table and the unique key of the table is the hash code of the content. When reading, the data may first be copied to a temporary file on the server, or streamed directly from the database (depending on the copyWhenReading setting). New content is first stored in the table under a unique temporary identifier, and later the key is updated to the hash of the content.

MySQL doesn’t support sending very large binaries from the JDBC driver to the database. Therefore, you should avoid a database data store when using MySQL. See data store limitations.

The configuration options available are outlined in the DbDataStore API.

<DataStore class="org.apache.jackrabbit.core.data.db.DbDataStore">
    <param name="url" value="java:jboss/datasources/jackrabbit"/> <!-- JNDI Datasource example -->
    <param name="driver" value="javax.naming.InitialContext"/>
    <param name="databaseType" value="oracle"/>
    <param name="schemaObjectPrefix" value="repo_ds" />
</DataStore>

See Jackrabbit Database Data Store.
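
The copyWhenReading setting mentioned above does not appear in the sample. The sketch below shows where it could go in a direct JDBC configuration. The PostgreSQL connection details and the ds_ prefix are assumptions chosen for illustration.

```xml
<DataStore class="org.apache.jackrabbit.core.data.db.DbDataStore">
    <param name="driver" value="org.postgresql.Driver"/>
    <param name="url" value="jdbc:postgresql://localhost:5432/magnolia"/>
    <param name="user" value="magnolia"/>
    <param name="password" value="changeit"/>
    <param name="databaseType" value="postgresql"/>
    <!-- true: copy the binary to a temporary file before reading;
         false: stream it directly from the database -->
    <param name="copyWhenReading" value="true"/>
    <!-- Assumed prefix for the data store table -->
    <param name="schemaObjectPrefix" value="ds_"/>
</DataStore>
```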

S3 data store

The S3 data store can be helpful for Docker-like deployments. In this type of deployment, it’s common to move as much of the content and configuration out of the file system as possible (by default, the data store stores binaries on the server’s file system). Performance when handling big files shouldn’t be affected by using S3.

The S3 data store requires the Amazon WebServices Extension from Jackrabbit.

<dependency>
    <groupId>org.apache.jackrabbit</groupId>
    <artifactId>jackrabbit-aws-ext</artifactId>
</dependency>

Jackrabbit also requires an AWS properties file to specify information about your AWS S3 account. See aws.properties for a sample.

The S3DataStore class extends CachingDataStore. This means that, by default, the disk space is still used to store the cached items. You can specify the location using the path parameter. Otherwise, the repository folder is used.

<DataStore class="org.apache.jackrabbit.aws.ext.ds.S3DataStore">
    <param name="config" value="/path/to/your/aws.properties"/>
    <param name="path" value="/path/where/you/want/the/s3/cache/stored"/>

    <!-- The cache can be disabled so the items will always be read from the S3 and no FS is needed. -->
    <param name="cacheSize" value="0"/> <!-- Setting cacheSize to 0 disables the cache. -->
</DataStore>

Workspaces

The <Workspaces/> element of the repository configuration specifies where and how workspaces are managed. The configuration of this element gets stored in the RepositoryConfig class.

<Workspaces rootPath="${rep.home}/workspaces" defaultWorkspace="default" />

The following global workspace configuration options are available.

Parameter	Required	Description
`rootPath`	required	The native file system directory for workspaces. A subdirectory is automatically created for each workspace, and the path of that subdirectory can be used in the workspace configuration as the `${wsp.path}` variable.
`defaultWorkspace`	required	Name of the default workspace. This workspace is automatically created when the repository is first started.
`configRootPath`	optional	By default, the configuration of each workspace is stored in a `workspace.xml` file within the workspace directory, which is in turn within the `rootPath` directory. If this option is specified, then the workspace configuration files are stored within the specified path in the virtual file system (see above) configured for the repository.
`maxIdleTime`	optional	By default, Jackrabbit only releases resources associated with an opened workspace when the entire repository is closed. This option, if specified, sets the maximum number of seconds that a workspace can remain unused before the workspace is automatically closed.

See Jackrabbit Workspace Configuration.
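
To illustrate the optional maxIdleTime setting, the element could be extended as in this sketch. The value 3600 (one hour) is an arbitrary assumption, not a recommended default.

```xml
<!-- Idle workspaces are closed automatically after 3600 seconds (assumed value) -->
<Workspaces rootPath="${rep.home}/workspaces"
            defaultWorkspace="default"
            maxIdleTime="3600"/>
```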

Workspace

The configuration specified in the Workspace element becomes the template for all workspaces created by Jackrabbit. Each workspace has its own workspace.xml file generated from this template.

File system

The workspace-level virtual file system is passed to the persistence manager and search index. The same configuration options are available here as described above for the repository-level virtual file system.

See Jackrabbit File System Configuration.
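
A workspace-level file system is typically rooted in the workspace home directory rather than the repository home. A minimal sketch, assuming a local file system and the ${wsp.home} variable used elsewhere in this document:

```xml
<FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
    <!-- ${wsp.home} resolves to the home directory of the individual workspace -->
    <param name="path" value="${wsp.home}"/>
</FileSystem>
```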

Persistence manager

The persistence manager (PM) is an internal Jackrabbit component that handles the persistent storage of content nodes and properties. Property values are also stored in the PM, except for large binary values (those are usually kept in the datastore). Each workspace of a Jackrabbit content repository uses a separate persistence manager to store the content in that workspace.

<PersistenceManager class="org.apache.jackrabbit.core.persistence.pool.MySqlPersistenceManager">
    <param name="dataSourceName" value="magnolia"/>
    <param name="schemaObjectPrefix" value="pm_${wsp.name}_" />
</PersistenceManager>

Jackrabbit provides many persistence manager implementations. Choose the class that best fits your use case and see its documentation for the available configuration options.

All BundlePersistenceManager implementations that do not use a pool of JDBC connections have been marked as deprecated. Replace them with the pooled version.
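
As an example of such a replacement, a PostgreSQL setup would move from the bundle package to the pool package. The sketch below assumes a datasource named magnolia, as in the datasource sample earlier in this document.

```xml
<!-- Deprecated, non-pooled variant:
     org.apache.jackrabbit.core.persistence.bundle.PostgreSQLPersistenceManager -->
<PersistenceManager class="org.apache.jackrabbit.core.persistence.pool.PostgreSQLPersistenceManager">
    <param name="dataSourceName" value="magnolia"/>
    <param name="schemaObjectPrefix" value="pm_${wsp.name}_"/>
</PersistenceManager>
```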

Search index

The search index in Jackrabbit is pluggable and has a default implementation based on Apache Lucene. It’s configured in the file workspace.xml once the workspace is created.

<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
    <param name="path" value="${wsp.home}/index" />
    <!-- SearchIndex will get the indexing configuration from the classpath, if not found in the workspace home -->
    <param name="indexingConfiguration" value="/info/magnolia/jackrabbit/indexing_configuration.xml"/>
    <param name="useCompoundFile" value="true" />
    <param name="minMergeDocs" value="100" />
    <param name="volatileIdleTime" value="3" />
    <param name="maxMergeDocs" value="100000" />
    <param name="mergeFactor" value="10" />
    <param name="maxFieldLength" value="10000" />
    <param name="bufferSize" value="10" />
    <param name="cacheSize" value="1000" />
    <param name="forceConsistencyCheck" value="false" />
    <param name="autoRepair" value="true" />
    <param name="queryClass" value="org.apache.jackrabbit.core.query.QueryImpl" />
    <param name="respectDocumentOrder" value="true" />
    <param name="resultFetchSize" value="100" />
    <param name="extractorPoolSize" value="3" />
    <param name="extractorTimeout" value="100" />
    <param name="extractorBackLogSize" value="100" />
    <!-- needed to highlight the searched term -->
    <param name="supportHighlighting" value="true"/>
    <!-- custom provider for getting an HTML excerpt in a query result with rep:excerpt() -->
    <param name="excerptProviderClass" value="info.magnolia.jackrabbit.lucene.SearchHTMLExcerpt"/>
</SearchIndex>

See Jackrabbit Search Index Configuration.

Workspace security

Workspace security is handled by the class MagnoliaAccessProvider, a Magnolia-specific ACL provider. This class compiles the set of permissions a user has for a given workspace. If the user doesn’t have any permissions for the workspace, then root-only access is returned. If the user is detected as admin or superuser, which is checked first, then an implementation of CompiledPermissions that grants everything is returned.

The class MagnoliaAccessProvider has DEBUG output available that you can switch on using the Log Tools app.

<WorkspaceSecurity>
    <AccessControlProvider class="info.magnolia.cms.core.MagnoliaAccessProvider" />
</WorkspaceSecurity>

Versioning

The version histories of all versionable nodes are stored in a repository-wide version store configured in the Versioning element of the repository configuration. The versioning configuration is much like workspace configuration as they’re both used by Jackrabbit for storing content. The main difference between versioning and workspace configuration is that no search index is specified for the version store. This is because version histories are indexed and searched using the repository-wide search index. Another difference is that there are no ${wsp.name} or ${wsp.path} variables for the versioning configuration. Instead, the native file system path of the version store is explicitly specified in the configuration.

<Versioning rootPath="${rep.home}/version">
    <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
        <param name="path" value="${rep.home}/workspaces/version" />
    </FileSystem>
    <PersistenceManager class="org.apache.jackrabbit.core.persistence.pool.H2PersistenceManager">
        <param name="url" value="jdbc:h2:${rep.home}/version/db;AUTO_SERVER=TRUE" />
        <param name="schemaObjectPrefix" value="version_" />
    </PersistenceManager>
</Versioning>
