Repository configuration
The repository configuration file specifies global options like security, data sources, and versioning.
A default workspace configuration template is also included as part of the repository configuration file.
Each workspace also has its own workspace.xml file inside the workspace home directory, which allows configuration to be applied on a per-workspace basis.
Data sources
One or more data source configurations can be created within the DataSources element. It’s common to configure one DataSource per repository, with every workspace using the exact same data source connection. Technically, however, each workspace could use a different configuration: some workspaces could connect to an embedded data source while others connect to a database server.
Database
The configuration options are outlined in the API documentation for DataSourceConfig.
<DataSources>
<DataSource name="magnolia">
<param name="driver" value="com.mysql.jdbc.Driver" /> (1)
<param name="url" value="jdbc:mysql://localhost:3306/magnolia" /> (2)
<param name="user" value="root" />
<param name="password" value="password" /> (3)
<param name="databaseType" value="mysql"/> (4)
<param name="validationQuery" value="select 1"/> (5)
</DataSource>
</DataSources>
Item | Parameter | Description |
---|---|---|
1 | driver | Required. The JDBC driver class. Depending on which database you work with, make sure the jar containing the appropriate driver is on the classpath. |
2 | url | Required. The connection URL for the database. |
3 | password | Required. The password associated with the schema. |
4 | databaseType | Optional. Use the appropriate setting for the database. |
5 | validationQuery | Optional. The SQL query used to validate connections from this pool before returning them to the caller. The query depends on the database type. |
not shown in sample | maxPoolSize | Optional. Restricts the number of connections in the pool to a maximum value. |
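The maxPoolSize parameter does not appear in the sample above. The sketch below shows where it would go in the same magnolia DataSource. The value 20 is an arbitrary illustration, not a recommendation.

```xml
<DataSources>
  <DataSource name="magnolia">
    <param name="driver" value="com.mysql.jdbc.Driver" />
    <param name="url" value="jdbc:mysql://localhost:3306/magnolia" />
    <param name="user" value="root" />
    <param name="password" value="password" />
    <param name="databaseType" value="mysql" />
    <param name="validationQuery" value="select 1" />
    <!-- cap the pool at 20 open connections (illustrative value) -->
    <param name="maxPoolSize" value="20" />
  </DataSource>
</DataSources>
```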
JNDI
Jackrabbit supports JNDI data sources. The container you use determines how you set up your JNDI data source.
Note that you can use JNDI data sources both in the DataSources configuration and in the configuration of individual components. Jackrabbit uses these JNDI data sources as-is and does not wrap pools around them. See ConnectionPooling.
-
Tomcat: JNDI Resources HOW-TO Tomcat 9.0.
-
JBoss: JBoss AS JNDI Datasource Setup.
-
WebLogic: JNDI for Oracle WebLogic Server.
-
WebSphere: JNDI namespace bindings for WebSphere.
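As a sketch for Tomcat, assume a container-managed resource with the hypothetical name jdbc/magnolia is defined in the web application's META-INF/context.xml. The Jackrabbit DataSource can then reference it by using javax.naming.InitialContext as the driver and the JNDI name as the url, the same convention the database data store example in this document uses. Connection values and the maxTotal setting are placeholders.

```xml
<!-- META-INF/context.xml (Tomcat): hypothetical resource jdbc/magnolia -->
<Context>
  <Resource name="jdbc/magnolia" auth="Container" type="javax.sql.DataSource"
            driverClassName="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost:3306/magnolia"
            username="root" password="password"
            maxTotal="20" />
</Context>

<!-- Repository configuration: Jackrabbit looks the data source up via JNDI -->
<DataSources>
  <DataSource name="magnolia">
    <param name="driver" value="javax.naming.InitialContext" />
    <param name="url" value="java:comp/env/jdbc/magnolia" />
    <param name="databaseType" value="mysql" />
  </DataSource>
</DataSources>
```

Because Jackrabbit does not wrap a pool around JNDI data sources, pool sizing (maxTotal here) is configured on the container side rather than in the repository configuration.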
File system
The virtual file system is used by the repository to store global data such as registered namespaces and node types.
The configuration options are outlined in the FileSystem API documentation.
<FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
<param name="path" value="${rep.home}/repository" />
</FileSystem>
Jackrabbit provides a lot of choices for how you can configure the file system. Choose the class that best fits your use case and see its API documentation for implementation-specific configuration options.
-
org.apache.jackrabbit.core.fs.local.LocalFileSystem: The file system will be created in the location specified by magnolia.repositories.home.
-
org.apache.jackrabbit.core.fs.db.DbFileSystem: Persists file system entries in a database table.
-
org.apache.jackrabbit.core.fs.db.DerbyFileSystem: Persists file system entries in an embedded Derby database.
-
org.apache.jackrabbit.core.fs.db.MSSqlFileSystem: Persists file system entries in an MS SQL database.
-
org.apache.jackrabbit.core.fs.db.OracleFileSystem: Persists file system entries in an Oracle database.
-
org.apache.jackrabbit.core.fs.mem.MemoryFileSystem: An in-memory file system implementation (can be useful for development).
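For example, a database-backed file system could reuse the magnolia DataSource defined earlier. This sketch assumes DbFileSystem accepts a dataSourceName parameter the same way the persistence manager example in this document does. The fs_ prefix is a hypothetical choice that keeps the file system table separate from other tables. Check the DbFileSystem API for the full parameter list.

```xml
<FileSystem class="org.apache.jackrabbit.core.fs.db.DbFileSystem">
  <!-- reference a DataSource declared in the DataSources element -->
  <param name="dataSourceName" value="magnolia" />
  <!-- hypothetical prefix so the file system table does not clash with other tables -->
  <param name="schemaObjectPrefix" value="fs_" />
</FileSystem>
```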
Security
The security configuration element is used to specify authentication and authorization settings for the repository.
<Security appName="magnolia">
<SecurityManager class="org.apache.jackrabbit.core.DefaultSecurityManager"/>
<AccessManager class="org.apache.jackrabbit.core.security.DefaultAccessManager"></AccessManager>
<!--
The login module defined here is used by the repository to authenticate every request.
It is not used by the webapp to authenticate the user against the webapp context
(that authentication has to succeed before anything here is invoked).
-->
<LoginModule class="info.magnolia.jaas.sp.jcr.JackrabbitAuthenticationModule"></LoginModule>
</Security>
Jackrabbit uses the Java Authentication and Authorization Service (JAAS) to authenticate users who try to access the repository. The appName attribute of the <Security/> element is used as the JAAS application name of the repository.
Once a user has been authenticated, Jackrabbit will use the configured AccessManager to control what parts of the repository content the user is allowed to access and modify.
Alternatively, the SimpleJBossAccessManager class is designed for use with the JBoss Application Server, where it maps JBoss roles to Jackrabbit permissions.
Data store
The data store is optionally used to store large binary values. Normally all node and property data is stored in a persistence manager, but for large binaries such as files, special treatment can improve performance and reduce disk usage.
The main features of the data store are:
-
Space saving: only one copy per unique object is kept
-
Fast copy: only the identifier is copied
-
Storing and reading does not block others
-
Multiple repositories can use the same data store
-
Objects in the data store are immutable
-
Garbage collection is used to purge unused objects
-
Hot backup is supported
File data store
The file data store stores each binary in a file. The file name is the hash code of the content. When reading, the data is streamed directly from the file (no local or temporary copy of the file is created). The file data store does not use any local cache; content is read directly from the files as needed. New content is first stored in a temporary file and later renamed or moved to the right place.
The configuration options are outlined in the FileDataStore API documentation.
<DataStore class="org.apache.jackrabbit.core.data.FileDataStore">
<param name="path" value="${rep.home}/repository/datastore"/>
<param name="minRecordLength" value="1024"/> <!-- default is 100 bytes -->
</DataStore>
Database data store
The database data store keeps data in a relational database. All content is stored in one table, the unique key of the table is the hash code of the content. When reading, the data may be first copied to a temporary file on the server, or streamed directly from the database (depending on the copyWhenReading setting). New content is first stored in the table under a unique temporary identifier, and later the key is updated to the hash of the content.
MySQL does not support sending very large binaries from the JDBC driver to the database. Therefore, avoid the database data store when using MySQL. See data store limitations.
The available configuration options are outlined in the DbDataStore API documentation.
<DataStore class="org.apache.jackrabbit.core.data.db.DbDataStore">
<param name="url" value="java:jboss/datasources/jackrabbit"/> <!-- JNDI Datasource example -->
<param name="driver" value="javax.naming.InitialContext"/>
<param name="databaseType" value="oracle"/>
<param name="schemaObjectPrefix" value="repo_ds" />
</DataStore>
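The copyWhenReading setting described above is also a DbDataStore parameter. The following variant of the JNDI example is a sketch with illustrative values: it streams binaries directly from the database instead of copying them to a temporary file first, and raises the size threshold below which values are kept inline instead of in the data store.

```xml
<DataStore class="org.apache.jackrabbit.core.data.db.DbDataStore">
  <param name="url" value="java:jboss/datasources/jackrabbit" />
  <param name="driver" value="javax.naming.InitialContext" />
  <param name="databaseType" value="oracle" />
  <param name="schemaObjectPrefix" value="repo_ds" />
  <!-- false: stream binaries straight from the database, no temporary file copy -->
  <param name="copyWhenReading" value="false" />
  <!-- binaries shorter than this many bytes stay inline in the persistence manager -->
  <param name="minRecordLength" value="1024" />
</DataStore>
```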
S3 data store
The S3 data store can be helpful for Docker-like deployments. In this type of deployment, it’s common to move as much content and configuration as possible out of the file system (by default, the data store keeps binaries on the file system). Performance when handling big files should not be affected by using S3.
The S3 data store requires the Amazon Web Services extension from Jackrabbit.
<dependency>
<groupId>org.apache.jackrabbit</groupId>
<artifactId>jackrabbit-aws-ext</artifactId>
</dependency>
Jackrabbit also requires an AWS properties file to specify your AWS S3 account details. See aws.properties for a sample.
The S3DataStore class extends CachingDataStore, so by default local disk space is still used to store cached items. The cache location can be set with the path parameter; otherwise the repository folder is used.
<DataStore class="org.apache.jackrabbit.aws.ext.ds.S3DataStore">
<param name="config" value="/path/to/your/aws.properties"/>
<param name="path" value="/path/where/you/want/the/s3/cache/stored"/>
<!-- The cache can be disabled so that items are always read from S3 and no local file system is needed. -->
<param name="cacheSize" value="0"/> <!-- Setting cacheSize to 0 disables the cache. -->
</DataStore>
Workspaces
The <Workspaces/> element of the repository configuration specifies where and how the workspaces are managed. The configuration of this element is held by the RepositoryConfig class.
<Workspaces rootPath="${rep.home}/workspaces" defaultWorkspace="default" />
The following global workspace configuration options are available.
Parameter | Description |
---|---|
rootPath | Required. The native file system directory for workspaces. A subdirectory is automatically created for each workspace, and the path of that subdirectory can be used in the workspace configuration as the ${wsp.home} variable. |
defaultWorkspace | Required. Name of the default workspace. This workspace is automatically created when the repository is first started. |
configRootPath | Optional. By default the configuration of each workspace is stored in a workspace.xml file inside the workspace directory under rootPath. If this option is specified, the workspace configuration files are instead stored under the specified path in the virtual file system of the repository. |
maxIdleTime | Optional. By default Jackrabbit only releases resources associated with an opened workspace when the entire repository is closed. This option, if specified, sets the maximum number of seconds that a workspace can remain unused before it is automatically closed. |
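For example, to have Jackrabbit close workspaces that have been unused for an hour, the optional maxIdleTime attribute can be added to the Workspaces element shown earlier. The value 3600 is illustrative.

```xml
<!-- close any workspace left unused for 3600 seconds (one hour) -->
<Workspaces rootPath="${rep.home}/workspaces"
            defaultWorkspace="default"
            maxIdleTime="3600" />
```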
Workspace
The configuration specified in the Workspace element becomes the template for all workspaces created by Jackrabbit. Each workspace has its own workspace.xml file generated from this template.
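The following condensed sketch shows how the pieces described in the subsections of this section fit together in the template. ${wsp.name} and ${wsp.home} are resolved per workspace when its workspace.xml is generated. The parameter values mirror other examples in this document and are illustrative only.

```xml
<Workspace name="${wsp.name}">
  <!-- workspace-level virtual file system -->
  <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
    <param name="path" value="${wsp.home}" />
  </FileSystem>
  <!-- one persistence manager per workspace; ${wsp.name} keeps table names unique -->
  <PersistenceManager class="org.apache.jackrabbit.core.persistence.pool.MySqlPersistenceManager">
    <param name="dataSourceName" value="magnolia" />
    <param name="schemaObjectPrefix" value="pm_${wsp.name}_" />
  </PersistenceManager>
  <!-- Lucene-based search index stored in the workspace home -->
  <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
    <param name="path" value="${wsp.home}/index" />
  </SearchIndex>
  <!-- Magnolia-specific ACL provider -->
  <WorkspaceSecurity>
    <AccessControlProvider class="info.magnolia.cms.core.MagnoliaAccessProvider" />
  </WorkspaceSecurity>
</Workspace>
```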
File system
The workspace-level virtual file system is passed to the persistence manager and the search index. The same configuration options are available here as described above for the repository-level virtual file system.
Persistence manager
The persistence manager (PM) is an internal Jackrabbit component that handles the persistent storage of content nodes and properties. Property values are also stored in the persistence manager, with the exception of large binary values, which are usually kept in the DataStore. Each workspace of a Jackrabbit content repository uses a separate persistence manager to store the content of that workspace.
<PersistenceManager class="org.apache.jackrabbit.core.persistence.pool.MySqlPersistenceManager">
<param name="dataSourceName" value="magnolia"/>
<param name="schemaObjectPrefix" value="pm_${wsp.name}_" />
</PersistenceManager>
Jackrabbit provides a lot of choices for how you can configure the PersistenceManager. Choose the class that best fits your use case and see its API documentation for the available configuration options.
All BundlePersistenceManager implementations that do not use a pool of JDBC connections have been marked as deprecated. Replace them with the pooled version.
-
org.apache.jackrabbit.core.persistence.pool.BundleDbPersistenceManager
-
org.apache.jackrabbit.core.persistence.pool.DerbyPersistenceManager
-
org.apache.jackrabbit.core.persistence.pool.H2PersistenceManager
-
org.apache.jackrabbit.core.persistence.pool.MSSqlPersistenceManager
-
org.apache.jackrabbit.core.persistence.pool.MySqlPersistenceManager
-
org.apache.jackrabbit.core.persistence.pool.OraclePersistenceManager
-
org.apache.jackrabbit.core.persistence.pool.PostgreSQLPersistenceManager
-
org.apache.jackrabbit.core.persistence.mem.InMemPersistenceManager (can be useful for development)
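The persistence manager class should match the database behind the data source. As a sketch, assuming the magnolia DataSource pointed at PostgreSQL instead of MySQL, the pairing could look like this. The URL and credentials are placeholders.

```xml
<!-- repository level: placeholder PostgreSQL connection -->
<DataSource name="magnolia">
  <param name="driver" value="org.postgresql.Driver" />
  <param name="url" value="jdbc:postgresql://localhost:5432/magnolia" />
  <param name="user" value="magnolia" />
  <param name="password" value="password" />
  <param name="databaseType" value="postgresql" />
  <param name="validationQuery" value="select 1" />
</DataSource>

<!-- workspace template: pooled PostgreSQL persistence manager using that DataSource -->
<PersistenceManager class="org.apache.jackrabbit.core.persistence.pool.PostgreSQLPersistenceManager">
  <param name="dataSourceName" value="magnolia" />
  <param name="schemaObjectPrefix" value="pm_${wsp.name}_" />
</PersistenceManager>
```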
Search index
The search index in Jackrabbit is pluggable and has a default implementation based on Apache Lucene. Once a workspace is created, its search index is configured in that workspace's workspace.xml file.
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
<param name="path" value="${wsp.home}/index" />
<!-- SearchIndex will get the indexing configuration from the classpath, if not found in the workspace home -->
<param name="indexingConfiguration" value="/info/magnolia/jackrabbit/indexing_configuration.xml"/>
<param name="useCompoundFile" value="true" />
<param name="minMergeDocs" value="100" />
<param name="volatileIdleTime" value="3" />
<param name="maxMergeDocs" value="100000" />
<param name="mergeFactor" value="10" />
<param name="maxFieldLength" value="10000" />
<param name="bufferSize" value="10" />
<param name="cacheSize" value="1000" />
<param name="forceConsistencyCheck" value="false" />
<param name="autoRepair" value="true" />
<param name="queryClass" value="org.apache.jackrabbit.core.query.QueryImpl" />
<param name="respectDocumentOrder" value="true" />
<param name="resultFetchSize" value="100" />
<param name="extractorPoolSize" value="3" />
<param name="extractorTimeout" value="100" />
<param name="extractorBackLogSize" value="100" />
<!-- needed to highlight the searched term -->
<param name="supportHighlighting" value="true"/>
<!-- custom provider for getting an HTML excerpt in a query result with rep:excerpt() -->
<param name="excerptProviderClass" value="info.magnolia.jackrabbit.lucene.SearchHTMLExcerpt"/>
</SearchIndex>
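The file referenced by indexingConfiguration uses Jackrabbit's indexing configuration format. The following is a minimal, hypothetical illustration, not Magnolia's shipped file. It assumes the standard index-rule behaviour, where nodes matching a rule only have the listed properties indexed, so here nt:unstructured nodes would only have a property named title indexed. Every namespace prefix used in the file must be declared on the root element.

```xml
<?xml version="1.0"?>
<configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
  <!-- hypothetical rule: index only the "title" property of nt:unstructured nodes -->
  <index-rule nodeType="nt:unstructured">
    <property>title</property>
  </index-rule>
</configuration>
```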
Workspace security
Workspace security is handled by the class MagnoliaAccessProvider, a Magnolia-specific ACL provider. This class compiles the set of permissions a user has for a given workspace. If the user does not have any permissions for the workspace, root-only access is returned. If the user is detected as admin or superuser (which is checked first), an implementation of CompiledPermissions that grants everything is returned.
MagnoliaAccessProvider provides DEBUG output that can be switched on using the Log Tools app.
<WorkspaceSecurity>
<AccessControlProvider class="info.magnolia.cms.core.MagnoliaAccessProvider" />
</WorkspaceSecurity>
Versioning
The version histories of all versionable nodes are stored in a repository-wide version store configured in the Versioning element of the repository configuration. The versioning configuration is much like the workspace configuration, as both are used by Jackrabbit for storing content. The main difference is that no search index is specified for the version store, because version histories are indexed and searched using the repository-wide search index. Another difference is that the ${wsp.name} and ${wsp.home} variables are not available in the versioning configuration. Instead, the native file system path of the version store is explicitly specified in the configuration.
<Versioning rootPath="${rep.home}/version">
<FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
<param name="path" value="${rep.home}/workspaces/version" />
</FileSystem>
<PersistenceManager class="org.apache.jackrabbit.core.persistence.bundle.H2PersistenceManager">
<param name="url" value="jdbc:h2:${rep.home}/version/db;AUTO_SERVER=TRUE" />
<param name="schemaObjectPrefix" value="version_" />
</PersistenceManager>
</Versioning>
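The Versioning example uses the H2PersistenceManager from the bundle package, one of the non-pooled implementations that the Persistence manager section describes as deprecated. A sketch of the pooled replacement, assuming the same url and schemaObjectPrefix parameters carry over unchanged:

```xml
<Versioning rootPath="${rep.home}/version">
  <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
    <param name="path" value="${rep.home}/workspaces/version" />
  </FileSystem>
  <!-- pooled variant of the H2 persistence manager -->
  <PersistenceManager class="org.apache.jackrabbit.core.persistence.pool.H2PersistenceManager">
    <param name="url" value="jdbc:h2:${rep.home}/version/db;AUTO_SERVER=TRUE" />
    <param name="schemaObjectPrefix" value="version_" />
  </PersistenceManager>
</Versioning>
```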