Clustering in Jackrabbit works as follows: content is shared between all cluster nodes.
That means all Jackrabbit cluster nodes need access to the same persistent storage (PersistenceManager, DataStore, and repository FileSystem).
The persistence manager must be clusterable (for example, a central database that allows for concurrent access).
Any DataStore (file or DB) is clusterable by nature, as it stores content using unique hash IDs.
However, each cluster node needs its own (private) repository directory, including the repository.xml file, workspace FileSystem and Search index.
Every change made by one cluster node is reported in a journal, which can be either file-based or written to a database.
Clustering requirements
In order to use clustering, the following prerequisites must be met:
Each cluster node:
must have its own repository configuration
needs its own (private) workspace-level and versioning FileSystem (only the ones inside the workspace and versioning configuration, that is, those in the repository.xml and workspace.xml files)
needs its own (private) Search indexes
must be assigned a unique ID
must use the same (shared) journal
A DataStore must always be shared between nodes, if used.
The global repository FileSystem on the repository level must be shared (only the one that is on the same level as the DataStore; only in the repository.xml file).
A journal type must be chosen, either based on files or stored in a database.
The persistence managers must store their data in the same, globally accessible location.
The unclustered repositories use the embedded database (H2 database).
The clustered repository uses the MySQL database.
MySQL has been chosen as the database behind the persistence manager of the clustered repository because it supports concurrent access.
Any DataStore (file system or DB) is clusterable by nature, as it stores content under unique hash IDs.
Prerequisites
The author and public instances must be installed and set up. Use a Magnolia bundle for this; see Installing Magnolia for more details.
The goal
For both instances, the repositories are created in the same parent folder.
The parent folder should be an external location (outside the webapp), making it easier to share the shared repository.
The repositories folder is the central place for the author, public, and shared repositories.
Key folders:
The author and public magnolia repositories.
A private search index for each instance, in the cluster folder.
The shared repository with its shared file system, data store, and a revision log.
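The layout described above can be sketched as a handful of directories. This is only an illustration: the folder names author, public, shared, magnolia and cluster follow the description in this example, placing the cluster folder inside each instance folder is an assumption, REPOSITORIES_HOME is a hypothetical variable, and it defaults to a scratch directory so the sketch can be run safely. Jackrabbit typically creates missing directories on first start, so creating them by hand should not be required.

```shell
# Parent folder for all repositories; defaults to a scratch directory (assumption).
REPOSITORIES_HOME="${REPOSITORIES_HOME:-$(mktemp -d)/repositories}"

for instance in author public; do
  # Private per-instance folders: the magnolia repository and the cluster
  # repository home that holds this instance's private search index.
  mkdir -p "$REPOSITORIES_HOME/$instance/magnolia" \
           "$REPOSITORIES_HOME/$instance/cluster"
done

# Shared by both instances: repository-level file system and data store.
# The revision log (revision.log) would sit directly under shared/.
mkdir -p "$REPOSITORIES_HOME/shared/repository/datastore"

# Show the resulting directory tree.
find "$REPOSITORIES_HOME" -type d | sort
```

Keeping everything under one parent folder makes it easy to see which parts are private to an instance and which are shared.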
The author and public instances are intended to run in separate Tomcat instances on ports 8080 and 7070, respectively.
% mysql -u root -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 5
Server version: 5.7.31 MySQL Community Server (GPL)
Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> create database magnolia_shared;
Query OK, 1 row affected (0.00 sec)
Add the MySQL driver to the lib folder (for example /TOMCAT_HOME/WEBAPP_HOME/WEB-INF/lib) of both instances, author and public.
See MySQL Connectors.
In this example, the entire setup is located on the same machine with everything in the same parent folder.
In a typical setup, the instances will most likely be located on different machines, so you need to make sure the shared space can be accessed by all instances.
Create a repository configuration file for the clustering setup.
System properties will be used to set the path of the shared folder and the cluster id.
The system property approach will allow the cluster repo config file to be the same for both instances.
org.apache.jackrabbit.core.cluster.shared_folder
org.apache.jackrabbit.core.cluster.node_id
Add the system properties to the setenv.sh/bat file of each Tomcat instance.
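A minimal sketch of the setenv.sh additions for the author instance follows. It assumes Tomcat picks up CATALINA_OPTS from setenv.sh; the shared folder path is a placeholder and the node ID value author is an assumption chosen for this example. The public instance would use the same shared folder and a different node ID (for example public).

```shell
# setenv.sh (author instance): sketch only, adjust the values to your setup.

# Placeholder path to the folder shared by both instances (assumption).
SHARED_FOLDER="/path/to/cluster-example/magnolia-dx-core/repositories/shared"

# Unique cluster node ID for this instance (assumption: "author").
NODE_ID="author"

# Append the two system properties referenced by the cluster repository config.
CATALINA_OPTS="${CATALINA_OPTS:-} -Dorg.apache.jackrabbit.core.cluster.shared_folder=$SHARED_FOLDER"
CATALINA_OPTS="$CATALINA_OPTS -Dorg.apache.jackrabbit.core.cluster.node_id=$NODE_ID"
export CATALINA_OPTS
```

On Windows, the equivalent lines would go into setenv.bat using set instead of export.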
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Repository PUBLIC "-//The Apache Software Foundation//DTD Jackrabbit 2.0//EN" "http://jackrabbit.apache.org/dtd/repository-2.0.dtd">
<Repository>
  <!-- Make sure you correctly configure this clustering configuration section, check especially the user and password values. -->
  <DataSources>
    <DataSource name="magnolia">
      <param name="driver" value="com.mysql.jdbc.Driver" />
      <param name="url" value="jdbc:mysql://localhost:3306/magnolia_shared" />
      <param name="user" value="root" />
      <param name="password" value="magnolia" />
      <param name="databaseType" value="mysql"/>
      <param name="validationQuery" value="select 1"/>
    </DataSource>
  </DataSources>
  <!-- Make sure you correctly configure this clustering configuration section, check especially the user and password values. -->
  <Cluster syncDelay="2000">
    <Journal class="org.apache.jackrabbit.core.journal.DatabaseJournal">
      <!-- The revision log will be shared by both instances. Use the system property to set the path. -->
      <param name="revision" value="${org.apache.jackrabbit.core.cluster.shared_folder}/revision.log" />
      <!-- ******************************************************************************************** -->
      <param name="driver" value="com.mysql.jdbc.Driver" />
      <param name="url" value="jdbc:mysql://localhost:3306/magnolia_shared" />
      <param name="user" value="root" /> <!-- (1) -->
      <param name="password" value="magnolia" /> <!-- (1) -->
      <param name="schema" value="mysql" />
      <param name="schemaObjectPrefix" value="journal_" />
    </Journal>
  </Cluster>
  <!-- The repository level file system will be shared by both instances. Use the system property to set the path. -->
  <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
    <param name="path" value="${org.apache.jackrabbit.core.cluster.shared_folder}/repository" />
  </FileSystem>
  <!-- ******************************************************************* -->
  <Security appName="magnolia">
    <SecurityManager class="org.apache.jackrabbit.core.DefaultSecurityManager"/>
    <AccessManager class="org.apache.jackrabbit.core.security.DefaultAccessManager">
    </AccessManager>
    <!-- login module defined here is used by the repo to authenticate every request. not by the webapp to authenticate user against the webapp context (this one has to be passed before thing here gets invoked -->
    <LoginModule class="info.magnolia.jaas.sp.jcr.JackrabbitAuthenticationModule">
    </LoginModule>
  </Security>
  <!-- The repository level data store will be shared by both instances. Use the system property to set the path. -->
  <DataStore class="org.apache.jackrabbit.core.data.FileDataStore">
    <param name="path" value="${org.apache.jackrabbit.core.cluster.shared_folder}/repository/datastore"/>
    <param name="minRecordLength" value="1024"/>
  </DataStore>
  <!-- ***************************************************************** -->
  <Workspaces rootPath="${rep.home}/workspaces" defaultWorkspace="default" />
  <Workspace name="default">
    <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
      <param name="path" value="${wsp.home}/default" />
    </FileSystem>
    <PersistenceManager class="org.apache.jackrabbit.core.persistence.pool.MySqlPersistenceManager">
      <param name="dataSourceName" value="magnolia"/>
      <param name="schemaObjectPrefix" value="pm_${wsp.name}_" />
    </PersistenceManager>
    <SearchIndex class="info.magnolia.jackrabbit.lucene.SearchIndex">
      <param name="path" value="${wsp.home}/index" />
      <!-- SearchIndex will get the indexing configuration from the classpath, if not found in the workspace home -->
      <param name="indexingConfiguration" value="/info/magnolia/jackrabbit/indexing_configuration_${wsp.name}.xml"/>
      <param name="useCompoundFile" value="true" />
      <param name="minMergeDocs" value="100" />
      <param name="volatileIdleTime" value="3" />
      <param name="maxMergeDocs" value="100000" />
      <param name="mergeFactor" value="10" />
      <param name="maxFieldLength" value="10000" />
      <param name="bufferSize" value="10" />
      <param name="cacheSize" value="1000" />
      <param name="forceConsistencyCheck" value="false" />
      <param name="autoRepair" value="true" />
      <param name="queryClass" value="org.apache.jackrabbit.core.query.QueryImpl" />
      <param name="respectDocumentOrder" value="true" />
      <param name="resultFetchSize" value="100" />
      <param name="extractorPoolSize" value="3" />
      <param name="extractorTimeout" value="100" />
      <param name="extractorBackLogSize" value="100" />
      <!-- needed to highlight the searched term -->
      <param name="supportHighlighting" value="true"/>
      <!-- custom provider for getting an HTML excerpt in a query result with rep:excerpt() -->
      <param name="excerptProviderClass" value="info.magnolia.jackrabbit.lucene.SearchHTMLExcerpt"/>
    </SearchIndex>
    <WorkspaceSecurity>
      <AccessControlProvider class="info.magnolia.cms.core.MagnoliaAccessProvider" />
    </WorkspaceSecurity>
  </Workspace>
  <Versioning rootPath="${rep.home}/version">
    <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
      <param name="path" value="${rep.home}/workspaces/version" />
    </FileSystem>
    <PersistenceManager class="org.apache.jackrabbit.core.persistence.pool.MySqlPersistenceManager">
      <param name="dataSourceName" value="magnolia"/>
      <param name="schemaObjectPrefix" value="version_" />
    </PersistenceManager>
  </Versioning>
</Repository>
(1) These manage the shared cluster repository user and password for both instances, author and public.
Make sure you have correct authentication values also in the DataSources section.
They should be in sync with those in the Cluster section.
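The in-sync requirement can be checked mechanically. The following is a minimal sketch, assuming the configuration file is written with single spaces between attributes (param name, then value); check_in_sync is a hypothetical helper, not part of Magnolia or Jackrabbit. It only compares the JDBC parameters that appear in both the DataSources and the Cluster sections.

```shell
# Sketch: report whether the JDBC settings repeated in the <DataSources> and
# <Cluster> sections agree. check_in_sync is a hypothetical helper.
check_in_sync() {
  conf="$1"
  status=0
  for key in driver url user password; do
    # Count the distinct name/value pairs configured for this parameter name.
    distinct=$(grep -o "<param name=\"$key\" value=\"[^\"]*\"" "$conf" \
      | sort -u | wc -l | tr -d ' ')
    if [ "$distinct" -gt 1 ]; then
      echo "MISMATCH: $key has $distinct different values"
      status=1
    fi
  done
  return "$status"
}
```

It could be called as check_in_sync WEB-INF/config/repo-conf/jackrabbit-bundle-mysql-search.xml; a non-zero exit status indicates that at least one parameter differs between the two sections.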
Save the file as WEB-INF/config/repo-conf/jackrabbit-bundle-mysql-search.xml. This is the same path that the properties configuration (see further below) references with the key magnolia.repositories.jackrabbit.cluster.config.
The cluster's connection to the MySQL database is defined by this configuration.
Add the clustered workspaces to WEB-INF/config/default/repository.xml.
The repository.xml file will need to be adjusted for the new clustered repository.
For this example, it will be the same for both author and public, where they share the comments workspace.
Both the name of the datasource and the reference to the repository in the repository.xml for the shared Magnolia store need to be in sync.
<JCR>
  <!-- Already existing mapping configs. -->
  <RepositoryMapping>
    <Map name="website" repositoryName="magnolia" workspaceName="website" />
    <Map name="config" repositoryName="magnolia" workspaceName="config" />
    <Map name="users" repositoryName="magnolia" workspaceName="users" />
    <Map name="userroles" repositoryName="magnolia" workspaceName="userroles" />
    <Map name="usergroups" repositoryName="magnolia" workspaceName="usergroups" />
  </RepositoryMapping>
  <!-- This is the key update: you must configure a new repository mapping for the shared comments workspace. -->
  <RepositoryMapping>
    <Map name="comments" repositoryName="cluster" workspaceName="comments" />
  </RepositoryMapping>
  <!-- magnolia default repository -->
  <Repository name="magnolia" provider="info.magnolia.jackrabbit.ProviderImpl" loadOnStartup="true">
    <param name="configFile" value="${magnolia.repositories.jackrabbit.config}" />
    <param name="repositoryHome" value="${magnolia.repositories.home}/magnolia" />
    <!-- the default node types are loaded automatically
    <param name="customNodeTypes" value="WEB-INF/config/repo-conf/nodetypes/magnolia_nodetypes.xml" />
    -->
    <param name="contextFactoryClass" value="org.apache.jackrabbit.core.jndi.provider.DummyInitialContextFactory" />
    <param name="providerURL" value="localhost" />
    <param name="bindName" value="${magnolia.webapp}" />
    <workspace name="website" />
    <workspace name="config" />
    <workspace name="users" />
    <workspace name="userroles" />
    <workspace name="usergroups" />
  </Repository>
  <!-- magnolia cluster repository -->
  <Repository name="cluster" provider="info.magnolia.jackrabbit.ProviderImpl" loadOnStartup="true">
    <param name="configFile" value="${magnolia.repositories.jackrabbit.cluster.config}" />
    <param name="repositoryHome" value="${magnolia.repositories.cluster}" />
    <!-- the default node types are loaded automatically
    <param name="customNodeTypes" value="WEB-INF/config/repo-conf/nodetypes/magnolia_nodetypes.xml" />
    -->
    <param name="contextFactoryClass" value="org.apache.jackrabbit.core.jndi.provider.DummyInitialContextFactory" />
    <param name="providerURL" value="localhost" />
    <param name="bindName" value="cluster-${magnolia.webapp}" />
    <!-- since forum module has been deprecated, we switch to contacts module for demonstration. -->
    <!-- <workspace name="forum" /> -->
    <workspace name="comments" />
  </Repository>
</JCR>
Configure the properties files.
Some of the properties configuration will differ between the instances.
The author instance uses the magnoliaAuthor context while the public instance uses the magnoliaPublic context.
For the sake of this clustering example, let’s reconfigure the repository creation to be located centrally.
This will allow for a better overview of what is shared vs what is private.
The shared properties will go into the default properties file.
Because it relies on system properties, the clustering configuration can be shared by both instances.
The magnolia.repositories.home paths in the magnoliaAuthor and magnoliaPublic properties files should match the file structure (<path-to-cluster-example>/cluster-example/magnolia-dx-core/repositories/author and <path-to-cluster-example>/cluster-example/magnolia-dx-core/repositories/public), because the repositories are managed outside the Tomcat bundles.