Workspace configuration
Once a workspace is created, you can adjust its configuration on a workspace-by-workspace basis.
Each workspace has a corresponding workspace.xml file that you can use to fine-tune that workspace individually.
By default, the file is located in the corresponding workspace folder.
To modify the configuration of an existing workspace, change the workspace.xml file of that workspace.
Changing the <Workspace/> element in the repository configuration file does not affect existing workspaces. That element is only used as a template when new workspaces are created.
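As an orientation aid, here is a sketch of how a workspace.xml file is typically organized. The workspace name website and the ${wsp.home}-based paths are example values, not necessarily what your installation generates. Each element is described in the sections that follow.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch of one workspace's workspace.xml; "website" is only an example name -->
<Workspace name="website">
  <!-- Virtual file system for this workspace -->
  <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
    <param name="path" value="${wsp.home}"/>
  </FileSystem>
  <!-- Storage for the workspace content -->
  <PersistenceManager class="org.apache.jackrabbit.core.persistence.pool.DerbyPersistenceManager">
    <param name="url" value="jdbc:derby:${wsp.home}/db;create=true"/>
    <param name="schemaObjectPrefix" value="${wsp.name}_"/>
  </PersistenceManager>
  <!-- Lucene search index; path is its only required parameter -->
  <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
    <param name="path" value="${wsp.home}/index"/>
  </SearchIndex>
</Workspace>
```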
File system
The virtual file system passed to the persistence manager and search index.
<FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
<param name="path" value="${rep.home}/repository" />
</FileSystem>
Jackrabbit provides several FileSystem implementations.
Choose the class that best fits your use case.
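For example, a workspace can keep its files in its own folder on disk or, for throwaway test setups, in memory. This sketch assumes the standard Jackrabbit LocalFileSystem and MemoryFileSystem classes; use only one FileSystem element per workspace.

```xml
<!-- Option 1: keep this workspace's files in its own folder on disk -->
<FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
  <param name="path" value="${wsp.home}"/>
</FileSystem>

<!-- Option 2 (an alternative, not an addition): in-memory file system.
     Its contents are lost when the repository shuts down. -->
<!--
<FileSystem class="org.apache.jackrabbit.core.fs.mem.MemoryFileSystem"/>
-->
```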
Persistence manager
Each workspace in a Jackrabbit content repository uses its own persistence manager to store the content of that workspace.
<PersistenceManager class="org.apache.jackrabbit.core.persistence.pool.DerbyPersistenceManager">
<param name="url" value="jdbc:derby:${wsp.home}/db;create=true"/>
<param name="schemaObjectPrefix" value="${wsp.name}_"/>
</PersistenceManager>
Jackrabbit provides several PersistenceManager implementations.
Choose the class that best fits your use case.
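If you prefer H2 over Derby, a per-workspace configuration could look like the following sketch. It assumes the Jackrabbit H2PersistenceManager class and the H2 JDBC driver are on the classpath. With recent H2 versions, the database is stored in a file such as db.mv.db.

```xml
<!-- Assumes the H2 JDBC driver is available to the web application -->
<PersistenceManager class="org.apache.jackrabbit.core.persistence.pool.H2PersistenceManager">
  <!-- H2 creates its database file(s) under this path -->
  <param name="url" value="jdbc:h2:${wsp.home}/db"/>
  <!-- Table-name prefix, so tables of different workspaces don't collide -->
  <param name="schemaObjectPrefix" value="${wsp.name}_"/>
</PersistenceManager>
```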
Search index
Node names and property values are indexed as soon as the data is saved or as soon as the transaction is committed.
Text extraction is done asynchronously in a background thread. That means text that’s changed or added isn’t available immediately, but rather after a short delay. You can configure the exact behaviour using the extractor settings.
The Jackrabbit SearchIndex class provides the following parameters.
All parameters except path have default values, so you can omit them to use the defaults.
See Jackrabbit Search for more details.
Basic configuration
Parameter | Description |
---|---|
|
required The location of the index directory.
A reasonable value is: |
|
optional When not set, all properties of a node are indexed. Magnolia provides a default indexing configuration file located in the Core module.
|
|
optional The name of the class that implements |
|
optional Sets the default analyzer in use for indexing.
The default value is the This analyzer uses an English-language stop word set. Lucene provides language-specific analyzers, which you can configure property-by-property in the indexing configuration file. |
|
optional The name of the class that implements |
|
optional Indicates whether the |
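A minimal basic configuration might combine the required index path with an indexing configuration. This is a sketch: the resource path used here for Magnolia's default indexing configuration is an assumption, so check it against the Core module of your Magnolia version.

```xml
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <!-- Required: directory where the Lucene index files are written -->
  <param name="path" value="${wsp.home}/index"/>
  <!-- Optional: controls which properties are indexed and how.
       Assumed resource path; verify it in your Magnolia Core module. -->
  <param name="indexingConfiguration" value="/info/magnolia/jackrabbit/indexing_configuration.xml"/>
</SearchIndex>
```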
Performance
You can tune indexing performance with the following parameters.
For more performance ideas, see Improve Indexing Speed in the Lucene documentation.
Parameter | Description |
---|---|
|
optional All files belonging to a segment have the same name with varying extensions.
When using the Compound File format, these files are collapsed into a single |
|
optional This setting no longer exists in Lucene 3.x. |
|
optional The Lucene indexer doesn’t write changes to the permanent index immediately. It first writes them to a volatile index, which is persisted to the permanent index once it reaches a certain size. You can also set a timer, in seconds, to control how often changes are written. |
|
optional While merging segments, Lucene ensures that no segment with more than |
|
optional This value tells Lucene how many documents to store in memory before writing them to the disk, as well as how often to merge multiple segments together. With the default value of 10, Lucene stores 10 documents in memory before writing them to a single segment on the disk. |
|
optional Deprecated in Lucene 3.x. |
|
optional Maximum number of documents that are held in a pending queue until added to the index. |
|
optional Size of the document number cache, which maps UUIDs to Lucene document numbers. If the cache hit rate is low, increasing this number may help. |
|
optional The maximum size, in bytes, that the volatile index can reach before it’s written to disk. The default value is 1 MB (1,048,576 bytes). |
|
optional The maximum age (in seconds) of the index history. The default value is 0, which means that index commits are deleted as soon as they’re not used anymore. |
|
optional With the default value of When set to |
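The following sketch shows how several performance parameters are set as param elements inside SearchIndex. The parameter names are those of the Jackrabbit SearchIndex class; the values are illustrative starting points rather than recommendations, so measure indexing speed before and after changing them.

```xml
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <param name="path" value="${wsp.home}/index"/>
  <!-- Collapse the files of each segment into a single compound file -->
  <param name="useCompoundFile" value="true"/>
  <!-- Idle time, in seconds, before the volatile index is flushed -->
  <param name="volatileIdleTime" value="3"/>
  <!-- How many documents are buffered and how often segments are merged -->
  <param name="mergeFactor" value="10"/>
  <!-- Flush the volatile index once it exceeds 1 MB -->
  <param name="maxVolatileIndexSize" value="1048576"/>
  <!-- Number of UUID-to-document-number entries to cache -->
  <param name="cacheSize" value="1000"/>
</SearchIndex>
```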
Consistency
Repository consistency settings are covered in more detail in the Troubleshooting section.
Parameter | Description |
---|---|
|
optional Runs a consistency check on every startup. If |
|
optional Errors detected by a consistency check are automatically repaired.
If |
|
optional If set to If set to |
|
optional The name of the class that implements A redo log keeps track of changes that haven’t been committed to disk yet: while nodes are added to and removed from the volatile index (held in memory), the redo log records each change. If the Jackrabbit process terminates unexpectedly, the redo log is applied the next time Jackrabbit starts. The default value is |
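For example, to check the index only after an unclean shutdown and repair detected errors automatically, you could combine the consistency parameters as in this sketch, which uses the Jackrabbit SearchIndex parameter names.

```xml
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <param name="path" value="${wsp.home}/index"/>
  <!-- Allow consistency checks at startup at all -->
  <param name="enableConsistencyCheck" value="true"/>
  <!-- Don't check on every startup, only after a detected forced shutdown -->
  <param name="forceConsistencyCheck" value="false"/>
  <!-- Repair errors found by the check automatically -->
  <param name="autoRepair" value="true"/>
</SearchIndex>
```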
Search
Parameter | Description |
---|---|
|
optional Class used to perform JCR queries.
|
|
optional If |
|
optional The number of results the query handler initially fetches when a query is executed. Keep in mind that ACL checks are performed on the fetched results: the larger the set, the longer it takes to load and check them. |
|
optional An |
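As a sketch, a workspace that runs many queries with large result sets might limit the initial fetch and skip document ordering. The values below are illustrative assumptions, not defaults.

```xml
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <param name="path" value="${wsp.home}/index"/>
  <!-- Fetch at most 100 results up front, which limits the ACL checks per query -->
  <param name="resultFetchSize" value="100"/>
  <!-- Don't sort unordered results into document order (faster for large results) -->
  <param name="respectDocumentOrder" value="false"/>
</SearchIndex>
```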
Extraction
Parameter | Description |
---|---|
|
optional Defines the maximum number of background threads that are used to extract text from binary properties.
If set to |
|
optional A text extractor is executed using a background thread if it doesn’t finish within this timeout (defined in milliseconds).
This parameter has no effect if |
|
optional The size of the extractor pool backlog. If all threads in the pool are busy, incoming work is put into a wait queue. If the wait queue reaches the backlog size, incoming extractor work is no longer queued and is instead executed in the current thread. |
|
optional Positive values are used as they are, negative values are interpreted as factors of the |
|
optional Java command used to fork external parser processes, or |
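The extraction parameters are set the same way. This sketch uses illustrative values for the pool size, the timeout (in milliseconds), and the backlog; tune them to the number of CPU cores and the volume of binaries in your workspace.

```xml
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <param name="path" value="${wsp.home}/index"/>
  <!-- Up to 2 background threads extract text from binary properties -->
  <param name="extractorPoolSize" value="2"/>
  <!-- Move extraction to a background thread if it takes longer than 100 ms -->
  <param name="extractorTimeout" value="100"/>
  <!-- Queue up to 100 extraction jobs before running them in the current thread -->
  <param name="extractorBackLogSize" value="100"/>
</SearchIndex>
```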
Term identification
You can configure the Lucene index to provide excerpts and highlighting in the search results.
For example, the workspace.xml file in each workspace enables highlighting in search results.
The workspace.xml files are located in /<CATALINA_HOME>/webapps/<contextPath>/repositories/magnolia/workspaces/<workspace name>.
Below is the relevant extract from workspace.xml in the contacts workspace.
<!-- needed to highlight the searched term -->
<param name="supportHighlighting" value="true"/>
<!-- custom provider for getting an HTML excerpt in a query result with rep:excerpt() -->
<param name="excerptProviderClass" value="info.magnolia.jackrabbit.lucene.SearchHTMLExcerpt"/>
If you have configured your own app that operates on its own workspace and provides content for the website, you need to add these parameters to the |
If you use fields that store HTML, that HTML is indexed along with the content, so an excerpt may contain HTML tags that aren’t closed.
Parameter | Description |
---|---|
|
optional If set to |
|
optional The name of the class that implements |
Parsing
Parameter | Description |
---|---|
|
optional Deprecated in Jackrabbit 2.x.
In Jackrabbit 2.x, Apache Tika was introduced as the default parser for binary content.
By default, Jackrabbit comes with a default |
|
optional Sets the location of the See Configuring Tika for some example configurations, such as using the |
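A Tika configuration file can, for example, exclude content types or parsers from text extraction. The following tika-config.xml is a sketch based on the Tika configuration format. The mime-exclude and parser-exclude elements require a Tika version that supports them, and the file location shown in the comment is hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical tika-config.xml, referenced from SearchIndex with
     <param name="tikaConfigPath" value="/path/to/tika-config.xml"/> -->
<properties>
  <parsers>
    <!-- Use Tika's default parser for everything except JPEG images,
         and never use the parser for executable files -->
    <parser class="org.apache.tika.parser.DefaultParser">
      <mime-exclude>image/jpeg</mime-exclude>
      <parser-exclude class="org.apache.tika.parser.executable.ExecutableParser"/>
    </parser>
  </parsers>
</properties>
```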
Synonym provider
A synonym provider lets users search with generalized, language-dependent synonyms and, more importantly, with domain-specific synonyms such as abbreviations or product names.
Parameter | Description |
---|---|
|
optional The name of a class that implements |
|
optional The path to the synonym provider configuration file.
This path is interpreted relative to the |
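For example, Jackrabbit ships a synonym provider that reads synonyms from a properties file. This sketch assumes that provider and a hypothetical synonyms.properties file, resolved as described in the table.

```xml
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <param name="path" value="${wsp.home}/index"/>
  <!-- Reads synonym pairs from a properties file -->
  <param name="synonymProviderClass" value="org.apache.jackrabbit.core.query.lucene.PropertiesSynonymProvider"/>
  <!-- Hypothetical file name -->
  <param name="synonymProviderConfigPath" value="synonyms.properties"/>
</SearchIndex>
```

In Jackrabbit, a full-text query looks up synonyms for a term when the term is prefixed with a tilde, for example jcr:contains(., '~asf').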
Spellchecking
Parameter | Description |
---|---|
|
optional The name of a class that implements |
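To enable spellchecking, you could point the parameter at the Lucene-based spell checker included in Jackrabbit, as in this sketch. Queries can then request suggestions with the rep:spellcheck() function.

```xml
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <param name="path" value="${wsp.home}/index"/>
  <!-- Spell checker implementation bundled with Jackrabbit -->
  <param name="spellCheckerClass" value="org.apache.jackrabbit.core.query.lucene.spell.LuceneSpellChecker"/>
</SearchIndex>
```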
Scoring
Parameter | Description |
---|---|
|
optional The name of a class that extends |
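A custom scoring class is registered the same way. In this sketch, com.example.search.CustomSimilarity is a hypothetical class that must be on the classpath of the Magnolia web application.

```xml
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <param name="path" value="${wsp.home}/index"/>
  <!-- Hypothetical class providing custom relevance scoring -->
  <param name="similarityClass" value="com.example.search.CustomSimilarity"/>
</SearchIndex>
```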
Workspace security
Workspace security is handled by the MagnoliaAccessProvider.
See the workspace security section for more details.
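In a workspace.xml file, the access provider is declared in a WorkspaceSecurity element inside Workspace. The sketch below assumes the class name info.magnolia.cms.core.MagnoliaAccessProvider; verify it against your Magnolia version.

```xml
<Workspace name="website">
  <!-- FileSystem, PersistenceManager and SearchIndex elements omitted -->
  <WorkspaceSecurity>
    <!-- Assumed fully qualified class name of Magnolia's access provider -->
    <AccessControlProvider class="info.magnolia.cms.core.MagnoliaAccessProvider"/>
  </WorkspaceSecurity>
</Workspace>
```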
Synchronize workspaces between Magnolia instances
When using Magnolia, you often store content in a variety of workspaces.
Typically, workspaces are kept under the directory set by the magnolia.repositories.home property in the WEB-INF/config/default/magnolia.properties file.
The Content Types module creates node types, workspaces, and namespaces on the fly.
If you use it, make sure your repository configuration and workspaces are properly synchronized, because this on-the-fly feature changes the repository configuration files.
Consider the following when creating a new content type:
repo/
  magnolia/
    repository/
      datastore/
      meta/
        rootUUID
      namespaces/
        ns_idx.properties
        ns_reg.properties
      nodetypes/
        custom_nodetypes.xml
      db.mv.db
    workspaces/
      config/
        db.mv.db
        workspace.xml
Item | Notes | ||
---|---|---|---|
Namespace definitions |
Found in the Copy your custom namespace registry and index to the target environment to synchronize them. |
||
Node type definitions |
Custom node type definitions are stored in the
|
||
Workspace configuration |
Your workspace configuration stores the detailed workspace configuration in the
|
||
Index and lock |
You can remove all files and folders under the Why is this important?
This ensures repository consistency and cleans up all unsynchronized indexes. For content synchronization, don’t copy this folder between instances; clean it up in the target instance instead. |
||
The actual content |
The actual content is typically stored in your configured database tables with the name prefix according to "schemaObjectPrefix" name where For example
|