Cache core
Operations Bundled: Community edition
Edition | CE, Cloud |
---|---|
License |
|
Issues |
|
Maven site |
|
Latest |
6.0.2 |
Magnolia employs a web cache to store server responses so that future requests for the same content can be served faster. The chief benefit of using cache is a reduction in information load on the network, meaning less bandwidth used, reduced processing and improved responsiveness.
Installing with Maven
Maven is the easiest way to install the module. Add the following to your bundle:
<dependency>
<groupId>info.magnolia.cache</groupId>
<artifactId>magnolia-cache-core</artifactId>
<version>6.0.2</version> (1)
</dependency>
1 | Should you need to specify the module version, do it using <version> . |
Disabling
Although the Cache module is bundled as a separate module, it is
essential to Magnolia and many other modules depend on it. Don’t
uninstall the Cache module. If necessary, disable caching by adding an
|
Node name | Value |
---|---|
📁 server |
|
📁 filters |
|
📁 cache |
|
⬩ defaultContentCachingConfigurationName |
defaultPageCache |
⬩ class |
info.magnolia.module.cache.filter.CacheFilter |
⬩ enabled |
false |
How caching works
Caching is performed by the Cache filter, which is part of the standard Magnolia filter chain. When a request arrives to the Cache filter, the filter passes it to the browser cache policy.
-
Content not modified: If a client has the latest version of content (i.e. not modified), the browser cache policy instructs the filter to respond with
304 Not Modified
. -
Modified content: If content has been modified or does not exist in the cache, the filter passes the request to the server cache policy. Server cache policy then analyzes the request and replies with the expected behavior. The filter then invokes the appropriate executor. This mechanism allows you to add, remove and use executors by changing the current cache policy.
-
Content not available: If the content is not available, the filter passes the request on to Magnolia. On the return trip, the filter reads the content from the response and stores it in the cache store for future use. Flush policy is completely independent of this chain and reacts on content changes rather than content served.
Configuration
Cache configuration is defined in /modules/cache/config/contentCaching/
. Within each configuration you can define:
-
What to cache.
-
When to flush the cache.
-
What header data to pass to browsers.
-
Specific implementations of tasks.
Node name | Value |
---|---|
📁 modules |
|
📁 cache |
|
📁 config |
|
📁 contentCaching |
|
⸬ defaultPageCache |
|
⸬ uuid-key-mapping |
|
⸬ compression |
|
⸬ cacheFactory |
|
⬩ cacheResponseThreshold |
1000 |
The default value for cacheResponseThreshold is 500 . For more information and configuration details, see the Cache threshold subsection below.
|
To select one of the cache configurations, set the defaultContentCachingConfigurationName
parameter in the Cache filter. The chosen configuration is read into a JavaBean using the Node2Bean mechanism, making it dynamically available to your own module code.
Node name | Value |
---|---|
📁 server |
|
📁 filters |
|
📁 cache |
|
⬩ defaultContentCachingConfigurationName |
defaultPageCache |
⬩ class |
info.magnolia.module.cache.filter.CacheFilter |
Policy configuration
Caching behavior for each configuration is defined with policies.
Server cache policy
This policy defines whether the requested content should be cached or not. The decision to cache relies on voters, which are used whenever configuration values aren’t assigned at startup but depend on rules.
Voters evaluate a rule such as should content residing at this URL be cached and return a positive or negative response. By default, all content on public instances is cached except the AdminCentral UI at /.magnolia
. Server cache policy is configured in /modules/cache/config/contentCaching/defaultPageCache/cachePolicy
.
The default implementation (info.magnolia.module.cache.cachepolicy.Default) checks if the content exists in the cache store and requests caching if the content is not found.
The alternative class (info.magnolia.module.cache.cachepolicy.Never) instructs the cache not to store the generated content. Server-side re-caching of no-cache requests (shift reload) is configurable and set to false
by default.
Node name | Value |
---|---|
⸬ defaultPageCache |
|
⸬ cachePolicy |
|
⸬ shouldBypassVoters |
|
⸬ urls |
|
⸬ deny |
|
⸬ parameters |
|
⸬ whitelist |
|
⬩ 0 |
^utm_.*$ |
⸬ resources |
|
⬩ class |
info.magnolia.module.cache.cachepolicy.Default |
⬩ refreshOnNoCacheRequests |
false |
Whitelisting specific parameters
Since Magnolia 6.2.30, under ../shouldBypassVoters/deny/parameters/whitelist
, you can configure a key-value(regex) map of request parameters that should be ignored and hence their requests cached as a result.
As shown in the example above, the map configured there causes caching of all requests containing the utm_
* tracking tag(s), which could otherwise generate a high server load, for example when sending marketing mails or other material utilizing the utm_
* tracking tag(s).
Cache key generator
You can define a custom cache key generator or configure the default one to meet your needs.
📁 cache |
|
📁 config |
|
📁 contentCaching |
|
⸬ defaultPageCache |
|
⸬ cachePolicy |
|
⸬ shouldBypassVoters |
|
⸬ ttlVoters |
|
⸬ cacheKeyGenerator |
|
⬩ useRequestParameters |
false |
⬩ class |
info.magnolia.module.cache.cachekey.DefaultCacheKeyGenerator |
⬩ class |
info.magnolia.module.cache.cachepolicy.Default |
Attribute | Default | Description |
---|---|---|
|
|
Use URI as part of cache key. |
|
|
Use request query as part of cache key. |
|
|
Use request method (GET/POST/HEAD) as part of cache key. |
|
|
Use server name as part of cache key. |
|
|
Save information whether this request was made using a secure channel, such as HTTPS as part of cache key. |
|
|
Use user name as part of cache key. |
|
|
Use locale |
|
|
Use channel |
Client (browser) cache policy
Allows for different policies for different content types. Voters are used to define the caching rules.
These policies define how long the browser may cache each content type. The time is passed to the browser in the response header.
-
info.magnolia.module.cache.browsercachepolicy.FixedDuration instructs the browser to cache the content for the specified length of time in minutes.
-
info.magnolia.module.cache.browsercachepolicy.Never instructs the browser to do nothing.
Client cache policy is configured in /modules/cache/config/contentCaching/defaultPageCache/browserCachePolicy .
|
Cache-Control directives
Using the directives
property, you can also specify standard Cache-Control
directives.
The directives property is only applied to info.magnolia.module.cache.browsercachepolicy.FixedDuration.
|
Example configuration
Node name | Value |
---|---|
⸬ defaultPageCache |
|
⸬ browserCachePolicy |
|
⸬ policies |
|
⸬ disableBrowserCacheInDevelopMode |
|
⸬ voters |
|
⸬ developMode |
|
⬩ class |
info.magnolia.voting.voters.PropertyVoter |
⬩ property |
magnolia.develop |
⬩ value |
true |
⬩ class |
info.magnolia.module.cache.browsercachepolicy.FixedDuration |
⸬ farFuture |
|
⬩ class |
info.magnolia.module.cache.browsercachepolicy.FixedDuration |
⬩ expirationMinutes |
525600 |
⸬ voters |
|
⸬ resources |
|
⬩ class |
info.magnolia.module.cache.browsercachepolicy.FixedDuration |
⬩ expirationMinutes |
60 |
⸬ voters |
|
⬩ op |
OR |
⸬ dotResources |
|
⬩ class |
info.magnolia.voting.voters.URIStartsWithVoter |
⬩ pattern |
/.resources/ |
⸬ contentType |
|
⬩ class |
info.magnolia.voting.voters.RequestExtensionVoter |
⬩ emptyExtensionVote |
false |
⸬ allowed |
|
⬩ css |
css |
⬩ js |
js |
⸬ dontCachePages |
|
⬩ class |
info.magnolia.module.cache.browsercachepolicy.Never |
⸬ voters |
|
⸬ default |
|
⬩ class |
info.magnolia.module.cache.browsercachepolicy.FixedDuration |
⬩ directives |
public, max-age=604800, immutable |
⬩ expirationMinutes |
10 |
⬩ class |
info.magnolia.module.cache.browsercachepolicy.BrowserCachePolicySet |
Since Magnolia 6.2.34, the |
Flush policy
The flush policy defines when to flush the cache. There are three policies, two of which are installed by default.
Flush upon changes from content publication
The default policy is info.magnolia.module.cache.flushpolicy.FlushAllFromPublishingEventPolicy
, and you can set it via /modules/cache/config/contentCaching/defaultPageCache/flushPolicy/policies/flushAll@class
.
This configuration flushes the cache whenever a publishing event is completely finished.
The cache can only be flushed completely.
Each module can register its own flush policy (or multiple policies) and receive events in each workspace.
Flush policies are informed about notifications from the publication module.
By default, all workspaces are allowed to flush unless they’re listed under the excludedWorkspaces
subnode.
Alternatively, you can list the allowed workspaces per policy under the workspaces
subnode of each policy.
Flush upon changes in workspaces
This configuration observes changes (content publication, import, edit) in a workspace and flushes the cache if new or modified content is detected.
The cache can be flushed completely, partially or not at all.
Each module can register its own flush policy (or multiple policies) and receive notifications about new or modified content in each workspace.
Flush policies are informed about changes in observed workspaces.
By default, all workspaces trigger a cache flush unless they’re listed under the excludedWorkspaces
subnode.
Performance aspects To improve performance, list the observed workspaces per policy under the If your custom workspace doesn’t affect the content of pages, you should register it under |
To enable this configuration, set /modules/cache/config/contentCaching/defaultPageCache/flushPolicy/policies/flushAll@class
to info.magnolia.module.cache.FlushAllListeningPolicy
.
Flush upon changes in light modules
Magnolia
|
Executors
These are actions taken once a caching decision is made. There are three possible actions:
-
useCache
: Retrieves the cached item from the cache and streams it to the client. -
store
: Stores the response in the cache for future use. -
bypass
: Skips caching. This is useful for content that can’t or shouldn’t be cached.
Executors can be configured at /modules/cache/config/contentCaching/defaultPageCache/executors .
Each of the executors is also responsible for configuring expiration headers.
|
Node name | Value |
---|---|
⸬ defaultPageCache |
|
⸬ executors |
|
⸬ bypass |
|
⸬ store |
|
⸬ useCache |
Cacheable response codes
By default, all cacheable response codes are cached. For a full list, see RFC-7231: Overview of Status Codes and RFC 7233: 206 Partial Content.
Using the cacheableResponseCodes
property, you can define exactly which codes you want cached.
...
defaultPageCache:
executors:
store:
cacheContent:
cacheableResponseCodes: 200,300,400 (1)
...
1 | Where 200 , 300 , and 400 are cached only. |
Response headers in redirect requests
Setting cacheHeadersInRedirects
to true
allows headers to be included in the response of the redirect request (302
).
...
defaultPageCache:
executors:
store:
cacheContent:
cacheHeadersInRedirects: true (1)
...
1 | By default, the property is set to false to keep the behavior consistent with the previous versions. |
Whitelisting headers to be included in a cache key
Under allowedRequestHeaders
, you can list headers that can be included in a cache key.
...
defaultPageCache:
cachePolicy:
cacheKeyGenerator:
allowedRequestHeaders:
x-my-header: x-my-header (1)
...
1 | The x-my-header is allowed in this example, but the list is empty by default. |
Compression
Compression is a simple and effective way to save bandwidth and speed up your site. It’s a common practice used by Google and Yahoo! for example. (How to Optimize Your Site with GZIP Compression is a great general introduction to the topic).
Compression is performed in the gzip filter, configured in /server/filters/gzip
}. When a client requests a resource such as index.html
, Magnolia delivers it zipped. A typical HTML page is compressed to 20% of its original size. So if your page is 100 kB uncompressed, it’s 20 kB compressed. To improve performance further, zipped content is streamed from the repository to the client rather than read into memory first.
Configuring
You can configure which content types to compress. By default, the gzip filter bypasses compression for HTML, JavaScript and CSS because they’re explicitly selected for compression in the Cache module configuration. These types can be compressed efficiently because they’re text. The decision to compress a particular content type is made with voters. Voters are used whenever configuration values aren’t assigned at startup but depend on rules instead. In the Cache module configuration there are three voting rules based on content type:
-
text/html
: HTML. -
application/x-javascript
: JavaScript. -
text/css
: Cascading Style Sheets.
Node name | Value |
---|---|
📁 cache |
|
📁 config |
|
⸬ compression |
|
⸬ voters |
|
⸬ contentType |
|
⸬ allowed |
|
⬩ 1 |
text/html |
⬩ 2 |
application/x-javascript |
⬩ 3 |
text/css |
⬩ class |
info.magnolia.voting.voters.RequestExtensionVoter |
To add more content types, such as XML, create a numbered property under allowed
. Use the Internet media type (MIME type) as value. Here are some common media types:
-
application/xhtml-xml
: XHTML. -
text/csv
: Comma-separated value. -
text/plain
: Textual data. -
text/xml
: Extensible Markup Language. -
application/pdf
: Portable Document Format.
As a rule, compressing the HTML, JavaScript and CSS is sufficient; it’s not necessary to compress binary content such as images. During the process, the browser sends a header telling the server that it accepts compressed content: Accept-Encoding: gzip . Note that Magnolia doesn’t cache big binaries.
|
Testing compression
To test your compression configuration, use a tool such as Web-Sniffer that allows you to change the
Accept
, Encoding
and User-Agent
sent headers easily.
Here’s what the headers look like when the Magnolia demo site home page is submitted^ to the sniffer.
GET /demo-project.html HTTP/1.1 Host: demopublic.magnolia-cms.com
Connection: close User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1;
de; rv:1.9) Gecko/2008052906 Firefox/3.0 Accept-Encoding: gzip
Accept-Charset: ISO-8859-1,UTF-8;q=0.7,*;q=0.7 Cache-Control: no
Accept-Language: de,en;q=0.7,en-us;q=0.3 Referer: http://web-sniffer.net
Status: HTTP/1.1 200 OK Date: Fri, 23 Jul 2010 07:45:10 GMT Server:
Apache/2.2.9 X-Magnolia-Registration: Registered Cache-Control:
max-age=900 Last-Modified: Thu, 01 Jul 2010 14:03:12 GMT
Content-Encoding: gzip Vary: Accept-Encoding Content-Length: 3852
Connection: close Content-Type: text/html;charset=UTF-8
Cache threshold
Cache threshold is used to determine if a response should be cached or not according to its size. Responses greater than the size configured aren’t cached.
Configuring
Since Magnolia 6.2.31, you can configure the threshold value at /modules/cache/config@cacheResponseThreshold
. The value must be an integer and represents kilobytes.
The following is an example configuring the threshold to one megabyte.
Node name | Value |
---|---|
📁 modules |
|
📁 cache |
|
📁 config |
|
⸬ compression |
|
⸬ cacheFactory |
|
📁 contentCaching |
|
⬩ cacheResponseThreshold |
1000 |
Raising the threshold probably increases the amount of cached content, which uses additional system resources. To control usage, you can configure the cache to use a fixed amount of memory. See the For example, you can set |
You can set this value programatically, for example, in your custom renderer which does time-consuming operations: MgnlContext.getWebContext().getRequest().setAttribute(CacheResponseWrapper.ATTRIBUTE_IN_MEMORY_THRESHOLD, CacheResponseWrapper.DEFAULT_THRESHOLD * 2); You need to set this attribute before anything is written to the output. |
Advanced strategies
In DX Core, advanced caching strategies are available in separate Advanced Cache modules.
Commands
Cache related commands are in the cache
catalog:
-
flushAll
: Completely flushes all caches. (Note that the imaging workspace - which technically is not a cache - is not flushed with this command.) -
flushByUUID
: Completely flushes all entries related to a given UUID from all available caches. This command expectsrepository
anduuid
as parameters. -
flushNamedCache
: Completely flushes a cache by name. Default cache names aredefault
anduuid-key-mapping
.
Cached URLs
By default, the following URLs are cached:
-
On public instance everything except
/.magnolia/*
which is AdminCentral. -
On author instance all static resources
/.resources/*
ifmagnolia.develop
property is set tofalse
.
Cache strategies
The system caches resources such as JavaScript files and CSS files on
the author instance by default to make authoring more responsive.
Disable this behavior when developing. Set the magnolia.develop
property to true
in the default magnolia.properties
file. For more
complex configurations, you need to adjust the configuration under the
/modules/cache/config/contentCaching/defaultPageCache/cachePolicy
node
Excluding content from cache
There are various reasons why you may wish to exclude content from cache. For example, you may have components that query an external data source dynamically. The rendered HTML changes even if the content of the Magnolia page has not changed. When we say cached content we mean the rendered output generated by Magnolia itself, the actual content of the page. When you exclude a page from cache you tell Magnolia that it should re-render that content every time the page is requested by a user.
Configuring an exclusion
The first option for excluding content from cache is to configure an
exclusion in the cache policy. The example
below excludes all pages whose URL starts with /.magnolia
. This means
that AdminCentral pages are not cached.
Node name | Value |
---|---|
📁 cache |
|
📁 config |
|
📁 contentCaching |
|
⸬ cachePolicy |
|
⸬ shouldBypassVoters |
|
⸬ urls |
|
⸬ includes |
|
⸬ excludes |
|
⸬ dotMagnolia |
|
⬩ class |
info.magnolia.voting.voters.URIStartsWithVoter |
⬩ pattern |
/.magnolia |
⬩ not |
true |
⬩ level |
1 |
Implementing a custom cache policy
To create a custom cache policy, implement the info.magnolia.module.cache.CachePolicy interface and override the methods you wish to customize. The default policy:
determines if a requested page should be cached, retrieved from the cache or not cached at all. It is called for every request and takes care of any expiration policy, that is if the page should be re-cached. The CacheFilter (or any other client component) can determine its behavior based on the return CachePolicyResult, which holds both the behavior to take and the cache key to use when appropriate.
Configure your custom cache policy class in
/modules/cache/config/configuration/default/cachepolicy
.
Cache header negotiation
Cache header negotiation is a mechanism that allows templates and components to influence whether the content should be cached and for how long. This mechanism can be used when it is too late to configure an exclude, but you do not wish for a page to be cached. Excludes are typically configured before it is clear what kinds of pages editors will add and what kind of content those pages will have. Cache header negotiation allows page components to influence whether the page should be cached. You can use cache header negotiation for:
-
Live, dynamic data: A component that displays dynamic data can indicate that the page the component is, or should be, cached only for a short duration: 5 minutes, 1 minute, etc.
-
Personalized content. Components that display personalized content can indicate that the page should not be cached at all.
-
Error resolution: If a component fails to read data from an external source and outputs a message to say there is a problem, you may not want to cache the error message, at least not for long. Instead, it is better if the component makes another attempt at getting the data when the next visitor requests the page. When you configure caching in many components, remember that the strictest criteria wins. If a page has a component that indicates that content should not be cached, the page will not be cached regardless of what components say. For example, if the live data and personalization components mentioned above are added on the same page, the page won’t be cached at all. A template can influence the decision in the same way. A template might want to cache the page for 10 minutes because the page displays real-time weather updates. If a component on the page wants to be cached for a maximum of 15 minutes, the template’s instructions win because they are stricter. The page is cached for 10 minutes.
Other options
The following options are not best practices but they may help you during testing. Don’t use them as a long-term production strategy.
-
Dummy URL parameter: The simplest way to exclude content is to link to the relevant page with a dummy query parameter in the URL such as
http://www.example.com?a=1
. A more subtle solution is to addbypass
to the cache filter. This ensures that no cache filter is executed on particular URLs. -
Deny URLs in cache policy: To exclude a URL from caching, add the URL to the
deny
list of the cachePolicy. Entries on thedeny
list are not cached by Magnolia but are taken through the entire filter chain, meaning that other policies such asBrowserCachePolicy
can still be applied. In effect, this solution switches off caching for the URL in question. -
Regular cache flush: Flush a page from the cache at regular intervals. This involves reconfiguring the underlying cache engine.
Setting cache headers
Cache header negotiation uses standard cache response headers. Cache needs to be enabled for the site or cache headers have no effect on the server side cache. The mechanism is built into the Cache module and requires no extra modules. Cache header negotiation is being introduced to Magnolia in two phases:
In code
In code, developers set the cache headers in the rendering model of a component or a template. This is the current implementation.
Example 1: Setting a cache header in Java code, snippet from info.magnolia.module.form.templates.components.AbstractFormModel.
MgnlContext.getWebContext().getResponse().setHeader("Cache-Control", "no-cache");
Example 2: Setting a cache header in a FreeMarker template script.
${ctx.response.setHeader("Cache-Control", "no-cache")}
In page properties dialog
Since the release of Magnolia 6.2.2, the basic
page
template in the Magnolia Templating
Kit (MTK) allows you to control page caching through the No Cache
checkbox in the Page properties dialog (the Meta Data tab):
The checkbox is defined using the noCache
field in the
basic.yaml
dialog definition file. If checked, the
htmlHeader.ftl
area template script sets the header.
Security aspects
Web Cache Poisoning
When creating page templates, developers often need to use various types of input such as
-
content,
-
language,
-
request parameters,
-
user information (for authentication).
Magnolia takes care of all of these attributes and includes them in cache keys to ensure that unique cache entries are generated for all attribute variants occurring.
Sometimes, however, it may be unavoidable to use more exotic or less obvious variables such as specific header values, date/time values or other types of input. When using such parameters, developers must consider their impact on the validity of cache to prevent Web Cache Poisoning attacks.
Example
Imagine using an HTTP referer header to change a teaser image link on a page. Once the image is rendered, it is - by default - cached and used for all the other incoming requests,
-
as long as the cache entry is valid,
-
and regardless of referer headers of those requests.
Depending on the content rendered it might or might not be an issue, but a case of Web Cache Poisoning occurs if somebody passes malicious content through a referer header that is trusted by and rendered for all the other incoming requests.
Prevention
Always consider the character and the origin of the input entering the process of rendering a response. For any original request input,
-
either the input must be included in the cache key, to allow Magnolia to generate an independent cache entry from it,
-
or the component rendering such content needs to be marked as dynamic and excluded from the cache (see Dynamic page caching for more details),
-
or else the whole page needs to be excluded from the cache.