Magnolia JCR failure with false status

Magnolia fails to connect to its database and throws an exception.

Symptoms

Exception

When Magnolia fails to connect to the database, an exception much like the following is thrown:

info.magnolia.repository.RepositoryNotInitializedException: org.apache.jackrabbit.core.data.DataStoreException: Can not init data store, driver=org.postgresql.Driver url=null user=null schemaObjectPrefix=ds_ tableSQL=datastore createTableSQL=CREATE TABLE ds_datastore(ID VARCHAR(255) PRIMARY KEY, LENGTH BIGINT, LAST_MODIFIED BIGINT, DATA BYTEA)
...
Caused by: org.apache.commons.dbcp.SQLNestedException: Cannot create PoolableConnectionFactory (The connection attempt failed.)
...
Caused by: org.postgresql.util.PSQLException: The connection attempt failed.
...
Caused by: java.net.SocketTimeoutException: connect timed out

When Tomcat shuts down Magnolia, you see:

SEVERE org.apache.catalina.core.StandardContext startInternal One or more listeners failed to start. Full details will be found in the appropriate container log file

Tomcat returns 404

Requests sent to Tomcat receive a 404 since Magnolia is no longer running.

Requests sent to the Magnolia REST service may still succeed if they do not use JCR. The REST status endpoint, /.rest/status, may return a normal HTTP 200 response indicating that the pod is okay.
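You can confirm this behavior by querying the status endpoint directly. A minimal check, assuming the instance is reachable at a placeholder host magnolia-public on port 8080:

curl -s -o /dev/null -w '%{http_code}\n' http://magnolia-public:8080/.rest/status

A 200 response here only tells you the REST layer is up; it does not prove that the JCR repositories initialized.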

If the REST status endpoint is used as the health path (e.g., magnoliaPublic.livenessProbe.path = /.rest/status or magnoliaAuthor.livenessProbe.path = /.rest/status), the Magnolia pod passes its liveness and readiness probes even though Magnolia is not running.

The REST status endpoint is the default health path in the Helm chart.
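To check which health path a running pod is actually using, you can read its liveness probe definition. A sketch, assuming the pod is named magnolia-public-0 in a magnolia namespace (adjust both to your deployment):

kubectl -n magnolia get pod magnolia-public-0 \
  -o jsonpath='{.spec.containers[*].livenessProbe.httpGet.path}'

If this prints /.rest/status, the probe keeps reporting the pod as healthy even when the JCR data store fails to initialize.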

Observations

The primary issue is that the pod may appear to be running normally even though Magnolia has shut down because of the exception; since Magnolia is no longer running, no errors are reported in the container logs.

When does this happen?

The failure occurs randomly during Kubernetes upgrades, so both author and public instances can be affected.

Workaround

This is considered a bug and we are working on a fix. As of 2024-05-03, it has not been fixed, so use the workaround below.

  1. Delete the affected Magnolia instance. You can do this via Rancher or with kubectl (see the example commands after these steps).

    The Magnolia StatefulSet recreates the pod automatically.

  2. Check the Magnolia logs to ensure the problem is not recurring.

    You can do this in Magnolia using the Log Tools app.
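A sketch of step 1 with kubectl, assuming the affected pod is magnolia-public-0 in a magnolia namespace (replace both with your own values):

# Delete the failed pod; the StatefulSet schedules a replacement.
kubectl -n magnolia delete pod magnolia-public-0

# Watch the replacement pod start up.
kubectl -n magnolia get pods -w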

Live traffic

If the Magnolia instance is serving live traffic, the 404 response may be cached.

  • If you’re using Fastly, flush the CDN cache (see the example purge request after this list).

  • If you’re using a different CDN, you’ll need to flush the cache as per your CDN instructions.
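For Fastly, a full purge can be issued through its purge API. A sketch, assuming <service_id> and the FASTLY_API_TOKEN environment variable are placeholders for your own service ID and API token:

curl -X POST "https://api.fastly.com/service/<service_id>/purge_all" \
  -H "Fastly-Key: $FASTLY_API_TOKEN"

Note that a full purge invalidates all cached content for the service, not only the cached 404 responses.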
