LinkAuditor

The info.magnolia.services.seo.audit.impl.LinkAuditor finds links in a rendered HTML page and checks whether they are accessible. URLs contained in HTML anchor, link, and img elements are extracted and checked. Other URLs, such as those contained in JavaScript functions, are not detected and therefore not checked.

Checking a large number of links can be time-consuming. Consider using the excludedLinks property to ignore some links, or run the auditor only when necessary.

Class

info.magnolia.services.seo.audit.impl.LinkAuditor

Properties

In addition to the common auditor properties, this auditor can be configured with the following properties:

Property Description

level

required

Determines how a failed audit will be counted:

  • Error (auditErrors)

  • Warning (auditWarnings)

  • Note (auditNotes)

auditProperty

required

Defines the property name for storing failed audit results.

The property name should be unique among auditors; otherwise, auditors may overwrite each other's results.

fetcher

required

Added in v5.6.2.

Defines the content fetcher for the selected node. The fetched content is then scanned for links.

Two types of content fetchers are available. For details, see Content fetchers.

passedProperty

optional

Defines the property name for storing valid links.

The property name should be unique among auditors; otherwise, auditors may overwrite each other's results.

rootUrl

required

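Defines the base URL to be used when checking relative links. Relative links are appended to the base URL and then checked, so the base URL should not end with a slash. For example, with a base URL of http://localhost:8080, a relative link such as /about.html would be checked as http://localhost:8080/about.html.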

targets

optional

Added in v5.6.2.

Defines a list of credentials (host, port, user name, and password) to use when testing links. The credentials are added to each request and are available for preemptive basic authentication.

Accessing pages on the standard Magnolia author instance requires authentication. You should configure credentials for your local Magnolia author instance and for any other sites that require credentials.

excludedLinks

optional

Defines one or more URL patterns to ignore, written as Java regular expressions. If no regular expressions are defined, all links are checked.

validStatuses

optional

Defines the HTTP status codes for which a link is considered valid. If not set, the only valid status code is 200 (SC_OK).

pauseTime

optional

Defines a delay (in milliseconds) between checking links. You can set this property to a non-zero value to avoid flooding a server with HTTP requests. If not set, then the pause time will be 0 (no delay between requests).

validateUrl

optional

Added in v5.6.2.

Validate that each link is syntactically correct before attempting to test it. If the link is not correct, it will not be retrieved.

The default value of validateUrl is true.

allow2Slashes

optional

Added in v5.6.2.

Allow two slashes ("//") in URLs when validating URLs.

The default value of allow2Slashes is true.

allowAllSchemes

optional

Added in v5.6.2.

Allow any scheme in a URL when validating URLs. If set to false, only the schemes listed in schemes are accepted; if schemes is not configured, only http and https are accepted.

The default value of allowAllSchemes is true.

allowLocalUrls

optional

Added in v5.6.2.

Allow relative URLs when validating URLs.

The default value of allowLocalUrls is true.

allowFragments

optional

Added in v5.6.2.

Allow fragments in URLs when validating URLs.

The default value of allowFragments is true.

schemes

optional

Added in v5.6.2.

Specifies the allowed schemes when validating URLs. This property is used only if allowAllSchemes is set to false. If it is not configured, only http and https are accepted.
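
As a rough illustration of how the optional properties might be combined, the sketch below defines an auditor that skips some links, accepts an additional status code, throttles requests, and restricts URL validation to http and https. The auditor name, the auditProperty value, the regular expressions, the status codes, and the pause time are all made up for illustration. The YAML list notation used for excludedLinks, validStatuses, and schemes is an assumption; check how list properties are stored in your own configuration before reusing it.

```yaml
# Hypothetical auditor definition; names and values are illustrative only.
externalLinks:
  active: true
  class: info.magnolia.services.seo.audit.impl.LinkAuditor
  level: auditWarnings
  auditProperty: brokenExternalLinks   # hypothetical property name, unique among auditors
  rootUrl: http://localhost:8080       # no trailing slash
  fetcher:
    class: info.magnolia.services.seo.audit.impl.RequestFetcher
  # Java regular expressions; links matching any pattern are ignored.
  # The list form is an assumption.
  excludedLinks:
    - '^mailto:.*'
    - '^https?://example\.com/.*'
  # Status codes treated as valid (default is 200 only).
  validStatuses:
    - 200
    - 301
  pauseTime: 250                       # milliseconds between link checks
  validateUrl: true
  allowAllSchemes: false               # fall back to the schemes list below
  schemes:
    - http
    - https
```

The patterns end in .* so that they behave the same whether the auditor tests for a full match or a partial match; the source does not say which is used.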

Configuring credentials

The targets node lets you define one or more sets of credentials that are added to each request used to check a link. You will probably need to define credentials for links to your Magnolia instance if it is an author instance, because its pages and resources are protected with basic authentication.

Credentials are added per host, so you can define more than one set of credentials.

Here is how to configure target credentials within your LinkAuditor configuration:

Property Description

targets

     <target credentials name>

required

The name of the credentials.

             class

required

The credentials class name. This should be info.magnolia.services.seo.audit.impl.HostTarget.

             host

required

The host or domain name for the credentials.

             port

required

The port.

             scheme

optional

The scheme (e.g. http or https).

The default value for scheme is http.

             user

required

The user name.

             password

required

The user’s password.

             preemptive

optional

If true, the credentials are added to requests using preemptive basic authentication.

The default value for preemptive is true.
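
Because credentials are added per host, a configuration that checks links on more than one protected site could look like the sketch below. The credentials names (author, staging), the staging host, and its user name and password are hypothetical placeholders; only the property names and the HostTarget class come from the table above.

```yaml
# Sketch of a targets node with credentials for two hosts.
targets:
  author:                              # hypothetical credentials name
    class: info.magnolia.services.seo.audit.impl.HostTarget
    host: localhost
    port: 8080
    scheme: http                       # default, shown for clarity
    user: superuser
    password: superuser
    preemptive: true                   # default, shown for clarity
  staging:                             # hypothetical credentials name
    class: info.magnolia.services.seo.audit.impl.HostTarget
    host: staging.example.com          # placeholder host
    port: 443
    scheme: https
    user: linkchecker                  # placeholder user
    password: changeme                 # placeholder password
    preemptive: false
```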

Example

Here is an example from the SEO module. You can find this configuration here: /modules/seo/config/auditManager/auditors/deadLinks.

deadLinks:
  active: true
  class: info.magnolia.services.seo.audit.impl.LinkAuditor
  description: Check for dead links (pre-prod, prod)
  level: auditWarnings
  rootUrl: http://localhost:8080
  targets:
    localhost:
      class: info.magnolia.services.seo.audit.impl.HostTarget
      host: localhost
      password: superuser
      port: 8080
      scheme: http
      user: superuser
  fetcher:
    class: info.magnolia.services.seo.audit.impl.RequestFetcher
    targets:
      localhost:
        class: info.magnolia.services.seo.audit.impl.HostTarget
        host: localhost
        password: superuser
        port: 8080
        scheme: http
        user: superuser