ActivityPub is a standard for publishing structured social network data on the Web in JSON-LD format. This document describes various methods for discovering the ActivityPub object described by an HTML page, and conversely the HTML page for an ActivityPub object.

This is a draft of the Social Web Incubator Community Group (SocialCG) Discovery Task Force.

Introduction

...

Unless otherwise specified, the techniques described below can be used with any Activity Streams 2.0 types. The best-defined groups of AS2 types for HTML discovery are actor types:

and digital content types:

Motivating user stories

These are some of the user stories

Example URL formats

This document uses a consistent format for example URLs:

        https://{name}.example/{path}/{type}-{ordinal}{?ext}
      

Where:

Discovering equivalent ActivityPub objects from HTML

Content negotiation

Content negotiation is a catch-all term for ways of negotiating the representation of a resource through the HTTP protocol. In this document, it will specifically cover proactive negotiation using the Accept header.

Given the URL for an HTML document, such as https://mixed.example/some/path/to/note-1, a consumer could attempt to retrieve the corresponding ActivityPub JSON-LD object using this HTTP request:

        GET /some/path/to/note-1 HTTP/1.1
        Host: mixed.example
        Accept: application/activity+json, application/ld+json, application/json
      

A compliant server may respond with the ActivityPub JSON-LD object in the body of the response:

        HTTP/1.1 200 OK
        Content-Type: application/activity+json

        {
          "@context": "https://www.w3.org/ns/activitystreams",
          "id": "https://mixed.example/some/path/to/note-1",
          "type": "Note",
          "content": "This is a note."
        }
      

This is typically used when the ActivityPub server and the HTML server are implemented in the same software package. Because this has historically been the case for many implementations, some consumers expect this behavior to be the default.

Alternatively, the server may respond with a 308 Permanent Redirect to indicate the location of the JSON-LD representation:

        HTTP/1.1 308 Permanent Redirect
        Location: https://mixed.example/different/path/to/note-1.jsonld
      

Content negotiation failure

If the server cannot provide a representation in any of the requested media types, it may respond with a 406 Not Acceptable status code:

          HTTP/1.1 406 Not Acceptable
          Content-Type: text/plain

          No representation matching this request could be found.
        

Less compliant servers may ignore the Accept header altogether and return the HTML content regardless:

          HTTP/1.1 200 OK
          Content-Type: text/html

          <html>
          <head>
          <title>Note 1</title>
          </head>
          <body>
          <p>This is a note.</p>
        

A more difficult failure mode to detect arises when the server does not support ActivityPub, but does support content negotiation for another JSON format. Such a server returns a 200 OK status code with a JSON object that does not use JSON-LD, or a JSON-LD object that does not use the Activity Streams 2.0 vocabulary:

          HTTP/1.1 200 OK
          Content-Type: application/json

          {
            "property": "value",
            "otherProperty": "otherValue"
          }
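
The success and failure cases above can be told apart by inspecting the status code, the Content-Type header, and the @context of the returned JSON. The following Python sketch is one possible way to do this, not a normative algorithm; the function names, the category labels, and the choice to treat any redirect status like the 308 shown above are assumptions made for illustration.

```python
import json

AS2_CONTEXT = "https://www.w3.org/ns/activitystreams"
JSON_MEDIA_TYPES = {
    "application/activity+json",
    "application/ld+json",
    "application/json",
}


def has_as2_context(doc):
    """Return True if a parsed JSON document declares the AS2 context."""
    if not isinstance(doc, dict):
        return False
    context = doc.get("@context")
    # "@context" may be a single value or a list of values.
    entries = context if isinstance(context, list) else [context]
    return AS2_CONTEXT in entries


def classify_response(status, content_type, body):
    """Classify a content negotiation response (labels are hypothetical).

    Returns "activitypub", "redirect", "not_acceptable", "html",
    "other_json" or "unknown".
    """
    # Drop parameters such as "; charset=utf-8".
    media_type = content_type.split(";")[0].strip().lower()
    if status in (301, 302, 303, 307, 308):
        return "redirect"
    if status == 406:
        return "not_acceptable"
    if status != 200:
        return "unknown"
    if media_type == "text/html":
        return "html"
    if media_type in JSON_MEDIA_TYPES:
        try:
            doc = json.loads(body)
        except ValueError:
            return "unknown"
        # JSON without the AS2 context is the hard-to-detect failure mode.
        return "activitypub" if has_as2_context(doc) else "other_json"
    return "unknown"
```

A consumer would typically continue with another discovery technique for any result other than "activitypub" or "redirect".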
        

HTTP Link header

The HTTP Link header can be used to indicate an alternative representation of a resource. A consumer can use this header to discover the ActivityPub JSON-LD object for an HTML page.

Given the URL for an HTML document, such as https://html.example/user/test1/article-1, the consumer can use an HTTP HEAD request to retrieve the headers for the resource, which may include a Link header:

        HEAD /user/test1/article-1 HTTP/1.1
        Host: html.example
      

A compliant server will respond with the headers for the resource:

      HTTP/1.1 200 OK
      Link: ; rel="alternate"; type="application/activity+json"
      

The link header with the alternate relation type, and an ActivityPub-compatible media type, indicates that the ActivityPub JSON-LD object is available at the linked URL.

This can be a very efficient method of discovery, since the consumer does not need to download the entire HTML document and parse its contents.
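
As an illustration of how a consumer might read such a header, the Python sketch below extracts the target of the first link with the alternate relation and the application/activity+json type. It is a simplified parser under stated assumptions: the function name is hypothetical, only application/activity+json is accepted, and parameter values containing commas or semicolons (for example an application/ld+json type with a quoted profile parameter) are not handled.

```python
import re

AP_MEDIA_TYPE = "application/activity+json"

# One link-value: "<target>" followed by its ";name=value" parameters.
LINK_VALUE = re.compile(r'<([^>]*)>((?:\s*;\s*[^;,<]+)*)')
# One parameter, with either a quoted or an unquoted value.
PARAM = re.compile(r'\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^\s";,]+))')


def find_activitypub_link(link_header):
    """Return the target URL of the first matching link, or None."""
    for target, raw_params in LINK_VALUE.findall(link_header):
        params = {}
        for name, quoted, bare in PARAM.findall(raw_params):
            params[name.lower()] = quoted or bare
        # "rel" may hold several space-separated relation types.
        rels = params.get("rel", "").lower().split()
        media_type = params.get("type", "").strip().lower()
        if "alternate" in rels and media_type == AP_MEDIA_TYPE:
            return target
    return None
```

For instance, calling find_activitypub_link with the value of a Link header that lists a stylesheet link followed by an alternate link of type application/activity+json returns the URL of the second link.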

Servers may also include the Link header in the response to a GET request for the HTML page.

        GET /user/test1/article-1 HTTP/1.1
        Host: html.example
        

A compliant server will respond with the headers, including the Link header, followed by the body of the resource:

      HTTP/1.1 200 OK
      Link: ; rel="alternate"; type="application/activity+json"
      Content-Type: text/html

      <html>
      <head>
      ...
      

Link header failure

Some servers may return the full body of the HTML document in response to a HEAD request, without including a Link header.

        HTTP/1.1 200 OK
        Content-Type: text/html

        <html>
        <head>
        ...
        

HTML <link> element

The link element is a metadata element used in the <head> section of an HTML document. It provides links for the whole document, using a number of different link relations.

To indicate its equivalent ActivityPub object, the HTML page at https://html.example/watch/video-1.html could include the following link element:

        <!doctype html>
        <html>
          <head>
            <title>Video 1</title>
            <link
              rel="alternate"
              type="application/activity+json"
              href="https://ap.example/api/descriptors/video-1.jsonld" />
          </head>
          <body>
            <!-- rest of the page -->
          </body>
        </html>
      

Consumers need to parse the HTML to find the link element with the alternate relation and an ActivityPub-compatible media type as type. This can be slow and complicated.
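
A minimal Python sketch of this parsing step, using only the standard library, might look as follows. The class and function names are hypothetical, only application/activity+json is treated as an ActivityPub-compatible media type, and relative href values are resolved against the page URL as an assumption about what a consumer would want.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

AP_MEDIA_TYPE = "application/activity+json"


class AlternateLinkFinder(HTMLParser):
    """Remember the first <link> that points at an ActivityPub object."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.found = None

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <link ... />.
        if tag != "link" or self.found is not None:
            return
        attributes = {name: value or "" for name, value in attrs}
        rels = attributes.get("rel", "").lower().split()
        media_type = attributes.get("type", "").strip().lower()
        href = attributes.get("href", "")
        if "alternate" in rels and media_type == AP_MEDIA_TYPE and href:
            self.found = urljoin(self.base_url, href)


def find_link_element(html, page_url):
    """Return the linked ActivityPub URL for an HTML page, or None."""
    finder = AlternateLinkFinder(page_url)
    finder.feed(html)
    return finder.found
```

The sketch scans the whole document; a consumer concerned about cost could stop feeding the parser once the closing head tag has been seen.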

Link element failure

Some servers may include a link element with an alternate relation and with a JSON type or JSON-LD type that does not link to an ActivityPub resource.

          <!doctype html>
          <html>
            <head>
              <title>Video 1</title>
              <link
                rel="alternate"
                type="application/json"
                href="https://api.example/unrelated/videodescriptor.json" />
            </head>
            <body>
              <!-- rest of the page -->
            </body>
          </html>
        

HTML <a> element

The a element is an element used in the <body> section of an HTML document. It can be used to define relationships with other documents, with the benefit that the link is (usually) visible and clickable by a reader.

To indicate its equivalent ActivityPub object, the HTML page at https://html.example/profiles/person-1.html could include the following a element:

        <!doctype html>
        <html>
          <head>
            <title>Person 1</title>
          </head>
          <body>
            <a
              rel="alternate"
              type="application/activity+json"
              href="https://ap.example/users/person-1.jsonld" >
              Actor data for Person 1
            </a>
            <!-- rest of the page -->
          </body>
        </html>
      

Consumers will need to parse the HTML to find the a element with the alternate relation and an ActivityPub-compatible media type as type. This can be even slower and more complicated than with the link element. The link element is usually in the first few kilobytes of a document, and will usually be nested only 2 levels below the document in the DOM tree. An a element may be anywhere in the body, possibly nested very deep in the tree.

a element failure

As with the link element, some servers may include an a element with an alternate relation and with a JSON type or JSON-LD type that does not link to an ActivityPub resource.

In addition, many content management systems allow end users to set rel and other properties on a elements, which may result in false matches. Even more than with other methods, using the a element for discovery requires reverse discovery for confirmation (see Best practices for consumers).
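
Because any a element in the body may match, a consumer may prefer to collect every matching link as an unconfirmed candidate and verify each one afterwards. The Python sketch below shows one way to gather such candidates; the names are hypothetical, and the confirmation step itself is deliberately left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

AP_MEDIA_TYPE = "application/activity+json"


class AnchorCandidateCollector(HTMLParser):
    """Collect the targets of all matching <a> elements, in document order."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.candidates = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = {name: value or "" for name, value in attrs}
        rels = attributes.get("rel", "").lower().split()
        media_type = attributes.get("type", "").strip().lower()
        href = attributes.get("href", "")
        if "alternate" in rels and media_type == AP_MEDIA_TYPE and href:
            url = urljoin(self.base_url, href)
            if url not in self.candidates:
                self.candidates.append(url)


def candidate_anchor_links(html, page_url):
    """Return unconfirmed ActivityPub URLs found in <a> elements."""
    collector = AnchorCandidateCollector(page_url)
    collector.feed(html)
    return collector.candidates
```

Each returned URL is only a hint: user-authored markup can produce false matches, so the candidates still need confirmation before they are trusted.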

Webfinger

Webfinger is a standard for discovering metadata about a resource identified with a URL. Finding the ActivityPub URL for an actor identified with an acct: URL is well documented in the ActivityPub and Webfinger report. However, Webfinger can be used to find metadata about other resources, including HTML pages with https: URLs.

Given a URL for a document, like https://html.example/group-1.html, a GET request can be made to a URL in the /.well-known/ path of the domain for the URL, as follows:

        GET /.well-known/webfinger?resource=https%3A%2F%2Fhtml.example%2Fgroup-1.html HTTP/1.1
        Host: html.example
      

A compliant server will respond with the metadata for the resource:

        HTTP/1.1 200 OK
        Content-Type: application/jrd+json

        {
          "subject": "https://html.example/group-1.html",
          "links": [
            {
              "rel": "alternate",
              "type": "application/activity+json",
              "href": "https://ap.example/api/groups/group-1.jsonld"
            }
          ]
        }
      

The JRD JSON format includes a number of properties, as defined in the Webfinger RFC 7033. The relevant data structure in this example is the object in the links array with the rel property set to alternate and the type property set to application/activity+json, an ActivityPub-compatible media type. The href property of this link is the URL of the ActivityPub equivalent for the HTML page.
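
The two steps above, building the Webfinger request URL and reading the matching link from the JRD, can be sketched in Python as follows. The function names are hypothetical, the sketch assumes the Webfinger endpoint is served over https on the same host as the document, and only application/activity+json is accepted as the link type.

```python
from urllib.parse import urlencode, urlsplit

AP_MEDIA_TYPE = "application/activity+json"


def webfinger_url(resource_url):
    """Build the Webfinger query URL for a document URL."""
    host = urlsplit(resource_url).netloc
    # urlencode percent-encodes ":" and "/" in the resource parameter.
    query = urlencode({"resource": resource_url})
    return f"https://{host}/.well-known/webfinger?{query}"


def activitypub_href(jrd):
    """Return the href of the first matching link in a parsed JRD, or None."""
    for link in jrd.get("links", []):
        if link.get("rel") == "alternate" and link.get("type") == AP_MEDIA_TYPE:
            return link.get("href")
    return None
```

Here jrd is the response body already parsed with a JSON parser; fetching the URL and handling HTTP errors are left to the consumer.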

Webfinger failure

Not all Webfinger-aware servers return JRD documents for https URLs. Others might only return JRD documents for URLs that represent actors, such as registered users.

As with other link-relation-based discovery mechanisms, like the HTTP Link header or the <link> element, a JSON or JSON-LD media type in the link's type property might not indicate an ActivityPub URL, but some other JSON or JSON-LD object.

Embedded JSON-LD

HTML documents can include JSON-LD data in a <script> element in the <head> section of the document. This data can be used to provide metadata about the document, including its equivalent ActivityPub object.

Given a page that shows an image at https://html.example/gallery/image-17.html, the HTML for the page could look like this:

        <!DOCTYPE html>
        <html lang="en">
          <head>
            <title>Image 17</title>
            <script type="application/ld+json">
            {
              "@context": "https://www.w3.org/ns/activitystreams",
              "type": "Image",
              "id": "https://ap.example/api/images/image-17.jsonld",
              "url": [
                {
                  "type": "Link",
                  "mediaType": "text/html",
                  "href": "https://html.example/gallery/image-17.html"
                }
              ]
            }
            </script>
          </head>
          <body>
              <h1>Image 17</h1>
              <p><img src="https://html.example/images/image-17.png"></p>
          </body>
        </html>
      

This embedded JSON-LD specifies that an ActivityPub object with the ID https://ap.example/api/images/image-17.jsonld exists, and that it has an HTML page url at https://html.example/gallery/image-17.html, that is, the current page's URL. This is a roundabout, but clear, way to specify the ActivityPub ID of the current page.

Consumers need to parse the HTML page, and the embedded JSON-LD, to extract the ActivityPub object ID. An advantage of this technique is that other properties of the ActivityPub object can be embedded as well; however, to confirm those properties, the consumer will still need to fetch the object from its canonical URL, the ID.
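
As a rough illustration, the Python sketch below collects the contents of application/ld+json script elements, keeps only objects that declare the Activity Streams 2.0 context, and returns the id of the first one whose url property refers to the current page. All names are hypothetical, and the sketch treats JSON-LD as plain JSON: it does not expand contexts, and it only understands url values that are strings, Link objects with an href, or lists of these.

```python
import json
from html.parser import HTMLParser

AS2_CONTEXT = "https://www.w3.org/ns/activitystreams"


class JsonLdScriptCollector(HTMLParser):
    """Collect the text of every <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self.in_json_ld = False
        self.chunks = []
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").strip().lower()
        if tag == "script" and script_type == "application/ld+json":
            self.in_json_ld = True
            self.chunks = []

    def handle_data(self, data):
        if self.in_json_ld:
            self.chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self.in_json_ld:
            self.scripts.append("".join(self.chunks))
            self.in_json_ld = False


def url_hrefs(url_value):
    """Flatten a url property into a list of URL strings."""
    values = url_value if isinstance(url_value, list) else [url_value]
    hrefs = []
    for value in values:
        if isinstance(value, str):
            hrefs.append(value)
        elif isinstance(value, dict) and isinstance(value.get("href"), str):
            hrefs.append(value["href"])
    return hrefs


def embedded_activitypub_id(html, page_url):
    """Return the id of an embedded AS2 object describing page_url, or None."""
    collector = JsonLdScriptCollector()
    collector.feed(html)
    for text in collector.scripts:
        try:
            doc = json.loads(text)
        except ValueError:
            continue
        if not isinstance(doc, dict):
            continue
        context = doc.get("@context")
        contexts = context if isinstance(context, list) else [context]
        if AS2_CONTEXT not in contexts:
            continue  # for example, Schema.org metadata
        if page_url in url_hrefs(doc.get("url")) and isinstance(doc.get("id"), str):
            return doc["id"]
    return None
```

Comparing URLs as exact strings is a simplification; a consumer may want to normalize both URLs before comparing them.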

Embedded JSON-LD failure

Complicated structures for the url property may make it hard to confirm that the object's URL is the same as the current page's.

Embedded JSON-LD is very popular for embedding Schema.org metadata. This can lead to false positives when looking for ActivityPub objects.

Discovering ActivityPub authors from HTML

Discover equivalent object

HTTP Link header

HTML <link> element

Embedded JSON-LD

OpenGraph

Discovering HTML pages from ActivityPub objects

url property

Content negotiation

HTTP Link header

Webfinger

Best practices for consumers

Two-way confirmation

Order of discovery techniques

Discovering ActivityPub objects

Discovering authors

Discovering HTML pages

Handling multiple results

Best practices for publishers

Two-way confirmation

Discovery techniques to support

Multiple results
