In MastoAPI media descriptions are updated via the
media update API not upon post creation or post update.
This functionality was originally added about 6 years ago in
ba93396649 which was part of
https://git.pleroma.social/pleroma/pleroma/-/merge_requests/626 and
https://git.pleroma.social/pleroma/pleroma-fe/-/merge_requests/450.
They introduced image descriptions to the front- and backend,
but predate adoption of Mastodon API.
For a while adding an `descriptions` array on post creation might have
continued to work as an undocumented Pleroma extension to Masto API, but
at latest when OpenAPI specs were added for those endpoints four years
ago in 7803a85d2c, these codepaths ceased
to be used. The API specs don’t list a `descriptions` parameter and
any unknown parameters are stripped out.
The attachments_from_ids function is only called from
ScheduledActivity and ActivityDraft.create with the latter
only being called by CommonAPI.{post,update} whihc in turn
are only called from ScheduledActivity again, MastoAPI controller
and without any attachment or description parameter WelcomeMessage.
Therefore no codepath can contain a descriptions parameter.
Due to JSON-LD compaction the full address of public scope
may also occur in shorter forms and the spec requires us to treat them
all equivalently. To save us the pain of repeatedly checking for all
variants internally, normalise inbound data to just one form.
See note at: https://www.w3.org/TR/activitypub/#public-addressing
This needs to happen very early, even before the other addressing fixes
else an earlier validator will reject the object. This in turn required
to move the list-tpye normalisation earlier as well, but since I was
unsure about putting empty lists into the data when no such field
existed before, I excluded this case and thus the later fixing had to be
kept as well.
Fixes: https://akkoma.dev/AkkomaGang/akkoma/issues/670
literally nothing uses C2S AP, and it's another route into core
systems which requires analysis and maintenance. A second API
is just extra surface for potentially bad things so let's take
it out back and obliterate it
We were overzealous with matching on a raw error from the object fetch that should have never been relied on like this. If we can't fetch successfully we should assume that the collection is private.
Building a more expressive and universal error struct to match on may be something to consider.
"id" is used for the canonical link to the AS2 representation of an object.
"url" is typically used for the canonical link to the HTTP representation.
It is what we use, for example, when following the "external source" link
in the frontend. However, it's not the link we include in the post contents
for quote posts.
Using URL instead means we include a more user-friendly URL for Mastodon,
and a working (in the browser) URL for Threads
previously we would uncritically take data and format it into
tags for static-fe and the like - however, instances can be
configured to disallow unauthenticated access to these resources.
this means that OG tags as a vector for information leakage.
_technically_ this should only occur if you have both
restrict_unauthenticated *AND* you run static-fe, which makes no
sense since static-fe is for unauthenticated people in particular,
but hey ho.
Per the XRD specification:
> 2.4. Element <Alias>
>
> The <Alias> element contains a URI value that is an additional
> identifier for the resource described by the XRD. This value
> MUST be an absolute URI. The <Alias> element does not identify
> additional resources the XRD is describing, **but rather provides
> additional identifiers for the same resource.**
(http://docs.oasis-open.org/xri/xrd/v1.0/os/xrd-1.0-os.html#element.alias, emphasis mine)
In other words, the alias list is expected to link to things which are
not just semantically the same, but exactly the same. Old user accounts
don't do that
This change should not pose a compatibility issue: Mastodon does not
list old accounts here (See e1fcb02867/app/serializers/webfinger_serializer.rb (L12))
The use of as:alsoKnownAs is also not quite semantically right here
(see https://www.w3.org/TR/did-core/#dfn-alsoknownas, which defines
it to be used to refer to identifiers which are interchangable) but
that's what DID get for reusing a property definition that Mastodon
already squatted long before they got to it
Since we always followed redirects (and until recently allowed fuzzy id
matches), the ap_id of the received object might differ from the iniital
fetch url. This lead to us mistakenly trying to insert a new user with
the same nickname, ap_id, etc as an existing user (which will fail due
to uniqueness constraints) instead of updating the existing one.
In order to properly process incoming notes we need
to be able to map the key id back to an actor.
Also, check collections actually belong to the same server.
Key ids of Hubzilla and Bridgy samples were updated to what
modern versions of those output. If anything still uses the
old format, we would not be able to verify their posts anyway.
To save on bandwith and avoid OOMs with large files.
Ofc, this relies on the remote server
(a) sending a content-length header and
(b) being honest about the size.
Common fedi servers seem to provide the header and (b) at least raises
the required privilege of an malicious actor to a server infrastructure
admin of an explicitly allowed host.
A more complete defense which still works when faced with
a malicious server requires changes in upstream Finch;
see https://github.com/sneako/finch/issues/224
Certain attacks rely on predictable paths for their payloads.
If we weren’t so overly lax in our (id, URL) check, the current
counterfeit activity exploit would be one of those.
It seems plausible for future attacks to hinge on
or being made easier by predictable paths too.
In general, letting remote actors place arbitrary data at
a path within our domain of their choosing (sans prefix)
just doesn’t seem like a good idea.
Using fully random filenames would have worked as well, but this
is less friendly for admins checking emoji dirs.
The generated suffix should still be more than enough;
an attacker needs on average 140 trillion attempts to
correctly guess the final path.
This will decouple filenames from shortcodes and
allow more image formats to work instead of only
those included in the auto-load glob. (Albeit we
still saved other formats to disk, wasting space)
Furthermore, this will allow us to make
final URL paths infeasible to predict.
Since 3 commits ago we restrict shortcodes to a subset of
the POSIX Portable Filename Character Set, therefore
this can never have a directory component.
E.g. *key’s emoji URLs typically don’t have file extensions, but
until now we just slapped ".png" at its end hoping for the best.
Furthermore, this gives us a chance to actually reject non-images,
which before was not feasible exatly due to those extension-less URLs
As suggested in b387f4a1c1, only steal
emoji with alphanumerc, dash, or underscore characters.
Also consolidate all validation logic into a single function.
===
Taken from akkoma#703 with cosmetic tweaks
This matches our existing validation logic from Pleroma.Emoji,
and apart from excluding the dot also POSIX’s Portable Filename
Character Set making it always safe for use in filenames.
Mastodon is even stricter also disallowing U+002D HYPEN-MINUS
and requiring at least two characters.
Given both we and Mastodon reject shortcodes excluded
by this anyway, this doesn’t seem like a loss.
Even more than with user uploads, a same-domain proxy setup bears
significant security risks due to serving untrusted content under
the main domain space.
A risky setup like that should never be the default.
Else malicious emoji packs or our EmojiStealer MRF can
put payloads into the same domain as the instance itself.
Sanitising the content type should prevent proper clients
from acting on any potential payload.
Note, this does not affect the default emoji shipped with Akkoma
as they are handled by another plug. However, those are fully trusted
and thus not in needed of sanitisation.