akkoma

Author	SHA1	Message	Date
Mark Felder	5da9cbd8a5	RichMedia refactor
Rich Media parsing was previously handled on-demand with a 2 second HTTP request timeout and retained only in Cachex. Every time a Pleroma instance is restarted it will have to request and parse the data for each status with a URL detected. When fetching a batch of statuses they were processed in parallel to attempt to keep the maximum latency at 2 seconds, but often resulted in a timeline appearing to hang during loading due to a URL that could not be successfully reached. URLs which had images links that expire (Amazon AWS) were parsed and inserted with a TTL to ensure the image link would not break.
Rich Media data is now cached in the database and fetched asynchronously. Cachex is used as a read-through cache. When the data becomes available we stream an update to the clients. If the result is returned quickly the experience is almost seamless. Activities were already processed for their Rich Media data during ingestion to warm the cache, so users should not normally encounter the asynchronous loading of the Rich Media data.
Implementation notes:
- The async worker is a Task with a globally unique process name to prevent duplicate processing of the same URL
- The Task will attempt to fetch the data 3 times with increasing sleep time between attempts
- The HTTP request obeys the default HTTP request timeout value instead of 2 seconds
- URLs that cannot be successfully parsed due to an unexpected error receives a negative cache entry for 15 minutes
- URLs that fail with an expected error will receive a negative cache with no TTL
- Activities that have no detected URLs insert a nil value in the Cachex :scrubber_cache so we do not repeat parsing the object content with Floki every time the activity is rendered
- Expiring image URLs are handled with an Oban job
- There is no automatic cleanup of the Rich Media data in the database, but it is safe to delete at any time
- The post draft/preview feature makes the URL processing synchronous so the rendered post preview will have an accurate rendering
Overall performance of timelines and creating new posts which contain URLs is greatly improved.	2024-06-09 17:33:48 +01:00
Alex Gleason	3ff9c5e2a6	Break out activity-specific HTML functions into Pleroma.Activity.HTML Fixes cycles in lib/pleroma/ecto_type/activity_pub/object_validators/safe_text.ex	2021-05-29 12:29:11 -05:00
Haelwenn (lanodan) Monnier	c4439c630f	Bump Copyright to 2021 grep -rl '# Copyright © .* Pleroma' * | xargs sed -i 's;Copyright © .* Pleroma .*;Copyright © 2017-2021 Pleroma Authors <https://pleroma.social/>;'	2021-01-13 07:49:50 +01:00
lain	e1e7e4d379	Object: Rework how Object.normalize works Now it defaults to not fetching, and the option is named.	2021-01-04 13:38:31 +01:00
lain	713612c377	Cachex: Make caching provider switchable at runtime. Defaults to Cachex.	2020-12-18 17:44:46 +01:00
rinpatch	e198ba492e	Rich Media: Do not cache URLs for preview statuses Closes #1987	2020-09-05 20:53:46 +03:00
rinpatch	46236d1d87	html.ex: optimize external url extraction By using a :not() selector and only extracting attributes from the first match.	2020-09-02 12:45:20 +03:00
Alexander Strizhakov	6512ef6879	excluding attachment links from RichMedia	2020-06-29 15:25:57 +03:00
Haelwenn (lanodan) Monnier	6da6540036	Bump copyright years of files changed after 2020-01-07 Done via the following command: git diff `fcd5dd259a` --stat --name-only | xargs sed -i '/Pleroma Authors/c# Copyright © 2017-2020 Pleroma Authors <https:\/\/pleroma.social\/>'	2020-03-02 06:08:45 +01:00
rinpatch	472132215e	Use floki's new APIs for parsing fragments	2020-02-16 01:55:26 +03:00
feld	237b2068f9	Revert "Merge branch 'feat/floki-fasthtml' into 'develop'" This reverts merge request !2194	2020-02-11 16:55:18 +00:00
rinpatch	ea1631d7e6	Make Floki use fast_html	2020-02-11 16:17:21 +03:00
Egor Kislitsyn	b7a57d8e38	Use Pleroma.Utils.compile_dir/1 in Pleroma.HTML.compile_scrubbers/0	2019-12-10 00:38:01 +07:00
rinpatch	d6c89068f3	HTML: Compile Scrubbers on boot This makes it possible to configure their behavior on OTP releases.	2019-12-08 20:35:41 +03:00
rinpatch	a21340caa1	Fix never matching clause `length/1` is only used with lists.	2019-12-08 16:46:18 +03:00
Egor Kislitsyn	cf52106e05	Update Floki dependency	2019-12-02 13:38:35 +07:00
Egor Kislitsyn	a98cda7758	Fix Pleroma.HTML.extract_first_external_url/2	2019-11-29 15:49:35 +07:00
rinpatch	ae59b38203	Rip out the rest of htmlsanitizeex	2019-10-30 09:20:13 +03:00
rinpatch	77cfb08b8c	Remove commented-out code	2019-10-29 20:58:54 +03:00
rinpatch	08f6837065	Switch from HtmlSanitizeEx to FastSanitize	2019-10-29 01:18:08 +03:00
Egor Kislitsyn	cf3041220a	Add support for `rel="ugc"`	2019-09-19 14:56:10 +07:00
lain	ef43016b2c	Merge branch 'feature/custom-fields' into 'develop' Add custom profile fields See merge request pleroma/pleroma!1488	2019-08-20 12:44:14 +00:00
Haelwenn (lanodan) Monnier	a6a814420d	html.ex: Allow sub and sup elements by default Closes: https://git.pleroma.social/pleroma/pleroma/issues/1191	2019-08-14 22:49:13 +02:00
Egor Kislitsyn	f7bbf99caa	Use info.fields instead of source_data for remote users	2019-08-14 14:52:54 +07:00
rinpatch	035368d363	Rich Media: Skip Microformats hashtags When fixing this problem I incorrectly assumed a.hashtag is the proper way for detecting hashtags, but it is just something Pleroma and Mastodon add. Per microformats it should be detected by the presense of rel=tag. This MR adds a check for rel=tag, but I still left a.hashtag just in case	2019-06-19 00:46:30 +03:00
rinpatch	d0ebc0edf3	Fix hashtags being picked up by rich media parser Closes #989	2019-06-14 14:34:42 +03:00
Egor Kislitsyn	99f70c7e20	Use Pleroma.Config everywhere	2019-05-30 15:33:58 +07:00
Haelwenn (lanodan) Monnier	85b5c60694	Pleroma.Formatter: width/height to class=emoji	2019-05-03 16:25:58 +02:00
rinpatch	51e26f14f7	Remove redundant ensure_scrubbed_html It is never used as handling for fake and non-fake activities was merged into one function above it	2019-05-01 13:52:44 +03:00
Sachin Joshi	85fa2fbce4	add scrubber for html special char	2019-05-01 01:37:17 +05:45
kaniini	030a7876b4	Merge branch 'security/fix-html-class-scrubbing' into 'develop' html: lock down allowed class attributes to only those related to microformats See merge request pleroma/pleroma!1090	2019-04-23 23:07:56 +00:00
William Pitcock	f5535e5743	html: lock down allowed class attributes to only those related to microformats	2019-04-23 23:03:45 +00:00
rinpatch	627e5a0a49	Merge branch 'develop' into feature/database-compaction	2019-04-17 12:22:32 +03:00
rinpatch	f0f30019e1	Refactor html caching functions to have a key instead of a module, use more correct terminology and fix summaries in mastoapi	2019-04-05 15:19:44 +03:00
rinpatch	975482f091	insert object defaults for fake activities and make credo happy	2019-04-01 12:16:51 +03:00
rinpatch	45ba10bf47	Fix the issue with HTML scrubber	2019-04-01 11:55:59 +03:00
Fong-Wan Chau	4ed2618f6c	Allow 'rel' attribute on `<a>` link with specific values (for hashtag recognition).	2019-03-17 11:03:19 -04:00
Haelwenn (lanodan) Monnier	fb82f6fc7c	[Credo] Remove parentesis on argument-less functions	2019-03-13 04:26:56 +01:00
Haelwenn (lanodan) Monnier	381fe44172	HTML.Scrubber.Default: Consistency	2019-02-09 14:59:21 +01:00
Haelwenn (lanodan) Monnier	2272934a5e	Stash	2019-02-09 14:59:21 +01:00
Haelwenn (lanodan) Monnier	60ea29dfe6	Credo fixes: alias grouping/ordering	2019-02-09 14:59:20 +01:00
William Pitcock	a2bb5d890d	html: don't attempt to parse nil content	2019-02-05 05:06:17 +00:00
William Pitcock	ddb5545202	rich media: kill some testsuite noise	2019-01-28 20:55:33 +00:00
William Pitcock	be9abb2cc5	html: add utility function to extract first URL from an object and cache the result	2019-01-26 14:55:12 +00:00
William Pitcock	1ddab78247	html: allow microformats-related markup through the html filter	2019-01-16 03:54:01 +00:00
Rin Toshaka	1e2d58982e	oopsies	2019-01-05 00:25:31 +01:00
Rin Toshaka	846082e54f	Different caches based on the module. Remove scrubber version since it is not relevant anymore	2019-01-05 00:19:46 +01:00
William Pitcock	980b5288ed	update copyright years to 2019	2018-12-31 15:41:47 +00:00
Rin Toshaka	7e09c2bd7d	Move scrubber cache-related functions to Pleroma.HTML	2018-12-31 08:19:48 +01:00
Rin Toshaka	c50353e6ae	shame on me for not testing after revert	2018-12-30 20:44:17 +01:00


73 commits