XBlock Summer 2024 Update

Howdy! I figure it's probably not the worst idea to do a quick update on XBlock progress, as well as give some indication of what I'm planning to do with it in the future. This will be more roadmap-y than my last post.

According to Internect.info, I created the @xblock.aendra.dev account on 4 March 2024 (in reality, it was closer to April when the labeller came online, albeit as an entirely manual process). Since then, it has processed over 3 million images and labelled over 132,000 screenshot posts, and I've handled a little over 500 mislabel reports via the moderation interface.

That is an astonishing number. To break it down:

 altright-screenshot         |   832
 bluesky-screenshot          | 29170
 facebook-screenshot         |  3376
 fediverse-screenshot        |   247
 instagram-screenshot        |  6388 # I suspect this number is a bit inflated
 news-screenshot             |   237 # These are fully manual still
 ngl-screenshot              |  3633
 reddit-screenshot           |  3989 # Also a bit inflated due to mislabels
 threads-screenshot          |   159
 tumblr-screenshot           |  2670
 twitter-screenshot          | 85991

(Note: these figures come from my Ozone label table. They include mislabels that were eventually negated, and the deprecated "uncategorised-screenshot" label has been excised from the data. Current as of 03:40 on 21 May 2024.)
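
For a sense of scale, here's a quick sketch that totals the table above and works out how much of it is Twitter. The counts are copied straight from the table; the variable names are just mine for illustration.

```python
# Label counts copied from the table above (negated mislabels included).
label_counts = {
    "altright-screenshot": 832,
    "bluesky-screenshot": 29170,
    "facebook-screenshot": 3376,
    "fediverse-screenshot": 247,
    "instagram-screenshot": 6388,
    "news-screenshot": 237,
    "ngl-screenshot": 3633,
    "reddit-screenshot": 3989,
    "threads-screenshot": 159,
    "tumblr-screenshot": 2670,
    "twitter-screenshot": 85991,
}

total = sum(label_counts.values())
twitter_share = label_counts["twitter-screenshot"] / total

print(f"Total labels: {total:,}")             # Total labels: 136,692
print(f"Twitter share: {twitter_share:.1%}")  # Twitter share: 62.9%
```

The total counts labels rather than posts and still includes negated mislabels, so it isn't directly comparable with the post count quoted earlier.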

It is mind-boggling how many Twitter screenshots it has labelled. Sometimes I feel down when I see XBlock mislabel something on the timeline or have a bunch of reports in my queue, but then I realise the service labels vastly more screenshots than I'll ever see personally. Mislabels are frustrating, but I still feel it's providing a really useful service. Anecdotally, the other day when it was down due to a bug in my code, I noticed vastly more screenshot-related discourse, which I feel was in no small part due to users being bombarded with unlabelled screenshots. One of the goals of XBlock is to help improve the quality of discourse on Bluesky, so it's really satisfying to notice when it has an impact (even though in this case it was unfortunately due to downtime).

ændra. (@aendra.com)
It’s interesting that there was so much negative discourse around JKR screenshots yesterday given nothing was being labelled. I kind of suspect there would have been far less of that had XBlock been working, or if I had been on top of reports. That’s wild, almost a measurable change in discourse.

Next steps

Be that as it may, there's definitely room for improvement. My training pipeline is reporting an 87% accuracy rate after further fine-tuning the xblock-large-patch2-224 model, and I'd really like to get it upwards of 95% if at all possible. I'm not entirely sure that's achievable given the incredible amount of variance in screenshots online (particularly when many are of third-party apps), but I'm damn well going to try.

My next step is to take the entire patch2 training set and combine it with the patch3 training set, pull the latest set of screenshots labelled since patch3, then go through each set of images and make sure they're both accurate and good representations of what I want to search for. Previously I included anything that had a remote chance of being a relevant screenshot. That worked fairly well, but I think it's resulted in a lot of irrelevant screenshots of apps with "dark mode" colour schemes getting mislabelled. I also think the patch2 NGL training set wasn't particularly good, because there's no way the model should miss any of those; they're so visually distinct. Lastly, I want to make sure it has a huge "irrelevant" training set to help it better distinguish between Instagram screenshots and normal photographs; I think most of the failings of patch3 are a result of having far too small a set of those in the last training run.
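
The mechanical part of that (combining sets without double-counting images, then checking how big the "irrelevant" class ends up) could look roughly like the sketch below. It's only an illustration: the function names, the in-memory `{label: [image bytes]}` layout, and the "first label wins" rule for duplicates are all assumptions, not how my pipeline actually stores things, and it does nothing about the manual review step.

```python
import hashlib


def merge_training_sets(*training_sets):
    """Merge {label: [image_bytes, ...]} mappings, dropping duplicate images.

    Images are identified by the SHA-256 of their bytes. If the same image
    shows up more than once, the first label it was seen under wins
    (an arbitrary choice for this sketch).
    """
    merged = {}
    seen = set()
    for training_set in training_sets:
        for label, images in training_set.items():
            for image in images:
                digest = hashlib.sha256(image).hexdigest()
                if digest in seen:
                    continue
                seen.add(digest)
                merged.setdefault(label, []).append(image)
    return merged


def class_balance(training_set):
    """Return each label's share of the total number of images."""
    total = sum(len(images) for images in training_set.values())
    if total == 0:
        return {}
    return {label: len(images) / total for label, images in training_set.items()}


# Placeholder byte strings standing in for real image files.
patch2 = {"twitter-screenshot": [b"a", b"b"], "irrelevant": [b"c"]}
patch3 = {"twitter-screenshot": [b"b"], "irrelevant": [b"c", b"d", b"e"]}

combined = merge_training_sets(patch2, patch3)
print(class_balance(combined))
```

Hashing the raw bytes only catches exact duplicates; re-encoded or cropped copies of the same screenshot would slip through, so this doesn't replace going through the images by hand.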

For anyone submitting reports, please continue doing so! They help me both collect more accurate training data and ensure a high quality of output, plus they give me a yardstick to measure how accurate my latest model is. A reminder: you should report screenshots XBlock has missed using Bluesky's built-in tools (or create an appeal if it's your own post!), not by mentioning the account in a skeet.

Auto-appeals

One thing I'll probably do sooner rather than later is create a process to auto-appeal your own posts. In my opinion, this is low risk and fills in the gaps where I'm not able to immediately action reports (I also kind of hate manually actioning appeals because they take twice as many steps as actioning a normal report in Ozone). You'll be able to use the Bluesky appeals system to automatically appeal an XBlock label, or if you put another social media platform in the appeal's body, it will relabel the post to that platform. I don't anticipate people will actively abuse the appeals process, but I will monitor it closely to prevent people from unlabelling all of their posts. Expect this shortly after I finish training patch4.
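
To make the intended behaviour concrete, here's a minimal sketch of the decision described above: an appeal with no platform in the body removes the label, and an appeal that names a different platform relabels to it. Everything here is hypothetical (the `resolve_appeal` function, the keyword mapping, the simple word matching); it isn't the real implementation and it doesn't touch the Bluesky or Ozone APIs.

```python
import re
from typing import Optional

# Hypothetical mapping from a platform name someone might type in an appeal
# to some of the label values listed earlier in this post.
PLATFORM_LABELS = {
    "twitter": "twitter-screenshot",
    "bluesky": "bluesky-screenshot",
    "facebook": "facebook-screenshot",
    "fediverse": "fediverse-screenshot",
    "instagram": "instagram-screenshot",
    "ngl": "ngl-screenshot",
    "reddit": "reddit-screenshot",
    "threads": "threads-screenshot",
    "tumblr": "tumblr-screenshot",
}


def resolve_appeal(current_label: str, appeal_body: str) -> Optional[str]:
    """Return the label a post should carry after an appeal.

    None means "remove the label entirely".
    """
    words = re.findall(r"[a-z]+", appeal_body.lower())
    for word in words:
        new_label = PLATFORM_LABELS.get(word)
        # Ignore mentions of the platform the post is already labelled as,
        # e.g. "this is not Twitter".
        if new_label and new_label != current_label:
            return new_label
    return None


print(resolve_appeal("twitter-screenshot", "This is actually from Reddit"))
print(resolve_appeal("twitter-screenshot", "Not a screenshot at all"))
```

The first call returns `reddit-screenshot` and the second returns `None`. Plain keyword matching like this is easy to fool, which is part of why appeals would still need monitoring.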

Community annotation

I plan to eventually get CVAT.ai working on AWS, which will then be incorporated into the XBlock pipeline. Newly labelled images will get uploaded to CVAT automatically, and anyone who's interested can help with the training process by creating an account and annotating images. This will save me dozens of hours manually annotating screenshots myself. Note that these annotations won't result in the related post being labelled; they simply help create a larger and more accurate training set. I may also create an opt-in leaderboard for anyone participating just as a little extra encouragement. I'm planning to do this as soon as my AWS credit is applied to my account (see below!).

Building on XBlock

I welcome attempts to build things on top of XBlock's label firehose, models, and firehose consumer (all of which are open source) – please feel free to do so (though I'd love it if you could mention me so I know that you are!). One thing I plan to do is create an endpoint for summary statistics, for two reasons: I want to create a bot to generate daily/weekly/monthly report charts showing how many images have been labelled, and I think this is a really interesting, uncharted area of data gathering, so providing social media researchers with aggregate statistics could be helpful for their work. I'll probably start on this after releasing auto-appeal support.

ændra. (@aendra.com)
Think once I get the model running in a satisfactory state I’m going to build a bot that publishes charts with labelling stats for XBlock, things like number of labels per day, % change, etc.
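
As a rough idea of what those stats involve, here's a sketch that counts labels per day and the % change from one day to the next. The function names and the sample dates are invented for illustration; the real endpoint would read from the label table instead of a hard-coded list.

```python
from collections import Counter
from datetime import date


def labels_per_day(label_dates):
    """Count labels per calendar day, returned in chronological order."""
    counts = Counter(label_dates)
    return sorted(counts.items())


def percent_change(daily_counts):
    """For each day after the first, give (day, count, % change).

    The change is relative to the previous day that has any labels;
    days with zero labels never appear in the input.
    """
    changes = []
    for (_, prev), (day, curr) in zip(daily_counts, daily_counts[1:]):
        change = (curr - prev) / prev * 100
        changes.append((day, curr, change))
    return changes


# Made-up sample: one entry per label, keyed by the day it was applied.
sample_dates = [
    date(2024, 5, 19), date(2024, 5, 19),
    date(2024, 5, 20), date(2024, 5, 20), date(2024, 5, 20),
    date(2024, 5, 21),
]

for day, count, change in percent_change(labels_per_day(sample_dates)):
    print(f"{day}: {count} labels ({change:+.1f}%)")
```

A chart bot would only need to plot the output of `labels_per_day`; the weekly and monthly versions are the same idea with the dates bucketed differently.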

Bluesky grant and triage volunteering

In other exciting news, I recently received a grant from Bluesky, which includes $500 to help with XBlock, in addition to $5000 over the next two years in AWS credit. This is fantastic, it means I can spin up incredibly powerful on-demand cloud instances, upload my training data, and have them do everything for me. No more training in WSL2! No more pining wistfully at Nvidia cards I won't ever use to play games! No more waiting two whole-ass days for my pitiful GTX 980 to crunch through 10,000 screenshots!

As for the $500, I said in my initial grant application I wanted to hire some part-time moderators to help manage the queue while creating the initial training set, but I did that myself and now managing the queue takes less than half an hour a day. I recently posted that I'm looking for people to help with triage and have had a few people express interest, so I may use the money to pay a small honorarium to anyone who helps out. It won't be much, maybe enough to buy a pizza per month, but I'd hate not to do something for anyone doing that labour, even if I have to pay it out-of-pocket once the grant funds run out (hopefully by that point it'll only be a handful of reports to manage per day, because the model will be accurate enough that it doesn't generate as many!).

GitHub Sponsors

I had to create a GitHub Sponsors page so I could receive the grant from Bluesky; if you want to chuck me a few bucks to help with maintenance or buy me a coffee, please feel free to, I'd really appreciate it! 💚


That's it for now!

-æ.