GM
More tractor fun from last night 😂
mike
mike@mikehardcastle.com
npub1aqak...237n
Building "Brian", my first brain in silicon
Lord Provost of Bitcoin
Fully vested Chief AI Officer (CAIO)
Former, failed, Chief VLOG Officer
Former Chief Shitpost Officer - NOSTR Inc.
Node runner - Miner - Author
My public relay: https://nortis.nostr1.com/
My book: https://mikehardcastle.com/my-book-why-bitcoin/
Here’s an interesting question.
Which is greater 8.9 or 8.11 ?
Now what if I told you it was book notation describing chapters (before the decimal) and paragraphs (after the decimal)?
In this context, paragraph 9 comes before paragraph 11 and is therefore smaller.
What if you change the question?
Which is greater 8.90 or 8.11 ?
Now, 90 is greater than 11.
But what if we add an irrelevant zero to 11?
Which is greater 8.90 or 8.110 ?
And we blame AI when it gets the answer wrong, because it was trained on the wrong context.
It turns out, AI’s extensive training on the Bible caused early models to always assume 8.9 is smaller than 8.11
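Here's a rough Python sketch of the two readings, just to show how the same strings flip depending on context. The function names are made up for illustration, and I'm assuming "book notation" means comparing the chapter and paragraph as whole numbers:

```python
from decimal import Decimal


def greater_as_decimal(a: str, b: str) -> str:
    """Return whichever string is the larger decimal number."""
    return a if Decimal(a) > Decimal(b) else b


def greater_as_book_reference(a: str, b: str) -> str:
    """Return whichever string comes later when read as chapter.paragraph."""
    def key(ref: str) -> tuple:
        # "8.11" becomes (8, 11): chapter 8, paragraph 11
        return tuple(int(part) for part in ref.split("."))
    return a if key(a) > key(b) else b


for a, b in [("8.9", "8.11"), ("8.90", "8.11"), ("8.90", "8.110")]:
    print(f"{a} vs {b}: "
          f"decimal says {greater_as_decimal(a, b)}, "
          f"book notation says {greater_as_book_reference(a, b)}")
```

Same characters, two different "correct" answers. Which one you get depends entirely on the context you assume.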
I might be the former Chief Vlog Officer (disgraced), but once a vlogger, always a vlogger.
I call this edition:
Tractors, Tractors, Tractors 😂
We’re 20 minutes early for the tractors, but I do have good 5G signal 😂


We're off to see some tractors with fairy lights drive past a neighbouring village tonight 😂


Katharine House Hospice | Palliative and end-of-life care
Banbury Christmas Tractor Run | Katharine House Hospice
The Banbury Christmas Tractor Run returns on 13 December 2025. Watch 100 tractors dressed in all their finery as they drive through Banbury and sur...
GM
What AI calls “Training”, most people would call “Data scraping and encoding”
What AI calls “Alignment” or “Post Training”, most people would call “Training”
What AI calls “Hallucination”, most people would call “Prediction failure”
These incorrect abstractions annoy me and force me to hold a translation layer while constructing a fundamental model of AI in my brain.

It's 6:34pm, I'm a bit late, but:
GM


AI Transformers:


Some fun AI and Internet stuff:
FineWeb is a project to download the text for the entire Internet and remove noise and duplications. Once done, it is a good general purpose training set for LLMs. It comes in at a tiny 44TB of data:
It uses the original dataset from Common Crawl, a free-to-use web crawler designed and built between 2007 and 2011 and still active today:

FineWeb: decanting the web for the finest text data at scale - a Hugging Face Space by HuggingFaceFW
This page lets you explore how the FineWeb dataset was created, its size, filtering and deduplication steps, and how to download the full dataset a...
Common Crawl - Get Started
Dive into Common Crawl: your guide to accessing vast web data. Start here to harness the web's potential effortlessly.
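To give a feel for the "remove duplications" part, here's a toy Python sketch of exact deduplication by hashing normalised text. This is my own simplified illustration, not FineWeb's actual pipeline, which is far more involved. The `normalise` and `dedupe` names are hypothetical:

```python
import hashlib


def normalise(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variations match."""
    return " ".join(text.lower().split())


def dedupe(docs: list) -> list:
    """Keep the first copy of each document, drop later exact repeats."""
    seen = set()
    unique = []
    for doc in docs:
        digest = hashlib.sha256(normalise(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique


pages = ["Hello  World", "hello world", "Tractors, Tractors, Tractors"]
print(dedupe(pages))
```

It only catches pages that are identical after normalising. Near-duplicates (same article, different footer) need fuzzier matching.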
Watson, my AI receptionist casually doing his thing 😂


One Sunday morning, after a heavy Saturday night in the local pub with my mates, my wife got a call from our neighbour, one of the world’s leading proctologists. He needed help with a PowerPoint presentation. He couldn’t get a video to play inside his presentation.
Guess what it was a video of? 😂
I couldn’t look at the screen, instead I had to tell him what to do while looking in the other direction. After I fixed it, I came home, threw up and went to bed.
Later that day, we went to an American neighbour’s 4th of July party. At the event, I met a nice elderly lady, so I recounted the story. She asked the name of the proctologist. I told her, and she exclaimed “That’s my ass doctor” 😂