I am no cyber expert, but I think we can work this one out.
It seems that the 204-page PDF containing the market-sensitive analysis of the Budget was uploaded to the OBR's site in advance, ready to be published the moment Rachel Reeves sat down after delivering her much-anticipated Budget.
Guessing the file name
Unfortunately, a sleuthing journalist was able to guess the file's URL correctly as:
https://obr.uk/docs/dlm_uploads/OBR_Economic_and_fiscal_outlook_November_2025.pdf
This did not require them to be Sherlock Holmes, given that last year's PDF was called:
https://obr.uk/docs/dlm_uploads/OBR_Economic_and_fiscal_outlook_Oct_2024.pdf
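For illustration, here is a minimal Python sketch of how scriptable that guess becomes once the naming pattern is known. The candidate month spellings and the HEAD-request check are assumptions for the sake of the example, not a reconstruction of what the journalist actually did.

```python
# Sketch: build candidate URLs from the OBR's predictable naming pattern and
# check which ones already exist. Illustrative only; the month spellings and
# the HEAD-request approach are assumptions, not the journalist's method.
import urllib.error
import urllib.request

BASE = "https://obr.uk/docs/dlm_uploads/OBR_Economic_and_fiscal_outlook_{month}_{year}.pdf"
MONTHS = ["November", "Nov", "October", "Oct"]  # plausible spellings, guessed
YEAR = 2025


def exists(url: str) -> bool:
    """Return True if the server answers a HEAD request with a 2xx status."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return 200 <= resp.status < 300
    except urllib.error.HTTPError:
        return False   # e.g. 404 Not Found, 403 Forbidden
    except urllib.error.URLError:
        return False   # network error; treat as not found


for month in MONTHS:
    url = BASE.format(month=month, year=YEAR)
    print(("FOUND   " if exists(url) else "missing ") + url)
```

A handful of guesses like this is all it takes when file names follow an obvious year-on-year pattern.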
A robots.txt file that doesn't work
The site has a very simple two-line robots.txt file whose sole purpose is to dissuade search engines from crawling, and thereby discovering, files in the /docs/dlm_uploads/ folder:
```
User-agent: *
Disallow: /dlm_uploads/
```
Unfortunately, the path is wrong: Disallow values are matched as prefixes of the URL path from the site root, so /dlm_uploads/ does not cover anything under /docs/dlm_uploads/. It should be:
```
User-agent: *
Disallow: /docs/dlm_uploads/
```
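To make the difference concrete, here is a small sketch using Python's standard-library robots.txt parser, which applies the same prefix-matching rule.

```python
# Sketch: feed both versions of the robots.txt to Python's standard parser
# and ask whether a crawler is allowed to fetch the Budget PDF.
from urllib.robotparser import RobotFileParser

PDF = ("https://obr.uk/docs/dlm_uploads/"
       "OBR_Economic_and_fiscal_outlook_November_2025.pdf")


def crawl_allowed(robots_lines):
    """Parse the given robots.txt lines and test the PDF URL against them."""
    rp = RobotFileParser()
    rp.parse(robots_lines)
    return rp.can_fetch("Googlebot", PDF)


original  = ["User-agent: *", "Disallow: /dlm_uploads/"]
corrected = ["User-agent: *", "Disallow: /docs/dlm_uploads/"]

print(crawl_allowed(original))   # True  -> the rule does not cover the PDF
print(crawl_allowed(corrected))  # False -> crawlers are asked to keep out
```

Even the corrected version is only a polite request to crawlers, which is the point made below.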
It should be said that robots.txt files do not protect confidential information. A correctly written robots.txt file only dissuades well-behaved bots (e.g. Googlebot) from crawling the listed URLs, and even then it does not guarantee they will never be indexed. For a badly behaved bot, it is just a really helpful signpost.
Lessons
The OBR is not the first organisation to be embarrassed by content which it thought was hidden. This will only get worse now that we have AI bots able to do the guessing themselves and uncover all manner of confidential material.
A humbling reminder to all of us involved with confidential documents and online publishing to think carefully about what we are doing.