Twitter Implements Usage Limits for All to Combat Data Scrapers

Never a dull day at Elon Musk’s ‘Twitter 2.0’, even as we head into a long weekend.

Today, Twitter has taken the extreme step of limiting how many tweets users can view each day, in an effort to address what Elon describes as ‘extreme levels of data scraping’.

To address extreme levels of data scraping & system manipulation, we’ve applied the following temporary limits:

- Verified accounts are limited to reading 6000 posts/day
- Unverified accounts to 600 posts/day
- New unverified accounts to 300/day
— Elon Musk (@elonmusk) July 1, 2023

As outlined in this tweet, Twitter has applied different daily reading limits to verified, unverified, and new unverified accounts, which Musk says is a necessary step to stop data scrapers.

Go over the limits and you’ll see an error message instead of new tweets.

Twitter has since increased these thresholds to:

Verified accounts – 10,000 posts
Unverified accounts – 1,000 posts
New unverified accounts – 500 posts

Twitter has actually increased these limits twice within the five hours since Elon’s initial tweet, which suggests that the situation could be resolved very shortly. Still, it’s an extreme measure to combat misuse, and there may be no definitive way to stamp out tweet scraping entirely.

Twitter’s action against data scrapers actually started two days ago, when Twitter began restricting non-logged-in users from viewing tweets.

That, in itself, is a significant move, because around 40% of Twitter users in Europe view tweets without logging in, meaning that Twitter’s immediate addressable audience could be severely impacted by this change.

The gamble that Twitter’s taking is that maybe this prompts more people to log in, where it can show them more targeted ads. But the risk is that they don’t, and Twitter loses reach and relevance as a result.

Twitter CTO Elon Musk later explained that this was a temporary measure to address misuse:

Several hundred organizations (maybe more) were scraping Twitter data extremely aggressively, to the point where it was affecting the real user experience.

What should we do to stop that? I’m open to ideas.
— Elon Musk (@elonmusk) June 30, 2023

A key concern here is that many organizations are now trying to get into the generative AI game. To do that, they need conversational data, and Twitter’s ‘open garden’ approach, which is designed to facilitate broader discussion, makes it a prime target for scraping to gather such info.

Other platforms took measures to address this long ago. Facebook, for example, restricts the amount of information non-users can access, as do Instagram and LinkedIn. These organizations recognized the value of their proprietary data, but Twitter’s approach has always been to host broader, global discussion, which is why tweets have remained publicly accessible to a large degree.

But now, that openness looks to be a problem, which Twitter 2.0 is working to address in ways that could severely impact usage.

Add to this the fact that Twitter has also significantly raised the price of its API access, which was likewise designed to stop misuse of its data (note: Reddit has also increased the price of its API access), and the platform is left with no middle ground. While higher API access costs have priced many providers out, the majority of tweet data remains publicly available, so many of these developers will simply switch to scraping instead. Closing off that route is what Twitter is looking to do with this change.

So what will the impact of that be?

Well, as noted, around 40% of Twitter users in Europe access tweets without logging in. That’s not necessarily reflective of all regions, but it does suggest that a significant portion of the platform’s 252 million daily active users are actually doing so without ever logging into a profile.

Those users will now be far more limited in what they can view, while logged-in users may also find themselves cut off once they hit their daily limit, unless Twitter can come up with a better solution.

One option could be to pinpoint the accounts being used by scrapers and shut them down, which may help to curb the practice. But the problem will likely remain, as scrapers will find other ways in to keep taking Twitter data.

As such, there’s no simple solution to the problem, and it’ll likely take Twitter some time to address the key entry points and overuse issues before it can comfortably lift all content limits.

Eventually, that could also see Twitter raise its walls and restrict tweet access permanently, which could undermine its position as a key platform for breaking news and discussion.

At the same time, Twitter is caught in the middle, with scrapers overloading its systems to fuel large language models. I do think this is a legitimate concern, but the solutions available to Twitter don’t look very appealing.