This is mostly for folks who work at Twitter, or who have heard behind-the-scenes stories about what it took to make this change.
Given that Twitter has long allowed only 140 characters, I'm wondering whether the 140-character assumption was hard-baked into any parts of their architecture that they had to spend time refactoring, or whether it was as simple as removing a constraint at the front of their stack. Did it, for example, require any changes to the storage layer? Just curious.
The technical "changing-140-to-280" work is only one aspect. Other very important aspects include QA (e.g., does increasing the limit break the layout when all 280 characters are used?) and data science (e.g., do longer tweets actually keep people on Twitter longer and increase engagement?).
Both of those are reasons why companies do staged rollouts of features, in case the company makes a mistake.
Agreed, but also storage requirements, I would imagine. Those extra 140 characters add up quickly in an environment like Twitter's.
It's estimated at 500 million tweets a day. An ASCII character in UTF-8 takes 1 byte, so that's an extra 140 bytes needed per tweet at the minimum (I'm not positive how it's stored on the back end, whether they build n-grams or whatever else from a tweet). That's 70,000,000,000 bytes. That's... 70 GB, right? 70 GB per day, which is an extra ~25.5 TB of space needed per year. Not a large amount. Hell, I have a surveillance server with more storage than that. But then running analytics against that extra data, ALL THE DAMN TIME. Plus, if going to 280 does, in theory, increase usage, then that number would be higher. I'm assuming UTF-8 and plain ASCII; non-ASCII characters take 2 to 4 bytes each in UTF-8, and UTF-16 would be at least 2 bytes per character. I should really open a Google tab, but I'm too lazy. I already grabbed a calculator. My exercise for the day is met :P
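If you want to sanity-check that arithmetic, here's a rough back-of-envelope sketch in Python. The 500 million tweets/day figure and 1 byte per character are the same assumptions as above; everything about how Twitter actually stores tweets is a guess:

```python
# Back-of-envelope: extra storage from raising the limit 140 -> 280.
# Assumes every tweet uses all 140 extra characters at 1 byte each
# (ASCII in UTF-8) -- deliberately a worst case.
TWEETS_PER_DAY = 500_000_000
EXTRA_CHARS = 140
BYTES_PER_CHAR = 1  # 2-4 for non-ASCII in UTF-8

extra_per_day = TWEETS_PER_DAY * EXTRA_CHARS * BYTES_PER_CHAR
extra_per_year = extra_per_day * 365

print(f"per day:  {extra_per_day / 1e9:.1f} GB")    # ~70.0 GB
print(f"per year: {extra_per_year / 1e12:.1f} TB")  # ~25.6 TB
```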
An extra 140 characters SEEMS small, but at Twitter's scale that's not a light matter. Shit like that can run away on you fast.
edit: Jesus Christ, yes, I didn't include every single bloody thing. This is just a simple five-minute example of how Twitter's scale can give "small" decisions a huge impact. There are outliers in the average statistics, not everyone will use the full character limit, frequency changes, etc. Nitpickers, man...
Only a small percentage of tweets actually use all 140 characters, so I would not assume that because the limit is doubled, 100% of new tweets will be 280. Probably less than 5% of tweets will go beyond 140.
There is also increased network traffic, more bytes to keep in the caches (RAM?), and maybe effectively doubling the amount of data you store (in the worst case) wrecks some of your queries and increases latency somewhere in the system, and then you have to find where...
There could be "big" ramifications; it's a pretty interesting thing to think about :)
Well, there will also be a reduction once all these "(1/2)" two-part tweets go away. I'm pretty sure the metadata, analytics, events, etc. that two tweets generate amount to way more than storing those extra bytes in one tweet.
I recently saw a chart somewhere comparing tweet-length distributions between Japan and other countries. It basically shows much less of a spike near the 140-character limit in Japan than elsewhere, which may explain why Twitter is still popular in Japan while it struggles in other markets: Japanese users are less frustrated by the character limit and thus get a better UX.
I think that's the parent's point: it's not "refactoring" if the external behavior changed. Though I generally agree with the down-voting of the comment as needlessly pedantic, given the common usage of the term.
Twitter's original limit was 140 bytes, but it's been 140 normalized Unicode code points for years. The physical message can be much longer than 140 bytes.
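A quick Python sketch of the code-point-vs-byte distinction, in case it helps; I'm assuming NFC is the normalization in play, which is the usual choice:

```python
import unicodedata

# "café" written with a combining accent: 5 code points raw,
# 4 after NFC composes "e" + U+0301 into "é", but 5 bytes in UTF-8.
tweet = "cafe\u0301"
normalized = unicodedata.normalize("NFC", tweet)

print(len(tweet))                       # 5 raw code points
print(len(normalized))                  # 4 -- what counts toward the limit
print(len(normalized.encode("utf-8")))  # 5 bytes actually stored/sent
```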
The interesting question is why SMS only has 160 characters. I know those 160 characters are 7-bit, so they fit in 140 bytes, and that an 8-bit SMS can contain only 140 characters, but why does the control channel limit it to that?
I worked on a WAP browser a long time ago, which would support SMS in PDU mode as a bearer. WDP packets would just be encapsulated in an SMS. I am too old ... :(
Many years ago I worked on an SMSC. SMS messages are sent over the same data channels as phone signalling (call setup etc.) using the SS7 protocol stack. The SS7 physical layers used fixed-size data packets of around 256 bytes, for example on T1 data channels. On top of that you need routing info such as addressing in MTP, and TCAP to get the packet to the correct mobile switching center. Then GSM or IS-41 MAP to route to the device. Finally, the SMS bearer encoding needs some extra bytes for things like delivery reports and the source phone number. That left about 140 bytes of user data free, which is 160 characters in the 7-bit alphabet.
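The septet packing is what turns those 140 free octets into 160 characters: 160 x 7 bits = 1120 bits = exactly 140 bytes. Here's a simplified Python sketch of GSM 03.38-style packing (the real alphabet also has an escape table and padding rules I'm skipping):

```python
def pack_septets(septets):
    """Pack 7-bit values into octets, LSB first (simplified GSM 03.38).

    Every 8 characters share 7 bytes, so 160 characters fit in 140 bytes.
    """
    out = bytearray()
    acc, bits = 0, 0
    for s in septets:
        acc |= (s & 0x7F) << bits  # append 7 new bits above what's pending
        bits += 7
        while bits >= 8:           # flush full octets
            out.append(acc & 0xFF)
            acc >>= 8
            bits -= 8
    if bits:                       # pad the final partial octet with zeros
        out.append(acc & 0xFF)
    return bytes(out)

print(len(pack_septets([0x41] * 160)))  # 140 bytes for 160 characters
```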
In other words, SMS messages could be delivered over the signalling channel that devices were already using to stay in contact with towers, essentially allowing providers to charge for something (SMS used to be an add-on) that cost carriers basically nothing to transmit.