The tech story behind Flipkart’s Big Billion sale
By Akhilesh Shukla
About two and a half months before the Big Billion sale day, Flipkart’s IT team started focusing on the project. The biggest challenge was scaling up the IT infrastructure to handle the 20- to 25-fold spike in traffic they were expecting.
Controlling the mobile application remotely was another challenge for the e-tailer. The IT team wanted to create a new buying experience for the day, so they had to roll out the new features on all mobile platforms on the same day.
On D-day, it turned out that Flipkart’s calculations were not up to the mark. The management, it seems, did not have a fair idea of customers’ expectations or of the growing popularity of the brand.
“It was our first time. We had no prior experience; all we had to look to was our previous experience of flash sales of mobile handsets,” said Amod Malviya, Chief Technology Officer, Flipkart.
The engineering company, as Flipkart’s founders call it, had permanently added 4,500 servers to its compute capacity, creating an internal hybrid cloud to manage the traffic.
As the spike in traffic was higher than expected, a few of the services were affected.
The product service, through which a buyer adds a product to the shopping cart, was badly hit by the heavy traffic. The error affected almost 30% of buyers during the early part of the sale.
The IT team had to reshuffle compute resources to keep the service live. As a result, they had to suspend the review and rating services, which they could easily afford to do without for the day.
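The trade-off described above, keeping a critical service alive by switching off less important ones, can be illustrated with a minimal sketch. This is not Flipkart’s actual system: the service names other than the product, review and rating services, the server counts, and the `plan_capacity` function are all hypothetical, chosen only to show the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str
    critical: bool       # True if the sale cannot run without this service
    servers_needed: int  # hypothetical peak demand, in servers


def plan_capacity(services, total_servers):
    """Assign servers to critical services first; suspend whatever no longer fits.

    Returns (running, suspended, spare_servers). Illustrative only.
    """
    running, suspended = [], []
    remaining = total_servers
    # sorted() is stable, so critical services come first in their original order.
    for svc in sorted(services, key=lambda s: not s.critical):
        if svc.servers_needed <= remaining:
            running.append(svc.name)
            remaining -= svc.servers_needed
        else:
            suspended.append(svc.name)
    return running, suspended, remaining


# Hypothetical figures: demand totals 1,200 servers but only 1,000 are available.
EXAMPLE_SERVICES = [
    Service("reviews", critical=False, servers_needed=100),
    Service("product", critical=True, servers_needed=600),
    Service("ratings", critical=False, servers_needed=100),
    Service("checkout", critical=True, servers_needed=400),
]

if __name__ == "__main__":
    running, suspended, spare = plan_capacity(EXAMPLE_SERVICES, total_servers=1000)
    print("running:", running)
    print("suspended:", suspended)
    print("spare servers:", spare)
```

With these made-up numbers the two critical services use up the whole pool, so reviews and ratings are the ones that get suspended, which mirrors the choice the team made on the day.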
“Another calculation that went wrong was the duration of the spike in traffic. We were expecting the spike to last only a few hours. But it actually lasted throughout the day and started dying down around the end of the day,” Malviya told ETCIO in a video chat from Bangalore.
The company, expecting a high volume of cyber attacks, had also created a separate team for digital security.
One of the strengths of the planning was its preparation for failure. The IT team had created a contingency plan, as they knew that failures were bound to happen. Whenever there was a failure, one team of engineers was dedicated to keeping the services live while another focused solely on fixing the bug.
As many as 500 engineers worked throughout the day.
The company will now re-allocate the new storage capacity for internal use.
“One of the key lessons that we have learnt is to create a better transparency mechanism for such a day. We also need to have a strong testing mechanism for the services and applications that we roll out,” concluded Malviya.