Want to start a big data company? Here are 5 things you need to know
Big data is one of the hottest tech terms today, but it’s also one of the toughest ventures to start. The acquisition of Infochimps by CSC suggests that many big data startups unable to secure their second round of funding face either closure or acquisition, as seen with companies like Drawn to Scale, Ravel Data, and Nodeable. There are countless other big data startups that haven’t garnered much attention.
In a recent article on Gigaom, author Derrick Harris discussed the growth and financing challenges faced by big data startups, emphasizing that these companies must focus on several key areas to survive and attract investor interest: wisely choosing your battlegrounds and target users while building a community around your technology. Big data needs doers, not cheerleaders. Harris's insights can be summarized into five points, which IT Manager Network has excerpted and organized below:
1. Infrastructure is extremely difficult
Developing infrastructure technology products is challenging, and selling them is even harder, especially when it comes to big data infrastructure tools such as Hadoop, NoSQL databases, and stream processing systems. Customers require extensive training and education, and paying customers demand significant support and timely product updates.
This requires substantial financial backing. For example, Greenplum secured $100 million in funding in 2010 but still had work left unfinished when it ultimately chose to sell to EMC. Today’s best-known big data startups, such as Cloudera, have raised even more. Infrastructure-focused big data startups typically need millions of dollars in seed funding just to get started, and securing Series A funding after that remains an arduous journey.
New big data startups must also compete against established players that already have name recognition or existing customer relationships, such as Cloudera, Hortonworks, 10gen, Amazon Web Services, IBM, and Oracle.
In contrast, starting a big data application business is relatively simple, whether it involves vertical industry applications or general-purpose data visualization tools. These applications offer more direct value to customers, are closer to business operations, and integrate more easily into enterprise IT systems.
2. Cloud computing is your friend
Whether you're selling big data infrastructure or applications, cloud computing is a more efficient platform for conducting business. Choosing the cloud isn't just about hosting in the cloud; it's about delivering services to customers through the cloud. You gain more control, and optimizing operations with limited resources deepens your understanding of your product.
Cloud computing also lowers the cost and barriers for potential users to try out your product. Companies from New Relic to Amazon Web Services have benefited significantly from combining cloud delivery with big data.
3. Developers are your friends
If your product is primarily a big data analytics application, like those from ClearStory and Platfora or CRM and marketing applications, then data analysts and marketers are your allies instead. In either case, the best approach is to build and market your product for the people who will actually use it, whether developers, analysts, or marketers, rather than for CIOs, who may not be ideal targets.
Focusing on CIOs instead of developers often leads to complications when closing deals. Many cloud startups and pure big data software companies, such as Splunk and Tableau, adopt developer-centric marketing strategies.
For instance, Infochimps and Continuuity offer similar products (both were forced to pivot toward running in customers' own data centers), but Continuuity is entirely developer-oriented, which helps it build a larger technical following.
4. Bring data scientists to the forefront
This is both a marketing and sales strategy. Data scientists are the ones who can showcase the power of data and platforms, and they are the most sought-after speakers at conferences.
However, data scientists must carefully select what they present. With Hadoop and NoSQL widely accepted, there's no need to repeatedly emphasize the four Vs during every meeting. Discussions on configuring and integrating big data systems will only appeal to a small audience unless your project is exceptionally large.
Cloudera stands out from its competitors for many reasons, but its chief scientist, Jeff Hammerbacher, is undoubtedly one of them. Instead of merely discussing the value and architecture of big data, talk specifically about which analyses can be performed and how to carry them out, from the audience's perspective.
5. How important open source is depends on you
Almost all big data companies rely on open-source software. Some "borrow" existing projects like Hadoop, Storm, and various databases, while others develop their own or use a hybrid model, adding features on top of something like HBase. These open-source projects thrive due to the strength of their communities.
Open source is not as easy as it looks. Simply posting code on GitHub doesn't mean you're contributing to the community. The goal of open source is to bring the people using the same code together into a community that continuously improves it. This ties back to point three about attracting developers. Only when more users and developers take an interest in your product and invest time and effort in it will they eventually be willing to pay.
Countless startups have open-sourced their code, but only those that truly drive projects forward and build communities stand out. Examples include Neo Technology's Neo4j, Concurrent's Cascading, and 10gen's MongoDB. Even consumer-facing companies like Twitter have open-sourced or heavily backed projects like Storm and Mesos.