How the cloud is gobbling up your data, and what to do about it


How many times have we heard the phrase “data is the new oil” in recent times? For all the hype that surrounds this hackneyed phrase and its siblings, such as “data drives the new engines of growth”, we are slowly losing control over our own data. In this age of ubiquitous cloud computing and pushes for data democratization, the personal, enterprise and business data we generate ends up literally siloed in the various SaaS services used by us and by the companies we work for. Wasn’t moving to the cloud supposed to deliver us from this “silo” hell? And isn’t “big data” standard practice at every company, not to mention at hoarders of our personal data such as Google, Facebook and Amazon?

Before we can answer these questions, we need to examine three prevalent trends, how accurate their claims are, and how they actually undermine data ownership. This analysis is based on my world-view that the cloud is still the wild west: no one is interested in defining standards, because every vendor believes standards could work against them and that the way they do things is what fundamentally separates them from the competition.

First, cloud computing. Yes, cloud computing is the defining technology of our times, one that techies and non-techies alike have become intimately familiar with in one form or another. You want to scale your application? Move to the cloud. You lost your photos? Don’t worry, the cloud has them… These perfunctory solutions are thrown around as if the cloud were one big shared resource that is equal for everyone, everywhere. In reality, every company’s data, be it Google’s or XYZ, Inc.’s, is stored in a “walled garden” that is totally separate from every other, even when they use the same cloud provider. So “cloud” is not a synonym for a shared, collective resource. Just because you move your application or data to the “cloud” doesn’t mean you get the same cost structure, protection or performance as, say, Netflix. Nor does it mean it’s free or forever. It gives an illusion of permanence because it’s backed by a large company such as Microsoft or Amazon, but permanence is not guaranteed.

On the personal front, your cloud data, such as the passwords stored by a service like LastPass or even your fitness logs, is only as secure and permanent as the company that holds it and the vulnerabilities that company presents. That is by no means perfect and, worst of all, your data is bound to the whims of the company’s operating principles and to terms buried in fancy legalese. Even if these companies use leading cloud vendors to store your data, Amazon or Google won’t let you access it, or even keep storing it, if the company fails to make its next payment or changes its policy. The line between your data’s obsolescence and nirvana is only as thick as your service vendor’s ability to pay its next credit card bill. Even if you are a large enterprise that uses AWS, Azure or GCP as a replacement for your data center, moving large amounts of data from one cloud vendor to another can result in a large bill if you don’t plan for it correctly. Moreover, the competing (convenient) services offered by each cloud vendor strive for feature parity, while interoperability or even neutrality is far down their priority lists. In other words, a virtual lock-in… but with a difference: instead of wearing power suits like the old-timey IBM execs, these guys wear hoodies and $300 sneakers, and they won’t ever meet with you. Make no mistake, cloud computing is useful, even mandatory, in today’s world (be it for your personal use or for your company), but that doesn’t mean the vendors have your interests in mind.

Secondly, let’s talk about big data. Big data is probably the most abused and misunderstood term in recent times. Yes, there are huge (pun intended) benefits to big data analysis and visualization for gaining insight into your data (personal or enterprise), but who’s really benefiting? Not you. The big beneficiaries of the big data trend are the ones accumulating data about you: product companies such as Nest, Ring and Apple; insurance companies that analyze your driving behavior, health care and other data; and, the elephant in the room, Google, Twitter and Facebook, whose giant money-making machines are fed by your online usage and behavior data (in the form of ads, sponsored posts etc).

As an engineer I welcome the big budgets, big salaries and world-changing claims of big data, which somehow promises to provide us with insight into human behavior and that elusive competitive advantage that blows everyone else away. But not all companies have the actual data, the scale or even the wherewithal to squeeze the “big prize” out of their data. There have been many debates and studies showing the futility of many big data projects, and the recent verdict of industry pundits falls on the side of “little data”, which uses insight at the level of the individual customer to make their experience better (and, of course, don’t forget the cross-sell/up-sell opportunities). If we keep handing the service companies our behavior and usage data, they obviously benefit from it more than we, the producers of that data, do. To take a prominent example, Facebook boasts that its content is “user generated” and that it is a mere platform. Its quarterly revenues are in the billions, yet when was the last time you got a check from Facebook?

Businesses might think this doesn’t apply to them, but think about a service like Slack or QuickBooks: while they accumulate data in their warehouses, you don’t see a dime of it. Then there are the inevitable browser plugins and the myriad open source and “free” tools that litter the office computer jungle. Many of these plugins and tools have been collecting your data, including your internal company browsing behavior, and selling it to advertisers and other data mongers without your consent (I am sure every one of your employees read the entire T&C before they clicked that “I agree” button). Geoffrey Fowler has a fascinating piece on this in the Washington Post. This so-called “shadow IT” is so hard to contain that the fortunes of companies like Dropbox, Coda, Quip and Slack are built on that very theory, combined with the freemium catnip. Big data insights are real in the AI/ML world of facial recognition, spam filtering and even disease research (organizations like Folding@Home are doing great things in this space), but that doesn’t mean every IT organization needs a big data initiative, or that as individuals we should voluntarily give up our personal data to organizations like Google and Facebook without any compensation and without regard for our privacy.

Lastly, let’s talk about the services… ah, the beautiful promise of SaaS that launched a million startups. The “move fast and break things” Facebook ethos runs deep in startup culture, and any entrepreneur worth her salt knows the quickest way to launch a company is to sign up with the plethora of SaaS services that can automate parts of your business so you can focus on your core… Want to manage customer data? Sign up for Salesforce. Want to send invoices? Sign up with bill.com. Want to manage your financial data? Sign up with Freshbooks. Want to implement customer service? Sign up with Zendesk. The list goes on and on; there is a service for every business function you can think of, and more. As of 2019, Synergy Research reports that the enterprise SaaS market hit a $100 billion annual run rate. On average, companies big and small use between 40 and 120 services, depending on the size of the company.

While this is great news for startups, SMBs and even large enterprises, what we are missing is… data. Specifically, ownership of your own data and the ability to see relationships within it, because none of these systems talk to each other. And yes, if you want to make them all work with each other and send data from one to another, you can sign up with another service like Zapier, Microsoft Flow or Tray.io. Want to create a background process that ties all the data into a data warehouse or data lake? Sign up with yet another service, like Snaplogic or Alooma, and store it in the cloud again. Want to get insights into your data? There’s a service for that. Want to run reports or produce dashboards? There’s a service for that. You might say this sounds like heaven, but the bad news is that you never own your data, and you depend on all these companies being up and running when you need it.

How is this different from a traditional data center? The difference is that your data is wrapped inside a service sandwich that you can’t get rid of without some serious surgery. The point we miss is that the data, its metadata (format) and its access mechanisms are so interlinked that you can’t just swap one service for another, say when moving from one accounts payable system to another.

There are no standards for how an accounting database should be designed or what format its data should take, so any serious business is virtually bound to the service. The only solution is to design internal systems that fit your business and to maintain a separate data warehouse that you own (on premises or in a cloud data center), fed continuously by ETL processes (batch and real-time), so that the service becomes interchangeable (and hope the next service you move to can ingest your current data so your business is uninterrupted). If that sounds like a Herculean proposition, it is. And honestly, how many startups or SMBs have IT departments capable of handling this? Not many, is my guess.
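To make the “own your warehouse” idea a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the two export shapes (`service_a`, `service_b`), their field names, and the `Invoice` schema do not come from any real vendor. The point is only that each service gets a small adapter that maps its format into a schema you control, so swapping a vendor means rewriting one adapter rather than your whole data model.

```python
import sqlite3
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Invoice:
    """Vendor-neutral invoice record, in a schema you define and own."""
    invoice_id: str
    customer: str
    amount_cents: int
    currency: str
    source: str


def from_service_a(row: dict) -> Invoice:
    # Hypothetical export shape: flat record, amount as a decimal string.
    return Invoice(
        invoice_id=f"a-{row['id']}",
        customer=row["client_name"],
        amount_cents=int(Decimal(row["total"]) * 100),
        currency=row["currency"],
        source="service_a",
    )


def from_service_b(row: dict) -> Invoice:
    # Hypothetical export shape: nested customer, amount in minor units.
    return Invoice(
        invoice_id=f"b-{row['invoiceNumber']}",
        customer=row["customer"]["displayName"],
        amount_cents=row["amountMinor"],
        currency=row["ccy"].upper(),
        source="service_b",
    )


def load(conn: sqlite3.Connection, invoices: list[Invoice]) -> None:
    """Load step: upsert normalized records into the warehouse table."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS invoices (
               invoice_id   TEXT PRIMARY KEY,
               customer     TEXT NOT NULL,
               amount_cents INTEGER NOT NULL,
               currency     TEXT NOT NULL,
               source       TEXT NOT NULL)"""
    )
    conn.executemany(
        "INSERT OR REPLACE INTO invoices VALUES (?, ?, ?, ?, ?)",
        [(i.invoice_id, i.customer, i.amount_cents, i.currency, i.source)
         for i in invoices],
    )
    conn.commit()


# SQLite in memory stands in for a warehouse that you own.
warehouse = sqlite3.connect(":memory:")
load(warehouse, [
    from_service_a({"id": 17, "client_name": "Acme", "total": "129.99",
                    "currency": "USD"}),
    from_service_b({"invoiceNumber": "X9",
                    "customer": {"displayName": "Acme"},
                    "amountMinor": 5000, "ccy": "usd"}),
])
for record in warehouse.execute("SELECT * FROM invoices ORDER BY invoice_id"):
    print(record)
```

A real pipeline would also need scheduling, incremental extraction and error handling, which is exactly where the Herculean part comes in. The sketch only covers the transform and load steps.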

Consider a single service like a collaboration platform or an online chat service. Do you think you can migrate all your old data and your search index to a newer platform? In truth, we thought we were using Google Docs to avoid paying Microsoft Office licensing fees, but what ended up happening is that Google holds our data hostage and lets us see it whenever, for a fee. We wandered in looking for an instant program that could edit our document and ended up handing over the document; at the end of the day our computer has no program, and the document has vanished too. The same goes for any service or program. What really baffles me is not the size or the number of services, but how few of these vendors, or even their customers, have really thought about migration, interoperability or even defining data standards. One of my pet peeves is how companies label their products and services: instead of saying what the thing really does, they give it a fanciful, inventive moniker that makes it sound like the only product in its space. This breeds chaos, because the products overlap and the naming adds unnecessary complexity.

So what’s the solution? Yes, I am an engineer, and I could give multiple solutions, or list companies (ahem, cloud-based SaaS vendors) that claim to solve these problems and be your one-stop cloud shop, or I could even smugly utter “blockchain” and pause for your applause. But I don’t think there’s one solution. Even if blockchain is our purported savior here, the ecosystem and the services available on blockchain networks are not mature enough for us to abandon traditional computing or cloud models (yet). Even if we move our storage and programs to a blockchain, we are still bound to the vicissitudes of the peer-to-peer network and its performance and, more importantly, you still don’t own your data.

So, without solutioning this to death, as some of my colleagues often accuse me of doing, here’s a short and sweet list of areas to explore to increase your awareness and get your brain going:

  1. Stop idolizing this “cloud” and understand it for what it is

I love building software.
