Movin’ on up… SAN switching to DevOps automation, workload generation to application performance monitoring
I’m reminded of the Jeffersons theme song “Movin’ on Up” as I read the latest news in the technology world. Every vendor appears to be moving on up and wants a bigger piece of the proverbial pie. SAN switching vendor Brocade aims to move up into DevOps automation by acquiring StackStorm, and workload generation vendor LoadDynamix is merging with application performance monitoring vendor Virtual Instruments. These developments are a sign that vendors are not content to stick to their niche but are keen to expand into new areas and deliver additional value.
Brocade managed a steady $2.25 billion in annual revenue as of 2015. However, jaded storage analysts looked at these earnings with an “okay, but where are the growth prospects?” attitude. Brocade demonstrates out-of-the-box thinking by moving beyond its comfort zone of Fibre Channel and Ethernet switching into DevOps automation and orchestration – a world previously dominated by products like Puppet, Chef and Ansible. From what I hear, configuration management tools like Puppet and Chef work at the node level: they execute code against a node, updated state information is then sent to an upstream system like a Chef server or PuppetDB, and other nodes then converge on the updated data. These multiple steps add latency. StackStorm acts as an uber-orchestrator, coordinating runs of automation tools like Chef or Puppet across different nodes. It should come as no surprise that while numerous customers downloaded and used the open-source tool from StackStorm, only a few like Target, Netflix and MasterCard paid for the enterprise version. The pressure to show revenue from paying customers might have caused StackStorm to take Brocade up on its offer of a buy-out. With Brocade thinking out of the box, Cisco won’t be content to remain a mere investor in PuppetLabs and might take a more serious interest in being the single pane of glass for orchestration and automation.
On a different note, LoadDynamix is a vendor with an appliance that generates network load and can be used to model workloads. An IT department can use it to make an objective comparison of multiple storage solutions and identify the one that best meets its needs. A storage vendor could also use LoadDynamix products to emulate the effect of mount storms from NFS clients. On the other side of the spectrum, Virtual Instruments is a vendor that targeted a niche untapped by SAN vendors – namely troubleshooting latency exhibited by applications running in virtual environments. They do this by capturing and monitoring application flows all the way from multiple hypervisors down to the storage. Since the two vendors have non-overlapping technology offerings, there is scope for innovation after they merge. This should cause Cisco and Brocade to take notice.
Intel and Micron jointly announced a new class of memory called 3D XPoint™. They claim that this new persistent (non-volatile) memory type will be 1000 times faster and last 1000 times longer than traditional NAND flash. 3D XPoint uses a cross-point architecture: the memory cell, which stores one bit of data, sits at the intersection of a word line and a bit line, allowing each cell to be addressed individually. This is a different approach from the one found in transistor-based NAND, where a large block of cells has to be erased before a single bit can be stored.
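The difference between addressing a single cell and erasing a whole block can be illustrated with a toy model. This is only a sketch of the idea described above, not how either memory type is actually implemented; the class names, the 16-cell block size and the "cells touched" counter are all my own assumptions.

```python
class CrossPointArray:
    """Toy model: each cell sits at a (word line, bit line) intersection."""

    def __init__(self, word_lines, bit_lines):
        self.cells = [[0] * bit_lines for _ in range(word_lines)]
        self.cells_touched = 0

    def write_bit(self, word_line, bit_line, value):
        # A single cell is addressed directly; nothing else is disturbed.
        self.cells[word_line][bit_line] = value
        self.cells_touched += 1


class NandBlock:
    """Toy model: an erased cell reads 1 and programming can only flip 1 to 0."""

    def __init__(self, size):
        self.cells = [1] * size
        self.cells_touched = 0

    def erase(self):
        # Erase works on the whole block at once.
        self.cells = [1] * len(self.cells)
        self.cells_touched += len(self.cells)

    def write_bit(self, index, value):
        if value == 1 and self.cells[index] == 0:
            # Going from 0 back to 1 needs a block erase, after which the
            # other programmed cells have to be written again.
            wanted = list(self.cells)
            wanted[index] = 1
            self.erase()
            for i, bit in enumerate(wanted):
                if bit == 0:
                    self.cells[i] = 0
                    self.cells_touched += 1
        else:
            self.cells[index] = value
            self.cells_touched += 1


xpoint = CrossPointArray(word_lines=4, bit_lines=4)
xpoint.write_bit(1, 2, 1)
xpoint.write_bit(1, 2, 0)

nand = NandBlock(size=16)
nand.write_bit(3, 0)
nand.write_bit(5, 0)
nand.write_bit(3, 1)  # forces an erase of all 16 cells

print("cross-point cells touched:", xpoint.cells_touched)
print("NAND cells touched:", nand.cells_touched)
```

In this model, rewriting one bit in the cross-point array touches one cell, while flipping one NAND bit back to 1 drags the whole block along with it.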
The goal in computing has always been to bring data closer to the CPU. Today some data is stored within the CPU, some in DRAM, some in NAND flash based SSD (where latency is measured in microseconds) and the rest on spinning disks (where latency is measured in milliseconds). As CPUs got faster, spinning disk improved in density but didn’t keep up in performance.
Enter 3D XPoint with characteristics like:
• 10 times the density of DRAM
• latency in nanoseconds (one billionth of a second)
• Non-volatile nature (persist data when the power is turned off).
• Durability of 10,000,000 write cycles (vs 10,000 write cycles for NAND)
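A quick back-of-the-envelope check of the figures quoted so far, as a minimal Python sketch. The numbers are the ones in this post; the variable names are mine, and the latency table only compares units of measure (nanoseconds, microseconds, milliseconds), not measured device latencies.

```python
# Write-endurance figures quoted in the list above.
NAND_WRITE_CYCLES = 10_000
XPOINT_WRITE_CYCLES = 10_000_000
endurance_ratio = XPOINT_WRITE_CYCLES // NAND_WRITE_CYCLES

# The unit in which each tier's latency is quoted, expressed in nanoseconds.
NS_PER_UNIT = {
    "3D XPoint (nanoseconds)": 1,
    "NAND SSD (microseconds)": 1_000,
    "spinning disk (milliseconds)": 1_000_000,
}

for tier, ns in NS_PER_UNIT.items():
    print(f"{tier}: 1 unit = {ns:,} ns")
print(f"Endurance ratio, 3D XPoint vs NAND: {endurance_ratio}x")
```

The endurance ratio lines up with the "last 1000 times longer" claim, and each step down the storage hierarchy costs roughly three orders of magnitude in the unit of latency.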
What are the implications for a personal computer manufacturer? This new memory type could be used as main memory (think Terabytes of 3D XPoint memory) since it will be almost as fast as RAM but cheaper than RAM. The computer user in turn experiences the efficiencies of an “always-on” application.
How about server manufacturers? Consider that over 1 billion Android devices were sold worldwide in 2014. If you assume 2 active TCP connections from each of these devices to Google’s datacenters, the devices sold in 2014 alone account for 2 billion TCP connections hitting Google’s custom-built servers so that users can run applications like Gmail or Hangouts. Google would achieve greater efficiencies by using 3D XPoint in place of RAM in its custom servers.
How about enterprise applications like real-time fraud detection? Rather than move data from spinning disk to SSD then to RAM, 3D XPoint can pre-fetch the data that may be needed by the application, hence improving the real-time aspect of detecting fraud in ongoing financial transactions.
How about consumer applications? Voice recognition applications like Siri on a 3D XPoint-enabled smartphone could see a benefit. Another consumer application that would benefit is 8K gaming (at 7680×4320 resolution). 8K is not so far out considering that Sharp is promoting 8K resolution TVs today.
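To get a feel for why 8K puts pressure on memory, here is the arithmetic as a small Python sketch. The resolution comes from this post; the 4 bytes per pixel and 60 frames per second are my assumptions for an uncompressed frame buffer, not figures from any vendor.

```python
WIDTH, HEIGHT = 7680, 4320   # 8K resolution quoted above
BYTES_PER_PIXEL = 4          # assumption: 32-bit colour
FRAMES_PER_SECOND = 60       # assumption

pixels_per_frame = WIDTH * HEIGHT
bytes_per_frame = pixels_per_frame * BYTES_PER_PIXEL
bytes_per_second = bytes_per_frame * FRAMES_PER_SECOND

print(f"pixels per frame: {pixels_per_frame:,}")
print(f"frame buffer:     {bytes_per_frame / 2**20:.1f} MiB")
print(f"uncompressed:     {bytes_per_second / 2**30:.2f} GiB per second")
```

Under these assumptions a single frame holds about 33 million pixels, which is four times as many as a 4K frame (3840×2160).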
What’s in it for networked storage? Just as RAM, DRAM and NAND flash co-exist with spinning disk today, 3D XPoint will add one more effective transit point between the CPU in the server and the networked storage. Once the interfaces are worked out, it could even replace NAND in SSDs. I welcome any relevant thoughts you may have on this topic.
A Computerized Tomography (CT) scanner uses ionizing radiation in small doses to produce a diagnostic image – a cross sectional image of the human body. Increase the radiation dose above the minimum required level and you risk causing cancer in the patient. How do you find that right balance of minimum dose and optimal diagnostic image?
Consider how one CT scanner maker, GE, achieves this balance. Each GE CT scanner is connected to GE DoseWatch™, a web-based radiation dose monitoring system that lets hospitals track a patient’s exposure to radiation from imaging devices. This means clinicians can reduce the cumulative radiation dose produced by a series of imaging procedures, while still delivering the image quality needed to diagnose and treat cancer. DoseWatch uses GE Predix™ (GE’s software platform for the industrial internet), which in turn bundles Pivotal software. Gazzang provides encryption and key management for the Pivotal app that is embedded within GE Predix. You may wonder how secure wireless communication is achieved for such a solution. GE partners with AT&T and Verizon, who aim to deliver a global SIM for secure machine-to-machine communications.
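The core idea of cumulative dose tracking can be sketched in a few lines of Python. This is purely illustrative and has nothing to do with how DoseWatch is built: the `DoseLedger` class, the dose values and the review threshold are all hypothetical, and none of the numbers are clinical guidance.

```python
from collections import defaultdict


class DoseLedger:
    """Hypothetical per-patient record of radiation doses in millisieverts."""

    def __init__(self, review_threshold_msv):
        self.review_threshold_msv = review_threshold_msv
        self._doses = defaultdict(list)

    def record(self, patient_id, procedure, dose_msv):
        # Store every procedure so the total can be audited later.
        self._doses[patient_id].append((procedure, dose_msv))

    def cumulative(self, patient_id):
        return sum(dose for _, dose in self._doses.get(patient_id, []))

    def needs_review(self, patient_id):
        # Flag the patient once the running total reaches the threshold.
        return self.cumulative(patient_id) >= self.review_threshold_msv


ledger = DoseLedger(review_threshold_msv=15.0)  # arbitrary example threshold
ledger.record("patient-1", "chest CT", 7.0)
ledger.record("patient-1", "head CT", 2.0)
print(ledger.cumulative("patient-1"), ledger.needs_review("patient-1"))
ledger.record("patient-1", "abdominal CT", 8.0)
print(ledger.cumulative("patient-1"), ledger.needs_review("patient-1"))
```

The point of the sketch is that no single scan trips the flag; it is the series of procedures that does, which is why the tracking has to sit above the individual scanners.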
Meanwhile GE’s competitors, namely Siemens and Toshiba, are not sitting idle. While GE partners with Pivotal, Siemens partners with Teradata and is deploying Teradata Unified Data Architecture (data warehouse appliance, discovery platform, Hadoop appliance) for a big data lake. Siemens also partners with SAP to use the HANA Cloud Platform (HCP) as the basis of its own cloud to derive insights from IoT machine data. Siemens has its own deviceWISE IOT Cloud software, which appears to be its answer to GE’s Predix. Siemens has also invested in CyberFlow Analytics to secure the IoT. Not to be outdone, Toshiba has partnered with Microsoft so consumers with sensor-enabled Toshiba devices can access predictive analytics over Microsoft Azure IoT cloud infrastructure. This intersection of healthcare, IoT, big data and predictive analytics just scratches the surface of what is to come in the years ahead.
Cisco Systems predicts that 50 billion devices will be connected to the internet by the year 2020. While the actual number is debatable, it is a fact that today billions of devices are generating a cacophony of sensor data. In the field of consumer healthcare, consider the Fitbit, which monitors heart rates and sleep patterns. It collects personally identifiable information (PII) – names, email addresses, phone numbers, payment account info, height, weight and other biometric information – and sends out location data 24×7 using Bluetooth technology. Since most of the user data is sent over HTTP, it is susceptible to hacking. Fitbit relies on 3rd parties to protect this consumer data, and since the data it collects is not officially classed as Protected Health Information (PHI), it is not bound by government regulations like HIPAA. The same is true for products like NikeFuel.
Now look at the other end of the spectrum: an invalid patient confined to his/her home and using a programmable thermostat like Nest. It has been shown that Nest can be hacked. In principle a cyber-attacker could subject the patient to extremes of heat and cold using their own home’s heating/cooling system! Granted, you need physical access to the Nest device – but this can easily be obtained by contractors, painters or a cleaning crew!
Consider devices like insulin pumps and continuous glucose monitors. These can be hacked by cyber-attackers who could potentially release an excess dose of insulin causing a severe drop in blood sugar levels resulting in the patient being rendered unconscious.
Security concerns are not limited to wearable devices and devices implanted in the patient’s body. A cardiac defibrillator at a place of work could be hacked to deliver excessively high levels of shock, resulting in death.
Why is healthcare more susceptible to cyber-attack? One reason is that unlike credit card hacks which can be spotted almost instantaneously by sophisticated fraud detection algorithms used by the major credit card vendors like Visa, Amex and Mastercard, health care related hacks could go undetected for a long time. This gives the cyber criminals the luxury of doing harm or selling patient information on the black market without having to watch their backs.
What are healthcare companies doing to address this? GE acquired Wurldtech to enhance cybersecurity for its devices deploying sensors. While Wurldtech has focused on protecting Supervisory Control & Data Acquisition (SCADA) systems – which are IT systems used to manage power plants and refineries – the same technology could be re-purposed to protect GE wearable devices from cyber-attacks. GE’s competitor Siemens has invested in cyber-security startups like CyActive and CounterTack. Outside healthcare, GE has a range of businesses whose products rely on sensors for their reliable operation: aircraft engines, gas turbines, locomotives. Hence GE purchased a 10% stake in Platform-as-a-Service (PaaS) vendor Pivotal, developed its own Predix software (essentially an operating system for industrial equipment) and plans to run Predix over Pivotal’s data lake. The goal is to derive insights which can predict and prevent problems before they occur. While the big vendors like GE and Siemens are taking the right measures, the plethora of emerging wearable device makers must follow their lead or put themselves and us at considerable risk in the years to come.
Initially coined in the Hadoop context, a “data lake” referred to the ability to consolidate data (structured and semi-structured) from different data silos (say from CRM, ERP, supply chain) in its native format into Hadoop. You didn’t need to worry about schema, structure or other data requirements until the data needed to be queried or processed. For a typical eCommerce site it might be transactional data, data from marketing campaigns, clues from the online behavior of consumers. The goal of an eCommerce site might be to analyze all this data and send out targeted coupons and other promotions to influence prospective buyers.
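The “worry about schema later” idea (often called schema-on-read) can be shown with a minimal Python sketch. The in-memory list standing in for Hadoop, the source names and the sample records are all invented for illustration; a real data lake would hold files in HDFS or similar.

```python
import csv
import io
import json

# The "lake": raw records kept in their native format, tagged with a source.
lake = [
    ("crm", "json", '{"customer": "c42", "email": "c42@example.com"}'),
    ("orders", "csv", "order_id,customer,total\n1001,c42,19.99\n1002,c7,5.00"),
    ("clickstream", "json", '{"customer": "c42", "page": "/shoes"}'),
]


def read_records(source):
    """Parse raw entries for one source only at the moment they are queried."""
    for name, fmt, raw in lake:
        if name != source:
            continue
        if fmt == "json":
            yield json.loads(raw)
        elif fmt == "csv":
            yield from csv.DictReader(io.StringIO(raw))


def customers_to_target(min_total):
    """Customers who browsed the site and spent at least min_total."""
    browsed = {record["customer"] for record in read_records("clickstream")}
    spent = {}
    for row in read_records("orders"):
        customer = row["customer"]
        spent[customer] = spent.get(customer, 0.0) + float(row["total"])
    return sorted(c for c in browsed if spent.get(c, 0.0) >= min_total)


print(customers_to_target(10.0))
```

Nothing is converted or validated when the data lands in the lake; the structure is imposed only inside `read_records`, when the coupon-targeting query actually runs.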
Later, software companies like Pivotal appropriated this term. EMC, the storage vendor behind Pivotal, came up with its own marketing spin on a data lake which involved EMC ViPR with Isilon storage on the back-end. Not to be outdone, HDS acquired Pentaho and could make a claim that Pentaho actually coined the term “data lake”. Microsoft marketing uses the term “Azure data lake” to refer to a central repository of data where data scientists can use their favorite tools to derive insights from the data. The analyst firm Gartner cautioned that the lack of oversight (“governance” if you want to use big words) of what goes into the data lake could result in a “data swamp”. Hortonworks (the company selling services around Apache Hadoop) counters with the argument that technologies like the Apache Knox gateway (a security gateway available for Hadoop) make it possible to democratize access to corporate data in the data lake while maintaining compliance with corporate security policies.
Who actually uses a data lake today? Besides Google and Facebook? I’d be curious to know. In the interim deriving insights via Hadoop analytics on data wherever it resides (whether it be on a NetApp FAS system or on some other networked storage) may be the right first step. I’d welcome input from readers who use data lakes today to solve business related problems.
On the news tonight I learned that the Indian spacecraft “Mangalyaan”, built at a cost of $73.5M, had reached Mars orbit after an 11-month trek, making India the first nation to succeed in its maiden attempt to send a spacecraft to Mars. Granted, spacecraft and Mars rovers aren’t connected to the internet so can’t be categorized under the Internet of Things (IoT) today, but who knows what may come to pass years from now…
The analyst firm Gartner claims that 26 billion IoT-ready products will be in service by the year 2020. Sensors are everywhere (dare I say “ubiquitous”?) – from the helmets worn by football players to aircraft engines, from smart-watches to internet tablets, from luxury BMWs to microwave ovens, from smart thermostats made by Nest to the 9 million smart meters deployed by PG&E right here in California. Companies like FedEx would like to embed sensors on all your packages so you can track their route to a destination from the comfort of your home office. Appliance vendors like Samsung, LG, Bosch and Siemens are embedding sensors in your consumer appliances – LG and Bosch would like to take photos of your fast-emptying refrigerator and send an email with a shopping list to your smartphone. Granted, this brings us to the realm of “nagging by appliance”. So it is going to be a cacophony of sensor data which thankfully we humans can’t hear but which we will need to analyze, understand and act on.
Sensors by themselves aren’t very useful. When was the last time you took notice of an auto-alarm blaring away on a busy street? Sensors need actuators (devices which can convert the electrical signal from a sensor into a physical action). Sensors may provide real-time status updates in the form of data, but there need to be tools in place to analyze the data and make business decisions. Examples of vendors who build such tools:
- Keen IO – offers an API for custom analytics – what this means for you and me is that if a business isn’t happy with analytics using existing tools but doesn’t want to build an entire analytics stack on its own, Keen IO meets it halfway and allows the business to collect/analyze/visualize events from any device connected to the internet.
- ThingWorx (acq by PTC, whose head coined the term “product as a service”). He has a point here. In the past we signed up with cellular providers for 2 years of cellphone coverage – now vendors like Karma are offering a better model where you pay-as-you-go for Wi-Fi – effectively relegating 2-year cellphone contracts to a thing of the past.
- Axeda (acq by PTC) – whose cloud makes sense of machine generated messages from millions of machines owned by over 150 customers.
- Arrayent – whose Connect Platform is behind toy company Mattel’s IoT toy network, enabling 8-year-old girls to chat with other 8-year-olds over the Mattel IM-Me messaging system.
- SmartThings (acq by Samsung) who offer an open platform to connect smart home devices
- Neul (acq by Huawei) specialized in using thin slices of the spectrum to enable mobile operators to manage the IoT and profit from it.
- Ayla Networks – hosts Wi-Fi-based weather sensors for a Chinese company.
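To make the sensor, analysis, actuator chain concrete, here is a minimal Python sketch of a thermostat-style loop. It is not modelled on any of the vendors above; the function and class names, the one-degree tolerance and the sample readings are all assumptions made for illustration.

```python
from statistics import mean


def decide(readings_c, target_c, tolerance_c=1.0):
    """Analysis step: turn a window of temperature readings into a command."""
    average = mean(readings_c)
    if average > target_c + tolerance_c:
        return "cool"
    if average < target_c - tolerance_c:
        return "heat"
    return "idle"


class Actuator:
    """Stand-in for a device that turns a command into a physical action."""

    def __init__(self):
        self.log = []

    def apply(self, command):
        # A real actuator would switch a relay or a valve here.
        self.log.append(command)


hvac = Actuator()
for window in ([25.0, 26.0, 27.0], [21.0, 21.5], [18.0, 19.0]):
    hvac.apply(decide(window, target_c=21.0))
print(hvac.log)
```

The sensor readings on their own change nothing; it is the decision function in the middle that turns data into an action, and that middle layer is what the vendors above are selling.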
On the networked storage side, each storage vendor has a different view. DataGravity recommends a selective approach to deciding which pieces of sensor data (from potentially exabytes of unstructured data) to store. EMC recommends customers buy EMC Elastic Cloud Storage appliances and store all sensor data on them (discarding nothing). Nexenta claims that “software-defined-storage” is the savior of the IoT. SwiftStack claims that cloud-based IoT using OpenStack Swift is the way to go.
I think it is naïve to assume that all IoT data will need to be preserved or archived for years to come. Data from sensors in aircraft engines may need to be preserved on low-cost disk storage in the event of future lawsuits resulting from air crashes, but there is little value in preserving a utility’s smart meter data for 7 years for regulatory reasons if the data can be analyzed in real time to understand consumer usage patterns, enable tiered pricing and the like. By the same reasoning, does any vendor really need to preserve sensor data from my $100 home microwave unit for years to come? However, I do expect cloud providers focusing on IoT to need SSD-based as well as HDD-based networked storage.
How about you? What is your view on networked storage and IoT? What unique capabilities do you feel need to be delivered to enable IoT in the cloud? Any and all civil feedback is welcome.
In the past, if your goal was to isolate applications (from a memory, disk I/O, network I/O and security perspective) on a physical server, you had one choice – run your application over a guest OS over a hypervisor. Each VM had its own guest OS, on top of which you had binaries/libraries and, on top of these, your applications. The downside of this solution is that if you ran 50 applications on a physical server you needed 50 VMs over the hypervisor with 50 instances of a guest OS. Fortunately for developers, Linux containers have had a recent resurgence and offer another alternative. If you are an application developer and want to package your code in Linux containers with the goal of being able to run it on any bare metal server or on any cloud provider, Docker (a startup with fewer than 50 employees) offers a way to make a Linux container easy to create and manage.
Benefits of Docker container technology:
- No need for many different guest operating systems on the same server. Instead you run a Docker engine over a Linux kernel (v3.8 or higher) on a 64-bit server and run your apps on binaries/libraries running over the Docker engine. This allows you to do away with the relatively expensive VMware licensing per server.
- Lower performance penalty than with traditional hypervisors (Red Hat KVM, Citrix Xen or VMware ESXi).
This is ideally suited for apps that are stateless and do not write data to a file system. At a high level, Docker containers make it easy to package and deploy applications over Linux. Think of a container as a virtual sandbox which relies on the Linux OS on the host server without the need for a guest OS. When an application moves from a container in host A to a container in host B the only requirement is that both hosts must have the same version of the Linux kernel.
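The host requirements mentioned above (a 64-bit server, a Linux kernel of v3.8 or higher, and matching kernels on both hosts) can be expressed as a small Python check. This is a sketch of the rules as stated in this post, not Docker’s own compatibility test; the function names are mine, treating “64-bit” as `x86_64` is an assumption, and so is comparing only the major and minor kernel numbers.

```python
import re

MIN_KERNEL = (3, 8)  # minimum kernel version quoted in this post


def kernel_version(release):
    """Extract (major, minor) from a release string like '3.13.0-24-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def host_can_run_docker(release, machine):
    """64-bit host (assumed to mean x86_64) with a new enough kernel."""
    return machine == "x86_64" and kernel_version(release) >= MIN_KERNEL


def can_move_container(release_a, release_b):
    """The rule stated above: both hosts run the same kernel version."""
    return kernel_version(release_a) == kernel_version(release_b)


print(host_can_run_docker("3.13.0-24-generic", "x86_64"))
print(can_move_container("3.13.0-24-generic", "3.13.0-30-generic"))
```

On a live Linux host, `platform.release()` and `platform.machine()` from the Python standard library return the two strings these functions expect.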
You may ask, how do containers differ from virtual machines? Containers and virtual machines (VMs) both isolate workloads on a shared host. Some would argue that containers don’t provide the level of security one could have with a VM. VMs also allow you to run a Windows app on a Linux host, something which is not possible with Docker containers. Container technology is actively used by Google, so much so that Google released Kubernetes into the open source community to help manage containers.
Rather than follow the model of the taxi industry which bitterly attacked ride sharing startup Uber, VMware is taking the high ground and embracing Linux containers and its proponent Docker – perhaps recalling Nietzsche’s words “That which does not kill us makes us stronger.”
You may wonder – why aren’t enterprises embracing containers and Docker? One issue with Linux containers is that if your application in the container needs access to data, the database hosting that data has to be housed elsewhere. This means the enterprise has to manage two silos – the container itself and the database for the container. This problem could be solved by giving every application running in a container its very own data volume where the database could be housed. ClusterHQ, an innovative startup, offers “Flocker” – a free and open source volume and container manager for Docker which aims to make data volumes in Direct Attached Storage (DAS) portable. ClusterHQ’s future roadmap includes continuous replication, container migration and Distributed Resource Scheduler (DRS) like services – which sound eerily similar to the capabilities offered by VMware vMotion or DRS – causing VMware to put the brakes on an all-out embrace of the Docker ecosystem. Perhaps VMware strategists recalled Billy Livesay’s song “Love can go only so far”.
Another startup, Altiscale, is looking into the problem of how to run Hadoop applications within Docker containers. In view of all this we can be sure of one thing: Linux containers and Docker are here to stay, and it’s just a question of when (not if) enterprises begin adopting this new way of achieving multi-tenancy on a physical server.