The promise of Software Defined Networking (SDN) was to abstract the human network manager from vendor-specific networking equipment. SDN promised network managers at cloud providers and webscale hosting companies a utopian world where they could define network capacity, membership and usage policies and have those magically pushed down to the underlying routers/switches regardless of which vendor made them. It offered service providers a way to add new functionality and to reconfigure and update existing routers/switches using software, rather than allocating shrinking CAPEX budgets to newer router/switch hardware. It proposed to achieve all this by separating the control plane from the network data plane: the SDN controller was to have a big-picture view of the needs of the applications and translate those needs into appropriate network bandwidth.
Just as combustion-engine auto-makers panicked at the dawn of electric cars, proprietary router/switch vendors saw their ~60% gross margins at risk from SDN and hurriedly came up with their own interpretations of it. Cisco's approach, based on technology from the "spin-in" of Insieme Networks, is that if you as a customer want the benefits of SDN, Cisco will sell you a new application-aware switch (Nexus 9000) running an optimized new OS (available in 2014), built on merchant silicon (Broadcom Trident II), with OpenFlow, OpenDaylight controllers and a control plane that is de-coupled from the data plane. It assumes that customers will live with the lack of backward compatibility with older Cisco hardware like the Nexus 7000. There was a silver lining to this argument: should you choose to forgo the siren song of open hardware and merchant silicon and return to the Cisco fold, you will be rewarded with an APIC policy controller (in 2014) which will manage compute, network, storage, applications and security as a single entity. APIC will give you visibility into application interaction and service-level metrics. Cisco also claims that its Application Centric Infrastructure (ACI) switching configuration will lower TCO by eliminating the per-VM tax imposed by competitor VMware's network virtualization platform NSX and by reducing dependence on the VMware hypervisor. VMware, with Nicira under its belt, will of course disagree and have its own counter-spin.
Juniper's approach was to acquire Contrail and offer Contrail (commercial version) and OpenContrail (open source version) instead of OpenDaylight. This is Linux-based network overlay software designed to run on commodity x86 servers, aiming to bridge physical networks and virtual computing environments. Contrail can use OpenStack and CloudStack for orchestration but won't support OpenFlow.
Startup Big Switch Networks (the anti-overlay-software startup) has continued to use OpenFlow to program switches - supposedly 1,000 switches per controller. Once considered the potential control-plane partner of the major router/switch vendors, it has been relegated to a secondary role, quite possibly because Cisco and Juniper have no intention of giving up their cozy gross margins to an upstart. Another startup, Plexxi (the anti-access-switch startup), relies on its own SDN controller and switches connected together by wave division multiplexing (WDM). Its approach is the opposite of that taken by overlay software like Contrail, since it is talking about assigning a physical fiber to a flow.
Where do SSDs play in all this?
Startup SolidFire makes iSCSI block storage in the form of 1U arrays crammed with SSDs and interconnected by 10GbE. Service providers seem to like the SolidFire approach as it offers them a way to set resource allocation per user (read: IOPS per storage volume) for the shared storage. Plexxi, as noted above, has its own line of WDM-connected switches and its own SDN controller with software connectors. Plexxi and SolidFire have jointly released an interesting solution involving a cluster of all-flash storage arrays from SolidFire and a Plexxi SDN controller managing Plexxi switches.
It appears that the Plexxi connector queries the SolidFire Element OS (cluster manager), learns about the cluster, converts this learned information into relationships ("affinities" in Plexxi-speak) and hands them down to a Plexxi SDN controller. The controller in turn manages Plexxi switches sitting atop server racks. What all this buys a service provider is a way to extend array-level quality-of-service (QoS) from SolidFire to network-level QoS across the Plexxi switches.
While the big switch vendors are duking it out (technology from Insieme versus Contrail, expensive spin-ins versus acquisitions), their service provider customers like Colt, ViaWest (a customer of Cisco UCS servers), Databarracks and others who use SolidFire arrays are looking with interest at solutions like the Plexxi-SolidFire combination mentioned above, which promises tangible ROI from deploying SDN. Vendors selling high-margin switches would do well to notice that the barbarians are at the gates and the citizenry of service providers is quietly preparing to embrace them.
If you are a service provider or enterprise considering deploying private clouds using OpenStack (an open source alternative to VMware vCloud) then you are in the company of other OpenStack adopters like PayPal and eBay. This article considers the value of SSDs to cloud deployments using OpenStack (not Citrix CloudStack or Eucalyptus).
Block storage & OpenStack: If your public or private cloud supports a virtualized environment where you want up to a terabyte of disk storage to be accessible from within a virtual machine (VM), such that it can be partitioned/formatted/mounted and stays persistent till the user deletes it, then your options for block storage include any storage for which OpenStack Cinder (the OpenStack project for managing storage volumes) supports a block storage driver. Open source block storage options include:
- LVM (block storage exposed as logical volumes to the OS)
- Ceph (open source object storage mounted as a thinly provisioned block device)
- Zettabyte File System – ZFS (a file system and volume manager originally designed by Sun but now also available as OpenZFS)
Proprietary alternatives for OpenStack block storage include products from IBM, NetApp, Nexenta and SolidFire.
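As an illustration, enabling the open source LVM backend is a few lines in Cinder's configuration file. This is a minimal sketch only; the backend name and volume group are placeholders, and exact option names vary between OpenStack releases:

```ini
[DEFAULT]
# Which backend section(s) Cinder should load
enabled_backends = lvm-1

[lvm-1]
# Expose logical volumes from the named volume group over iSCSI
volume_driver = cinder.volume.drivers.lvm.LVMISCSIDriver
volume_group = cinder-volumes
volume_backend_name = LVM_iSCSI
```

A volume type can then be mapped to `volume_backend_name` so that tenants request this backend without knowing the underlying storage.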
Object storage & OpenStack: On the other hand, if your goal is to access multiple terabytes of storage, you are willing to access it over a REST API, and you want the storage to stay persistent till the user deletes it, then your open source options for object storage include:
- Swift – A good choice if you plan to distribute your storage cluster across many data centers. Here objects and files are stored on disk drives spread across numerous servers in the data center. It is the OpenStack software that ensures data integrity & replication of this dispersed data
- Ceph - A good choice if you plan to have a single solution to support both block and object level access and want support for thin-provisioning
- Gluster – A good choice if you want a single solution to support both block and file level access
Solid state drives (SSD) or spinning disk?
An OpenStack Swift cluster that has high write requirements would benefit from using SSDs to store metadata. Zmanda (a provider of open source backup software) has run benchmarks to prove that SSD based Swift containers outperform HDD based Swift containers especially when the predominant operations are PUT and DELETE. If you are a service provider looking to deploy a cloud based backup/recovery service based on OpenStack Swift and each of your customers is to have a unique container assigned to them, then you stand to benefit from using SSDs over spinning disks.
As a service provider if you are looking for an OpenStack cloud-in-a-box to compete with Amazon S3 consider vendors like MorphLabs. They offer turn-key solutions on Dell servers with storage nodes running NexentaStor (commercial implementation of OpenSolaris and ZFS), KVM hypervisor, VMs running Windows or Linux as the guest OS all on a combination of SSDs and HDDs. The use of SSDs allows MorphLabs to claim lower power consumption and price per CPU as compared to “disk heavy” (their term not mine) vBlock (from Cisco & EMC) and FlexPod (from NetApp) systems.
In conclusion if you are planning to deploy clouds based on OpenStack, SSDs offer you some great alternatives to spinning rust (oops disk).
Price per GB, performance and endurance are the yardsticks used to decide which solid state drive (SSD) to buy for use in a corporate data center or to use in a server or flash based storage array.
Are endurance numbers really comparable? Especially when you consider that one vendor might use consumer grade cMLC NAND while another might use enterprise grade eMLC NAND with vastly different program erase (P/E) cycles? What was the write amplification factor (WAF) – the ratio of SSD controller writes versus the host writes - that was used for the calculation? One vendor might quote endurance in TB or PB written while another might use Drive Writes Per Day (DWPD).
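One way to compare endurance ratings quoted in different units is to normalize them to total terabytes written. A back-of-the-envelope sketch, where the capacities, warranty periods and ratings are made-up numbers for illustration:

```python
def tbw(dwpd, capacity_gb, warranty_years):
    """Normalize a Drive Writes Per Day rating to total TB written
    over the warranty period."""
    return dwpd * capacity_gb * 365 * warranty_years / 1000.0

# Vendor A quotes 10 DWPD on a 400 GB drive over 5 years;
# Vendor B quotes 7,000 TBW outright on its datasheet.
vendor_a_tbw = tbw(10, 400, 5)
print("Vendor A:", vendor_a_tbw, "TBW")  # 7300.0 TBW
print("Vendor B:", 7000.0, "TBW")        # now directly comparable
```

Of course this says nothing about the underlying NAND grade or the WAF each vendor assumed, which is exactly why the normalized numbers should be treated as a starting point, not a verdict.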
One vendor might state fresh-out-of-the-box (FOB) performance numbers in IOPS on their datasheet while another might display steady-state numbers. One might use a synthetic benchmark tool like IOMETER, which focuses on queue depth (number of outstanding I/Os), block size and transfer rates, instead of an application-based benchmark like SysMark, which ignores all these criteria and focuses on testing how a real-world application might drive the SSD. Even with tools like IOMETER, whether IOMETER 2006, 2008 or 2010 is used will cause the results to vary. To add further complexity, the performance numbers will vary widely depending on whether they were measured at a queue depth of 3, 32, 64, 128 or 256. To compound matters, one vendor might be looking at compressible data (Word docs, spreadsheets) while another might be quoting numbers for incompressible data (.zip files or .jpeg); some might be using SandForce (now LSI) controllers which compress data before writing it to NAND, while others might not. So what is an SSD buyer to do? Get a drive from a vendor you trust, run your own benchmarks, whether synthetic or application based, and derive your own conclusions.
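The compressibility point is easy to demonstrate. The sketch below uses zlib as a stand-in for a compressing controller like SandForce; the payloads are arbitrary examples, not vendor test data:

```python
import os
import zlib

# Repetitive text stands in for compressible data (Word docs, spreadsheets);
# random bytes stand in for incompressible data (.zip files, .jpeg images).
compressible = b"quarterly sales figures for the western region " * 500
incompressible = os.urandom(len(compressible))

for label, payload in (("compressible", compressible),
                       ("incompressible", incompressible)):
    out = zlib.compress(payload)
    print("%s: %d bytes in -> %d bytes written" % (label, len(payload), len(out)))
```

A controller that compresses before writing does far fewer NAND writes for the first payload than for the second, which is why the data mix silently skews both IOPS and endurance figures.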
Now why do I find $ per GB as a yardstick amusing? Consider this analogy: could we convince a Japanese consumer that the cantaloupe we buy from a local store for $2.99 here in California is equivalent to a musk melon purchased in Japan for ¥16,000? From a price-per-melon point of view the difference is difficult for us to fathom, but to the buyer of the ¥16,000 melon it is apparently a premium worth paying.
Which brings us to two areas of focus where such a premium may be worth paying:
- High frequency trading
- High end NAS filers for enterprise use
Consider your first area of focus: high frequency trading (HFT). As we discussed before, it is extremely fast trading by traders using supercharged computers, complex programs and algorithms, and servers tactically co-located next to servers at a trading exchange. HFT traders capitalize on the split-second advantages offered by their unique systems. If you are building computer infrastructure for high frequency trading, then when it comes to memory devices that can handle transactional logging you have to choose between HDDs, NAND flash based SSDs and DRAM based SSDs.
For your second area of focus, you may be building high end NAS filers based on the Zettabyte File System and you will need an accelerator device for the ZFS Intent Log (ZIL). This accelerator device has to be optimized for synchronous writes and must exhibit two key characteristics:
- Extremely low latency
- Very high sustained write IOPS
In addition, the ZIL accelerator has to be accessible from both nodes of a 2-node cluster (for high availability) to allow both nodes to access the log. This precludes a single-ported SATA NAND flash SSD and requires that you consider dual-ported SAS SSDs. While a traditional NAND flash based SAS SSD will give you an advantage over the fastest HDD, you are still dealing with SSDs, which wear out over time. You are forced to decide between an eMLC NAND based SSD with 30,000 program/erase (P/E) cycles or a relatively expensive SLC NAND based SSD with 200,000 P/E cycles.
Ideally you want a Non-Volatile Memory (NVM) device with infinite endurance, ultra-low latency and extremely high sustained IOPS which stays consistent regardless of the IO distribution (random, sequential or mixed). Such a device would give you the best of both worlds: the SSD form factor you are familiar with and the infinite endurance of DRAM. However, DRAM based SSDs on the market today are more likely to exhibit latencies of ~23 microseconds, with ~65,000 sustained 4K read IOPS and ~50,000 sustained 4K write IOPS. I would contend that you'd be better served with emerging products that exhibit ultra-low latencies of less than 5 microseconds and 125,000 sustained 4K read IOPS. When I think of how far these emerging DRAM drives have come in terms of performance, I'm reminded of the Oldsmobile jingle: "This is not your father's Oldsmobile".
However, if I leave you with the conclusion that these blazing fast DRAM drives are relevant only for HFT and the ZFS ZIL, I would be doing you a disservice. I believe these emerging DRAM drives would be ideal devices for storing log files for write-intensive mail systems, logs for Microsoft Exchange and metadata for file systems which allow you to separate metadata from file data.
So what do you think? Is a DRAM drive in your future? All constructive feedback from integrators and storage practitioners is welcome.
Considering that more than half of all stock market trades in the USA come from high frequency trading, let us consider how the emergence of Serial Attached SCSI (SAS) SSDs helps this particular segment.
High Frequency Trading (HFT) involves looking for obscure signals in the market (including spikes in interest rates) using very high end servers, making trading decisions and conveying orders to the exchanges in microseconds (millionths of a second). Sophisticated HFT algorithms on servers placed close to the trading exchanges, combined with technologies like wireless microwave links instead of fiber optics, help ensure that trading decisions occur in microseconds. Such HFT applications typically use high end servers with multi-core processors along with hardware and firmware optimized for the lowest possible latencies. This is where solid state drives come in. With no moving parts, no noise, very low power consumption and a 500x improvement in IOPS (100,000 IOPS for an SSD versus 200 IOPS for a 15,000 rpm SAS HDD), what's not to like about SSDs? In terms of workloads, SSDs outperform HDDs when it comes to random small block (8 KB or 4 KB block size) workloads as found in apps like OLTP.
In enterprise SSDs you have a choice: Serial ATA (SATA) or Serial Attached SCSI (SAS). Servers used in HFT would benefit from SAS over SATA SSDs. Why is that you ask?
- Multi-path: SAS SSDs offer dual-porting (for high availability – if one path to the data on the SSD goes down there is another path to access the same data)
- Longer cable lengths (25 feet versus 3 feet for SATA) due to the use of higher signaling voltages by SAS.
- Greater transfer rates or throughput: Between 6 Gb/s and 12 Gb/s for SAS.
- Support for wide ports: Multiple paths between the server and the SSD device.
- Data integrity end-to-end: Achieved using cyclic redundancy checks (CRC) from the time data leaves the server travels to the SSD and returns back to the server.
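The end-to-end integrity idea in the last bullet can be sketched in a few lines: append a checksum when the data leaves one end and verify it at the other. This is an illustration of the principle only, using CRC32, not the exact CRC scheme the SAS protocol defines:

```python
import zlib

def frame_with_crc(payload: bytes) -> bytes:
    """Append a CRC32 so the far end can verify the payload arrived intact."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def verify(frame: bytes) -> bool:
    """Recompute the CRC over the payload and compare with the stored one."""
    payload, crc = frame[:-4], int.from_bytes(frame[-4:], "big")
    return zlib.crc32(payload) == crc

frame = frame_with_crc(b"4KB block on its way to the SSD")
assert verify(frame)              # intact frame checks out
corrupted = b"X" + frame[1:]      # a single byte flipped in flight
assert not verify(corrupted)      # the corruption is caught
print("corruption detected")
```

The same check-on-both-ends pattern is what lets SAS flag silent corruption anywhere on the path between server and drive.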
To understand the need for SAS SSDs we must first understand what an SSD is all about. An SSD comprises a device controller (made by Marvell, SandForce, Indilinx, PMC-Sierra etc.) behind which is NAND flash memory (made by Micron, Samsung, Toshiba etc.) managed by a management system within the enclosure of the SSD.
NAND flash memory refers to non-volatile memory (contents are retained even when power to the circuit is shut off) built from a logic circuit called a NAND gate. The SSD's interface to the server could be SATA, SAS (6 Gb/s or 12 Gb/s) or PCIe.
SSDs write information in a sequential manner into NAND flash in contiguous blocks (each block has multiple pages, each ~8 KB in size). Unlike mechanical HDDs, which can merrily overwrite information in place, SSDs have a quirk: to reclaim a page the SSD has to erase the entire block where the page resides. (A human analogy: you and your significant other are enjoying a candle-lit dinner in a restaurant when the maître d'hôtel walks over to request that you move to another location, because a celebrity party needs to be seated at a contiguous set of tables within the same section.) In SSD terminology the flash controller (the maître d'hôtel in our example) is doing "garbage collection": re-locating the data in the still-valid pages of a block to a new block prior to erasing the first block.
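The restaurant analogy can be put in code. A toy model of a block, with made-up geometry (real blocks have hundreds of pages), showing what the flash controller does to reclaim one stale page:

```python
PAGES_PER_BLOCK = 4

class Block:
    """A toy NAND block: pages can only be freed by erasing the whole block."""
    def __init__(self):
        self.pages = [None] * PAGES_PER_BLOCK   # None means erased/empty
        self.erase_count = 0                    # each erase consumes one P/E cycle

    def erase(self):
        self.pages = [None] * PAGES_PER_BLOCK
        self.erase_count += 1

def garbage_collect(victim, spare, stale_index):
    """Reclaim one stale page: relocate the valid pages, then erase the block."""
    moved = 0
    for i, data in enumerate(victim.pages):
        if i != stale_index and data is not None:
            spare.pages[i] = data   # the "diners" moved to new tables
            moved += 1
    victim.erase()                  # the whole block is erased, not just one page
    return moved

victim, spare = Block(), Block()
victim.pages = ["a", "b", "stale", "d"]
moved = garbage_collect(victim, spare, stale_index=2)
print(moved, "valid pages copied,", victim.erase_count, "erase cycle consumed")
```

Note that freeing one stale page cost three extra page writes plus a block erase, which is exactly where write amplification and endurance wear come from.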
Enterprise class SSDs are usually described in terms of:
- Endurance or program/erase (P/E) cycles (the more you write to a NAND flash cell the weaker it becomes, eventually getting marked as a bad cell). Wear leveling, usually done by the flash controller, is a way to prolong the life of NAND flash blocks by distributing writes evenly across blocks.
- Write amplification factor (the amount of data the flash controller inside the SSD actually writes to NAND in relation to the amount of data the host's SAS controller asked it to write)
- Drive writes per day – DW/D (10 to 25 DW/D) over a period of many years (3 to 5 years)
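These three numbers hang together: capacity, P/E cycles and write amplification factor together determine how much host data a drive can absorb. A back-of-the-envelope sketch with illustrative numbers (the WAF of 3 is assumed, not a vendor figure):

```python
def host_tbw(capacity_gb, pe_cycles, waf):
    """Host data (TB) the drive can absorb before NAND wear-out, given a
    write amplification factor (NAND writes / host writes)."""
    return capacity_gb * pe_cycles / waf / 1000.0

def drive_writes_per_day(capacity_gb, pe_cycles, waf, years=5):
    """Translate that endurance budget into a DW/D figure."""
    total_gb = host_tbw(capacity_gb, pe_cycles, waf) * 1000.0
    return total_gb / (capacity_gb * 365 * years)

# A 400 GB eMLC drive (30,000 P/E cycles) with a WAF of 3 over 5 years:
print(host_tbw(400, 30000, 3), "TB of host writes")           # 4000.0
print(round(drive_writes_per_day(400, 30000, 3), 1), "DW/D")  # 5.5
```

Halve the WAF and the usable endurance doubles, which is why controllers that compress or coalesce writes can quote better DW/D figures from the same NAND.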
When comparing SAS SSDs you’d look at features like:
- Type of NAND (Usually MLC, eMLC or cMLC) used.
- Performance at “steady state” not just out-of-the-box.
- Mean time between failure (MTBF) and mean time to data loss.
- Sustained read/write throughput in MB/s.
- Read/write IOPS with 4 KB random operations
- IOPS with a mixed workload of read and writes.
- Encryption (128 bit or 256 bit AES compliant)
- Unrecoverable bit error rates
- Protection for any in-flight data in the event of a sudden power loss to the SSD
- Extent to which SMART (Self-Monitoring, Analysis and Reporting Technology) attributes are supported
SAS SSDs are a good choice wherever you have enterprise class servers with workloads that have write-once read-many characteristics. In conclusion, if you have high end applications like HFT running on enterprise class servers you need enterprise class SAS SSDs. This is not to imply that only the HFT segment benefits from SAS SSDs, the same benefits can also be experienced across other verticals like Web 2.0 and cloud computing.
Microservers came into the mainstream when cloud providers realized that they don’t need beefy power-hungry servers if they are not running heavy computing workloads. If a provider is to run less demanding compute jobs like serving contact information up to a user on your website, why deal with the floor tile space, power consumption, HVAC costs associated with high end servers? Why not benefit from the power savings associated with Intel Atom or ARM based servers?
For instance, HP runs a portion of its www.hp.com website on its own microservers using the Intel "Centerton" Atom S1200 processors, with appreciable power savings. HP claims that microservers can consume "89 per cent less energy and 60 per cent less space", assuming 1,600 Moonshot Calxeda EnergyCore microservers (based on an ARM SoC) crammed into half a server rack to do the job typically done by 10 racks of 1U servers.
HP markets the microservers as suitable for hyperscale workloads. A hyperscale workload is a lightweight workload that has to be done in large numbers and so requires scaling from a few servers to several thousand servers. To appreciate the low power consumption of ARM based servers, check out this article on "bicycle powered ARM servers".
One concern that makes cloud providers hesitate to use ARM based microservers is whether legacy apps written for x86 based servers can be ported over to ARM based microservers. To address this, a UK-based company called Ellexus sells a product called Breeze which makes it easy to migrate applications from x86 based servers to ARM based servers. What if you are not sure that Breeze can do the job for you? Ellexus partners with the cloud provider Boston, which offers an ARM-as-a-cloud service, so you can actually try migrating your apps using Breeze without the initial CAPEX of buying ARM based microservers for your in-house datacenter.
Dell offers the Intel Xeon E3-based Dell PowerEdge C5220 microservers. IBM is talking about Hadoop on the P5020.
To get further savings and cut out the 25% to 30% gross margins made by HP and Dell, consider going directly to their Original Design Manufacturers (ODMs) like Quanta. The server arm of Taiwan-based, $37bn Quanta Computer makes "white box" servers for big names like Google, Facebook, Amazon, Rackspace, Yahoo and Baidu. The Quanta S910 series is an example of a microserver line using Intel Xeon processors. Quanta also addresses the Facebook-led Open Compute Project, which is of interest to cloud operators like Rackspace. The Open Compute Project, and specifically Facebook's "Group Hug" motherboard spec, aims to support different vendors' CPUs on the same system. This news will send shivers through the boardrooms of Intel and AMD but is good for you, the consumer, as you are not beholden to one CPU vendor.
At a time when major server vendors like IBM are trying to exit the x86 server business it may make sense to cut out the big name server vendors and deal directly with their suppliers if not for savings, at least to ensure a long term consistent roadmap with ever more computing capacity and lower power and space consumption.
Most commercial switches, routers and firewalls in your datacenter support NetFlow (network flow logs) and in some cases IPFIX (the next-gen replacement for NetFlow version 9). Why not use NetFlow to alert you to cyber threats like worms, botnets and Advanced Persistent Threats (APTs)?
The idea of using NetFlow is not a new one. Argonne National Laboratory has been using NetFlow to detect zero-day attacks since 2006.
An Advanced Persistent Threat (APT) starts by mining employee data from Facebook, LinkedIn and other social media sites, then focuses on stealing a corporation's intellectual property, using innocuous applications like Skype to move the content around. APTs fly under the radar of signature-based perimeter security appliances like firewalls and Intrusion Detection Systems (IDS). However, you can use NetFlow/IPFIX to identify APTs by comparing flows in the NetFlow/IPFIX collector with a host reputation database offered by cloud services like McAfee GTI. The actual ingest of the host reputation database and comparison with flows would involve a tool like Plixer Scrutinizer™. By blocking traffic going to the known compromised hosts (which host the APT command-and-control malware) you neutralize the goal of the adversary who sent the APT into your network.
Why bother with IPFIX (next-gen NetFlow) when there are older versions of NetFlow?
You can export URL information via IPFIX (using vendor extensions supported by the standard). This allows you to determine what URL a user clicked on before succumbing to malware, and how many other people clicked on the same bad URL. Products which export URL information via IPFIX include Ntop nProbe, Dell SonicWALL and Citrix AppFlow.
For Voice-over-IP (VoIP) traffic you can export details like caller-id, codec, jitter and packet loss.
Why use dedicated NetFlow/IPFIX sensors when routers/switches/firewalls may suffice?
Even router vendors like Cisco recognize that customers who buy high end routers may not want to expend expensive CPU cycles on NetFlow/IPFIX generation, nor rely on sampled NetFlow, which makes it unusable for cyber-security applications. The need is for offload appliances that produce packet-accurate, non-sampled NetFlow/IPFIX. Cisco's own NetFlow Generation Appliance (NGA) is an option. The older NGA 3140 tops out at 120,000 flows per second (fps). Higher end offload appliances from some vendors can sustain 250,000 to 500,000 fps to keep up with busy 10 Gb network pipes.
So we have a way to generate NetFlow/IPFIX but what about the analytics needed to actually detect cyber-attacks? While you may have a traditional SIEM (HP ArcSight ESM, McAfee ESM, IBM QRadar) or a tool like Splunk it is unrealistic to send NetFlow/IPFIX data at very high rates into these systems. A better way would be to trim down the traffic and analyze it on the wire before sending it to the SIEM.
NetFlow Logic, a Bay Area startup, has a high-volume NetFlow processing product, "NetFlow Integrator", which can ingest NetFlow/IPFIX records and process the stream in flight using an in-memory database. The product scales its throughput with the number of underlying server cores; for instance, a 16-core server allows it to scale throughput to over 500,000 flows per second.
The product is not a NetFlow collector but is categorized as a NetFlow/IPFIX Mediator (see RFC 5982). NetFlow Integrator reduces NetFlow data by consolidating information into "conversations" rather than individual flows within a conversation. Flow records are processed by one or more rules (canned or custom, created using a GUI/SDK) which apply their own logic to each flow record. These rules can aid in the following types of detection:
Detection of Botnet:
A user would load a list of known Command & Control servers (possibly obtained from sites like Emerging Threats or from your own private source) into the rule. Every incoming NetFlow record is examined to determine if the source or destination IP address matches this list. If there is a match, the matched information is forwarded to the SIEM, which in turn will alert a security analyst if any botnet slaves are detected on the network.
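The rule's logic amounts to a set-membership test per flow record. A minimal sketch, using documentation-range addresses; the blocklist entries and record fields are invented for illustration, not NetFlow Integrator's actual rule format:

```python
# Hypothetical blocklist of known command & control hosts
# (in practice loaded from a feed like Emerging Threats).
known_cnc = {"203.0.113.7", "198.51.100.23"}

def botnet_matches(flow_records, blocklist):
    """Return the flow records whose source or destination is a known C&C host."""
    return [rec for rec in flow_records
            if rec["src"] in blocklist or rec["dst"] in blocklist]

records = [
    {"src": "10.1.1.4", "dst": "203.0.113.7"},    # a slave beaconing out
    {"src": "10.1.1.9", "dst": "93.184.216.34"},  # ordinary traffic
]
alerts = botnet_matches(records, known_cnc)
print(len(alerts), "record(s) forwarded to the SIEM")
```

Only the matching record reaches the SIEM, which is the traffic reduction the mediator approach is after.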
NetFlow Integrator has rules to identify scanners doing "port sweeps" of your network. It can also look for data exfiltration and for an infected host that starts proliferating across the internal network. A custom algorithm detects when a client suddenly starts behaving like a server, something that can't be done by signature-based firewalls.
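The client-turned-server idea can be approximated by watching which hosts begin accepting connections from many distinct peers. A toy sketch only; the real product's algorithm is proprietary, and the threshold and field names here are invented:

```python
from collections import defaultdict

def clients_turned_servers(flows, threshold=3):
    """Flag hosts that start accepting connections from many distinct peers."""
    inbound_peers = defaultdict(set)
    for flow in flows:
        # "initiator" marks which side opened the connection (e.g. sent the SYN)
        if flow["initiator"] == flow["src"]:
            inbound_peers[flow["dst"]].add(flow["src"])
    return {host for host, peers in inbound_peers.items()
            if len(peers) >= threshold}

# Four different internal hosts suddenly connect TO 10.9.9.9:
flows = [{"src": "10.2.%d.1" % n, "dst": "10.9.9.9",
          "initiator": "10.2.%d.1" % n} for n in range(4)]
print(clients_turned_servers(flows))  # {'10.9.9.9'}
```

A workstation that normally only dials out but suddenly has four peers dialing in looks a lot like malware opening a listener, which is the behavioral signal signature-based gear misses.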
In conclusion, to detect malware/botnets/APTs use NetFlow/IPFIX, which is something your routers, switches and firewalls support today. Keep your existing SIEM in place but introduce an IPFIX offload appliance, especially if you have large 10 Gb network pipes and don't want to burden your routers. Use a tool like that from NetFlow Logic to analyze NetFlow/IPFIX records on the wire, and use your SIEM for alerting and remediation.