Microservers came into the mainstream when cloud providers realized that they don’t need beefy, power-hungry servers if they are not running heavy compute workloads. If a provider only runs less demanding jobs, like serving contact information to a user on a website, why deal with the floor tile space, power consumption and HVAC costs associated with high-end servers? Why not benefit from the power savings of Intel Atom or ARM-based servers?
For instance, HP runs a portion of its www.hp.com website on its own microservers using the Intel “Centerton” Atom S1200 processors, with appreciable power savings. HP claims that microservers consume “89 per cent less energy and 60 per cent less space”. This assumes 1600 Moonshot Calxeda EnergyCore microservers (based on an ARM SoC) crammed into half a server rack to do the job typically done by 10 racks of 1U servers.
HP markets the microservers as suitable for hyperscale workloads. A hyperscale workload is a lightweight workload that has to be done in very large numbers, and so requires scaling from a few servers to several thousand. To appreciate the low power consumption of ARM-based servers, check out this article on “bicycle powered ARM servers”.
One concern that makes cloud providers hesitate to use ARM-based microservers is whether legacy apps written for x86 servers can be ported over. To address this, a UK-based company called Ellexus sells a product called Breeze which makes it easy to migrate applications from x86-based servers to ARM-based servers. What if you are not sure that Breeze can do the job for you? Ellexus partners with a cloud provider, Boston, which offers an ARM-as-a-cloud service so you can actually try migrating your apps using Breeze without the initial CAPEX of buying ARM-based microservers for your in-house datacenter.
Dell offers the Intel Xeon E3-based Dell PowerEdge C5220 microservers. IBM is talking about Hadoop on the P5020.
To get further savings and cut out the 25% to 30% gross margins made by HP and Dell, consider going directly to their Original Design Manufacturers (ODMs) like Quanta. Taiwan-based Quanta, the server arm of $37bn Quanta Computer, makes “white box” servers for big names like Google, Facebook, Amazon, Rackspace, Yahoo and Baidu. The Quanta S910 series is an example of a microserver using Intel Xeon processors. Quanta addresses the Facebook Open Compute Project, which is of interest to cloud operators like Rackspace. The Open Compute Project, and specifically Facebook’s “Group Hug” motherboard spec, aims to support different vendors’ CPUs on the same system. This news will send shivers through the boardrooms of Intel and AMD, but is good for you, the consumer, as you are not beholden to one CPU vendor.
At a time when major server vendors like IBM are trying to exit the x86 server business it may make sense to cut out the big name server vendors and deal directly with their suppliers if not for savings, at least to ensure a long term consistent roadmap with ever more computing capacity and lower power and space consumption.
Most commercial switches, routers and firewalls in your datacenter support NetFlow (network flow logs) and in some cases IPFIX (the next-generation replacement for NetFlow version 9). Why not use NetFlow to alert you to cyber threats like worms, botnets and Advanced Persistent Threats (APTs)?
The idea of using NetFlow this way is not a new one: Argonne National Laboratory has been using NetFlow to detect zero-day attacks since 2006.
An Advanced Persistent Threat (APT) starts by mining employee data from Facebook, LinkedIn and other social media sites, and focuses on stealing a corporation’s intellectual property using innocuous applications like Skype to move the content around. APTs fly under the radar of signature-based perimeter security appliances like firewalls and Intrusion Detection Systems (IDS). However, you can use NetFlow/IPFIX to identify APTs by comparing flows in the NetFlow/IPFIX collector with a host reputation database offered by cloud services like McAfee GTI. The actual ingest of the host reputation database and comparison with flows would involve a tool like Plixer Scrutinizer™. By blocking traffic going to the known compromised hosts (which host the APT command-and-control malware) you neutralize the goal of the adversary who sent the APT into your network.
Why bother with IPFIX (next-gen NetFlow) when there are older versions of NetFlow?
You can export URL information via IPFIX (using vendor extensions supported by IPFIX). This allows you to determine what URL a user clicked on before succumbing to malware. How many other people clicked on the same bad URL? Products which export URL information via IPFIX include Ntop nProbe, Dell SonicWALL, Citrix AppFlow.
For Voice-over-IP (VoIP) traffic you can export details like caller-id, codec, jitter and packet loss.
Why use dedicated NetFlow/IPFIX sensors when routers/switches/firewalls may suffice?
Even router vendors like Cisco recognize that customers who buy high-end routers may not want to expend expensive CPU cycles on NetFlow/IPFIX generation, nor rely on sampled NetFlow, which makes it unusable for cyber-security applications. The need is for offload appliances that produce packet-accurate, non-sampled NetFlow/IPFIX. Cisco’s own NetFlow Generation Appliance (NGA) is an option. The older NGA 3140 tops out at 120,000 flows per second (fps). Higher-end offload appliances from some vendors can sustain 250,000 to 500,000 fps to keep up with busy 10 Gb network pipes.
So we have a way to generate NetFlow/IPFIX but what about the analytics needed to actually detect cyber-attacks? While you may have a traditional SIEM (HP ArcSight ESM, McAfee ESM, IBM QRadar) or a tool like Splunk it is unrealistic to send NetFlow/IPFIX data at very high rates into these systems. A better way would be to trim down the traffic and analyze it on the wire before sending it to the SIEM.
NetFlow Logic, a Bay Area startup, has a high-volume NetFlow processing product, “NetFlow Integrator”, which can ingest NetFlow/IPFIX records and process the stream in flight using an in-memory database. The product scales its throughput with the number of underlying server cores; for instance, a 16-core server would allow it to scale throughput to over 500,000 fps.
The product is not a NetFlow collector but is categorized as a NetFlow/IPFIX Mediator (see RFC 5982). NetFlow Integrator reduces NetFlow data by consolidating information into “conversations” rather than the individual flows within a conversation. Flow records are processed by one or more rules (canned, or custom ones created using a GUI/SDK), each with its own logic to apply to each flow record. These rules can aid in the following types of detection:
Detection of botnets:
A user would load a list of known Command & Control servers (possibly obtained from sites like Emerging Threats or from your own private source) into the rule. Every incoming NetFlow record is examined to determine if the source or destination IP address matches this list. If there is a match this matched information is forwarded to the SIEM. The SIEM in turn will alert a security analyst if any botnet slaves are detected on the network.
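The core of such a rule is a fast membership check of each flow record’s endpoints against the Command & Control list. A minimal Python sketch of the idea (record fields are hypothetical; NetFlow Integrator’s real rule engine is configured through its GUI/SDK):

```python
def match_cnc(flow_records, cnc_ips):
    """Return flow records whose source or destination IP appears on a
    known Command & Control list (e.g. an Emerging Threats feed)."""
    cnc = set(cnc_ips)  # set membership is O(1) per flow record
    return [rec for rec in flow_records
            if rec["src"] in cnc or rec["dst"] in cnc]


# Matched records would be forwarded to the SIEM for alerting.
flows = [
    {"src": "10.0.0.5", "dst": "203.0.113.9"},   # talks to a C&C server
    {"src": "10.0.0.6", "dst": "10.0.0.7"},      # benign internal flow
]
hits = match_cnc(flows, ["203.0.113.9"])
```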
NetFlow Integrator has rules to identify scanners doing “port sweeps” of your network. It can also look for data exfiltration, and spot an infected host that starts proliferating on the internal network: a custom algorithm detects when a client suddenly starts behaving like a server, something signature-based firewalls can’t do.
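The vendor’s actual algorithm is not public; one way such a client-turned-server heuristic could work is to baseline which ports each internal host legitimately serves, then flag hosts that suddenly answer inbound connections on ports never seen before. A sketch under those assumptions (data shapes and the threshold are invented for illustration):

```python
from collections import defaultdict

def new_listeners(baseline_ports, current_flows, threshold=3):
    """Flag internal hosts that start accepting inbound connections on
    ports they never served during the baseline period.

    baseline_ports: {host_ip: set of ports served historically}
    current_flows:  iterable of (client_ip, server_ip, server_port)
    """
    unseen = defaultdict(set)
    for client_ip, server_ip, server_port in current_flows:
        if server_port not in baseline_ports.get(server_ip, set()):
            unseen[server_ip].add(server_port)
    # A host answering on several previously unseen ports looks like a
    # client that has suddenly started behaving like a server.
    return {host for host, ports in unseen.items() if len(ports) >= threshold}
```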
In conclusion: to detect malware, botnets and APTs, use NetFlow/IPFIX, which is something your routers, switches and firewalls support today. Keep your existing SIEM in place, but introduce an IPFIX offload appliance, especially if you have large 10 Gb network pipes and don’t want to burden your routers. Use a tool like that from NetFlow Logic to analyze NetFlow/IPFIX records on the wire, and use your SIEM for the alerting and remediation.
Snort and Intrusion Detection:
Snort is a widely used open-source network Intrusion Detection System (NIDS) capable of both real-time traffic analysis and packet logging. The reason for its popularity is that it is open source and effective at detecting everything from port scans and buffer overflows to OS fingerprinting attempts. Known attacks follow certain activity patterns, and these are captured in “signatures” available from the open-source community as well as from Sourcefire.
In the Snort architecture, the packet sniffer, as the name suggests, eavesdrops on network traffic; the preprocessor checks packets against plug-ins to determine if the packets exhibit a certain behavior; and the detection engine takes incoming packets and runs them through a set of rules. If a rule matches what is in the packet, an alert is generated. The alert may go to a log file or to a MySQL or PostgreSQL database.
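Snort’s real rules carry protocol, port, flag and content options, but the detection engine’s match-and-alert loop can be caricatured in a few lines of Python (packet and rule structures are invented for illustration, not Snort’s actual formats):

```python
def detect(packets, rules):
    """Run each packet through a rule set and emit an alert per match.

    rules: (rule_id, payload_substring) pairs standing in for real Snort
    signatures; the returned alert list stands in for the log file or
    database that Snort would write to.
    """
    alerts = []
    for pkt in packets:
        for rule_id, pattern in rules:
            if pattern in pkt["payload"]:
                alerts.append({"rule": rule_id, "src": pkt["src"]})
    return alerts
```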
Using Snort on big data stored in Hadoop
What if today you received new Snort signatures you didn’t have 3 months ago and want to use them to detect zero-day attacks (unknown exploits) in historical packet capture data? This historical packet capture data may be in archive storage within your corporate data center or located on cloud storage like Amazon S3.
One solution is to analyze full packet captures using Apache Pig (a tool that abstracts a user from the complexities of MapReduce). If you aren’t comfortable using MapReduce but have a few days of packet capture data on your laptop and know how to write queries against this local capture data, you can then transition your queries to a Hadoop cluster containing weeks or months of packet capture data using an open-source tool called PacketPig.
PacketPig (an open-source project located on GitHub) offers many loaders (Java programs which provide access to specific info in a packet capture), one of which is SnortLoader(), which allows you to analyze months of packet capture data dispersed across Hadoop nodes. The way you detect a zero-day attack using PacketPig is by using SnortLoader() to scan archived packet capture data using old Snort signatures (from an old snort.conf file), then scanning it a second time using the latest Snort signatures. After filtering out signatures that appear in both scans, what you have left are zero-day attacks. This is yet another example of how Hadoop running on commodity servers with direct-attached storage can help provide a cyber-security solution for zero-day attacks.
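Conceptually, the two-scan comparison reduces to a set difference: anything flagged only by the new signatures was undetectable at capture time. A sketch of that final step (the alert tuples are hypothetical; PacketPig does this at Hadoop scale with Pig scripts rather than in-memory Python):

```python
def zero_day_candidates(old_scan_alerts, new_scan_alerts):
    """Keep only alerts raised by the latest signatures and not by the
    old ones: traffic that was malicious all along but only became
    detectable once the new signatures arrived.

    Each alert is modeled as a (signature_id, packet_id) tuple.
    """
    return set(new_scan_alerts) - set(old_scan_alerts)
```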
Why secure the Hadoop cluster?
Hadoop has 2 key components: the Hadoop Distributed File System (HDFS) and MapReduce. HDFS consists of geo-dispersed DataNodes accessed via a NameNode. Hadoop was designed without much security in mind, so a malicious user can bypass the NameNode and access a DataNode directly; if he/she knows the block location of data, that data can be retrieved or modified. In addition, data being sent from a DataNode to a client can easily be sniffed using generic packet-sniffing technology.
Securing the Hadoop cluster requires that you understand that authentication (is the user who claims to be Bob really Bob?) differs from authorization (is user Bob authorized to submit HDFS or MapReduce jobs?).
To address the question of authentication, Hadoop recommends the use of Kerberos, which is bundled with Hadoop. Unlike network firewalls, which assume that all dangers reside outside the corporate network, Kerberos operates on the assumption that network connections are unreliable and a possible weak link. How does Kerberos itself work? In addition to the client and the server there is a Kerberos Key Distribution Center (KDC), which has 2 components: an Authentication Server and a Ticket Granting Server.
- A client who wishes to access a server will first authenticate itself with the Authentication Server (AS). The AS will return an authentication ticket called a Ticket-Granting-Ticket (TGT).
- The client now uses the TGT to request a service ticket from the Ticket Granting Server (TGS) in the KDC.
- Lastly the client uses the service ticket to authenticate itself with the server which provides the desired service to the client. In a Hadoop cluster the server could be the NameNode or JobTracker.
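Real Kerberos exchanges encrypted tickets and session keys; but the three-step shape of the protocol above can be mocked in Python to make the flow concrete (all names and token formats below are invented for illustration, not real Kerberos messages):

```python
class MockKDC:
    """Toy Key Distribution Center: an Authentication Server that issues
    TGTs, and a Ticket Granting Server that trades TGTs for service
    tickets."""

    def __init__(self, passwords):
        self._passwords = passwords  # principal -> shared secret
        self._valid_tgts = set()

    def authenticate(self, principal, secret):
        # Step 1: the AS checks the client's credentials, returns a TGT
        if self._passwords.get(principal) != secret:
            raise PermissionError("authentication failed")
        tgt = f"TGT:{principal}"
        self._valid_tgts.add(tgt)
        return tgt

    def get_service_ticket(self, tgt, service):
        # Step 2: the TGS trades a valid TGT for a service ticket
        if tgt not in self._valid_tgts:
            raise PermissionError("invalid TGT")
        return f"TICKET:{service}:{tgt}"


def server_accepts(ticket, service):
    # Step 3: the server (e.g. the NameNode) validates the service ticket
    return ticket.startswith(f"TICKET:{service}:")
```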
Benefits of using Kerberos for authentication:
- Kerberos helps you keep rogue nodes and rogue applications out of your Hadoop cluster.
It is also recommended that the Hadoop daemons themselves use Kerberos to authenticate to one another.
Encryption of data-in-motion can be achieved using SSL/TLS to encrypt data moving between nodes and applications. Encryption of data-at-rest at the file layer is recommended to protect against malicious users who obtain unauthorized access to DataNodes to inspect files. While the benefits of encryption are clear, encryption is very compute intensive. Intel has jumped in to address this with its own flavor of Apache Hadoop optimized to do AES encryption, provided your Hadoop nodes use Intel Xeon processors. Intel claims that the Intel Advanced Encryption Standard New Instructions (AES-NI) built into Intel Xeon processors can accelerate encryption performance in a Hadoop cluster by 5.3x and decryption performance by 19.8x.
Cynics would argue that Intel’s new-found interest in Hadoop has more to do with keeping low-cost ARM processor based microservers at bay and less to do with improving Hadoop security. Whatever the rationale behind it, speeding up encryption & decryption can only increase the use of these data protection techniques by the Hadoop user base.
Security in the future (a gateway to Hadoop)
Hortonworks (along with NASA, Microsoft and Cloudera) is promoting Knox Gateway, a perimeter security solution providing a single point of authentication for the Hadoop cluster. Clients who want to access the Hadoop cluster would first traverse the gateway, which itself resides in a DMZ. Time will tell how this new technology is embraced by the Hadoop user community.
ETL is another 3-letter acronym, standing for Extract, Transform & Load. ETL refers to extracting data from diverse sources (like relational databases and flat files), transforming (sorting, joining, synchronizing, cleaning) that data, and then loading it into a data warehouse. This workflow assumes you know beforehand the type of repetitive reports you would run on the data in the data warehouse.
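The three stages can be made concrete with a toy pipeline: a CSV flat file as the source, a couple of cleaning steps as the transform, and SQLite standing in for the warehouse (file layout and column names here are invented for illustration):

```python
import csv
import sqlite3

def etl(csv_path, db_path):
    """Tiny ETL job: Extract rows from a flat file, Transform them
    (clean and sort), and Load them into a warehouse table."""
    # Extract: pull raw rows out of the source flat file
    with open(csv_path, newline="") as f:
        rows = list(csv.DictReader(f))
    # Transform: drop rows with no amount, normalize types, sort by date
    cleaned = [
        {"date": r["date"].strip(), "amount": float(r["amount"])}
        for r in rows if r["amount"].strip()
    ]
    cleaned.sort(key=lambda r: r["date"])
    # Load: insert into the warehouse table (SQLite as a stand-in)
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS sales (date TEXT, amount REAL)")
    con.executemany("INSERT INTO sales VALUES (:date, :amount)", cleaned)
    con.commit()
    return con
```

A real ETL tool layers connectors, scheduling, error handling and lineage tracking on top of this skeleton; the skeleton itself is the same.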
Companies like Informatica, IBM, Oracle, SAS, Business Objects have built profitable businesses providing enterprise class ETL for huge volumes of data in heterogeneous environments. Now that open-source Hadoop on commodity x86 server hardware can import data from ERP, DBMS, XML files and export it after some transformations into a regular data warehouse, one is tempted to ask: Is ETL still relevant now that Hadoop does a lot of the ETL function?
To answer this question step back and ask yourself:
Did Hadoop replace my traditional data warehouse?
If your goal is to determine sales trends for your company, you are more likely to run queries on corporate data in the data warehouse. If you are the marketing dept of a company tasked with determining the RoI of a recent email campaign, you are more likely to query data residing in a data warehouse like Netezza than to direct your queries at the Hadoop cluster. So just as Hadoop didn’t replace your data warehouse, it’s not likely to replace your ETL tools either.
The data warehouse and ETL continue to exist; what changes is where and how you actually implement them. For instance, Amazon is offering “Redshift”, a data warehouse in the public cloud at prices of ~$1000 per TB; this takes away the need to spend $19,000 to $25,000 per TB to store the same data in a traditional data warehouse on big iron within your private data center. If you decide to go with Redshift you still need an ETL tool to get all of your on-premise data into Redshift in the public cloud.
As an enterprise you may want to merge your customer data currently residing on salesforce.com with your in-house financial systems. There again you’d need ETL tools possibly from vendors like SnapLogic.
The demarcation lines between what data is stored in-house versus in the cloud are getting blurrier every day. In this changing world the need for ETL continues to exist; what may change is who ends up providing the ETL functionality. A few years ago it was only Informatica or IBM, with high upfront costs in training, consulting and acquisition. Today it might be open-source tools like Pentaho Kettle or Talend, or commercial tools from SnapLogic, Syncsort and other vendors with more innovative approaches and lower acquisition costs.
Last week at RSA Conference 2013, I had the opportunity to attend some Chief Information Security Officer (CISO) round-table sessions (or the RSA version of it with ~100 people crammed in cramped seats staring at 4 CISOs on stage and praying that a fire alarm doesn’t set off a mad rush to the exits). Here is what I took away:
- Predicting potential risk-exposure to the firm and keeping the CEO’s name out of the front cover of the New York Times.
- Responding to the Board’s concerns about enterprise risk management. Reducing the risk to your CEO of “material weakness” in internal controls.
- Building business relationships with the CFO so that CISO budgets can be linked to corporate goals like innovation.
- Educating the C-suite on how to present on risk profile to share-holders and customers.
- Determining the level of risk that is acceptable to the firm as a cost of doing business.
- Dealing with shrinking budgets and intense scrutiny over new head-count in the CISO team.
- Using “loss avoidance” rather than ROI to get approval for projects such as Advanced Persistent Threat (APT) detection, Distributed Denial of Service (DDoS) mitigation, and big data. This could involve explaining to the CFO how the cost of a project is far lower than the cost of a data breach.
- Operationalizing cyber threat indicators to drive product selection, investments and training.
- Addressing the Bring Your Own Device (BYOD) issue.
- Working effectively with internal auditors who may often overstep their bounds in interpreting regulations like HIPAA or Sarbanes-Oxley (SOX).
- Managing up so that the Chief Technology Officer (CTO) or Chief Information Officer (CIO) will go to the c-suite to get you the budget you need.
- Tying CISO projects to network availability or uptime.
- Using technologies like Data Loss Prevention (DLP) to detect data loss via email.
- Implementing security for home grown applications by implementing coding guidelines at the start of an internal project.
- Sharing threat intelligence on topics like DDoS attacks across the organization and between like organizations.
What causes the most angst among CISOs?
- Dealing with geographically dispersed sales people.
- Having to effectively educate employees on proper security as opposed to relying solely on technologies like encryption.
- Having to build high performance teams of the best individuals with expertise in Virtual Desktop Infrastructure (VDI), application virtualization, data flow mapping and business flow mapping.
- Lack of visibility into the pedigree and origin of internal hardware and software.
- Creating contingency plans for legacy apps (especially in sectors like healthcare and education) that may not work with VDI and Citrix.
- Projects like encryption and two-factor authentication (2 out of 3 factors: something known, something possessed and something unique about a person).
- Developers who try to outsource bug management to a CISO and technologies like Web Application Firewall (WAF) instead of doing internal code reviews! Developers must understand that the WAF is for dealing with legacy code which cannot be rewritten in a cost-effective manner.
- Identity and Access Management (IAM) projects which have seemingly no end in sight.
What causes the least angst?
- Security for mobile devices – every vendor and his brother is offering a solution in this area.
- Noise around newly minted marketing terms like “big data”. Vendors don’t seem to realize that it is not big data itself but big data analytics and the resulting insights which are of use.
What challenges do CISOs face today?
- Fraud detection and IT security groups working in silos, with little or no interaction.
- Info-sec and audit departments not always in lock-step.
- Responding quickly to cyber-attacks and implementing damage control.
- Moving big data systems into the cloud as the economics of the cloud become more compelling.
What is most interesting to a CISO?
- Micro-virtualization capabilities where a micro virtual machine traps malware and analyzes it for the IT administrator.
- Industry wide ways of dealing with cyber-threats.
- The context around big data. For instance, while enterprise storage vendors would love to see you collect and archive petabytes of big data blindly, the CISO is more likely to use geo-location and HR data for correlation, and unlikely to see value in just blindly collecting and archiving such data for extended periods.
As a CIO you may get tasked by your CEO with identifying a solution for, say, 10,000 employees who need to move around the company but still have access to their desktop environment (applications, data). The employees might use a laptop computer one day and an iPad the next, but still expect to access all the same applications and have their data remain consistent across locations. Before you invite your incumbent storage vendor to give you a high-powered pitch on “why our enterprise storage is the only one for VDI”, you may wish to explore a few alternative solutions.
Virtual Desktop Infrastructure, or “VDI”, refers to the scenario where a user’s desktop no longer resides on a desktop or laptop computer but instead in a virtual machine running over a hypervisor (software that creates and runs virtual machines) on a high-end server in a corporate data center. The end user uses a “thin client”, which could be a notebook computer or even an iPad, to access compute resources from this centrally located server. All applications and data continue to reside on the central server.
VDI using VMware (or Microsoft or Citrix) hypervisors solves one part of the problem. You run the virtualization vendor’s hypervisor on a high end server and have the server assign virtual machines (VM) images to each desktop. The servers connect to direct attached storage (DAS) or to networked storage – which could be block storage accessed over an iSCSI or Fibre Channel SAN or file storage accessed over Ethernet. While this solution is elegant in its simplicity, it creates some new issues:
- When ~10,000 employees get into the office at 9 am and all of them bring up their desktops, the result is a “boot storm”. At the end of the quarter, when salespeople are trying to close deals and finance personnel are trying to close the books, there is contention for IO from all the desktops owned by finance and sales.
- While tools like VMware View Composer allow you to make thousands of clones of a single virtual machine, your networked storage is now under a heavy burden as it is expected to store images for these thousands of desktops. In addition, networked storage uses spinning disks, and disks like having data written to them sequentially so they can cache it and then write the data with minimal head movement. However, VDI results in servers sending IO streams which are small-block, random in nature and predominantly writes. This in turn makes poor use of the networked storage, resulting in poor desktop performance and unhappy users.
You ask yourself:
- Is there a way to better use my existing networked storage (NAS, SAN) by reducing IO bottlenecks?
- Should I do away with networked storage and use only direct-attached-storage (DAS)?
- Should I do away with storage altogether and rely on just RAM in my servers to serve up VDI?
A solution that addresses all of the above involves VMware (no reason you can’t use Microsoft Hyper-V) for server virtualization, Cisco UCS (or HP or Dell) blade servers for the compute, and 3rd-party software from Atlantis Computing or Virsto for the remaining magic. Be aware that if you go the VMware route you’ll be looking at, at a minimum:
- VMware vSphere for the virtualization
- VMware vCenter to configure, provision and manage your VMs
- VMware View Composer to take a single VM and clone it numerous times
- VMware View Planner to simulate client desktop workloads so you head off unhappy customers
- VMware View Persona Mgmt. to ensure that user data and settings are stored in a central profile
Here are 2 ways to address this problem:
- Deploy Atlantis Computing ILIO software – This physically runs in a VM on the hypervisor in the server. Logically it sits between the other VMs (supporting the virtual desktops) on the hypervisor and the VDI storage, which would be your networked storage from EMC, NetApp, HP or IBM. ILIO provides an NFS or iSCSI interface to the VMs, and NFS, iSCSI or Fibre Channel interfaces to your existing networked storage. ILIO processes all the VDI traffic locally, with Windows NTFS content awareness, within the server. This means less traffic goes to your networked storage. In addition, ILIO does inline de-duplication of VDI images before they reach your storage – this translates into fewer Windows image components actually stored on your relatively expensive networked storage (no doubt providing validation to your CEO’s favorite mantra of “Do more with less!”). Cisco claims that such a solution works out to ~$200 per seat if you decide to forego networked storage and deploy what they call “diskless VDI”. Customers who use this solution include Colt, a European service provider with a 30,000-seat VDI deployment.
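ILIO’s implementation is proprietary, but inline deduplication in general can be sketched as a content-addressed block store: identical blocks across thousands of near-identical Windows images are stored only once, and each image is reduced to a “recipe” of block digests (block size and structure below are illustrative only):

```python
import hashlib

class DedupStore:
    """Toy content-addressed block store: identical fixed-size blocks
    across many desktop images are physically stored once."""

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.blocks = {}  # digest -> block bytes (stored once)

    def write(self, data):
        """Store an image; return its recipe (the list of block digests)."""
        recipe = []
        for i in range(0, len(data), self.block_size):
            chunk = data[i:i + self.block_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.blocks.setdefault(digest, chunk)  # new content only
            recipe.append(digest)
        return recipe

    def read(self, recipe):
        """Reassemble an image from its recipe."""
        return b"".join(self.blocks[d] for d in recipe)
```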
- Deploy Virsto for vSphere 2.0 software (Virsto was acquired by VMware) – This software also runs in a VM on the hypervisor, but works by intercepting random writes from the hypervisor, writing them to a dedicated write log in a serial manner, then de-staging to a copy of the virtual hard disk (VHD) residing in your networked storage. In other words, it takes away the pesky “random writes” that your storage loves to hate. Virsto claims a customer base that includes big names like Microsoft, Fiserv, Bechtel and others. Virsto claims that for a 3000-seat VDI deployment you can get by with just 3 TB of SSD (instead of 97 TB). Pricing appears to start at $2500 per TB.
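Virsto’s software is likewise proprietary; the log-then-de-stage idea, turning a random write stream into sequential IO on the backing store, can be sketched as follows (the structure is invented for illustration):

```python
class WriteLog:
    """Toy write-log: absorb random writes as sequential appends to a
    log, then de-stage them to the backing virtual disk in offset order,
    so the backing store only ever sees sequential IO."""

    def __init__(self, backing):
        self.backing = bytearray(backing)
        self.log = []  # appended sequentially: (offset, data)

    def write(self, offset, data):
        # Random write from a VM: absorbed as a cheap sequential append
        self.log.append((offset, bytes(data)))

    def destage(self):
        # Apply logged writes sorted by offset, so the de-stage pass
        # itself sweeps the backing image sequentially
        for offset, data in sorted(self.log):
            self.backing[offset:offset + len(data)] = data
        self.log.clear()
        return bytes(self.backing)
```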
What if you don’t like the idea of loading additional licenses and software on your central servers? You have appliance-based alternatives like those from Astute Networks, which let you keep your networked storage intact but buy dedicated inline appliances (1500 users supported per appliance) to address the I/O bottleneck problem introduced by VDI. Any of these three approaches gives you a way to deploy VDI without buying additional networked storage capacity, or by doing away with networked storage for your VDI project altogether.