News
July 22, 2014
NetNames Reimagines the Data Center with AMD’s SM15000 Server for More than $1.5 Million Annual Savings

SUNNYVALE, Calif. — July 22, 2014 AMD (NYSE: AMD) today announced the completion by NetNames of a massively efficient, hyperscale data center based on AMD’s SeaMicro SM15000™ server.  NetNames is the largest and most trusted leader in Europe for online brand protection and digital asset management. AMD worked with the NetNames team to build out a new data center that simplifies operations, reduces power and removes layers of networking. 

“NetNames’ goal was to fundamentally rethink how a data center should be built to provide best-in-class agility, scalability and economics,” said David Jones, chief information officer, NetNames. “The SM15000 server fabric reduces power, simplifies operations and eliminates layers of networking while accelerating our ability to introduce new services and provide a high quality service to our customers.  It was the only server that provided a positive return on investment (ROI) and made the project commercially viable.”

By deploying AMD’s SM15000 server, NetNames was able to achieve:

  • More than $1.5 million in expected annual operating cost savings
  • Reduction in physical server rack space by 83 percent
  • Consolidation of 500 servers into four SM15000 servers
  • Reduction in ongoing operating expense by 75 percent

NetNames was able to transform the data center using the SM15000 server as the foundation to create a new architecture from the ground up. The design redefined the server building block, leveraging the SeaMicro Freedom™ Fabric to disaggregate compute, storage and networking and to build right-sized computing and storage nodes that are optimal for the applications being deployed. Traditional server building blocks are more limited because they come in fixed configurations that cannot be dynamically provisioned as needed.

“NetNames is clearly a leader in its industry, and the new data center will provide them a strategic asset that should help them extend their market leadership,” said Dhiraj Mallick, corporate vice president and general manager, AMD Data Center Server Solutions. “Overcoming space and power constraints will continue to be a critical operational issue for companies. By using the SM15000 server to reimagine the data center, NetNames created enormous operational savings that were not possible with other servers.”

AMD’s SeaMicro SM15000 Server

AMD’s SeaMicro SM15000 system is the highest-density, most energy-efficient server on the market. In 10 rack units, it links 512 compute cores, 160 gigabits of I/O networking and more than five petabytes of storage with a 1.28 terabits-per-second high-performance supercompute fabric, called Freedom™ Fabric. The SM15000 server eliminates top-of-rack switches, terminal servers, hundreds of cables and thousands of unnecessary components for a more efficient and simple operational environment.

AMD’s SeaMicro server product family currently supports the next-generation AMD Opteron™ (“Piledriver” core) processor, Intel® Xeon® E3-1260L (“Sandy Bridge”), E3-1265Lv2 (“Ivy Bridge”), E3-1265Lv3 (“Haswell”) and Intel® Atom™ N570 processors. The AMD SeaMicro SM15000 also supports the Freedom Fabric Storage products, enabling a single system to connect with more than five petabytes of storage capacity in two racks. This approach delivers the benefits of expensive and complex solutions such as network attached storage (NAS) and storage area networks (SAN) with the simplicity and low cost of direct attached storage.

Supporting Resources

  • Learn more about AMD’s SeaMicro SM15000 here
  • Download the NetNames case study here
  • Become a fan of AMD on Facebook
  • Follow AMD Server on Twitter

About AMD

AMD (NYSE: AMD) designs and integrates technology that powers millions of intelligent devices, including personal computers, tablets, game consoles and cloud servers that define the new era of surround computing. AMD solutions enable people everywhere to realize the full potential of their favorite devices and applications to push the boundaries of what is possible. For more information, visit www.amd.com.

 

Contact:

Kristen Lisa
AMD Public Relations
(512) 602-6020
Kristen.lisa@amd.com

 

June 23, 2014
How We Scaled Openstack to Launch 168,000 Cloud Instances

Atlanta, GA — June 18, 2014 In the run-up to the OpenStack Summit in Atlanta, the Ubuntu Server team had its first opportunity to test OpenStack at real scale.

AMD made available 10 SeaMicro 15000 chassis in one of their test labs. Each chassis holds 64 servers. Each server has 4 cores with 2 threads per core (8 logical cores), 32 GB of RAM and 500 GB of storage attached via a storage fabric controller. This creates the potential to scale an OpenStack deployment to a large number of compute nodes in a small rack footprint.

As you would expect, we chose the best tools for deploying OpenStack:

  • MAAS – Metal-as-a-Service, providing commissioning and provisioning of servers.
  • Juju – The service orchestration for Ubuntu, which we use to deploy OpenStack on Ubuntu using the OpenStack charms.
  • OpenStack Icehouse on Ubuntu 14.04 LTS.
  • CirrOS – a small-footprint, Linux-based cloud OS.

MAAS has native support for enlisting a full SeaMicro 15k chassis in a single command – all you have to do is provide it with the MAC address of the chassis controller and a username and password. A few minutes later, all servers in the chassis will be enlisted into MAAS ready for commissioning and deployment:

maas local node-group probe-and-enlist-hardware \
  nodegroup model=seamicro15k mac=00:21:53:13:0e:80 \
  username=admin password=password power_control=restapi2

Juju has been the Ubuntu Server team’s preferred method for deploying OpenStack on Ubuntu for as long as I can remember. Juju uses charms to encapsulate the knowledge of how to deploy each part of OpenStack (a service) and how the services relate to one another. For example, a charm captures how Glance relates to MySQL for database storage, to Keystone for authentication and authorization and (optionally) to Ceph for the actual image storage.

Using the charms and Juju, it’s possible to deploy complex OpenStack topologies using bundles. A bundle is a YAML format for describing how to deploy a set of charms in a given configuration. Take a look at the OpenStack bundle we used for this test to get a feel for how this works.


Starting out small(ish)

Not all ten chassis were available at the outset of testing, so we started with two chassis of servers to test and validate that everything was working as designed. With 128 physical servers, we were able to put together a Neutron-based OpenStack deployment with the following services:

  • 1 Juju bootstrap node (used by Juju to control the environment), also running the Ganglia master
  • 1 Cloud Controller server
  • 1 MySQL database server
  • 1 RabbitMQ messaging server
  • 1 Keystone server
  • 1 Glance server
  • 3 Ceph storage servers
  • 1 Neutron Gateway network forwarding server
  • 118 Compute servers
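As a quick sanity check, the service layout above accounts for every one of the 128 servers in the two chassis (64 servers each). The Python sketch below simply restates the list as a dictionary. The dictionary keys are descriptive labels chosen for this illustration, not the actual Juju service names.

```python
# Service layout for the initial two-chassis deployment, as listed above.
# Keys are illustrative labels, not Juju service names.
SERVERS_PER_CHASSIS = 64

layout = {
    "juju-bootstrap+ganglia-master": 1,
    "cloud-controller": 1,
    "mysql": 1,
    "rabbitmq": 1,
    "keystone": 1,
    "glance": 1,
    "ceph": 3,
    "neutron-gateway": 1,
    "compute": 118,
}


def servers_used(layout: dict) -> int:
    """Total physical servers consumed by the layout."""
    return sum(layout.values())


def chassis_needed(layout: dict, per_chassis: int = SERVERS_PER_CHASSIS) -> int:
    """Whole chassis required to host the layout (ceiling division)."""
    return -(-servers_used(layout) // per_chassis)


if __name__ == "__main__":
    print(servers_used(layout), chassis_needed(layout))
```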

We described this deployment using a Juju bundle and used the juju-deployer tool to bootstrap and deploy the bundle to the MAAS environment controlling the two chassis. Total deployment time for the two chassis, up to the point of a usable OpenStack cloud, was around 35 minutes.

At this point we created 500 tenants in the cloud, each with its own private network (using Neutron), connected to the outside world via a shared public network. The immediate impact of doing this is that Neutron creates dnsmasq instances, Open vSwitch ports and associated network namespaces on the Neutron Gateway data forwarding server – seeing this many instances of dnsmasq on a single server is impressive – and the server dealt with the load just fine!

Next we started creating instances. We looked at using Rally for this test, but it does not currently support using Neutron for instance creation testing. Instead, we went with a simple shell script that created servers in batches (we used a batch size of 100 instances) and then waited for them to reach the ACTIVE state. We used the CirrOS cloud image (developed and maintained by the Ubuntu Server team’s very own Scott Moser) with a custom Nova flavor with only 64 MB of RAM.
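The original shell script is not reproduced in this post. The following is a minimal Python sketch of the same batch-and-wait idea under stated assumptions. It relies on a hypothetical boot(name) callable that starts one instance and returns its ID, and a hypothetical status(server_id) callable that returns the Nova status string (for example BUILD, ACTIVE or ERROR). Wiring those two hooks to a real Nova client is deliberately left out. The "scale-test-N" naming is also just an illustration.

```python
import time
from typing import Callable, List, Tuple


def create_in_batches(
    total: int,
    batch_size: int,
    boot: Callable[[str], str],    # hypothetical: starts one instance, returns its ID
    status: Callable[[str], str],  # hypothetical: returns "BUILD", "ACTIVE", "ERROR", ...
    poll_interval: float = 5.0,
    timeout: float = 1800.0,
) -> Tuple[List[str], List[str]]:
    """Boot `total` instances, `batch_size` at a time, waiting for each batch
    to settle (ACTIVE or ERROR) before starting the next one.

    Returns (active_ids, failed_ids); instances still pending when the
    per-batch timeout expires are counted as failed.
    """
    active: List[str] = []
    failed: List[str] = []
    for start in range(0, total, batch_size):
        end = min(start + batch_size, total)
        pending = [boot(f"scale-test-{i}") for i in range(start, end)]
        deadline = time.monotonic() + timeout
        while pending:
            still_pending = []
            for server_id in pending:
                state = status(server_id)
                if state == "ACTIVE":
                    active.append(server_id)
                elif state == "ERROR":
                    failed.append(server_id)
                else:  # e.g. BUILD: check again on the next pass
                    still_pending.append(server_id)
            pending = still_pending
            if pending and time.monotonic() > deadline:
                failed.extend(pending)  # give up on this batch's stragglers
                break
            if pending:
                time.sleep(poll_interval)
    return active, failed
```

Waiting for each batch to settle before starting the next keeps the number of in-flight builds bounded. This was the point of batching in the test.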

We immediately hit our first bottleneck – by default, the Nova daemons on the Cloud Controller server will spawn sub-processes equivalent to the number of cores that the server has. Neutron does not do this and we started seeing timeouts on the Nova Compute nodes waiting for VIF creation to complete. Fortunately Neutron in Icehouse has the ability to configure worker threads, so we updated the nova-cloud-controller charm to set this configuration to a sensible default, and provide users of the charm with a configuration option to tweak this setting. By default, Neutron is configured to match what Nova does, 1 process per core – using the charm configuration this can be scaled up using a simple multiplier – we went for 10 on the Cloud Controller node (80 neutron-server processes, 80 nova-api processes, 80 nova-conductor processes). This allowed us to resolve the VIF creation timeout issue we hit in Nova.
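The arithmetic behind those process counts is simple. The Cloud Controller server has 8 logical cores (4 cores, 2 threads each), so a multiplier of 10 gives 80 workers per service. A small Python sketch follows. The function name and the rounding rule (truncate, but never go below one worker) are our own assumptions, not the charm’s actual implementation.

```python
def worker_plan(logical_cores: int, multiplier: float = 1.0,
                services=("neutron-server", "nova-api", "nova-conductor")) -> dict:
    """Workers per service: one per logical core, scaled by a multiplier.

    The rounding rule here (truncate, minimum of one) is an assumption.
    """
    if logical_cores < 1 or multiplier <= 0:
        raise ValueError("need at least one core and a positive multiplier")
    per_service = max(1, int(logical_cores * multiplier))
    return {svc: per_service for svc in services}


if __name__ == "__main__":
    plan = worker_plan(8, 10)
    print(plan, "total:", sum(plan.values()))
```

With the default multiplier of 1, each service gets one worker per core, which matches the Nova default behaviour described above.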

At around 170 instances per compute server, we hit our next bottleneck: the Neutron agent status on compute nodes started to flap, with agents being marked down as instances were being created. After some investigation, it turned out that at this instance density, parsing and then updating the iptables firewall rules took longer than the default agent timeout, which is why agents kept dropping out from Neutron’s perspective. This resulted in virtual interface (VIF) creation timing out, and we started to see instance activation failures when trying to create more than a few instances in parallel. Without an immediate fix for this issue (see bug 1314189), we decided to turn Neutron security groups off in the deployment and run without any VIF-level iptables security. This was applied through the nova-compute charm we were running. It is obviously not something that will make it back into the official charm in the Juju charm store.
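Why agents flap can be illustrated with a deliberately simplified timing model in Python. The model assumes the agent cannot send its periodic state report while it is busy rebuilding iptables rules. Under that assumption, the gap between reports grows with the rule-processing time, and once the gap exceeds the server-side timeout, the agent is marked down. The 30 s and 75 s values are what we understand Neutron’s report_interval and agent_down_time defaults to be; check neutron.conf on your own deployment. The linear per-instance cost is purely an assumption for illustration. None of these numbers were measured in this test.

```python
import math

# Assumed defaults (verify in neutron.conf): agents report state every 30 s,
# and the server marks an agent down after 75 s without a report.
REPORT_INTERVAL = 30.0
AGENT_DOWN_TIME = 75.0


def heartbeat_gap(blocking_work_s: float,
                  report_interval: float = REPORT_INTERVAL) -> float:
    """Simplified model: no report can be sent while the agent is busy
    rebuilding firewall rules, so the gap is the larger of the two."""
    return max(report_interval, blocking_work_s)


def agent_marked_down(blocking_work_s: float,
                      report_interval: float = REPORT_INTERVAL,
                      agent_down_time: float = AGENT_DOWN_TIME) -> bool:
    """True if the server would consider the agent dead."""
    return heartbeat_gap(blocking_work_s, report_interval) > agent_down_time


def max_instances_before_flap(seconds_per_instance: float,
                              agent_down_time: float = AGENT_DOWN_TIME) -> int:
    """If rule processing grows linearly with instances on a node (an
    assumption), the largest instance count that still reports in time."""
    return math.floor(agent_down_time / seconds_per_instance)


if __name__ == "__main__":
    # Hypothetical cost of 0.5 s of rule processing per instance.
    print(max_instances_before_flap(0.5))
```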

With the workaround applied to the Compute servers, we were able to create 27,000 instances on the 118 compute nodes. The API call times to create instances from the testing endpoint remained fairly stable during this test. However, as the Nova Compute servers became heavily loaded, the time taken for all instances to reach the ACTIVE state did increase.

Doubling up

At this point AMD had another two chassis racked and ready for use so we tore down the existing two chassis, updated the bundle to target compute services at the two new chassis and re-deployed the environment. With a total of 256 servers being provisioned in parallel, the servers were up and running within about 60 minutes, however we hit our first bottleneck in Juju.

The OpenStack charm bundle we use has a) quite a few services and b) a lot of relations between services. Juju was able to deploy the initial services just fine. However, when the relations were added, the load on the Juju bootstrap node went very high, and the Juju state service on this node started to throw a large number of errors and became unresponsive. This has been reported back to the Juju core development team (see bug 1318366).

We worked around this bottleneck by bringing up the original two chassis in full, and then adding each new chassis in series to avoid overloading the Juju state server in the same way. This obviously takes longer (about 35 minutes per chassis) but did allow us to deploy a larger cloud with an extra 128 compute nodes, bringing the total number of compute nodes to 246 (118+128).

And then we hit our next bottleneck…

By default, the RabbitMQ packaging in Ubuntu does not explicitly set a file descriptor ulimit so it picks up the Ubuntu defaults – which are 1024 (soft) and 4096 (hard). With 256 servers in the deployment, RabbitMQ hits this limit on concurrent connections and stops accepting new ones. Fortunately it’s possible to raise this limit in /etc/default/rabbitmq-server – and as we were deployed using the rabbitmq-server charm, we were able to update the charm to raise this limit to something sensible (64k) and push that change into the running environment. RabbitMQ restarted, problem solved.
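Whether a broker will hit the descriptor limit is a back-of-the-envelope calculation. The Python sketch below assumes one file descriptor per AMQP connection and uses a hypothetical figure of 4 connections per server. The real number depends on which OpenStack agents each node runs. The sketch also shows how to read the soft limit with the standard resource module (Unix only). Note that this reports the limit for the Python process itself, not for a running RabbitMQ server.

```python
import resource


def estimated_connections(servers: int, connections_per_server: int) -> int:
    """Rough broker connection count; connections_per_server is an assumption."""
    return servers * connections_per_server


def fits_fd_limit(connections: int, soft_limit: int, reserve: int = 100) -> bool:
    """One descriptor per connection, plus a reserve for files, logs, etc."""
    return connections + reserve <= soft_limit


def current_soft_nofile() -> int:
    """Soft RLIMIT_NOFILE of *this* process (Unix only)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


if __name__ == "__main__":
    conns = estimated_connections(256, 4)  # 4 per server is illustrative only
    for limit in (1024, 65536):
        print(limit, fits_fd_limit(conns, limit))
    print("this process:", current_soft_nofile())
```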

With the 4 chassis in place, we were able to scale up to 55,000 instances.

Ganglia was letting us know that the load on the Nova Cloud Controller during instance setup was extremely high (a load average of 15 to 20), so we decided at this point to add another unit to this service:

juju add-unit nova-cloud-controller

and within 15 minutes we had another Cloud Controller server up and running, automatically configured for load balancing of API requests with the existing server and sharing the load for RPC calls via RabbitMQ. Load dropped, instance setup time decreased, instance creation throughput increased, problem solved.
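Adding a unit halves the share of API traffic each Cloud Controller has to serve, assuming the load balancer spreads requests evenly. This post does not say which balancing algorithm the charm configures, so the Python sketch below assumes plain round robin. The unit names follow Juju’s service/N naming and are illustrative only. RPC work shared through RabbitMQ is not modelled.

```python
from collections import Counter
from itertools import cycle
from typing import Iterable


def distribute(requests: int, units: Iterable[str]) -> Counter:
    """Assign `requests` API calls to `units` in strict round-robin order."""
    unit_list = list(units)
    if not unit_list:
        raise ValueError("need at least one unit")
    targets = cycle(unit_list)
    return Counter(next(targets) for _ in range(requests))


if __name__ == "__main__":
    print(distribute(1000, ["nova-cloud-controller/0"]))
    print(distribute(1000, ["nova-cloud-controller/0", "nova-cloud-controller/1"]))
```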

Whilst we were working through these issues and performing the instance creation, AMD had another two chassis (6 & 7) racked, so we brought them into the deployment adding another 128 compute nodes to the cloud bringing the total to 374.

And then things exploded…

The number of instances that can be created in parallel is driven by two factors – 1) the number of compute nodes and 2) the number of workers across the Nova Cloud Controller servers. However, with six chassis in place, we were not able to increase the parallel instance creation rate as much as we wanted to without getting connection resets between Neutron (on the Cloud Controllers) and the RabbitMQ broker.

The lesson from this is that Neutron plus Nova makes for an extremely noisy OpenStack deployment from a messaging perspective, and a single RabbitMQ server appeared unable to handle this load. This resulted in a large number of instance creation failures, so we stopped testing and had a rethink.

A change in direction

After the failure we saw in the existing deployment design, and with more chassis still being racked by our friends at AMD, we still wanted to see how far we could push things; however with Neutron in the design, we could not realistically get past 5-6 chassis of servers, so we took the decision to remove Neutron from the cloud design and run with just Nova networking.

Fortunately, this is a simple change to make when deploying OpenStack using charms, as the nova-cloud-controller charm has a single configuration option to switch between Neutron and Nova networking. After tearing down and re-provisioning the 6 chassis:

juju destroy-environment maas
juju-deployer --bootstrap -c seamicro.yaml -d trusty-icehouse

with the revised configuration, we were able to create instances in batches of 100 at a respectable initial throughput of 4.5 instances/sec, although this did degrade as the load on the compute servers went higher. This allowed us to hit 75,000 running instances (with no failures) in 6 hrs 33 mins, pushing through to 100,000 instances in 10 hrs 49 mins, again with no failures.
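The two milestones above also give the average creation rate, which shows how far throughput fell from the initial 4.5 instances/sec. Here is a small Python calculation using only the figures quoted in this post:

```python
from datetime import timedelta


def average_rate(instances: int, elapsed: timedelta) -> float:
    """Average instances created per second over an interval."""
    return instances / elapsed.total_seconds()


t_75k = timedelta(hours=6, minutes=33)    # 23,580 s
t_100k = timedelta(hours=10, minutes=49)  # 38,940 s

rate_to_75k = average_rate(75_000, t_75k)
rate_to_100k = average_rate(100_000, t_100k)
rate_75k_to_100k = average_rate(25_000, t_100k - t_75k)  # the last 4 h 16 min

print(f"{rate_to_75k:.2f} {rate_to_100k:.2f} {rate_75k_to_100k:.2f}")
```

This prints 3.18 2.57 1.63. The average over the first 75,000 instances was therefore about 3.2/sec, while the stretch from 75,000 to 100,000 averaged roughly 1.6/sec.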


As we saw in the smaller test, the API invocation time was fairly constant throughout the test, while the total provisioning time through to the ACTIVE state increased due to the load on the compute nodes.


Status check

OK, so we are now running an OpenStack cloud on Ubuntu 14.04 across 6 SeaMicro chassis (1, 2, 3, 5, 6 and 7; chassis 4 comes later), a total of 384 servers (give or take one or two that would not provision). The cumulative load across the cloud at this point was pretty impressive, and Ganglia does a pretty good job of charting it.


AMD had two more chassis (8 & 9) in the racks, which we had enlisted and commissioned, so we pulled them into the deployment as well. This did take some time. Juju was grinding pretty badly at this point, and just running ‘juju add-unit -n 63 nova-compute-b6’ was taking 30 minutes to complete (reported upstream, see bug 1317909).

After a couple of hours we had another ~128 servers in the deployment, so we pushed on and created more instances through to the 150,000 mark. As the instances were landing on the servers in the 2 new chassis, the load on the individual servers increased more rapidly, so instance creation throughput slowed down faster, but the cloud managed the load.

Tipping point?

Prior to starting testing at any scale, we had some issues with one of the chassis (4), which AMD resolved during testing, so we shoved that back into the cloud as well. After ensuring that the 64 extra servers were reporting correctly to Nova, we started creating instances again.

However, the instances kept being scheduled onto the servers in the previous two chassis we had added (8 & 9), and the new nodes were not getting any instances. It turned out that the servers in chassis 8 & 9 were AMD-based servers with twice the memory capacity. By default, Nova does not look at VCPU usage when making scheduling decisions. Because these 128 servers had more remaining memory capacity than the 64 new servers in chassis 4, they were still being targeted for instances.

Unfortunately, I’d hopped onto the plane from Austin to Atlanta for a few hours, so I did not notice this, and we hit our first 9 instance failures. The 128 servers in chassis 8 and 9 ended up with nearly 400 instances each, severely overcommitting their CPU resources.

We then made a few tweaks to the scheduler configuration, applied to the Cloud Controller nodes using the Juju charm. Specifically, we turned on the CoreFilter and set the CPU overcommit ratio to 32, and instances started to land on the servers in chassis 4. This seems like a sane thing to do by default, so we will add it to the nova-cloud-controller charm with a configuration knob to allow the overcommit ratio to be altered.
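The scheduling behaviour described above can be reproduced with a toy model in Python. A RAM-only weigher always prefers the host with the most free memory, so hosts with twice the RAM keep soaking up instances long after their CPUs are saturated. A CoreFilter-style check instead caps each host at its core count multiplied by the overcommit ratio. This is a simplified sketch, not Nova’s scheduler code. It ignores Nova’s RAM overcommit ratio and all other filters. The host sizes, the 8-core count and the 1-vCPU, 64 MB instance are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Host:
    name: str
    free_ram_mb: int
    cores: int
    used_vcpus: int = 0
    instances: int = 0


def core_filter(host: Host, vcpus: int, ratio: float) -> bool:
    """Pass only if the request still fits under cores * overcommit ratio."""
    return host.used_vcpus + vcpus <= host.cores * ratio


def schedule(hosts: List[Host], ram_mb: int, vcpus: int,
             cpu_ratio: Optional[float] = None) -> Host:
    """Filter hosts, then pick the one with the most free RAM."""
    # RAM check (simplified: no RAM overcommit).
    candidates = [h for h in hosts if h.free_ram_mb >= ram_mb]
    # Optional CPU check, analogous to enabling CoreFilter.
    if cpu_ratio is not None:
        candidates = [h for h in candidates if core_filter(h, vcpus, cpu_ratio)]
    if not candidates:
        raise RuntimeError("no valid host found")
    # RAM-only weighing: most free memory wins; ties go to the first host listed.
    best = max(candidates, key=lambda h: h.free_ram_mb)
    best.free_ram_mb -= ram_mb
    best.used_vcpus += vcpus
    best.instances += 1
    return best


def run(n: int, cpu_ratio: Optional[float] = None) -> List[Host]:
    # Illustrative hosts: one with twice the free RAM of the other, same cores.
    hosts = [Host("big-ram", 65536, 8), Host("new-node", 32768, 8)]
    for _ in range(n):
        schedule(hosts, ram_mb=64, vcpus=1, cpu_ratio=cpu_ratio)
    return hosts


if __name__ == "__main__":
    for ratio in (None, 32):
        print(ratio, [(h.name, h.instances) for h in run(300, ratio)])
```

With a ratio of 32 on an 8-core host, each host tops out at 256 single-vCPU instances. After that cap is reached, the lower-RAM node starts receiving work. Without the filter, all 300 instances land on the high-RAM host.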

At the end of the day we had 168,000 instances running on the cloud; this may have got some coverage during the OpenStack Summit...

The last word

Having access to this many real servers allowed us to exercise OpenStack, Juju, MAAS and our reference charm configurations in a way that we have not been able to undertake before. Exercising infrastructure management tools and configurations at this scale really helps shake out the scale pinch points. In this test we specifically addressed:

  • Worker thread configuration in the nova-cloud-controller charm
  • Raising open file descriptor ulimits in the rabbitmq-server charm to allow more concurrent connections
  • Tweaking the maximum number of mysql connections via charm configuration
  • Ensuring that the CoreFilter is enabled to avoid potential extreme overcommit on nova-compute nodes.

There were a few things we could not address during the testing, for which we had to find workarounds:

  • Scaling a Neutron-based cloud past 256 physical servers
  • High instance density on nova-compute nodes with Neutron security groups enabled.
  • High relation creation concurrency in the Juju state server causing failures and poor performance from the juju command line tool.

We have some changes in the pipeline to the nova-cloud-controller and nova-compute charms to make it easier to split Neutron services onto different underlying messaging and database services. This will allow the messaging load to be spread across different message brokers, which should allow us to scale a Neutron-based OpenStack cloud well beyond what we achieved during this testing. We also found a number of other smaller niggles related to scalability; check out the full list of reported bugs.

And finally some thanks:

  • Blake Rouse for doing the enablement work for the SeaMicro chassis and getting us up and running at the start of the test.
  • Ryan Harper for kicking off the initial bundle configuration development and testing approach (whilst I was taking a break, thanks!) and shaking out the initial kinks.
  • Scott Moser for his enviable scripting skills which made managing so many servers a whole lot easier – MAAS has a great CLI – and for writing CirrOS.
  • Michael Partridge and his team at AMD for getting so many servers racked and stacked in such a short period of time.
  • All of the developers who contribute to OpenStack, MAAS and Juju!

...you are all completely awesome!


May 13, 2014
AMD’s SeaMicro SM15000™ Server Sets Industry Benchmark Record for Hyperscale OpenStack Clouds

SAN FRANCISCO, Calif. — May 13, 2014 AMD (NYSE: AMD) today announced that its SeaMicro SM15000™ server set a significant industry benchmark record for hyperscale cloud computing with a demonstration that highlights how OpenStack can quickly and reliably provision on-demand computing services at scale. The test provisioned 168,000 virtual machines on 576 physical hosts.  The first 75,000 virtual machines were deployed in six hours and thirty minutes.  This is the largest known demonstration of OpenStack scalability ever.  AMD achieved the record in collaboration with Canonical using the Ubuntu OpenStack (Icehouse) distribution. MaaS (Metal as a Service), part of Ubuntu 14.04 LTS and Ubuntu OpenStack, was used to deliver the bare metal servers, storage and networking. The solution is available today and is the most scalable, automated application for deploying OpenStack in hyperscale environments.

“This record validates that the SeaMicro SM15000 is well suited for massive OpenStack deployments,” said Dhiraj Mallick, corporate vice president and general manager, AMD Data Center Server Solutions. “The combination of Ubuntu OpenStack and the SeaMicro SM15000 server provides the industry’s leading solution to build cloud infrastructure that is highly responsive and ideal for on-demand services.”

Ubuntu OpenStack 14.04 provides SeaMicro SM15000 integration with support for the system’s RESTful application programming interface (API). AMD’s SeaMicro SM15000 is the only dense server that natively provides RESTful APIs without requiring a separate management application. It accelerates automation and simplifies management by creating standard interfaces to provision compute, storage and networking resources. The SeaMicro SM15000 provides the most flexible, scalable and resilient data center infrastructure in the industry.

“We have raised the bar once again and firmly established Ubuntu OpenStack as the fastest and most reliable way to build a public or private cloud,” said Mark Shuttleworth, founder of Ubuntu and Canonical. “Ubuntu OpenStack stands out with its performance and sophisticated tools to provision, build and manage hyperscale clouds.”

Supporting Resources

  • Learn more about AMD’s SeaMicro SM15000 here
  • Download AMD’s OpenStack Reference Architecture here
  • Become a fan of AMD on Facebook
  • Follow AMD Server on Twitter


May 01, 2014
AMD’s Revolutionary SeaMicro SM15000™ Server Named 2014 Silver Edison Award Winner

SAN FRANCISCO — May 1, 2014 AMD (NYSE: AMD) today announced that its SeaMicro SM15000™ server received a 2014 Silver Award from the internationally renowned Edison Awards™ in the Applied Technology, Research and Business Optimization category. Competing finalists in the category included AT&T Nanocubes by AT&T Inc. and Archetype IQ System by Ipsos InnoQuest. The distinguished awards – inspired by Thomas Edison’s persistence and inventiveness – recognize innovation, creativity and ingenuity in the global economy. The awards honor the best in innovation, and finalists are judged by a panel of more than 3,000 senior business executives and academics from across the nation from a wide variety of industries and disciplines.

The Edison Award win further validates that the AMD SeaMicro SM15000 server is the highest density, most energy-efficient server in the market. AMD’s SeaMicro SM15000 reimagines the data center server and brings unprecedented efficiency to allow companies to introduce new services faster and at lower cost. It delivers the highest computing density, reducing racks of servers to the size of a suitcase, while reducing data center power consumption by up to 75 percent.

“It’s an honor to be recognized by the esteemed Edison Awards and to be in the presence of so many innovative companies that are creating state-of-the-art products,” said Young-Sae Song, corporate vice president, product marketing, data center server solutions at AMD.  “AMD strives for innovations that transform markets, while inspiring the technology of tomorrow – exactly what the Edison Awards honors and represents.”

The Edison Awards are among the most prestigious accolades honoring excellence in new product and service development, marketing, human‐centered design and innovation. Unique to the world of award programs, the Edison Awards are focused on the innovators as much as the innovations. Award winners represent game-changing products, services and excellence and leadership in innovation around four criteria: concept, value, delivery and impact.

“It’s exciting to see companies like AMD continuing Thomas Edison’s legacy of challenging conventional thinking,” said Frank Bonafilia, Edison Awards’ executive director. “The Edison Awards recognizes game‐changing products and services, and the teams that brought them to consumers.”

The voting panel includes senior level marketers, scientists, designers, engineers, academics and past award winners. Winners were announced on Wednesday at the Edison Awards Annual Gala, held in the historic Julia Morgan Ballroom in San Francisco, Calif.

The Edison Awards is a program conducted by Edison Universe, a 501(c)(3) charitable organization dedicated to fostering future innovators. The 2014 Edison Awards are sponsored by Ipsos. For more information about the Edison Awards, Edison Universe and a list of winners, visit www.edisonawards.com.

Supporting Resources

  • Learn more about AMD’s SeaMicro SM15000 here
  • Learn more about the Edison Awards here
  • Become a fan of AMD on Facebook
  • Follow AMD on Twitter


February 13, 2014
The WSJ: AMD Makes Good on ARM Chip Shift

Advanced Micro Devices since the early 1980s has doggedly stuck to the x86 chip architecture that Intel invented, which powered the original IBM PC and most that followed. On Tuesday, it made good on a pledge to back an additional horse in the computing race.

The Sunnyvale, Calif., company used a conference in Silicon Valley to unveil a chip for server systems that is based on technology licensed by ARM Holdings, which has become the mainstay of smartphones and tablets. AMD is one of a number of companies planning to move such chips into servers, on the theory that their low cost and low power consumption can bring big economic benefits.

To read the original WSJ article, click here.
