Many SysAdmins have been in this boat before me, and many will follow. We have
all been the new face at a company. You feel like you’re ready to take on the
world and make a difference. It’s your time to make the business run faster
with fewer resources, lowering its operating costs while increasing productivity
and the resources available to internal and external customers.
You want everything to be streamlined and fast, with all the bells and whistles
to make everyone happy, much like James Bond when he hops in his Aston Martin
Vanquish. The problem is Q ran out of resources and gave you a 1989 K-Car Wagon.
Sure, it’s running and pretty good on gas, but plastic wood paneling has no
place on a car in a chase across the Arctic, or in your server room.
When I started at the CodeProject everything was, for the most part, running
perfectly fine. Servers were chugging along with some new, some old, some fast
and some slow hardware. So why fix it if it’s not broken? Well, broken is in the
eye of the beholder, and even if you have just put a new Bluetooth radio in your
K-Car, it’s still… well… a K-Car.
I will be spending some time writing a series of articles that will hopefully
guide you through some of our successes and failures. Over the last year and a
bit we went from an old-school, non-virtualized, non-clustered setup (buy a
server off the shelf, slam it in the rack and pop in the Windows DVD) to a mix
of Linux and Windows, virtualized servers, SANs, SQL failover clusters, load
balancers and more. In the end we have reduced our electricity requirements by
two thirds, cut the number of physical servers to less than half, and reduced
our monthly licensing budget by 30 percent, yet we have increased the number of
services while increasing or maintaining performance on all fronts.
Sounds great, right? Well, fasten your seatbelt. It’s a long and winding road
ahead. Things are going to break, services will go down, staff will complain and
you will encounter resistance. With proper planning, outages and other concerns
can be mitigated.
The first thing you need to do is make sure your team and everyone involved are
aware that there will be struggles, but that the end goal is well worth it.
Once you have everyone’s blessing (or at least that of those who can complain
safely from out of the country… aka Chris in Australia) you are ready to dig in.
Where to Start
So what was my vision? From the earliest days at The Code Project I quickly
realized that we needed two key things.
- A SQL Failover Cluster. Being woken by an emergency phone call at 2am
because our SQL server lost a piece of hardware and Time Zone X is about to
wake up is not acceptable in my world of happy nap time.
- Virtualization. With a quick poke around during production hours I saw
that many of the servers we had were running on old Pentium 4s and first-gen
Xeons, yet were only using a small fraction of the CPU they had. Although
the services they were running were happy on this hardware, these servers
were starting to go down due to basic hardware failures like fans and hard
drives. These failures were causing service outages that were again
interfering with my beauty sleep.
BTW, no matter what ringtone you use, it sucks waking up in the middle of the
night.
With these two targets in mind I knew I would need some sort of shared
storage. Yes, this means a SAN. I wasn’t yet sure what type, model or
capacity I needed, just that I needed one. I also knew I wanted a local network
fast enough that multiple services could share lines in a virtual
environment without slowing down.
Onto the Inventory
You need a complete and accurate inventory of what you have. You don’t need to
detail every single piece of everything, but there are certain pieces of hardware
that can save you lots of money if you reuse them. Some pieces of hardware
haven’t really changed much in the last few years and can still be reused in
your new vision.
Take note of the following key items in your inventory.
- You first need a physical inventory of all server hardware. List every
server with its CPU type, quantity and number of cores, and how much memory is
installed. List the hard drives with makes and models (take specific note of
whether each is single port or dual port, SAS or SATA, etc…), form factor
(2.5” or 3.5”), spindle speed (RPM) and capacity (in GB). Also take note of the
NICs in your servers: the number per server, and whether they are integrated or
add-on cards.
- You will also need the soft information from each machine. You will need
to map out if and when your individual servers are busy or idle, how much
CPU and memory each uses during peak and non-peak times, when the backups or
any resource-intensive jobs run, how much hard drive space is allocated in
which partitions, and how much free space you have or need.
- You should also make a list of network gear: switches, routers, hubs
(GASP!!!), firewalls, etc. Is it gigabit throughout? What about your cables?
Are they all Cat 5e or higher?
- The last piece of information is the software info: OSes, services,
applications and service pack levels. Take note of any key dependencies, e.g.
Web Servers 1-6 use databases on SQL Server 1 and 2.
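One way to keep an inventory like this consistent is to record it as structured data rather than free-form notes. The following is only a minimal sketch: the field names, the `Disk` and `Server` classes and the sample values are all made up for illustration and are not taken from our actual inventory.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Disk:
    model: str
    interface: str         # e.g. "SAS" or "SATA"
    dual_port: bool
    form_factor_in: float  # 2.5 or 3.5
    rpm: int
    capacity_gb: int


@dataclass
class Server:
    name: str
    cpu_model: str
    cpu_count: int
    cores_per_cpu: int
    memory_gb: int
    nic_count: int
    disks: List[Disk] = field(default_factory=list)
    peak_cpu_pct: float = 0.0    # "soft" info, measured during busy hours
    peak_memory_gb: float = 0.0
    depends_on: List[str] = field(default_factory=list)

    def total_cores(self) -> int:
        return self.cpu_count * self.cores_per_cpu

    def raw_disk_gb(self) -> int:
        return sum(d.capacity_gb for d in self.disks)


# Hypothetical entry; every value here is illustrative only.
web1 = Server(
    name="WEB1", cpu_model="Pentium 4", cpu_count=1, cores_per_cpu=1,
    memory_gb=2, nic_count=2,
    disks=[Disk("example-model", "SAS", False, 2.5, 10000, 72)] * 2,
    peak_cpu_pct=15.0, peak_memory_gb=1.2, depends_on=["SQL1", "SQL2"],
)
print(web1.name, web1.total_cores(), "cores,", web1.raw_disk_gb(), "GB raw disk")
```

A spreadsheet works just as well. The point is simply that every server gets the same set of fields, so nothing is forgotten when you start comparing machines.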
Ideally much of your hardware is from one vendor. If it is, this can save you a
fair bit on the various bits and pieces as you build and spec out your new
hardware plan. Most of our newer hardware was from one vendor, with the older
stuff being from another.
Once you have all of this information you will need to come up with your own
plan of what can be amalgamated and what can’t. I pretty much decided anything
running on less than a Xeon would be the first to be virtualized, mostly
because this was the hardware that was giving us issues, and since it was still
running on that old hardware it should virtualize fairly easily. From there I
looked at which services were not heavy in disk IO, as this can be a major
bottleneck. I made my list, written in very light pencil, and pressed on.
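The two rules above (pre-Xeon hardware first, then whatever is lightest on disk IO) can be sketched as a simple sort. The server names, the `is_xeon` flag and the `peak_disk_iops` figures below are hypothetical placeholders, and in practice the list still deserves a human look before anything is moved.

```python
def virtualization_order(servers):
    """Return server names, best virtualization candidates first."""
    def sort_key(server):
        # Pre-Xeon hardware goes first (False sorts before True),
        # then lighter disk IO before heavier disk IO.
        return (server["is_xeon"], server["peak_disk_iops"])

    return [s["name"] for s in sorted(servers, key=sort_key)]


# Hypothetical inventory rows; the numbers are illustrative only.
servers = [
    {"name": "SQL1", "is_xeon": True, "peak_disk_iops": 2500},
    {"name": "DNS1", "is_xeon": False, "peak_disk_iops": 20},
    {"name": "MAIL1", "is_xeon": False, "peak_disk_iops": 300},
    {"name": "WEB3", "is_xeon": True, "peak_disk_iops": 150},
]

print(virtualization_order(servers))
```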
I tend not to like to virtualize domain controllers. The number one reason for
this is that during a panic situation like a power outage, if the domain
controllers aren’t up first or down last, a lot of stuff won’t work properly and
you can end up chasing your tail, rebooting everything while you sit and stare at
the Windows login screen as it tries to connect to a domain controller that isn’t
up yet. This 5 to 20 minute timeout is agonizing with angry people tapping their
feet behind you. If you are familiar with virtualizing, you can set up priorities,
startup delays and startup orders for virtual machines, but chances are not
everything will be virtualized, so this isn’t always going to work.
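Whether the machines are physical or virtual, it helps to write the startup order down before the power goes out. As a rough sketch, a dependency map can be turned into a startup and shutdown order with Python's standard `graphlib` module (Python 3.9 or later). The server names and dependencies here are invented for the example.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each server lists what must be up before it.
depends_on = {
    "DC1": set(),
    "SQL1": {"DC1"},
    "WEB1": {"DC1", "SQL1"},
    "MAIL1": {"DC1"},
}

# static_order() yields every server after the servers it depends on.
startup_order = list(TopologicalSorter(depends_on).static_order())
# Shutting down in the reverse order takes the domain controller down last.
shutdown_order = list(reversed(startup_order))

print("Start:", startup_order)
print("Stop: ", shutdown_order)
```

With this map the domain controller comes up first and goes down last. The relative order of servers that don't depend on each other is not guaranteed.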
Don’t Forget about Growth
This is where your company’s 3-year plan comes in handy. You have that in your
hip pocket, right? No? You missed that meeting because you were fixing something?
I know, I know, you don’t have it, but get it. Or make one up. Sit down with the
key decision makers and ask them if they foresee any mass hiring, service
changes, new products or whatever tends to shape your server farm. The biggest
mistake you can make when virtualizing is going too small or buying hardware
that isn’t going to be expandable enough for the future.
If you have made it this far you should now have a fairly good idea of what
you want to virtualize and what you may be able to consolidate. In our case we
had roughly 40 physical servers: 5 or 6 SQL servers, a dozen or so web servers,
a half dozen mail servers, and the rest split between DNS, domain
controllers and a mess of miscellaneous servers.
My initial plan was to buy two new big honkin’ SQL servers for a SQL cluster,
use the existing SQL servers as virtual hosts, and connect them all to a SAN,
then convert as many of the other physical servers to virtual servers as I
could. I saw 2 or possibly 3 of the SQL servers as prime candidates for virtual
hosts, since they had lots of cores and lots of RAM.
Disks are a very expensive part of putting in a SAN, so my main goal for cost
cutting was to re-use as many disks as I could. The SQL servers and various
file servers had lots of small and fast disks. Most of our servers were using
various 2.5” disks: some single port, some dual, some 10k and some 15k. However,
all were from the same manufacturer.
I also figured we could move some services around to a cheaper means of storage
(e.g. iSCSI NASes, etc.) and re-use those more expensive disks for the SAN as
well. In all I was able to scavenge 8x300GB 10k disks, 10x148GB 10k dual port
SAS disks, and numerous 72GB 15k disks. In the end we had a cheap, moderately
performing NAS for mass storage such as backups, and a nicely powered and fast
SAN for high performance tasks such as SQL storage.
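To get a feel for what a pile of scavenged disks is worth, here is a back-of-the-envelope sketch using the 300GB and 148GB groups listed above. The RAID levels are purely an assumption for the sake of the arithmetic, since the actual layout is left for a later article, and the figures ignore hot spares and formatting overhead.

```python
def usable_gb(disk_count: int, disk_gb: int, raid_level: str) -> int:
    """Approximate usable capacity of one group of identical disks."""
    if raid_level == "raid10":
        if disk_count < 4 or disk_count % 2:
            raise ValueError("RAID 10 needs an even number of disks, at least 4")
        return (disk_count // 2) * disk_gb   # half the disks hold mirror copies
    if raid_level == "raid5":
        if disk_count < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return (disk_count - 1) * disk_gb    # one disk's worth goes to parity
    raise ValueError(f"unsupported RAID level: {raid_level}")


# (count, size in GB) for the two larger scavenged groups.
disk_groups = [(8, 300), (10, 148)]
raw_total = sum(count * size for count, size in disk_groups)

for level in ("raid10", "raid5"):
    total = sum(usable_gb(count, size, level) for count, size in disk_groups)
    print(level, total, "GB usable of", raw_total, "GB raw")
```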
For the SQL server side of things, I needed to know how much physical space I
needed for logs and data, as these need to reside on shared storage. Since
a lot of SysAdmins wear many hats, one of the first things you need to know is
that SQL database files don’t always use all of the space they occupy on disk,
so checking file sizes in Windows Explorer isn’t going to cut it.
Here is a little SQL query that you can run on your SQL servers to see what you
have in terms of space allocated and used.
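-- sysfiles reports sizes in 8 KB pages, so dividing by 128.0 converts to MB.
-- (Dividing by 128.0 rather than 128 avoids integer division dropping the fraction.)
-- Note: despite the DBName label, a.name is the logical file name.
DECLARE @command varchar(max) =
'USE [?]
select DBName = left(a.NAME,40),
[Size MB] = convert(decimal(12,2),round(a.size/128.0,2)),
[Used MB] = convert(decimal(12,2),round(fileproperty(a.name,''SpaceUsed'')/128.0,2)),
[Free Space MB] = convert(decimal(12,2),round((a.size-fileproperty(a.name,''SpaceUsed''))/128.0,2)),
FullPath = left(a.FILENAME,70)
from dbo.sysfiles a'
-- sp_MSforeachdb is undocumented; it runs the command once per database,
-- substituting each database name for the ? placeholder.
EXEC sp_MSforeachdb @command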
This will show you how much space each log and data file is using, along with
their paths, names, and how much free space they each have. When building a SQL
server you want to avoid auto growth as much as possible; however, you don’t want
to allocate so much space that you are wasting it. Make your plans accordingly.
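One rough way to turn the "Used MB" figures into a pre-allocation size is to add some headroom and round up to a tidy increment. The 25% headroom and 256 MB increment below are illustrative assumptions only, not recommendations. The right numbers depend on how fast each database actually grows.

```python
import math


def presized_mb(used_mb: float, headroom: float = 0.25, round_to_mb: int = 256) -> int:
    """Suggest a file size: used space plus headroom, rounded up to an increment."""
    if used_mb < 0 or headroom < 0 or round_to_mb <= 0:
        raise ValueError("used_mb and headroom must be >= 0, round_to_mb > 0")
    target = used_mb * (1 + headroom)
    return math.ceil(target / round_to_mb) * round_to_mb


# Hypothetical data file currently using 10,000 MB.
print(presized_mb(10_000))
# Hypothetical log file using 800 MB, with more headroom and a larger increment.
print(presized_mb(800, headroom=0.5, round_to_mb=512))
```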
After running the numbers from my inventory of how much space I needed to store
the VMs and the SQL data/log files, this all worked out surprisingly well with
the disks we already had. I’ll go into the nitty gritty details of what’s going
where with regards to the disks in later articles.
Off I went on my merry way, happily unplugging things, shutting things down,
moving things around, installing SQL clusters, and going through test phases
with Hyper-V, VMware and XenServer. Eventually we ended up with two virtual
hosts running about 20 VMs in production (one recycled from an old SQL server
and one a brand new server), both running XenServer 6.0, and a pair of SQL
servers in a failover cluster (one original, one new), with all four of these
connected to a DAS SAN (reusing disks from other servers). Only 14 other
physical servers remain, several of which are slated to be virtualized in the
next few months.
In the end I have found myself with a bit more time on my hands, and am able to
concentrate more on productivity gains, helping staff make better use of their
resources, improving functionality, tweaking the servers to get peak
performance… and of course… write this lovely little article.
I'll follow up with specifics on what I did with the SQL servers, how we
consolidated the webservers, our storage balancing act, our network, and, of
course, backups.
In the words of Little Nicky….