One of the first steps when onboarding a client to Agio’s DBA Management Service is to dig deep into their environment, ensuring everything is configured according to best practices and industry standards. We do this to establish a baseline of the current state and to start discussions about the steps needed to bring the client in line with those standards. Most of the issues we find permeate SQL Server environments of every size, whether it’s a mature enterprise with thousands of databases across 50 servers or a boutique shop with a single database on a single server. Here I’ll discuss some of the most common issues we encounter and the problems they can cause; in future blogs I’ll explain how to resolve each of them. For now, let’s get started.

  1. Incorrect number of tempdb data files

Let’s imagine you’re moving for a new job. Your new company provides four movers to assist with packing, loading, and unloading, but only a single truck. You estimate it will take four truckloads to move everything to the new location. After you load the truck, only one of the movers comes with you to your new home to unload it; the other three sit idle at your old place, awaiting your return so they can load the truck again. Not a very efficient use of your resources, right? A better approach would be to rent three more trucks so all your movers can work at the same time. This is the situation you’re in when you don’t properly configure your tempdb data files. If you have only one truck (tempdb data file) but four movers (CPU cores), three-quarters of your resources sit idle while one core does all the work.

Tempdb is often the most heavily used database on a SQL Server instance, and improving the performance of tempdb generally improves the overall performance of the instance. The consensus is to have one tempdb data file for every CPU core, up to eight files. By increasing the number of tempdb data files to match the number of cores, you ensure the instance is configured to make full use of the server’s resources, as the sketch below illustrates.
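As a rough illustration, here is a minimal T-SQL sketch that compares tempdb’s data file count to the visible scheduler (core) count, capped at eight, and adds a file if needed. The logical file name, path, and sizes are placeholders you would adjust for your environment.

    -- Compare tempdb's data file count to the visible scheduler (core) count, capped at 8.
    DECLARE @cores  int = (SELECT COUNT(*) FROM sys.dm_os_schedulers
                           WHERE status = 'VISIBLE ONLINE');
    DECLARE @target int = CASE WHEN @cores > 8 THEN 8 ELSE @cores END;
    DECLARE @files  int = (SELECT COUNT(*) FROM tempdb.sys.database_files
                           WHERE type_desc = 'ROWS');

    SELECT @files AS current_data_files, @target AS recommended_data_files;

    -- Example of adding one file; repeat with new logical names until the counts match.
    IF @files < @target
        ALTER DATABASE tempdb ADD FILE
        (
            NAME = tempdev2,                     -- hypothetical logical name
            FILENAME = 'T:\TempDB\tempdev2.ndf', -- placeholder path
            SIZE = 4GB,                          -- placeholder size
            FILEGROWTH = 1GB
        );

Keep every tempdb data file the same size and growth setting; SQL Server’s proportional fill algorithm only spreads allocations evenly across files of equal size.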

  2. Linked servers using pass-through accounts (as opposed to permissioning both sides)

A client called me a few months back asking why a user was still able to access data in a database even though they had combed through all the database users and verified she didn’t have access. Doing my own due diligence to make sure nothing was overlooked, I verified the same thing: the user had no access on the database. I then did a screenshare with the user and quickly identified that she was accessing the data by traversing a linked server. When I looked at the linked server, I was surprised to see it was configured to use an account with sysadmin access for anyone who traversed it. I circled back with the client and let them know why the user had access. After discussing why the linked server was configured with a hard-coded account, I learned it was set up to get around a Kerberos authentication issue they couldn’t resolve. In my experience, this is the #1 cause of bad SQL Server security: someone with admin rights is troubleshooting an issue and grants elevated permissions to normal users as a workaround. It was an eye-opener for the client when I informed him that the user not only could read the database, she could alter and delete data as well.
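To audit this on your own instance, here is a hedged sketch that lists each linked server’s login mappings so hard-coded accounts stand out, along with an example of remapping a linked server to pass through each caller’s own credentials ('REMOTESRV' is a placeholder name). Note that passing credentials on to a second server requires Kerberos delegation to be configured, which is exactly the issue that drove this client to a hard-coded account in the first place.

    -- Audit: how do local logins map across each linked server?
    -- uses_self_credential = 0 with a remote_name set means a hard-coded account.
    SELECT s.name        AS linked_server,
           p.name        AS local_login,   -- NULL means the mapping applies to all logins
           l.uses_self_credential,
           l.remote_name AS hardcoded_remote_login
    FROM sys.servers s
    JOIN sys.linked_logins l          ON l.server_id = s.server_id
    LEFT JOIN sys.server_principals p ON p.principal_id = l.local_principal_id
    WHERE s.is_linked = 1;

    -- Remediation example: have callers connect as themselves instead of a shared account.
    EXEC sp_addlinkedsrvlogin
         @rmtsrvname = N'REMOTESRV',  -- placeholder linked server name
         @useself    = N'TRUE',       -- pass through the caller's own credentials
         @locallogin = NULL;          -- apply to all local logins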

  3. Improperly configured backups

Taking backups of your databases is a no-brainer that everyone understands. Occasionally we run across clients with databases that aren’t backed up at all, but that’s usually because they have no safeguards to alert them to stale backups. More commonly, I see databases in the full recovery model without any scheduled transaction log backups, or incorrect assumptions about how the recovery strategy works. What amazes me most is how many times I’ve come across backup jobs that take a full backup and then a log backup immediately after, with no other log backups taken throughout the day until right after the next full backup. This strategy isn’t ideal: you’re paying the performance hit and the log-size overhead of full recovery without getting the benefit of point-in-time restore. All calories but no cake.
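If you want to check your own instance, a sketch like the following flags databases in full or bulk-logged recovery whose most recent log backup is missing or stale. The 60-minute threshold is an arbitrary placeholder you would set to match your recovery point objective.

    -- Flag FULL/BULK_LOGGED databases with missing or stale log backups.
    SELECT d.name,
           d.recovery_model_desc,
           MAX(b.backup_finish_date) AS last_log_backup
    FROM sys.databases d
    LEFT JOIN msdb.dbo.backupset b
           ON b.database_name = d.name
          AND b.type = 'L'              -- 'L' = transaction log backup
    WHERE d.recovery_model_desc IN ('FULL', 'BULK_LOGGED')
      AND d.name <> 'model'             -- model ships in FULL but is rarely log-backed-up
    GROUP BY d.name, d.recovery_model_desc
    HAVING MAX(b.backup_finish_date) IS NULL
        OR MAX(b.backup_finish_date) < DATEADD(MINUTE, -60, GETDATE());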

  4. Everyone is a sysadmin

Most of our clients’ environments start small: a few developers working on an app who need a backend database. For access, they just grant themselves sysadmin and keep working. When it comes time to add users, they start with the best of intentions, granting read-only access. Then a power user needs elevated access, and instead of following the rule of most restrictive access, they make the power user a sysadmin and call it done. Now that power user can grant other users the same permissions, and she does. Before you know it, everyone is a sysadmin on the server. Now you have developers testing their code in prod, users creating redundant tables, and devs creating new databases to run resource-heavy applications, but what actually takes down your environment is a user accidentally running code on the wrong server because they weren’t paying attention.
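A quick way to see how far this has spread is to list the members of the sysadmin role. A minimal sketch using the standard catalog views:

    -- List every login that holds sysadmin on the instance.
    SELECT m.name AS login_name,
           m.type_desc,    -- SQL_LOGIN, WINDOWS_LOGIN, WINDOWS_GROUP, ...
           m.is_disabled
    FROM sys.server_role_members rm
    JOIN sys.server_principals r ON r.principal_id = rm.role_principal_id
    JOIN sys.server_principals m ON m.principal_id = rm.member_principal_id
    WHERE r.name = 'sysadmin'
    ORDER BY m.name;

Keep in mind that a single WINDOWS_GROUP row can hide many individual members, so a short list here doesn’t necessarily mean a short list of actual sysadmins.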

  5. No documented cybersecurity policies

Frequently overlooked, but one of the most important things you can do for your SQL Server environment, is to document your policies and practices. There needs to be a document that states “this is how we configure linked server security” or “this is how we grant user access to our databases.” Having it written down means everyone administering your databases is on the same page and performing security-related tasks under the same guidelines. Equally important, it gives the DBAs a document to point to when a user asks for access outside the policy. Being able to tell a user that an action is “against the company security policy” limits pushback and gives them guidance on how to build their applications to interface with the data.

These are the five most common issues we find when evaluating a client’s environment. The list could easily have been 10 or 20 items long, but these five made the cut based on the impact resolving them has. Increasing your tempdb data files will boost efficiency, and properly configuring your backups will improve your environment’s reliability. It’s no accident that the three remaining items relate to security; it’s an area often overlooked when accidental DBAs are just trying to get an application up and running. Now is the time to define your security policy, lock down your linked servers, and start kicking users out of the sysadmin role. Once these five items are resolved, your environment will be more efficient, reliable, and secure.
