What is the worst database accident that happened to you in production? [closed]

D

18

6

For example: Updating all rows of the customer table because you forgot to add the where clause.

What was it like, realizing it and reporting it to your coworkers or customers?
What were the lessons learned?

Devisal answered 15/8, 2008 at 11:13 Comment(0)

H

12

I think my worst mistake was

truncate table Customers
truncate table Transactions

I didnt see what MSSQL server I was logged into, I wanted to clear my local copy out...The familiar "OH s**t" when it was taking significantly longer than about half a second to delete, my boss noticed I went visibily white, and asked what I just did. About half a mintue later, our site monitor went nuts and started emailing us saying the site was down.

Lesson learned? Never keep a connection open to live DB longer than absolutly needed.

Was only up till 4am restoring the data from the backups too! My boss felt sorry for me, and bought me dinner...

Hexyl answered 15/8, 2008 at 11:28 Comment(2)

yep i have nearly done this before. Definatly always close connection to live as soon as you can. – Await 23/11, 2008 at 21:57

The first thing I did when I read this was close my open SSMS connection to the live database server... – Nettle 28/5, 2010 at 14:17

W

7

I work for a small e-commerce company, there's 2 developers and a DBA, me being one of the developers. I'm normally not in the habit of updating production data on the fly, if we have stored procedures we've changed we put them through source control and have an officially deployment routine setup.

Well anyways a user came to me needing an update done to our contact database, batch updating a bunch of facilities. So I wrote out the query in our test environment, something like

update facilities set address1 = '123 Fake Street'
    where facilityid in (1, 2, 3)

Something like that. Ran it in test, 3 rows updated. Copied it to clipboard, pasted it in terminal services on our production sql box, ran it, watched in horror as it took 5 seconds to execute and updated 100000 rows. Somehow I copied the first line and not the second, and wasn't paying attention as I CTRL + V, CTRL + E'd.

My DBA, an older Greek gentleman, probably the grumpiest person I've met was not thrilled. Luckily we had a backup, and it didn't break any pages, luckily that field is only really for display purposes (and billing/shipping).

Lesson learned was pay attention to what you're copying and pasting, probably some others too.

Waw answered 15/8, 2008 at 13:17 Comment(0)

S

6

A junior DBA meant to do:

delete from [table] where [condition]

Instead they typed:

delete [table] where [condition]

Which is valid T-Sql but basically ignores the where [condition] bit completely (at least it did back then on MSSQL 2000/97 - I forget which) and wipes the entire table.

That was fun :-/

Stroman answered 15/8, 2008 at 11:20 Comment(1)

Certainly not on SQL Server 2000. There's no SQL Server 97 - the predecessor was SQL Server 7. – Metametabel 9/11, 2008 at 17:22

L

6

About 7 years ago, I was generating a change script for a client's DB after working late. I had only changed stored procedures but when I generated the SQL I had "script dependent objects" checked. I ran it on my local machine and all appeared to work well. I ran it on the client's server and the script succeeded.

Then I loaded the web site and the site was empty. To my horror, the "script dependent objects" setting did a DROP TABLE for every table that my stored procedures touched.

I immediately called the lead dev and boss letting them know what happened and asking where the latest backup of the DB could be located. 2 other devs were conferenced in and the conclusion we came to was that no backup system was even in place and no data could be restored. The client lost their entire website's content and I was the root cause. The result was a $5000 credit given to our client.

For me it was a great lesson, and now I am super-cautious about running any change scripts, and backing up DBs first. I'm still with the same company today, and whenever the jokes come up about backups or database scripts someone always brings up the famous "DROP TABLE" incident.

Lylalyle answered 15/8, 2008 at 15:46 Comment(0)

E

6

Something to the effect of:

update email set processedTime=null,sentTime=null

on a production newsletter database, resending every email in the database.

Ethe answered 23/11, 2008 at 22:20 Comment(0)

C

3

update Customers set ModifyUser = 'Terrapin'

I forgot the where clause - pretty innocent, but on a table with 5000+ customers, my name will be on every record for a while...

Lesson learned: use transaction commit and rollback!

Cale answered 15/8, 2008 at 14:20 Comment(0)

M

3

I once managed to write an updating cursor that never exited. On a 2M+ row table. The locks just escalated and escalated until this 16-core, 8GB RAM (in 2002!) box actually ground to a halt (of the blue screen variety).

Mussolini answered 15/8, 2008 at 14:39 Comment(0)

R

3

We were trying to fix a busted node on an Oracle cluster.

The storage management module was having problems, so we clicked the un-install button with the intention of re-installing and copying the configuration over from another node.

Hmm, it turns out the un-install button applied to the entire cluster, so it cheerfully removed the storage management module from all the nodes in the system.

Causing every node in the production cluster to crash. And since none of the nodes had a storage manager, they wouldn't come up!

Here's an interesting fact about backups... the oldest backups get rotated off-site, and you know what your oldest files on a database are? The configuration files that got set up when the system was installed.

So we had to have the offsite people send a courier with that tape, and a couple of hours later we had everything reinstalled and running. Now we keep local copies of the installation and configuration files!

Recovery answered 15/8, 2008 at 18:2 Comment(0)

V

2

I thought I was working in the testing DB (which wasn't the case apparently), so when I finished 'testing' I run a script to reset all data back to the standard test data we use... ouch!
Luckily this happened on a database that had backups in place, so after figuring out I did something wrong we could easily bring back the original database.

However this incident did teach the company I worked for to realy seperate the production and the test environment.

Vocational answered 15/8, 2008 at 16:0 Comment(0)

G

2

I don't remember all the sql statements that ran out of control but I have one lesson learned - do it in a transaction if you can (beware of the big logfiles!).

In production, if you can, proceed the old fashioned way:

Use a maintenance window
Backup
Perform your change
verify
restore if something went wrong

Pretty uncool, but generally working and even possible to give this procedure to somebody else to run it during their night shift while you're getting your well deserved sleep :-)

Gidgetgie answered 24/9, 2008 at 16:58 Comment(0)

V

2

I did exactly what you suggested. I updated all the rows in a table that held customer documents because I forgot to add the "where ID = 5" at the end. That was a mistake.

But I was smart and paranoid. I knew I would screw up one day. I had issued a "start transaction". I issued a rollback and then checked the table was OK.

It wasn't.

Lesson learned in production: despite the fact we like to use InnoDB tables in MySQL for many MANY reasons... be SURE you haven't managed to find one of the few MyISAM tables that doesn't respect transactions and you can't roll back on. Don't trust MySQL under any circumstances, and habitually issuing a "start transaction" is a good thing. Even in the worst case scenario (what happened here) it didn't hurt anything and it would have protected me on the InnoDB tables.

I had to restore the table from a backup. Luckily we have nightly backups, the data almost never changes, and the table is a few dozen rows so it was near instantaneous. For reference, no one knew that we still had non-InnoDB tables around, we thought we converted them all long ago. No one told me to look out for this gotcha, no one knew it was there. My boss would have done the same exact thing (if he had hit enter too early before typing the where clause too).

Victualage answered 23/11, 2008 at 21:59 Comment(0)

R

1

I dropped the live database and deleted it.

Lesson learned: ensure you know your SQL - and make sure that you back up before you touch stuff.

Robers answered 15/8, 2008 at 11:15 Comment(1)

dropped and delete at the same time.. but why you reacted so hard to the poor production database ;-) – Granese 21/3, 2011 at 11:46

G

1

I discovered I didn't understand Oracle redo log files (terminology? it was a long time ago) and lost a weeks' trade data, which had to be manually re-keyed from paper tickets.

There was a silver lining - during the weekend I spent inputting, I learned a lot about the useability of my trade input screen, which improved dramatically thereafter.

Gelsemium answered 15/8, 2008 at 11:22 Comment(0)

B

1

Worst case scenario for most people is production data loss, but if they're not running nightly backups or replicating data to a DR site, then they deserve everything they get!

@Keith in T-SQL, isn't the FROM keyword optional for a DELETE? Both of those statements do exactly the same thing...

Bhang answered 15/8, 2008 at 11:30 Comment(0)

B

1

The worst thing that happened to me was that a Production server consume all the space in the HD. I was using SQL Server so I see the database files and see that the log was about 10 Gb so I decide to do what I always do when I want to trunc a Log file. I did a Detach the delete the log file and then attach again. Well I realize that if the log file is not close properly this procedure does not work. so I end up with a mdf file and no log file. Thankfully I went to the Microsoft site I get a way to restore the database as recovery and move to another database.

Bibliophile answered 15/8, 2008 at 13:23 Comment(0)

M

1

Updating all rows of the customer table because you forgot to add the where clause.

That was exactly i did :| . I had updated the password column for all users to a sample string i had typed onto the console. The worst part of it was i was accessing the production server and i was checking out some queries when i did this. My seniors then had to revert an old backup and had to field some calls from some really disgruntled customers. Ofcourse there is another time when i did use the delete statement, which i don't even want to talk about ;-)

Mousterian answered 22/8, 2008 at 5:23 Comment(0)

T

1

Truncate table T_DAT_STORE

T_DAT_STORE was the fact table of the department I work in. I think I was connected to the development database. Fortunately, we have a daily backup, which hasn't been used until that day, and the data was restored in six hours.

Since then I revise everything before a truncate, and periodically I ask for a backup restoration of minor tables only to check the backup is doing well (Backup isn't done by my department)

Tripody answered 31/8, 2008 at 7:38 Comment(0)

S

1

This didn't happen to me, just a customer of ours whos mess I had to clean up.

They had a SQL server running on a RAID5 disk array - nice hotswap drives complete with lighted disk status indicators. Green = Good, Red = Bad.

One of their drives turned from green to red and the genius who was told to pull and replace the (Red) bad drive takes a (Green) good one out instead. Well this didn't quite manage to bring down the raid set completely - opting for the somewhat readable (Red) vs unavaliable (Green) for several minutes.. after realizing the mistake and swapping the drives back any data blocks that were written during this time became jyberish as disk synchronization was lost) ... 24-straight hours later writing meta programs to recover readable data and reconstruct a medium sized schema they were back up and running.

Morals of this story include...Never use RAID5, always maintain backups, careful who you hire.

I made a major mistake on a customers production system once -- luckily while wondering why the command was taking so long to execute realized what I had done and canceled it before the world came to an end.

Moral of this story include ... always start a new transaction before changing ANYTHING, test the results are what you expect and then and only then commit the transaction.

As a general observation many classes of rm -rf / type errors can be prevented by properly defining foreign key constraints on your schema and staying far away from any command labled 'CASCADE'

Scandal answered 23/11, 2008 at 20:48 Comment(0)

Recommended topics

Hot tags