From an unchecked rm -rf to almost starting World War III, we’ve experienced them. To celebrate the holiday Michael Carducci and NFJS attendees share their experiences.
Michael C.: You’re listening to The No Fluff Just Stuff Podcast. This is your host Michael Carducci and we are in Reston, Virginia. And I love the topic of this week’s episode. We got a very special request from the audience. It is the end of October and so we are doing a special Halloween episode. We’re talking about developer horror stories. And I feel like all these stories definitely have to start out with an rm -rf story because I think we all have one of these.
Well, one of the things that I’ve done in the past, and this is a cautionary tale for anybody listening right now: if you are taking a return value and concatenating it together with an rm -rf, check your return value. Apparently, if you get an empty string for that path that’s supposed to be returned, and you plug that in with an rm -rf, it is going to do all kinds of unpleasant … Let me just put it this way, it’s gonna ruin your day. And it’s funny, throughout the entire keynote last night, I told a lot of stories about just dumb things that I’ve done. I was about to livestream this on Facebook, and then I realized at least two, maybe three of the people that I talked about are my friends on Facebook. And I don’t know if they need to be reminded of the dumb things that I’ve done or in at least one situation the dumb things they’ve done. Now, I’m joined today by one of the attendees, somebody who would prefer to remain anonymous. Going by your handle TechnoLust, go on and say a few words.
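The empty-path failure in that rm -rf story can be caught before anything is deleted. What follows is a minimal sketch in Python, not anything from the episode: `safe_rmtree` is a hypothetical helper name, and it assumes the delete target is always meant to be a subdirectory of a known base directory.

```python
import shutil
from pathlib import Path


def safe_rmtree(base: Path, subdir: str) -> Path:
    """Delete base/subdir, refusing empty or escaping paths.

    Hypothetical helper for illustration only.
    """
    # An empty or whitespace-only value is exactly the "empty return
    # value" case: joining it to the base would target the base itself.
    if not subdir or not subdir.strip():
        raise ValueError("refusing to delete: subdirectory name is empty")

    base = base.resolve()
    target = (base / subdir).resolve()

    # The target must sit strictly inside base. This rejects ".", "..",
    # and absolute paths such as "/" that would replace the base entirely.
    if target == base or base not in target.parents:
        raise ValueError(f"refusing to delete {target}: not inside {base}")

    shutil.rmtree(target)
    return target
```

The check runs on the resolved path rather than on the raw string, so a value like `"../.."` is refused just as an empty string is.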
TechnoLust: Hello Reston.
Michael C.: And I don’t know if you wanna start with the story or I can start with one of my stories. I did promise-
TechnoLust: Let’s start with yours.
Michael C.: Okay. I did promise I would tell the story that revolves around the big red button in the data center, and this was my very first job. When I was younger, everybody knew I had a passion for technology. I was that kid in school that didn’t really hang out with other people. I spent all my time in front of the computers, and this was actually in England. So, the computers they had at the time were these Acorn RISC OS PCs, and I’m not even sure if that’s still a thing anymore. I don’t think so. I wasn’t even sure it was a thing back then, but those are the computers they had and I always had a lot of fun with those.
My form tutor was very kind and made sure that I got a placement in some kind of technology job during my year 10 work experience. And there’s a company whose name I’m not gonna mention. I feel like that would be a terrible idea. I was excited about it. I shouldn’t have been. It was pretty much as mundane and dirty as working in a auto-garage or anything else. So I worked in what they lovingly referred to as the machine room. I don’t think we had data centers back in those days. This was just a really big room with lots and lots of computers. There were a couple of line printers, which were just massive. They had a big laser printer that was unbelievably huge. They had Unisys mainframe, and one of my jobs every morning … Well, the first thing I had to do when I came in every morning, there was about a three foot pile of paper that came out of the line printer and I had to go through and separate every single print job. Find a cover page, separate it. I got a ton of paper cuts.
One of those print jobs was a report that ran every single night, and it was all the scratch tapes that I had to pull. There was a library of shelves. There was over 25 shelves, and all the scratch tapes were there. There’s a couple of, I’m just gonna say, older people in the room nodding along and they’re like, “Oh yeah, I remember doing this kind of stuff and …” And I would have to pull these scratch tapes and put them on a carousel, and they were just available just in case we needed the scratch tape for the mainframe. They were reel to reels. One of my jobs, I had to clean reel to reels. I had to lubricate the line printer. I thought I was gonna be programming and stuff like that. It was a really kind of crappy job.
The first day we got the tour, they take us around and they’re showing us around, these are the three hundred and something [ANT 00:04:08] servers, bunch of Unix servers, there was the mainframe, a bunch of other little bits and pieces. They show us around and they give us the grand tour. And they said, “By the way, one of the things you really should know about is the Halon system. If you hear a sound like this, you’ve got three minutes to get out. If you hear this sound, you have one minute to get out. If it’s a constant tone, you’re dead basically. You’re not getting out anymore.” Good to know. Halon is, I guess, kind of a pretty serious way to put out a fire. This was a locked room. This was a secure room because it’s a technology room. There’s all this equipment and everything else there.
We went through the day and I’m doing my mundane stuff and there are two of us, there are two interns. The other intern, about midday, said, “You know what? I’m finished with this little task. Can I go to lunch?” “Yeah, sure go ahead. See you in an hour.” And he walks across this big noisy room to the door. He pushed on the door and it doesn’t open. He pushes again. He tried pulling. He tries the other door, doesn’t open. And he yells across all of these servers and all the noise and the AC and everything else that’s going on in that room, he says, “How do I get out?” And the man who was on the phone, he’s talking to somebody. He’s trying to restart a crashed mainframe job and there’s all these other things going on. He leans over the phone, he says, “Push the button.” And the guy looks around and doesn’t see a button. There’s a weird doorbell looking thing that used to say press to exit but that had long since worn off. He didn’t press that. Next to it there was a green box. Break glass to exit. He didn’t press that. Next to that there was a red glass box, it says break glass in case of fire. Thankfully, he didn’t press that.
Another two or three feet away from the door and about a foot and a half higher was a big red button. The only button he could see, and he pressed it. The entire room went dark. There was this big sound of a thousand fans spinning down. The man who was on the phone trying to restart the crashed mainframe job, he swears, he says, “Oh.” Because his terminal went dark. And then he realized why, and he swore again. And then he realized that this means that the mainframe had a hard shutdown and he swore again. Then he realized what that’s going to mean. He swore again. Then he realized that all the other servers had the same thing happen to them. He swore again, and it was just wave after wave of realization. Meanwhile the other intern was just standing there, still pushing on the door trying to figure out … Didn’t even realize, didn’t even register what had happened. One of the few times I’ve ever seen a grown adult have a complete meltdown. It was pretty terrifying. I didn’t spend any more time in the machine room that week, and I never saw the other intern again.
Now, I’m assuming he got sent home, but all the other staff in the machine room were pretty upset about this. So, could be a Halloween story. I don’t know. But there’s so many of these. These are great things that we’ve all experienced. You know we’ve all done something dumb. One of my signature moves was as a database administrator. It was my job to go into production, and the production database, and run some queries to clean up the database and get everything working again. And I’d go in. There was some spurious data in there that I had to go and delete and I’d write some fairly convoluted delete statement. Delete from table. Inner join other table. Where convoluted criteria, whatever that looks like, and I would hover over the run key and then I would think, “Wait a minute. I should probably check this before I run this.” So I would add a new line, delete from, and then I would add another line below that, and I’d say select star from. And then I’d have the inner join, and the where, and everything else and I would highlight my select statement and run it. Four records. Perfect. Select all, F5, ship it.
And I would wait, and wait. This is taking an awfully long time to delete four records. And then it would hit me right around the time the query analyzer thing would say, “Query completed, 1.9 million records affected.” You see, that particular flavor of SQL would interpret delete from table as its own SQL statement, and then right below it it would run select star from table, and everything else. This is when I learned about point in time recovery and full recovery model and things like that. I think I really set a bad example for other people on my team because I was a little senior at that time and I should have known better and I still did these things anyway. So somebody saw me do this once and they said, “Oh, you’re in trouble.” I’m like, “Haha, not my first rodeo.” So I’d go in there, and I’d restore to a point in time about five minutes ago. I just wish we could do this in real life. And I would do restore to a point in time for about five minutes ago and then they’re like, “Wow. You’re pretty awesome.” I’m like, “I know.” Right? That’s what you want to engender in your team when you’re doing stupid stuff like that when you’re just like live deleting data in production.
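One way to make that kind of runaway delete recoverable without a restore is to run it inside a transaction and commit only when the affected row count matches what the preview query showed. The sketch below is an illustration under stated assumptions, not the tooling from the story: it uses SQLite through Python's standard `sqlite3` module as a stand-in for the production database, and `guarded_delete` is a hypothetical helper name.

```python
import sqlite3


def guarded_delete(conn: sqlite3.Connection, sql: str, params: tuple,
                   expected_rows: int) -> int:
    """Run a DELETE and commit only if it touched the expected row count.

    Hypothetical helper for illustration. Relies on sqlite3's default
    behaviour of opening a transaction implicitly before a DELETE.
    """
    cur = conn.cursor()
    try:
        cur.execute(sql, params)
        if cur.rowcount != expected_rows:
            raise RuntimeError(
                f"expected {expected_rows} rows, statement affected {cur.rowcount}"
            )
        conn.commit()
    except Exception:
        # Anything unexpected, including a count mismatch, undoes the delete.
        conn.rollback()
        raise
    return cur.rowcount
```

Called with `expected_rows=4`, a statement that would have hit 1.9 million rows raises and rolls back instead of committing. The same pattern applies on other databases with an explicit `BEGIN` and `ROLLBACK`.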
I got a phone call one day about six months later from that same person, calls me up in the middle of the night. I don’t answer of course ’cause I’m not on call. That’s somebody else’s problem, but that person calls again, and again, and again. I finally answer it and I said, “What?” He said, “How do I do a point in time recovery?” “You idiot. Don’t do what I do.” We do a screen share. We jump on TeamViewer, whatever we were using at the time, LogMeIn, I don’t know. I see his script. I said, “So, tell me what happened.” He says, “I deleted the wrong data.” I said okay. He says, “So I decided to back up the transaction log.” I said, “Okay. Let me see.” And I see a statement. It says, “Backup transaction log with truncate only.” So he just nuked the transaction log. The backup job was gonna run in about 10 minutes. So we lost 23 hours and 50 minutes of data.
Michael C.: Yeah. Somehow this was my fault, I don’t know. But TechnoLust, I don’t know, I wanna hear your story. I talk too much as it is. Come a little closer so we can all hear you.
TechnoLust: Okay. Hello? Can you hear me?
Michael C.: Yes.
TechnoLust: Great. One of my most scary incidents as being a developer was when I was working on a project. I won’t tell which agency or exactly what the name of the project was. But basically think of records that come from smaller entities are uploaded to slightly larger entities that are uploaded to slightly larger entities. And these records can be used in courts of law. So kind of important that the records are high and tight as we say. So, kind of similar to a story that you shared last night. Some updates are going out to some of the various entities and stuff and then we do. We kind of waited for … Obviously you try to do your best, but bugs … People find bugs in everything.
Michael C.: Customers usually.
TechnoLust: Customers find bugs.
Michael C.: Pesky customers.
TechnoLust: We get one call that says, “Hey. Kinda noticed that we got these records.” I can’t remember if it was an ID or gate or something. But they said that there was like a … Something that was the same that shouldn’t have been the same. Some kind of numbering scheme that shouldn’t have been the same, but that’s an issue because these numbers shouldn’t be the same for these disparate numbers. So that was one. You put it in the backlog and doesn’t seem to be that much of an issue. It was only like one or two records so kind of low priority. That’s fine.
A few days later, another. A completely separate lab, office, whatever, calls and says, “Hey, we have these records. They’ve got the same numbers. We kinda wanna make sure something’s not …” Right. I was like, “Oh okay.” Again, I’m not the one taking the calls. These are just kind of tickets as they’re coming in, telling the developers, “Oh. Here’s what’s going on.”
So basically about a week or so goes by and we notice okay we’ve now got 10 offices calling with this information. Okay, so now it’s actually raised up to a fairly high level saying, “Okay. You know why it’s important that we can’t have these types of collisions. This type of information. We’ve gotta check to see what’s going on.”
Michael C.: Sure.
TechnoLust: If we happened to be called into court and this basically calls the data into question. So now I’m like, “Okay. Well, alright let’s try to delve in.” So it actually took us quite a while to try to figure what was going … It was kind of all hands on deck.
Michael C.: I know those.
TechnoLust: Yeah, right. And trying to figure it out and just agonizing, agonizing over it. Again, it was sometime ago, but I’m the type of person that … Even though again, it was actually legacy code. Nothing I touched but still I’m now on the project.
Michael C.: It always is.
TechnoLust: It always is someone else’s code. But I’m basically, practically, losing sleep over it because again, am I gonna be the one called into court about this and about what’s going on.
Michael C.: Wow.
TechnoLust: Yeah. It’s pretty much how serious it was. At least, I again, I was probably overexaggerating it, but that’s just the way I am. So, luckily one time, one of the training officers again … So this is how all hands on … Someone who had experienced training other people how to use the software, she was pulled in to look at code, or look at records. That’s how all hands on deck it was. And she happened to notice. We’re pretty much like, “Oh. This happened on this date.” Or something, there was again … I can’t remember the exact detail, but we finally found the thing that it was. So it was like, “Oh, okay. But we’ve gotta push an update out to all the labs and all the offices and such.” It’s not gonna engender trust, but basically we just kinda really wanna explain what is going on.
Now, I like movies so I kind of … Basically I … Because again, the particular training officer … Again we had a conference for our customer coming up and she’s like, “You know people are gonna talk about this and ask about this. They’re gonna wonder okay is our data safe. Is everything fine?” [inaudible 00:16:18].
Michael C.: It’s probably fine.
TechnoLust: It’s probably fine. Right. But she’s like, “How do I explain this so that they can understand?” So I think at the time, the movie The Perfect Storm had just come out. I basically put it like that. Again, I can’t remember the exact details but basically I said, “This thing happened, and then this happened. And you put them together, you got Mark Wahlberg.”
Michael C.: Going down with the ship.
Michael C.: Actually, he didn’t go down with the ship. Oh, I just ruined it. I’m sorry.
TechnoLust: Beware of spoiler alert.
Michael C.: Yeah. I’ll put that in. Well, that movie came out a long time ago.
TechnoLust: Yeah, the statute of limitations.
Michael C.: If you haven’t seen it yet it’s on you.
TechnoLust: I agree.
Michael C.: Yeah that’s right.
TechnoLust: But please no spoilers for Stranger Things. I’ve got …
Michael C.: Oh yes. So I watched the first five episodes and oh my God.
TechnoLust: Okay, alright. I know. I know.
Michael C.: I’m kidding. I’m kidding. I haven’t. I’ve been working all weekend.
TechnoLust: But unfortunately because it was a government agency, she couldn’t actually put out. I sent her a nice slide showing the ship going up the [inaudible 00:17:16]. The huge wave at the end and Mark Wahlberg and George Clooney. I’m like, “You gotta put this in the slide.” She’s like, “Uh. Did you get permission from Universal Studios or whatever?” I was like no.
Michael C.: It’s fair use.
TechnoLust: Right. That’s what I thought. But she’s like, “No. This is …”
Michael C.: Parody.
TechnoLust: Right, yeah. So no. That’s when I kind of got … Well not the [inaudible 00:17:40] that I have now. But they basically describe me as … They were very thankful that I came through, came with the little code that would update all the offices, all the software for all the agencies and offices. And they basically said, “Well. You did a great job.” But then I was like, “I’ve been pulling my hair out, been losing sleep.” And they’re like, “Well, you’re like a duck.” And so apparently like all the time I was showing my face and everything was just nice and calm. But apparently I was like frantically paddling under the water, beneath the surface.
Michael C.: You are remarkably composed even right now.
TechnoLust: Right. Thank you. I just remember that was again, one of the biggest things where I was scared for my job, scared for national security, whatever it was. The wrong person was gonna go to jail or maybe the right person wasn’t gonna go to jail because the records might have been wrong. But again, it ended up being okay. The integrity of the data was fine. It didn’t affect the integrity of the data. So it ended up being … There was a light at the end of the tunnel, but again pretty scary time that … You know it took us quite a while just to find that one little thing. Again, perfect storm of things that occur that … Two separate incidents would have been fine by themselves but when they come together it creates a scary, perfect storm thing.
Michael C.: I’m totally gonna use that Perfect Storm screen grab in this podcast episode artwork. So that’s just gonna be there. That’ll be a testament to that.
TechnoLust: So if George Clooney though calls you, make sure you tell him I told you to do that.
Michael C.: Alright, alright. And he can talk to you personally.
Michael C.: That’s right.
TechnoLust: Yes, please. Give him my number and everything.
Michael C.: Is it @TechnoLust? Is that how he gets a hold of you?
TechnoLust: Pretty much, that’s my handle.
Michael C.: Alright. And it’s funny you tell these stories. Just having these conversations. All this stuff has just come flooding back to me. I’m thinking now about the time that I thought I was in some serious legal trouble. I made a small mistake that resulted in a very large class action lawsuit being filed. It was quite terrifying when I realized it was some dumb thing that I had done. So you just telling the story and I’m just like, “Oh my gosh. I forgot about that one.” And there are so many of these. This might be a super long podcast, I don’t know. We are gonna have to cut it off at some point soon.
No, I worked for a company, and we sent out alerts. I’m not gonna talk about the nature of the alerts because I don’t really wanna connect any of the clients or my history or anything else. Fortunately, I speak under a pseudonym and I do this because it allows me to be as honest about all the dumb things I’ve done in the past as I am. But there are enough people who know me and can connect me back to the real companies and things like that. So I’m not gonna mention any names or even what type of data, but we sent out alerts. On average, we’d send out about a million, million-and-a-half alerts a day. Some of these were email alerts, some of these were text message alerts, SMS alerts. They were different triggering alerts. And we just signed a new client. They had a whole bunch of people who subscribed to these different alerts, and we were moving from their system to our system.
We had to get all this data imported and staged and everything else like that. So they just sent us a dump. Here are all the alert subscriptions we had. We had to do migrations so we could port that into our format so we knew okay this is an XYZ alert and this is an ABC alert. We had the SMS phone numbers. We had the email addresses and all that stuff ready to go. Now, when we imported this, we were importing it before the go live date. So fortunately, our system had a flag enabled or disabled. Their system had a flag too enabled or disabled. But we imported everything as disabled. That way you can get all the data in, do some validation, do some testing, make sure this is all hunky dory, and then ship it. Throw the switch. Update all of those alerts from disabled to enabled.
So we did this, and they started going out, and a couple days later somebody’s attorney contacts that company. A few of those alerts, and by a few, it was a lot of them, they were disabled in the file they sent us. But we made everything disabled and then everything became enabled. You know how sometimes you change your phone number and then somebody else gets your phone number? Well this particular person got a text message about some kind of alert and thought it had something to do with her estranged ex-husband and this constituted abuse or something, called her attorney, the attorney tracked it all down, figured oh these alerts are coming from here, contacted them with a subpoena and they tried to explain what was going on just to kind of level it all out. And they said, “No, look. It was just a misunderstanding. It looks like this is a new number and they get recycled. Somebody held the old number. We accidentally turned it on.” And the lawyer started doing some math. About this number of phone numbers recycle every year. And we send out this number of text messages, so the class size is about yay big, and we’re looking at damages of about X, so we could just multiply X by the class size, get my 50% of that. This sounds like a pretty good deal. This is a pretty good Wednesday.
And so a class action lawsuit was filed because I made a mistake ’cause it didn’t occur to me to notice that … I don’t know what happened. I’m not privy to the details of the lawsuit. They told me not to worry about it. How do you not worry about an eight, nine figure class action lawsuit that’s kinda sorta your fault? I was pretty stressed out about it for quite a while. And I went in and I reworked our enabled/disabled flag to make it a tinyint instead of a bit, a boolean. And so it became a bitmask. So we had several reasons. So we would just bitwise AND or OR, and then if it was zero or non-zero, then it was disabled or enabled. We could have different reasons for turning it on, turning it off. Which I think is a terrible design. I think if you’re doing bitwise operations on database fields, you’re probably doing something wrong. But it was easy enough to get that to work with all the existing code. But I wanna open it up. I don’t know if you have another story?
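The bitmask idea can be sketched in a few lines to show why it avoids the original bug: each reason for turning an alert off gets its own bit, so lifting the migration hold cannot switch on an alert that was already off in the source file. This is an illustration only. The reason names and helper functions are hypothetical, and the sketch assumes zero means enabled (no reason to keep the alert off), which the episode does not spell out.

```python
# Hypothetical reason bits; the real system's reasons are not named in the story.
SOURCE_DISABLED = 0b01  # alert was already disabled in the client's export
MIGRATION_HOLD = 0b10   # alert held back until the go-live date


def add_reason(mask: int, reason: int) -> int:
    """Record one more reason for the alert being off (bitwise OR)."""
    return mask | reason


def clear_reason(mask: int, reason: int) -> int:
    """Remove a single reason, leaving any others in place (bitwise AND NOT)."""
    return mask & ~reason


def is_enabled(mask: int) -> bool:
    """An alert fires only when no disabling reason remains."""
    return mask == 0


# An alert that was off in the source file and is also held for migration.
was_off = add_reason(add_reason(0, SOURCE_DISABLED), MIGRATION_HOLD)
# An alert that was on in the source file and is only held for migration.
was_on = add_reason(0, MIGRATION_HOLD)

# Go-live: lift only the migration hold on both.
was_off = clear_reason(was_off, MIGRATION_HOLD)
was_on = clear_reason(was_on, MIGRATION_HOLD)
```

After go-live, `is_enabled(was_on)` is true while `is_enabled(was_off)` stays false. With a single boolean, both would have flipped to enabled.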
Michael C.: I’m sure you do but not necessarily one you wanna share.
TechnoLust: That’s the biggest one.
Craig: Well, I’ve got two of them.
Michael C.: Say what?
Craig: I’ve got two stories for you. [inaudible 00:24:18].
Michael C.: Are they horror stories?
Craig: I would say they are both horror stories.
Michael C.: Would you mind coming up and telling the mic. Is that cool? Everybody give him a round of applause everybody. Thank you. Alright we’ll just switch over. Put those on. You’re gonna wanna get kind of close to the microphone. What’s your name? Can I introduce?
Michael C.: Craig? Pleased to meet you.
Craig: I’m Craig.
Michael C.: Alright, and we’re joined by Craig who also has a couple of stories to share. Craig, please. Introduce yourself.
Craig: Okay. I am a software engineer. I solve problems usually with computers, and I’ve been doing this for a long time.
Michael C.: I usually just create problems with computers but that’s just me.
Craig: Well, yeah but they don’t pay people to do that intentionally.
Michael C.: Intentionally. That’s the keyword there. But anyway please go ahead. I’d love to …
Craig: Well, before I was doing programming I was in the Air Force doing computer operations. One of those rooms with the big red switches that you remember.
Michael C.: Oh yes. I will not ever forget.
Craig: Yes. Well, that big room had four mainframes. They were set in a pair so it was basically a [preprocess 00:25:28] server and an operational system. Two sets of those because one of them was gonna be live and one was available to either do hardware maintenance or testing, stuff like that. And the application that we were doing was basically … We were looking at satellites that were looking for very bright spots in Asia. If something that wasn’t bright got bright all of a sudden, they’d let somebody at NORAD know about it.
Michael C.: Understandable.
Craig: This was the 80s, pre-Berlin wall coming down. So it was kind of a …
Michael C.: Some international tensions I get it.
Craig: Yes. Well, okay so we had an AB switch to indicate to the folks back in America which system they were getting data from.
Michael C.: I’m already getting a little horrified about how truly …
Craig: You ever seen the movie …
Michael C.: … horror story this is gonna be.
Craig: … You ever see the movie War Games?
Michael C.: I was waiting for it.
Craig: At the end where they’re showing all of the launches, the smoke ’em if you’ve got ’em scenario?
Michael C.: Yep.
Craig: They had a test tape for that.
Michael C.: Oh.
Craig: And the AB switch was in the wrong position, so they sent the test data, everybody smoking ’em off live to NORAD. And the B-52s launched.
Michael C.: Oh wow.
Craig: Yes. The next day there was a four star general in the middle of a desert in Australia wondering why he was standing in a desert in the middle of Australia and a long line of people trying to explain why.
Michael C.: That might be the most amazing and most horrifying story I’ve ever heard.
Craig: After that, touching that AB switch was a two man procedure just like in a silo.
Michael C.: Do both have a key? Is that …
Craig: No, but it was … Okay, you both have to understand which way this is gonna go. If either person disagrees you don’t touch the switch.
Michael C.: Wow.
Craig: So that’s the really scary story.
Michael C.: Alright, I don’t know if you should have led with that one. That’s a finale right there.
Craig: Yeah, well the other one I was thinking of, this one happened a few years later. I was doing a data conversion. Basically, a company had a home bred system for managing sales of their publications and we were converting to our system. Now, their system oddly enough was written in MUMPS, which you don’t want to deal with anything that was written in MUMPS but …
Michael C.: It sounds like a disease.
Craig: Massachusetts University Medical Programming System.
Michael C.: Did they not do some due diligence when they came up with this? I’m gonna call this system Salmonella.
Craig: I’ve worked on some systems that had some really strange names. Many of them came out of government funded … So the MUMPS system was actually what the VA used to write all the software for Veterans’ stuff. But that’s actually not really relevant to this exercise. But in this case, the company … They dumped off for us 800,000 records to represent all their sales data, and we were doing a simple okay convert their records into our format and then ingest all the data one month at a time to simulate all of their previous sales so that we could post all of our inventory and all the sales in our system. Simple enough procedure. We’d noticed that about 400,000 of these records were marked deleted, which seemed a little odd. But we said okay if we see deleted we’ll ignore this record. And we went and ran the conversion and the inventory numbers were good to within 50 cents, which was more than good enough for these folks. But the sales numbers were all over the map. There was just no relation whatsoever between what they thought their sales for any particular product were and what we were seeing.
I spent literally six weeks going over every line of the conversion software trying to figure out what the sales posting software was doing that was not … It was completely all over the map. We had basically at one point had everybody for our company and all of the sales executives and operations people from our customer. All of them trying to figure out what’s going on. Then their ops manager thought of something and he pointed it out to us and said, “Rerun all of this but don’t ignore the deleted records.” And all of a sudden all of the sales numbers came out even. Turns out they had an operational problem. What they did was they basically … They were taking orders for journals. So they would have a room full of data entry people and they’d say, “Okay, today we’re doing all the orders for this particular publication.” And they’d set ’em up and everybody would do all their stuff, and then they might find out that oh, all those orders that we entered for that day, they had the wrong price. So they would go and delete all those orders and then re-key them.
So the end result was that they’d delete a thousand orders, but the deleting a thousand orders didn’t back them out of inventory, didn’t back them out of the sales numbers on their system. So they were inflating their sales numbers every time they did this and they’d been doing it for years. The vice president in charge of sales who had signed off on these sales numbers every month should have been fired.
Michael C.: Let me guess. He got a bonus. Does he work for Equifax now?
Craig: I don’t know about that, but he must have known somebody or blown somebody because he kept his job.
Michael C.: I’ve got to mark this as explicit now.
Craig: But those are my two stories for tonight.
Michael C.: Oh my goodness.
Craig: I have many other stories, but those are my two for tonight.
Michael C.: Well, thank you very much Craig. I really appreciate it. Everybody give him a round of applause. I know we’re going a little bit over on time, but I’m curious can anybody top the brink of nuclear winter story?
Speaker 4: Probably not.
Michael C.: Probably not. Well I do wanna wish a safe and happy Halloween to anybody out there listening who’s going to be celebrating. It has been an absolute pleasure to be here in Reston, Virginia. It’s a really great crowd. How about one more round of applause everybody. We’ve got a couple more stops on the 2017 No Fluff Just Stuff Tour. Next week we’re gonna be in Chicago. Right after that, we’ve got Denver. We’ve got experience. [inaudible 00:33:03] Clearwater, Florida. We’ve got the G3 Summit in Austin, Texas. All the tour dates for this year and next year are online, nofluffjuststuff.com. I wanna thank you all for listening. I’ll see you on the road. [inaudible 00:33:14] drive