on-site hacking

Wednesday, January 24, 2007 @ 1:46 AM
Categories: Software Development

Something caught my attention on the show "Excavators" on the Discovery Channel tonight.  I don't watch this show regularly, but sometimes I just leave the Discovery Channel on in the background at home and then pay attention when something interesting comes on.

Anyway, the show is about these HUGE excavating machines that are used in massive open-air mining operations to shovel earth into really big trucks that haul it away and process it.  I don't know all the exact specs of the machine because I just caught the last third of the show, but if you stood next to one of its caterpillar tracks your head would only be about halfway to the top of the track!

In addition to all of the mechanical complexities of the shovel, it also had a large number of electrical sensors and a lot of circuitry (I think they said something like 1 million feet of electrical wire!), all controlled by a central computer.  The operator of the rig has a monitor with a custom GUI on it that is used to control the digging operations and the movement of the rig in general.

Now, the show was mainly about the assembly of this thing at the dig site (since these machines are too big to transport in one piece) and how teams of mechanical engineers and electricians assembled it on-site over a period of about 8 months.  The show really focused on the problems that arose in the last 7 days before the "launch date" (the date on which it was scheduled to start digging).  Once the electricians had run all the wires and connected everything, it was time to do some test runs with the digger and the caterpillars, and this is where it caught my attention.

Since the entire excavator is controlled by a central computer (called the "Centurion" system), it makes sense that they would have the software guys on hand.  The crew ran into a couple of major problems:

  • A software bug was preventing the main drum (the part that raises and lowers the digging arm) from turning/moving at all
  • A software "typo" (as described by the software engineer working on-site) was preventing the excavator from moving its caterpillar tracks (so the rig could not move at all)

What was astonishing to me was the WAY in which these guys went about debugging:

  1. Plug a laptop into the excavator's computer system
  2. Open up some kind of IDE (couldn't tell what it was or the language from the TV screen)
  3. EDIT THE CODE DIRECTLY ON THE EXCAVATOR
  4. Compile
  5. Test if the code change fixed the specific problem (it did)
  6. Fix other problems that arose as a result of the first code fix (there were several, including one that could cause one of the motors to catch on fire!!)
  7. Call it good and continue

Now, these guys were in a time crunch with a $10 million piece of equipment and they had to get it done, but I have to say: Where's the rigorous QA process?

At my place of work, we try to avoid the scenario of making code changes on a live/production system because you never know the full ramifications of any change you make.  Even after extensive testing in a staging environment, critical bugs can still arise.

It was just kind of a shock to me that:

  1. These guys had such serious/critical bugs that they didn't know about (they gave me the impression that they had just written the code that morning!)
  2. They (and their customer) were totally comfortable with making changes right on the live system

This kind of thing makes me think a lot harder about industries where software plays a much more critical role, such as medical monitoring systems and equipment...  "Sorry about the defibrillator that fried your heart, turns out we put that damn decimal place one spot too far to the right...."
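To make the decimal-place joke concrete, here is a minimal sketch of the kind of range check that could reject a value typed with the decimal point one spot too far to the right. The function name and the limits are hypothetical, made up purely for illustration, and are not taken from any real device or from the show.

```python
# Hypothetical limits, chosen only for illustration; not from any real device.
MIN_ENERGY_JOULES = 1.0
MAX_ENERGY_JOULES = 360.0


def validate_energy(joules: float) -> float:
    """Return the value unchanged if it is in range, otherwise raise ValueError."""
    if not (MIN_ENERGY_JOULES <= joules <= MAX_ENERGY_JOULES):
        raise ValueError(f"energy {joules} J is outside the allowed range")
    return joules


intended = 150.0
typo = 1500.0  # same digits, decimal point one place too far to the right

print(validate_energy(intended))  # passes the check

try:
    validate_energy(typo)
except ValueError as err:
    print("rejected:", err)  # the shifted value is refused instead of used
```

A check like this only catches typos that land outside the allowed range, which is one more reason not to rely on edits made directly on the live system.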

 
