
CDR: Lift-off. The clock’s running.
LMP: Three seconds.
CDR: I got a yaw program -
LMP: Six seconds.
LMP: There’s 10 seconds.
CMP: Clear the tower.
CDR: Roger. Clear the tower. I got a pitch and a roll program, and this baby is really going.
CMP: Man, is it ever!
CC: Roger, Pete.
CMP: Twenty seconds.
CDR: That’s a lovely lift-off. That’s not bad at all.
CMP: Everything’s looking great. Sky’s getting lighter.
CDR: Okay.
LMP: Thirty seconds.
CDR: Looks good.
CDR: Roll’s complete.
LMP: This thing moves, doesn’t it?
CC: Roger, Pete.
CMP: What the hell was that?
CDR: Huh?
CMP: I lost a whole bunch of stuff; I don’t know - -
What the hell was that? That was lightning.
Forty years ago this past weekend, on November 14th, perhaps the second greatest engineering debug effort in NASA history took place. Second only to Apollo 13 was Apollo 12, and we hardly know about it. (Well, maybe we could debate its place against the Skylab and Solar Max missions.) There was no great Ron Howard movie for this effort. Only a few months removed from the first lunar landing, Apollo 12 began its trip to the moon. As the rocket rose on a plume of ionized exhaust, it created a big, expensive lightning bolt. Just 36 seconds into the mission, disaster came calling, and 16 seconds later another bolt struck the rocket. The combination of the two strikes knocked out the astronauts’ attitude indicator, which shows the orientation of the rocket. The telemetry systems that report the rocket’s status to the ground also began returning meaningless data, and other systems quickly began to drop off-line as voltage problems ran through the rocket. Without telemetry data, ground control had no way of determining the rocket’s status, and the flight director was slowly building a case in his head to abort the mission. In an abort, escape rockets would pull the small command module away from the top of the whole rocket, and once it reached a safe distance, the rest of the rocket would be blown up on purpose, bringing a quick end to man’s second attempt at the moon. Flight Director Gerry Griffin was seconds away from issuing an abort command; without telemetry, he had little choice if he was to save the lives of the three astronauts inside.
Then a surprisingly calm voice called out over the radio.
“Try SCE to aux.”
That voice was John Aaron. Aaron had seen the problem before and remembered a quick fix: he had seen a similar problem in the lab with the Signal Conditioning Equipment (SCE), and switching its power to auxiliary had corrected it.
Surprised by the unscripted suggestion, the flight controllers asked him to repeat it.
And a second time John said, “Try SCE to aux.” Alan Bean, inside the command module, knew the switch and moved it to the AUX position, and telemetry was restored. NASA was able to verify that the rocket was on track and gave a “go” for first staging while the astronauts began resetting the rocket’s power generation system. A few minutes later most systems were back to functional status, and the astronauts joked about what an interesting “simulation” they had been given. In the highly scripted NASA world, such quick action and thinking are rarely needed. Thankfully, 40 years ago this weekend, when it was needed, it was there.
Joe McDevitt
CTO
Posted under Joe McDevitt by joemcdevitt 17.11.2009

I just finished “Linked: How Everything Is Connected to Everything Else.” http://en.wikipedia.org/w/index.php?title=Special%3ABookSources&isbn=0452284392
The book is about scale-free networks. It was extremely interesting, especially since I had previously re-read Jeff Hawkins’s “On Intelligence.” One little factoid from Linked has stuck with me: websites with opposing views rarely link to each other, while websites with similar views tend to link together. This seems to be something new in our world’s forms of media and communication: the condensation of ideas into opposing networks that rarely interact. Since my job as CTO is to predict the future, I often wonder what this portends for us. The Tower of Babel story comes to mind… broken into opposing cultures and languages, the world’s inhabitants were led into confusion, and later this confusion led to warring nations. Far from bringing the world together, the internet may indeed force us apart into increasingly polarized, volatile communities, and that is a recipe for conflict. This conflict will happen along the virtual boundaries of the internet, not along any current geopolitical boundaries. If you look for it, you can see an increase in polarization today at every level of the internet, from large political discussions to minor sports teams’ fan pages. I am beginning to fear the exploitation of that polarization even more than the nuclear war with the USSR that I feared as a child. Fortunately, as a child I had several people who shared my fear, which helped me understand there were barriers to it happening. I don’t see the same barriers to polarization, and it happily goes on. The second law of thermodynamics is intact, and the future is not as pretty as we think…
Joe McDevitt
CTO
Posted under Joe McDevitt by joemcdevitt 03.09.2009

Found here…
Could a future organized DoS (denial-of-service) attack lead to a real war between two countries? It is an interesting thought experiment. What do you do about countries with weak internet rules that harbor what amounts to privateering on the web? I certainly don’t have answers, but it is interesting to think about. Something I first got onto 16 years ago is now so core to our existence that we have to think about saber rattling over someone else messing with it. I wonder what the next 16 years will bring?
Joe McDevitt
CTO
Posted under Joe McDevitt by joemcdevitt 23.05.2008


The following is slanted toward holistic efforts. If you have a surgical problem, then instrumentation and knowledge alone should lead to an “aha” moment and the fix. If instrumentation has not yet led you to the fix for a surgical problem, add more instrumentation or knowledge. You can still use hypothesis and test, because trying different things builds your knowledge, but it is generally a step that can be bypassed in surgical debugging.
For holistic debugging, this is really just the scientific method with a predefined goal of solving the problem. Using the information gathered during instrumentation, form a hypothesis about the root of the problem, apply a fix that addresses that hypothesis, and test the results.
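The hypothesis/fix/test cycle above can be sketched as a loop that applies one candidate fix at a time, re-measures the failure rate, and logs every outcome. This is a minimal Python sketch under stated assumptions: `Hypothesis`, `debug_loop`, and the `run_trials` callback are hypothetical names, and on a real bench `run_trials` would be your own test fixture rather than the simulated board used here.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Hypothesis:
    description: str               # suspected root cause
    apply_fix: Callable[[], None]  # change intended to correct that cause


def debug_loop(
    hypotheses: List[Hypothesis],
    run_trials: Callable[[], float],
    target_rate: float = 0.0,
) -> Tuple[Optional[Hypothesis], List[Tuple[str, float]]]:
    """Apply each candidate fix in turn and test the result.

    run_trials() returns the observed failure rate (0.0 to 1.0) of the
    device under test in its current state. Every result is logged, even
    "no difference", so nothing gets retried needlessly. Fixes are left
    applied; a real session might revert a fix that changed nothing.
    """
    log: List[Tuple[str, float]] = []
    for hypothesis in hypotheses:
        hypothesis.apply_fix()
        rate = run_trials()
        log.append((hypothesis.description, rate))
        if rate <= target_rate:
            return hypothesis, log  # this fix corrected the failure
    return None, log                # nothing confirmed: gather more data


if __name__ == "__main__":
    # Simulated board (made up): fails 25% of runs until the pull-up is added.
    board = {"extra_decoupling": False, "pullup": False}
    hypotheses = [
        Hypothesis("insufficient decoupling",
                   lambda: board.update(extra_decoupling=True)),
        Hypothesis("missing pull-up", lambda: board.update(pullup=True)),
    ]
    found, log = debug_loop(hypotheses,
                            lambda: 0.0 if board["pullup"] else 0.25)
    print(found.description if found else "no fix found")
    print(log)
```

Keeping the log even when a fix is confirmed is the point of the design: the "no difference" entries are what later let you eliminate variables.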
Read/Try
Frequently, you will know one or two components involved in the error. Print their datasheets (yes, print them), lay them out in front of you with all the application schematics and your own schematics, and look for differences. Read the datasheet again and again, finding the areas that may pertain to your error. I love block diagrams of chip internals; study them, because sometimes these pictures contain more information than the written text.
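One way to make the “look for differences” step systematic is to transcribe the reference application circuit and your own connections pin by pin and diff them. The sketch below assumes both have been typed in as simple pin-to-connection dictionaries; `compare_pins` and the pin names are made up for illustration, not taken from any real datasheet or netlist tool.

```python
from typing import Dict, List


def compare_pins(reference: Dict[str, str], ours: Dict[str, str]) -> List[str]:
    """Return human-readable differences between two pin maps, sorted by pin."""
    diffs = []
    for pin in sorted(set(reference) | set(ours)):
        ref = reference.get(pin, "<not shown>")  # pin absent from reference
        mine = ours.get(pin, "<unconnected>")    # pin absent from our design
        if ref != mine:
            diffs.append(f"{pin}: reference={ref!r} ours={mine!r}")
    return diffs


if __name__ == "__main__":
    # Hypothetical pins and connections, for illustration only.
    reference = {"RESET_N": "10k pull-up to 3V3", "VREF": "0.1uF to GND",
                 "CLK": "25MHz osc"}
    ours = {"RESET_N": "no pull-up", "VREF": "0.1uF to GND",
            "CLK": "25MHz osc"}
    for line in compare_pins(reference, ours):
        print(line)
```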
Good Board/Bad Board
When a problem exists on one set of boards but not on others, this is a gold mine. Add this as a data point, BUT NOT AS THE FIX! Look at component date codes, PCB revisions, build dates, etc. Again, be careful; see Phase IV – Understand.
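The good-board/bad-board comparison can be sketched as a search for attributes (date codes, PCB revision, build date) whose values never overlap between the two groups. This is a hypothetical helper, not a DTI tool, and in keeping with the warning above it only flags candidates for the data-point pile; it says nothing about cause.

```python
from typing import Dict, List, Tuple

Board = Dict[str, str]  # attribute name -> value, e.g. "pcb_rev" -> "B"


def separating_attributes(
    good: List[Board], bad: List[Board]
) -> Dict[str, Tuple[List[str], List[str]]]:
    """Return attributes whose values never overlap between good and bad boards.

    A hit is a data point, not a root cause: with a handful of boards,
    a date code can line up with the failures by pure coincidence.
    """
    result: Dict[str, Tuple[List[str], List[str]]] = {}
    for key in sorted(set().union(*good, *bad)):
        good_vals = {b.get(key, "<missing>") for b in good}
        bad_vals = {b.get(key, "<missing>") for b in bad}
        if good_vals.isdisjoint(bad_vals):
            result[key] = (sorted(good_vals), sorted(bad_vals))
    return result


if __name__ == "__main__":
    # Illustrative boards only.
    good = [
        {"pcb_rev": "B", "u12_date_code": "0805", "build": "2008-03"},
        {"pcb_rev": "B", "u12_date_code": "0807", "build": "2008-04"},
    ]
    bad = [
        {"pcb_rev": "B", "u12_date_code": "0812", "build": "2008-04"},
        {"pcb_rev": "B", "u12_date_code": "0814", "build": "2008-05"},
    ]
    for attr, (g, b) in separating_attributes(good, bad).items():
        print(f"{attr}: good={g} bad={b}")
```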
The Change
Frequently in holistic debugging, you will hit on something that changes the problem. Often the change is small: a board that was failing 25% of the time now fails 5% of the time. This is not the fix either, but you are certainly on the right track. Look for the changes and understand them.
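Before chasing a change like 25% to 5%, it can be worth checking that the shift is larger than run-to-run noise. The post does not prescribe a method; the sketch below swaps in a standard one-sided Fisher exact test using only Python’s `math.comb`, and the function name and run counts are assumptions for illustration.

```python
from math import comb


def p_fewer_failures_by_chance(fail_before: int, runs_before: int,
                               fail_after: int, runs_after: int) -> float:
    """One-sided Fisher exact test.

    Returns the probability of seeing fail_after or fewer failures in the
    "after" runs if the change actually did nothing (i.e. all runs share
    one underlying failure rate). Small values suggest the change is real.
    """
    if not (0 <= fail_before <= runs_before and 0 <= fail_after <= runs_after):
        raise ValueError("failure counts must be between 0 and the run count")
    total_runs = runs_before + runs_after
    total_fail = fail_before + fail_after
    # Hypergeometric: how many of the total failures land in the "after" runs.
    # math.comb returns 0 when k > n, which handles impossible splits.
    ways = sum(
        comb(total_fail, k) * comb(total_runs - total_fail, runs_after - k)
        for k in range(fail_after + 1)
    )
    return ways / comb(total_runs, runs_after)


if __name__ == "__main__":
    # 25% -> 5% on 20 runs each vs. on 100 runs each (illustrative counts).
    print(round(p_fewer_failures_by_chance(5, 20, 1, 20), 3))
    print(p_fewer_failures_by_chance(25, 100, 5, 100))
```

With 20 runs on each side the function returns roughly 0.09, so the apparent 25%-to-5% improvement could still plausibly be chance; with 100 runs on each side the same rates give a far smaller value.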
Eliminate Variables
With every hypothesis and test, you get a result. Even if that result is “that change didn’t make a difference,” it should be tracked and documented to avoid needless retries. Eventually, you may arrive at “this is the only thing that could be wrong” by process of elimination. Can you eliminate a variable, circuit, or component? Be careful: the difficulty of a fix is not a reason for elimination… a missing pull-up on an unrouted BGA is difficult to patch, but it is not a variable that can be eliminated; eventually you must go there. Many times I have helped fix a problem by listening for the patch another engineer said would be too difficult to try; this fact alone may be the reason I identified the problem that started this blog.
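The bookkeeping described above can be sketched as a small log that records every test per candidate, refuses exact retries, and eliminates a candidate only on a “no difference” result. The class and candidate names are hypothetical, and treating one null result as conclusive is a simplification; note that the `hard_to_patch` flag is stored but deliberately never consulted when deciding what remains.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Candidate:
    name: str
    hard_to_patch: bool = False  # recorded, but never used to eliminate
    tests: List[Tuple[str, bool]] = field(default_factory=list)
    eliminated: bool = False


class EliminationLog:
    """Tracks every test per candidate so results are not needlessly retried."""

    def __init__(self, candidates: Dict[str, bool]):
        # candidates maps name -> hard_to_patch
        self.candidates = {
            name: Candidate(name, hard) for name, hard in candidates.items()
        }

    def record(self, name: str, test: str, made_difference: bool) -> None:
        cand = self.candidates[name]
        if any(t == test for t, _ in cand.tests):
            raise ValueError(f"{test!r} already tried on {name!r}; check the log")
        cand.tests.append((test, made_difference))
        # Simplification: one conclusive "no difference" result eliminates it.
        if not made_difference:
            cand.eliminated = True

    def remaining(self) -> List[str]:
        # Difficulty of the patch deliberately plays no part here.
        return sorted(n for n, c in self.candidates.items() if not c.eliminated)


if __name__ == "__main__":
    log = EliminationLog({
        "core rail ripple": False,
        "DDR termination": False,
        "missing pull-up on unrouted BGA pin": True,
    })
    log.record("core rail ripple", "add bulk capacitance", made_difference=False)
    log.record("DDR termination", "swap termination resistors",
               made_difference=False)
    print(log.remaining())  # the hard one is still on the list
```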
Joe McDevitt
CTO
Posted under Joe McDevitt by joemcdevitt 22.05.2008

Results for the Regatta are posted…
Wind prediction was way off: whitecaps and almost a blowout on Sunday.
I had one good race where I was in the hunt… I’m not sure what I did, so I chalked it up to “blind squirrel finds acorn.” I think letting out the bridle to reduce weather helm may have helped. One moment of inattention rounding the last mark and a generally bad last upwind leg (I was first around the last downwind mark) cost me a first. I think lack of local knowledge killed me on Sunday.
Joe McDevitt
CTO
Posted under Joe McDevitt by joemcdevitt 22.05.2008

I’m going to a regatta in Memphis this weekend. I sail Lightning-class sailboats. I am not very good. Unfortunately, the winds look light this week. I hate going out for a “bob”; hopefully the wind forecast will pick up.
I’m working on the next installment of the debug posts; sorry I have neglected this. Coming soon.
Joe McDevitt
CTO
Posted under Joe McDevitt by joemcdevitt 14.05.2008

Let me break debugging into phases… (this will take several posts)
Phase I – Discovery
I borrow this term from the legal world because, in effect, that is what it is… This phase is about the exploration of variables. If it is a customer problem, first you need to replicate the problem as exactly as possible. After that, explore the variables that make the system fail. I would like to list several common causes here, but that would be nearly impossible; witness the origin of the word “debug.”
Sometimes this phase goes so fast it hardly deserves to be called a phase, but other times extra time in discovery is needed to understand all the variables in play.
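Exploring the variables that make the system fail often amounts to a test matrix. The sketch below assumes you can drive each variable (temperature, supply voltage, and so on) and run a pass/fail test at each setting; `sweep` and `fake_board` are hypothetical stand-ins, and repeating each point lets intermittent failures show up as counts.

```python
from itertools import product
from typing import Any, Callable, Dict, List, Sequence, Tuple


def sweep(
    variables: Dict[str, Sequence[Any]],
    run_test: Callable[..., bool],
    repeats: int = 1,
) -> List[Tuple[Dict[str, Any], int, int]]:
    """Run run_test(**settings) for every combination of variable values.

    run_test returns True on pass. Each result is (settings, failures, repeats),
    so an intermittent failure shows up as a count rather than one pass/fail.
    """
    names = list(variables)
    results = []
    for values in product(*(variables[n] for n in names)):
        settings = dict(zip(names, values))
        failures = sum(1 for _ in range(repeats) if not run_test(**settings))
        results.append((settings, failures, repeats))
    return results


if __name__ == "__main__":
    # Stand-in for the real device: fails hot with a low supply (made up).
    def fake_board(temp_c: float, vcc: float) -> bool:
        return not (temp_c >= 50 and vcc < 3.3)

    matrix = {"temp_c": [25, 40, 50], "vcc": [3.15, 3.3]}
    for settings, failures, repeats in sweep(matrix, fake_board, repeats=3):
        print(settings, f"{failures}/{repeats} failed")
```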
Phase II – Instrumentation
This is really an extension of discovery, with the required addition of some techie tool if it ain’t there already. That tool could be a logic analyzer (LA), Oscope, or trace debugger, whatever fits: holistic tools for holistic problems (Oscopes, voltmeters, thermocouples) and surgical tools for surgical problems (LAs, JTAG, ICEs). Sometimes I see engineers get stuck in discovery without any instrumentation because they don’t know where to begin. If the board fails at 50°C, they start doing tests to see whether the temperature ramp rate is also a variable, all the while without any sort of instrumentation attached to the device under test. They then run several more tests looking for a correlation with some new variable when a failing variable is already known. Maybe ramp rate is a variable, but getting additional instrumentation is going to give you more clues. More clues equal more theories, and more combinations and reductions of those theories. The human brain is an amazing device, but it relies on inputs, and the more the better. In the world of electronics, that means instrumentation. The lack of instrumentation may come from a fear of the LA or some other test tool, or from the frightening realization that getting out the LA or Oscope means this will be a tough problem. Many times I have had to say “you ain’t gonna fix that through osmosis – get something on it to see what’s going on” to a younger engineer. This seems like a trivial step, but I am always amazed by the inertia against instrumentation.
Don’t get lost in discovery without instruments: yes, some discovery is possible without them, but rarely is the problem fixed without instrumentation.
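Once instruments are attached, logging their readings next to each pass/fail result lets you ask which signal actually moves with the failure, instead of guessing at new variables like ramp rate. This sketch ranks hypothetical channels by Pearson correlation with a 0/1 failure flag; the channel names and readings are illustrative only, and a strong correlation is a clue to follow, not a root cause.

```python
from math import sqrt
from typing import Dict, List, Tuple


def pearson(xs: List[float], ys: List[float]) -> float:
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant channel (or no failures) carries no signal
    return cov / sqrt(vx * vy)


def rank_channels(samples: List[Dict[str, float]],
                  failed: List[bool]) -> List[Tuple[str, float]]:
    """Rank measured channels by |correlation| with the failure flag."""
    if not samples:
        return []
    fail = [1.0 if f else 0.0 for f in failed]
    scores = [(ch, pearson([s[ch] for s in samples], fail))
              for ch in samples[0]]
    return sorted(scores, key=lambda kv: abs(kv[1]), reverse=True)


if __name__ == "__main__":
    # Illustrative readings only, not measurements from a real board.
    samples = [
        {"board_temp_c": 45.0, "vcore_v": 1.20, "ref_clk_mhz": 100.0},
        {"board_temp_c": 50.0, "vcore_v": 1.19, "ref_clk_mhz": 100.0},
        {"board_temp_c": 50.0, "vcore_v": 1.12, "ref_clk_mhz": 100.0},
        {"board_temp_c": 51.0, "vcore_v": 1.11, "ref_clk_mhz": 100.0},
    ]
    failed = [False, False, True, True]
    for channel, r in rank_channels(samples, failed):
        print(f"{channel:>14}: r = {r:+.2f}")
```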
Joe McDevitt
CTO
Posted under Joe McDevitt by joemcdevitt 09.04.2008

I firmly believe that all problems with any failing electronics fall into two categories – the first I call a surgical problem, and the second I call a holistic problem.
The first, surgical, is easily identifiable: it is a problem that, given the correct set of inputs, will always fail. Perhaps it is in software code, a logical error in an FPGA, incorrect register bit masking, or incorrectly connected signals in the schematics. These are the easiest for someone knowledgeable about that part of the failing design to debug, and they are typically the kind of debugging effort involved in prototypes.
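The surgical definition above suggests a quick check: repeat the same test with the same inputs and see whether it fails every time. In this hypothetical sketch, `run_once` stands in for your own test and returns True on a pass. Anything that fails only some of the time with fixed inputs falls outside the surgical definition; the later posts treat such percentage failure rates under holistic debugging. With too few trials, a rare intermittent can masquerade as either outcome, so treat the label as a heuristic.

```python
import random
from typing import Callable


def classify_failure(run_once: Callable[[], bool], trials: int = 50) -> str:
    """Repeat one test with identical inputs and label the outcome.

    run_once() returns True when the device under test passes.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    failures = sum(1 for _ in range(trials) if not run_once())
    if failures == 0:
        return "no failure reproduced"
    if failures == trials:
        return "surgical"      # fails every time given these inputs
    return "intermittent"      # same inputs, different outcomes


if __name__ == "__main__":
    print(classify_failure(lambda: False))                   # always fails
    print(classify_failure(lambda: random.random() > 0.25))  # fails ~25% of runs
```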
Posted under Joe McDevitt by joemcdevitt 02.04.2008

Well, I have neglected my blog due to travels and other tasks. The final task included being asked to assist Engineering with a perplexing problem on one of our newest CPU boards. One thing DTI has always been gifted with is an abundance of excellent debuggers. I have visited many technical companies over the years and have worked numerous issues, and I have been shocked at the inability of their engineers to comprehend even the basics of debugging. At DTI, being in Mississippi, we don’t get the best silicon vendor assistance, so when problems arise we largely don’t even bother to get vendors involved during debug; it’s a time-consuming task that rarely bears fruit. Many silicon vendors have told me that they hate it when DTI files a problem report with them, because they know the problem is theirs. The combination of limited silicon vendor support and our location means we grow our own debuggers here, and they are, in my mind, some of the best. I think I am pretty good at it myself (mainly because I have made a lot of mistakes and seen virtually everything…), so over the next few weeks perhaps I will share some of my own personal theories on debugging.
Including but not limited to:
- Holistic versus Surgical Debugging
- Debugging – The Process
- A Word on Datasheets
- Debugging and Documentation
- Types of Hardware Issues and Basic Reasons They Occur
- A Word on Credit
- Managing the Debug – A Guide for the Smart (and Not-So-Smart) Manager of the Debuggers
- When All Else Fails, Try This… (Joe’s Big Ass Cap Theory and Others)
So stay tuned…
“Experience is the name everyone gives to their mistakes.”
– Oscar Wilde
Joe McDevitt
CTO
Posted under Joe McDevitt by joemcdevitt 31.03.2008

I went to a Scope meeting and to the MVACEC show a few weeks ago. I could tell that many vendors were nervous about the future of ATCA. The telecom manufacturers didn’t reduce those fears, continually mentioning “Soft NEBS” and 1U/2U boxes. So a minor message was that the cost of ATCA is too high (the same message as at last year’s show). I remember one telecom manufacturer last year stating that CPU node vendors are asking for a 500% markup over BOM. I’m not sure what BOM that dude was using, but he is orders of magnitude off. Sure, there are 1U/2U boxes that are cheaper than ATCA, and at 1G Ethernet (PICMG 3.1 Option 1) speeds the difference may be significant. But if you deploy at 10G Ethernet, ATCA becomes much more cost-competitive.
DTI’s 10G ATCA switch, at single-piece pricing, costs one third the list price of a Cisco switch offering NEBS and 10G Ethernet. Two switches are needed for redundancy, so that is around $20K in savings if you deploy ATCA at 10G, and still a $14K savings once you add in the chassis and shelf managers. ATCA CPU nodes are also cheaper than their 1U NEBS brethren. If your solution doesn’t need 10G speeds, then maybe ATCA is not right for you, but at 10G that story changes radically.
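The savings figures above imply per-unit prices that can be back-calculated. If the ATCA switch costs one third of the Cisco list price, two switches save 2 × (2/3) × the Cisco price, so “around $20K” implies roughly a $15K Cisco switch and a $5K ATCA switch, and the drop to $14K implies about $6K for the chassis and shelf managers. These are derived from the post’s rounded numbers, not quoted prices; the sketch uses `fractions.Fraction` to keep the arithmetic exact, and `implied_prices` is a hypothetical helper.

```python
from fractions import Fraction
from typing import Tuple


def implied_prices(
    savings_two_switches: int,
    savings_with_chassis: int,
    atca_to_cisco_ratio: Fraction = Fraction(1, 3),
) -> Tuple[Fraction, Fraction, Fraction]:
    """Back-calculate per-unit prices from the post's rounded savings figures.

    Savings on two switches = 2 * (cisco - atca) = 2 * cisco * (1 - ratio).
    """
    two = Fraction(savings_two_switches)
    cisco = two / (2 * (1 - atca_to_cisco_ratio))
    atca = cisco * atca_to_cisco_ratio
    chassis_and_shelf_mgrs = two - Fraction(savings_with_chassis)
    return cisco, atca, chassis_and_shelf_mgrs


if __name__ == "__main__":
    cisco, atca, chassis = implied_prices(20_000, 14_000)
    # Convert to float for formatting (older Fraction lacks float format specs).
    print(f"Cisco switch (list):      ~${float(cisco):,.0f} each")
    print(f"DTI ATCA switch:          ~${float(atca):,.0f} each")
    print(f"Chassis + shelf managers: ~${float(chassis):,.0f}")
```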
Joe McDevitt
CTO
Posted under Joe McDevitt by joemcdevitt 27.03.2008