The Yorktown Affair

 


"The USS Yorktown had to be towed into the Naval base at Norfolk, Va., because a database overflow caused its propulsion system to fail, according to Anthony DiGiorgio, a civilian engineer with the Atlantic Fleet Technical Support Center in Norfolk."

I now have the DiGiorgio article from the Proceedings of June 1998. It is summarized in another place. The original is worth reading as well [click here to go to Proceedings and drill to June 1998 Issue, look for DiGiorgio], because DiGiorgio's point was not that the Yorktown OS is wrong, but that NavSea doesn't really understand what it is doing. I also found another secondary source that quotes it. Incidentally I note that this page is now (January 1999) one of the major hits on a search on the Yorktown Affair. I hope to reopen the discussion. See also Geeks Bearing Gifts.

DISCUSSION OUTLINE

This remains an on-going discussion; germane contributions welcome.


The Chaos Manor discussion of this began with a report from Talin on the Open Developer's Conference. The Yorktown Discussion followed Talin's report. I have put the relevant portions of that discussion here. We have ranged rather farther than just the Yorktown incident, and it's all very relevant. If you are concerned about the future of the Navy, this is all worth reading.

Talin began:

For example, I’m sure that many of you have heard of the sad story of the USS Yorktown. For those that haven’t: The Yorktown has been completely networked using Windows NT as part of the Navy’s "SmartShip" program, to try and reduce the manpower requirements for large warships. However, because of an operator incorrectly entering a zero in an entry field, the system crashed and somehow corrupted the central database—as a result, the ship was dead in the water for two hours, and had to be towed back to port.

My comment was:

I had not heard that the Yorktown was towed back to port, and I suspect that's a bit of exaggeration; as I understand it, the ship was crippled by a "division by zero" error, and out of service for half an hour or so while the NT system was restarted. This is serious enough, of course.

But it turns out I was wrong, and the following on-going discussion ensued. My comments are in Times New Roman; I have used the Arial font for other participants.

Erich Schwarz [schwarz@cubsps.bio.columbia.edu]

Dr. Pournelle,

You write, in response to Talin’s special report on Open Source Development:

"I had not heard that the Yorktown was towed back to port, and I suspect that’s a bit of exaggeration; as I understand it, the ship was crippled by a ‘division by zero’ error, and out of service for half an hour or so while the NT system was restarted."

I’m sorry to have to tell you that there’s no exaggeration at all. As Gregory Slabodkin reported for GCN on July 13, 1998:

"The [Yorktown] had to be towed into the Naval base at Norfolk, Va., because a database overflow caused its propulsion system to fail, according to Anthony DiGiorgio, a civilian engineer with the Atlantic Fleet Technical Support Center in Norfolk."

The full article is available at:

http://www.gcn.com/gcn/1998/July13/cov2.htm

As for the time its control systems were dead in the water:

today’s (8/27/98) _Wall Street Journal_, on page B6, reports the downtime as *two and a half* hours.

If I were an enemy commander, I would pray to Marx or Allah to be given an NT-run Navy ship to fight. You can kill a lot of U.S. sailors in 2.5 hours if their ship won’t move without a tugboat.

--Erich Schwarz

Serious indeed. Division by zero isn't precisely a new problem. NT seems to have moved far too much of itself out of the totally protected areas and into areas where applications can mess things up.

I am beginning to wonder if LINUX wouldn't be better. Or Boy Scouts with flashlights…

Now a somewhat more optimistic view from Captain Ron Morse, USN

 

Ron Morse [rbmorse@ibm.net]

Jerry, I haven’t said much about the USS Yorktown business because I don’t know a lot about the specific incident. Please keep in mind, however, that the "Smart Ship" program (of which Yorktown is the lead), is philosophically an "X" program in the best sense of that term.

Much of what is happening on the ship is being done "in-house" or with the help of a small group of resident experts available to them, and their orders are to think _hard_ about all the things they do and try to find better and more efficient ways to do them. Not all of them work, quite obviously, but that’s the beauty of the program...when they have a failure they write it up as something they _know_ doesn’t work and go off and try something else that might work better.

Now, getting your engineering system’s panties in a knot and having to be towed back home is embarrassing, but hardly the end of the world, especially when your charter is to go out there and try your best to break your best ideas to see if they hold up.

I know that a lot of the interest in this specific incident is because an NT system blew up and there are people who take great joy reading about NT systems blowing up. I can guess what’s behind that, of course, and I certainly don’t need to take up your time with that.

For me, the "smart ship" program is one of the most exciting things we’re doing today. It is an honest effort to get the Admirals and the bureaucrats out of the way and let a crew and some engineers prove they really do know how to make it work better. It looks to me like they’re on to something, even if not everything they do works as they would like.

Off to watch some Tomahawk video...

Regards

Ron Morse

 Which seems very reasonable to me; if the mission is to test the system, then "if it ain't broke you didn't push hard enough."   More as I learn more, but I feel better about the whole thing now.

And now this. We have come a long way from Talin's original report, but it all seems relevant.

 

Erich Schwarz [schwarz@cubsps.bio.columbia.edu]

Dr. Pournelle,

I read Capt. Morse’s comments with great interest. If the real point of the Smart Ship program is to be an X Program, with failures as part of the improvement process, no sensible person could object -- that’s certainly how one progresses in research science.

One point, though. Capt. Morse writes: "I know that a lot of the interest in this specific incident is because an NT system blew up and there are people who take great joy reading about NT systems blowing up."

In fairness I think it should be admitted that there’s more to it than *just* anti-Microsoft religious fervor. Many system administrators and serious hackers do not share the views of (e.g.) Richard Stallman, yet shun NT—not because they’re zealots, but because in plain painful fact it just *isn’t* the industrial strength system that UNIX is and Linux may well have become. The best documented review for this point that I’m aware of is at:

http://www.kirch.net/unix-nt/

 

Sample quote: "Hotmail, The Microsoft Corporation[:] This free Web-based e-mail service runs a mixture of Sun Solaris and FreeBSD. Apache 1.2.1 is the Web server software. After Microsoft purchased the company in December 1997, they tried to migrate to NT, but ‘. . . the demands of supporting 10 million users reportedly proved too great for NT, and Solaris was reinstated.’"

There are many other such data in Kirch’s painstakingly detailed review.

If Microsoft itself can’t get its own alleged "successor" to UNIX to do a job that it can get done with Solaris/FreeBSD, would it be too cultish of me to suggest that the Navy also consider using some such UNIX mixture? Surely, providing proper computational infrastructure for a major warship is at least as important and potentially demanding a job as providing the Hotmail service to civilian consumers?

--Erich Schwarz

Do understand, I have nothing to do with the Navy's decisions; all I can do is provide a forum for discussion. This site does seem to be read by a number of people, some of whom do have some influence. We'll continue this until it gets dull.

You ask good questions. I'll see if Microsoft wants to comment. Now for something just a little different:

 

 

Claud Addicott [caddicott@iname.com]

Dear Dr. Pournelle,

Although you never mentioned it in your novels, I believe the true cause of the collapse of the First Empire was a computer glitch similar to the one that happened on Yorktown.

A descendant of Bill Gates formed a software company that developed an operating system to run all Imperial Navy ships (actually it was an adaptation of existing systems).

Although the system never really worked right, all Imperial Navy ships, and soon all Imperial Government agencies, adopted it (upon orders from the Emperor, whom many believed was under the control of the far wealthier and more powerful Mr. Gates). Eventually all other software companies were either out of business or relegated to small niche markets.

In time, a new version of the software (actually version 5.0, but called IOS 2600, due to the company’s poor reputation for software versions ending in ".0") was released. The company promised improved performance. Instead, it caused the Navy nightmares. There were frequent crashes and the system wouldn’t recognize certain Navy hardware (Alderson Drive, weapons, navigation, ...).

It also created havoc in other Imperial agencies, causing financial panic.

Seeing the opportunity, many dissident factions within the Empire started the Secession Wars, which the O/S-crippled Navy was unable to suppress.

Eventually, the First Empire collapsed.

 

Claud Addicott

No comment! I will have to ask the historians of the First Empire!

TALIN adds:

Jerry,

I understand the point about X-programs, and in general I think the idea of automating a ship using "dual-use" technology, as opposed to lovingly hand-crafted milspec one-offs is a good idea. Nor am I out to trash NT in particular, and in fact my original post does not in fact explicitly point a finger at NT, although there’s certainly a mild implication of guilt by association. And while I am a Linux fan, I’m not sure that I would go so far as to recommend any specific operating system for military operations.

The real point of my two juxtaposed paragraphs -- one describing the Yorktown, the other the submersible -- was to point out the difference that openness makes. You’ve consistently stated that one of the most important characteristics of a space vehicle is that it be "savable": that if something goes wrong, there should be appropriate contingencies. I believe that a similar argument can be made for military systems. With something as complex as a computer system, something is bound to go wrong eventually; I don’t care what OS you are using. So it seems to me that the best solution is one that can be fixed on the spot. I assume that Navy engineers are competent to repair even the most complex of on-board systems. It seems sensible to me, therefore, that as much of the ship as possible should be accessible for repair. Going with a system which is closed-source is the equivalent of buying a motor or engine with the maintenance hatch welded shut.

Of course, there may not be time in a battle to be able to debug a database problem, which means that finding the bug before the battle begins is optimal. There are two strategies for doing this: first, to ensure that the source code being used is spread far and wide throughout the military and commercial sectors, so that enough eyeballs are looking at it to make for an effective "peer review"; and secondly, to have an infrastructure which can test, identify potential problems, fix them, and distribute fixes in a timely manner. I believe that both of these things can be done more effectively in a system which is open than in one which is closed.

Part of this argument is based on my own experiences as a computer programmer in the USAF in the late 70s. What I spent my 4 years doing was writing COBOL programs to query failure-analysis databases at SAC headquarters. Each time a part would fail on a B-52, a record would be entered onto a punch card, which would be put on mag tape; the tapes would then be sent to SAC HQ to be merged with all the other tapes from other bases. The "users" who were using my programs (whom I rarely saw) used these queries to determine which parts needed to be improved or upgraded in order to reduce the potential failure rate. Of course, B-52’s don’t have on-board machine shops, but I’m sure they did have some maintenance contracts with real teeth in them; and I suspect that you could get an upgraded part for a B-52 more easily and rapidly than you could get a new version of Windows NT, given that they have so many _other_ customers to worry about. But this last is only speculation; I don’t have any real knowledge of what Microsoft’s reputation is with regards to mission-critical service contracts.

Then again, perhaps all of this is wishful thinking, based on watching too many Star Trek films:

Spock: "Doctor, would you care to assist me in performing surgery on a torpedo?"

McCoy: "Fascinating."

:-)

Talin (Talin@ACM.org) -- Systems Engineer, PostLinear Entertainment.

[http://www.sylvantech.com/~talin]

"The only mind-altering substance I use is breakfast."

====

 

Tom Weaver [taweaver@thegrid.net]

Re Yorktown:
As an ex-Navy nuke [8.35yrs.] and a current Reactor Operator I have some definite opinions of computer controls and integrated systems.
Management likes them because they can be maintained cheaper than many separate dedicated controllers and they do a better job of integrated plant control. But the biggest reason is that they are more predictable and controllable than operators. However they don't know anything and can't react outside their programming. They also are not real good at recognizing garbage in their inputs or outputs.
The best operators are better than any computer; however, a well written computer control system can be tweaked to work better than most operators on a 24-hour-a-day basis. The key words are tweaked and well written. Time and money must be spent to do this. If a mistake is made in choosing the original approach, finding someone with the courage to scrap it and start over with something else is rare. Usually downsizing is involved. Career blighting is not a popular activity. Courage in everyday work is a lot rarer than on the battlefield.
Have you ever told a boss that you won't do something that would save a million dollars because it is a non-conservative decision and have it stick? How likely is it for an Ensign to say that to a Captain or an Admiral? Yes a good one would take it and love you for it. Are there many left in the PC Navy?
The Navy has always been a very Political service and has gotten more so.

O's have a deep seated distrust of sailors. Like most management they have to trust them to do the work, have few if any ways of checking them, and are utterly dependent on them. Paper work was invented as a way to reassure themselves that something is getting done.
Tom Weaver

The Navy is a system designed by geniuses for execution by idiots. Isn't that from the Caine Mutiny? I wonder how true it was.

Now I wonder about the whole thing:

Rod Montgomery [monty@sprintmail.com]

In a letter to the "Comment and Discussion" department, published in the Aug 98 _Naval_Institute_Proceedings_, page 22, Captain Richard T. Rushton, then-CO of _Yorktown_, categorically states, "The _Yorktown_ was never towed as a result of any Smart Ship initiative. During my command, we lost propulsion power twice while using the new technology. Each time, we knew what caused the interrupt and were underway again in about 30 minutes. The September 1997 incident was caused by incorrect data insertion by a well-trained crewman. The _Yorktown_ returned to port using two FFG-7 emergency control units that specifically had been requested by me, and supported by other commands as a risk reducer. We knew there were some risks in the engineering development model propulsion-control system installed under a rapid prototyping development effort. The bottom line: The data field safeguards found in production-level systems were not installed yet in the _Yorktown_ by intention, until complete wring-out was accomplished."

That’s one paragraph of a two-and-a-half-column letter. Alas, although Naval Institute publishes parts of _Proceedings_ online, this letter is not available online. But your local public library should be able to get you a copy. Or you might be able to borrow a copy from a relative. 8-)

My general impression from the letter is that _Yorktown_ was somewhere between an X-ship and a prototype when it suffered the casualties, and that the CO took prudent precautions—having emergency propulsion installed—to minimize adverse effects of expected breakdowns.

And while I have no great enthusiasm for NT, I must admit this sounds less bad than the initial report.

But the debate is worthwhile, I suppose. Still--

And more:

The following is from Roger, a Navy program office contractor:

-----------------------------------------------------------------------

"For the record, The USS Hopper is operated with NT systems (some of my report is on this web page; I also did a column, which I can’t post here yet due to copyright ownership: BYTE had exclusive rights for a period before I get them back). I saw and heard of no problems with the Hopper while I was aboard, and she had just completed a voyage from Virginia to San Diego through the Panama Canal. [JEP]"

Actually, the NT based systems in Hopper when you rode her have little to do with the operational missions of the ship. NT is the standard the Navy is using for administrative systems. Virtually all of the combat system and all of the machinery controls (related to propulsion and navigation) are still of various Milspec varieties. The commercially derived systems in Hopper at that point that are tied to the combat system were all Unix based. There is currently a strong trend towards NT in non-real time command and control systems, but for applications that control ordnance, a Unix variant or a real-time OS such as HP RT or Wind River Systems VxWorks is used.

On the "Smartship" front, the government is in the midst (I’ve lost track of where since my company didn’t, in the end, bid on the project) of contracting for the production version of the control system that was in Yorktown. The Navy’s Request For Proposals specified use of an NT environment as a requirement for the production version of the control system, so we’re beyond the X project stage.

And, finally, the designed by geniuses to be run by idiots quote is from the Caine Mutiny. Having been both, I don’t think I got much smarter when I stopped running them and started designing them (but I’ve always loved that quote anyway).

 

And this:

Ward Gerlach [wgerlach@earthlink.net]

OK....now that I've seen some of the debate, and presuming that most of what is claimed by Mr. Morse AND that Capt. Rushton's comments are accurately quoted, then I'm gently peeling myself from the overhead.

"X" programs are, in general, a Good Thing, especially the part about "run it 'till it breaks, then fix it so it won't break in that place, then run it 'till it breaks again!". The problem I have with this is that USS Yorktown is a front-line Naval vessel, expected to go in Harm's Way at the drop of hat. If I'm a Sailor, and I know that the power plant has severe potential problems, I'm not going to be real happy in this situation.

During the Vietnam years, when I went WestPac on a 1944-vintage repair ship (USS Jason, AR-8, retired in 1996), my big worry was GQ being called for real - but the main bits and pieces (engines and guns) always worked, with 1944 technology. She might have been fat and slow and old - but she could steam, and we could fight her if we had to.

But then we have this

Rod Montgomery [monty@sprintmail.com]

In view of Ward Gerlach’s comments—about _Yorktown_ being "a front-line Naval vessel, expected to go in Harm’s Way at the drop of hat"—I’m going to stretch the limits of "fair use" and copy most of another paragraph from Captain Rushton’s letter:

"The _Yorktown_ never missed an operational commitment, nor did she suffer a mission-degrading casualty during the Smart Ship assessment period. During that time she certified to deploy under the normal fleet training and assessment process. ... She went on to execute a five-month Caribbean deployment that included extensive Smart Ship assessments by the Operational Test and Evaluation Force and Navy Manpower Analysis Center. Both organizations evaluated the _Yorktown_ as fully capable in meeting the required operational capabilities in a projected operating environment. ..."

Sounds to me like _Yorktown_ was ready to go in Harm’s Way when she needed to be. _No_ ship fresh out of overhaul, let alone a ship being used as a testbed for new, unproven systems, is going to be combat-capable except by unbelievably good luck.

Mayhap ‘twere better to build from scratch testbed X-ships for things like Smart Ship. Or mayhap not. Even the aero-spacers don’t always do that—there are lots of highly-instrumented-but-otherwise-almost-stock airplanes flying around in test programs.

Bottom line for me, I think, is, how many experiments get done?

_Of_course_ most of them will "fail", in the sense of having breakdowns. If we knew the stuff was going to work perfectly, why ever would we be testing it?

And it sure seems to me that taking an existing ship for a testbed is going to get an experiment like Smart Ship going a lot sooner than waiting for a new hull to come off the ways!

Having said all that, I must agree with Erich Schwarz, that the Navy would do well to try a Linux-based Smart Ship, too. But I hope Mr. Schwarz doesn’t imagine that that one won’t have breakdowns, too?

The issue should not be breakdowns in a known-unreliable system under test. Making that the issue is only going to discredit the issue-makers.

The issue should be, which pattern for building software—the closed, secretive, Microsoft pattern, or the open Linux pattern—is more likely to produce software of the quality we need?

Wasn’t it Edward Teller who questioned the value of classifying anything other than short-lived military-operational stuff? Was it not Vannevar Bush who suggested, all those years ago, that the big advantage an open society would have, over a closed society, in a Cold War, would be that the open society would see its mistakes, and so be able to correct them?

How do that question, and that suggestion, apply to an operating system no one can effectively inspect, other than (maybe) its own developers?

And we need to get back to a _reasonable_ set of MilSpecs. The problem with MilSpecs is not that we don’t need any at all, but that—like all things bureaucratic—they get bloated. So—like all things bureaucratic—they need expiration dates, after which they need to be rewritten _from_scratch_ by people who know what it’s like to be out at the sharp end, and who have some clue what really matters out there.

The old MilSpecs specify _recipes_ in excruciating detail. We’re well rid of the resulting straitjacket on innovation.

But that doesn’t mean we need no MilSpecs at all. A gun’s gotta shoot, a plane’s gotta fly, a ship’s gotta float, and an operating system’s gotta be robust against application crashes.

Once we’ve pruned back to a reasonable set of MilSpecs, we’ll still need inspectors to make sure the vendors conform to them. That means being able to take the covers off the jet engine, to make sure the oil’s really not leaking. And it means being able to inspect the source code for the operating system, to make sure it’s really built robustly.

If the Microsofties were Motie Engineers, maybe we wouldn’t need that— just trust ‘em, you wouldn’t be able to understand what they’re doing anyway, and they never make mistakes. But they aren’t, so we do.

At one time I had to live with 27 linear feet of ASPR's -- Armed Service Procurement Regulations, an enormous set of looseleaf books revised daily to weekly, that took literally 27 feet of space in our unit office; compliance was impossible, and indeed the only compliance I know of was that when contracting officials visited us they checked to see that we had inserted the weekly revisions in the proper places and thrown away the papers replaced. It took a junior secretary about half time to keep them up.

Something was broken in the weapons design and procurement cycle even then; I'm told that even though the Aspers are now on several CD-ROMs rather than on 27 feet of paper, they're even more onerous, and treated as far more important than weapons effectiveness. But I need not rewrite Strategy of Technology here.


David G.D. Hecht [David_Hecht@email.msn.com]

Just a comment or two on the issue of MILSPECS and procurement regs. On the MILSPEC issue, we were beginning to work those issues at NAVAIR when I was still there back in the 1994-1995 timeframe. Don’t know how far we’ve gone since, but I do know that the OSD types at one point wanted to go almost whole hog in the other direction: if you wanted to include ANY MILSPECS in ANY procurement document you had to get written authorization from the Undersecretary of Defense for Acquisition! Shades of when John Lehman first came in as SECNAV and wanted to personally approve every ECP! Fortunately we were able to talk them into a more nuanced position.

As to the procurement regs, they seem to be on about a ten-year cycle of growth and pruneback. When I first came to work for the Navy back in 1981 we were still using the DAR (the new, improved name for the ASPR, given to us by someone in the Carter Administration who thought "procurement" was a dirty word [literally!]). In 1984, we switched over to the FAR (Federal Acquisition Regulation), a simplified and cleaned-up version of the ASPR/DAR which was then applicable to all US Government agencies, not just DOD. About a year later the DOD came out with the DOD FAR Supplement (DFARS), which was about as big again as the FAR.

So then in 1994 there came an initiative to completely revise the FAR/DFARS to incorporate commercial practices, which was done via the Federal Acquisition Streamlining Act (FASA) of 1994 and the Federal Acquisition Reform Act (FARA) of 1995. As part of that, for example, they essentially repealed the Brooks Act and now allow computers and software to be bought according to the same rules as everything else, instead of by their own set of really arcane rules, the FIRMR (Federal Information Resources Management Regulations). Don’t know how that is going but I do know that my friends back in the procurement business are mostly happier at the greater flexibility and simplicity of the regs.

It seems to me that MILSPECS and procurement regs in general are a product of the natural tendency of people to think that by making lots of very detailed and comprehensive rules, they can eliminate the need for human judgment and therefore the capacity for human error. Of course that is not possible, but in a bureaucratic environment, already highly structured, it must seem like a natural approach.

The problem with getting rid of rules and specs altogether is, however, one that cannot be ignored: one of my friends works as a contracting officer at the US Mint, where they have a statutory waiver to the FAR and have a set of procurement regs that are only a few pages long. Sounds great, doesn’t it? Well, unfortunately what it means is that the program managers ask for stuff to be done in ways that are contrary to any reasonable person’s good business judgment because there are no rules to stand in their way. And, unlike in the real world where that sort of thing will eventually put you out of business, the US Mint is liable to go on and on no matter how much cronyism and other foolishness gets done in their procurements. So there are, alas, no clear-cut answers to these issues, I fear.

v/r, dh

And I'm still listening.


Mr. Schwarz doesn't give up so easily:

Erich Schwarz [schwarz@cubsps.bio.columbia.edu]

Dr. Pournelle,

Rod Montgomery cites the April 1998 _Proceedings_ of the U.S. Naval Institute. Intrigued, I checked its web site for other contents of the 4/98 issue. Lo, I found a full-length article about naval computation in that very issue, by Lt. Cmdr. Eric Johns. Its title, ominously, is "Beware of Geeks Bearing Gifts." The full text is at:

http://www.usni.org/Proceedings/Articles98/PROjohns.htm

-------------------------------------------------------------------

A striking excerpt:

"The military is a niche market; commercial systems may do similar things, but will they do everything necessary, on the scale required for military application? ... Consider vehicle tracking systems. There are commercially available systems that receive GPS updates from vehicles and plot their locations. But the military might need updates every 1-2 minutes instead of every 10-15 minutes. Those minutes could mean the difference between bombing the retreating hostile tank column and bombing our own advancing tanks ...

"For as-required strategic and nontactical applications, a PC-based system may be adequate. For real-time tactical applications the processing requirement is less likely to be met in the near future. One issue is the suitability of software to meet the needed data and security loads. Military weapon systems are enterprise systems that have large numbers of users and require high levels of security and reliability. At present, these capabilities are available in UNIX-based systems but not in Windows 95 or Windows NT. In other words, the pure PC environment is not ready for the demands of real-time tactical operations."

-------------------------------------------------------------------

In other words, as James Carville might put it: "It’s the robustness, stupid." Granted that the _Wall Street Journal_, normally considered a reliable journal of record, got the details wrong; granted that a civilian contractor may also have been wrong. The real question isn’t whether NT can be made to sort of work in peacetime testing, but whether it should have been chosen over enterprise-scale commercial UNIX in the first place, and whether it makes sense for the Navy to be sinking time and effort into NT when NT may be qualitatively inferior to UNIX.

For instance, suppose a latter-day Rickover or Hopper decides that cheap supercomputing is desirable for Navy ships, for any number of reasons: to run missile-defense systems, to allow fast probability estimates of enemy disposition given complex partial data, to allow nearly-unbreakable fast cryptographic transmission ... whatever. If he or she wants to build a strong cheap supercomputer today, there really is only *one* choice: Linux in a Beowulf configuration [1]. For that reason alone it might well repay the Navy to consider at least having parallel NT and UNIX facilities on its ships.

Finally, I concur fully with Talin’s point about open source in weapons; given a choice, and given the increasingly science-fiction-like quality of war weapons that you yourself have extensively predicted, it is difficult to see *any* advantage in an OS that cannot be patched by Navy computermasters in the heat of battle, and by definition a closed OS will be less amenable to such agility than an open OS.

Erich Schwarz

[1] This has the amusing GenX name "Extreme Linux," but it really is a professional system for serious scientific use. Red Hat sells a CD for it for $30. A description of it is available at www.redhat.com.


David Cefai [davcefai@keyworld.net]

Dear Jerry,

Isn't the infallibility of Unix becoming something of a cult thing? I run a system using Unix SystemV/68K release 7. Frankly the thing is an absolute dog and has put me off UNIX for life. Maybe there are better versions out there but they would have to be _lots_ better to come up to Windows NT!

It seems that the Yorktown's systems went down because of a zero divide error. Pardon my innocence but if we know that an input field is going to be used as a divisor isn't it normal to do some sanity checking? It looks like a programming error not an NT failure.

I'm not evangelising NT. It may not be the best OS to run a ship. However the UNIX guys seem to be where the Apple guys were not so long ago. MAC OSs would save the world!

Regards

David Cefai
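The sanity check Cefai describes can be sketched in a few lines. This is only an illustration: nothing here is taken from the Yorktown's actual software, and the names `parse_divisor` and `safe_ratio` and the fallback value are hypothetical. The idea is simply that an operator-entered field is validated before it is ever used as a divisor.

```python
import math
from typing import Optional


def parse_divisor(field_text: str) -> Optional[float]:
    """Return a usable divisor, or None if the entry is blank,
    non-numeric, zero, or not a finite number."""
    try:
        value = float(field_text.strip())
    except ValueError:
        return None
    if value == 0.0 or not math.isfinite(value):
        return None
    return value


def safe_ratio(numerator: float, field_text: str, fallback: float = 0.0) -> float:
    """Divide by the operator's entry only after it passes the check;
    otherwise return a caller-chosen fallback instead of crashing."""
    divisor = parse_divisor(field_text)
    if divisor is None:
        return fallback
    return numerator / divisor
```

In a real system the rejected entry would more likely be bounced back to the operator than silently replaced by a fallback; the fallback is used here only to keep the sketch short.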

I think we now have a lot to think about. I'm waiting for the Microsoft contribution and I am told that someone in the Pentagon is also preparing a statement for us. I'll believe in both of them when I see them.


And this from a serving officer aboard a cruiser:

The Navy brass is sold on the idea of the Smart Ship program. They want to reduce manpower on ships since sailors are the most expensive part of a ship. At the same time they are reducing the amount of time a junior officer has aboard ship for his first tour from 36 to 24 months and then his next assignment is 18 months. After this he goes ashore and off to Department Head school. At the same time we are trying to rebuild our supply system to have parts on hand only when they are required ("just in time").

I think the combination of these is going to get sailors killed. Officers will be Department Heads in charge of events they have never seen before because they were not aboard ship long enough as Division Officers.

Combine that with having /just/ enough people onboard and combat gets really interesting. Combat is Stochastic and not Deterministic. Therefore, you cannot tell me that we can accurately predict just how many people and what parts we need when the shooting starts. We can make a best guess, but we are now replacing the attitude of erring on the side of too much with one of erring on the side of bringing a lot less...

I think that all programs used in the Navy in the future should be written in the Linux system or in its mentality. You want to sell code to the government, send the source code. I'm very fearful of the Easter eggs that exist in current programs. A /lot/ of the programs written for Bill Gates are written in India and soon China. Why? The programmers are plentiful and cheap. But I'm not so sure they won't slip something in there that will shut us down at the wrong time or deliver all the passwords...

I also think that code written in open source processes is better because it can be checked by a number of people. My current thesis regarding the use of air defense systems is going to be written in Java. I'm in a working group where we each write code and then as a group we go over it and improve it. The results are better than what the professor who wrote the original simulation could do.

I fear that the Navy is ready to plunge into the Smart Ship program and make excuses to reduce costs. The USS Yorktown is not a cruiser in that it has been noted by those around her that she cannot do more than two missions at a time. A cruiser is supposed to be manned and ready to do at least three missions, a destroyer one or two...

So if we want a bunch of battle destroyers running around, be prepared for the consequences.


And from someone who doesn't mind being identified:

 

 

Jim Dodd [jimdodd@tcubed.net]

Jerry,

As background, I enlisted in the USN in Feb ‘61, and retired as a lieutenant commander in March of ‘83. Along the way I qualified in subs (enlisted), went nuke (electrical operator and reactor operator). Then qualified for an officer program. Got a degree in physics at Vanderbilt at gov’t expense. Then went into destroyers just in time for Viet Nam. Combat tours off the coast and as an advisor (Bronze Star w/Combat V). Qualified as OOD Fleet steaming, Dept Head, XO and for Command. Plus designation as Surface Warfare Officer. Did a degree in Comm Engineering at the PG school. Then did R&D at the Point Loma Navy Lab (variety of names for that facility) in over the horizon targeting. Last tour was Navy’s Test Director for sub launched Tomahawk missile system (anti-ship, land attack and nuclear variants). After retiring I did 10 years as a defense contractor in C3I (Navy & Army tactical systems). Nowadays I own part of an engineering consulting biz in San Diego that sells mostly to state & local govt’s.

The idea of reducing manning on a warship is really foolish. The Navy executives obviously don’t know or can’t remember who maintains the ship, and who does damage control when the ship is hit. I came up from below, and I chipped a lot of paint and cleaned bilges and other necessary "hands" work. I also did Damage Control School, and had a Repair Locker on a cruiser. All those billets the Navy is reducing through automation are the guys needed to maintain the ship, and to put out the fires and stop the flooding when you get hit. And you will get hit if you go in harm’s way.

I knew I would never make Captain or Admiral because I understood the impact of decisions on the troops, and I objected to a lot of the stupidity. Looks like it is still going on.

...regards...jim dodd

====

I remain interested and also very busy. I sure wish I had a staff of editors. Anyway:

w2james@bigfoot.com

Dear Dr. Pournelle,

 I don’t know if you are still following the Yorktown NT incident, but I found a site that had a number of fairly interesting articles discussing this subject. If you are interested, try going to www.gcn.com and doing a search for "Yorktown". This is the site for the Government Computer News Magazine.

Sincerely,

Bill Hamilton

That is a good site for more on this, and I will do some wrapup essay when I get a bit more time.

======

New Mail:

I read with fascination your journal on the USS Yorktown.

As a hardened NT Engineer, I design and implement commercial Network systems for a variety of organisations big and small (even including some work for the RAF!). I have worked, and always will work, on the principle that computers and computer software will fail. By their very nature, computers are complex systems designed and built by humans, who are by our very nature fallible. Why anyone could think that there is this utopian computer system out there that is infallible is quite beyond me. The trick with mission critical systems (both in the commercial organisation and on the battlefield) is that you have to expect them to fail. No matter how much you test or push a system, the fact that it has been built by humans means that there will be flaws that will not be discovered until the bomb drops (so to speak).

I am now going to use two horrendous buzz-terms: Fault Tolerance and Redundancy. It's only recently that the technology has advanced enough to allow total fault tolerance to the degree of total and automatic failover. Obviously, if a fork-lift is put through a fibre-optic cable, disconnecting a hub from the main network, then the consequences are nowhere near as severe as someone putting a 1000lb'er through the same cable on an active warship, disconnecting the reactor from the control room. You just have to plan to the nth degree the tolerance required, preferably such that the ship will sink before the computers will give out. If this means weaving the ship with cable or implementing the control system on three different platforms then so be it! If you want a computer system to survive under fire then you have to give it the right bullet-proofing.

I have to agree with the general consensus though, I wouldn't put NT where someone's life could be at risk as a direct result.

As it turns out, it seems that NT wasn't to blame directly for the failure, so that's me in a job for a bit.

You don't have to reply to this, I'd just like to get that off my chest!

Thanks for your time,

Glen Kemp MCP
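The automatic failover Kemp describes can be illustrated with a very small sketch. It assumes each redundant unit is exposed as a callable that either returns a reading or raises an exception when the unit is down; `read_with_failover`, `failed_unit` and `healthy_unit` are hypothetical names made up for this example, not part of any real shipboard or NT interface.

```python
from typing import Callable, Optional, Sequence


def read_with_failover(sources: Sequence[Callable[[], float]]) -> Optional[float]:
    """Try each redundant source in priority order and return the first
    successful reading. Return None only if every source has failed."""
    for read in sources:
        try:
            return read()
        except Exception:
            # This unit is down or misbehaving; move on to the next one.
            continue
    return None


def failed_unit() -> float:
    """Stand-in for a controller that has been knocked out."""
    raise RuntimeError("unit offline")


def healthy_unit() -> float:
    """Stand-in for a surviving backup controller."""
    return 42.0


# The caller never sees the first unit's failure, only the backup's reading.
reading = read_with_failover([failed_unit, healthy_unit])
```

A production design would also have to decide what to do when redundant units disagree rather than fail outright, which is where voting across three independent platforms comes in; that is beyond this sketch.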

January 11, 1999

 

Brian Dugle [dugle@win.bright.net]

Dear Jerry-

I read the Yorktown Affair with interest. I was a fighter pilot in the Air Force until I retired (O-6) four years ago. Although I much preferred flying jets, I did have to put in a total of six years in the Pentagon mostly in the Acquisition community. The lamentations of those Navy folk who don’t like the reduction of ship manpower struck a note, but the realities of reduced budgets mean reduced manpower slots.

Like most officers, I spent time attending various staff colleges over the years. The studies we did at Ft Leavenworth (US Army Command and General Staff College) and at Maxwell (Air War College) made the point to me. People are very expensive. We (the military) would all like to have more, but in peacetime the nation is not willing to pay.

Many sci-fi stories have addressed the social implications of the need for a common defense, citizen’s responsibilities, etc. One of the prices of an all-volunteer force is higher personnel costs. A draft or a universal service requirement might make some difference, and a two-year man (or woman) could fill the "numbers" requirement alluded to, although not much more.

This may be quite off the subject of using NT in critical military systems, but it really all comes down to the best use of the available dollars. I will be the first to admit that DOD does not always make the best choices, but allowing use of commercial systems is going in the right direction. Hiring the talent to keep up an OS in-house would not be cheap. To the extent that NT works, the cost of an operating system and occasional updates is minuscule compared to a staff to write and maintain a custom system.

Finally, I believe all information systems can be described fundamentally as databases. A few professionals writing and administering a commercial database system with customized interfaces (such as Visual Basic) makes sense to me. The front end is not so complicated that users cannot understand, improve, or even change it themselves. The back end understands redundant storage, fault tolerance, and high availability. This leverages the commercial systems to reduce manpower needs yet provide the potential for highly effective end systems. NT may or may not be the best choice, but it seems to me to be a viable candidate.

Brian Dugle

All good points. The price of Empire is high. Wealthy republics need a lot of defense else they are tempting targets -- at least history has always shown that to be so. The temptation to meddle everywhere is strengthened the more military power one has. And of course Empire can look profitable, although it seldom is, given the increased defense commitment costs. We face that dilemma now, although I doubt many of those in Washington understand its nature.

==

 

 

Subject: Yorktown and spin-offs

I’ve read with great interest the comments about how the Navy appears to be attempting to downsize its warships.

One comment I wound up responding to in an APA I belong to had to do with the government adopting various practices "in order to run like a business". It occurred to me as I was pondering that situation that only governments and nonprofit agencies would ever have to worry about how to run themselves like a business. Businesses don’t. Whatever a business does is, by definition, "running like a business".

Rather than worry about running like a business, businesses worry about running at a profit. Their job is to maximize their useful product (i.e., what their customers pay for) while minimizing their inputs. If "just in time" supplying affects the product/input ratio favorably, a business will adopt it or continue using it. If it turns out to affect one particular business unfavorably, it will discard or refuse to adopt the practice. Or it will leave the marketplace.

When government agencies start adopting practices from business in order to "operate like a business", I assume that someone in charge has his priorities muddled.

............Karl Lembke <karl@annex.com>

Interesting observation. Thanks.

 

 

Dear Jerry,

I don’t know if you’re still looking for contributions on the Yorktown thing or not, but as a person who has worked in 24x7 mission critical systems (healthcare, where if a system goes down, people can die (and have, but not on my shift as far as I am aware)), I have a particular interest in this incident.

I’m not an NT or Linux bigot; I helped write and test the Matrox Millennium driver for XFree86, and I earn a decent living as a BackOffice security & admin dude, and I used to program Macs, so I’m not one-eyed about platforms. Each has a place in the computing world’s toolkit.

However, with this incident, many of the ad hominem attacks against NT seem to repeat endlessly, in some cases pointlessly so.

Microsoft, contrary to popular belief, does license the source code to various third parties. If having the kernel and binary source trees mattered so much, this can be arranged, and probably within the budget of the armed services, particularly if they made it a requirement. Several universities in the US have NT’s source code. Various ISVs also have the source tree - such as Insignia, who produce a partial emulator for Windows 95 and NT (SoftWindows) for various Unix platforms and the Mac, and provide the x86 compatibility bits for NT itself on PowerPC, Mips and Alpha.

NT has a real time scheduling class, but I’d agree with anyone who thinks that VxWorks or QNX would do a better job at this than NT or Solaris (about the only Unix OS with RT extensions), both of which have higher latencies. This matters particularly in fire control systems. Fewer lines of code == less chance for things to go wrong.

The Kirch site has many technical inaccuracies. I have pointed these out to him, but he seems unwilling to correct them. I personally have a problem with inaccuracies being used as ammunition for advocacy flame fests, as they will bite back eventually. As a practising NT/Exchange/SQL 7 admin, there’s enough real ammo for him to use against Windows NT without resorting to half truths or factually incorrect items.

For those who think that Extreme Linux is very trendy, I’d direct them to DCE and its direct descendant, DCOM. DCOM does everything that Beowulf can do, and more. It’s been part of NT for some time now. We use it at my current site to replicate a database between remote servers and provide a seamless distributed data store for potentially thousands of client workstations to manage elections. Needless to say, it’s very trendy.

I agree wholeheartedly with those who wrote in saying that there’s nothing like fault tolerance and redundancy to keep critical systems operational. I worked with a real time heart monitor that had two of everything and backed up its neighbours. Now that’s redundancy!

If Linux had the same testing quality that Windows NT is subjected to (among other things: every module has a test harness, there are strict test metrics that must be met before code hits the streets, and each programmer is assigned at least one tester), it would be of even higher quality. Sure, open source has helped fix a number of nasty bugs, but Linux has a long way to go in the quality stakes.

With mission critical systems, different rules are in place:

 

These rules have not changed in over 35 years since the mainframe boys in the sixties started working on change management. I’d say that no major general purpose OS besides Tandem’s and some HA IBM offerings pass these requirements.

Andrew van der Stock

ajv@greebo.net

Smart Ship is an important concept; given recruitment problems in the Navy, more so than ever. And test cruises are for the purpose of testing…

==

Subject: Smart Ship 11f

From: Jim Dodd [jimdodd@tcubed.net]

Dear Jerry,

As a retired Naval Officer who served in submarines (enlisted) and destroyers (commissioned) I would like to comment on the Smart Ship theory. If the Navy is really going to use computers and other automation to counter the effect of poor recruiting and retention results (i.e., too few crew members), then two other problems also must be solved. These are routine cleaning and maintenance, and damage control party manning.

Ships are like boats, holes in the water into which one pours money. Ships just need more money poured than do boats, a lot more money. A lot of this takes the form of labor. I am sure your son can tell you about the amount of plain old cleaning, chipping and painting it takes to keep a war ship going just sitting alongside the pier. I haven’t heard of any miracles in maintenance since I retired in 1983.

Many of the junior folks who aren’t on the ship now typically have a General Quarters assignment to one of the Damage Control Repair Parties. These are the guys who plug the holes from damage, and keep the ship from sinking. (Don’t worry about the fires, the flooding will put them out!)

==

And that's where it stands in April, 1999.