Monday, December 21, 2015

UNCERTAINTIES OF THE QZ8501 INVESTIGATION

Hello World,

After a wait of almost a year since the QZ8501 accident, the final investigation report has recently been released. The report cites the cumulative effects of mechanical/system failures and pilot action as the cause of the accident, but it attributes pilot action as the major cause. I looked up a few documents, and the story seems to be more than what is being presented to us as the official conclusion. In this post, I wish to look into the unexplored/ignored aspects of the investigation. Based on my understanding of the facts [as released] and further research, there were system inadequacies that forced the flight crew to attempt to override system-driven flight protocols. Their efforts could not be completed in time, and the aircraft went down with the crew and passengers.

Findings of the Investigation:

Exhibit A:
Oddity 1:
Cracking of a solder joint [of both channels], leading to a loss of electrical continuity, points to the electrical side of the failure I had already warned of in my previous post on the accident. If there is no electrical supply to the system, the system will remain inactive unless it is powered by a back-up power line.
The ambiguity that stands out to me is:

Was it ‘one’ solder joint that connected both channels [A&B]? 
Or
Was it ‘one’ solder joint for channel A and ‘one’ solder joint for channel B? [Both channels connected to the RTLU separately]

If it were two solder joints, one for each channel, then there were two separate failures, in which case the relationship between the two needs to be ascertained.

If it was just one solder joint that connected channel A with channel B, then the system clearly did not have the needed electrical redundancy, which indicates a serious design flaw given the flight-critical status of the equipment [the flight came to an end because this failed].

Irrespective of the real nature of the finding, the questions that remain are:
Why is ambiguity being introduced at the very beginning of an accident report?
Also, failures such as these are usually the result of a sequence of events.
If the investigation could go to the level of a solder-joint failure, what led to the failure of the solder joint?
What type of load on the joint increased to the point that it failed?
Why is that side of the failure not being discussed in the report?

Oddity 2:
An ‘unresolved repetitive fault’ occurred 4 times during the flight, and the recorded responses indicate that the 4th response was not in accordance with the message.

The question that stands out is:

The first three times, the repetitive fault did not subside or revert despite the ‘message-compliant’ responses from the flight crew.

Why is this not being discussed in the report?

If the procedural response fails to provide relief in a crisis, the failure needs to be attributed to the ‘non-fail-safe’ nature of the system [a design flaw]. If the flight crew did not get results from their ‘message-compliant’ responses, it is natural for them to resort to out-of-procedure efforts to resolve the crisis, since the aircraft’s flight was already deteriorating when that off-procedure input was given.

Why hasn’t the report indicated the ‘state of vulnerability’ of the platform?

My Findings 

Exhibit B1:


I came across this patent, which the inventor has assigned to Airbus Operations SAS [the assignee on the patent]. Airbus is the manufacturer of the QZ8501 aircraft that went down. This patent deals with the process for limiting the steering angle of control surfaces.

The movable parts of an aircraft [the airframe, to be specific] that are visible from the outside, apart from the doors and landing gear, are the control surfaces [these are found on the wings, tail-plane and tail-fin]. These are used to control the aircraft’s flight at all times.

This patent covers the process for controlling the steering angle of control surfaces, specifically the rudder [the surface on the tail fin that stands upright at the tail end of an airplane].

Here’s Exhibit B2:


The description shown above clearly indicates the significance of the technology covered in this patent. Engine failure is used as an example of an abnormal flight condition, and the observation describes how an aircraft will behave when an engine fails.

According to this observation, the rudder, the control surface on the tail fin of an airplane, will be required to bring the aircraft back to its flight line when an engine fails and the aircraft is destabilized. Through this observation, the patent implies that the rudder will need greater steering clearance so it can produce the force necessary to bring the aircraft back to its flight line [counter the destabilization faced by the aircraft].

Here’s Exhibit B3:


As shown in the figure above, the patent goes on to describe the traditional system’s inability to restrict the pilot from sending several commands. This indicates that the intent of this technology/process is related to restricting pilot activity in operating control surfaces under certain ‘abnormal conditions.’

Here’s Exhibit B4:


The patent then describes how, under such abnormal conditions, allowing the pilot to send multiple commands to the rudder can lead to dangerous failure modes. The patent specifically mentions that the tail fin may break under these conditions.

Now look at this picture below.

Exhibit C:

http://redwiretimescom.r.worldssl.net/wp-content/uploads/2015/01/redwire-singapore-air-asia-qz8501-black-box-1.jpg

The tail section was recovered separately from the rest of the airplane. There could be multiple theories for how the tail might have separated from the aircraft. However, the wreckage captured in the image closely reflects what the patent describes as a worst-case scenario.

Now read this.

Exhibit D:


The Airworthiness Directive issued by the FAA indicates the regulator’s acceptance of a finding that, under certain conditions, the allowable load limits on the vertical tail plane can be reached and possibly exceeded. The directive, as specified by the regulator, applies to all Airbus model A318, A319, A320 and A321 series airplanes. It also states that it is effective from December 29, 2015, indicating that both the finding and the directive are recent.

Such findings and directives going out so recently indicate that the A318, A319, A320 and A321 series airplanes have so far been flying in a state of vulnerability, and that they have escaped such accidents simply because of the low probability of such failures.

So the aircraft can fail under certain conditions. This is something that is always thrown out of the ‘Consideration Box’ used in any air accident investigation. The idealised scenario of all-aircraft-are-safe is being thrust into our minds through carefully planned press releases and cover-up activities... all after 162 human beings went down into the ocean along with the aircraft.

Sadly the story doesn’t end here.

Exhibit E: 


EASA had issued a Proposal for an Airworthiness Directive, dated 23rd July 2014, indicating the need for a correction, failing which the aircraft would be vulnerable to losing its tail fin in flight. The image above indicates that this directive was deemed applicable to a wide range of Airbus aircraft, including the Airbus A320-216, the type that went down.

The proposal also says Airbus has developed modifications within the Flight Augmentation Computer [FAC] to activate a conditional aural warning within the Flight Warning Computer [FWC] to prevent pilot-induced rudder doublets. 

So the European regulator was aware of the ‘condition of vulnerability’ that Airbus aircraft were in and proposed an Airworthiness Directive [AD]. Irrespective of whether the AD was implemented or not, the fact remains that Airbus aircraft had vulnerabilities, including that of losing the tail fin during specific flight conditions. As I have always pointed out, the probability of occurrence of an event should have no bearing on how its risk is perceived. The potential impact in this case is the loss of the aircraft, and therefore it should have a higher priority. For some reason, frequency and probability of occurrence are being used as key criteria for prioritising any risk-mitigation effort.

Further research back into the past reveals this:

Exhibit F: 


‘Safety First,’ the Airbus safety magazine, featured an article in its January 2005 issue on the need for enhanced pre-flight checks involving risk conditions, one of them being the failure of the Rudder Travel Limiter Unit [RTLU]. The article, as you can see in the image above, classifies it as an ‘event of undue rudder travel limitation.’

2005 was a long time ago, and even then there were vulnerabilities with respect to the RTLU in Airbus aircraft. This indicates that Airbus aircraft, like any other aircraft, have always been vulnerable to abnormal flight conditions, including those involving the RTLU, the system that failed during the QZ8501 accident.

Conclusion

Based on the oddities above and my interpretation of the exhibits presented, this is what I think happened with QZ8501.

The aircraft, like any other, had remained vulnerable to specific abnormal flight conditions, and the supplier’s effort to mitigate this risk [concerning the RTLU] resulted in restricting the pilot’s capacity to take control of the aircraft.

What was deemed as too-much-freedom for error resulted in a change that took too-much-of-necessary-capacity from the flight crew during those specific abnormal flight conditions. 

So when the aircraft went into what was deemed a ‘very-low-probability’ scenario, it deviated from its flight line and had to be recovered. The flight crew responded as per procedure three times to recover the aircraft but realised that the risk-mitigation change was not allowing them to do so. The 4th time, the flight crew had no choice but to try to disconnect the controls from the flight computer that was implementing the ‘pilot-restricting’ control criteria. Unfortunately, they couldn’t achieve the recovery in time, and the aircraft went down with the crew and passengers.

While the nature of the abnormal flight conditions is still being kept from us by ‘official’ statements cloaked in obscurity and generality, we can recall what the patent describes as a possible abnormal flight condition: engine failure. This is why I would like to know whether the engines were recovered from the wreckage and, if so, the details of the engine wreckage inspection.

Summing up, many events must have occurred in a certain unfortunate sequence that led to system failure and the eventual loss of QZ8501. We may never come face-to-face with the truth, since the truth would stand in the way of a multi-billion-dollar market that hangs on the ‘perception of reliability’ that aircraft brands thrust into operators’ minds. However, we can be sure that solder joint failures leading to electrical discontinuity don’t occur out of the blue. Also, pilots are not foolish enough to try to disconnect the flight computer unless the situation demands it.

When a report says that someone lost their life because a knife entered their back, and that the victim had somehow consciously stayed close to a sharp knife during the event, it is obvious that someone may well have stabbed the victim. Just because the report doesn’t use the word ‘stab’ doesn’t mean the victim simply walked into a knife protruding from something unidentified [in this case, the hands of the assailant]. Just my thought.


Regards,