Windows Server Domain

 

DL360G4p

 

Overview

HP QuickSpecs

DL360G4p User Guide

DL360G4p Maintenance and Service Guide

Disks: The G4 was available with Ultra320 SCSI (2), SATA (2) and Serial Attached SCSI (4) drive bays. All of my G4s are fitted with 2 x Ultra320 SCSI disks, as shown here.
Cooling: The CPUs are cooled by 4 x dual fans fitted to the removable tray at the front right of the server. The tray also houses the power button and diagnostic/status LEDs.

These fans have been the most frequent cause of faults/failures on my G4 servers and the issues are described in more detail below.
Cooling: The Power Supply Units are cooled by 3 x dual fans fitted in a quick release frame between the disk drives and the PSU conditioning board.
Ports: Rear view, showing the two hot-plug PSU bays (empty), two PCI-X expansion slots (one half length), USB, VGA, PS/2 mouse & keyboard connectors, 2 x Gigabit RJ45 network connections and RJ45 Integrated Lights Out (ILO-2) management port.
Power: SCSI & SATA Models:
460W Optional (1 + 1 redundant) power supply

I use redundant PSUs in my G4s that have redundant network connections, that is, all servers except the Domain Controllers. Windows Server does not support dual-homed NICs for DCs, so they already have single points of failure (e.g., the NIC and the Ethernet switch). Adding one more, in the form of a single PSU, isn't really an issue.
     
Problems
1. Cooling 
I have been running between 4 and 5 DL360G4p servers for the past few years and, as would be expected, I have found them to be very reliable. Even the hard disks, which were not new when I got them, have held up well.

In a couple of machines, I use Smart Array Controllers with Battery Backed Write Cache capability, and I have had to replace a number of these batteries. They are typically installed in pairs and the replacement batteries are quite expensive. When a battery fails, the Write Cache is disabled, but the Array Controller's cache continues to operate in a degraded Read Only mode, so other than a small performance hit, there is no issue running in this mode.

The most common problem that I have seen is with the system cooling fans. Things are probably not helped by the fact that my servers are not running in a properly cooled room, so the fans often run flat out to keep the system temperatures within limits. With a few servers running, this can be VERY noisy, and it probably also works the fans harder than a cooler environment would.

The G4p has two sets of fans: one set cools the CPU zone and the other cools the power supply zone. The CPU zone fans are mounted in a removable tray with 4 fan units installed, and the PSU zone fans are mounted in a quick release frame with 3 fan units installed. Each fan module incorporates two individual fans. Some redundancy is provided in that a single fan in each zone can fail without stopping the server from running.
The right hand side of the server incorporates the power switch (1 in the key below), UID (Unit Identification) switch (2), a USB port and 4 diagnostic LEDs (3 to 6).

The top LED indicates the "Internal Health" status (3)
The second LED indicates the "External Health" status (PSUs) (4)
The bottom two LEDs indicate the network adapter status (5 & 6)
The control panel provides some high level diagnostic information, but detailed status is reported on the system board LEDs.

3 = Internal Health LED. Amber = Degraded, Red = Failed
 
LEDs on the system board indicate single (Amber LED) or multiple (Red LED) fan failure(s) for each zone.
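
The fan layout and LED behaviour described above can be sketched as a small model. This is only an illustration built from the figures on this page (4 CPU modules, 3 PSU modules, two fans per module, amber for a single failure, red for multiple failures). The class names and the "off" state for a healthy zone are my own assumptions, not taken from the HP documentation.

```python
from dataclasses import dataclass
from enum import Enum


class ZoneLed(Enum):
    OFF = "off"      # assumed state when no fan has failed (not from the HP guide)
    AMBER = "amber"  # a single fan in the zone has failed
    RED = "red"      # more than one fan in the zone has failed


@dataclass
class FanZone:
    name: str
    modules: int              # fan modules fitted in the zone
    fans_per_module: int = 2  # each module holds two individual fans

    @property
    def total_fans(self) -> int:
        return self.modules * self.fans_per_module

    def led(self, failed_fans: int) -> ZoneLed:
        """Map a count of failed individual fans to the zone's LED colour."""
        if not 0 <= failed_fans <= self.total_fans:
            raise ValueError(f"{self.name}: failed_fans must be 0..{self.total_fans}")
        if failed_fans == 0:
            return ZoneLed.OFF
        return ZoneLed.AMBER if failed_fans == 1 else ZoneLed.RED


cpu_zone = FanZone("CPU", modules=4)
psu_zone = FanZone("PSU", modules=3)

for zone in (cpu_zone, psu_zone):
    print(zone.name, zone.total_fans, [zone.led(n).value for n in range(3)])
```

Note that the model only counts failed fans per zone. Like the real LEDs, it says nothing about which module a failed fan sits in.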

If a fan unit is faulty, HP only supports replacing the whole unit. In commercial use, it is not worth going down to individual fan level, and the diagnostics give no indication of which individual fan(s) have failed. After I had two faulty fan units, I spent a bit of time penny pinching to try and rescue the working fans from the faulty trays. This was tricky when two fans had failed, because the failed fans could either both be in one fan module or be individual fans in two different modules. Even with the server running, being able to see fans rotating wasn't proof that the fan module had not failed.

I now keep a CPU fan tray and a PSU fan frame with n-1 fully working fans fitted so that I can test suspect fans individually to identify which one in a bank has failed.
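
The reasoning behind the n-1 test tray can be written down as a short sketch. Because every other fan in the tray is known to be good, any fault reported after fitting a suspect fan must come from that fan. The function name, the fan labels and the `tray_healthy_with` callback are hypothetical. The callback stands in for the manual step of fitting the fan, powering on and checking the zone status.

```python
from typing import Callable, Iterable, List, Tuple


def sort_suspect_fans(
    suspects: Iterable[str],
    tray_healthy_with: Callable[[str], bool],
) -> Tuple[List[str], List[str]]:
    """Fit each suspect fan into the one free position of a known-good tray.

    tray_healthy_with(fan) returns True if the zone reports no fault
    with that fan fitted, so the fan can be reused.
    """
    good: List[str] = []
    faulty: List[str] = []
    for fan in suspects:
        (good if tray_healthy_with(fan) else faulty).append(fan)
    return good, faulty


# Hypothetical example: four fans pulled from two faulty CPU fan modules.
actually_dead = {"module-A/fan-2", "module-B/fan-1"}
suspects = ["module-A/fan-1", "module-A/fan-2", "module-B/fan-1", "module-B/fan-2"]

good, faulty = sort_suspect_fans(suspects, lambda fan: fan not in actually_dead)
print("reusable:", good)
print("discard:", faulty)
```

Testing one fan at a time is what removes the ambiguity: with two failures spread across two modules, the server's own diagnostics cannot tell the good fan in each module from the bad one.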
2. System Board 
I have had 1 total failure of a DL360G4p. The system started overheating but did not report any fan failures. The internal temperature would get so high that the system would shut down, wait 5 minutes to cool down and restart, and then the cycle would repeat. There was not enough time after a restart for me to get into the HP System Management Homepage and determine which temperature sensor was detecting the fault, but I suspect that it was the one on the system board, rather than the ones in the CPU and PSU fan regions.

The system eventually failed completely: it did not boot, showed no screen output, and even the iLO processor was unable to respond over the iLO interface Ethernet port. Removing the option boards and swapping out the CPUs, RAM, PSUs and fans had no effect, so I swapped out the system board (a £10 eBay purchase), which fixed the problem.

Note: even with the Maintenance and Service Guide (available at the top of this page), the on-board LEDs didn't provide any information on the source of the failure.

The "refurbished" system board that I bought allowed the system to start and appeared to run without problem. However, when I looked at the individual temperature sensors from the System Management Homepage, sensor 5 (system board) went from reading 0 degrees to reading 128 degrees in one step. On my other G4p servers, this value is typically in the mid-twenties. I suspect that 128 is the upper range value and that this sensor had totally failed. Since the threshold value was configured as 41 degrees with a threshold type of "Caution", the system fans went into overdrive to try and reduce this temperature, even though no warning was displayed on the front panel. The vendor replaced the board without question and the system then ran at its normal noise level.
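
The difference between a genuinely hot board and a failed sensor can be illustrated with a simple plausibility check. This is a sketch only. The 41 degree "Caution" threshold and the 128 reading come from the text above, the readings are assumed to be in Celsius, 128 being the sensor's full-scale value is my suspicion rather than a confirmed fact, and the 20 degree step limit is a made-up figure for illustration.

```python
CAUTION_THRESHOLD_C = 41    # "Caution" threshold shown for sensor 5
SUSPECT_FULL_SCALE_C = 128  # suspected upper range value (unconfirmed)
MAX_PLAUSIBLE_STEP_C = 20   # hypothetical limit on change between two polls


def assess_reading(previous_c: int, current_c: int) -> str:
    """Classify a temperature reading relative to the previous poll."""
    jumped = abs(current_c - previous_c) > MAX_PLAUSIBLE_STEP_C
    if current_c >= SUSPECT_FULL_SCALE_C or jumped:
        return "sensor suspect"
    if current_c >= CAUTION_THRESHOLD_C:
        return "caution"
    return "normal"


print(assess_reading(24, 25))   # a healthy board in the mid-twenties
print(assess_reading(30, 45))   # a board that really is running hot
print(assess_reading(0, 128))   # the refurbished board's sensor 5
```

The server itself makes no such distinction: any reading over the threshold drives the fans harder, which is why the failed sensor showed up as noise rather than as a reported fault.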

     
     

 

 

 

 
