Windows Server Domain |
DL360G4p
Overview
HP QuickSpecs
DL360G4p User Guide
DL360G4p Maintenance and Service Guide
Disks |
The G4 was available with
Ultra320 SCSI (2), SATA (2) and Serial Attached SCSI
(4) drive bays. All of my G4s are fitted with 2 x
Ultra320 SCSI disks, as shown here. |
|
Cooling |
Cooling for the CPUs is provided by 4 x dual fans
fitted to the removable tray at the front right of
the server. The tray also houses the power button
and diagnostic/status LEDs.
These fans have
been the most frequent cause of faults on my G4
servers, and the issues are described in more
detail below. |
|
Cooling |
Cooling for the Power Supply Units is provided by 3
x dual fans fitted in a quick release frame between
the disk drives and the PSU conditioning board. |
|
Ports |
Rear view, showing the two hot-plug PSU bays
(empty), two PCI-X expansion slots (one half
length), USB, VGA, PS/2 mouse & keyboard connectors,
2 x Gigabit RJ45 network connections and RJ45
Integrated Lights Out (ILO-2) management port. |
|
Power |
SCSI & SATA Models: 460W
Optional (1 + 1 redundant) power supply
I fit redundant PSUs only in the G4s that also have
redundant network connections, that is, all servers
except the Domain Controllers. Windows Server does
not support dual-homed NICs for DCs, so those
machines already have single points of failure
(the NIC and the Ethernet switch). Adding another
single point of failure, a lone PSU, isn't really
an issue. |
|
Problems |
1. Cooling |
I have been
running four or five DL360G4p servers for the
past few years and, as would be expected, I have
found them to be very reliable. Even the hard disks
that were not new when I got them have proved to be
very reliable.
In a couple of machines I
use Smart Array Controllers with Battery Backed
Write Cache capability, and I have had to replace
a number of the batteries. They are typically
installed in pairs and the replacements are
quite expensive. When a battery fails, the write
cache is disabled but the Array Controller's cache
continues to operate in a degraded read-only mode,
so other than a small performance hit there is no
issue running in this mode.
The most common
problem that I have seen is with the system cooling
fans. Things are probably not helped by the fact
that my servers are not running in a properly cooled
room, so the fans often run flat out to keep the
system temperatures within limits. With a few
servers running this is VERY noisy, and it probably
also works the fans harder than a cooler
environment would.
The G4p has two sets of
fans: one set cools the CPU zone and the other
cools the power supply zone. The CPU zone fans are
mounted in a removable tray with 4 fan units
installed, and the PSU zone fans are mounted in a
quick release frame with 3 fan units installed.
Each fan module incorporates two individual fans.
Some redundancy is provided in that
a single fan in each zone can fail without stopping
the server from running. |
The right hand side of the server
incorporates the power switch (1 in the key below), UID (Unit
Identification) switch (2), a USB port and 4 diagnostic
LEDs (3 to 6).
The top LED indicates the
"Internal Health" status (3). The second
LED indicates the "External Health" status (PSUs)
(4). The bottom two LEDs indicate the network
adapter status (5 & 6). |
|
The control panel
provides some high level diagnostic information, but
detailed status is reported on the system board
LEDs.
3 = Internal Health LED. Amber =
Degraded, Red = Failed
|
|
LEDs
on the system board indicate single (Amber LED) or
multiple (Red LED) fan failure(s) for each zone.
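The per-zone behaviour described above can be sketched in a few lines of Python. This is only an illustration of my reading of the LEDs, not anything taken from HP firmware: the names `ZoneLed`, `FANS_PER_ZONE`, `zone_led` and `within_redundancy` are made up for the example, and the "off" state for a healthy zone is an assumption.

```python
from enum import Enum


class ZoneLed(Enum):
    OFF = "off"      # assumption: no fan fault is signalled for the zone
    AMBER = "amber"  # a single fan in the zone has failed
    RED = "red"      # more than one fan in the zone has failed


# Individual fans per zone: fan modules fitted x 2 fans per module.
FANS_PER_ZONE = {"CPU": 4 * 2, "PSU": 3 * 2}


def zone_led(zone: str, failed_fans: int) -> ZoneLed:
    """Map the number of failed fans in a zone to the system board LED colour."""
    total = FANS_PER_ZONE[zone]
    if not 0 <= failed_fans <= total:
        raise ValueError(f"{zone} zone has {total} fans, got {failed_fans} failed")
    if failed_fans == 0:
        return ZoneLed.OFF
    if failed_fans == 1:
        return ZoneLed.AMBER
    return ZoneLed.RED


def within_redundancy(zone: str, failed_fans: int) -> bool:
    """True while the zone has lost no more than the single fan it can tolerate."""
    return zone_led(zone, failed_fans) is not ZoneLed.RED
```

Note that the LED colour only gives a count per zone (none, one, more than one), which is why it cannot say which fan, or which module, is at fault.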
If a fan unit is faulty, HP only supports
replacing the whole unit. In commercial use it is
not worth trying to go down to individual fan level,
and the diagnostics don't give any indication of
which individual fan(s) have failed. After I had two
faulty fan units, I spent a bit of time penny
pinching, trying to rescue the working fans from the
faulty trays. This was a bit tricky when two fans
had failed, because the failed fans could be either
in one fan module or in two different
modules. Even with the server running, being able to
see fans rotating wasn't proof that the fan module
had not failed.
I now keep a CPU fan tray and
a PSU fan frame with n-1 fully working fans fitted
so that I can test suspect fans individually to
identify which one in a bank has failed.
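The bench test amounts to trying each suspect, one at a time, in the spare slot of the known-good tray and noting whether the zone then reports a fault. A minimal sketch of that bookkeeping follows. The function name `find_faulty` and the module labels are hypothetical, and the check itself is a manual observation of the LEDs, represented here by a callback.

```python
from typing import Callable, Iterable, List


def find_faulty(
    suspects: Iterable[str],
    tray_reports_fault: Callable[[str], bool],
) -> List[str]:
    """Return the suspects that cause a fault when fitted to the test tray.

    The other n-1 fans in the tray are known to work, so any fault seen
    during a test is attributed to the suspect being tested.
    """
    return [suspect for suspect in suspects if tray_reports_fault(suspect)]


# Simulated bench session: in real use the callback would be me fitting the
# suspect, powering up and reading the zone LED.
actually_bad = {"module-B", "module-D"}
results = find_faulty(
    ["module-A", "module-B", "module-C", "module-D"],
    lambda suspect: suspect in actually_bad,
)
print(results)  # ['module-B', 'module-D']
```

Testing one suspect at a time is what removes the ambiguity: with only one unknown in the tray, a single-fan fault indication can only come from that suspect.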
|
2. System Board |
I have
had one total failure of a DL360G4p. The system
started overheating but did not report any fan
failures. The internal temperature would get so high
that the system would shut down, wait 5 minutes
to cool down and restart, and then the cycle would
repeat. There was not enough time after a restart
for me to get into the
HP System Management Homepage and determine which
temperature sensor was detecting the fault, but I
suspect that it was the one on the system board
rather than the ones in the CPU and PSU fan regions.
The system eventually failed completely: it did not
boot, did not show any screen output, and even the
iLO processor did not respond over the
iLO interface Ethernet port. Removing the option
boards and swapping out the CPUs, RAM, PSUs and fans
had no effect, so I swapped out the system board (a
£10 eBay purchase), which fixed the problem.
Note: even with the Maintenance and Service Guide
(available at the top of this page), the on-board
LEDs didn't provide any information on the source of
the failure.
The "refurbished" system board
that I bought allowed the system to start and
appeared to run without problem. However, when I
looked at the individual temperature sensors in
the System Management Homepage, sensor 5 (system
board) went from reading 0 degrees to reading 128
degrees in one step. On my other G4p servers this
value is typically in the mid twenties. I suspect
that 128 is the upper range value and that this
sensor had totally failed. Since the threshold was
configured as 41 degrees with a threshold type of
"Caution", the system fans went into overdrive
to try to reduce this temperature, even though no
warning was displayed on the front panel. The vendor
replaced the board without question and the system
then ran at its normal noise level.
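A reading stuck at one end of the range is easy to tell apart from a genuinely hot board once you look at the numbers. The sketch below shows the kind of sanity check I have in mind. It is not how the System Management Homepage works: the function name `classify_reading` is made up, treating 0 and 128 as "rail" values is my assumption based on the single faulty board described above, and so is treating the threshold as inclusive.

```python
def classify_reading(
    degrees: int,
    caution_threshold: int = 41,   # threshold configured for sensor 5 on my board
    rail_values: tuple = (0, 128),  # assumption: readings a dead sensor gets stuck at
) -> str:
    """Classify a temperature sensor reading as normal, caution or suspect."""
    if degrees in rail_values:
        # A value pinned at the end of the range says more about the
        # sensor than about the temperature.
        return "suspect sensor"
    if degrees >= caution_threshold:
        return "caution"
    return "normal"


for reading in (25, 45, 128):
    print(reading, classify_reading(reading))
```

On the faulty board the firmware could only see that 128 was above 41 and ran the fans hard, whereas a check like this would flag the sensor itself.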