Check HDD SMART on RAID – MegaRAID

One way to check the health of a hard disk is to read its SMART data. Below is how to set up SMART monitoring on Ubuntu Server and how to automate it.

Check RAID status

If the server uses hardware RAID, the command below shows the RAID controller and its model.

$ lspci | grep RAID
02:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 2108 [Liberator] (rev 05)

If the check above returns nothing, check whether the server uses software RAID instead.

$ cat /proc/mdstat
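
To read /proc/mdstat from a script rather than by eye, something like the sketch below may help. It assumes the usual mdstat layout, where each array has a status map such as [UU] and an underscore marks a missing member. The function name degraded_md_arrays is made up for this example.

```shell
#!/bin/bash
# Sketch: list software RAID arrays that have a missing member.
# Assumption: status maps look like [UU] (healthy) or [U_] (degraded).

degraded_md_arrays() {
  # $1: path of the mdstat file (defaults to /proc/mdstat)
  local mdstat="${1:-/proc/mdstat}"
  awk '
    # Remember the name of the array we are currently reading
    /^md[0-9a-z_]* *:/ { name = $1 }
    # A status map containing "_" means at least one member is missing
    /\[[U_]*_[U_]*\]/ && name != "" { print name; name = "" }
  ' "$mdstat"
}

# Usage (prints nothing when every array is healthy):
#   degraded_md_arrays /proc/mdstat
```

An empty result only means that no status map contained an underscore; it does not say anything about the SMART health of the member disks.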

If your server uses MegaRAID, follow the steps below.

Install megacli

Add the repository to /etc/apt/sources.list.d/megacli.list

deb http://hwraid.le-vert.net/debian stretch main

Then add the signing key and install

wget -O - https://hwraid.le-vert.net/debian/hwraid.le-vert.net.gpg.key | sudo apt-key add -

sudo apt update
sudo apt install megacli megaclisas-status

Check Device ID

$ megacli -PDlist -a0 | grep '^Device Id:'
Device Id: 2
Device Id: 3
Device Id: 4
Device Id: 5

Show the RAID status

# megaclisas-status
-- Array information -->
-- ID | Type    |    Size |  Strpsz | Flags | DskCache |   Status |  OS Path | CacheCade |InProgress
c0u0  | RAID-10 |   3635G |   64 KB | RA,WT |  Default |  Optimal | /dev/sda | None      |None

Run a SMART short self-test

smartctl -t short /dev/sda -d megaraid,2
smartctl -t short /dev/sda -d megaraid,3
smartctl -t short /dev/sda -d megaraid,4
smartctl -t short /dev/sda -d megaraid,5

Show the SMART data results

smartctl --all /dev/sda -d megaraid,2 > drive2.txt
smartctl --all /dev/sda -d megaraid,3 > drive3.txt
smartctl --all /dev/sda -d megaraid,4 > drive4.txt
smartctl --all /dev/sda -d megaraid,5 > drive5.txt
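
The per-disk commands above can also be generated from the megacli output instead of being typed by hand. The following is only a sketch: the function names are made up, and it assumes controller 0 (-a0) and /dev/sda as in the commands above.

```shell
#!/bin/bash
# Sketch: derive the MegaRAID device IDs and run smartctl once per disk.

# Read "megacli -PDlist" output on stdin and print one device ID per line.
megaraid_device_ids() {
  awk '/^Device Id:/ { print $3 }'
}

# Save the SMART report of every physical disk to drive<ID>.txt.
# $1: block device of the RAID volume (assumed to be /dev/sda, as above)
dump_smart_reports() {
  local dev="${1:-/dev/sda}" id
  for id in $(megacli -PDlist -a0 | megaraid_device_ids); do
    smartctl --all "$dev" -d "megaraid,$id" > "drive$id.txt"
  done
}

# Usage (as root, on a host with megacli and smartmontools installed):
#   dump_smart_reports /dev/sda
```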

Send a notification to Mattermost

nano /etc/smartmontools/run.d/10mattermost

#!/bin/bash
# purpose send alert to mattermost

curl -i -X POST --data-urlencode 'payload={"text": "One of HD in server1 is having problem\nPlease check!"}' https://mattermost.net/hooks/xxxcodexxx

Make the script executable (chmod +x /etc/smartmontools/run.d/10mattermost), then edit smartd.conf

nano /etc/smartd.conf

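# LSI MegaRAID
# These entries come before DEVICESCAN because smartd ignores every line after it.
# The number after "megaraid," is the Device Id reported by megacli.
/dev/sda -d sat+megaraid,2 -a -s L/../../3/02 -m root -M exec /usr/share/smartmontools/smartd-runner
/dev/sda -d sat+megaraid,3 -a -s L/../../3/03 -m root -M exec /usr/share/smartmontools/smartd-runner
/dev/sda -d sat+megaraid,4 -a -s L/../../3/04 -m root -M exec /usr/share/smartmontools/smartd-runner
/dev/sda -d sat+megaraid,5 -a -s L/../../3/05 -m root -M exec /usr/share/smartmontools/smartd-runner

DEVICESCAN -d removable -n standby -m root -M exec /usr/share/smartmontools/smartd-runner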
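
In the MegaRAID entries above, the schedule L/../../3/HH asks smartd for a long self-test on weekday 3 (Wednesday) at hour HH, and each disk gets its own hour so the tests do not overlap. A rough sketch for generating such entries from a list of device IDs is shown below. The function name is made up, and any notification directives you use (-m, -M) still need to be appended.

```shell
#!/bin/bash
# Sketch: print one smartd.conf entry per MegaRAID device ID,
# scheduling a long self-test every Wednesday, one hour apart per disk.

# $1: block device of the RAID volume; remaining arguments: device IDs
smartd_megaraid_entries() {
  local dev="$1" hour=2 id
  shift
  for id in "$@"; do
    # -s L/../../3/HH = long test, any month and day, weekday 3 (Wed), hour HH
    printf '%s -d sat+megaraid,%s -a -s L/../../3/%02d\n' "$dev" "$id" "$hour"
    hour=$(( (hour + 1) % 24 ))
  done
}

# Usage: review the output before adding it to /etc/smartd.conf
smartd_megaraid_entries /dev/sda 2 3 4 5
```

The device IDs 2 to 5 are the ones reported by megacli earlier; use the IDs of your own controller.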

UPDATE 31-01-18

It turns out this can also be done with a script, based on this article.

Using the --nagios parameter produces a one-line result and an exit code

$ megaclisas-status --nagios
RAID OK - Arrays: OK:1 Bad:0 - Disks: OK:4 Bad:0
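
Besides the exit code, the summary line itself can be parsed when a script needs the actual counts. A minimal sketch follows, assuming the line always contains "Disks: OK:<n> Bad:<n>" as in the sample above. The function name bad_disk_count is made up.

```shell
#!/bin/bash
# Sketch: extract the number of bad disks from the --nagios summary line.
# Assumption: the line contains "Disks: OK:<n> Bad:<n>" as in the sample above.

bad_disk_count() {
  # Read the summary on stdin; print the Bad count of the Disks section.
  sed -n 's/.*Disks: OK:[0-9]* Bad:\([0-9]*\).*/\1/p'
}

# Usage with the sample line from above
echo "RAID OK - Arrays: OK:1 Bad:0 - Disks: OK:4 Bad:0" | bad_disk_count
```

If the line does not match the assumed format, the function prints nothing, so an empty result should be treated as "unknown" rather than as zero.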

Create a checkRAID.sh script to catch the exit code

#!/bin/bash
# purpose to check RAID and smart HDD 
# using megaclisas-status with one liner --nagios option

url="https://mattermost.net/hooks/myUniqueCode"
servername="myServerName"
logFile="/var/log/SMARTstatus.log"

# Run the check directly in the if so that its exit code decides the branch.
# The full path is used because cron's default PATH may not include /usr/sbin.
if /usr/sbin/megaclisas-status --nagios > /dev/null 2>&1; then
  #curl -i -X POST --data-urlencode 'payload={"text": "RAID and Hard disk in $servername are in good condition!"}' $url 
  echo "RAID and Hard disk in $servername are OK!" >> $logFile
  exit 0 
else 
  myoutput=$(/usr/sbin/megaclisas-status)
  myoutput=$(echo "$myoutput"|tr -d "\-'\`\"")
  summary=$(/usr/sbin/megaclisas-status --nagios)
  payload="\"RAID or One of HD in server $servername is having problem.\nSummary: $summary \n Details: \n $myoutput\"" 
  curl -i -X POST --data-urlencode "payload={\"text\": $payload}" $url
  echo "HDD error: $payload" >> $logFile
  exit 1
fi
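
The script above deletes quote characters from the command output so they cannot break the JSON payload. An alternative is to escape them instead, which keeps the text intact. The helper below is only a sketch: json_escape is a made-up name, and it handles just backslashes, double quotes, newlines and tabs.

```shell
#!/bin/bash
# Sketch: escape a string so it can be placed inside a JSON string value.
# Only backslash, double quote, newline and tab are handled here.

json_escape() {
  local s="$1"
  s=${s//\\/\\\\}     # backslash    -> \\
  s=${s//\"/\\\"}     # double quote -> \"
  s=${s//$'\n'/\\n}   # newline      -> \n
  s=${s//$'\t'/\\t}   # tab          -> \t
  printf '%s' "$s"
}

# Usage: build the payload from arbitrary multi-line text
text=$(json_escape "$(printf 'line "one"\nline two')")
payload="{\"text\": \"$text\"}"
printf '%s\n' "$payload"
# The payload could then be sent the same way as in the script above:
#   curl -i -X POST --data-urlencode "payload=$payload" "$url"
```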

Automation

Add the script to crontab so it runs at 4 pm every day

$ sudo crontab -e

0 16 * * * /usr/local/bin/checkRAID.sh

For servers that use software RAID, you can use the script below.
Some of this code comes from here, with minor edits.

#!/bin/bash
# function to check smart status on hard drive
# install smartctl first! (apt-get install smartmontools)

url="https://mattermost.net/hooks/yourCodeUnique"
servername="myLinuxMachine"
logFile="/var/log/SMARTstatus.log"

if sudo true
then
   true
else
   echo 'Root privileges required'
   exit 1
fi

echo -n "$(date +%F-%H:%M:%S) " >> $logFile
for drive in /dev/sd[a-z] /dev/sd[a-z][a-z]
do
   if [[ ! -e $drive ]]; then continue ; fi

   echo -n "$drive " >> $logFile

   smart=$(
      sudo smartctl -H $drive 2>/dev/null |
      grep '^SMART overall' |
      awk '{ print $6 }'
   )

  # $smart already holds the health verdict (e.g. PASSED); compare it as a string
  if [[ "$smart" != "PASSED" ]]; then
    #echo "not passed"

    # for summary report use below
    #smarterror=$( sudo smartctl -H $drive 2>/dev/null )
    # for detail report use below
    smarterror=$( sudo smartctl -a $drive 2>/dev/null )

    smarterror=$(echo "$smarterror"|tr -d "\-'\`\"")
    payload="\"HD in server $servername is having problem!\n$smarterror\""
    curl -i -X POST --data-urlencode "payload={\"text\": $payload}" $url
    echo "HDD error!$payload" >> $logFile
  fi

   [[ "$smart" == "" ]] && smart='unavailable'

   echo "$smart" >> $logFile

done
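
One thing to keep in mind: the grep in the script above matches the result line printed for ATA drives. SCSI/SAS drives report their health on a "SMART Health Status:" line instead, so they end up as unavailable. The sketch below handles both formats. The function name smart_verdict is made up, and only these two line formats are recognised.

```shell
#!/bin/bash
# Sketch: extract the health verdict from "smartctl -H" output on stdin.
# Recognises the ATA result line and the SCSI/SAS health status line.

smart_verdict() {
  awk '
    /^SMART overall-health self-assessment test result:/ || /^SMART Health Status:/ {
      sub(/^[^:]*: */, "")   # keep only the text after the colon
      print
      found = 1
      exit
    }
    END { if (!found) print "unavailable" }
  '
}

# Usage (as root, with smartmontools installed):
#   smartctl -H /dev/sda | smart_verdict
```

A healthy ATA drive gives PASSED and a healthy SAS drive gives OK, so a caller would need to accept both values.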