Isilon Troubleshoot Guide

Error#1 : Node's Baseboard Management Controller (BMC) and/or Chassis Management Controller (CMC) are unresponsive. Hardware is no longer being monitored

Issue : The Baseboard Management Controller (BMC) and/or Chassis Management Controller (CMC) on S210, X210, X410, NL410 and HD400 nodes can sometimes become unresponsive. When this issue occurs, the affected node may produce event 900010011.

Resolution:

Check that the versions below match; if not, upgrade to the recommended versions:

BMC (Baseboard Management Controller) firmware version 1.25 or later
CMC (Chassis Management Controller) firmware version 02.05 or later
OneFS version 8.0.0.4, 8.0.1.1, or newer
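
To check the cluster's current OneFS version, run (output format varies by release):
IsilonCluster1-1# isi version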

How to check BMC & CMC firmware versions?
IsilonCluster1-X# isi upgrade cluster firmware devices

Device             Type     Firmware                Mismatch  Lnns
---------------------------------------------------------------------------
BMC_S1400FP        BMC      1.25.9722               -         1-9,14-16,19
BMC_S2600CP        BMC      1.25.9722               -         13,17-18
BMC_S2600CP        BMC      1.20.5446               -         10-12
CMC_HFHB           CMC      01.02                   -         10-12
CMC_HFHB           CMC      02.05                   -         13
CMC_Yeti           CMC                           -         8
CMC_Yeti           CMC      00.0b                   -         1-7,9
CMC_Yeti           CMC      02.05                   -         14
CMC_HFHB           CMC      02.07                   -         17-18
CMC_Yeti           CMC      02.07                   -         15-16,19
......................
---------------------------------------------------------------------------
Total: xx

If the Firmware value is missing for particular Lnns (as with Lnn 8 above), that node may not be responding, and you may need to reboot it.
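
If the node still accepts SSH, one option (a minimal example; assumes Lnn 8 is the affected node) is to log in to it directly and issue a standard FreeBSD reboot:
IsilonCluster1-8# shutdown -r now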

How to Check the BMC Version on a Particular Node:
IsilonCluster1-8# /usr/bin/isi_hwtools/isi_ipmicmc -d -V -a bmc | grep firmware
IPMI firmware version = 01.25

Reset BMC/CMC
- To reset the BMC on all nodes in the cluster, run the following command:
# isi_for_array -s /usr/bin/isi_hwtools/isi_ipmicmc -c -a bmc

- When this completes, reset the CMC on all nodes in the cluster by running the following command:
# isi_for_array -s /usr/bin/isi_hwtools/isi_ipmicmc -c -a cmc

- If the cluster contains HD400 or X210 nodes, also run the following command to reset the CAR:
# isi_for_array -s /usr/bin/isi_hwtools/isi_ipmicmc -c -a car
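
- Once the resets complete, re-run the device listing from earlier to confirm that firmware versions are reported again for all Lnns:
# isi upgrade cluster firmware devices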

To upgrade the firmware, refer to KB article 000466373:
https://emcservice.force.com/CustomersPartners/kA2j0000000R5lGCAS


Isilon - The /var partition is near capacity


Issue: When the /var partition reaches 75%, 85%, or 95% of capacity, an event is logged and an alert is sent.

Fix: Rotate logs
If the /var partition returns to a normal usage level, review the list of recently written logs to determine whether a specific log is rotating frequently. Rotation can resolve the full-partition issue by compressing or removing large and old logs, thereby automatically reducing partition usage.
Check the percentage of inodes in use: open an SSH connection to the node that reported the error and log in using the "root" account.
Run the following command:
df -i | grep var | grep -v crash

Output similar to the following appears:
Filesystem 1K-blocks Used Avail Capacity iused ifree %iused Mounted on
/dev/mirror/var0 1013068 49160 882864 5% 1650 139276 100% /var

If the %iused value is 90% or higher, reduce the number of files in the /var partition using one of the methods described below:
Remove files that do not belong in the /var partition.
On the node that generated the alert, run the following command to list files in the /var partition that are greater than 5 MB:

find -x /var -type f -size +10000 -exec ls -lh {} \; | awk '{ print $9 ": " $5 }'

In the output, look for files that do not typically belong in the /var partition. For example, a OneFS installer file, log gathers, or a user-created file.
Remove the files or move them to the /ifs directory. If you are unsure what to remove, contact Isilon Technical Support for assistance.
Determine if a process is holding a large file open

You can use the fstat command to list the open files on a node or in a directory, or to list the files that were opened by a particular process. A list of the open files can help you monitor the processes that are writing large files. See KB 16648, How to use the fstat command to list the open files on a node.
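
For example, to list every file open on the filesystem holding /var (standard FreeBSD fstat; the -f flag restricts output to the filesystem of the named file):
# fstat -f /var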

If neither of the above tasks resolves the issue, continue with the following solution:
Limit the rollover file size and compress the file
Open an SSH connection on any node in the cluster and log in using the "root" account.
Run the following commands to create a working copy and a backup of the /etc/newsyslog.conf file:
cp /etc/newsyslog.conf /ifs/newsyslog.conf
cp /etc/newsyslog.conf /etc/newsyslog.bak

Open the /ifs/newsyslog.conf file in a text editor.
Locate the following line:
/var/log/wtmp 644 3 * @01T05 B

Change the line to:
/var/log/wtmp 644 3 10000 @01T05 ZB

These changes instruct the system to roll over the /var/log/wtmp file when it reaches 10 MB and to compress the file with gzip.
Save and close the /ifs/newsyslog.conf file.
Run the following command to copy the updated file to all nodes on the cluster:
isi_for_array 'cp /ifs/newsyslog.conf /etc/newsyslog.conf'
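
To sanity-check the edited entry without trimming anything, you can do a dry run (standard newsyslog flags: -n prints what would be done without doing it, -v is verbose):
isi_for_array 'newsyslog -nv /var/log/wtmp'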

If other logs are rotating frequently, or if the preceding solutions do not resolve the issue, run the isi_gather_info command to gather logs, and then contact Isilon Technical Support for assistance.

Ref: EMC KB Article 000471789

/var/log/isi_phone_home.log can grow without bound and fill up the /var partition, causing issues with CELOG event generation and other operational problems with processes

Issue: /var/log/isi_phone_home.log can grow without bound and fill up the /var partition, causing issues with CELOG event generation and other operational problems with processes

Cause: Log rotation does not automatically rotate the log file generated by isi_phone_home

Fix: If /var is getting too full (>85%) on any of the nodes, run the following command:
# isi_for_array 'truncate -s 0 /var/log/isi_phone_home.log'
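
To confirm which nodes are affected and how large the log has grown, you can first check:
# isi_for_array -s 'df -h /var'
# isi_for_array -s 'ls -lh /var/log/isi_phone_home.log'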

Ref: EMC KB Article Number 000516735

Isilon Health Check script

#cd to the Isilon Support Directory
IsilonCluster-1# cd /ifs/data/Isilon_Support

#Copy the Script from EMC through FTP
IsilonCluster-1# curl --disable-epsv -O ftp.emc.com/pub/rcm/Isilon/tools/IOCA
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  499k  100  499k    0     0  70871      0  0:00:07  0:00:07 --:--:--  233k

#Run the Script
IsilonCluster-1# perl IOCA

The output looks similar to the following:
Isilon On-Cluster Analysis                        0.1206
Live Cluster Analysis                             Wed Sep 19 10:43:08 2018
Cluster Name                                      IsilonCluster
Cluster GUID                                      XXXXXXXXXXXXXXXXXXXXXX
Node Count                                        6
Current OneFS Version                             8.0.0.4
Contact Information                               PASS
Email Settings                                    PASS
System Partition Free Space                       PASS
Drive Support Package (1.26)                      INFO
FCO F042415EE                                     PASS
FCO F031617FC/KB469133                            PASS
Highly Recommended Patches                        PASS
Node Firmware (10.1.6)                            INFO
ETAs                                              PASS
Hardware Status                                   PASS
BMC/CMC Hardware Monitoring                       PASS
Boot Disks                                        PASS
BXE Nodes                                         PASS
  DETAILS: 22 nodes have BXE interfaces: 1-4,9-26
Drives Health                                     PASS
Drive Load                                        PASS
Drive Stall Timeout                               PASS
Duplicate Gateway Priority                        PASS
Processes                                         PASS
IB Interfaces Active                              PASS
Memory                                            PASS
Mirror Status                                     PASS
Node Compatibility                                PASS
Access Zones                                      PASS (3)
OneFS Version                                     PASS
KB507031                                          PASS
Authentication Status                            PASS
Cluster Capacity                                  PASS
Cluster Encoding                                  PASS (utf-8)
DialHome & Remote Connectivity                    PASS
  DETAILS: Current Service States:
  DETAILS:    ConnectEMC Service is Enabled
  DETAILS:    RemoteSupport (isi remotesupport) is enabled
Critical Events                                   PASS
File Sharing                                      PASS
HDFS                                              PASS
SPN List                                          PASS
Cluster Health Status                             PASS
IDI Errors                                        PASS
Jobs Status                                       PASS
Jobs History                                      PASS
Licenses                                          PASS
LWIOD Log                                         PASS
Listen Queue Overflows                            WARN
  WARN: Listen Queue Overflows count over 50,000 on the following nodes: 1
NFS                                               PASS
Kernel Open Files Count                           PASS
Storage Pools                                     PASS
Cluster Services                                  PASS
SmartConnect Service IP                           PASS
Snapshot                                          PASS
SyncIQ                                            PASS
Cluster Time Drift                                PASS
Cluster Time Sync                                 PASS
Cluster Time Zone                                 PASS (America/Los_Angeles)
Upgrade Agent Port                                PASS
Upgrade Status                                    PASS
Node Uptime                                       PASS (100 days)



Physical Server with Unity Boot LUN got rebooted during Unity SP Reboot

Issue:  Physical Server with Unity Boot LUN got rebooted during Unity SP Reboot

Errors:
Warning Host 1076      User1  The reason supplied by user XXXXX for the last unexpected shutdown of this computer is: Other (Unplanned) Reason Code: 0xa000000 Problem ID: Bugcheck String:
Error Host       1001      Microsoft-Windows-WER-SystemErrorReporting           The computer has rebooted from a bugcheck. The bugcheck was: 0x000000d1 (0xffffe8013242b000, 0x0000000000000002, 0x0000000000000000, 0xfffff8017ad60e81). A dump was saved in: C:\Windows\MEMORY.DMP. Report Id: DDMMYY-35046-01.

Cause: This is a known issue between a Microsoft security patch and PowerPath. One of the installed security patches caused the server to bluescreen and crash.

Fix:
This is a known issue with PowerPath and recent Microsoft updates for Windows Server 2012 R2.
It is caused by the Microsoft Windows 2012 R2 updates KB3185279, KB3185331, KB3192404, KB3197875, KB3197874, and KB3205401.
https://support.microsoft.com/en-in/help/24717/windows-8-1-windows-server-2012-r2-update-history

See the following KB article for the fix:
490865 : Windows 2012 R2 server crash pointing to EMC PowerPath driver EMCPMPX.SYS https://support.emc.com/kb/490865

++++++++++++++++++++++++++++++++++++++++++++++++++++
OS Name             Microsoft Windows Server 2012 R2 Standard
Version 6.3.9600 Build 9600
Other OS Description      Not Available
OS Manufacturer             Microsoft Corporation
System Name    HostName
System Manufacturer    Cisco Systems Inc
System Model   UCSB-B200-M3
System Type      x64-based PC

  manfac: Cisco Systems, Inc.
   sernum: FCH1824J0E9
    model: Cisco VIC FCoE HBA
   descrp: Cisco VIC-FCoE Storport Miniport Driver
   symblc: Cisco VIC FCoE HBA FW:2.1(3d) DRV:2.3.0.20
 
 EMC powermt for PowerPath (c) Version 6.0 SP 2 (build 206)
             
*******************************************************************************
*                        Bugcheck Analysis                                    *
*******************************************************************************
Bugcheck code 000000D1
Arguments ffffe801`3242b000 00000000`00000002 00000000`00000000 fffff801`7ad60e81

RetAddr           : Args to Child                                                           : Call Site
fffff802`9d1e3ee9 : 00000000`0000000a ffffe801`3242b000 00000000`00000002 00000000`00000000 : nt!KeBugCheckEx
fffff802`9d1e273a : 00000000`00000000 ffffe001`a31b6010 00000000`00000000 fffff801`7ad6a00c : nt!setjmpex+0x37d9
fffff801`7ad60e81 : fffff801`7ad74dbf ffffe801`270008d0 00000000`00000000 7fffffff`ffffffff : nt!setjmpex+0x202a
fffff801`7ad74dbf : ffffe801`270008d0 00000000`00000000 7fffffff`ffffffff 00000000`00000000 : EmcpMpx!EmcpMpxLogPlatfEvent+0x3f99
fffff801`7ad75236 : fffff801`7ad74f50 ffffe001`a31b68b0 ffffe001`a5400480 00000000`00000000 : EmcpMpx!EmcpMpxLogPlatfEvent+0x17ed7
fffff801`7ad62958 : ffffe001`a1e0f010 ffffe001`a0be8010 ffffe001`a0bd3be0 ffffe801`270008d0 : EmcpMpx!EmcpMpxLogPlatfEvent+0x1834e
fffff801`7ad5c1c0 : ffffe001`a16c42a0 ffffe801`270007d0 ffffe801`270008d0 ffffe801`331ae810 : EmcpMpx!EmcpMpxLogPlatfEvent+0x5a70
fffff801`7ac21dcf : ffffe001`a15e2010 00000000`00000016 00000000`00000002 00000000`00000000 : EmcpMpx!PxDsmLamUnregister+0x1d14
fffff801`7ac082d0 : 00000000`00010000 00000000`00000004 ffffe001`a16c41b0 00000000`ffffffff : MPIO!DsmGetVersion+0xb2b
fffff801`7ac08b13 : 00000000`00000000 ffffe801`331ae810 00000000`00000007 ffffe801`269d8410 : MPIO+0x82d0
fffff802`9d13043e : ffffe001`a1c4d650 ffffe801`331ae810 ffffe801`27801a01 ffffe801`c00000c0 : MPIO+0x8b13
fffff801`7b28f5b3 : ffffe801`331ae810 fffff801`7b291a00 00000000`00010000 ffffe001`a16c41b0 : nt!IoCompleteRequest+0x2fa
fffff801`7b291574 : ffffe801`2713eba0 ffffe001`a16c41b0 ffffe801`331ae810 fffff801`7b28d60e : storport!StorPortNotification+0x2173
fffff801`7b28e360 : ffffe001`a1655010 ffffe001`40200382 00000000`00010000 ffffe001`a1739400 :

Another option is to upgrade PowerPath 6.0 SP2 to PowerPath 6.3, which includes all of the fixes above.

InsightIQ : How to change the InsightIQ network Settings and host name

Purpose: Change the InsightIQ network Settings and host name

Steps:
1. Log in to the InsightIQ web network settings GUI on port 5480
Link: https://<InsightIQ_HostName>:5480/#core.Login
2. Click on the 'Network' tab > edit the settings > Click on 'Save Settings'



Isilon Script: View the Status of the newly started SyncIQ Job with Amount of NETWORK BYTES transferred, Current THROUGHPUT, No of Workers Assigned, Current CPU UTILIZATION

Purpose: View the Status of the newly started SyncIQ Job with Amount of NETWORK BYTES transferred, Current THROUGHPUT, No of Workers Assigned, Current CPU UTILIZATION
Modify Attributes in Script:
1. <PolicyName>
2. <PolicyID>
3. <Replication Name>

# get the Policy ID for the Sync Policy Name
IsilonCluster1-2# isi sync policies view --policy=<PolicyName> | grep -i ID
                         ID: <bxxxxxxxxxxxxxxxxxxxxxxxxxxxa>

#get the Replication Name with the Sync Policy ID
Syntax: isi_repstate_mod -ll pol_id - (list reps in a directory)
IsilonCluster1-2# isi_repstate_mod -ll <bxxxxxxxxxxxxxxxxxxxxxxxxxxxa>
<bxxxxxxxxxxxxxxxxxxxxxxxxxxxa>_snap_rep_base - Replication Name
<bxxxxxxxxxxxxxxxxxxxxxxxxxxxa>_select_238749
Syntax (list all Worker Entries): isi_repstate_mod -wa pol_id rep_name - (print all work entries)


Script:
IsilonCluster1-2# while true; do
    print "*******START******"
    print "======================"; date; print "======================"
    print "/-SYNC JOB VERBOSE VIEW"; print "======================"
    isi_classic sync job rep -v <PolicyName>; echo " "
    print "======================"; print "/-NETWORK BYTES(look for a change every run)"; print "======================"
    isi_classic sync job rep -v <PolicyName> | grep -A4 "Bytes:"; echo " "
    print "======================"; print "/-SYNC JOB VIEW THROUGHPUT"; print "======================"
    isi_classic sync job rep; echo " "
    print "======================"; print "/-MAX WORKERS"; print "======================"
    isi_repstate_mod -wa <PolicyID> <Replication Name> | grep workitem | wc -l; echo " "
    print "======================"; print "/-CPU UTILIZATION"; print "======================"
    isi statistics system list --nodes=all
    print "======================"; print "*******FINISH******"
    sleep 300
done

Output:
*******START******
======================
Thu Aug 30 08:30:58 PDT 2018
======================
/-SYNC JOB VERBOSE VIEW
======================
Policy name: PolicyName
    Action: sync
    Sync Type: initial
    Job ID: 1
    Started: Wed Aug 29 21:33:12 PDT 2018
    Run time: 10:57:46
    Status: Running
    Details:
        Directories:
            Visited on source: 117198
            Deleted on destination: 0
        Files:
            Total Files: 1708005
            New files: 1708005
            Updated files: 0
            Automatically retransmitted files: 0
            Deleted on destination: 0
            Skipped for some reason:
                Up-to-date (already replicated): 0
                Modified while being replicated: 0
                IO errors occurred: 0
                Network errors occurred: 0
                Integrity errors occurred: 0
        Bytes:
            Total Network Traffic: 8.6 TB (9482042338352 bytes)
            Total Data: 8.6 TB (9471597284558 bytes)
            File Data: 8.6 TB (9471597284558 bytes)
            Sparse Data: 0B
        Phases (1/3):
            Treewalk (STF_PHASE_TW)
                Start: Wed Aug 29 21:37:39 PDT 2018
                End: N/A
                Start: Wed Aug 29 21:33:24 PDT 2018
                End: Wed Aug 29 21:37:39 PDT 2018

======================
/-NETWORK BYTES(look for a change every run)
======================
        Bytes:
            Total Network Traffic: 8.6 TB (9483514244073 bytes)
            Total Data: 8.6 TB (9473067646801 bytes)
            File Data: 8.6 TB (9473067646801 bytes)
            Sparse Data: 0B

======================
/-SYNC JOB VIEW THROUGHPUT
======================
Name          | Act  | St      | Duration | Transfer | Throughput
--------------+------+---------+----------+----------+-----------
PolicyName | sync | Running | 10:57:53 |   8.6 TB |   1.8 Gb/s

======================
/-MAX WORKERS
======================
      36

======================
/-CPU UTILIZATION
======================
 Node   CPU    SMB  FTP  HTTP    NFS  HDFS  Total  NetIn  NetOut  DiskIn  DiskOut
---------------------------------------------------------------------------------
  All  6.4%   7.4M  0.0 457.1 395.1k   0.0   7.8M  73.6M  272.8M  328.4M   292.7M
    1  2.6%  67.8k  0.0   0.0   2.9k   0.0  70.7k   5.6M   19.5M  423.4k    52.4k
    2  3.2%  23.1k  0.0   0.0 391.2k   0.0 414.3k   6.3M   21.2M    5.3M   432.5k
    3  3.5%  45.6k  0.0 457.1   48.9   0.0  46.1k   5.7M   19.8M  379.2k   272.0k
...........................................
   24 14.8%    0.0  0.0   0.0    0.0   0.0    0.0   6.4M   23.3M   35.6M    11.8M
   25 15.1%   4.3M  0.0   0.0    0.0   0.0   4.3M   5.0M   31.3M   15.9M    12.1M
---------------------------------------------------------------------------------
Total: 26
======================
*******FINISH******

How to Collect Isilon Log on a Single node

Purpose: To collect logs on a single Isilon node, open a PuTTY session to that node, run the command below, and then use WinSCP to download the log file from the node.


ISILONCLUSTER1-10# isi_gather_info --local-only
Unlocking gather-status
Gather-status unlocked

This may take several minutes.  Please do not interrupt the script.

..............Information gathering completed..
..............creating compressed package...
Packaging complete...
Package: /ifs/data/Isilon_Support/pkg/IsilonLogs-<ISILONCLUSTER>-<YYYYMMDD>-<HHMMSS>.tgz
Uploading in progress.  If problems are encountered during the
upload process, the package will need to be sent manually.
Trying Passive FTP...
Uploaded Succeeded (FTP - Passive). File IsilonLogs-<ISILONCLUSTER>-<YYYYMMDD>-<HHMMSS>.tgz
Cleaning up temporary data... done.
Gather-status unlocked
ISILONCLUSTER1-10#

How to gather SPCollects for VNX1 or VNX2 Series array

There are a number of methods to gather SPCollects:
  • Start and retrieve SPCollects from each SP using Unisphere.
  • Launch Unisphere Service Manager either directly or from within Unisphere.  This approach has the advantage of automating the whole SPCollect gathering process and gathering diagnostic data too.
  • Start and retrieve SPCollects from each SP using Navisphere Secure CLI.

Unisphere


  1. Launch Unisphere and login.
  2. Select the VNX series array from either the dashboard or from the Systems drop-down menu.  Click System on the toolbar.
  3. On the right pane, under Diagnostic Files, select 'Generate Diagnostic Files - SPA'.  Confirm that it is OK to continue.  "Success" will be displayed when the SPCollect generation starts, but this only means the script has been started and will still take several minutes to complete. 
  4. Repeat step 3 for SP B immediately.
  5. It will take around 10-15 minutes to generate the complete SPCollect file.
  6. Still on the right pane, select 'Get Diagnostic Files - SP A'.  
  7. When the SPCollect file generation has completed, a file with the following name will be listed: <Array Serial Number>_SPA_<date/time (GMT)>_Code_data.zip 
  8. Sorting by descending order of date is a good way to find the latest SPCollect and the zip file will generally be over 10MB.  If the file has not appeared, press refresh every minute or so until the correct _data.zip file appears.
  9. On the right-hand side of the box, select the location on the local computer, where the SPCollects should be transferred to.
  10. On the left-hand side of the box, select the file to be transferred.  Note: if a file is listed that ends in runlog.txt, this indicates that the SPCollects are still running. Wait until the _data.zip is created.
  11. Repeat Steps 6-10 on SP B to retrieve its SPCollect file.

Unisphere Service Manager

  1. Log in to Unisphere client.
  2. Select the VNX, either from the dashboard or from the Systems drop-down.  Click System on the toolbar.
  3. On the right pane, under Service Tasks, select 'Capture Diagnostic Data'.  This will launch USM.  Alternatively USM can be launched directly from the Windows Start menu.
  4. Select the Diagnostics tab and select Capture Diagnostics Data.  This will launch the Diagnostic Data Capture Wizard.
  5. The Wizard will capture and retrieve SPCollect files from both SP and Support Materials from the File storage, which will then be combined into a single zip file.

Navisphere Secure CLI

Perform the following steps:
  1. Open a command prompt on the Management Station.
  2. Type cd "C:\Program Files\EMC\Navisphere CLI" - This is the default installation folder for Windows, but the path the file was installed to may have been overridden.  Other platforms, such as Linux, would have a different folder structure, but the commands are the same.  The CLI folder may already be in the path statement, in which case, the commands can be run from any directory.
  3. Type naviseccli -h <SP_A_IP_address> spcollect
  4. Type naviseccli -h <SP_B_IP_address> spcollect
  5. These commands start the SPCollect script on each SP.  Additional security information may also need to be specified, see KBA 483583, How to gather Service Data from a Dell-EMC Unity array.
  6. Wait a minimum of 10-15 minutes for the SPCollects to run, before attempting to retrieve them.
  7. Type naviseccli -h <SP_IP_address> managefiles -list                  
  8. This will list the files created by spcollect.  Check that a file with the current date and time in GMT has been created, ending with _data.zip.  If there is a file ending with .runlog instead, then the SPCollect is still running, so wait for a while longer before retrying this.
  9. Type naviseccli -h <SP_IP_address> managefiles -retrieve       
    This will display the files that can be moved from the SP to the Management Station.

    Example:
    Index Size in KB     Last Modified            Filename
    0     339       06/25/2013 00:45:42  admin_tlddump.txt
    ...
    10    24965     06/24/2013 23:39:53  APM0000000XXXX_SPA_2013-06-24_21-35-43_325146_data.zip
    11    41577     06/25/2013 00:17:17  APM0000000XXXX_SPB_2013-06-24_21-35-52_325147_data.zip
    ...
  10. Enter files to be retrieved with index separated by comma (1,2,3,4,5) OR by a range (1-3) OR enter 'all' to retrieve all file OR 'quit' to quit> 11

    This will pull the index number 11 (the most recent ~_data.zip file) from the corresponding SP and copy it to the c:\program files\emc\navisphere cli directory, with a filename of APM0000000XXXX_SPB_2013-06-24_21-35-52_325147_data.zip
  11. For information on how to get the SPCollect files to EMC Technical Support, see KBA 459010, Where do I upload Service Request related information for analysis by EMC Support.
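
If SPCollects are gathered regularly, steps 3 through 8 can be wrapped in a small script. A minimal sketch using only the commands above (Linux shell shown; on Windows, replace sleep 900 with timeout /t 900):
naviseccli -h <SP_A_IP_address> spcollect
naviseccli -h <SP_B_IP_address> spcollect
sleep 900    # wait ~15 minutes for the SPCollects to complete
naviseccli -h <SP_A_IP_address> managefiles -list
naviseccli -h <SP_B_IP_address> managefiles -list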

EMC ViPR SRM 4.x - Isilon Collector Error: HttpRequestGroup::handleResponse(): Unable to connect to host (403) for request https://@{host}:8080/platform/1/zones in requets group Isilon-Zones.

Error:
WARNING  -- [2018-08-20 15:08:03 PDT] -- HttpRequestGroup::handleResponse(): Unable to connect to host <Host Name> (403) for request https://@{host}:8080/platform/1/zones in requets group Isilon-Zones. Server returned the following message: Forbidden
SEVERE   -- [2018-08-20 15:08:03 PDT] -- HttpRequestRetriever::execute(): Unable to retrieve stream on any configured request group!
SEVERE   -- [2018-08-20 15:08:03 PDT] -- AbstractJobExecutor::executeJobRunner(): Error while executing job ISILON2-CLUSTER-CAPACITY -> HttpRequestRetriever removing it from the queue
com.watch4net.apg.concurrent.JobExecutionException: Unexpected error when running step in job ISILON2-CLUSTER-CAPACITY -> HttpRequestRetriever
    at com.watch4net.apg.ubertext.parsing.concurrent.SimpleStreamHandlerJob.step(SimpleStreamHandlerJob.java:65)
    at com.watch4net.apg.concurrent.executor.AbstractJobExecutor$SequentialJob.step(AbstractJobExecutor.java:460)
    at com.watch4net.apg.concurrent.executor.AbstractJobExecutor.executeJobRunner(AbstractJobExecutor.java:130)
    at com.watch4net.apg.concurrent.executor.AbstractJobExecutor.access$500(AbstractJobExecutor.java:25)
    at com.watch4net.apg.concurrent.executor.AbstractJobExecutor$JobRunnerImpl.run(AbstractJobExecutor.java:287)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: com.watch4net.apg.concurrent.JobExecutionException: SimpleStreamHandlerJob produced a null StreamHandlerStep during execution!
    at com.watch4net.apg.ubertext.parsing.concurrent.SimpleStreamHandlerJob.step(SimpleStreamHandlerJob.java:51)
    ... 7 more

Cause: ViPR SRM Isilon data collection does not work when using a non-admin credential that has only the AuditAdmin role. If you open https://Isilon_IP:8080/platform/1/zones in a browser and log in with the non-admin account,
you will get an authentication error like the following:

{
"errors" :
[

{
"code" : "AEC_NOT_FOUND",
"message" : "Path not found: /1/zones%20in."
}
]
}

Fix: To fix this issue, add the service account used to discover the Isilon in ViPR SRM to the "SecurityAdmin" role on the Isilon cluster.
>> Log in to the console > Click on "Access" > Select "Membership & Roles" > Click on "Roles" > Select "SecurityAdmin" > click on "View/Edit" > Click on "Edit Role" > "+ Add members to this role" > Search for the account > Select the account > Click on "Save Changes"

Verify: https://<HostName>:8080/platform/1/zones
Output: If the output looks like the following, the account is working:

{
"zones" :
[

{
"all_auth_providers" : false,
"alternate_system_provider" : "lsa-file-provider:System",
"audit_failure" : [ "create", "delete", "rename", "set_security", "close" ],
"audit_success" : [ "create", "delete", "rename", "set_security", "close" ],
"auth_providers" :
[
"lsa-activedirectory-provider:<Domain>",
"lsa-local-provider:System",
"lsa-file-provider:System"
],
"default_block_size" : 27,
"default_checksum_type" : "none",
"hdfs_ambari_namenode" : "",
"hdfs_ambari_server" : "",
"hdfs_authentication" : "all",
"hdfs_enabled" : true,
"hdfs_keytab" : "/etc/hdfs.keytab",
"hdfs_root_directory" : "/ifs",
"home_directory_umask" : 63,
"id" : "System",
"ifs_restricted" : [],
"map_untrusted" : "",
"name" : "System",
"netbios_name" : "",
"odp_version" : "",
"path" : "/ifs",
"protocol_audit_enabled" : false,
"skeleton_directory" : "/usr/share/skel",
"syslog_audit_events" : [ "create", "delete", "rename", "set_security" ],
"syslog_forwarding_enabled" : false,
"system" : true,
"system_provider" : "lsa-file-provider:System",
"user_mapping_rules" : [],
"webhdfs_enabled" : true,
"zone_id" : 1
}
]
}
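
You can also run the same check from the command line with curl (a quick sketch; -k skips certificate validation, -u supplies the service account name and prompts for its password; <ServiceAccount> is a placeholder):
# curl -k -u <ServiceAccount> 'https://<HostName>:8080/platform/1/zones'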

Reference: Dell EMC Knowledge Base Article: 000524394

Isilon Monitoring Client to Host Performance - Steps to capture the ISILON PCAPS for one client to analyze performance

####Monitoring Client to Host Performance######
Here are the steps to capture the Isilon PCAPs for one client to analyze performance. Also run Wireshark on the host end at the same time.

Copy the isiperf_v3.sh script to the location used in step 2.

1. Make the following directory:
# mkdir -p /ifs/data/Isilon_Support/$(date +%m%d%Y)

2. Open a second SSH session and start isiperf_v3.sh (it will run for 10 minutes) in that session, so it can run in the background while the additional steps below are performed.
# /bin/bash /ifs/data/Isilon_Support/isiperf_v3.sh -i 10 -e 5 -r 12 -d -g lwio,lsass,netlogon

3. Check which node the client is connected to:
#  isi_for_array -s netstat -an | grep "10.252.194.19"

Run the capture command below on the node the client is connected to:

4. Start a packet trace with snaplength of 320 from the node.
# ifconfig | grep flags= | awk -F: '{print $1}' | egrep -v 'lo0|ib0|ib1' | while read ifcon; do tcpdump -s 320 -i "${ifcon}" -w /ifs/data/Isilon_Support/$(date +%m%d%Y)/`hostname`.$(date +%m%d%Y_%H%M%S)."${ifcon}".pcap host 10.252.194.19 & done

5. Start the packet trace on the client.

6. Reproduce the issue.

7. Stop the packet trace and verify tcpdump has stopped:
#  isi_for_array "killall -2 tcpdump";sleep 2;  isi_for_array ps -auwx | grep tcpdump | grep -iv grep

8. Upload all the files to support:
# isi_gather_info --local-only --nologs -s "isi_hw_status -i" -f /ifs/data/Isilon_Support/$(date +%m%d%Y)

Script: isiperf_v3.sh
https://drive.google.com/open?id=1gC6ndNjdxW-gGAC0UKNztZqRYvb6PVXR



Active Directory (AD) - CMD to List the members in a AD Group

There are a lot of commands to list the members of an AD group. The steps below will work on any workstation.

1. From a Command Prompt
run the below cmd
>> Rundll32 dsquery.dll OpenQueryWindow
A new window will open
Enter the group name and click on 'Find Now'
Below, in the search results tab, you will see the Group Name; click on it
A new window will open with the list of members :)


Other Commands:
>> net group <AD-GroupName> /domain
>> dsquery group -name <AD-GroupName> | dsget group -members -expand



Isilon Useful Commands


#####cmd to view the isilon Node hardware Serial Number
ISILON001-2#isi_for_array -s 'isi_hw_status | grep -i serno'
XXXXX001-1: SerNo: JXXXXX6120XXXX
XXXXX001-2: SerNo: JXXXXX6050XXXX

#####Check the time on isilon cluster
IsilonCluster1-2# date
Fri Aug 10 15:01:05 PDT 2018

#####Check the NTP Settings
IsilonCluster1-2# isi ntp servers list
Name          Key
------------------
10.xxx.xxx.xxx -
10.xxx.xxx.xxx -
10.xxx.xxx.xxx -
10.xxx.xxx.xxx -
------------------
Total: 4

#####cmd to view the isilon Node hardware details
isi_for_array -s 'isi_hw_status | grep -i product'
XXXXX001-1: Product: NL410-4U-Single-48GB-2x1GE-2x10GE SFP+-140TB-800GB SSD
XXXXX001-2: Product: NL410-4U-Single-48GB-2x1GE-2x10GE SFP+-140TB-800GB SSD

#####CLI command to show which nodes the worker/threads are using (Lnn=logical node number)
ISILON001-2# isi sync jobs reports view JobName_XX
Policy Name: 
JobName_XX
....
Worker ID: 0
Lnn: 1
....
Worker ID: 2
Lnn: 2


#####To view the boot drive wear life
IsilonCluster1-2#isi event events list | grep -i wear

#####List the Snapshot Schedules
IsilonCluster1-2# isi snapshot schedule list
ID   Name
------------------------------------
1    Tier-X
2    Tier-NL
3    OnBaseprdcopy_163099212
4    MHC_Infra_FileShare_Development
7    obdg$_Snapshot
8    obdg2$_Snapshot
9    obdg3$_Snapshot
10   obdg4$_Snapshot
11   obdg5$_Snapshot
12   obdg6$_Snapshot
13   obdg7$_Snapshot
14   obdg8$_Snapshot
15   obdg9$_Snapshot
16   obdiskgroups$_Snapshot
17   shc_Snapshot
18   uha-uswired_Snapshot
19   e-fax_Snapshot
20   Rad-Fax_Snapshot
21   RTA$_Snapshot
22   ProdClientFiles$_Snapshot
23   Processing_Snapshot
24   LabFaxes_Snapshot
25   ImageNet$_Snapshot
26   HimsRoiArchive_Snapshot
27   Billing-Healthlogic_Snapshot
28   EPIC_nonprod_Snapshot
29   EPIC_prod_Snapshot
30   msp_nfs_oracle
31   msp_ora_silver
32   longterm_cifs_msp_sql
33   longterm_cifs_stc_sql
34   MHC_Prod
------------------------------------

#####View Current Running Jobs
IsilonCluster1-1# isi job list
ID    Type           State   Impact  Pri  Phase  Running Time
--------------------------------------------------------------
16451 SnapshotDelete Running Medium  2    2/2    2m 46s
--------------------------------------------------------------
Total: 1

#####Delete a Snap
IsilonCluster1-1# isi snapshot snapshots delete --snapshot=<SnapShotName>
Are you sure? (yes/[no]): yes

#####Manually initiate the snapshot clean up job
IsilonCluster1-1# isi job jobs start snapshotdelete
Started job [16451]

#####List all snapshots
IsilonCluster1-1# isi snapshot list

#####List all Shares
IsilonCluster1-1# isi smb shares list

#####View the FSA snapshot-based mode setting (this is the setting to change when disabling FSA snapshots)
IsilonCluster1-2# isi_gconfig -t job-config jobs.fsa.snap_based_mode
jobs.fsa.snap_based_mode (bool) = true

#####List the FSAnalyze Schedule
IsilonCluster1-2# isi_classic job schedule list --verbose | grep FSAnalyze
FSAnalyze       every 1 weeks on friday at 10:00 PM             07/27 22:00

#####Gather Logs
isi_gather_info
Package: /ifs/data/Isilon_Support/pkg/IsilonLogs-SHISOLPFCAP001-20180717-113405.tgz

#####CMD to Check Disk Drive Status
isi_for_array -sX isi devices list

#####Check where the client is connected
IsilonCluster1-2# isi_for_array -s netstat -an | grep "xxx.xxx.xxx.xxx"

#####Cmd to Check the wear life on Boot Disk
#isi_for_array -s isi_radish -a /dev/ad* | grep -E "Percent Life" | grep -v Used | awk '{print $1 $2 $3 $4 sprintf( "%d","0x"$9 )}'
IsilonCluster1-6:PercentLifeRemaining:99
IsilonCluster1-6:PercentLifeRemaining:92

#####Loop through the cluster nodes and grep for non-healthy drives using#####
isi_for_array "isi devices drive list| grep -iv healthy"

#####How to check which nodes the data for an Isilon file / share is targeted to and stored on
Way1:
ISILON-Cluster1-2# isi get -D /ifs/Dir1/Dir2/ | grep -i pools
*  Disk pools:         policy n410_140tb_800gb-ssd_48gb(2) -> data target n410_140tb_800gb-ssd_48gb:6(6), metadata target n410_140tb_800gb-ssd_48gb:6(6)
*  Disk pools:         policy 2
*  Disk pools:         policy n410_140tb_800gb-ssd_48gb(2) -> data target n410_140tb_800gb-ssd_48gb:7(7), metadata target n410_140tb_800gb-ssd_48gb:7(7)
*  Disk pools:         policy 2
*  Disk pools:         policy n410_140tb_800gb-ssd_48gb(2) -> data target n410_140tb_800gb-ssd_48gb:4(4), metadata target n410_140tb_800gb-ssd_48gb:4(4)
*  Disk pools:         policy 2

Way2:
#list polices
IsilonCluster1-2# isi filepool policies list
Name                 Description                                                 State
---------------------------------------------------------------------------------------
Policy1               Keeps Short Term Storage data in X nodes per GE requirement OK
Policy2               2-                                                           OK
Policy3               -                                                           OK
---------------------------------------------------------------------------------------
Total: 3
#View the policy settings for 'Data Storage Target'
IsilonCluster1-2# isi filepool policies view --name=<Policy1>
                              Name: Policy1
                       Description: Keeps Short Term Storage data in X nodes per GE requirement
                             State: OK
                     State Details:
                       Apply Order: 1
             File Matching Pattern: Path == /Dir1/Dir2 (begins with)
          Set Requested Protection: -
               Data Access Pattern: -
                  Enable Coalescer: -
               Data Storage Target: x410_102tb_1.6tb-ssd_128gb
                 Data SSD Strategy: metadata
           Snapshot Storage Target: x410_102tb_1.6tb-ssd_128gb
             Snapshot SSD Strategy: metadata
                        Cloud Pool: -
         Cloud Compression Enabled: -
          Cloud Encryption Enabled: -
              Cloud Data Retention: -
Cloud Incremental Backup Retention: -
       Cloud Full Backup Retention: -
               Cloud Accessibility: -
                  Cloud Read Ahead: -
            Cloud Cache Expiration: -
         Cloud Writeback Frequency: -
      Cloud Archive Snapshot Files: -

Way3:
IsilonCluster1-2# isi filepool policies list --verbose --format=list
                              Name: Policy1
                       Description: Keeps Short Term Storage data in X nodes per Policy1 requirement
                             State: OK
                     State Details:
                       Apply Order: 1
             File Matching Pattern: Path == /Dir1/Dir2 (begins with)
          Set Requested Protection: -
               Data Access Pattern: -
                  Enable Coalescer: -
               Data Storage Target: x410_102tb_1.6tb-ssd_128gb
                 Data SSD Strategy: metadata
           Snapshot Storage Target: x410_102tb_1.6tb-ssd_128gb
             Snapshot SSD Strategy: metadata
                        Cloud Pool: -
         Cloud Compression Enabled: -
          Cloud Encryption Enabled: -
              Cloud Data Retention: -
Cloud Incremental Backup Retention: -
       Cloud Full Backup Retention: -
               Cloud Accessibility: -
                  Cloud Read Ahead: -
            Cloud Cache Expiration: -
         Cloud Writeback Frequency: -

      Cloud Archive Snapshot Files: -

#####To view the performance on node 10 for particular protocol
IsilonCluster1-2# isi_for_array -n10 'isi statistics client list --protocols=smb1'
IsilonCluster1-10:   Ops     In   Out  TimeAvg  Node  Proto          Class  UserName     LocalName                          RemoteName
IsilonCluster1-10: -------------------------------------------------------------------------------------------------------------------
IsilonCluster1-10: 230.2  14.5k 14.5k    216.4    10   smb1           read   UNKNOWN 10.xxx.xxx.xxx host1.domain.org
IsilonCluster1-10: 211.4   2.9M 10.8k    161.4    10   smb1          write   UNKNOWN 10.xxx.xxx.xxx                 host2.domain.org

#####How to check whether NetBIOS is enabled for SMB
IsilonCluster1-2# isi smb settings global view
    Access Based Share Enum: No
  Dot Snap Accessible Child: Yes
   Dot Snap Accessible Root: Yes
     Dot Snap Visible Child: No
      Dot Snap Visible Root: Yes
 Enable Security Signatures: No
                 Guest User: nobody
                 Ignore Eas: No
       Onefs Cpu Multiplier: 4
          Onefs Num Workers: 0
Require Security Signatures: No
           Server Side Copy: Yes
              Server String: Isilon Server
       Support Multichannel: Yes
            Support NetBIOS: No
               Support Smb2: Yes


ViPR SRM 4.2 - SolutionPack for Oracle Database - Error: Cannot locate driver 'oracle.jdbc.driver.OracleDriver' !


Error:
Testing connectivity to Oracle instance and also checking the associated grants for the user(XXXXX:XXXXX)
Click to show/hide the full result
line 22: sql: command returned with a non-zero status (status: 1) !
Cannot locate driver 'oracle.jdbc.driver.OracleDriver' !

Resolution/fix:
> Log in via PuTTY to the SRM Collector where 'Collector-Manager - oracle-database' is installed, or where the 'Data collection' & 'ASM Data collection' services are installed

> Change directory to the path below
SRMCollector~ # cd /opt/APG/Databases/JDBC-Drivers/Default/lib

>Check whether the JDBC drivers are present or not
SRMCollector:/opt/APG/Databases/JDBC-Drivers/Default/lib # dir | grep -i ojdbc
-rw-r--r-- 1 root root 3692096 Apr 23 05:15 ojdbc6.jar
-rw-r--r-- 1 root root 3698857 Apr 23 05:15 ojdbc7.jar
-rw-r--r-- 1 root root 4036257 Jun  6 11:17 ojdbc8.jar

<If the drivers are not present, follow the steps below>

>Download the JDBC drivers from the Oracle website
Link: http://www.oracle.com/technetwork/database/features/jdbc/jdbc-ucp-122-3110062.html
Note: you need an Oracle account to download the files, and registration is free

>Download the below 3 files:
ojdbc6.jar
ojdbc7.jar
ojdbc8.jar

>use WinSCP or any other tool to upload the files to the SRM Collector

>Once uploaded, verify the files and then set the permissions:
SRMCollector:/opt/APG/Databases/JDBC-Drivers/Default/lib # chmod 777 ojdbc6.jar
SRMCollector:/opt/APG/Databases/JDBC-Drivers/Default/lib # chmod 777 ojdbc7.jar
SRMCollector:/opt/APG/Databases/JDBC-Drivers/Default/lib # chmod 777 ojdbc8.jar
SRMCollector:/opt/APG/Databases/JDBC-Drivers/Default/lib # dir | grep -i ojdbc
-rwxrwxrwx 1 root root 3692096 Apr 23 05:15 ojdbc6.jar
-rwxrwxrwx 1 root root 3698857 Apr 23 05:15 ojdbc7.jar
-rwxrwxrwx 1 root root 4036257 Jun  6 11:17 ojdbc8.jar

> Now restart the services
SRMCollector:/opt/APG/Databases/JDBC-Drivers/Default/lib # manage-modules.sh service restart all

> Now try to discover the Oracle instance again
!! It should work now !!

or
refer to the below EMC link:
https://www.emc.com/techpubs/vipr/solutionpack_for_oracle_database-2.htm


Isilon Error: isi_visudo does not edit the correct sudoers configuration file


Error: isi_visudo does not edit the correct sudoers configuration file.

Issue: When issuing the command isi_visudo to add sudo privileges to a user, the command opens and edits the wrong configuration file, /usr/local/etc/sudoers, instead of the correct one, /etc/mcp/override/sudoers. This issue affects OneFS versions 8.0.0.6+, 8.0.1.2+, and 8.1.0.1+.

Resolution: As a workaround, issue the following command to force isi_visudo to edit the correct sudoers configuration file, /etc/mcp/override/sudoers. The changes will propagate across all nodes in the cluster.
# isi_visudo -f /etc/mcp/override/sudoers
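
To confirm the change propagated, you can grep the override file on every node (<username> is a placeholder for the user you granted privileges to):
# isi_for_array -s 'grep <username> /etc/mcp/override/sudoers'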

How to Collect API response generated on the Isilon Cluster


Process:
1. Use the below URL in the browser
2. Enter the login credentials
3. When prompted, download the file, e.g. eventlists.json
4. Open the file in Notepad to view the last 1000 generated events

https://<Isilon IP>:8080/platform/3/event/eventlists?limit=1000
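
Alternatively, the same response can be collected from the command line with curl (a minimal sketch; -k skips certificate validation, -u prompts for the password, -o writes the response to a file):
# curl -k -u <user> -o eventlists.json 'https://<Isilon IP>:8080/platform/3/event/eventlists?limit=1000'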

ViPR SRM - License Upload - Error: An unexpected error occured: 'Unable to upload license. License is invalid or expires.'


Issue: Unable to upload the ViPR SRM license key from the GUI or CLI
Error: An unexpected error occured: 'Unable to upload license. License is invalid or expires.'
Cause: This error occurs because the ELMS license SWID refers to a product that is already in use by other installed licenses.
Fix: Uninstall the conflicted ELMS license SWID

Steps to Fix Issue:
1. Upload the New License File (.lic) to the Frontend Server, maybe to tmp directory/folder; using tools like WinSCP.
Ex: file name: <SRMS_XXXXX_exp>.lic

2. Login to the Frontend Server with root Credentials
<FrontendServer>:~ # cd /tmp/
<FrontendServer>:/tmp # ls -l | grep -i SRMS
-rw-r--r-- 1 root    root       694 May 21 18:42 <SRMS_XXXXX_exp>.lic

3. Test the issue by trying to install the new License (Can be Skipped)
<FrontendServer>:/tmp # /opt/APG/bin/manage-licenses.sh install "<SRMS_XXXXX_exp>.lic"
Installing APG license key (SRMS_XXXXX_exp.lic)... SEVERE  -- [2018-05-22 11:47:40 EDT] -- BinaryDecoder$AbstractBagDecorator::getHashCodeBuilder(): ELMS license swid refers to a product that is already in use by other license(s).  If the current license is correct, please remove the installed license(s) that conflict,  otherwise contact SRM support about the current conflicting license(s). candidate swid: ELMSRM0XXXXXXX, candidate product: ViPR SRM, conflicting swid(s): [CB885L7VXXXXXX], installed license(s) swid-to-product-mapping: {CB885L7VXXXXXX=[ViPR SRM]}
 failed!
**** Unable to modify Java system preferences.
**** You may need to be root in order to perform this operation.

4. Uninstall the conflicted ELMS license SWID
<FrontendServer>:/tmp # /opt/APG/bin/manage-licenses.sh remove "ELMS Features: CB885L7VXXXXXX"
Removing license for ELMS Features: CB885L7VXXXXXX... done.
<FrontendServer>:/tmp # /opt/APG/bin/manage-licenses.sh remove "ELMS Features: ELMSRM0XXXXXXX"
Removing license for ELMS Features: ELMSRM0XXXXXXX... done.

5. After uninstalling all the licenses that conflict with the fresh license, update the license-manager by using the following command:
<FrontendServer>:/tmp # /opt/APG/bin/manage-modules.sh update license-manager Default
Required dependencies, in processing order:
   [1]   java '8.0.151' v8.0.151
   [2] U license-manager 'Default' v5.7u2 => v5.7u2
> 1 not modified, 1 to update
> 2.2 MB space required / 346.9 GB available
 ? Enter the step to modify, 'yes' to accept them, or 'no' to cancel the operation [yes] > yes

Starting update of license-manager Default from v5.7u2 to v5.7u2...
 * Gathering information...
 * Module found in '/opt/APG/Tools/License-Manager/Default'.
 * It will now be updated using 'license-manager-5.7u2-linux-x64.pkg'.
 * Unpacking files...
 * Updating files... 100%
 * 19 files have been updated.
 * Finalizing update...
Update complete.

6. Now install the fresh license using the following command:
<FrontendServer>:/tmp # /opt/APG/bin/manage-licenses.sh install "SRMS_XXXXXXX_DD-MMM-YYYY_exp.lic"
Installing APG license key (SRMS_XXXXXXX_DD-MMM-YYYY_exp.lic)... done.

7. You can then verify the installed license by using the below command:
<FrontendServer>:/tmp # /opt/APG/bin/manage-licenses.sh check

Note: Moreover, if it's a distributed environment you will need to go to ViPR SRM Centralized Management > License Management > and then click on Synchronize so that the new license is copied to all the hosts.

Hope this would help :)


How to Generate Logs in ViPR SRM Using the Calypso Tool


Process Steps to follow:
  1. Download the Calypso zip file:
    1. https://support.emc.com/downloads/34247_ViPR-SRM              
  2. Place the Calypso zip file on the User interface host
  3. Extract the zip file to c:\calypso
  4. Right Click on the command line icon and select Run As Administrator
  5. Answer yes if prompted to continue (if asked)
  6. Change to the c:\calypso directory
  7. Run the following batch file based on the type of installation of ViPR SRM
    1. For a Linux Front End
      1. Calypso.bat
    2. For a Windows Front End (Binary)
      1. Calypso_win.bat
  8. Enter the Frontend Hostname or IP
  9. Accept all possible defaults - this is preferable by support unless otherwise directed
  10. FE username – press Enter for root.  Note: this is the username used to log in to the host itself
  11. FE password – Default is Changeme1!   Note: the 'C' in Changeme1! is a capital letter
  12. Enter the installation directory – default hit enter
  13. Press Enter to collect diagnostics for all hosts – Default hit enter
  14. Press enter to collect logs for the last 3 days – Default hit enter
  15. It may take 20-30 minutes for the information to be collected
  16. For Health reports hit enter to accept defaults
  17. For port 58080 – Default hit enter  -  if a secure port is used enter that 
  18. For HTTP – Default hit enter  -  if you are using a secure connection enter HTTPS 
  19. For Frontend UI portal username – if admin is used – Default hit enter
  20. For Frontend UI portal password – if changeme – Default hit enter
  21. Zip up the entire \out\ directory and upload to the case or location designated by the Support Engineer
  22. Notify the support engineer that the file is uploaded

To view a Video on how to install and run the Calypso Tool please click the following link:
https://community.emc.com/videos/174428

How to generate Diagnostic file from ViPR SRM GUI & CLI


Process for GUI:
1. To generate the Diagnostic file for the entire SRM server set, navigate to the centralized management page
In the left pane expand the tree and navigate to the 'Logs and Diagnostics' section > Select 'Diagnostic Files' > Click on "Generate Diagnostic Files" button (which will collect and compress all logs and configuration files for each server in the installation).
Once that function completes, click the "Download" button, which moves the files to a location of your choice (the default is the Downloads directory).

Link to diagnostics window
http://<srm front end server>:58080/centralized-management/#/audit/diagnostics

Process for CLI:

Windows
From command prompt cd to:
/APG/bin/
Run: diagnostic.cmd
This will create DiagnosticFiles.zip

Linux
Run the following command:
diagnostic.sh
This will create the file DiagnosticFiles.tar.gz within the /tmp directory

If unable to save the diagnostics file to the tmp directory due to lack of disk space, you can change the destination by doing the following:
- Log into the host in question
- Edit /APG/Tools/APG-Diagnostic-Tools/Default/bin/diagnostics.sh
- Look for the entry in the file TARFILE="/tmp/DiagnosticFiles.tar"
- Replace the tmp directory with the directory location you wish to save the DiagnosticFiles.tar
- Rerun the script /APG/bin/./diagnostic.sh and the DiagnosticFiles.tar will be placed in the new location when it completes
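
For example, to redirect the tarball in one shot with sed (a minimal sketch; /opt/APG/tmp is an assumed destination with enough free space):
sed -i 's|TARFILE="/tmp/DiagnosticFiles.tar"|TARFILE="/opt/APG/tmp/DiagnosticFiles.tar"|' /APG/Tools/APG-Diagnostic-Tools/Default/bin/diagnostics.sh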


VNX Best Practices

1. Storage Processor (SP) Cache Size:
Read Cache - 10% of the available Cache. (Recommended Min 200MB & Max 1024 MB).
   Note: read cache facilitates pre-fetching, so it doesn't need to be large. Increase read cache above the recommended values only if you know you have multiple applications with sequential read-intensive workloads.
Write Cache - set remaining memory to write cache.

2. Cache Page Size: the minimum amount of SP memory used to serve a single I/O request.
Default: 8 KB provides a good balance for both Block and File storage.
Increase to a maximum of 16 KB if your environment has large-block I/O sizes.
Other sizes, 2 KB and 4 KB, are good for environments like databases.

3. Cache watermarks: control the flushing behavior of write cache.
Recommended - Low 60%, High 80% (maintain a difference between low and high of about 20%).

4. Physical drive placement: place highest performing drives in lowest numbered enclosures on each bus.

5. Hot Spares: allocate 1 hot spare for every 30 drives.

6. Drive Type:
FLASH: extreme performance / transactional random workloads.
SAS: General Performance
NL-SAS: Archive Purpose / aging data.

7. IOPS: calculated based on drive type (see the rule-of-thumb figures below)
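Commonly cited EMC rules of thumb for per-drive IOPS (approximate planning figures; verify against current EMC guidance for your drive models):
FLASH: ~3500 IOPS
SAS 15K: ~180 IOPS
SAS 10K: ~150 IOPS
NL-SAS 7.2K: ~90 IOPS
Example: 20 x 15K SAS drives give roughly 20 x 180 = 3600 raw back-end IOPS; deliverable front-end IOPS will be lower once the read/write mix and RAID write penalty are applied.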


8. RAID Level:
RAID 1/0 - Heavy transactional workloads with >25% random writes
RAID 5 - Medium-High performance, sequential
RAID 6 - archiving, read-biased workloads

9. Maximum Drives in a Pool:


10. FAST Cache: recommended to be sized to the active data set.
EMC has tools to determine the active data set; failing that, around 5% of total capacity is generally a good choice.