Health Checks - VCS/Expressway
This Is How We Do It
And here it is: my list of commands and checks that I run against Cisco VCS-C/E and Expressway-C/E servers for most changes that take place. It's useful output to collect prior to configuration changes, upgrades, restarts, or anything else that makes a system-wide change or otherwise has the opportunity to "break" the system.
I pull this data to refer back to in the event that the change has disrupted endpoint registration status, inter-cluster communication, intra-cluster communication, MRA, zone connectivity, or anything else. Consider this output a "CYA" for later: something to refer back to and confirm all is as it was prior to the change, with applicable exceptions/changes seen in the post-change output.
Without further ado, let's get to it.
CLI Health Check Commands [VCS/Expressway]
xstatus SystemUnit
Returns data such as the Serial Number, Product Name, Build info, some configuration options such as Encryption, Dual NIC, Expressway feature set, Registration count, Traversal Call count, device name, Release Key, Software Version, and TimeZone/DateTime configuration. It also includes the system uptime as a raw count of seconds rather than a formatted duration (e.g. 12023815).
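A raw uptime number is hard to eyeball when comparing pre-change and post-change output. The following is a minimal sketch, assuming the uptime value is a count of seconds since the last restart; the function name format_uptime is hypothetical and not part of any Cisco tooling.

```python
from datetime import timedelta


def format_uptime(seconds: int) -> str:
    """Render an uptime counter given in seconds as 'Nd HH:MM:SS'."""
    delta = timedelta(seconds=seconds)
    # timedelta splits the value into whole days plus leftover seconds.
    hours, remainder = divmod(delta.seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{delta.days}d {hours:02d}:{minutes:02d}:{secs:02d}"


# The example value quoted above works out to roughly 139 days.
print(format_uptime(12023815))  # prints "139d 03:56:55"
```

A small uptime after a change is also a quick way to confirm that a restart actually took place.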
xstatus Hardware
Returns data such as the CPU speed, Core Affinity (if applicable), Memory Total, CPU Model Details, Network Adapters, Non-Traversal Call Limit, Logical Core Count, Physical CPU count, Physical to Logical core mappings, Registration, Traversal Call and Turn Relay limits.
xstatus
Returns an assortment of output also viewable via the GUI including, but not limited to, Intrusion Protection, Links, Zones, Media Statistics, Microsoft IMP, configured Options, Policies, Port usage, Resource usage, SIP Configuration and connectivity status, SIP service domains, SIP service zones, System metrics, TURN server status, Warnings, XMPP connection status and more.
xconfiguration
Returns configuration data for various elements including Zones, Transforms, Timezone, SNMP, SIP, Policies, NTP, Option Keys, Login configs and more.
configlog|eventlog|networklog 100
This collects the last 100 entries in the Config, Event and Network logs. Increase the "100" in the command to collect more entries.
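Once the CLI output has been copied out of the terminal session, it helps to store each capture under a predictable, labeled name so the pre-change and post-change sets are easy to pair up later. Below is a minimal sketch of one way to do that. The directory layout, the "pre"/"post" phase labels, and the function names are all assumptions made for illustration; nothing here talks to the VCS/Expressway itself.

```python
from datetime import datetime
from pathlib import Path


def snapshot_path(base_dir, host, phase, command, when=None):
    """Build a path like <base_dir>/<host>/<phase>/<timestamp>_<command>.txt."""
    when = when or datetime.now()
    # Turn "xstatus SystemUnit" into "xstatus_SystemUnit" for a safe file name.
    safe_command = "_".join(command.split())
    return Path(base_dir) / host / phase / f"{when:%Y%m%d-%H%M%S}_{safe_command}.txt"


def save_snapshot(base_dir, host, phase, command, output, when=None):
    """Write captured CLI output to its labeled file and return the path."""
    path = snapshot_path(base_dir, host, phase, command, when)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(output, encoding="utf-8")
    return path
```

For example, save_snapshot("health-checks", "expc01", "pre", "xstatus SystemUnit", captured_text) would write the pasted output (held in a hypothetical captured_text string) under health-checks/expc01/pre/.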
GUI Health Check Snapshots [VCS/Expressway]
It is worth noting that the screenshots I collect (described below) are largely redundant, because most of the output is already present in the "xstatus" command run from the CLI. In that sense they are optional. To be thorough, and to have snapshots of specific configuration and status pages to refer back to, I collect them anyway. Some pages, such as Status > Overview and Status > Unified Communications, show data that does not appear in the "xstatus" output.
Status > Overview
Provides an overview of the VCS/Expressway you are logged in to.
Status > Alarms
This is similar to the "xstatus alarm" output and contains data on alarms that have been raised on the system and not yet acknowledged and cleared.
Status > System > Information
This is similar to the "xstatus SystemUnit" output. See the CLI command description for more detail.
Status > System > ResourceUsage
This is similar to the output from "xstatus Resourceusage" and contains data on resource utilization on the Expressway server.
Status > Registrations > By Device
This is applicable to Cisco VCS-C / VCS-E only. Data is reported on the current registrations against VCS. Alternatively, the CLI command "xstatus applications presence" could be used, although this is included in the base "xstatus" command output.
Status > Calls > Calls
This is similar to the output from "xstatus Calls", which displays active calls on the system and details about them.
Status > LocalZone
Here we collect information on connected/disconnected Local Zones. Alternatively, the command "xstatus Zones LocalZone SubZone" can be used, although this is included in the base "xstatus" command output.
Status > Zone
In addition to taking a screenshot of the configured Zones, it is advised to also open the configuration page for the Zone with Type "Traversal Client". Verify the status and take a screenshot. Be sure to do the same with the "Unified Communications Traversal" zone.
Status > Unified Communications
It is always good to verify the status of the Unified Communications server connections, and to refresh them after a change. If applicable, take a screenshot of the Advanced Status > View Federated Connections page. The SSH tunnel status is the primary item to capture here.
Status > Hardware
This is similar to the "xstatus Hardware" output. See the CLI command description for more detail.
System > Time
The data here is also available in the base "xstatus" command output, and is visible under a scoped "xstatus Time" command. We use this to verify system time and NTP configuration.
System > Clustering
This data is also available via the "xstatus cluster" command, and is included in the base "xstatus" command output. Here we see the status of the VCS/Expressway cluster.
System > External Manager
This data is also available via the "xstatus ExternalManager" command, and is included in the base "xstatus" command output. This shows the status of the configured External Manager. If unconfigured, the status should show "Inactive".
Taking a Backup
Maintenance > Backup and Restore
To take a backup, navigate to the Maintenance > Backup and Restore page. Under Backup, entering an Encryption Password is optional. If you do set one, document it well and do not lose it, as restoring from the backup will require it.
To proceed with a backup, click on "Create System Backup File" and, when prompted, save the file to an appropriate location and ensure it is stored according to your internal backup storage policies.
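If the backup file gets copied between machines or shares before it lands in its final location, recording a checksum right after download gives you something to compare against later. This is an optional extra rather than part of the Expressway backup procedure; the sketch below uses only Python's standard hashlib module, and the function name sha256_of_file is hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read fixed-size chunks so large backup files are not loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Note the digest alongside the backup in your ticket, then recompute it on the stored copy; matching values indicate the file was not altered or truncated in transit.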
Verification Steps
Prior to moving into the full post-change data collection (re-collecting the same data as the pre-checks for comparison purposes), it is important to perform some quick verification steps. This helps catch odd behaviors and issues with day-to-day function quickly. Think of it as a spot check. If the spot check is good, we'll start the full health checks.
Rediscover Unified Communications Servers
Any time work is performed on VCS/Expressway, or on the associated Cisco Unified Communications Manager, Unity Connection and IM & Presence servers, it is important to perform a manual discovery of the associated UC servers and services. This can be performed via the Expressway-C/VCS-C web page.
Configuration > Unified Communications > Unified CM Servers
On this page, select the Unified CM servers listed under Publisher Address and click the "Refresh Servers" button. After this completes, review the TCP status and ensure it shows "TCP: Active" with no errors. The same steps should be completed for the associated IM & Presence and Unity Connection servers under their applicable Configuration > Unified Communications > * page.
MRA Test, if applicable
Using a test Jabber account or a remote MRA-enabled Cisco phone (or other applicable device), verify that MRA login is possible from off-net, which means turning off your VPN before launching and connecting with Jabber. Once logged in via MRA, perform basic chat, presence status, and contact lookup tests. Test persistent and non-persistent chat rooms as applicable, as well as WebEx creation and calendar functions, and voicemail access. If your business uses Cisco Jabber to control Cisco phones via CTI, ensure this is working as well. Additionally, test internal and external dialing. For Telepresence users, test sending and receiving video calls. Include tests for any features not mentioned here that are critical to your business and relate to the UC devices connecting to, through, or utilizing the VCS/Expressways.
Finishing Up
At this point your pre-change health checks should have been completed and documented (saved to an applicable internal location, attached to a ticket, etc.), your change completed, and your verification steps completed. If something's wrong, you'll likely see it during the verification and testing steps. If you don't verify and test, it's possible your users will catch the issue before you do. We want to minimize those instances as much as possible with thorough testing.
If the aforementioned is true, it should be safe to proceed to post-change output collection, which means rerunning all of the health check commands run prior to the change and gathering a second set of appropriately labeled screenshots. These help CYA by proving, to the best of your ability, the status of various configurations and connections. If something goes awry later on, you'll want to prove it was good and working after the change. That's not to say the change (upgrade, restart, etc.) won't introduce an issue that crops up later, but at least you can prove there wasn't a drop or miss on your part.
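Comparing the two sets of CLI output by eye gets tedious quickly. As a minimal sketch, Python's standard difflib module can highlight the lines that changed between a pre-change and a post-change capture. The function name diff_snapshots and the two sample strings are made up for illustration; they are not real xstatus output.

```python
import difflib


def diff_snapshots(pre_text, post_text, label="xstatus"):
    """Return unified-diff lines between pre- and post-change output."""
    return list(
        difflib.unified_diff(
            pre_text.splitlines(),
            post_text.splitlines(),
            fromfile=f"pre/{label}",
            tofile=f"post/{label}",
            lineterm="",
        )
    )


# Illustrative sample text only, not the real command output format.
pre = "Zone 1 Status: Active\nRegistrations: 12"
post = "Zone 1 Status: Failed\nRegistrations: 12"

for line in diff_snapshots(pre, post):
    print(line)
```

Expect some differences on every run, since values such as uptime and timestamps change naturally; the lines worth attention are status, registration, and connectivity entries that moved unexpectedly.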
And that's it! Hopefully this is helpful. This process has been followed for X8.x through X12.x software versions. If there are any commands or GUI pages that you feel were missed, or that you include in your own health check process, feel free to pitch in by joining the NOC Thoughts Discord and posting in the #improvements-requests channel.