Testing Performance in Commvault
In a backup environment there can be many places where performance issues can arise. It touches a lot if not all of the servers in the environment, not to mention the impact to the network and storage. All of this data needs to be moved to and from point A to point B, C, etc…and this process can slow down anywhere along the way. The potential cause of backup performance degradation includes disk read, network, deduplication (if applicable), and disk library write throughput. This can be reversed in the case of a restore with deduplication removed from the equation.
However, when performance problems arise it is important to understand specifically where the source(s) of the problem arises. There are tools provided by Commvault that can help in this process. Even better, there are log files dedicated to performance analysis that are provided by Commvault. They can be analyzed to determine the source of the problem.
There are a couple of tools that Commvault provides that can help in determining disk and network performance. Often times, the Disk Library disk storage is the source of the problem, whether it be write and/or read. Disk Library throughput can be analyzed via the CvDiskPerf tool. Run this at multiple different times including during downtime, peak backup traffic, and auxiliary copy window. The results can be compared.
To test the read/write performance to a disk library run the following command on each MediaAgent for all or a subset of the mount paths on the disk library with the goal of accurately gauging performance. Using the default parameters ensure that the average read throughput of the disk is approximately 600 GB per hour and the average write throughput of the disk is approximately 700 GB per hour.
Example for a UNC mount path:
CVDiskPerf -PATH \\networkpath\Disk01 -username MyUser –password MyUserPassword
Example for a local NTFS mount path:
CVDiskPerf -PATH R:\CommvaultData
To test the network performance the process is even easier! This is because Commvault provides a workflow that does almost everything for us. The Network Performance Test Tool can test performance between MediaAgents for example to determine auxiliary copy performance as well as between client and MediaAgent for backups. It is a predefined workflow but if for some reason it is not deployed you can download it via the Commvault Store or Commvault Download Center.
Running the tool is basic and self-explanatory One item of note, is to make sure when logged into the CommCell you are logged in with a user that has an email address associated that you can access, as this is the email the workflow sends the results to. The results are a basic small table that is easy to understand. If using an Active Directory account this will be the email associated with your logged in user. If it is a Commvault native user account the email address can be changed or viewed via the user properties.
Now both of the above tools are great and provide more data points which is often necessary when fighting a performance battle. There is a better way that is easier, however, as Commvault provides a log which captures all of the performance metrics for us. It is called the CVPerfMgr.log. Navigate to the installation directory of the destination MediaAgent and collect the CVPerfMgr.log(s) located in the [Log Folder] directory. This assumes deduplication is enabled as well as Simple Data Transport (SDT). SDT is enabled by default
Let’s breakdown an example of a backup job for a VMWare virtual machine called MY-VM5. This example is from a VSA VMware backup. At the top of the log entry the Head Duration module will provide information about the job start time and the completion time in hours and seconds. The total time to complete the job which was 7758 seconds or a little over 2 hours. The is indicated by the Head and Tail duration.
The primary components of the overall 2 hours are the time to read, calculate deduplication, send over the network, and finally write the data and each has a corresponding section in the log. When looking at the components it should be done sequentially but some components work in tandem and there will be overlap. The first component value of interest is the Read value which is denoted by a – in the Time column which indicates it took less than a second. This indicates the client disk read performance is good. This is line 15 located right under the line that contains |_Config Files[VM File]
Virtual Server Agent
|*2370591*|*Perf*|318936| =======================================================================================
|*2370591*|*Perf*|318936| Job-ID: 318936 [Pipe-ID: 2370591] [App-Type: 106] [Data-Type: 1]
|*2370591*|*Perf*|318936| Stream Source: MediaAgentVSAProxyA
|*2370591*|*Perf*|318936| Simpana Network medium: SDT
|*2370591*|*Perf*|318936| Head duration (Local): [07,May,19 21:40:47 ~ 07,May,19 23:50:05] 02:09:18 (7758)
|*2370591*|*Perf*|318936| Tail duration (Local): [07,May,19 21:40:47 ~ 07,May,19 23:50:05] 02:09:18 (7758)
|*2370591*|*Perf*|318936| ----------------------------------------------------------------------------------------------------------------------------------------
|*2370591*|*Perf*|318936| Perf-Counter Time(seconds) Size
|*2370591*|*Perf*|318936| ----------------------------------------------------------------------------------------------------------------------------------------
|*2370591*|*Perf*|318936|
|*2370591*|*Perf*|318936| Virtual Server Agent
|*2370591*|*Perf*|318936| |_VM[MY-VM5].................................................................. -
|*2370591*|*Perf*|318936| |_Config Files[VM Files]........................................................ -
|*2370591*|*Perf*|318936| |_Read........................................................................ - 17566822 [16.75 MB]
|*2370591*|*Perf*|318936| |_Open File................................................................... - [Samples - 13] [Avg - 0.000000]
|*2370591*|*Perf*|318936| |_Buffer allocation........................................................... 1 [Samples - 325] [Avg - 0.003077]
|*2370591*|*Perf*|318936| |_Pipeline write.............................................................. - 17586937 [16.77 MB]
|*2370591*|*Perf*|318936| |_Disk[MY-VM5.vmdk]......................................................... -
|*2370591*|*Perf*|318936| |_Disk Read................................................................... 663 12913213767 [12.03 GB] [65.30 GBPH]
|*2370591*|*Perf*|318936| |_Buffer allocation........................................................... 720 [Samples - 126329] [Avg - 0.005699]
|*2370591*|*Perf*|318936| |_Pipeline write.............................................................. 6653 13030135763 [12.14 GB] [6.57 GBPH]
|*2370591*|*Perf*|318936|
We have found Waldo, um, I mean our performance problem. The next value of interest is Src-side Dedup which is located on line 8. Out of the total time of 7758 seconds, 7388 seconds was spent calculating deduplication which is the majority of the time. This means that deduplication is the source of the problem and may be too slow which could be caused due to insufficient hardware and/or deduplication size capacity threshold being reached. See BOL for sizing guidelines.
|*2370591*|*Perf*|318936| Reader Pipeline Modules
|*2370591*|*Perf*|318936| |_CVA Wait to received data from reader........................................... 948
|*2370591*|*Perf*|318936| |_CVA Buffer allocation........................................................... 6642
|*2370591*|*Perf*|318936| |_SDT: Receive Data............................................................... 7 13006897903 [12.11 GB] [Samples - 410031] [Avg - 0.000017] [6229.86 GBPH]
|*2370591*|*Perf*|318936| |_SDT-Head: Compression........................................................... 44 13006840367 [12.11 GB] [Samples - 410030] [Avg - 0.000107] [991.11 GBPH]
|*2370591*|*Perf*|318936| |_SDT-Head: Signature module...................................................... 7426 8523310008 [7.94 GB] [Samples - 410030] [Avg - 0.018111] [3.85 GBPH]
|*2370591*|*Perf*|318936| |_SDT-Head: Signature Compute................................................... 36 8444734791 [7.86 GB] [786.48 GBPH]
|*2370591*|*Perf*|318936| |_Src-side Dedup................................................................ 7388
|*2370591*|*Perf*|318936| |_Buffer allocation........................................................... 20
|*2370591*|*Perf*|318936| |_Passing to next module...................................................... 3
|*2370591*|*Perf*|318936| |_Sig-lookup.................................................................. 7357
|*2370591*|*Perf*|318936| |_SIDB-Lookup............................................................... 7357 [Samples - 47677] [Avg - 0.154309]
|*2370591*|*Perf*|318936| |_SIDB:CL-QueryInsert[MediaAgentVSAProxyA]..................................... 5940
|*2370591*|*Perf*|318936| |_SIDB:CL-QueryInsert[MediaAgentVSAProxyB]..................................... 1415
|*2370591*|*Perf*|318936| |_Source Side Dedupe stats.................................................. -
The subsequent sections denoting network transfer and disk library write speeds are good or at least are not the source of the transfer lag. The time spent during network traffic can be derived from the value SDT-Head: Network transfer which below has a value of 31 which a small portion of the transfer time. The Disk Library write time is indicated by the Writer: DM: Physical Write value which below has a value of 126 which again is a small portion of the overall transfer time.
|*2370591*|*Perf*|318936| |_SDT-Head: CRC32 update.......................................................... 1 7113341016 [6.62 GB] [Samples - 279863] [Avg - 0.000000]
|*2370591*|*Perf*|318936| |_SDT-Head: Network transfer...................................................... 31 7113341016 [6.62 GB] [Samples - 279863] [Avg - 0.000111] [769.33 GBPH]
|*2370591*|*Perf*|318936|
|*2370591*|*Perf*|318936| Writer Pipeline Modules
|*2370591*|*Perf*|318936| |_SDT-Tail: Wait to receive data from source...................................... 7730 7113398552 [6.62 GB] [Samples - 279864] [Avg - 0.027621] [3.09 GBPH]
|*2370591*|*Perf*|318936| |_SDT-Tail: Write Process......................................................... 202 7113341016 [6.62 GB] [Samples - 279863] [Avg - 0.000722] [118.07 GBPH]
|*2370591*|*Perf*|318936| |_DSBackup: Update Restart Info................................................... -
|*2370591*|*Perf*|318936| |_DSBackup: Update Index.......................................................... -
|*2370591*|*Perf*|318936| |_DSBackup: Update Restart Info Index............................................. -
|*2370591*|*Perf*|318936| |_DSBackup: Media Write........................................................... 177 7100556487 [6.61 GB] [134.50 GBPH]
|*2370591*|*Perf*|318936| |_SIDB:CommitAndUpdateRecs[MediaAgentVSAProxyA]...................................... 52
|*2370591*|*Perf*|318936| |_Writer: DM: Physical Write.................................................... 126 7079572021 [6.59 GB] [188.38 GBPH]
|*2370591*|*Perf*|318936|
|*2370591*|*Perf*|318936| ----------------------------------------------------------------------------------------------------
1600 4ec8
So by reviewing the CVPerfMgr.log we can see clearly that of the total transfer time the vast majority was spent on deduplication and thus is the source of the problem. The deduplication engine indicates high Q&I threshold times over 2000 (2 Microseconds) via the entry in the log, SIDB:CL-QueryInsert. This can also be reviewed easily in the CommCell Console. See the following KB regarding DDB thresholds.
One additional source of information are the Resource Monitor log files that are located in a sub directory of the Commvault log files directory on both the client and the MediaAgent. These can be opened, formatted, and analyzed in your favorite spreadsheet software.
Well, there you have it! Happy performance analyzing or something like that.