What’s New in Release 4.5.1
NEW RELEASE 4.5 UPGRADE REQUIREMENT
Starting with Release 4.5, the Supervisor requires 24GB RAM. The increase from 16GB in prior releases is needed for the data collection robustness and visibility feature: the Supervisor node now caches device monitoring status, which improves performance by avoiding database I/O. Without the additional RAM, the Supervisor node will not operate properly.
This release adds features and functionality in several areas.
Platform Features
Data collection robustness and visibility
Export events to other Big Data systems via Kafka
CMDB Outbound Integration for ConnectWise
Dashboard slideshow
Performance and Availability Monitoring
Maintenance calendar for Synthetic Transaction Monitor jobs
Real time performance probing
SLA calculation for SNMP and WMI Ping
Trace route monitoring
Log Management and Security Monitoring
Multi-tenant reporting device handling
Windows Agent Enhancements
Device Support
New Support
Enhanced Support
Significant Enhancements
DataManager and ReportWorker module robustness
Additional metrics on trend charts
Simplify Cloud and Collector health GUI
Ability to manually add hosts to Application Groups
Set important process and critical interface definitions directly from CMDB
Dashboard charting enhancements
Accounting for internal and performance monitoring events
Ability to change event database purge/archive thresholds
Ability to set remote directory renaming action during archive
Registration APIs
Bug Fixes / Enhancements
Current Open Bugs/Enhancements
Platform Features
Data collection robustness and visibility
This release enhances the reliability and visibility of AccelOps data collection in the following ways.
Detailed visibility into when data was last collected: (a) data from performance monitoring jobs, on a per-device, per-job basis, and (b) data pushed from external devices, on a per-device, per-protocol basis. Last collection times are visible in the CMDB > Device > Monitor tab and are updated every 2 minutes.
A versioning scheme is introduced to make sure that the Application Server and the data collection agents (Java agents and Performance Monitor modules in Collectors and Workers) are always in sync. This ensures that user changes (either manual or from discovery) are always reflected in data collection. If there is a version discrepancy, meaning that the data collection agents are not working on the most up-to-date version, a system rule generates an alert.
System rules are provided for the following error scenarios:
- all jobs on a data collection agent are delayed
- a particular job on a data collection agent is delayed
- a version discrepancy is detected: a data collection agent (Collector, Worker) has not picked up the correct monitoring version within a certain amount of time
Users can decide to restart a module or the entire application via a notification policy or remediation scripts.
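As a rough illustration of the checks these system rules perform, the sketch below flags a version discrepancy and delayed jobs for a single agent. The `AgentReport` record, the 600-second grace period, the 180-second polling interval and the 2x delay factor are all hypothetical choices for the example, not AccelOps internals or defaults.

```python
from dataclasses import dataclass, field


@dataclass
class AgentReport:
    """Hypothetical status record reported by one data collection agent."""
    agent: str
    version: int  # monitoring version the agent is currently running
    last_run: dict = field(default_factory=dict)  # job name -> last run (epoch s)


def check_agent(report, server_version, published_at, now,
                grace_s=600.0, interval_s=180.0, delay_factor=2.0):
    """Return alert strings for one agent (all thresholds are hypothetical)."""
    alerts = []
    # Version discrepancy: the agent is still behind after the grace period.
    if report.version < server_version and now - published_at > grace_s:
        alerts.append(f"{report.agent}: version discrepancy "
                      f"(agent {report.version}, server {server_version})")
    # A job counts as delayed if it has not run within delay_factor intervals.
    delayed = sorted(job for job, t in report.last_run.items()
                     if now - t > delay_factor * interval_s)
    if report.last_run and len(delayed) == len(report.last_run):
        alerts.append(f"{report.agent}: all jobs delayed")
    else:
        alerts.extend(f"{report.agent}: job {job} delayed" for job in delayed)
    return alerts


# Usage: the agent runs version 3 while the server published version 4 at t=0.
report = AgentReport("collector1", 3, {"cpu": 1000.0, "disk": 1500.0})
print(check_agent(report, server_version=4, published_at=0.0, now=1600.0))
```

Separating "all jobs delayed" from "a particular job delayed" mirrors the two rule scenarios listed above: the first usually points at the agent itself, the second at one device or job.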
Details on how data collection times and status are reported in CMDB are here.
Export events to other Big Data systems via Kafka
AccelOps collects a wide variety of logs and performance metrics and uses the data for its own analysis. This release enables users to export the logs in a parsed format to any external system via Kafka, a highly scalable distributed message bus (see Apache Kafka). AccelOps has developed a connector that publishes to the Kafka message bus. This feature can be used to populate a Big Data system with rich AccelOps data.
Details on configuring AccelOps for Kafka export are discussed here.
CMDB Outbound Integration for ConnectWise
ConnectWise is an important help desk and ticketing system, especially for service providers. AccelOps already has two-way integration with ConnectWise ticketing: a ticket can be created in ConnectWise, and state updates in ConnectWise are reflected in AccelOps. This release extends the integration to cover CMDB. When AccelOps discovers a device, the ConnectWise CMDB can be populated, either automatically or on demand. When AccelOps discovers changes, the changes can be synced to ConnectWise. A framework is provided to convert device attributes such as organizations, host names and device types to ConnectWise-specific fields and values.
Details on configuring AccelOps for ConnectWise outbound CMDB integration are discussed here. AccelOps provides a special content mapping feature where any AccelOps CMDB attribute and its values can be converted into a corresponding ConnectWise CMDB attribute and values (see Step 11).
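The content mapping can be pictured as a per-attribute lookup. In this minimal sketch, the AccelOps attribute names, ConnectWise field names and value translations in `MAPPING` are hypothetical placeholders, not the fields AccelOps actually exposes.

```python
# Hypothetical mapping rules: AccelOps CMDB attribute -> (ConnectWise field,
# value translations). Values without a translation are copied unchanged.
MAPPING = {
    "organization": ("company", {"OrgA": "Acme Corp"}),
    "hostName": ("configurationName", {}),
    "deviceType": ("configurationType", {"Cisco IOS": "Router",
                                         "Microsoft Windows": "Server"}),
}


def to_connectwise(device, mapping=MAPPING):
    """Translate one AccelOps device record into ConnectWise fields (sketch)."""
    record = {}
    # Only attributes with a mapping rule are translated; others are ignored.
    for attr, (cw_field, value_map) in mapping.items():
        if attr in device:  # skip rules for attributes this device lacks
            value = device[attr]
            record[cw_field] = value_map.get(value, value)
    return record


print(to_connectwise({"organization": "OrgA", "hostName": "fw1",
                      "deviceType": "Cisco IOS", "ip": "10.0.0.1"}))
```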
Dashboard slideshow
Users can now select a set of dashboards and display them as a full-screen slideshow on large monitors. This is useful for Network and Security Operations Centers.
Details on creating a dashboard slideshow are discussed here.
Performance and Availability Monitoring
Maintenance calendar for Synthetic Transaction Monitor jobs
This release adds the ability to place Synthetic Transaction Monitor (STM) jobs on a maintenance calendar. While an STM job is under maintenance, the job is not executed and the system rule does not trigger if the job fails.
Details on how to create maintenance calendars for STM jobs are here.
Real time performance probing
Often, to check the health of a device or an application, it is necessary to probe the device and check its current performance metrics. Until now, the only option in AccelOps was to query the system for performance monitoring events. This does not quite serve the purpose, since the polling intervals are too long (3 minutes or so for most jobs), so you might not get results for the next 3 minutes. This release allows users to probe the device at a much faster pace (e.g. a few seconds apart) and see the metrics scroll in real time in the GUI. These metrics are polled in addition to the regular scheduled performance polls; they are not stored, do not trigger any rules, and are not part of any report. Currently, only a subset of important system performance metrics is supported for real time performance probes, e.g. system CPU, memory, disk, interface and process utilization.
Details on how to probe devices for real time performance metrics is discussed here.
SLA calculation for SNMP and WMI Ping
Until now, we calculated Min/Max/Average Round Trip Time, downtime and SLA for ICMP Ping only. This notion is now extended to two other critical performance monitoring protocols, SNMP and WMI. The events PH_DEV_MON_SNMP_PING_STAT and PH_DEV_MON_WMI_PING_STAT now contain the following additional attributes:
Average Round Trip Time (RTT)
Max Round Trip Time
Min Round Trip Time
Pct Packet Loss
System Down time
System Degraded Time
SNMP Ping is calculated by querying a very basic SNMP OID (1.3.6.1.2.1.1.1, sysDescr in MIB-2) that is present in all SNMP implementations. WMI Ping is calculated by fetching a basic WMI class (Win32_OperatingSystem) that is present in all WMI implementations.
Statistical computations (e.g. max, min, average) are done by sending 5 requests for the same object a few seconds apart. The system is considered down for the polling interval if packet loss is 100%. The system is considered degraded for the polling interval if packet loss is less than 100% but greater than 50%.
Two reports are provided:
Top Devices by SNMP RTT
Top Devices by WMI RTT
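The statistics described above can be sketched as follows for one polling interval. The function name, the dictionary keys and the choice to charge a whole interval as down or degraded time are assumptions made for illustration; only the 5-request sampling and the 100% and 50% loss thresholds come from the text.

```python
from statistics import mean


def ping_stats(rtts_ms, interval_s):
    """Summarize one polling interval of SNMP or WMI "ping" requests.

    rtts_ms has one entry per request (e.g. 5 requests a few seconds apart);
    None marks a request that got no response. The list must be non-empty.
    """
    replies = [r for r in rtts_ms if r is not None]
    loss_pct = 100.0 * (len(rtts_ms) - len(replies)) / len(rtts_ms)
    down = loss_pct == 100.0            # no replies at all
    degraded = 50.0 < loss_pct < 100.0  # some replies, but more than half lost
    return {
        "avgRTT": mean(replies) if replies else None,
        "minRTT": min(replies) if replies else None,
        "maxRTT": max(replies) if replies else None,
        "pctLoss": loss_pct,
        # Assumption: the whole interval is charged as down or degraded time.
        "downTime": interval_s if down else 0,
        "degradedTime": interval_s if degraded else 0,
    }


print(ping_stats([12.0, None, 15.0, None, None], interval_s=180))
```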
Trace route monitoring
Trace route is important for monitoring hop-by-hop latency between two wide-area endpoints. It is important to know when latency for a particular hop increases significantly, since this is often a precursor to an Internet outage. This release allows users to run trace route from any AccelOps node to any destination using the Synthetic Transaction Monitoring (STM) framework.
Details on how to set up trace route monitoring is described here. One report is provided: Top Trace Route Hops by RTT.
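One way to express "latency for a particular hop increases significantly" is to compare each hop's RTT against a baseline. This is a hedged sketch: the input format (hop IP to RTT in milliseconds), the 2x factor and the 20 ms floor are hypothetical thresholds, not values used by AccelOps.

```python
def flag_slow_hops(baseline_ms, current_ms, factor=2.0, min_increase_ms=20.0):
    """Flag hops whose RTT increased significantly versus a baseline.

    Both arguments map hop IP -> RTT in milliseconds (hypothetical format).
    A hop is flagged only if its RTT is at least `factor` times the baseline
    AND at least `min_increase_ms` higher, so a tiny RTT doubling is ignored.
    """
    flagged = []
    for hop, rtt in current_ms.items():
        base = baseline_ms.get(hop)
        if base is None:
            continue  # new hop (e.g. after a path change): nothing to compare
        if rtt >= factor * base and rtt - base >= min_increase_ms:
            flagged.append((hop, base, rtt))
    # Largest absolute increase first.
    return sorted(flagged, key=lambda f: f[2] - f[1], reverse=True)


baseline = {"10.0.0.1": 1.0, "192.0.2.1": 15.0, "198.51.100.7": 40.0}
current = {"10.0.0.1": 3.0, "192.0.2.1": 60.0, "198.51.100.7": 45.0,
           "203.0.113.9": 80.0}
print(flag_slow_hops(baseline, current))
```

Requiring both a ratio and an absolute increase avoids alerting on the first LAN hop, where RTT can easily triple from 1 ms to 3 ms without any real impact.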
Log Management and Security Monitoring
Multi-tenant reporting device handling
This release allows AccelOps to handle reporting devices that are themselves multi-tenant. As an example, a Fortinet firewall can report logs for multiple organizations from the same source IP; the organization is reported via the Virtual Domain variable. As another example, Qualys Vulnerability Scanner can report vulnerabilities for devices belonging to multiple organizations in the same report via the qualysAssetGroup attribute.
A framework is provided to handle multi-tenant reporting devices. Users can set up mapping rules specifying:
- the attribute that identifies the external organization in the log
- the mapping from each external organization to an AccelOps organization
Using these definitions, reporting devices are created and logs are mapped to the respective organizations. Subsequently, rules also trigger in the respective organizations. Details are in Event Organization Mapping.
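A mapping rule of this kind can be sketched as a lookup keyed on the reporting device. The rule layout, the `reporting_ip` and `vdom` keys and the organization names below are hypothetical; returning `None` for an unmapped value is a choice made for the example, since the text does not say what happens in that case.

```python
# Hypothetical mapping rule for one multi-tenant reporting device, e.g. a
# firewall that reports its Virtual Domain in an attribute called "vdom".
RULES = [
    {
        "reporting_ip": "10.1.1.1",
        "org_attribute": "vdom",  # attribute naming the external organization
        "org_map": {"cust_a": "OrgA", "cust_b": "OrgB"},  # external -> AccelOps
    },
]


def assign_org(event, rules=RULES):
    """Return the AccelOps organization for an event, or None if unmapped."""
    for rule in rules:
        if event.get("reporting_ip") == rule["reporting_ip"]:
            external_org = event.get(rule["org_attribute"])
            return rule["org_map"].get(external_org)
    return None


print(assign_org({"reporting_ip": "10.1.1.1", "vdom": "cust_b"}))
```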
Windows Agent Enhancements
This release provides several enhancements:
- AccelOps Windows Agent and Agent Manager now communicate over HTTPS instead of HTTP
- File integrity monitoring events now contain the user who made the file change
- Ability to export and import license and monitoring template assignments
- Support for non-English locales on Windows Servers
- Differentiate between files and directories in AccelOps-WUA-FileMon events by using the osObjType attribute. This information is provided for the following cases: (a) create, (b) change, (c) rename, but only for the new name. This information cannot be provided for the following cases: (a) delete, (b) rename, for the old name.
Windows agent upgrade and configuration is covered here.
Device Support
New Support
- Nutanix – discovery and performance monitoring via SNMP – see here
- Cisco FireSIGHT integration via eStreamer API – log monitoring – see here
- AWS RDS and EBS – performance monitoring – see here
- Airline in-flight entertainment system monitoring
- Qualys Web Application Firewall log monitoring – see here
- CiscoWorks Network Control Manager (NCM) – log monitoring – see here
- Lantronix SLC Console Manager – log monitoring – see here
- Vasco DigiPass – log monitoring – see here
- Juniper DDoS Secure – log monitoring – see here
- Cisco Wide Area Application Services (WAAS) – performance monitoring – see here
- Motorola AirDefense Wireless IDS – log monitoring – see here
- Motorola WiNG WLAN Access Point – log monitoring – see here
- Cisco Telepresence Video Communication Server – log monitoring – see here
- Application server log monitoring – Redhat JBoss, IBM Websphere and Oracle Weblogic – see here
- Brocade ADX load balancer – performance monitoring – see here
- Ruckus Wireless LAN – performance monitoring – see here
- Fortinet FortiManager – performance monitoring – see here
- NetBotz NBRK 2000 – environmental monitoring – see here
- Cisco NBAR monitoring – see here
Enhanced Support
VMware SDK 5.5 API integration – AccelOps automatically uses the API for the right VMware version.
Nessus 6.0 integration – AccelOps automatically determines the right Nessus server version and uses the right API for server versions 4, 5 and 6.
Significant Enhancements
DataManager and ReportWorker module robustness
In this release, DataManager and ReportWorker no longer restart under the following conditions:
NFS is temporarily not available
Unable to create directories during writing or purging
The modules fall behind in reading shared buffer storage
Additional metrics on trend charts
Users can now see maximum, minimum, percentiles and simple moving averages directly in trend charts in Analytics and Dashboard sections.
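For reference, a simple moving average and a percentile can be computed as below. This is a sketch, not the charting code: the nearest-rank percentile definition is an assumption, and AccelOps may use a different interpolation.

```python
import math


def simple_moving_average(values, window):
    """Mean of each run of `window` consecutive points (len - window + 1 results)."""
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]


def percentile(values, p):
    """Nearest-rank percentile for 0 < p <= 100 (assumed definition)."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


cpu = [4, 8, 6, 10, 2]
print(max(cpu), min(cpu), percentile(cpu, 95), simple_moving_average(cpu, 3))
```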
Simplify Cloud and Collector health GUI
Users can select what columns to display in Cloud and Collector health pages under Admin tab. By default, fewer columns are displayed now.
Ability to manually add hosts to Application Groups
Device and Application groups are important CMDB objects that allow users to write targeted rules and reports. Until now, Application groups were only populated by discovery. This release allows users to manually add hosts to Application groups in cases where discovery is not practical.
Important use case:
Suppose a rule such as “Excessive DNS requests from a host” triggers. The host is actually a DNS server that was not discovered, so an exception needs to be created for this rule for this DNS server. There are three choices:
- Create a rule exception for this host. This is sometimes hard to manage in the long term, since the fact that the host is a DNS server cannot be used in other analytics.
- Discover the host and make sure that it is in the DNS server group. This may not always be practical.
- Manually add the server to the DNS server group using this feature. The DNS server group can then be used in other rules and reports.
The rule then stops triggering, as desired.
Set important process and critical interface definitions directly from CMDB
Important processes and critical interfaces are always monitored for up/down status. Before this release, they had to be configured from Admin > General Settings, and setting an important process was difficult because the process name had to be typed in manually. This release allows users to set these directly from CMDB > Device.
Dashboard charting enhancements
The following improvements are added:
For Bar charts, the legends appear next to the charts and not at the bottom. This improves legibility.
The maximum number of displayed entries is increased from 50 to 200.
Accounting for internal and performance monitoring events
AccelOps has three kinds of logs/events:
External logs: these count towards the licensed EPS (events per second)
Performance monitoring events generated by AccelOps when it monitors a device: these also count towards the licensed EPS
Internal system logs, generally reporting errors and important informational events: these do not count towards the licensed EPS
Since each of these log types has to be indexed and stored, and can trigger rules and reports, all of them can affect system performance. This release provides accurate accounting of these event types via the phstatus commands and also via system-provided reports. See here for details.
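The accounting rule above (external and performance monitoring events count toward the licensed EPS, internal events do not) can be sketched as follows. The category names are hypothetical labels, not values reported by phstatus.

```python
from collections import Counter

# Per the list above: which event categories count toward the licensed EPS.
# The category labels are hypothetical.
COUNTS_TOWARD_LICENSE = {"external": True, "perf_monitoring": True,
                         "internal": False}


def eps_summary(categories, window_s):
    """Compute per-category, licensed and total EPS over a window of window_s seconds."""
    counts = Counter(categories)
    licensed = sum(n for cat, n in counts.items() if COUNTS_TOWARD_LICENSE[cat])
    return {
        "per_category": {cat: n / window_s for cat, n in counts.items()},
        "licensed_eps": licensed / window_s,
        "total_eps": sum(counts.values()) / window_s,
    }


window = ["external"] * 300 + ["perf_monitoring"] * 60 + ["internal"] * 240
print(eps_summary(window, window_s=60))
```

In this example the system processes 10 EPS in total but only 6 EPS count against the license, which is why internal events can still matter for sizing even though they are not licensed.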
Ability to change event database purge/archive thresholds
By default, AccelOps starts to purge (or archive, if archiving is configured) when the free space in the event database falls below 10GB, and continues until free space reaches 20GB. At very high event rates, this 10GB buffer may not suffice and the database may become full. This release allows users to customize these values: in phoenix_config.txt, under the phDataManager section, modify the low_space_action_threshold and low_space_warning_threshold values and restart the phDataManager module. This needs to be done on both the Supervisor and Worker nodes.
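The default purge behavior is a hysteresis: it starts below one free-space level and stops at a higher one. The sketch below models that with the default 10GB and 20GB values. The parameter names are illustrative, and how they map onto low_space_action_threshold and low_space_warning_threshold is not spelled out in this note.

```python
def should_purge(free_gb, purging, start_below_gb=10.0, stop_at_gb=20.0):
    """Decide whether purge/archive runs at this check (hysteresis sketch).

    Purging starts once free space drops below start_below_gb and, once
    running, continues until free space reaches stop_at_gb.
    """
    if purging:
        return free_gb < stop_at_gb
    return free_gb < start_below_gb


# Simulate successive free-space readings (GB) on the event database.
purging = False
for free in [25, 12, 9, 14, 19, 21, 15]:
    purging = should_purge(free, purging)
    print(free, "purging" if purging else "idle")
```

The gap between the two levels keeps the purge from starting and stopping on every check; raising the start level gives more headroom at very high event rates.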
Ability to set remote directory renaming action during archive
When AccelOps is archiving and the destination directory already exists, you can configure AccelOps either to rename the existing directory and archive the new data to the destination directory, or to skip archiving.
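The two options can be sketched with Python's pathlib. The timestamp suffix used for the renamed directory and the `on_exists` option values are hypothetical; the text does not say how AccelOps names the renamed directory.

```python
from datetime import datetime
from pathlib import Path


def prepare_archive_dir(dest, on_exists="rename"):
    """Prepare dest for archiving; return True if archiving should proceed.

    on_exists="rename": move the existing directory aside, then recreate dest.
    on_exists="skip":   leave the existing directory alone and skip archiving.
    """
    dest = Path(dest)
    if not dest.exists():
        dest.mkdir(parents=True)
        return True
    if on_exists == "skip":
        return False
    if on_exists == "rename":
        # Hypothetical naming scheme: append a timestamp to the old directory.
        stamp = datetime.now().strftime("%Y%m%d%H%M%S")
        dest.rename(dest.with_name(f"{dest.name}.{stamp}"))
        dest.mkdir()
        return True
    raise ValueError(f"unknown on_exists option: {on_exists!r}")
```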
Registration APIs
Three new APIs are provided for the following functions. For details, see here.
Register Workers to Supervisor
Register Collector to Supervisor
Register Supervisor to AccelOps License Manager
Bug Fixes / Enhancements
ID | Severity | Component | Description
15147 | Major | System | Separate Chinese language support from English versions
13921 | Major | Application Server | SANS Low Sensitivity does not update by the system default API
14228 | Minor | System | New install images for Collector and Super utilize the same OS RPM packages
14695 | Minor | System | AccelOps cannot connect to the Internet via a Proxy
14940 | Minor | System | Address Web Server HTTP Trace/Track Method Support Cross-Site Tracing Vulnerability (CVE-2004-2320, CVE-2007-3008) by disabling the ability to respond to HTTP TRACE requests
15079 | Minor | System | Secure Redis service running on Supervisor node by disallowing access from the outside
13647 | Minor | Application Server | Stopped report generates an application exception when it is re-run
14409 | Normal | Application Server | Need to escape special characters in rule definition XML
14274 | Normal | Discovery | VCenter discovery: sometimes a folder shows no VMs in Dashboard > VMView
15020 | Normal | GUI | Can’t adjust sliders on Dashboard Widgets with multiple sliders
14347 | Normal | GUI | Add/Modify Rule Exception causes Rule to Save with a new name
14474 | Normal | GUI | External lookup broken on Summary Dashboards
14667 | Normal | Performance Monitoring | Changing a Custom WMI (not just WMI) does not take effect even after discovery
14469 | Normal | Device Support | Default WMI Parser not parsing Sharepoint Event Types Correctly
13393 | Normal | Discovery | Resolve device hostname for ping-only discovered devices
13811 | Normal | Performance Monitoring | No Performance Data Collected After Fortigate Firewall upgrade to version 5.2.3
13626 | Normal | Rules | Refined Sub-pattern in “Black List User Agent Match” to reduce false positives
14417 | Normal | Application Server | Discovery merge also needs to overwrite the device group instead of adding to it
15014 | Normal | GUI | CMDB Device filtering does not work when Reporting IP can be resolved by DNS
15177 | Normal | Parser | Some IOS hardware failure events do not parse
15182 | Normal | Performance Monitoring | Device interface utilization may not be reported because of XML size overflow (extra large deployments)
12992 | Normal | Application Server | Reverse tunnels do not time out as described
8515 | Normal | Discovery | NetBotz NBRK0200 is not discovered as NetBotz
12319 | Normal | Performance Monitoring | Add provisioned disk size to PH_DEV_MON_VM_DISK_UTIL event
13954 | Normal | Performance Monitoring | Memory Utilization for HPUX process reported as higher than actual Physical Memory Utilization
14576 | Normal | Performance Monitoring | PH_JAVA_AGENT_ERROR due to vmDataStore perfmap wrong key
14826 | Normal | Application Server | When App server is restarted, false Collector down emails are sent out
14844 | Normal | Application Server | Need to turn off Beaconing report generation when Beaconing feature is turned off
14935 | Normal | GUI | CMDB Exception Report does not correctly populate customer (Org)
7463 | Enhancement | GUI | Allow Location information in custom email template
13068 | Enhancement | GUI | Location CSV import needs to be able to (a) intelligently find the entry, (b) merge the entries with the necessary changes and (c) show in the UI which entries were updated
13726 | Enhancement | GUI | Use labeled bars on bar charts rather than a legend
14212 | Enhancement | GUI | Add a CMDB report for clear rules
14585 | Enhancement | Application Server | Optimize CMDB Object REST API for EventType, BizService, Device, Application groups via App Server caching technique
14701 | Enhancement | Application Server | Selenium import utilizing Java web driver instead of Python web driver scripts
14775 | Enhancement | GUI | In CMDB page, change “Last Updated Time” to “Last Discovered Time” and “Last Updated Method” to “Last Discovered Method”
14781 | Enhancement | GUI | Widget dashboard, Table View: allow one table for the whole dashboard
13809 | Enhancement | GUI | Format report bundle PDF output: show correct page index, remove total number of pages
14989 | Enhancement | GUI | In Rule/report filter condition, allow user to choose any event attribute IN CMDB Object
14760 | Enhancement | GUI | In Admin > Setup > Change/Performance Monitor page, do not show devices deleted by discovery
15149 | Enhancement | Rule / Query Engine | Optimization of Rule and Report Worker for large IP Value Set
13776 | Enhancement | Reports | CMDB Report added to show Rules with Clear Conditions
15141 | Enhancement | Device Support | Merge Windows via Log Discovery Using machine GUID
14474 | Enhancement | GUI | Allow user to not show Event Type in Dashboard (save precious space)
15059 | Enhancement | Device Support | Additional Parsing for DNS Bind (RPZ)
15091 | Enhancement | Device Support | Handle Unknown event types for Ironport Mail and Web events
Current Open Bugs/Enhancements
ID | Severity | Component | Description
8867 | Major | Rule Engine | LAST and FIRST operators in rules do not work (may crash Rule Worker module)
11036 | Major | Rule Engine | Rule Worker module may abort when a PctChange Expression is used
14242 | Major | Query Engine | RBAC data conditions not enforced for SP organizations when logging in via the super org and switching to another org
15022 | Major | Parser Engine | Parser module may stall/pause if host name resolution is slow
11112 | Major | Rule Engine | COUNT DISTINCT operations consume large resources for rules utilizing Anomaly Detection
14478 | Major | GUI | Sometimes the GUI pops up a warning (Large amount of data stored over the boundaries) when users restore archived data or delete restored data
15109 | Major | Performance Monitoring | Failed Custom JDBC job shows in performance page after Discovery
14766 | Major | Application Server | LOG discovery does not work properly with multi-tenant reporting devices
15230 | Major | Parser | Syslog-over-TCP does not work correctly
15247 | Normal | Parser | AIX Parser cannot parse events correctly
15253 | Normal | Parser | Reporting device name is parsed wrong in LinuxInotifyParser (affects Linux file integrity monitoring via AccelOps agent)
14929 | Normal | Performance Monitoring | Maintenance calendar issue: maintenance for a device does not start at the configured time if there is a long-running disabled job of another device
15068 | Normal | Application Server | Dashboard Search Filtering does not work for Clariion LUNs under Summary Tab
15231 | Normal | Application Server | Generating PDF reports over 100 pages drops the page footer
15294 | Normal | Parser | Strange device types may be created by Netflow-based LOG discovery. This does not affect system operation.
14829 | Normal | Documentation | Rule syntax is invalid if “regexp” is used as the sub-pattern name
15233 | Minor | Application Server | “Validation Status” column in Admin->Event DB->Event Integrity does not allow for sorting
15300 | Minor | GUI | For Report Server, if you sync -> unsync -> sync in rapid succession, the last sync may not take effect
9261 | Enhancement | Application Server | Charts in exported reports (PDF format) only contain stacked charts, not line charts