Pete Whitney


Genesis of a Genetic Algorithm

Understanding GAs from business to implementation

Have you ever wondered how a good idea is transformed into business value? Have you ever thought about how someone takes an abstract idea and produces business value from apparent nothingness? Have you ever postulated what you can do to leverage the assets at your disposal for the greater good? If so, then sit tight, because you are about to take a journey that will equip your inquisitiveness with the step-by-step actions that FireScope took when it transformed its corporate-wide historical metrics from a passive data asset into the next level of business intelligence.

With unique insight, FireScope is able to identify potentially hidden relationships among your IT assets and reveal cause-and-effect metrics that you might not have even known existed. This wizardry is accomplished using a distinctive genetic algorithm solution, yet it is as simple to execute as a few mouse clicks. In this article we will introduce the business idea that triggered subsequent investigations, which led to initial analysis and ultimately to the completion of FireScope's genetic algorithm implementation. This journey will cover several topics such as the business proposition, data normalization, and genetic modeling. By the end of the journey you'll recognize that the combination of metric collection and data comparison delivered by FireScope Inc. is unmatched in the IT industry.

But do take caution, because the latter portion of this article is presented from the perspective that you already possess a basic understanding of what a genetic algorithm is, as well as some of the concepts used in modeling one. Fear not if you are not yet there, as you can still garner significant benefit from this article without understanding the nuts and bolts. Let the journey begin.

FireScope from 30,000 feet
As background, a functioning FireScope deployment has the ability to gather metrics from all forms of existing IT assets, normalize the gathered metrics, provide historical analysis of the metrics, and most importantly provide service views for worldwide operations which are unparalleled in the IT industry.

FireScope collects a vast array of corporate-wide metrics. Some examples are CPU utilization, disk storage, host temperature, interface traffic, and memory utilization. Other examples include database metrics, JMX metrics, NetApp metrics, VMware metrics, web server response time, and so many more that we've neither the time nor the space to mention them all. It is also important to understand that every metric gathered by FireScope is collected on its own schedule. Even two or more metrics that have the same collection interval do not collect at exactly the same instant. So with this universe of data, the natural question became: what can we do to leverage this asset and deliver the next level of business intelligence?

Revealing hidden secrets
If you've been around the IT world long enough, then you've likely experienced a time when an update was made to a web server that invoked new or existing services on an application server, which in turn caused deadlocks on your database server. Unfortunately, the deadlocks did not occur in test because they were load related, and as a result they weren't uncovered until your public-facing application was placed under heavy load on Cyber Monday. Don't worry, you didn't need those sales anyway!

Now to be fair, anyone who consistently monitors their IT infrastructure can identify that their web server is not performing as expected. Furthermore, if you know all of the relationships between your web servers, application servers, and database servers, you can even set up static alerts that draw your attention to the notion that one layer of your business is impacting another layer. But where the story gets really interesting is if you fall into one or more of the following categories:

  1. The alerting values that you set are either too high or too low.
  2. You have not taken the time to set up alerts for related IT assets.
  3. You are not aware of the relationships between your IT assets.
  4. You do not properly monitor your IT assets.

I hope you can see that this is a very complex world that we live in. To detect a catastrophic corporate shutdown, you need tools that can discover that independent metrics collected from independent servers are impacting one another. This is exactly the problem FireScope sought to solve when it postulated the use of a genetic algorithm to provide optimal search heuristics and uncover hidden relationships in the very same metrics it had already collected from your IT assets. But not so fast: to accomplish this goal, we first need the ability to compare disparate metrics.

Not all time is created equal
As noted above, FireScope collects a universe of metrics from a universe of IT assets, and each is collected on its own schedule. As an example, consider CPU utilization taken from two different hosts, both polled on 30-second intervals. Since each is polled on its own schedule, we really couldn't say that the two metrics had similar signatures if their collection times were not equal. Yet another problem arises when you consider comparing two metrics that have different collection intervals, such as CPU utilization collected every 30 seconds on one asset vs. every 5 minutes on another. How can these be easily compared?

FireScope needed the ability to easily normalize the time domain from all collected metrics in order to fairly and accurately compare metrics collected on independent schedules. As it turns out, FireScope already trends all metrics that have numeric representations. You can think of trending as averaging over time. But for the purposes of FireScope's genetic algorithm, the trending operation also contributed an effective normalization in the time domain for all metrics having a numeric representation. So problem one is solved, because all numeric metrics are trended every hour starting on the hour.
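FireScope's internal trending code isn't public, but the idea of normalizing the time domain by averaging each metric into hourly buckets can be sketched as follows. The function name and data shapes here are illustrative assumptions, not FireScope's actual API.

```python
from collections import defaultdict

def hourly_trend(samples):
    """Average (timestamp, value) samples into hourly buckets.

    `samples` is a list of (unix_seconds, numeric_value) pairs collected
    on any schedule; the result is keyed by the start of each hour,
    which puts every metric on the same time grid.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[ts - ts % 3600].append(value)  # snap to the start of the hour
    return {hour: sum(vals) / len(vals) for hour, vals in sorted(buckets.items())}

# Two metrics polled on different schedules end up on the same hourly grid.
cpu_30s = [(t, 50.0) for t in range(0, 7200, 30)]    # polled every 30 seconds
cpu_5m = [(t, 75.0) for t in range(0, 7200, 300)]    # polled every 5 minutes
print(hourly_trend(cpu_30s).keys() == hourly_trend(cpu_5m).keys())  # True
```

Once both metrics are keyed by the same hourly timestamps, an hour-by-hour comparison becomes trivial regardless of the original polling schedules.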

Not all units have the same value
Let's trade one for one. I'll give you a nickel for every dollar you give me. Sound fair? Yeah, I didn't think you would go for that either, but that thought does bring to light the next challenge: comparing metrics that have differing units. The problem becomes even more challenging when you consider that in some instances the units might not even be from the same domain! Consider the chart below, which details just a few metric/unit pairings from your IT assets:

Metric                      Units
------                      -----
Interface Traffic           Bytes/second
CPU Utilization             Percentage
Host temperature            Degrees F or C
Web Server Response Time    Seconds
How can "Degrees F" be compared to "Percent CPU Utilization"? FireScope needed the ability to compare differing metrics collected across your IT infrastructure each having potentially different metric/unit representations.

Well, what if we compared the relative rise/fall of collected values instead of comparing the values themselves? In doing so we would be comparing not the raw values, but the increase or decrease in each metric's value over time. In short, FireScope calculates the tangent, or rate of change, between hourly trends for all collected metrics as a pre-processing step for its genetic algorithm. As you'll see later, this series of data is used to form a "gene," or genetic sequence, which is subsequently used to compare one metric against another to determine how closely the two signatures rise or fall within the same time frame.
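As a minimal sketch of this pre-processing step (the function name is an assumption), the slope between consecutive hourly trend values can be turned into an angle with the arctangent, which is unit-independent and always falls between -π/2 and π/2:

```python
import math

def hourly_slopes(trend):
    """Angle of change between consecutive hourly trend values.

    `trend` maps hour timestamps to averaged values. Returns atan(delta)
    for each adjacent pair of hours; the result lies in (-pi/2, pi/2)
    regardless of the metric's units, so two metrics with unrelated
    units can be compared by shape alone.
    """
    hours = sorted(trend)
    return [math.atan(trend[b] - trend[a]) for a, b in zip(hours, hours[1:])]

# A rising metric yields positive angles, a falling one negative angles.
rising = hourly_slopes({0: 10.0, 3600: 20.0, 7200: 30.0})
falling = hourly_slopes({0: 95.0, 3600: 60.0})
```

Whether the underlying values are bytes/second or degrees, both series reduce to the same bounded range of angles.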

Action and reaction
If you're in IT, you will most likely care if a rise in web traffic caused a delayed rise in CPU activity on another system which in turn caused a general slowdown in your business response times.

The third level of value that FireScope delivers with its genetic algorithm solution is the ability to search out cause-and-effect metrics from within your IT assets. By applying a sliding-window comparison of a selected metric against the searched metrics, FireScope can detect whether deviations in one metric appear before or after a target metric anomaly. In doing so, the notion that one metric caused another metric to deviate is displayed graphically in a simple time-series display. Furthermore, this cause-and-effect rendering can appear in either direction.

  1. The target metric may have deviated because it is impacted by some other metric.
  2. The target metric deviation may have impacted some other metric in your system.
  3. Or both of the above are true and your system has multiple cause-and-effect metrics.

But once again you may not have even known that these metrics have cause-and-effect relationships or that the relationships exhibit delayed response signatures.

Starting the analysis
Let's assume that you fall into category 3 referenced above, which implies that you are actively monitoring your IT assets but might not be aware of all of the relationships between them. Let's also assume that you are experiencing a slowdown in one of your important business operations, but other than the apparent slowdown you can't quite explain why this portion of your business is slow. You observe that your CPU load is higher than normal on one system, displayed via a historical graph of the system CPU load.

You start FireScope's analytics and select this same metric. You provide the analysis start time by selecting the data area just prior to the point in time where the CPU load started to rise. Next you select the analysis end time by marking the data area after the CPU returned to normal, or you select now because the CPU load is still high. As a last step you trigger the genetic algorithm analysis, asking FireScope to search out other metrics that exhibit a similar response during the same timeframe as the selected metric.

The result of the analysis is the top 5 metrics that most closely match the signature of the selected metric. The value of the analysis is that the resulting metrics may well include ones that caused the spike in CPU load under evaluation, or ones that were themselves impacted by the selected metric.

Components of a genetic algorithm
A genetic algorithm is an optimized search solution that attempts to mimic the process of natural evolution. Natural evolution uses "genes", "chromosomes", "genetic mutation", "genetic crossover" and multiple generations to produce improvements in nature. Of course, sometimes natural evolution produces defects or mutations, but just as in nature, these apparent defects can turn out to be extremely valuable assets in the evolutionary process. Genetic algorithms simulate these same concepts in software to express an optimized search solution.

The figure below illustrates FireScope's use of several genetic algorithm constructs and provides a brief synopsis of the genetic algorithm process.

GA building blocks, the "gene"
Let's use a bottom-up approach and talk about genes first. As mentioned above, FireScope compares the relative change over time of disparate metric values. This comparison is accomplished by first digitizing the comparison values into a representative alphabet, where each character represents the change in slope, or tangent, between two sequential trend values. Since time is always increasing in this domain, the only relevant values fall between -π/2 (-90 degrees) and π/2 (90 degrees). FireScope divides this range into 90 distinct buckets, each representing one letter of a 90-character alphabet. The digitization process allows for optimized comparison and also filters out small changes that are not significant enough to impact the comparison. The series of alphabetic values representing one metric's digitized values is encoded into a gene, ordered from the earliest time slot under consideration to the latest.
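The 90-character alphabet itself is not specified in the article, so the sketch below assumes 90 consecutive printable ASCII characters; the bucketing of angles into letters follows the description above.

```python
import math

ALPHABET = [chr(33 + i) for i in range(90)]  # 90 printable characters, '!' .. 'z'

def encode_gene(angles):
    """Digitize slope angles in [-pi/2, pi/2] into a 90-character string.

    Each angle maps to one of 90 equal-width buckets spanning the range,
    so small differences within a bucket are filtered out and two genes
    can be compared character by character.
    """
    gene = []
    for a in angles:
        bucket = int((a + math.pi / 2) / math.pi * 90)
        gene.append(ALPHABET[min(bucket, 89)])  # clamp +pi/2 into the top bucket
    return "".join(gene)

# A flat metric digitizes to the middle of the alphabet; steep rises and
# falls land at the ends.
flat = encode_gene([0.0, 0.0, 0.0])
```

Two metrics whose slopes fall in the same buckets hour after hour produce identical gene strings, even if their raw values never matched.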

GA building blocks, the "chromosome"
Moving up in our bottom-up approach is the formation of multiple genes into a chromosome. FireScope's goal is to identify the five metrics most closely related to a pre-selected target metric, in a time range slightly wider than the selected metric's time range. As a result, FireScope creates a chromosome from five genes, each encoding the digitized trend values of one candidate metric. The resulting chromosome represents one possible solution from among millions of combinations. As the genetic algorithm progresses, it searches among millions of possible chromosomes for the one whose five genes best match the trend pattern of the target metric.
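A chromosome can be sketched as nothing more than a selection of five metric names drawn from the pool of encoded genes. This data shape is an assumption for illustration; FireScope's actual structures are not public.

```python
import random

def random_chromosome(gene_pool, size=5):
    """One candidate solution: `size` distinct metric names drawn at random.

    `gene_pool` maps metric names to their encoded gene strings; a
    chromosome is simply a tuple of `size` metric names from that pool.
    """
    return tuple(random.sample(sorted(gene_pool), size))

# A toy pool of eight metrics, each with a (placeholder) gene string.
pool = {f"metric_{i}": "NNN" for i in range(8)}
candidate = random_chromosome(pool)
```

Each such tuple is one point in the enormous search space of five-metric combinations that the algorithm explores.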

GA building blocks, population/fitness/scoring
The first selection of genes to form the initial chromosome population is a purely random selection from all existing numeric trend data. FireScope creates an initial population of several hundred chromosomes (possible solutions) and then applies a fitness algorithm to this population. The fitness algorithm assigns each chromosome a numeric value representing how closely it matches the target metric. The highest-scoring chromosomes are chosen for mating to produce the next generation. The intent is that improved next-generation chromosomes are the natural result of combining the best parents of the prior generation.
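The article does not specify FireScope's fitness function, so the sketch below assumes a simple one: per-character distance between a gene and the target gene, inverted so that higher is better, summed over a chromosome's genes. Selection then keeps the top scorers as parents.

```python
def gene_similarity(gene, target):
    """Score in [0, 1] of how closely two equal-length genes rise and
    fall together: mean per-character bucket distance, inverted."""
    assert len(gene) == len(target)
    max_dist = 89 * len(gene)  # worst case: every character 89 buckets apart
    dist = sum(abs(ord(a) - ord(b)) for a, b in zip(gene, target))
    return 1.0 - dist / max_dist

def fitness(chromosome, gene_pool, target_gene):
    """A chromosome's score is the summed similarity of its genes
    to the target metric's gene."""
    return sum(gene_similarity(gene_pool[m], target_gene) for m in chromosome)

def select_parents(population, gene_pool, target_gene, keep=20):
    """Keep the highest-scoring chromosomes as parents for mating."""
    ranked = sorted(population,
                    key=lambda c: fitness(c, gene_pool, target_gene),
                    reverse=True)
    return ranked[:keep]
```

Any monotone shape-similarity measure would serve the same role; the key property is that chromosomes whose genes track the target's rises and falls score higher.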

GA building blocks, mating
This process is sometimes referred to as genetic crossover, and is the process of selecting some genes (metrics) from each of two different parents to produce a new child chromosome. The new child chromosome comprises 5 genes selected via crossover from two high-scoring parents of the prior generation. This chromosome, as with all others, represents one possible solution from among millions: the 5 metrics that might most closely resemble the signature of the pre-selected target metric.
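One simple way to realize crossover for this problem (an illustrative choice, not necessarily FireScope's) is to draw the child's genes from the union of the two parents' metric sets, which avoids duplicate metrics in a chromosome:

```python
import random

def crossover(parent_a, parent_b, size=5):
    """Build a child chromosome by drawing genes (metrics) from both parents.

    Sampling from the union of the two parents' metric sets guarantees
    the child contains no duplicate metrics.
    """
    combined = sorted(set(parent_a) | set(parent_b))
    return tuple(random.sample(combined, min(size, len(combined))))

parent_a = ("a", "b", "c", "d", "e")
parent_b = ("c", "d", "e", "f", "g")
child = crossover(parent_a, parent_b)
```

Because both parents scored well, the child is likely to inherit several of the metrics that made them score well, which is the whole point of mating.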

GA building blocks, "mutations"
After mating, a small proportion of the new generation of chromosomes is chosen to be randomly mutated. FireScope uses the mutation process to inject previously unexplored genes (metrics) into the search. If a chromosome is selected for mutation, a random gene is replaced by a gene from a metric that has not yet been explored. Mutation can improve the overall score of the selected chromosome or degrade it; however, this randomness has been shown to improve GA search capabilities, just as mutation in nature sometimes provides improvement through natural evolution.
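The mutation step described above can be sketched as follows; the mutation rate and the uniform choice among unexplored metrics are assumptions for illustration.

```python
import random

def mutate(chromosome, all_metrics, rate=0.05):
    """With probability `rate`, swap one random gene for a metric not
    already in the chromosome, injecting unexplored territory into
    the search."""
    if random.random() >= rate:
        return chromosome  # most chromosomes pass through unchanged
    unexplored = sorted(set(all_metrics) - set(chromosome))
    if not unexplored:
        return chromosome
    genes = list(chromosome)
    genes[random.randrange(len(genes))] = random.choice(unexplored)
    return tuple(genes)
```

Without this step the search could only ever recombine the metrics that happened to appear in the random initial population; mutation is what lets every collected metric eventually be considered.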

Cause-and-effect
In comparing other metrics against a target metric, FireScope evaluates a longer timeframe than the time range selected for the user's target metric. The compared metric's trend values are examined both prior to and after the target metric's window. By expanding the search window and sliding the target metric over the searched metric, FireScope can detect whether a searched metric may have caused the target metric to deviate from its normal signature, or whether the target metric caused other metrics to deviate from theirs. This determination is made by detecting the similarity of the target metric to the searched metric at each offset. This approach has nothing to do with genetic algorithms per se; it is simply a higher-level value extracted from FireScope's genetic algorithm implementation.
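A minimal sketch of that sliding-window comparison, assuming the per-character gene similarity introduced earlier (redefined here so the snippet stands alone):

```python
def gene_similarity(gene, target):
    """Inverted mean per-character bucket distance between two equal-length genes."""
    assert len(gene) == len(target)
    dist = sum(abs(ord(a) - ord(b)) for a, b in zip(gene, target))
    return 1.0 - dist / (89 * len(gene))

def best_lag(searched, target):
    """Slide the target gene across a longer searched gene and report the
    (similarity, offset) of the best match.

    An offset earlier than the target's own window suggests the searched
    metric moved first (possible cause); a later offset suggests it
    reacted afterward (possible effect).
    """
    best = (-1.0, 0)
    for lag in range(len(searched) - len(target) + 1):
        window = searched[lag:lag + len(target)]
        best = max(best, (gene_similarity(window, target), lag))
    return best  # offset is in trend intervals, i.e., hours
```

For example, if a two-hour target anomaly matches a searched metric best two hours into its widened window, the two signatures deviated at a fixed delay from one another.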

GA building blocks, "completion"
After several thousand generations have been evaluated and the improvement in scoring has slowed to an acceptable level, the genetic algorithm completes and the top-scoring chromosome from the last generation is selected as the best solution. This chromosome contains 5 genes representing the top-scoring metrics that most closely match the target metric. Each gene (metric) from the top-scoring chromosome can be displayed back to the user for further investigation, and each is displayed on the same graph as the selected target metric.
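The stopping rule ("improvement has slowed to an acceptable level") can be sketched generically as a patience-based loop; the generation step and scoring function are passed in, and the patience and improvement thresholds are illustrative assumptions:

```python
def evolve(step, initial_population, score, patience=50, min_improvement=1e-4):
    """Run generations until the best score stops improving.

    `step` produces the next generation from the current one (selection,
    crossover, mutation); `score` returns the best fitness in a
    population. Stops after `patience` consecutive generations without a
    gain of at least `min_improvement`, returning the final population.
    """
    population = initial_population
    best, stale = score(population), 0
    while stale < patience:
        population = step(population)
        current = score(population)
        if current > best + min_improvement:
            best, stale = current, 0  # meaningful progress; reset patience
        else:
            stale += 1
    return population
```

Whatever the exact thresholds, the shape is the same: keep breeding generations while scores climb, and stop once the curve flattens, then hand the top chromosome's five metrics back to the user.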

Conclusion
If this is your first exposure to genetic algorithm techniques, it can be overwhelming to try to absorb all of the terminology. Concepts such as genes, chromosomes, mutation, and fitness algorithms are difficult to conceptualize. While FireScope uses genetic algorithm techniques, it is important to understand that this approach is simply a means of achieving optimal search times to deliver the business value of revealing possibly unknown relationships between disparate metrics collected throughout your IT assets.

It is also instructive to realize that the real ingenuity in this approach is not in the application of the genetic algorithm, but rather in the normalization techniques that were applied to deliver the ability to compare disparate metrics. While the application of the genetic algorithm does provide optimized search results, these results could not have been achieved if it weren't for the initial work of normalizing the time domain, normalizing the value domain, and implementing the sliding window analysis that delivers the ability to uncover delayed cause-and-effect metrics hidden within your IT infrastructure.

FireScope's genetic algorithm coupled with your inquisitiveness forms a near-superhuman capability that exists nowhere else in the IT industry. As with all supernatural powers, you must use them wisely!

More Stories By Pete Whitney

Pete Whitney is a Solutions Architect for Cloudera. His primary role at Cloudera is guiding and assisting Cloudera's clients through successful adoption of Cloudera's Enterprise Data Hub and surrounding technologies.

Previously Pete served as VP of Cloud Development for FireScope Inc. In the advertising industry Pete designed and delivered DG Fastchannel’s internet-based advertising distribution architecture. Pete also excelled in other areas including design enhancements in robotic machine vision systems for FSI International Inc. These enhancements included mathematical changes for improved accuracy, improved speed, and automated calibration. He also designed a narrow spectrum light source, and a narrow spectrum band pass camera filter for controlled machine vision imaging.

Pete graduated Cum Laude from the University of Texas at Dallas, and holds a BS in Computer Science. Pete can be contacted via Email at pwhitney@cloudera.com.