Over 100 participants from international research institutions, the private sector, and student communities gathered at the Fields Institute from February 1-4, 2000, for a workshop on Data Analysis for Commercial & Industrial Applications, organized by the Fields Institute and the Nortel Institute for Telecommunications of the University of Toronto.
The aim of the workshop was to bridge leading-edge mathematical techniques with commercial and industrial applications of data analysis and to present problems motivated by commercial and industrial needs.
Speakers presented ongoing research and data analysis challenges while sharing mathematical results and ideas on data analysis across the mathematics, statistics, physics, biophysics, computer science, telecommunications, and engineering communities. The mathematical methodologies discussed included stochastic processes and Markov chains, nonlinear dynamics and nonlinear time-series analysis, multifractal analysis, data mining, and data and signal processing. MITACS and Nortel Networks sponsored the workshop.
Program and Abstracts
Tuesday, February 1, 2000
11:00-11:30 - Opening address
Claudine Simson, V.P., Disruptive Technology, Network and Business Solutions, Nortel Networks
11:30-12:30 - Modern data analysis and its application to Nortel Networks data
Otakar Fojt, The University of York
In this talk we outline an approach to the analysis
of sequential manufacturing and telecom traffic data from industry using
techniques from nonlinear dynamics. The aim of the talk is to show the
potential of nonlinear techniques for processing real-world data and
developing new advanced methods of commercial data analysis.
The basic idea is to consider a factory as a
dynamical system. A process in the factory generates data, which contains
information about the state of the system. If it is possible to analyse
this data in such a way that knowledge of the system is increased, control
and decision-making processes can be improved, giving the factory a basis for competitive advantage.
First, we give details of the general idea and the
type of recorded data together with the necessary preprocessing
techniques. We follow this with a description of our analysis. Our
approach consists of state space reconstruction, applications of principal
component analysis and nonlinear deterministic prediction algorithms. The
talk will conclude with our results and with suggestions for future work.
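As a rough illustration of this pipeline, the sketch below performs a delay-coordinate state space reconstruction followed by principal component analysis. The file name, embedding dimension, delay, and number of retained components are illustrative assumptions, not details of the actual Nortel analysis.

```python
# Illustrative sketch only: delay embedding + PCA, not the speaker's code.
import numpy as np

def delay_embed(x, dim, tau):
    """Reconstruct a state space from a scalar series by stacking
    delay vectors (x[t], x[t+tau], ..., x[t+(dim-1)*tau])."""
    n = len(x) - (dim - 1) * tau
    return np.column_stack([x[i * tau : i * tau + n] for i in range(dim)])

x = np.loadtxt("series.txt")            # hypothetical measured series
X = delay_embed(x, dim=8, tau=2)        # parameters are a modelling choice

# Principal component analysis of the reconstructed states: keep the
# leading directions, which carry most of the deterministic structure.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
X_reduced = Xc @ Vt[:3].T               # project onto first 3 components
```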
1:30-2:00 - The need for real-time data analysis in telecommunications
Chris Hobbs, Sr. Mgr., System Architecture, Nortel Networks
A telecommunications network typically comprises many independently-controlled layers: from the physical fibre interconnectivity, through wavelengths, STS connexions, ATM Virtual Channels and MPLS Paths, to the end-to-end connexions established for user services. Each of these layers generates statistics that, in a large network, may easily be measured in tens of gigabytes per hour.
Traditionally, the layers have been controlled
individually since the complexity of "tuning" a lower layer to the traffic
it is carrying has been too great for human operators (particularly where
the carried traffic itself has complex statistics) and since the work
involved in moving connexions (particularly fibres and wavelengths) has
been prohibitive. Technological advances in optical switches, capable of logically relaying fibres or wavelengths in microseconds, have made flexible network rebalancing possible, and carriers, the owners of these large networks, are demanding lower costs by combining layers and exploiting this new agility. To address this problem, the terabytes of data extracted daily from large networks need to be analysed: initially statically, to determine the gross inter-related behaviours, and then dynamically, to detect and react to changing traffic patterns.
2:30-3:30 - Noise reduction for human speech using chaos-like features
Holger Kantz, Max-Planck-Institut für Physik komplexer Systeme
A local projective noise reduction scheme,
originally developed for low-dimensional stationary signals, is
successfully applied to human speech. This is possible by exploiting
properties of the speech signal which mimic structure exhibited by
deterministic chaotic systems. In high-dimensional embedding spaces, the
strong non-stationarity is resolved as a sequence of different dynamical
regimes of moderate complexity. This filtering technique does not make use
of the spectral contents of the signal and is far superior to the
Ephraim-Malah adaptive filter.
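The following is a minimal sketch of a local projective filter of this general kind, assuming a scalar signal and a brute-force nearest-neighbour search; the embedding dimension, projection rank, and neighbourhood size are illustrative choices, not the exact scheme described in the talk.

```python
# Illustrative local projective noise reduction (quadratic-time demo).
import numpy as np

def delay_embed(x, dim, tau=1):
    n = len(x) - (dim - 1) * tau
    return np.column_stack([x[i * tau : i * tau + n] for i in range(dim)])

def local_projective_filter(x, dim=10, q=2, k=30):
    """Project each delay vector onto the q leading local principal
    directions of its k nearest neighbours, then average the corrected
    coordinates back into a scalar series."""
    X = delay_embed(x, dim)
    y, counts = np.zeros(len(x)), np.zeros(len(x))
    for i, v in enumerate(X):
        dists = np.linalg.norm(X - v, axis=1)
        nbrs = X[np.argsort(dists)[:k]]
        mu = nbrs.mean(axis=0)
        _, _, Vt = np.linalg.svd(nbrs - mu, full_matrices=False)
        v_clean = mu + Vt[:q].T @ (Vt[:q] @ (v - mu))
        for j in range(dim):            # scatter corrected coordinates back
            y[i + j] += v_clean[j]
            counts[i + j] += 1
    return y / np.maximum(counts, 1)
```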
4:00-5:00 - Scaling phenomena in telecommunications
Murad Taqqu, Boston University (lecture co-sponsored by the Dept. of Statistics, University of Toronto)
Ethernet local area network traffic appears to be
approximately statistically self-similar. This discovery, made about eight
years ago, has had a profound impact on the field. I will try to explain
what statistical self-similarity means and how it is detected. I will also
indicate how its presence can be explained physically, by aggregating a
large number of "on-off" renewal processes, whose distributions are
heavy-tailed. As the size of the aggregation becomes large, then, after
rescaling, the behavior turns out to be the Gaussian self-similar process
called fractional Brownian motion. If, however, the rewards instead of
being 0 and 1 are heavy-tailed as well, then the limit is a stable
non-Gaussian process with infinite variance and dependent increments.
Since linear fractional stable motion is the stable counterpart of the
Gaussian fractional Brownian motion, a natural conjecture is that the
limit process is linear fractional stable motion. This conjecture, it
turns out, is false. The limit is a new type of infinite variance
self-similar process.
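As a sketch of how self-similarity is detected in practice, one standard approach is the aggregated-variance method: for a self-similar process the variance of block means over blocks of size m scales as m^(2H-2), so a log-log slope estimates the Hurst parameter H. The code below is a generic illustration, not the speaker's.

```python
# Aggregated-variance estimate of the Hurst parameter H (illustrative).
import numpy as np

def hurst_aggregated_variance(x, block_sizes):
    log_m, log_v = [], []
    for m in block_sizes:
        nblocks = len(x) // m
        means = x[: nblocks * m].reshape(nblocks, m).mean(axis=1)
        log_m.append(np.log(m))
        log_v.append(np.log(means.var()))
    slope, _ = np.polyfit(log_m, log_v, 1)  # slope should be 2H - 2
    return 1 + slope / 2

# Sanity check: white noise has no long-range dependence, so H ~ 0.5.
x = np.random.default_rng(0).standard_normal(100_000)
print(hurst_aggregated_variance(x, [10, 20, 50, 100, 200, 500]))
```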
Wednesday, February 2, 2000
9:30-10:30 - Electrical/Biological networks of nonlinear neurons
Henry Abarbanel, Institute for Nonlinear Science, UCSD
Using analysis tools for time series from nonlinear
sources, we have been able to characterize the chaotic oscillations of
individual neurons in a small biological network that controls simple
behavior in an invertebrate. Using these characteristics, we have built
computer simulations and simple analog electronic circuits, which
reproduce the biological oscillations. We have performed experiments in
which biological neurons are replaced by the electronic neurons retaining
the functional behavior of the biological circuits. We will describe the
nonlinear analysis tools (widely applicable), the electronic neurons, and
the experiments on neural transplants.
11:00-11:30 - E-commerce and data mining challenges
Weidong Kou, IBM Centre for Advanced Studies
E-commerce over the Internet is having a profound impact on the global economy. Goldman, Sachs & Co. estimates that B2B e-commerce revenue alone will grow to $1.5 trillion (US) over the next five years. Electronic commerce is becoming a major channel for conducting business, with an increasing number of organizations developing, deploying and installing e-commerce products, applications and solutions.
With rapid e-commerce growth come many challenges: for example, how to analyze e-commerce data and provide an organization with meaningful information to improve its product and service offerings to target customers, and how to group the millions of web users who access a web site so that the organization can serve each group better, reduce business costs and increase revenue. These challenges bring many opportunities for data mining researchers to develop better intelligent algorithms and systems that solve practical e-commerce problems. In this talk, we will use IBM Net.Commerce as an example to explain e-commerce development and the challenges that we face today.
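The user-grouping problem described above is commonly attacked with clustering. The sketch below runs a plain k-means on hypothetical per-user session features; the feature set and number of segments are illustrative assumptions, not IBM Net.Commerce specifics.

```python
# Illustrative k-means segmentation of web users (not IBM's algorithm).
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Alternate between assigning points to the nearest centre and
    moving each centre to the mean of its assigned points."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers) ** 2).sum(axis=2), axis=1)
        centers = np.array([X[labels == j].mean(axis=0)
                            if np.any(labels == j) else centers[j]
                            for j in range(k)])
    return labels, centers

# Stand-in for per-user features such as [visits, pages/visit, spend].
X = np.random.default_rng(1).random((10_000, 3))
labels, centers = kmeans(X, k=5)        # 5 segments is an arbitrary choice
```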
11:30-12:00 - Occurrence of ill-defined probability distributions in real-world data
John Hudson, Advisor, Radio Technology, Nortel Networks
In many communications problems the statistics of the data, communication channels, and behaviour of users are ill defined and not handled well by the simpler concepts of classical probability theory. We can have data with alpha-stable (infinite variance) characteristics, long-tailed and large-variance lognormal distributions, self-similarity in the time domain, and so on. If the higher moments of the underlying distributions do not exist or take disproportionate values, then the laws of large numbers and the central limit theorem may not be safely applied to a surprising number of problems. The behaviour of some control mechanisms can begin to take on a chaotic appearance when driven by such data. In this talk, some of the properties of data, channels and systems that confront workers in the communication field are discussed, illustrated with examples taken from network data traffic, Internet browsing, radio propagation, video images, speech statistics and so on.
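A small numerical illustration of the point about higher moments: for Pareto data with tail index alpha < 2 the variance is infinite, so the running sample variance never settles down the way classical limit theorems would suggest.

```python
# Running sample variance of infinite-variance data (illustrative).
import numpy as np

rng = np.random.default_rng(1)
alpha = 1.5                              # tail index < 2: infinite variance
x = rng.pareto(alpha, 1_000_000) + 1.0
for n in (10**3, 10**4, 10**5, 10**6):
    print(n, x[:n].var())                # keeps jumping as n grows
```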
1:30-2:30 - The analysis of experimental time series
Tom Mullin, The University of Manchester
We will discuss the application of
modern dynamical systems time series analysis methods to data from
experimental systems. These will include vibrating beams, nonlinear
oscillators and physiological measures. The emphasis will be placed on
obtaining quantitative estimates of the essential dynamics. We will also
describe the application of data synergy methods to multivariate data.
2:30-3:00 - Fuzzy-pharmacology: Rationale and applications
Beth Sproule, Faculty of Pharmacy and Department of Psychiatry (Psychopharmacology), Sunnybrook Health Sciences Centre, Toronto
Pharmacological investigations are undertaken in order to optimize the use of medications. The complexity and variability associated with biological data have prompted our explorations into the use of fuzzy logic for modeling pharmacological systems. Fuzzy logic approaches have been used in other areas of medicine (e.g., imaging technologies, control of biomedical devices, decision support systems); however, their use in pharmacology is incipient. We will present the results of preliminary studies in which we assessed the feasibility of using fuzzy logic: a) to predict serum lithium concentrations in elderly patients; and b) to predict the response of alcohol-dependent patients to citalopram in attempting to reduce their drinking. Many current projects have since evolved from this work. Approaches to this line of investigation will be presented.
3:30-4:30 - Geospatial backbones of environmental monitoring programs: the challenges of timely data acquisition, processing and visualization
Chad P. Gubala, Director, The Scientific Assessment Technologies Laboratory, University of Toronto
When considering ‘environmental’ issues
or legalities, a general and useful description of a pollutant is an
element or entity in the wrong place at the wrong time and perhaps in the
wrong amount. Prior to the establishment of cost-effective global
positioning, monitoring the fate and transport of environmental pollutants
was limited to reduced scale and statistically based sampling programs.
Whole systems models developed from parcels of environmental studies have
been limited in predictive capability due to unnoticed attributes,
undocumented synergies or antagonisms and un-quantifiable spatial and
temporal variances. Advances in commercial geospatial technologies and high-speed sensor arrays now offer the possibility of assessing a whole ecosystem in near real time and in a spatially complete manner. This capacity should greatly
improve quantitative environmental modeling and the adaptive management
process, further ‘tuning’ the balance between global environments and
economies. However, the promise of increased knowledge about our natural
resources is now limited by our capacity to move the data collected from
integrated geopositioning and sensor systems into meaningful management
products. This talk describes these limitations and addresses the needs
for developments in the areas of real time analytical protocols.
4:30-5:00 - Data mining and its challenges in the banking industry
Chen Wei Xu, Manager, Statistical Modeling, Customer Knowledge Management, Bank of Montreal
Thursday, February 3, 2000
9:30-10:30 - Elements of fuzzy system modeling
I.B. Turksen, University of Toronto
In most system modeling methodologies, we attempt to
find out, in an inductive manner, how a particular system behaves. That
is, we essentially try to determine how the input factors affect the
performance measure of our concern. There are at least three approaches to
system modeling: (1) personal experience, (2) expert interviews and
teachings, and (3) data mining with historical data.
In all these approaches, there are two fundamental theoretical base structures for system modeling: (1) classical two-valued set and logic theory based functional analyses, and/or (2) novel (35 years old) infinite (fuzzy) valued set and logic based super functional analyses. Furthermore, there are two basic learning methods in these approaches: (1) unsupervised learning and (2) supervised learning. The basic difference between the two is that the first has no goal whereas the second has a goal; generally, the goal of supervised learning is to ensure that the error between the model result and the actual is minimized. In classical two-valued set and logic based functional analyses, the world and its systems are seen through the two-valued, black-and-white, restricted view of what are called clear patterns. Unfortunately, first, the two-valued dichotomy forces one to make arbitrary choices when there are many alternatives to choose from. Secondly, a functional view can, by its very definition, only represent many-to-one mappings. Thirdly, the combination of variables is assumed to be additive and multiplicative, leading to a linear superposition schema in the functional representation of systems. In this view, logical “OR-ness” is simply mapped to algebraic addition and “AND-ness” to algebraic multiplication. Fourthly, imprecision in data is generally assumed to originate from random occurrences.
In fuzzy (infinite) valued set and logic based super functional analyses, by contrast, the world and its systems are seen through information granules, which admit an unrestricted view of fuzzy patterns. Fortunately, first, we are not forced to make arbitrary choices but have the freedom to choose the gradation appropriate for a given situation. Secondly, the super functional view allows many-to-many mappings: membership functions are identified to specify patterns via fuzzy cluster analyses, and cluster-to-cluster mappings established over these functions give us super functional representations. Thirdly, the combination of variables is generally super-additive or sub-additive, requiring highly nonlinear representations; in fuzzy theory there are infinitely many ways to represent “AND-ness” (conjunction) and “OR-ness” (disjunction), depending on context and the behavior of a given system. Fourthly, imprecision in data is generally deterministic, arising from the limitations of our measurement devices.
In our integrated fuzzy system modeling approach, we first use fuzzy clustering techniques to learn patterns, with fuzzy scatter matrices and diagrams, and to determine the essential fuzzy clusters, i.e., the effective rules of system behavior. This is an unsupervised learning method. Next we fit membership functions to these clusters, and we determine the significant and critical variables that affect the system behavior drastically, moderately, etc. Later, we apply supervised learning to determine the nonlinear operators that combine the fuzzy clusters in many-to-many maps of input and output variables, in order to achieve minimum system model error. In this supervised learning we also implement compensation and compromise between the extreme values of the formulas that specify the combination of concepts, and hence the appropriate combination of variables, as well as alternate inference schemas. Real-life system model building
examples include: (1) a continuous caster model that attempts to balance
tardiness of customer delivery due dates versus mixed grade steel
production and (2) pharmacological models that attempt to determine the
effects of medication on humans. Simulated system model building examples
include: (1) utilization of Internet data links, (2) analyses of traffic
characteristics, and (3) discard rate prediction.
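As a hedged sketch of the unsupervised first stage, the code below implements standard fuzzy c-means clustering, in which every point receives a graded membership in each cluster rather than a crisp label. The fuzzifier m = 2 and the cluster count are conventional illustrative choices, not the speaker's settings.

```python
# Illustrative fuzzy c-means clustering (graded memberships).
import numpy as np

def fuzzy_c_means(X, c, m=2.0, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)    # rows are membership distributions
    for _ in range(iters):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None] - centers, axis=2) + 1e-12
        U = 1.0 / d ** (2 / (m - 1))     # closer clusters get more membership
        U /= U.sum(axis=1, keepdims=True)
    return U, centers

X = np.random.default_rng(2).random((500, 2))   # stand-in for process data
U, centers = fuzzy_c_means(X, c=3)       # U[i, j]: membership of i in cluster j
```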
11:00-12:00 - A Steel Industry Viewpoint on Fuzzy Technology - Scheduling Analysis Application
Michael Dudzic, Manager, Process Automation Technology, Dofasco Inc.
This presentation will discuss experiences in the use of fuzzy expert system technologies as applied in a proof-of-concept project looking at two specific issues in scheduling the #1 Continuous Caster at Dofasco. This talk complements I. B. Turksen’s talk on Elements of Fuzzy System Modeling.
& The Application of Multivariate Statistical Technologies at Dofasco
This presentation will discuss experiences in the use of multivariate statistics (Principal Component Analysis and Partial Least Squares) in applications at Dofasco. The focus example will be the on-line monitoring system at the #1 Continuous Caster.
1:00-2:00 - Recent developments in decision tree models
Hugh Chipman, University of Waterloo
Decision trees are an appealing predictive model
because of their interpretability and flexibility. In this talk, I will
outline some recent developments in decision tree modeling, including
improvements in model search techniques, and enrichments to the tree
model, such as linear models within terminal nodes.
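One of the enrichments mentioned, linear models within terminal nodes, can be sketched with off-the-shelf tools: fit an ordinary regression tree, then replace each leaf's constant with a linear model fit to the observations landing in that leaf. The depth and synthetic data below are illustrative, not from the talk.

```python
# Illustrative regression tree with linear models in the terminal nodes.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(1000, 2))
y = np.where(X[:, 0] > 0, 3 * X[:, 1], -2 * X[:, 1]) + rng.normal(0, 0.1, 1000)

tree = DecisionTreeRegressor(max_depth=2).fit(X, y)
leaf_of = tree.apply(X)                  # terminal node index per sample
leaf_models = {leaf: LinearRegression().fit(X[leaf_of == leaf],
                                            y[leaf_of == leaf])
               for leaf in np.unique(leaf_of)}

def predict(X_new):
    """Route each point to its leaf, then apply that leaf's linear model."""
    leaves = tree.apply(X_new)
    return np.array([leaf_models[l].predict(x[None, :])[0]
                     for l, x in zip(leaves, X_new)])
```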
2:30-3:30 - A hybrid predictive model for database marketing
Zhen Mei, Generation 5
We discuss a simple hybrid approach for predicting response rates in mailing campaigns and for predicting certain demographic and expenditure characteristics in customer databases. This method is based on cluster analysis and predictive modeling. As an example, we model home ownership for the State of New York.
& Missing value filling
Wenxue Huang, Generation 5
The
talk describes the missing value filling methodology and software being developed by Generation 5, focusing on the mathematics for interval-scaled target data. A local-and-global (or vertical-and-horizontal) balanced approach in a multivariate, large-database setting will be discussed. The methodology and software may also be applied to prediction: filling in missing values is equivalent to predicting instant target values from reliable, complete historical records and current incomplete input.
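A hedged sketch of this prediction view of imputation: fill a record's missing interval-scaled value from the k most similar complete records. This is generic nearest-neighbour imputation, standing in for (not reproducing) Generation 5's methodology.

```python
# Illustrative k-nearest-neighbour filling of one interval-scaled column.
import numpy as np

def knn_impute(records, target_col, k=5):
    """records: 2-D float array with np.nan marking missing entries.
    Fills missing target_col values from the k closest complete records."""
    complete = records[~np.isnan(records).any(axis=1)]
    other = [j for j in range(records.shape[1]) if j != target_col]
    out = records.copy()
    for i, row in enumerate(records):
        if np.isnan(row[target_col]) and not np.isnan(row[other]).any():
            d = np.linalg.norm(complete[:, other] - row[other], axis=1)
            nbrs = complete[np.argsort(d)[:k], target_col]
            out[i, target_col] = nbrs.mean()
    return out
```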
4:00-5:00 - Challenges in the development of segmentation solutions in the banking industry and a genetic algorithms approach
Chris Ralph, Senior Manager, Market Segmentation, Bank of Montreal
The Bank of
Montreal team is in the process of building market segmentation solutions
for a few different lines of business using syndicated survey data. The
dataset consists of 4,200 responses from households across Canada
(geographically unbiased sample), and contains detailed information on
their financial holdings across all institutions, as well as channel
usage, banking habits, and household profile information. The process we
typically follow in the development of a segmentation solution consists of
the following steps:
1) Standard preprocessing (treating outliers, missing values, standardization) --> 3-5 days
2) Data reduction via factor analysis, PCA, or simple cross-correlations to help avoid redundancy in the cluster runs --> 2-3 days
3) Brainstorming sessions with the lines of business to help us understand key business issues and generate a list of potential driver variables --> 1-2 weeks
4) Alternative cluster runs using the brainstorming suggestions and data reduction output to generate potential solutions through trial and error --> 2-4 weeks
The evaluation of solutions in Step 4 involves
making trade-offs between the number of clusters, cluster size, cluster
overlap, and the degree to which the current solution meets the needs of
the business as determined through the brainstorming sessions. This is
usually a painful process that relies heavily on the experience of the
analyst to bridge the gap between cluster solution statistics and
relevance to the business. Given the highly manual nature of this task, we
can only evaluate a very small subset of the universe of possible
solutions, and different analysts will generate very different solutions.
The discussion will focus on the development of an
objective function which captures both business rules and cluster
statistics, and which allows for the evaluation and ranking of a much
larger number of potential solutions. The elements of the objective function will be described in fairly simple terms that apply to any segmentation problem, and I will show how genetic algorithms may be used to “evolve” potential solutions. An open discussion will be encouraged on ways to improve the encoding of the problem and the objective function, as well as on the challenges associated with integrating business rules. There are also plenty of issues surrounding the use of genetic algorithms to help optimize the search through the space of possible solutions. The current objective function
captures the business rules simply by measuring the average variance of
key “business driver” variables across the clusters, where these variables
have been selected ahead of time in cooperation with the line of business.
The higher the variance of these variables across the segments, the more
distinct and relevant the clusters should be. Average cluster overlap is calculated by building n-dimensional hyperspheres (where n = # of cluster drivers) around the centroids of the clusters, where the radius of the hypersphere is between 2 and 3 RMS standard deviations. Overlap is defined as occurring when any single observation falls within the hypersphere of a cluster to which it has not been assigned. Cluster size may be integrated into the objective function, where solutions are penalized for having clusters that are either too large or too small.
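The overlap measure lends itself to a direct sketch: an observation counts as overlapping when it falls inside the hypersphere of a cluster other than its own, with each radius a multiple of that cluster's RMS deviation from its centroid. How the terms are weighted into a single fitness value below is an illustrative guess, not the Bank of Montreal's actual objective function.

```python
# Illustrative pieces of a segmentation objective function.
import numpy as np

def overlap_fraction(X, labels, r_sigma=2.5):
    """Fraction of observations inside the hypersphere of a foreign cluster."""
    ks = np.unique(labels)
    centroids = np.array([X[labels == k].mean(axis=0) for k in ks])
    radii = np.array([r_sigma * np.sqrt(((X[labels == k] - centroids[i]) ** 2)
                                        .sum(axis=1).mean())
                      for i, k in enumerate(ks)])
    d = np.linalg.norm(X[:, None, :] - centroids, axis=2)  # point-to-centroid
    inside = d < radii                                     # within each sphere
    own = labels[:, None] == ks[None, :]
    return (inside & ~own).any(axis=1).mean()

def driver_separation(X, labels, driver_cols):
    """Average across drivers of the variance of cluster means."""
    ks = np.unique(labels)
    means = np.array([X[labels == k][:, driver_cols].mean(axis=0) for k in ks])
    return means.var(axis=0).mean()

def fitness(X, labels, driver_cols, w_overlap=1.0):
    # higher driver separation is rewarded, overlap is penalized
    return (driver_separation(X, labels, driver_cols)
            - w_overlap * overlap_fraction(X, labels))
```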
Friday, February 4, 2000
9:30-10:30 - Interdisciplinary application of time series methods inspired by chaos theory
Thomas Schreiber, University of Wuppertal
We report on real world applications of time series
methods developed on the basis of the theory of deterministic chaos.
First, we demonstrate statistical criteria for the necessity of a
nonlinear approach. Nonlinear processes are not in general purely
deterministic. Then we discuss modified methods that can cope with noise
and nonstationarities. In particular, we will discuss nonlinear filtering,
signal classification, and the detection of nonlinear coherence between
processes.
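One standard statistical criterion of the kind mentioned is the surrogate data test: compute a nonlinear statistic on the data and compare it with the statistic's distribution over phase-randomized surrogates, which share the linear (spectral) properties of the original. The statistic and stand-in data below are illustrative.

```python
# Illustrative surrogate data test for nonlinearity.
import numpy as np

def phase_randomized_surrogate(x, rng):
    """Randomize Fourier phases while keeping the amplitude spectrum."""
    f = np.fft.rfft(x)
    phases = rng.uniform(0, 2 * np.pi, len(f))
    phases[0] = 0.0                      # keep the mean
    phases[-1] = 0.0                     # keep the Nyquist bin real
    return np.fft.irfft(np.abs(f) * np.exp(1j * phases), n=len(x))

def time_reversal_asymmetry(x):
    """Vanishes in expectation for any stationary linear Gaussian process."""
    return np.mean((x[2:] - x[:-2]) ** 3)

rng = np.random.default_rng(0)
x = np.empty(4096); x[0] = 0.4           # stand-in data: noisy logistic map
for i in range(1, len(x)):
    x[i] = 3.9 * x[i - 1] * (1 - x[i - 1])
x += 0.01 * rng.standard_normal(len(x))

stat = time_reversal_asymmetry(x)
surr = [time_reversal_asymmetry(phase_randomized_surrogate(x, rng))
        for _ in range(99)]
# if `stat` falls outside the surrogate range, reject the linear hypothesis
print(stat, np.percentile(surr, [2.5, 97.5]))
```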
11:00-12:00 - Symbolic data compression concepts for analyzing experimental data
Matt Kennel, Institute for Nonlinear Science, UCSD
1:00-2:00 - Geometric time series analysis
Mark Muldoon, University of Manchester Institute of Science and Technology
A
discussion of a circle of techniques, all developed within the last 20
years and all loosely organized around the idea that one can extract
detailed information about a dynamical system (say, the equations of
motion governing some industrial process...) by forming vectors out of
successive entries in a time series of measurements.
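A standard companion to the delay-vector idea is the false nearest neighbour test, often used to decide how many successive measurements to stack. The thresholds and test signal below are conventional illustrative values, not material from the talk.

```python
# Illustrative false nearest neighbour test for embedding dimension.
import numpy as np

def fnn_fraction(x, dim, tau=1, rtol=15.0):
    """Fraction of nearest neighbours in dimension `dim` that fly apart
    when the (dim+1)-th delay coordinate is revealed."""
    n = len(x) - dim * tau
    X = np.column_stack([x[i * tau : i * tau + n] for i in range(dim)])
    x_next = x[dim * tau : dim * tau + n]
    false = 0
    for i in range(n):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf
        j = int(np.argmin(d))            # nearest neighbour in `dim` dims
        if abs(x_next[i] - x_next[j]) / max(d[j], 1e-12) > rtol:
            false += 1
    return false / n

# the smallest dim where the fraction drops near zero is a good embedding
rng = np.random.default_rng(0)
x = np.sin(np.linspace(0, 200, 2000)) + 0.01 * rng.standard_normal(2000)
for m in (1, 2, 3, 4):
    print(m, fnn_fraction(x, m))
```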
2:30-3:30 - Chaotic communication using optical and wireless devices
Henry Abarbanel, Institute for Nonlinear Science, UCSD
3:30-4:30 - Status of cosmic microwave background data analysis: motivations and methods
Simon Prunet, CITA (Canadian Institute for Theoretical Astrophysics), University of Toronto
After a brief review of the physics
that motivates measurements of Cosmic Microwave Background anisotropies, I
will present the current observational status, the analysis methods used
so far, and the challenge posed by the upcoming huge data sets from future
satellite experiments.