Evaluation Of Information Service Architectures For Grids
1 Institute of Computer Science, KTH, Stockholm, Sweden
The purpose of this research is to evaluate a grid information system based on two different architectures: a hierarchy of MDS4 services and a flat peer-to-peer system based on Distributed Hashtables (DHT) and measure their precision. It takes the approach of pragmatically comparing the systems and experiments are conducted as regarding their ability to answer to queries when resources churn.
Resources and users in a grid are described and searched upon a set of two or more attributes. Locating resources in a grid is more complex than locating resources in a peer-to-peer system: the reason is that resources need to match multi-attribute range queries, i.e. queries that identify the resources characterized by a set of attributes whose values fall into given intervals. Peer-to-peer DHT systems support mostly exact queries for one search key only. The requirement that multi-attribute and range queries should be performed is therefore set and that the system must also be evaluated under churn. Conclusions are drawn about how the systems behave under these circumstances according to the metrics of precision. This work is closely related to the following two topics of CGW’10: monitoring and information management; distributed computing infrastructures.
Grids and Peer-to-peer systems are both distributed facilities for coordinated sharing of computing resources. They have very different requirements: grids are more secure and reliable. Grids are built for Virtual Organizations (VO) that aggregate resources and users. The Open grid Services Architecture (OGSA) specification lists the services needed to build a grid and they are 1) resource management 2) scheduling jobs 3) information services for metadata.
The focus of this work is on information systems for grids: evaluation of their ability to answer to queries when resources churn is done by experiments. The metrics used is that of precision. Information is usually made available through a centralized or distributed system where the distributed system can be designed in a hierarchical or flat fashion. Hierarchies require global knowledge to be able to work and are usually built as a tree. MDS4 (Globus Toolkit Monitoring and Discovery System)  is an available information system for grids that is organized in a hierarchy of data sources and allows for global knowledge within a grid built with the Globus Toolkit. Tree hierarchy is inefficient that is why flat systems like the DHT (Distributed Hash Table)  based ones emerged. The research is focused and narrowed to metadata about resources and users. The approach taken is that of pragmatically compare the systems. This approach could be not completely theoretically fair but it is important to compare technologies while they operate live.
These constraints and limitations apply: a fixed schema of information throughout the experiments is used. Flexible schema models for discovery of data in grid systems are out of the scope of this work.
Not any similar evaluation has been performed on the infrastructures analyzed in this work. Some related work is available: evaluations of performance of discovery within Grid systems, Peer-to-peer middleware for grid monitoring and discovery and some available DHT systems capable of range queries on multi-attributes. The latest were not created for Grid systems.
2.1 Evaluations of performance of discovery applied to grid systems
Work in  evaluate the performance of a hierarchical grid information system built as an MDS4 system as compared to their super peer model which is a not DHT based peer-to-peer system. The paper  is the only work that like ours conducts an evaluation of many platforms for grid information discovery. This paper is from 2003 and the platform being compared are: a relational one based on mySQL 4.0, Xindice 1.1 which is a native XML implementation, and MDS2 which is mainly based on LDAP. No churn behavior is modeled. The metrics used is that of Query Response Time. Authors of  study the performance of the Globus Toolkit Monitoring and Discovery Service (MDS2), the European Data Grid Relational Grid Monitoring Architecture (R-GMA), and Hawkeye.
None of the mentioned papers takes in account churning behavior.
2.2 Peer-to-peer middleware for grid monitoring and discovery
 proposes an improved resource discovery and monitoring system based on the Pastry DHT, specifically for grid computing environments. The proposed system is composed of multiple Pastry layers. Each layer is composed of resource-bearing nodes which contribute one particular resource attribute, such as CPU or disk, over a specific threshold, to the layer. It is capable of multi attribute searches but has a very primitive support for range queries. Only values being “<” than, or “>” a designated value may be retrieved. Papers in  and  describe usage of unstructured peer-to-peer systems within grid systems.
2.3 DHT tools capable of range queries on multi-attributes
MAAN  is built on Chord and its implementation is not available for use. It should provide both multi-attribute and range queries. A separate DHT layer is kept for each attribute. One DHT lookup for each attribute is performed and the results of each sub-query are intersected to filter out the result.  report some deeper evaluation of the locality preserving hash function used in MAAN as regarding also the balance load.  use also a Space Filling Curve as well to achieve range queries over a DHT but they map a d-dimensional space to a one-dimensional index. Such a construction gives the ability to search across multiple attributes. Xenosearch  is not available for use and uses an extension of the Pastry DHT: one Pastry layer for each attribute type is used. The work  is a recommended reading for a deep analysis of this subject.
A simple discrete-event simulation approach is used that is not dependent on external data instead of a trace-driven discrete-event simulation program.
With this fundamental assumption about the model: a two-state failure model is used which alternates between the recovered and the failed state. Another model that is used in research is a three-state model where these states occur in sequence: recovered, failed but not visible, failed.
These stochastic assumptions apply:
|Failure rate is the average number failures per unit of time per node. It follows the exponential distribution. = 1/TTF.
Repair rate is the average number of repairs per unit of time. It follows the exponential distribution. = 1/TTR.
|The experiment uses two recovery models. In the first model a correlation between TTF and TTR exists, TTR depends on TTF; so ρ is fixed for the experiments and then a TTR value is derived from it for each TTF that is experiment with. The second model has a fixed TTR for all the experiments.
|Failure and repair rate are the same on average for each node.
That is: i = i>0 ; i = i>0.
|Query rate is the number of queries per unit of time. It has a uniform distribution and is constant throughout the experiment.
|The intensity relation between and : .
||The number of nodes and number of clients performing queries is kept constant throughout each experiment. Nodes churn while clients that issue queries do not churn.
||Each node has its own repair facility or team.
Schema of the data sets which are queried: The schema of the data describing CPU processors resources and the schema of the data describing membership to a VO is in the following table. The last is similar to that described in the referenced paper 
Table 1 Schema of the data set for CPU processor data and of the data set for membership data
Size of the experiment: The aim is to imitate so much as possible the size of EGEE II grid. The last figures from  tell about 17.000 users spread among 162 registered VOs with about 20000 computation units. The number of VO is rounded up to 250 and the size of 25000 for both the available computational units and the number of users is chosen.
3.1 Results of the experiment
This is the summary about the parameters of the experiment and the configuration which is common for all the following experiments:
|Object: perform multi-attribute and range query to retrieve records. With churn.
||Number of nodes: 250.
Searchable keys stored in the system: # 25000 (for CPU data and for User data sets).
|Items: saved data records (about membership or CPUs).
||Range/selectivity of queries performed: it covers the whole ranges.
|QOS and metrics of discovery: precision, how many false negative are got.
||Clients: one client performs a query every 5 seconds towards the network of nodes.
Each point in the pictures represents a run of one experiment for the given [ρ, time length, type of network, TTF].
MTTR is calculated from ρ as from the formula for intensity that was given at the beginning of this chapter.
All these experiments were run with a ρ of 0.1 while the λ (= 1/MTTF) varies on values of 0.016; 0.002; 0.025; 0.033; 0.05; 0.1; and 0.5 minutes. The values of λ placed on the x-axis of the diagrams are calculated in the table “Calculation of ”.
node is alive
node is stopped
Table 2 Calculation of = (1/MTTF). Values of MTTF and MTTR of experiments when ρ is 0.1
- Results of experiment with TTF and TTR skewed, ρ = 0.1, 250 nodes, 120 minutes on CPU data
The diagram here below summarizes these results.
Figure 1 – Results: CPU data, ρ 0.1, TTR TTF skewed
MDS4 is almost always better then the chord implementation apart when the TTF is at its shortest limit and when TTF is 0.1. The negative rate of the Chord implementation seems to ‘explode’ upwards when the TTF is shortest. The Chord implementation has also a larger dispersion measured as Standard deviation as pictured here below. The largest part of the information that it is known to be in the system is anyway not returned back from queries and this happens both for MDS4 and the Chord based implementation. A TTF shorter than five minutes is not used as MDS4 does not seem to manage very short intervals. Such short intervals of churn are perhaps not relevant when the scope of the experiment is a science grid while it would be relevant in a desktop grid or in an internet-wide general facility.
Figure 2 – Std.Dev. MDS4 and Chord-multi
3.1.2 Results of experiment with TTF skewed and TTR fixed, ρ = 0.1, 250 nodes, 120 minutes on CPU data
A different recovery model is used: the TTR is fixed now while the TTR still varies. The diagram here below summarizes these experiments.
Figure 3 – Results: CPU data ρ 0.1, TTR skewed TTF fixed
3.1.3 Results of experiment with TTF skewed and TTR fixed, ρ = 0.1, 250 nodes, 120 minutes on Membership data
The DHT query is issued on the attribute ‘Group’ which is uniformly distributed in this case. Please refer to the previous table which describes the schema for User data. A recovery model with a fixed TTR while the TTR still varies is still used.
The diagram here below summarizes these experiments.
Figure 4 – Results: User data, ρ 0.1, TTR skewed TTF fixed
It is observed a slight improvement in the precision of the chord implementation which gets better scores then the MDS4. MDS4 is always better from a TTF equal and greater then 0.050.
Conclusions and future work
The experiments show that the MDS4 and the Chord based system behave differently under this model of churn and under the assumptions that were made for the data sets. MDS4 presents a better precision in general but the difference is not that big. The precision rate that both systems deliver is not particularly satisfying. Large part of the information that it is known being available is never retrieved by these systems. On the positive side is that this means that there is still a lot of work of optimization that could be made.
What would happen if the number of searched attributes is increased from three to five? And what if the number of nodes is taken up to 10.000?
The Chord based system presents a much higher level of dispersion in delivering its precision. Can this be accepted or is it a sign of optimizations that must be improved? This system is also the one that has more possibilities to be optimized and improved.
I suggest a deeper analysis about the composition of queries, the modelling of data, the dynamic behaviour of nodes in the system and other metrics to be used.
Query composition: Processing of queries could be optimized for some or all the subsystems that were evaluated. It should be decided how fair this is as well. Experiments could be run by using different query widths.
Modelling data: Scarce sources of information and studies are available about the dynamics and statistical distributions that govern membership, the data about users of a grid. This is a severe shortcoming for simulating a grid environment. This is also a weakness for the design of future grid systems as this data could be used as basis for the improvement of features like fault tolerance. Data about the distribution of CPUs is not available.
Dynamic behaviour: I hope also that more and more data and studies about resource unavailability will be available. This will make future experiments on discovery of resource more consistent and more useful as input to design of and management of discovery systems. One more model for recovery could be used where say one tenth or more of the nodes always remain connected. Also as  suggests it would be interesting that somebody study how system design choices depend on failure characteristics. This was out of the scope of this work.
Other metrics: The same experiments could be run to measure the service time to queries.
- J. Schopf, I. Raicu, L. Pearlman, et al., Monitoring and discovery in a web service framework: functionality and performance of Globus Toolkit MDS4.
- I. Stoica, R. Morris, et al., Chord: a scalable peer-to-peer lookup service for Internet applications, Proceedings of ACM SIGCOMM 2001.
- A.R. Bharambe, M. Agrawal, S. Seshan, Mercury: Supporting Scalable Multi-Attribute Range Queries, Proc. ACM SIGCOMM 2004, pp. 353-366, 2004.
- S. Bharathi, A. Chervenak, Design of a Scalable Peer-to-Peer Information System Using the GT4 Index Service.
- Ian Chang-Yen, D. Smith Nian-Feng Tzeng, Structured Peer-to-Peer Resource Discovery for Computational Grids, University of Louisiana at Lafayette.
- http://cic.gridops.org .
- M. Cai, M. Frank, J. Chen, P. Szekely, Maan: a multi-attribute addressable network for grid information, 2004.
- M. Marzolla,M. Mordacchini,S. Orlando, Peer-to-peer systems for discovering resources in a dynamic grid, Parallel Computing 33 (2007) 339–358.
- C. Mastroianni, D. Talia and O. Verta, Evaluating Resource Discovery Protocols for Hierarchical and Super-Peer grid Information Systems, 15th EUROMICRO International Conference on Parallel, Distributed and Network-Based Processing (PDP’07).
- Beth Plale, Resource Information Management in grid Middleware: Evaluation of Multiple Platforms with a Benchmark/Workload, Indiana University.
- C. Schmidt, M. Parashar, Enabling flexible queries with guarantees in P2P systems, IEEE Internet Computing 2004.
- B. Schroeder, G. A. Gibson, A large-scale study of failures in high-performance computing systems, International Conference on Dependable Systems and Networks (DSN 2006), pages 249-258, 2006.
- P.Trunfio, D.Talia, et al., Peer-to-Peer Models for Resource Discovery on grids, CoreGRID Technical Report Number TR-0028, 2006
- R.Alfieri, R. Cecchini et al., From gridmap-file to VOMS: managing authorization in a grid environment, Future Generation Comp. Syst. , Vol. 21 , Nr. 4 (2005)
- D. Spence, T. Harris, Xenosearch: Distributed resource discovery in the xenoserver open platform, HPDC-12: Symposium on High Performance Distributed Computing, IEEE, 2003.
- X. Zhang, J. L. Freschl, J. M. Schopf, Scalability analysis of three monitoring and information systems: MDS2, R-GMA, and Hawkeye, 2005-2007.