Topics in Database Systems: Data Management in Peer-to-Peer Systems Fall Semester 2006 ---------------------------------------------------------------------- last updated 2/12/2005 Topics for Term Projects ------------------------- Below is a list of suggested topics. The rating at the end of the description of each topic is a rough estimation of the required effort: from 5 (most effort) to 0 (no effort). Regarding "implementation", what is required is some form of evaluation of what you propose. I do not expect a thorough theoretical evaluation or implementation. 1. Approximate Associative P2P Overlays Previous research has proposed clustering together nodes based on the premise that peers that have answered previous queries for the peer issuing the query are more likely to have an answer. The authors of [1] introduce the notion of a "guide rule" as a set of peers that satisfy some predicate. Then, the routing of the query is restricted within the guide rules specified by the peer issuing the query. The solution in [1] induces large overheads because participation in a guide rule is determined on a per data item. The goal of this project is to use histograms as suggested in [2] to "approximate" the guide rules, so that participation in a rule is based on a set of items rather than a single item.. [1] E. Cohen, A. Fiat, H. Kaplan. "Associative Search in Peer to Peer Networks: Harnessing Latent Semantics". INFOCOM 03 [2] G. Koloniari, Y. Petrakis, E. Pitoura and T. Tsotsos. "Query Workload-Aware Overlay Construction Using Histograms". CIKM 2005. Well defined Theory: 5 New Ideas (creativity): 4 Amount of Reading Required: 2 Implementation: 3 2. Freshness-based Resource Discovery in P2P Systems The idea is to design a search mechanism that takes into account the time (when) each item is published. For instance, priority may be given to the items that have been published more recently. This may require recording time-related information along with each item and extending queries with the possibility of including time. There may be some relation with publish/subscribe p2p systems. In publish/subscribe systems, users subscribe to resources by posing a query that describe the resource of interest (e.g., by providing a keyword such as "Panathinaikos" or "Olymbiakos"). When a new item is published, the system checks whether there are any users whose subscription matches the new item and if so, it notifies them. The P2P system is used to store the subscription queries. For example, when a user poses a subscription query such as "Panathinaikos" or "Olymbiakos", "Panathinaikos" or "Olymbiakos" are hashed to a DHT node along with the ID of the subscriber. When a new item on "Panathinaikos" or "Olymbiakos" is published, again "Panathinaikos" or "Olymbiakos" is hashed and the IDs of the subscribers (if any) are retrieved. Note that publish/subscribe systems can only notify users for new items. ~defined Theory: 3 New Ideas (creativity): 5 Amount of Reading Required: 3 Implementation: 3 3. Transactions and P2P Does it make sense to enforce some (which?) of the ACID properties in a p2p setting. Which of the transaction-related protocols (2-phase locking, 2-phase commit, etc) have a p2p counterpart? There is some related work [3, 4] in the context of grid/cluster transactions. This project requires background knowledge on transaction management in database systems. [3] C. Turker, K. Haller, C. Schuler, H-J. Schek. "How can we support Grid Transactions? Towards Peer-to-Peer Transaction Processing". CIDR 2005 [4] K. Haller, H. Schuldt, C. Turker. "Decentralized Coordination of Transactional Processes in Peer-to-Peer Environments." CIKM05 ~defined Theory: 3 New Ideas (creativity): 5 Amount of Reading Required: 2 Implementation: 3 3. Deploying a P2P System on PlanetLab We have recently become members of PlanetLab [5]: a consortium of world-wide institutions offering and using resources (computers) for implementing and evaluating wide-area experiments in distributed computing. The goal is to deploy a P2P system on top of PlanetLab. You can choose any, structured or unstructured, P2P system. Currently Bamboo [6] (a DHT p2p system) is running on PlanetLab, so you may use it to build a small instance of a p2p file sharing application. [5] Planet Lab web site. http://www.planet-lab.org/ [6] Bamboo site. http://bamboo-dht.org/ ~defined Theory: 1 New Ideas (creativity): 2 Amount of Reading Required: 2 (basically, manuals) Implementation: 5 4. Skyline (or other advanced types of) Queries on P2P The skyline of a set of d-dimensional points contains the points that are not dominated by any other point on all dimensions. Skyline queries in relational database systems were introduced in [7]. There are important in the case of multi-criteria optimization, for example, when selecting a hotel for booking based on its price, distance from the city center, etc. There has been some very recent work on implementing skyline queries in unstructured p2p systems [8] and constrained skyline queries on top of CAN [9]. As a starting point, you may consider how they can be implemented in CHORD, consider unconstrained skyline queries in CAN, or new ways to implement them in unstructured p2p systems. [7] S. Borzsonyi, D. Kossmann, K. Stocker. "The Skyline Operator." ICDE 2001 [8] K. Hose: Processing Skyline Queries in P2P Systems, VLDB 2005 PhD Workshop, 2005 [9] P. Wu, C. Zhang, Y. Feng, B. Y.Zhao, D. Agrawal and A. El Abbadi. "Parallelizing Skyline Queries for Scalable Distribution" EDBT 2006 ~defined Theory: 3 New Ideas (creativity): 4 Amount of Reading Required: 3 Implementation: 3 ** NEW ** 5. Reputation and Trust Management in P2P Survey. You will read a number of papers and write an article summarizing them. Well defined Theory: 2 New Ideas (creativity): 3 Amount of Reading Required: 5 Implementation: 0 [10] Roger Dingledine, Nick Mathewson, and Paul Syverson. Reputation in P2P Anonymity Systems. Workshop on Economics of Peer-to-Peer Systems. June 2003. Berkeley, CA. http://www.freehaven.net/doc/econp2p03/econp2p03.pdf [11] E Damiani, SDC Vimercati, S Paraboschi, P Samarati A reputation-based approach for choosing reliable resources in peer-to-peer networks ACM Conference on Computer and Communications Security, 2002 [12] K Aberer, Z Despotovic. Managing Trust in a Peer-2-Peer Information System. CIKM 2001 ** NEW ** 6. Incentives for Sharing in P2P Survey. You will read a number of papers and write an article summarizing them. Well defined Theory: 2 New Ideas (creativity): 3 Amount of Reading Required: 5 Implementation: 0 [13] Tsuen-Wan Ngan, Dan Wallach, Peter Druschel. Enforcing Fair Sharing of Peer-to-Peer Resources. Proceedings of the Second International Workshop on Peer-to-Peer Systems (IPTPS '03). February 2003. Berkeley, CA. http://www.cs.rice.edu/~dwallach/pub/iptps2003.html [14] Karthik Tamilmani, Vinay Pai, and Alexander E. Mohr. SWIFT: A System With Incentives For Trading. Proceedings of the 2nd Workshop on the Economics of Peer-to-peer Systems. June 2004. Cambridge, MA. http://mnl.cs.sunysb.edu/home/vinay/papers/swift-p2pecon.pdf [15] Incentives Build Robustness in BitTorrent B Cohen. Workshop on Economics of Peer-to-Peer Systems, 2003 7. **NEW** Publish/Subscribe in P2P Survey. You will read a number of papers and write an article summarizing them. Well defined Theory: 2 New Ideas (creativity): 3 Amount of Reading Required: 5 Implementation: 0 [16] W Terpstra, S Behnel, L Fiege, A Zeidler, A Peer-to-Peer Approach to Content-Based Publish/Subscribe DEBS 2003 [17] P Triantafillou, I Aekaterinidis Publish-Subscribe Over Structured P2P Networks DEBS 2004 *. Evaluation of Existing P2P Systems Choose at least 2 popular P2P systems and design and implement a number of tests for evaluating them. The evaluation criteria may include for example the coverage (for instance, the percentage of the queries you posed that were satisfied), response time, easy of use, security, etc. Well defined Theory: 1 New Ideas (creativity): 3 Amount of Reading Required: 2 Implementation: 4 8 - Other You may suggest your own project. ==++===============================================================++++= Milestones: Nov 30: Form teams (up to 3 people) and select topic Dec 7: 2-3 pages project proposal (details later) xxx : Intermediate presentation (optional) Jan 27: Report (up to 5000 words, format of a research paper (e.g., abstract, introduction and conclusion sections, related work section - more details later) last week of Jan: Workshop (15-25' talk per team, depending on the number of teams)