diff --git a/3DM_RFC/RIW2021_RFC_Broker-based Content Discovery for Closed Content Ecosystems.md b/3DM_RFC/RIW2021_RFC_Broker-based Content Discovery for Closed Content Ecosystems.md
new file mode 100644
index 0000000..260f858
--- /dev/null
+++ b/3DM_RFC/RIW2021_RFC_Broker-based Content Discovery for Closed Content Ecosystems.md
@@ -0,0 +1,96 @@
+# RIW2021 | RFC: Broker-based Content Discovery for Closed Content Ecosystems
+
+_Status:_ < draft >
+
+_Area of Improvement:_ Distribution Graph Forming
+
+_Estimated Effort Needed:_
+
+_Prerequisite(s):_
+
+_Priority:_
+
+### Abstract
+
+This RFC targets content discovery, resolution and delivery for closed content ecosystems (e.g., Spotify), where content cannot be consumed outside the application. The proposal is based on the fact that, in the majority of cases, content is linked to some “context”. This can be, for example, a video linked from a webpage. Metadata attached to the aforementioned video link points to the broker that either has a copy of the content or knows where to find it.
+
+
+### Proposal/Construction
+
+Assumptions of the proposal include:
+- Users find CIDs through links in websites or apps.
+- Content publishers play the role of the tracker. They know the Provider node with which they have a deal. They tell their users to go find the content there.
+- Content publishers can outsource this operation to independent brokers. For instance, the broker for my website is the hosting company. We call content publishers brokers in the following, for simplicity.
+- The links resemble BitTorrent-like magnet links, which include the CID, the content publisher or broker, as well as metadata. Metadata is a very central point in this proposal. What is included in the metadata is still to be defined, but it could be the map of brokers storing the content, or a semantically meaningful name. Metadata should help with discovery; in general, metadata helps with finding seeders.
+
+The architecture consists of:
+1. 
end-users requesting content,
+2. Provider nodes that store content
+3. Brokers that know which Provider stores which content/CID
+4. Websites that link to content, through a CID and metadata that help with content discovery (e.g., provide a map of brokers that know about this CID)
+5. A DHT, where Providers publish provider records. The DHT is consulted by the brokers at regular time intervals to get updated information.
+
+The main content retrieval operation is as follows (see also figure):
+1. Client finds the “magnet link” on a website (and clicks it)
+2. Link returns the broker’s address
+3. Client requests CID from broker
+4. Broker returns Provider's address (alternative: broker forwards request to Provider directly)
+5. Client requests CID from Provider (alternative: Provider delivers content to client)
+6. Provider delivers content to client
+
+
+
+Things that happen in the background to support the above sequence are:
+1. There is a DHT where Providers publish provider records for the content they store
+2. Brokers ask the DHT for the providers of the CIDs they’re keeping an index of at regular time intervals, but out of band (i.e., not in real time when they receive the request from clients). This ensures that they have up-to-date information and that the request does not have to wait for the slow DHT walk in order to complete.
+3. In case content becomes popular in some other network/geographic area, the local Provider in that area should fetch the content and let the broker know, so that the broker promotes it too. It is not clear how the Provider that wants to replicate content will know which broker to update. A potential solution is for this information to be included in the metadata of the magnet link so that a new Provider can get it from there. In this case, there is an issue when the publisher decides to change broker.
+4. 
The fetching of content by the new Provider discussed above should be supported by the economic model: the original Provider will want to satisfy all requests themselves in order to get all the profit. However, we should incentivise replication of content in order to achieve lower delivery latency.
+
+### Impact
+
+This is a stable and simple design for closed content ecosystems, which applies to a large proportion of the applications we use today (e.g., Spotify, Netflix), where content is consumed from within an application (or perhaps a website) and the content publisher has complete control over the content (what is published, where it is stored, etc.).
+
+### Pros and Cons
+
+Pros
+- Brokers always have up-to-date information. The fact that clients get links/CIDs through websites or applications means that a broker is never (under normal circumstances) going to be asked for a link/CID they don’t know.
+- It is a clever way to cover a large number of use-cases in a safe manner, i.e., without content discovery failures.
+- It can be as fast as DNS.
+- Simple design
+
+Cons
+- The system seems to break if the content publisher decides to move their content to a different broker, since it is very difficult/impossible to update all websites, forums, or even browser bookmarks that include the magnet link. This will result in content discovery failure. A potential solution is for the user to fall back to the slow path and ask the DHT. However, the above will not be an issue in “closed content ecosystems”, such as Spotify, where you can’t consume content outside the application and the publisher has complete control over the content through the app.
+- It is not clear how the Provider that wants to replicate content will know which broker to update. A potential solution is for this information to be included in the metadata of the magnet link so that a new Provider can get it from there. In this case, there is an issue when the publisher decides to change broker. 
+- An issue that arises is how we can update the magnet link with the list of brokers. Ex.: I don’t want to always be asking a broker very far away to let me know of Providers near me to fetch the content. How can the system assign a new broker near me to keep track of Providers near me that store the content? To my understanding this cannot happen automatically, and the only way to do it is for the content publisher to do it manually.
+
+Other notes:
+- Is the scheme giving too much power to the brokers?
+- How can we avoid giant broker monopolies and centralisation?
+- Who is paying the broker?
+- There could be a “finder’s fee” for the broker
+- How do you make sure that a broker is not giving you wrong content?
+- It seems that by having the broker as a very central entity that is responsible for content discovery, the system loses the benefits of content addressing. Ex.: the system does not seem to be able to find a user near me that has the content and route my request to them, instead of going to the content provider that is far away.
+
+### Implementation notes
+
+The design can build on existing protocols within the IPFS ecosystem. For example:
+
+- The DHT structure -> libp2p DHT
+- When connecting the client to the list of Provider nodes, the broker can use bitswap
+- Provider updates (i.e., when a new Provider node stores a hot copy of some content) can propagate to brokers through libp2p’s pubsub/gossipsub.
+
+### Evaluation
+
+**Qualitative**
+- We should make sure the scheme does not result in content discovery failure. It does not seem to be the case for closed content ecosystems, but we need to double-check.
+- For more general content consumption we should have a backup solution. This could be that when a content publisher changes broker, and therefore the metadata of CIDs in websites do not point to the right broker anymore, the request ends up going through the DHT. 
This can become the default way in the long run, when content has moved between several different brokers.
+- We should find a solution for the case where new brokers are added to the list by the content provider. What is the impact on the system? How can we update the list of brokers?
+
+**Quantitative**
+- Taking into account connection establishment, how long does it take to resolve content and start the delivery?
+- How efficient is the proposal at finding the nearest replica, i.e., the Provider closest to the client that has decided to keep a hot copy?
+- Can we integrate “Transient Provider nodes” (i.e., normal end-user peers) in the scheme?
+
+
+### Prior work
+Related work from which the proposal takes inspiration.
diff --git a/3DM_RFC/RIW2021_RFC_Consume Global, Serve Local.md b/3DM_RFC/RIW2021_RFC_Consume Global, Serve Local.md
new file mode 100644
index 0000000..195b086
--- /dev/null
+++ b/3DM_RFC/RIW2021_RFC_Consume Global, Serve Local.md
@@ -0,0 +1,95 @@
+# RIW2021 | RFC: Consume Global, Serve Local
+
+_Status:_ < draft >
+
+_Area of Improvement:_ < Opportunistic Deployments >
+
+_Estimated Effort Needed:_
+
+_Prerequisite(s):_
+
+_Priority:_
+
+### Abstract
+
+This RFC addresses the use case of heavy content distribution (ideally video) in local neighborhood environments, where the Provider node is connected to residential users. The content catalogue of a service (say, Netflix or the national broadcaster) is split into chunks and stored on end-user home devices. The Provider is connected to a critical mass of tens of thousands of local users and redirects requests to serve content locally.
+
+### Setup & Assumptions
+
+- Picture the environment where the ISP’s DSLAM/GPON is connected to ~20k-40k houses. Now, replace the ISP’s DSLAM/GPON with the Provider node's gear.
+- Consider a content publisher (e.g., Netflix, or a local national broadcaster) who has a video content catalogue.
+- The videos are split into chunks and stored in the residential premises. 
This can be in user devices (laptops, desktops), WiFi APs, or set-top boxes.
+- Depending on what user devices we consider, the connections can be quite stable, but we still consider the environment opportunistic, as users might disappear, stop serving content, or unplug things from the mains.
+- The Provider would need random access to the content, and content would need to be in small chunks so that it is served from several users, avoiding saturation of any one user’s uplink. Erasure coding can help with collecting pieces from several peers.
+
+
+### Construction
+
+1. Clients request content from their local Provider by CID.
+2. The Provider keeps track of which users store what content (identified by CID).
+3. The Provider redirects requests to the users that store the requested CID.
+4. Users return content to the client through the Provider.
+5. Given that the Provider sees all the content flowing through itself, it can judge whether the user’s uplink is saturated and redirect to other users (if available), or apply DASH to reduce the rate.
+ - If the users are within WiFi proximity of each other, they can connect directly.
+ - The transfer can avoid going through the overlay Provider and connect directly through the ISP’s infrastructure, but in this case, the Provider will not be able to monitor, adjust the rate, and learn about new peers that have replicated the content. It is not clear how much faster the direct connection can be.
+6. If neighbours can support the requesting user’s HD stream, then all good; if not, the client connects back to the main server or a higher-tier Provider. This is monitored by the Provider, who can decide whether to continue streaming from the local peer or go to a higher-tier Provider.
+7. The local Provider can keep local copies of very popular content. 
+ - When edge node saturation reaches “critical mass”, the Provider node no longer needs to store the content itself, as long as it is kept in a higher-order cache (i.e., Europe -> Germany -> Neighborhood/City)
+ - The Provider can assess the rate of delivery, and then determine better options
+ - E.g.: Netflix dropping quality from 1080p to 480p for faster delivery
+
+
+
+- Both Providers and end-users serving content should be rewarded and therefore supported in the cryptoeconomic model.
+- What if peers cut the Provider out of the loop?
+ - They will likely experience reduced quality, as the Provider is not able to monitor and adjust.
+ - The [Wi-Stitch](https://dl.acm.org/doi/10.1145/3098208.3098211) paper is related.
+
+### Pros and Cons
+
+Pros:
+- Product-oriented solution that is easily implemented in a content-addressable network
+- Provides huge bandwidth savings and benefits to ISPs
+- Decreases load and delivery times for end-users
+- Decreases storage and delivery costs for the content publisher
+
+Cons:
+- Privacy issues not solved in the current RFC
+- Upload bandwidth saturation needs to be avoided; a smart algorithm, plus an evaluation of it, is needed for that
+- Unclear where the Provider node should be located in the network topology
+
+### Implementation notes
+
+- Metering and trust model
+ - We can consider a model where the original content publisher (e.g., Netflix) gets notified by the application itself when the user clicks ‘play’. This helps a lot with accounting.
+ - How about accountability and the fact that the Provider or other content publishers/caches can send random bits? The original publisher could provide a key to decrypt the content, or check the hash of the whole content, some random parts of it, or some combination.
+ - QoS can be checked by the client-side media player. The above technique with “surprise checks” can also be used to check whether content is delivered in a timely manner. 
+- Connectivity and bandwidth
+ - Clients connect to a Provider, who has deployed a cluster of nodes; the question is how clients get connected to their neighbours.
+ - Through the Provider, who is connected to everyone.
+ - Preconfiguration of local peers with the local Provider node.
+ - How can app developers control the rate? How does the switchover happen when my neighbours’ bandwidth is not enough to serve HD video requests?
+
+- Privacy
+ - Opportunistic setups are not privacy-friendly, generally speaking. Neighbours can see what one is requesting.
+ - There are simple encryption techniques that are not super strong, but can make things better.
+ - Another idea is to hash the hash of the content, as has been discussed for IPFS denylists.
+
+### Evaluation
+
+- Upload capacity of users should not be saturated. Evaluation is needed to identify at which point QoS suffers
+- Assess erasure-coding-based approaches and their benefits in avoiding uplink saturation
+- Time to First Byte compared to a traditional CDN setup
+
+
+### Prior work
+
+- Suh, K., Diot, C., Kurose, J., Massoulie, L., Neumann, C., Towsley, D., Varvello, M., [Push-to-peer video-on-demand system: Design and evaluation](https://ieeexplore.ieee.org/abstract/document/4395129?casa_token=QVwdZGNLekoAAAAA:WIbAVHxdIRvKRj1AY6AsamLcdvJwNSTP6oglwYLb1iRjg_QxxlivFdmUFjHQ7PjLEar9hUahFQ), IEEE Journal on Selected Areas in Communications, 25(9), 2007.
+- [Wi-Stitch](https://dl.acm.org/doi/10.1145/3098208.3098211)
+- Amiri, M.M., Gündüz, D., [Fundamental limits of coded caching: Improved delivery rate-cache capacity tradeoff](https://ieeexplore.ieee.org/abstract/document/7782423?casa_token=nq0PcmikCKAAAAAA:kAgWYU8gjIm2wiIRgrbOHquPzsfcCGBuX4pgqzZ97rWYq0aSwjSlFYNzyOnFyVxPchj134lcGQ), IEEE Transactions on Communications, 65(2):806–815, 2017. 
+- Anjum, N., Karamshuk, D., Shikh-Bahaei, M., Sastry, N., [Survey on peer-assisted content delivery networks](https://www.sciencedirect.com/science/article/pii/S1389128617300464), Computer Networks, 116:79–95, 2017.
+- Shanmugam, K., Golrezaei, N., Dimakis, A.G., Molisch, A.F., Caire, G., [Femtocaching: Wireless content delivery through distributed caching helpers](https://ieeexplore.ieee.org/abstract/document/6600983?casa_token=kLqwrjL-QDAAAAAA:kBs_OkhEFTMoByEoCKXqUM_EKojv_GgV27VC2hdwz7YlLl3T6ggjDAR-G-D1Zx3MYREzIIFdpg), IEEE Transactions on Information Theory, 59(12):8402–8413, 2013.
+
+
+
diff --git a/3DM_RFC/RIW2021_RFC_Credit-based Retrieval Network.md b/3DM_RFC/RIW2021_RFC_Credit-based Retrieval Network.md
new file mode 100644
index 0000000..34a0a47
--- /dev/null
+++ b/3DM_RFC/RIW2021_RFC_Credit-based Retrieval Network.md
@@ -0,0 +1,248 @@
+
+
+# RIW2021 | RFC: Credit-based Retrieval Network
+
+_Status:_ **draft**; **~~ready for review~~**; **~~ready to publish~~**
+
+_Area of Improvement:_ Cryptoeconomics
+
+_Estimated Effort Needed:_ <?>
+
+_Prerequisite(s):_ <?>
+
+_Priority:_ <? P0, P1, P2>
+
+
+### Abstract
+
+This RFC proposes an economic model for retrieval networks based on a trustless credit network. Clients and content publishers exchange credits backed by an escrow deposit. These credits are used to pay retrieval providers for their services. For providers to mint their reward, they need to commit the received credits on-chain. The total reward is proportional to the number of credits they hold.
+
+
+### **Proposal/Construction**
+
+This RFC considers the following participants in the system:
+
+
+
+* Clients that retrieve content by CID from the retrieval service. They may be charged a monthly fee, or not charged at all for the services they use.
+* Publishers store content on the Filecoin (or IPFS) network, and register it with the retrieval market for a per-CID fee.
+* Storage miners (alternatively, the publishers) keep cold copies of the content. 
+* Profit maximizing (retrieval) providers who are paid in proportion to their services and represent the basic infrastructure of the retrieval network. + +The RFC proposes the design of a credit network for fast content trade between retrieval providers. The network uses an on-chain smart contract to keep track of participants' escrows and for settlement purposes. + +The retrieval service is organized in consecutive global sessions of fixed duration. E.g., each calendar day is one session. (Each session is a smart contract, which is itself subordinate to a “master” contract which manages the succession of sessions.) + +Retrievals in this RFC are organized in three stages: + + + +* **Session setup:** Specifies the content to be served through the retrieval network and performs all on-chain and off-chain setups required to make the session content available for retrieval (i.e. “hot and ready”). + * To serve content through the network, publishers need to store this content in Filecoin (or IPFS) or be willing to provide cold copies (with the right encoding) themselves. + * In order to join a session, the publisher pays a fee to the session’s contract; pays the retrieval fee in the contract; and advertises the CID and the location of the cold copy (e.g. miner and sector or address of publisher). + * Providers create escrows of desired sizes (which depend on their expected traffic imbalance). These escrows back an independent credit system for payments between providers, and are entirely independent of the session contracts. Escrows can be refreshed at any time as needed. + * Once the session has been established, providers download the list of authorized clients, as well as the cold storage locations of the participating content CIDs. +* **Content delivery: **Providers deliver content to clients and to other providers, upon request. 
Clients pay providers (for content) using client-credits (which represent shares of session fees), whereas providers pay providers using provider-credits (which represent hard currency, e.g. FIL, stored in the provider escrows).
+    * Clients request content from a chosen provider by CID.
+    * The provider authenticates the client (needed only if clients are required to pay for service).
+    * The provider queries the retrieval network for cached copies of the CID (leveraging the graph forming infrastructure in place, i.e. DHT, gossipsub, NDN, etc.).
+    * If the content is cached by other providers, the provider may pick the closest one and buy the content, paying with provider-credits. Otherwise, if there is no hot copy of the content in the retrieval network, the content must be bought from cold storage (using FIL) for its subsequent caching (or downloaded from the publisher themselves). After a hot copy of the content is retrieved, the content is forwarded to the client. The client pays for the content using client-credits.
+* **Settlement:** As a result of delivering content, providers accumulate client-credits and provider-credits. Both of these are redeemed for value (e.g. a hard currency, like FIL) during settlement after the end of a session.
+    * Provider-credits are redeemed for the value they correspond to, based on the escrows they are backed by. Note that provider-credits correspond to value one-for-one, as they are entirely based on a conventional credit system (as described in the relevant papers).
+    * Client-credits are different. A client-credit corresponds to a share of the “session revenue”, which is the sum of all fees paid by all clients and all publishers (this does not include the escrows paid by providers!).
+    * Revenue distribution is computed after the end of the session period as follows:
+        * Revenue = client fees + publisher fees. 
+        * Provider revenue share = number of client-credits collected by provider / total number of client-credits
+        * Provider revenue = total revenue * provider revenue share
+
+
+
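As an illustration, the revenue-distribution arithmetic above can be sketched in a few lines; all fee amounts, provider names, and credit counts below are hypothetical:

```python
# Sketch of end-of-session revenue distribution (all numbers hypothetical).
client_fees = 700.0      # sum of all per-session client fees
publisher_fees = 300.0   # sum of all per-CID publisher fees
revenue = client_fees + publisher_fees   # session revenue (escrows excluded)

# client-credits collected by each provider during the session
credits = {"provA": 600, "provB": 300, "provC": 100}
total_credits = sum(credits.values())

# provider revenue = total revenue * (provider's credits / total credits)
payouts = {p: revenue * c / total_credits for p, c in credits.items()}
print(payouts)  # {'provA': 600.0, 'provB': 300.0, 'provC': 100.0}
```

Note that the payouts always sum to exactly the session revenue, so the scheme distributes the whole pot regardless of how credits are split.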


+
+
+![alt_text](images/image1.png "image_tooltip")
+
+
+
+### Settlement speed optimization
+
+Every chain-based system that settles off-chain tokens (in our case client-credits and provider-credits) for on-chain value requires that the tokens be uploaded to the chain. In our application, the number of tokens is extremely large (it roughly equals the number of requests in a day), making it impractical to upload all of them to the chain. To solve this problem we utilize a “lottery” technique (described in the [MicroCash paper](https://arxiv.org/abs/1911.08520)).
+
+This technique is generic: It is independent of how value is assigned to a token. Thus, in our case, it can be used for both client-credits and provider-credits, separately.
+
+Here is how the lottery algorithm works, in short.
+
+Suppose T is an off-chain token, held by party P, that must be redeemed for on-chain value.
+
+Denote by V(T) the value assigned to token T by the application logic.
+
+The protocol for redeeming T is as follows:
+
+
+
+1. Determine if T is a “winning” token. Each token is “winning” independently with probability W, say 5%, based on a common public randomness source.
+2. If T is winning, P uploads T to the chain and receives value V(T)/W for it.
+3. If T is not winning, nothing transpires.
+
+What is notable about this protocol:
+
+
+
+* Only a fraction W, in our example 5%, of the tokens are uploaded to the chain.
+* The expected value of each token is V(T).
+
+When a large number of tokens is redeemed, the actual value received by a participant is almost exactly equal to the expected value. (This is a simple consequence of the [Chernoff bound](https://en.wikipedia.org/wiki/Chernoff_bound), and is proved in the MicroCash paper.)
+
+
+### Design discussion
+
+This RFC is designed to showcase two different payment models for content:
+
+
+
+* Clients pay fixed per-session fees,
+* Providers pay per-CID fees. 
+
+These two models are implemented, respectively, by:
+
+
+
+* Client-credits, which represent a share of the fee revenue, and
+* Provider-credits, which are tethered to value (e.g. a hard currency) through escrows.
+
+A simplified version of this design can be obtained by removing the client role (and the client-credit logic). This results in a retrieval market where all participants assume the “provider” role and pay per CID. Note that being a “provider” does not obligate you to cache and sell content, thus real-world clients can use the “provider” role as well.
+
+The advantage of the simplified design is that it is a stepping stone towards the complete design, which adds time-based subscription models. **We advise that a first implementation attempt targets the simplified design.**
+
+
+### Technical details
+
+
+#### Credit systems and double spending
+
+Our design is agnostic to the type of credit lines used.
+
+There are two types of credit systems in the literature, which differ in the type of the credit constraint:
+
+
+
+* In a _bilateral_ credit system a credit line pertains to one borrower and one lender
+* In a _multilateral_ credit system a credit line pertains to (one or) multiple borrowers and (one or) multiple lenders
+
+For instance, the traditional credit card system implements a multilateral credit system, where the credit of one borrower (the customer) pertains to all lenders (the merchants of goods).
+
+Similarly, in this RFC, in the provider-credit system, providers are both borrowers and lenders.
+
+
+
+* If provider-credit is bilateral, each provider will need an individual credit line (and therefore escrow) for each other provider they intend to do business with.
+* If provider-credit is multilateral, each provider will need a single credit line (backed by a single escrow) for all trades regardless of counterparty.
+
+Multilateral credit is strictly more flexible than bilateral credit. However, there is a trade-off. 
Decentralized implementations of bilateral credit can prevent double-spending (in real time), whereas multilateral implementations detect double-spending at settlement time.
+
+In summary:
+
+| | Bilateral | Multilateral |
+| --- | --- | --- |
+| Double-spending in decentralized setting | Prevented | Detected at settlement |
+| User friendliness | Participants can trade directly with a small set of chosen counterparties | Any pair of participants can trade |
+ + +**Our recommendation** is to use a multilateral credit system and address detected double-spending by “freezing” the account of the offender for a duration of time, proportional to the overspent amount. (The approach taken by the US Bankruptcy system.) Since penalties are assigned to on-chain accounts, for the former to be effective, accounts must be Sybil-resistant. This is addressed in the next section. + + +#### Sybil-resistant accounts for penalty attribution + +Each participant in the retrieval market is an “account” on-chain, identified by a public key. Account identities are used to authenticate parties in a trade, as well as to assign penalties when double spending is detected. The state of an account (e.g. outstanding penalties) is stored on-chain for everyone to see. + +Users may be tempted to create new accounts, when old ones are penalized, thereby performing a Sybil attack. To counter this, we utilize a mechanism that provides higher quality of service to accounts that have successfully completed a larger number of trades in the past, thereby introducing a significant cost to abandoning an existing account (e.g. to avoid penalties). This is accomplished with the following protocol: + + + +* The chain stores the total volume of successfully settled past transactions for every account, and +* Providers prioritize requests, based on the historical volume of the requesting account. + + +#### Collusion attacks + +Collusion opportunities depend on the payment model. Thus per-CID and per-session fees are analyzed separately. Both are collusion-resistant: + + + +* Per-CID payments are collusion-resistant because content is directly exchanged for value, so everyone’s wealth is preserved after each trade. +* Per-session payments (where client-credits correspond to a share of the deposited fee) are collusion-resistant, because (by design) all credits issued by a single client represent shares only of that client’s deposited fees. 
+
+**Impact**
+
+The RFC proposes a reward model that can be embedded into any of the design proposals for the rest of the components of the system. In particular, it does not address fair exchange, and instead assumes that a fair-exchange protocol (for exchanging credits for content) is given.
+
+It can easily be plugged into any design and imposes no requirements on the rest of the components. It enables the coexistence of several economic models over the RFC’s infrastructure.
+
+
+### Pros and Cons
+
+**Pros**
+
+
+
+* Provides high-bandwidth payments, using a low-bandwidth chain. (The credit network is a “layer 2” infrastructure.)
+* Supports multiple payment models.
+
+**Cons**
+
+
+
+* Assumes a fair-exchange protocol.
+
+
+### **Evaluation**
+
+**Qualitative**
+
+
+
+* The RFC increases the size of the retrieval market (demand for content retrieval) in the Filecoin network.
+* The RFC provides high liquidity: Given a fixed supply and demand, the market maximizes the number of matches.
+
+**Quantitative**
+
+
+
+* End-user latency and bandwidth are sufficient for a good UI experience (e.g. snappy browsing).
+* Simulation analysis of the full economic model to understand the Nash Equilibrium.
+    * Is there an equilibrium?
+    * Is the equilibrium fair?
+* End-user overhead of retrieval. 
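To illustrate the lottery-based settlement from the “Settlement speed optimization” section, here is a minimal simulation; the function name, seed, token values, and the 5% win probability are illustrative, and it assumes a winning token redeems V(T)/W so that the expected value per token is W · V(T)/W = V(T):

```python
import random

def redeem(token_values, win_prob=0.05, seed=42):
    """Lottery settlement sketch: each token wins independently with
    probability win_prob (stand-in for the common public randomness
    source); winners are uploaded on-chain and redeem value/win_prob."""
    rng = random.Random(seed)
    uploaded = 0       # tokens actually uploaded to the chain
    received = 0.0     # total on-chain value received by the holder
    for value in token_values:
        if rng.random() < win_prob:
            uploaded += 1
            received += value / win_prob
    return uploaded, received

tokens = [1.0] * 100_000   # 100k unit-value tokens (one per request)
uploaded, received = redeem(tokens)
# Only ~5% of tokens hit the chain, while the realized value
# concentrates around the expected 100,000 (Chernoff bound).
print(uploaded, round(received))
```

Re-running with different seeds shows the realized total staying within a few percent of the expectation, which is the concentration property the MicroCash analysis relies on.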
+
+
+### Prior work
+
+
+
+* [MicroCash: Practical Concurrent Processing of Micropayments](https://arxiv.org/abs/1911.08520)
+* [Liquidity in Credit Networks: A Little Trust Goes a Long Way](https://arxiv.org/abs/1007.0515)
+* [Liquidity in Credit Networks with Constrained Agents](https://arxiv.org/abs/1910.02194)
diff --git a/3DM_RFC/RIW2021_RFC_Hybrid CDN with Recruiter Providers.md b/3DM_RFC/RIW2021_RFC_Hybrid CDN with Recruiter Providers.md
new file mode 100644
index 0000000..daa4e4c
--- /dev/null
+++ b/3DM_RFC/RIW2021_RFC_Hybrid CDN with Recruiter Providers.md
@@ -0,0 +1,61 @@
+# RIW2021 | RFC: Hybrid CDN with Recruiter Providers
+
+_Status:_ < draft >
+
+_Area of Improvement:_ < Opportunistic Deployments >
+
+_Estimated Effort Needed:_
+
+_Prerequisite(s):_
+
+_Priority:_
+
+### Abstract
+
+Provider nodes recruit storage from local, ephemeral devices (laptops and desktops) to increase their storage footprint. The target is not to save on the aggregate storage or bandwidth of the RM - in some cases the RM might have to use more bandwidth if they act as relays. The target is to discover local content, where available, and serve it locally. The proposal also has the potential to save on concentrated bandwidth and request-handling load.
+
+This proposal is similar to the [“Consume Global, Serve Local”](https://github.com/protocol/ResNetLab/pull/26) RFC.
+
+### Construction
+
+We consider a Hybrid-CDN-like architecture, where Providers act as the centralized controllers orchestrating all the devices in their surroundings (or directly connected to them). When peer A is looking for some content, it sends the request to its local Provider. The Provider answers with a list of peers storing the content. These peers may be other RMs, or opportunistic deployments that the RM knows are near peer A. Peer A then requests the chunks of content from any of these peers. A Bitswap-like protocol may be used for these retrievals. 
Instead of directly sending the request to a local Provider node, peer A can send parallel requests to devices in its local area network. Additionally, in the RM response, there is always some fallback Provider with a hot copy of the content, for the case when the opportunistic deployments storing the copy are not connected anymore.
+
+Devices can build resource reputation through QoS testing, with different permutations of the chunks assigned to each peer, so as to optimize the expected time to completion.
+
+Ideally, nodes would be opportunistically trading data (and micropayments) with their spatial neighbors in a way that maximizes their EV (of reselling data and of coins).
+We may need a way to do offline micropayments.
+
+**Open Questions**
+
+- We may need a way to do offline micropayments for the cases where devices are only locally connected to each other.
+- Should we be concerned about privacy?
+- How is the model affected by device mobility and churn?
+
+### Impact
+
+Recruiting everyday user devices to serve in the 3DM is a big win, as we really achieve the goal of low-resource devices participating, contributing and being rewarded by the network. The impact can be huge.
+
+### Pros and Cons
+
+Pros:
+- The idea of having everyday devices participating in the network is super impactful and interesting
+- Performance and capacity of the network will increase dramatically, if users are given the right incentives
+
+Cons:
+- Privacy may become an issue
+- Limitations such as “only requested content is cached to a node’s storage” need to be lifted in order for the scheme to scale and be successful.
+- The cryptoeconomic model is challenging.
+- NAT traversal needs to have a solution for the scheme to be viable. A simple solution is for “recruited” devices to keep a permanent connection open with the client.
+
+### Implementation notes
+
+- Privacy needs to be looked at
+- NAT traversal needs to have a solution for the scheme to be viable. 
A simple solution is for “recruited” devices to keep a permanent connection open with the client + +### Evaluation + +TBA + +### Prior Work + +TBA diff --git a/3DM_RFC/RIW2021_RFC_Incentive alignment for collaborative routing.md b/3DM_RFC/RIW2021_RFC_Incentive alignment for collaborative routing.md new file mode 100644 index 0000000..eae1a9b --- /dev/null +++ b/3DM_RFC/RIW2021_RFC_Incentive alignment for collaborative routing.md @@ -0,0 +1,84 @@ +# RIW2021 | RFC: Incentive alignment for collaborative routing + +_Status:_ draft + +_Area of Improvement:_ Distribution Graph Forming + +_Estimated Effort Needed:_ + +_Prerequisite(s):_ + +_Priority:_ + +### Abstract + +This proposal assumes that Provider nodes participate in the routing of requests in the network by creating paths from client to provider (or cache). The proposal builds (or rather suggests) an economic model to enable and incentivize Provider nodes to autonomously (but cooperatively) find solutions to improve network routing performance. For instance, all routers that forward a request to its destination get a portion of the reward back. + +### Proposal/Construction + +The core tenet of this proposal is that “if we have aligned incentives, good routing will follow naturally”. The ultimate target is to build a trustless mechanism to reach consensus on some appropriate global metric for network performance, and then everyone gets a “dividend” (from a block reward subsidy pool, for example, or out of a tax on transaction fees/value) proportional to that metric. + +A brief description of the construction goes as follows: + +- Routers keep routing information about how to route content. This is not defined in this proposal. +- Every client wanting to retrieve content from the Filecoin network is connected to a Provider node or router (terms used interchangeably here).
+- When a router receives a request, it is in its best interest to forward it to the right place, as it gets a reward if the content transfer is successful. + - This also incentivises routers to learn and keep as much routing information as possible. + - How routing information propagates in the network and populates the routers’ routing tables is not discussed here. +- The routing and forwarding of requests should not include any cryptographic operations (or include only minimal ones) in order to be as fast as regular data transfers. +- When the content transfer completes, the path of routers that have participated in the request routing and the content fetching gets a part of the reward. +- The economic model should be such that routers are not incentivised to collude or cheat. Ideally, two colluding routers should not be able to get more than the aggregate of the reward they would get if they behaved honestly. +- The return per router could be the same for all routers along the path, or it could have different weights depending on several factors, such as: + - degree centrality of nodes, with higher-degree nodes getting more reward, because the correct path to the content was likely found because this node had the most contacts/routing info. This incentivises routers to build as large routing tables as possible and keep many connections open, so eventually the network might become a full mesh with all nodes having roughly similar reward weight. + - the nodes closest to the content source get more reward, with the reward declining linearly as we get closer to the client. This incentivises routers to cache and serve the content locally. + + +### Open Questions + +- How can you trustlessly evaluate global network performance? + - Hard problem, but not necessarily harder than trustlessly evaluating performance pairwise between peers +- Will spam be a problem? + - Possibly, in the same way as committed capacity in the Filecoin storage network.
But it does not yield outside benefit to the spammer and spamming would come with traffic cost. +- How do you punish those misbehaving? + - Misbehaving is self-punishing; it should be irrational to misbehave because there’s no individual advantage to gaming the metrics + - At least assuming an honest majority. Might not work if everyone cheats. + - Not unlike a traditional consensus problem with a minimum security threshold + - Maybe some analogue/lessons from EIP1559? The more providers that join the collusion, the better it is to be outside of it. + - Could this work under an optional trust model? + - Network voting on a good vs bad list; people on the good list could skip proofs. The whole network benefits from people being in the good list, so there is no incentive to “badlist” people unnecessarily. Self-destruct switch: if someone in the good list misbehaves, no one makes money. + - Or perhaps staking on good votes + slashing if they turn out to be bad +- Do downloaders also get rewarded? + - Part of the fair exchange + +- Can we get cache benefits out of symmetric routing? + - If data is encrypted to the end-user (requester), this isn’t possible. We’d need an encryption scheme that allowed intermediary access. + - One solution would be for intermediaries to pose as clients, forge a deal for the data, and then offer a deal to the requester (basically MITM the transaction). + +### Impact + +The impact of this proposal can be quite significant because it can solve two problems at once: the best routing protocol will emerge naturally from the incentives, and the router reward is allocated according to the contribution of each router to the network’s routing. + +### Pros and Cons + +Pros +- It can solve two problems at once +- Given that routers participate in the resolution of content (i.e., name-based routing), the scheme can provide very fast resolution and delivery times.
+- The scheme enables on-path caching, which can both reduce delivery time and save bandwidth for popular content. + +Cons +- The approach is really interesting, but there are still lots to be figured out wrt both the economic constructions and the routing protocols. +- Might be a little complicated. + +### Implementation notes + +TBA + +### Evaluation + +Verify that colluding routers on a path cannot get more reward than the aggregate of the reward they would receive if they behaved honestly. +Define the best reward split algorithm across the path of routers: what makes more sense: evenly split, or based on some weight function? Should popularity of content play a role here? +Confirm that the routing protocol coming out of this construction is as fast as traditional shortest path routing and achieves nearest replica routing (i.e., find the closest copy of the content). + +### Prior work + +The [“Proof of Prestige”](https://www.ee.ucl.ac.uk/~ipsaras/files/Proof_of_Prestige-icbc19.pdf) approach can be very helpful here. It proves that colluding is avoided and the “progressive mining” operation proposed is very close to building the path from source to destination. diff --git a/3DM_RFC/RIW2021_RFC_Name-based Request Forwarding for Nearest Replica Routing.md b/3DM_RFC/RIW2021_RFC_Name-based Request Forwarding for Nearest Replica Routing.md new file mode 100644 index 0000000..2a0064d --- /dev/null +++ b/3DM_RFC/RIW2021_RFC_Name-based Request Forwarding for Nearest Replica Routing.md @@ -0,0 +1,76 @@ +# RIW2021 | RFC: Name-based Request Forwarding for Nearest Replica Routing + +_Status:_ < draft > + +_Area of Improvement:_ < Distribution Graph Forming > + +_Estimated Effort Needed:_ + +_Prerequisite(s):_ + +_Priority:_ + + +### Abstract + + +This RFC assumes that Provider nodes (i.e., those that store hot copies of content in the network) act as routers for requests. 
It also assumes that every client is connected to one or more Provider nodes to which they forward their requests. Content is explicitly named by CID. Every Provider declares and serves some network prefix. Progressively, these prefixes form the topology. Content items are initially pegged to the network location of their publisher, e.g., /de/berlin/my-cid, but nothing prevents other Providers from serving content with any name, achieving nearest-replica routing. + +### Proposal/Construction + +- A Provider is seen as an entity that covers some geographic area, somewhat similar to a DNS server. It is the entry point of the clients to the Filecoin network. + - More than one Provider can serve some geographic area, either split between the Providers, or in some hierarchical manner, where the initial Provider is a higher-tier Provider node for the rest of the Providers. + - Example: the first Provider node deployed in Germany declares the prefix “/de” as their service area. Other Providers joining the system can then start serving “/de/berlin/”, “/de/munich/” etc. +- A client should ideally be connected to more than one Provider node. +- Every Provider keeps a record of: i) all the content they store locally, ii) the content items they have heard about in terms of their CID, as well as iii) “where did they hear the content from” (defined similarly to a forwarding network interface of a router, binding the full name of a content to the IP address of the Provider on that direction). + - Example: the Provider serving the network area /de/berlin/ sees request /de/munich/cid1. It forwards the request towards /de/munich/ and populates its routing table with: cid1 -> /de/munich/ | . +- Provider nodes are interconnected with each other in a mesh structure. Every provider node should be connected to at least one other provider node. The degree of connectivity is to be defined. 
+- When content publishers publish their content, they have to link it to the Provider in the geographic area where the content is stored. + - Example: content item with cid2 is published by a content publisher or a hosting company connected to the Provider node that serves the area /us/nyc/. The full network name of this CID is /us/nyc/cid2. + - The Provider itself can also store the content if the content publisher has a deal with the Provider directly. This does not change the model. + - The model does not prevent another Provider, say, /us/boston/ from storing and serving the content. The cryptoeconomic model needs to work out when and at which cost for the new Provider this should happen. +- If the Provider receives a request for a content item which it does not know, it forwards the request according to the prefixes it sees. + - Example: if the Provider for /de/berlin/ receives a request for /us/nyc/cid2, which it has never heard about, it does longest prefix matching on the request and forwards it accordingly. If it knows the IP address of the Provider in /us/nyc/ it directs the request there. If not, then it forwards to a Provider “listening”/serving the prefix /us/. If it doesn’t know any Provider serving prefix /us/, it will have to forward to its neighbours, the mechanism of which is to be determined. + - To avoid the above situation where a Provider does not know anyone serving some prefix, we can have a prerequisite that there is basic connectivity between all Providers serving top-level prefixes. Populating routing tables can be done in advance by a routing protocol. +- When routers see repeated requests for the same CID, they do not need to forward all of them towards the source. Instead, they can keep the request and, when they receive the content, they can forward it to satisfy all received requests. 
This way, the system can support a time-shifted multicast model, which can increase performance significantly, especially for heavy and popular content, e.g., popular HD video. + - The downside of this approach is that bandwidth requirements increase significantly if all traffic needs to travel through all the Provider nodes that the request has been forwarded through. This will be the case for non-popular content, and we expect a tradeoff where, beyond a tipping point, the overall bandwidth spent by this approach is lower. + - An alternative is to follow the above approach (called symmetric routing) only after the request count for an object surpasses a threshold. Below that threshold, the request is forwarded through the graph of overlay Providers, but the actual data is transferred through a direct connection from the Provider to the client. +- A Provider that sees repeated requests for some particular CID can fetch, cache and serve the content. This behaviour should be encouraged by the cryptoeconomic model, as it will improve the performance of the network for popular content. + - Assuming the geo-localised connectivity of clients to Providers, we can assume that a client is always connected to a Provider that is less than $k$ ms away (say $k=50$ ms). + - Every content item is linked with a record that specifies: i) the price the Provider has set in order to serve the item, and ii) the latency under which it can deliver the item to the client (assuming a maximum of $k$ ms one-way latency). + +### Impact + +If designed and implemented correctly, this scheme can provide significant performance benefits, especially for popular and heavy content.
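The longest-prefix-matching step described in the construction can be sketched as follows (the table entries and next-hop addresses are illustrative only; the real routing protocol is to be defined):

```python
# Hypothetical forwarding table: name prefix -> next-hop Provider address.
table = {
    "/de/berlin/": "10.0.0.1",
    "/de/": "10.0.0.2",
    "/us/nyc/": "10.1.0.1",
    "/us/": "10.1.0.2",
}

def next_hop(name: str):
    # Longest-prefix match over the known prefixes; None means the request
    # must be forwarded to neighbours (mechanism to be determined in the RFC).
    matches = [p for p in table if name.startswith(p)]
    return table[max(matches, key=len)] if matches else None

assert next_hop("/us/nyc/cid2") == "10.1.0.1"    # exact area known
assert next_hop("/us/boston/cid9") == "10.1.0.2" # falls back to /us/
assert next_hop("/jp/tokyo/cid5") is None        # unknown top-level prefix
```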
+ +### Pros and Cons + +Pros: + +- Clean and relatively simple routing and forwarding model +- Can achieve nearest-replica routing, which will be very beneficial for popular content +- Can achieve time-shifted multicast, which can save a lot of bandwidth for heavy content + + +Cons: +- It’s a new design with little deployment and testing to date in real systems +- Security properties need to be examined carefully, as there haven’t been many studies or experiences from previous deployments. +- It is not clear how accounting can work on top of this setup. + +### Implementation notes + +TBA + +### Evaluation + +- Evaluate the extra bandwidth requirement that symmetric routing imposes on the system. +- Define the point where a switch between symmetric and asymmetric routing is the best option in terms of bandwidth consumption and delivery delay. +- Approximate the scalability of the system as the content catalogue of the entire network gets bigger. Routing tables might become too big, which will be non-viable for Providers. Can big routing tables be served from fast memory? +- Based on a content catalogue size, network size and a request distribution pattern, what is the probability that routers receive requests for which they have no information about any of the prefixes in the content name? +- Routing protocol for route setup and updates: how long does it take to propagate route prefixes? + +### Prior Work + +- The work is inspired in large part by the [Named-Data Networking Architecture](https://dl.acm.org/doi/pdf/10.1145/2656877.2656887). +- Further optimisations for content delivery have been proposed in the [iCDN approach](https://dl.acm.org/doi/pdf/10.1145/3405656.3418716). +- The vast literature on the NDN architecture and supporting protocols can be very useful to find solutions to the challenges of the approach.
diff --git a/3DM_RFC/RIW2021_RFC_NoMarkets3DM.md b/3DM_RFC/RIW2021_RFC_NoMarkets3DM.md new file mode 100644 index 0000000..98ea9e3 --- /dev/null +++ b/3DM_RFC/RIW2021_RFC_NoMarkets3DM.md @@ -0,0 +1,97 @@ + +# RIW2021|RFC: No market 3DMs + +_Status:_ **draft**; **~~ready for review~~**; **~~ready to publish~~** + +_Area of Improvement:_ Cryptoeconomics + +_Estimated Effort Needed:_ <?> + +_Prerequisite(s):_ <?> + +_Priority:_ <? P0, P1, P2> + + +### Abstract + +This RFC proposes an economic model for the retrieval network based on CID bounties instead of an actual market. Content publishers attach a bounty to a CID to reward all the retrievals successfully performed for that CID over a period of time. All miners able to show proofs-of-storage of hot copies of the CID, and proofs-of-retrieval at the end of the specified period of time, are rewarded with a split of the bounty. + + +### **Proposal/Construction** + +Anyone looking to serve content through the retrieval network can offer a bounty for the CID they want served. Ideally, a cold copy of this CID object will be stored in Filecoin storage, or a hot copy in the IPFS network. The bounty is determined for a specific period of time (mins, hours, or days). Miners have access to the pool of bounties available and can freely choose the CIDs they want to store and serve. This approach is similar to that of transaction fees in Bitcoin, where miners choose the transactions to include in the next block according to their fees. + +Miners can choose their own strategy to profit from retrievals: they may choose CIDs with large bounties because the reward is larger; or go for CIDs with small bounties since, despite the lower reward, the competition to split it is also lower. + +In order to be eligible for the reward, miners targeting the bounty for a CID must generate periodic proofs-of-storage for the hot copy of the CID.
Proofs-of-storage only unlock a percentage of the bounty; in order to unlock the full bounty, a minimum number of retrievals for the content need to be performed. Thus, by the end of the lifetime of the bounty, miners receive a split of the unlocked bounty proportional to the number of proofs-of-storage they successfully committed. + +This RFC doesn’t reward retrieval but storage of hot copies, and splits the unlocked bounty for a CID between all the miners committed to serving the content. This favors the decentralization of the system and the spreading of hot copies throughout the network. The higher the bounty, the higher the probability of miners storing and serving a CID. Proofs-of-retrieval, i.e. the number of times the CID has been successfully retrieved, only determine the percentage of the bounty that is unlocked. + +This RFC allows several alternative implementations: + + + +* The settlement and reward distribution are done only at the end of the lifetime of the bounty. This means that the content publisher stakes the full amount of the bounty up front, and it is not distributed until the end. Throughout the lifetime of the bounty, a publisher is allowed to increase the bounty as an attempt to increase the perceived QoS of the CID’s retrieval. However, the distribution to miners is not done until the end. +* An additional implementation may be devised where a set of periodic checkpoints is determined throughout the lifetime of the bounty to unlock part of the bounty as rewards. + +A more thorough analysis will need to be done in order to determine which implementation is better. + +### Impact + +Simplifies the implementation of the economic model for the retrieval network. This RFC delegates the responsibility for its operation to reliable proof-of-storage and proof-of-retrieval constructions. Its simplicity makes it applicable as a supplement to other RFCs. + + +### Pros and Cons + +**Pros** + + + +* Simple and elegant design.
+* Limited overhead over the rest of the system components. + +**Cons** + + + +* It requires the construction of reliable proofs-of-storage and proofs-of-retrieval. +* It doesn’t prevent sybil and collusion attacks. Miners may run clients retrieving the content they are providing in order to unlock all of the available bounty. There needs to be a way to prevent these attacks. + + +### **Implementation notes** + +Once the proof-of-storage and proof-of-retrieval constructions have been figured out, the implementation of this RFC is limited to a smart contract (or a spec actor) that sets new CID bounties, tracks their lifetimes, accepts the committing of proofs by miners, and distributes each bounty at the end of its lifetime. + + +### **Evaluation** + +**Qualitative** + + + +* The RFC increases the size of the retrieval market (demand for content retrieval) in the Filecoin network. +* The RFC ensures the profitability of retrieval miners for their cost model and committed resources. +* The UX of clients with this RFC is not penalized in any way (latency, QoS, etc.) +* The model prevents sybil and collusion attacks, and fosters collaboration between entities. + +**Quantitative** + + + +* Simulation analysis of the full economic model to understand the Nash Equilibrium. + * How does the size of the bounty affect the perceived QoS of CID retrieval? + * What does it take for miners to keep a copy of a CID? +* Functional testing of the implementation: + * Generation of proofs and commitment. + * Creation and top-up of bounties. + * Bounty distribution and settlement. +* Overhead that the RFC imposes on a file retrieval. +* Testnet to validate all the assumptions.
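The settlement described in the construction could be sketched as follows (a toy model; the unlock rule, the proof counts, and all parameter values are assumptions, not part of the RFC):

```python
def settle_bounty(bounty, proofs_by_miner, retrievals, min_retrievals):
    # Proofs-of-retrieval determine the unlocked fraction, capped at 100%.
    unlocked = bounty * min(retrievals / min_retrievals, 1.0)
    total_proofs = sum(proofs_by_miner.values())
    if total_proofs == 0:
        return {}
    # Each miner's share is proportional to the proofs-of-storage it committed.
    return {m: unlocked * n / total_proofs for m, n in proofs_by_miner.items()}

shares = settle_bounty(bounty=100.0,
                       proofs_by_miner={"m1": 30, "m2": 10},
                       retrievals=50, min_retrievals=100)
assert shares == {"m1": 37.5, "m2": 12.5}  # 50% unlocked, split 3:1
```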
+ + +### Prior work + + + +* [Filecoin’s Proof-of-Storage](https://spec.filecoin.io/#section-algorithms.pos) +* [Filecoin’s economic model](https://spec.filecoin.io/#section-algorithms.cryptoecon) diff --git a/3DM_RFC/RIW2021_RFC_OS-level OppNet Component.md b/3DM_RFC/RIW2021_RFC_OS-level OppNet Component.md new file mode 100644 index 0000000..31350db --- /dev/null +++ b/3DM_RFC/RIW2021_RFC_OS-level OppNet Component.md @@ -0,0 +1,69 @@ +# RIW2021 | RFC: OS-level OppNet Component + +_Status:_ draft + +_Area of Improvement:_ Opportunistic Deployments + +_Estimated Effort Needed:_ + +_Prerequisite(s):_ + +_Priority:_ + +### Abstract + +Provide a uniform opportunistic networking interface for applications to register and use, in a way that preserves resources and guarantees benefits flow back to the user. The component would be responsible for optimising the resources allocated to connect with nearby devices in a horizontal manner across different applications. + +### Construction + +- OS-level provider (think COVID contact tracing service provided by Android and iOS) + - Avoid having multiple always-on opportunistic clients, one for each application (fragmentation & waste of resources) + - Guarantees benefit flows to the user vs. the application developer + - Sidesteps restrictions on background applications + - Applications register with OS provider +- It should be as invisible as possible, requiring little intervention/configuration by user + - But the OS could proactively prompt users for help distributing specific content given need and revenue opportunity +- Use local information (e.g. calendar events with location) to decide which information to fetch and store + - Look at upcoming locations and declared interests in said locations + - Leaks minimal data; does not require sharing of contact or location data by opportunistic Provider nodes + +**Open Questions** +- How do you predict what data will be desirable? 
+- How do you cross the border with this information? + - Data is not encrypted to destination (that would limit efficiency). You could encrypt in transit or time-lock, but is that sensible/sufficient? +- How do you communicate said interests in a privacy-preserving way? +- How do you accommodate network needs while avoiding tracking? + +- Would the data be stale by the time it gets there? + - If based on your calendar, this can easily be prevented by having timed interests +- Could machine learning help? +- Does the OS have the concept of CID? Why do it at OS level? + - Android or iOS modules, not POSIX API + - Intended to circumvent OS restrictions on BG activity and guarantee user benefit + - Getting this into a mobile OS seems pretty hard, given complexity and resource requirements. Would require a critical mass so that they can’t ignore it, i.e. a bootstrapping problem + +### Impact + +The proposed component would benefit the performance of any opportunistic networking application and, therefore, its impact would be significant. However, it is not an add-on software component or application, and would therefore need to be adopted (or perhaps even developed) by the OS vendor(s). + +### Pros and Cons + +Pros: +- The construction would provide performance benefits at the device level. +- The construction would be useful to many applications and could work simultaneously to serve all of them. + +Cons: +- It’s not an add-on software component and would therefore have to be integrated within the multiple OSes, i.e., OS vendors would have to accept it +- Not something that PL could implement or make as a product + +### Implementation notes + +TBA + +### Evaluation + +Compare the performance benefit of one OS-level component that serves several applications horizontally against several applications, each with their own component.
+ +### Prior work + +TBA + diff --git a/3DM_RFC/RIW2021_RFC_Omniscient Routers.md b/3DM_RFC/RIW2021_RFC_Omniscient Routers.md new file mode 100644 index 0000000..adc49d4 --- /dev/null +++ b/3DM_RFC/RIW2021_RFC_Omniscient Routers.md @@ -0,0 +1,79 @@ +# RIW2021 | RFC: Omniscient Routers + +_Status:_ draft + +_Area of Improvement:_ Distribution Graph Forming + +_Estimated Effort Needed:_ + +_Prerequisite(s):_ + +_Priority:_ + +### Abstract + +The core goal of this design is to give clients knowledge about where the hot copies of the content they are looking for are stored. The RFC builds on the premise that the entire content catalogue of provider records of the 3DM ecosystem will be in the order of TBs, hence, routers will be capable of storing it in its entirety. Given this assumption, the RFC explores mechanisms to support and optimise the distribution of record updates, as well as how the network forms. + +### Proposal/Construction + +We consider an architecture where clients connect to one of their local (or known) routers or Provider nodes. Routers are peers in the network with content routing information, who act as request forwarders. Provider nodes on the other hand are nodes that act as routers, but also store hot copies of content. In the context of this RFC, Providers are a subset of routers, although the role of routers has not been crystalised from the cryptoeconomic point of view, so eventually the role of a router might have to collapse into that of a Provider. In this RFC, we use the terms Provider and router interchangeably. + +Provider nodes and routers will always be looking to gather as much information as possible about content records, ideally storing information about all content records in the network in their local database. + +We assume that the required size to store the amount of records (i.e. 
hot copies of individual objects stored in the retrieval network) will be in the order of TBs, so it is not unreasonable to think of a router keeping information about every single content record (in slow memory). Routers will look to have as much up-to-date information about content records as possible, as in this way they would contribute to serving retrieval requests. In turn, being able to serve more retrieval requests means that the router will gather more information about popular content, which it could utilise to fetch and serve popular content from its own cache (hence, receive rewards). Routers will want to be on “the best path” for the content in order to have access to the reward. Thus, the underlying retrieval economic model will be enough to incentivize routers to act honestly and keep their routing tables up to date. + +In order to minimize the required size of routing information, we propose using probabilistic set structures (such as bloom filters, cuckoo filters, or interactive protocols) to compress routing data and exchange it with other peers. For instance, peer A would build a bloom filter with a low probability of false positives representing all the content records it knows about, or knows how to reach. Peer A shares this with other peers, such as peer B. When peer B receives a request for CID1, it checks it against its own bloom filter, and against the bloom filters of all the other peers from which it has received information. Routers will try to have as much information in their routing tables as possible, so finding a router with the routing information for a record should be straightforward. + +Alternative ways to spread provider records could include: +1. Routers share records with their x-hop neighbours, where x is a value that guarantees traffic remains manageable. +2. Routers keep records only for the requests they serve, and progressively build as big a table as possible. +3.
If names have a hierarchical structure, then routers can keep only prefixes that point towards “network directions”, rather than keeping a separate full-name entry per content item. This enables name aggregation and, hence, routing table scalability. + +Routers can update the freshness of their routing information in several ways: +- Through an “updates” pub-sub channel where routers exchange deltas over their routing structures to update the view of others. +- Inspecting the requests being forwarded to them by other peers. +- Using “keep-alive” messages to see if the content has expired. + +The flow of a client request with this scheme in place would be as follows: +1. The client sends the request to its local Provider node, which acts as the gateway to the retrieval network. +2. If this local content service has a hot copy of the content, it serves it directly to the client. If, on the other hand, it knows from its routing table who has a hot copy of the content, it replies with the peer (or group of peers) holding the copy (similar to what traditional CDNs’ DNS systems currently do). +3. If the local content service doesn’t know where a peer is storing the content, it points to a bigger content service (another router with more information about content records). In this way, we try to recursively find the hot copies. +4. Once the client receives a response with a dialable peer storing a hot copy of the content, the retrieval starts, and the client metering comes into action. +5. Retrieval itself does not follow symmetric routing, i.e., the content does not flow through the Provider node/router that pointed to the hot copy. Optimisations here could include symmetric routing when a router/Provider receives multiple requests for the same content (i.e., time-shifted multicast). + +A name-based routing approach, based on hierarchical names, could be an improvement here. + +### Impact + +This is an impactful proposal for two reasons: +1.
It provides an easy (to conceptualise and implement) way to kickstart the 3DM network (i.e., all Provider nodes/routers know everything about every content object), which will be viable in the beginning. +2. It gives us a nice way to experiment with request forwarding through the network of Providers. There is a lot of optimisation that can be carried out afterwards, but having a working playground will be very valuable. + +### Pros and Cons + +Pros +1. Fast retrieval of content +2. Easy to implement +3. The network of Provider nodes becomes the 3DM graph itself, which makes performance optimisation easier, e.g., multicast can be easily implemented on top. + +Cons +1. Having symmetric routing on by default would increase bandwidth consumption of routers/Providers. +2. A lot of memory needed to keep provider records. +3. There is no backup solution for the case when the local Provider/router does not have the content and does not know anyone who has the provider record. The proposal mentions some hierarchical structure so that the request is forwarded to a parent cache, but this has to be better defined. + +### Implementation notes + +TBA + +### Evaluation + +- Size of routing table relative to the content catalogue size +- Bloom/Cuckoo filter efficiency evaluation +- Symmetric vs Asymmetric retrieval routing - what is the right point to switch between the two and what is the bandwidth overhead +- Pubsub protocol evaluation for content record propagation relative to the network size and the content catalogue +- Is the speed at which a record is served an issue? Given the size of the routing table, entries will have to be stored in slow memory. 
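The probabilistic-summary exchange described in the construction can be sketched with a minimal Bloom filter (the size and hash count here are illustrative, not tuned; a real deployment might use cuckoo filters or another structure):

```python
import hashlib

class Bloom:
    """Minimal Bloom filter sketch; sizes/hash counts are illustrative."""
    def __init__(self, size=1024, hashes=3):
        self.size, self.hashes, self.bits = size, hashes, 0

    def _positions(self, item: str):
        # Derive k independent bit positions from salted SHA-256 digests.
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, cid: str):
        for pos in self._positions(cid):
            self.bits |= 1 << pos

    def __contains__(self, cid: str):
        # May return a false positive, never a false negative.
        return all(self.bits >> pos & 1 for pos in self._positions(cid))

# Peer A advertises a compact summary of the records it can resolve;
# peer B checks incoming requests against it before forwarding.
summary_a = Bloom()
summary_a.add("cid1")
assert "cid1" in summary_a  # an advertised record is always found
```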
+ +### Prior work + +N/A + diff --git a/3DM_RFC/RIW2021_RFC_On-Demand Opportunistic Resource Deployment.md b/3DM_RFC/RIW2021_RFC_On-Demand Opportunistic Resource Deployment.md new file mode 100644 index 0000000..ac99399 --- /dev/null +++ b/3DM_RFC/RIW2021_RFC_On-Demand Opportunistic Resource Deployment.md @@ -0,0 +1,63 @@ +# RIW2021 | RFC: On-Demand Opportunistic Resource Deployment + +_Status:_ < draft > + +_Area of Improvement:_ < Opportunistic Deployments > + +_Estimated Effort Needed:_ + +_Prerequisite(s):_ + +_Priority:_ + +### Abstract + +The goal of the proposal discussed here is to match supply to demand peaks by deploying resources around specific needs, on-demand. We can think of this as a swarm of collocated users that are “called in” when demand for storage or bandwidth increases. A potential extension could be plug-n-play devices (e.g., Raspberry Pis) that run dedicated webapps to serve local demand. + +### Construction + +- Two user categories: (i) users with resources but not full Provider nodes; (ii) Provider nodes that only aim to provide short-term service driven by high/profitable demand. +- A potential third user category would be plug-n-play devices, such as Raspberry Pis that run dedicated software/webapps and can provide support for local demand, when plugged in. +- There is no mobility issue in this scenario. The main concern is how to incentivise the deployment of capacity to address specific local surges in demand (not unlike Uber surge pricing). +- Requires some infrastructure to distribute information related to real demand (not publisher requests). Could be a gossip market of deals. + +**Open Questions** + +- How do we price short-term needs? (out of scope of this RFC) +- What are the steps to advertise as a provider? +- How does someone analyse the network to determine needs? +- How does a Provider recruit, or “call-in” devices on demand? Do they listen to some channel constantly and “wake-up” when they receive some specific beacon? 
+- What are example applications here?
+    - Are floating-content applications applicable?
+    - Is message propagation in disaster scenarios applicable?
+- Can we listen to existing CID requests?
+- Is there a way to provide advance information, pre-empt actual needs?
+    - This would need to be vetted
+- This service has customers: clients and publishers. If publishers pay, it’s easy to call Providers to the event.
+
+### Impact
+
+It would be nice to have “pop-up” capacity on demand. Many interesting applications can be built on top, such as floating-content applications, message propagation in disaster scenarios where the network is down, or even “local social networks” at large events.
+
+### Pros and Cons
+
+Pros:
+- Applies to several use-cases where it can enable communication and applications that are not possible today.
+
+Cons:
+- Tricky to implement: how would devices be “called in”?
+- Difficult to integrate an economic model into this. The return for users being “called in” every once in a while will be minimal, so users will likely not bother. It could apply in cases where there is a real need (e.g., disaster), or a common spirit/vision (e.g., football club/stadium).
+
+### Implementation
+
+Implementations over different wireless media (e.g., WiFi Direct vs. Bluetooth) need to be investigated; each medium comes with different performance limitations.
+
+### Evaluation
+
+Evaluate how quickly a swarm of on-demand devices can be formed and start contributing to the network.
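The formation-speed evaluation above can be prototyped with a toy model before any real deployment. Everything here is an assumption for illustration: the exponential wake-up delays after a recruitment beacon, the per-device bandwidth range, and the notion that a surge is "covered" once aggregate supply meets demand:

```python
import random

def swarm_formation_time(n_devices, demand_mbps, seed=42):
    """Toy model: after a recruitment beacon, each device wakes up after a
    random delay and contributes a random amount of bandwidth. Returns the
    time (seconds) until aggregate supply covers the demand surge, or None
    if the swarm never covers it."""
    rng = random.Random(seed)
    # (wake-up delay, contributed bandwidth) per device -- assumed distributions.
    devices = sorted(
        (rng.expovariate(1 / 5.0), rng.uniform(1, 10)) for _ in range(n_devices)
    )
    supply = 0.0
    for delay, bandwidth in devices:
        supply += bandwidth
        if supply >= demand_mbps:
            return round(delay, 2)
    return None

print(swarm_formation_time(n_devices=50, demand_mbps=100))
```

Replacing the assumed distributions with wake-up and throughput traces from real devices (e.g., plugged-in Raspberry Pis) would turn this into an actual benchmark.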
+
+### Prior work
+
+[Composable Distributed Mobile Applications and Services in Opportunistic Networks](http://www.netlab.tkk.fi/~jo/papers/2018-06-wowmom-composable-apps.pdf)
+
+
diff --git a/3DM_RFC/RIW2021_RFC_QFIL Closed Retrieval Economy.md b/3DM_RFC/RIW2021_RFC_QFIL Closed Retrieval Economy.md
new file mode 100644
index 0000000..1996884
--- /dev/null
+++ b/3DM_RFC/RIW2021_RFC_QFIL Closed Retrieval Economy.md
@@ -0,0 +1,135 @@
+
+
+# RIW2021|RFC: QFIL Closed Retrieval Economy
+
+_Status:_ **draft**; **~~ready for review~~**; **~~ready to publish~~**
+
+_Area of Improvement:_ Cryptoeconomics
+
+_Estimated Effort Needed:_ <?>
+
+_Prerequisite(s):_ <?>
+
+_Priority:_ <? P0, P1, P2>
+
+
+### Abstract
+
+This RFC frames the economics of the retrieval network as a closed economy backed by the Filecoin economy. It divides the model into two parts: an auction market between content publishers and retrieval miners to agree on the minimum deposit the publisher needs to stake to pay for retrievals; and a closed retrieval economy, orchestrated by the QFIL currency, that rewards the parties involved in retrievals and is backed by the publisher’s deposit.
+
+
+### **Proposal/Construction**
+
+This RFC assumes that content publishers (CPs) are the ones paying for retrievals, i.e. the entity willing to serve a specific CID through the retrieval network is the one paying for its retrievals. The content to be served through the retrieval network may be stored sealed in Filecoin, or in a hot (unsealed) state in IPFS.
+
+To put content into the network, content publishers run a “take-it-or-leave-it” auction with retrieval miners. CPs offer a price and a desired QoS for retrievals. Retrieval miners accept offers from the pool according to their reserve price: a miner will only choose offers from the auction pool whose price is above its reserve price (the minimum price it is willing to accept for an offer).
Miners compute their reserve price according to their expected costs of:
+
+* Query execution (e.g., unsealing data, cloud bandwidth costs, amortized storage costs for a cache, solving NP problems for metering --if the construction requires it--, searching data structures, etc.).
+* Bandwidth.
+* Collaboration: costs incurred to mint rewards for other miners helping on retrievals.
+
+When a retrieval miner accepts an offer, the CP needs to stake the payment deposit according to the base price and the specification of the deal, and from there on the accepting retrieval miner becomes the “deal owner”. This “take-it-or-leave-it” auction may be replaced by a simple negotiation phase between a miner and a content publisher. The result of the setup phase would be the same: a deposit to cover retrieval payments and back the rewards minted in the closed retrieval economy, and a retrieval miner as “deal owner” of this stake.
+
+The closed retrieval economy is governed by a retrieval currency (QFIL). QFIL is a volatile token that can take two forms:
+
+* Transient (TQFIL): The form the token takes when it is minted. Committing a valid voucher for a full (or partial) file exchange triggers the minting of new TQFIL by the deal owner to reward the parties involved in the exchange. TQFIL is associated with the deal owner; it can’t be transferred, only transformed into stable QFIL or FIL from the deal stake. TQFIL decays faster than stable QFIL to prompt a transformation decision on holders (preventing an entity from having a lot of power over a deal stake). In order for a deal owner to mint new TQFIL it needs to burn FIL or QFIL.
+* Stable (QFIL): Base stable currency of the closed economy. It decays more slowly than TQFIL, and it is burned to mint new TQFIL.
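The reserve-price rule described at the top of this section can be sketched as follows. The cost figures and the profit margin are illustrative assumptions; the RFC only lists the cost components, not their values:

```python
from dataclasses import dataclass

@dataclass
class CostModel:
    """Assumed per-retrieval cost components from the list above (all in FIL)."""
    query_execution: float   # unsealing, cache amortization, metering work, ...
    bandwidth: float
    collaboration: float     # rewards minted for helper miners
    margin: float = 0.1      # profit margin -- an assumption, not in the RFC

    def reserve_price(self) -> float:
        base = self.query_execution + self.bandwidth + self.collaboration
        return base * (1 + self.margin)

def acceptable_offers(offers, model):
    """A miner only picks offers from the auction pool priced above its reserve."""
    reserve = model.reserve_price()
    return [o for o in offers if o >= reserve]

model = CostModel(query_execution=0.02, bandwidth=0.05, collaboration=0.01)
print(acceptable_offers([0.05, 0.09, 0.12], model))  # -> [0.09, 0.12]
```

The same filter runs unchanged whether offers come from a "take-it-or-leave-it" pool or from a pairwise negotiation phase.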
Retrieval miners will be willing to help other miners in the system in order to be rewarded with QFIL, and to avoid having to burn their FIL to pay for the retrievals of deals they own.
+
+Instead of having a global system orchestrating all stakes and deals in the network, the role of the deal owner enables local orchestration of deal funds, with QFIL as the glue of the full retrieval economy. As an analogy, TQFIL is the local currency, while QFIL is the global currency of the economy.
+
+This RFC requires a reliable signal to trigger reward minting. This can be in the form of an on-chain commitment of a payment voucher, a transaction, etc. There is no specific requirement imposed in this sense. Ideally, if the construction of these signals allows the inclusion of the different parties involved in the retrieval (and not just the miner serving the file), more complex rewarding systems may be devised.
+
+Finally, retrieval deals may include an additional “reward for QoS”. Content publishers may provision an additional stake in their deposit to unlock a reward if a retrieval was performed with the minimum QoS they required in the deal.
+
+Retrievals under this model would look as follows:
+
+* Clients send their requests for a specific CID to their local retrieval miner. If this miner is not able to fulfill the retrieval itself, it will pay QFIL to all the collaborators committing resources to help it with the deal. This collaboration may be in the form of hot-copy sharing, forwarding services, or whatever other scheme we want to incentivize in the graph forming architecture.
+* The client’s successful retrieval generates a signal (as mentioned above, in the form of a voucher or transaction) and triggers the minting of the rewards for the retrieval in the form of the deal owner’s TQFIL. This same signal triggers the unlock of the additional QoS reward, if applicable.
+* The reward is distributed almost entirely to the retrieval miner that performed the retrieval, except for a small amount reserved for the deal owner (thereby incentivizing retrieval miners to accept deals even if they won’t be directly involved in the retrieval).
+    * An alternative reward mechanism may be devised where, instead of minting rewards exclusively for the retrieval miner and the deal owner and enabling cross-payments between retrieval miners for collaboration, the rewards are directly minted for all the parties involved in the exchange. This may impose additional requirements on the system.
+* The deal owner needs to hold enough QFIL or a FIL stake to mint the TQFIL for the reward.
+* Finally, all the rewarded parties choose what to do with their TQFIL: whether to cash out into FIL (leave the closed economy), or to keep QFIL to back future retrievals of the deals they own.
+
+_Sidenote: We call QFIL “Quantum FIL” because it can be in two forms, but it ends up decaying to a stable form._
+
+**Impact**
+
+This RFC proposes an economic framework backed by the Filecoin economy. The model presents all the elements required to design any incentive system needed in the rest of the system: client metering, content payment, and graph forming. By building this economic model as a closed economy, we require no changes to the Filecoin economy and impose no additional requirements on it. With this, the retrieval economy will be able to evolve independently of the Filecoin economy.
+
+
+### Pros and Cons
+
+**Pros**
+
+* Economic model decoupled from the Filecoin economy, with all the elements needed to design the incentive systems required in the network.
+* The only requirement for the model to work is a reliable signal of successful retrieval to trigger the rewards and payments.
+* The design of QFIL fosters collaboration between entities in the system.
+* The fact that every reward minted in the system requires some currency to be burnt prevents sybil attacks (and potentially even collusion attacks). Moreover, the minting of every reward is backed by its equivalent in FIL.
+
+**Cons**
+
+* Fine-tuning all the parameters of the model may be hard. There are a lot of moving parts: design of the rewards, QFIL and TQFIL decay speeds, etc.
+* A more thorough analysis of the Nash equilibrium of the system is required to ensure that the design of QFIL indeed fosters collaboration and achieves our goals.
+* How to reward collaboration between miners is tightly tied to the graph forming design, so nothing can be specified in that sense until the graph forming design, and the interaction between retrieval miners, is fleshed out.
+* This model doesn’t prevent an entity from trying to bankrupt a specific content publisher: someone may indiscriminately retrieve a file from a content publisher to drain its stake. The fact that content is cached, and the resources such an attack requires, may lead to economies of scale that make its impact negligible.
+
+
+### **Implementation notes**
+
+* All the logic behind the minting of rewards and deposits should be implemented in a smart contract (or Filecoin actor). This smart contract receives signals from successful retrievals, and from the setup of new stakes for a deal.
+* The logic for the auction (or negotiation phase) will be independent from the aforementioned actor.
+
+
+### **Evaluation**
+
+**Qualitative**
+
+* The RFC increases the size of the retrieval market (demand for content retrieval) in the Filecoin network.
+* The RFC ensures the profitability of retrieval miners for their cost model and committed resources.
+    * How is a miner’s reserve price computed?
+* The UX of clients with this RFC is not penalized in any way (latency, QoS, etc.).
+* The model prevents sybil and collusion attacks, and fosters collaboration between entities.
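A toy simulation can illustrate why the two decay speeds matter: with a faster-decaying TQFIL, holders are pushed to make a transformation decision quickly, as the RFC intends. The per-epoch decay rates below are arbitrary assumptions, not values the RFC specifies:

```python
def balance_after(amount, decay_rate, epochs):
    """Remaining balance after per-epoch proportional decay."""
    return amount * (1 - decay_rate) ** epochs

# Assumed decay rates: TQFIL decays faster than stable QFIL, per the RFC.
TQFIL_DECAY, QFIL_DECAY = 0.05, 0.01

for epochs in (10, 50, 100):
    t = balance_after(100, TQFIL_DECAY, epochs)
    q = balance_after(100, QFIL_DECAY, epochs)
    print(f"epoch {epochs:3d}: TQFIL {t:6.2f}  QFIL {q:6.2f}")
```

Sweeping these two rates (and the conversion incentive between them) is precisely the "How should the decay of QFIL be designed?" question raised under the quantitative evaluation.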
+
+**Quantitative**
+
+* Simulation analysis of the full economic model to understand the Nash equilibrium.
+    * How should the decay of QFIL be designed?
+    * Does the use of a stable QFIL make sense for miners? Would they be willing to transform TQFIL into QFIL instead of FIL, or should an additional reward be considered, making 1 TQFIL = alpha * QFIL with alpha > 1?
+* Functional testing of the implementation:
+    * QFIL mechanism and rewards.
+    * Auction / negotiation.
+    * Payments / collaboration between miners.
+* Overhead that the RFC imposes on a file retrieval.
+* Testnet to validate all the assumptions.
+
+
+### Prior work
+
+* [ObservableHQ modelling of a preliminary proposal of this RFC](https://observablehq.com/@protocol/3dms-cryptoecon-proposal)
+
+This RFC takes inspiration from the following papers:
+
+* [Edge-MAP: Auction Markets for Edge Resource Provisioning](https://drive.google.com/file/d/1g-kbaPUosWnY1a9098T9nh6TDx9ekQeJ/view?usp=sharing): Inspiration for the use of local auctions.
+* [An economic mechanism for request routing and resource allocation in hybrid CDN–P2P networks](https://drive.google.com/file/d/19_OPhuR4qkbH55SkG8ucgMYn-js445Q0/view?usp=sharing): Cost prediction in hybrid CDNs.
+* [Proof-of-Prestige: A Useful Work Reward System for Unverifiable Tasks](https://drive.google.com/file/d/13gcJP0DZnCCpcIy5BFbe7uLaxxGpL7JW/view?usp=sharing): Use of a volatile token.
+* [A Market Protocol for Decentralized Task Allocation](https://drive.google.com/file/d/1GC3DU0I_-6NSMtfoTl55t6fKPptIkCL5/view?usp=sharing): Use of a reserve price to achieve Nash equilibrium in a decentralized auction system.
diff --git a/3DM_RFC/RIW2021_RFC_ZK-friendly IPLD data model.md b/3DM_RFC/RIW2021_RFC_ZK-friendly IPLD data model.md
new file mode 100644
index 0000000..16f7e80
--- /dev/null
+++ b/3DM_RFC/RIW2021_RFC_ZK-friendly IPLD data model.md
@@ -0,0 +1,66 @@
+
+
+# RIW2021|RFC: ZK-friendly IPLD data model
+
+_Status:_ <**draft**; **~~ready for review~~**; **~~ready to publish~~**>
+
+_Area of Improvement:_ < Data Delivery Metering | Distribution Graph Forming | Opportunistic Deployments | Cryptoeconomics >
+
+_Estimated Effort Needed:_ <?>
+
+_Prerequisite(s):_ <?>
+
+_Priority:_ <? P0, P1, P2>
+
+
+### Abstract
+
+Verifying a CID for a full IPLD data model can be complex and computationally expensive. Building a ZK-compatible IPLD data model would allow the inclusion of a ZK proof of the CID, avoiding the verification computation. Using ZK-compatible hot copies for 3DMs would make proofs easier to generate, benefiting the design of the client metering protocol.
+
+
+### **Proposal/Construction**
+
+In order to verify data structures inside a SNARK, the computations that were done to calculate the hash have to be redone inside the SNARK. Currently, IPLD uses neither SNARK-friendly hash functions nor repeatable data structures in many places. For instance, any change in how the Merkle tree is built, or in how the hash of a file is constructed, means writing a new SNARK circuit, increasing complexity and cost. A large part of the decisions on data structures for hashing and hash functions in Filecoin were made so they work well with SNARKs.
+
+This RFC proposes the design of ZK-compatible IPLD structures so proofs can be embedded in IPLD structures. This is a complementary proposal that may benefit the design of other constructions and protocols of 3DMs.
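As a rough illustration of the "Merklize" step of the proposed pipeline, here is a Merkle-root construction with a pluggable compression function. SHA-256 stands in for a ZK-friendly hash such as Poseidon, which is not available in the Python standard library; chunk contents are made up:

```python
import hashlib

def sha256_pair(left, right):
    """Stand-in compression function; a ZK-friendly variant would swap in
    Poseidon here so the same tree shape can be verified inside a SNARK."""
    return hashlib.sha256(left + right).digest()

def merkle_root(leaves, compress=sha256_pair):
    """Merklize a list of leaf byte-strings with a pluggable hash."""
    if not leaves:
        raise ValueError("empty leaf set")
    level = [hashlib.sha256(leaf).digest() for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:                 # duplicate last node on odd levels
            level.append(level[-1])
        level = [compress(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

chunks = [b"chunk-0", b"chunk-1", b"chunk-2"]
print(merkle_root(chunks).hex()[:16])
```

The point of the pluggable `compress` is the RFC's argument in reverse: fixing the tree shape and hash up front means one SNARK circuit can cover all content, instead of rewriting a circuit per structure change.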
+
+IPLD -> Raw bytes -> Encrypt data -> Merklize with Poseidon hash -> ZKCP protocol proof -> Decrypt -> Generate IPLD -> Verify
+
+**Impact**
+
+It makes hot copies in 3DMs ZK-ready, opening the door to the design of proof-based protocols.
+
+
+### Pros and Cons
+
+**Pros**
+
+* Creates a data model in which generating proofs is easier than in the current IPLD model.
+* Eases the construction of protocols that require proofs over the data in the retrieval network.
+* It has been done successfully in Filecoin.
+
+**Cons**
+
+* Requires changes to the current IPLD structure.
+* It may be hard to require that all data served through the retrieval network follow this structure. Additional encoding may be needed to serve data in this format into the network.
+
+
+### **Implementation notes**
+
+<If possible, some pointers on how to implement the proposal>
+
+
+### **Evaluation**
+
+Generating proofs and validating data in IPLD structures can be done efficiently.
+
+
+### Prior work
+
+* [Filecoin Merkle Proofs](https://spec.filecoin.io/#section-algorithms.sdr.merkle-proofs)
diff --git a/3DM_RFC/RIW2021_RFC_ZKCP Optimizations.md b/3DM_RFC/RIW2021_RFC_ZKCP Optimizations.md
new file mode 100644
index 0000000..bea87c2
--- /dev/null
+++ b/3DM_RFC/RIW2021_RFC_ZKCP Optimizations.md
@@ -0,0 +1,57 @@
+# RIW2021 | RFC: ZKCP Optimizations
+
+_Status:_ **draft**; **~~ready for review~~**; **~~ready to publish~~**
+
+_Area of Improvement:_ Data Delivery Metering
+
+_Estimated Effort Needed:_ <?>
+
+_Prerequisite(s):_ <?>
+
+_Priority:_ <? P0, P1, P2>
+
+
+### Abstract
+
+TBWRITTEN
+
+### **Proposal/Construction**
+
+* Assumption
+    * We assume that encrypting a file and issuing a Proof-of-Retrievability is an expensive operation.
+* Optimization I
+    * Encrypt the data once, then encrypt the encryption key to the client to perform the fair exchange.
+    * Pros
+        * Only issue a Proof-of-Retrievability once.
+        * A group of providers can collaborate over a set of pre-encrypted files and issue the keys to clients as they request them.
+    * Cons
+        * Opens the potential for a griefing attack, where multiple clients ask the provider for the file, but then only one client pays for the decryption key.
+    * Open problem: need to find a way for individual parties to be unable to share the keys that decrypt the same ciphertext.
+        * Perhaps the client-specific key does one last, inexpensive scramble?
+* Optimization II
+    * Only prove a small number n of chunks out of N.
+    * Getting n random chunks in an initial interaction for free might be fine, as it would take O(n^3) time to collect the whole file that way.
+    * Question for Steven: how do you prove that random pieces belong to the file you want?
+        * Merkle tree: share the Merkle path.
+        * If an IPLD DAG is used, we could leak data when proving a path.
+        * An attacker wouldn’t necessarily want to pull different pieces from multiple endpoints (expensive).
+
+**Impact**
+
+TBWRITTEN
+
+### Pros and Cons
+
+TBWRITTEN
+
+### **Implementation notes**
+
+TBWRITTEN
+
+### **Evaluation**
+
+TBWRITTEN
+
+### Prior work
+
+TBWRITTEN
diff --git a/3DM_RFC/RIW2021_RFC_ZKCP with Fair Exchanges of 1 bit.md b/3DM_RFC/RIW2021_RFC_ZKCP with Fair Exchanges of 1 bit.md
new file mode 100644
index 0000000..ac5442e
--- /dev/null
+++ b/3DM_RFC/RIW2021_RFC_ZKCP with Fair Exchanges of 1 bit.md
@@ -0,0 +1,57 @@
+# RIW2021 | RFC: ZKCP with Fair Exchanges of 1 bit
+
+_Status:_ **draft**; **~~ready for review~~**; **ready to publish**
+
+_Area of Improvement:_ Data Delivery Metering
+
+_Estimated Effort Needed:_ <?>
+
+_Prerequisite(s):_ <?>
+
+_Priority:_ <? P0, P1, P2>
+
+### Abstract
+
+In a ZKCP or ZKCSP, there are (at least) two events that might happen and lead to value loss:
+
+* The client doesn’t deliver the payment to get access to the key, mounting a griefing attack on the provider of the data and/or service.
+* The provider vanishes before it sells the key to the client, leaving the client with unusable ciphertext.
+
+To solve this, we propose continuously running fair exchanges of 1 bit of the key throughout the delivery of the file. This gives the provider some assurance of payment, and gives the client the option of brute-forcing only a few bits of the key (O(2^k) work for k missing bits) in case the provider disappears, which is simpler than brute-forcing the whole key.
+
+### **Proposal/Construction**
+
+In a ZKCP for a file, it is agreed that every n blocks out of m total blocks, a fair exchange for 1 bit of the key happens. The flow goes as follows:
+
+* Incremental fair exchanges.
+* Exchange 1 bit of the file decryption key for 1 bit of a redeemable payment ticket/signature of the same size.
+* For both of these, all bits are necessary to make any use of them.
+* However, if k bits are missing, the remaining bits can be brute-forced in O(2^k).
+* If either party aborts, at any round, they only have one more bit than the other party.
+* That means the malicious party only gains an advantage of 2x the compute needed to get the resource they wanted (the decryption key or the payment).
+
+**Impact**
+
+Saves an RTT at the end for selling the key, and gives assurance to both parties. Somewhat similar to pay-per-packet.
+
+### Pros and Cons
+
+Pros
+* Removes an additional RTT at the end, piggybacking the fair exchange on the file transfer itself.
+* Enables the client to signal satisfaction to the provider in case the provider is not delivering the right bandwidth (e.g. not meeting the SLA, the client stops paying for more bits).
+
+Cons
+* Added complexity, and the overhead of proving each bit of the key.
+
+### **Implementation notes**
+
+TBWRITTEN
+
+### **Evaluation**
+
+TBWRITTEN
+
+### Prior work
+
+ZKCP
diff --git a/BEYOND_BITSWAP/README.md b/BEYOND_BITSWAP/README.md
deleted file mode 100644
index 2900b6a..0000000
--- a/BEYOND_BITSWAP/README.md
+++ /dev/null
@@ -1,97 +0,0 @@
-# Project: Beyond Bitswap
-
-## Motivation & Vision
-
-File-transfer is at the core of IPFS and every subsystem inside IPFS is built to enable it in a fast and secure way, while maintaining certain guarantees (e.g. discoverability, data integrity and so on).
-
-There are a thousand ways to slice a file into pieces and how to transfer it over the wire. However, finding what is the optimal way for the runtime (high powered, low powered device) and network conditions (stable, unstable, remote, offline) is the key challenge.
-
-In high level, this project is about:
-* Continuing the previous work on Block Exchange (i.e. Bitswap) and Graph Exchange (i.e. GraphSync)
-* Creating a harness that enables to reproducibly run tests that demonstrate the performance of different file-transfer strategies.
-* Research and prototype new strategies to acquire new speed ups. -* Acquire leverage by exposing the harness to the whole Open Source and Research community, in a way that others feel compelled to join the effort and try their own strategies. - -In short, the aim of the project is two-fold: to drive speed-ups in file-sharing for IPFS and other P2P networks; and to enable a framework for anyone to join the quest of designing, implementing and evaluating brand new file-sharing strategies in P2P networks. - -## Why the project code name? - -Bitswap has been for some time the file-sharing subsystem within IPFS, then Graphsync came to propose a new way of approaching file-sharing on IPFS. The scope of the project is not only to improve Bitswap's performance, but file-sharing in P2P networks as a whole. We don't restrict ourselves exlusively to Bitswap or IPFS for our exploration. - -Being said that, the fact that IPFS had an infrastructure in place to start testing our ideas, and Bitswap being its file-sharing module, made us start our initial explorations over Bitswap and IPFS, but our aim is to go way farther and improve file-sharing performance with new protocols and proposals every P2P network can leverage and benefit from. In short, we want to go "Beyond Bitswap". The project can be considered a success if by the end of it one has a set of pluggable protocols and modules to achieve file-sharing in P2P environments, along with all the testbeds, tools and benchmarks required to improve this protocols and go _"Beyond Bitswap"_. - -## 💌 Invite to Research with us - -ResNetLab collaborates with over 10 Research Groups all over the world and Protocol Labs Research has developed research collaborations in multiples of ten in the last few years. We are always eager to collaborate with more researchers in all kinds of capacity, from thesis project (M.Sc or PhD), to Post-Doc, Grants, RFPs and independent research projects. 
- -We are making all our contributions, ideas, testbed, benchmarking and analysis scripts available below. You are more than welcome to pick any of these assets and build on top of it. If you have questions, please [mail us](mailto:resnetlab@protocol.ai). - -## Contributions & Results - -### Documents - -* [Related Work](https://docs.google.com/document/d/14AE8OJvSpkhguq2k1Gfc9h0JvorvLgOUSVrj3CnOkQk/edit#heading=h.nxkc23tlbqhl): It gives an overview of the problem, how it will be tackled, and a collection of references and community proposals. -* [Beyond Bitswap Slides](https://docs.google.com/presentation/d/18_aRTye2t6Xs_VhKwEbhvCYYu9ePaLgamIrJkpUDtfY/edit?usp=sharing): Set of slides introducing the project and summarizing the Related Work document from above. - -* [Survey of the state of the art](https://docs.google.com/document/d/172q0EQFPDrVrWGt5TiEj2MToTXIor4mP1gCuKv4re5I/edit?usp=sharing): It summarizes a list of papers on file-sharing strategies in P2P networks used as a groundwork for the projects. -* [Evaluation Plan](https://docs.google.com/document/d/1LYs3WDCwpkrBdfrnB_LE0xsxdMCIhXdCchIkbzZc8OE/edit#heading=h.nxkc23tlbqhl): Document describing the testbed and evaluation plan designed to test the performane of current implementation of file-sharing systems, and compare it with the improvements implemented within the scope of this work. -* [Enhancements RFC](#enhancements-rfcs): A list of enhancements proposals and ideas to improve file-sharing in IPFS and P2P networks. - - -### Enhancement RFCs - -This section shares a list of improvement RFCs that are being currently tackled, discussed and prototyped. Each RFC aims to test a specific idea or assumption, and they may initially be implemented over Bitswap, but that doesn't mean the conclusions drawn are exclusively applicable to the Bitswap protocol. 
RFCs are divided in the different layers for file-sharing in P2P sytems identified in the [Related Work](https://docs.google.com/document/d/14AE8OJvSpkhguq2k1Gfc9h0JvorvLgOUSVrj3CnOkQk/edit#heading=h.nxkc23tlbqhl). - -If you want to familiarize with our work, we highly recommend exploring first the RFCs in `prototype` state, and then move to the ones at a `draft` or `brainstorm` state. `prototyped` RFCs are in a stage where there is working prototype you can start evaluating and playing with. The `draft` state means that the RFC is ready for implementation, while `brainstorm` RFCs require further discussions and design work. - - -| RFC | Status | -|-------------------------------------------------------------------------------------------------------------|-------------| -| [RFC|BB|L0-09: Hashing algorithm improvements](./RFC/rfcBBL009) | `brainstorm`| -| [RFC|BB|L1-04: Track WANT messages for future queries](./RFC/rfcBBL104.md) | `prototype` | -| [RFC|BB|L1-02: TTLs for rebroadcasting WANT messages](./RFC/rfcBBL102.md) | `prototype` | -| [RFC|BB|L1-06: Content Anchors](https://github.com/protocol/ResNetLab/issues/6) | `brainstorm`| -| [RFC|BB|L1/2-05: Use of super nodes and decentralized trackers](./RFC/rfcBBL1205.md) | `brainstorm`| -| [RFC|BB|L12-01: Bitswap/Graphsync exchange messages extension and transmission choice](./RFC/rfcBBL1201.md) | `draft` | -| [RFC|BB|L2-03A: Use of compression and adjustable block size](./RFC/rfcBBL203A.md) | `prototype` | -| [RFC|BB|L2-03B: Use of network coding and erasure codes](./RFC/rfcBBL203B.md) | `brainstorm`| -| [RFC|BB|L2-07: Request minimum piece size and content protocol extension](./RFC/rfcBBL207.md) | `brainstorm`| -| [RFC|BB|L2-08: Delegate download to other nodes (bandwidth aggregation)](./RFC/rfcBBL208.md) | `brainstorm`| - -**Layer 0: Data Structure:** -* [RFC|BB|L0-09: Hashing algorithm improvements](./RFC/rfcBBL009): - -**Layer 1 RFCs: Discovery and announcement of content:** -* [RFC|BB|L1-04: Track WANT messages 
for future queries](./RFC/rfcBBL104.md): Evaluates how using information from a nodes surrounding can help the discovery and fetching of popular content in the network. -* [RFC|BB|L1-02: TTLs for rebroadcasting WANT messages](./RFC/rfcBBL102.md): It evaluates how broadcasting exchange requests TTL hops away, and allowing other nodes to discover and retrieve content on behalf of other peers, may help the discovery of content improving performance. -* [RFC|BB|L1/2-05: Use of super nodes and decentralized trackers](./RFC/rfcBBL1205.md): Aknowledge the fact that P2P networks are also social networks and there are different types of nodes in the network. Explore the use of side-channel discovery mechanisms. -* [RFC|BB|L1-06: Content Anchors](https://github.com/protocol/ResNetLab/issues/6): Evaluate the use of gossipsub to perform more efficient content routing. - -**Layer 2 RFCs: Negotiation and transmission of content:** -* [RFC|BB|L12-01: Bitswap/Graphsync exchange messages extension and transmission choice](./RFC/rfcBBL1201.md): Proposes dividing the exchange of content in two phases: a negotiation phase used to discover the holders of the different chunks of a file, and a transfer file to explicitly request blocks from different chunk holders. This opens the door to additional exchange strategies and schemes to improve performance. -* [RFC|BB|L2-03A: Use of compression and adjustable block size](./RFC/rfcBBL203A.md): Evaluates the potential performance improvementes on the use of compression for the exchange of content in P2P networks. -* [RFC|BB|L2-03B: se of network coding and erasure codes](./RFC/rfcBBL203B.md): Evaluates the potential performance improvementes on the use of network coding and erasure codes to leverage the transmission of content from multiple streams. 
-* [RFC|BB|L2-07: Request minimum piece size and content protocol extension](./RFC/rfcBBL207.md): Evaluates how the size of the chunks that comprises content requested in a P2P network may affect performance. -* [RFC|BB|L2-08: Delegate download to other nodes (bandwidth aggregation)](./RFC/rfcBBL208.md): Leverage the resources of other peer "friends" to collaboratively discover and retrieve content, and perform faster content retrievals. - -Feel free to jump into the discussions around the project or to propose your own RFC opening an issue in the repo. - -### Code & Testbed - -* [Testbed, benchmarking, analysis scripts and related assets](https://github.com/protocol/beyond-bitswap/): All the code used for the implementation and other auxiliary testing assets. Additional documentation is provided in the repo. -* [Bitswap fork](https://github.com/adlrocha/go-bitswap): This fork of `go-bitswap` is the one being used to implement and evaluate some of the RFCs and where additional metrics that want to be tracked in the testbed are being included. RFCs are imeplemented in different branches with the name of the RFC code. - -### Talks / Videos - - -* [Progress update September 2020](https://drive.google.com/file/d/1vUWnfQMIqz9hoqWB941vbzqkP16-_ydd/view?usp=sharing): Progress update of the project explaining the RFCs implemented, the testbed and some preliminary results. -* [How rfcBBL104 was implemented](https://drive.google.com/file/d/1YS3RoNdeeG1vauJpfvHvKUQzPHr97eHF/view?usp=sharingg): Video on how the implementation of rfcBBL104 was approached. -* [A Deep Dive In Bitswap](https://drive.google.com/file/d/1jgTOFFtRL0UYeDk98NHoNlEuujBaK08b/view?usp=sharing): Workshop describing in detail the operation of Bitswap and the implementation of some of the improvement RFCs. 
-* [Demo of compression in libp2p](https://drive.google.com/file/d/1YcemfkS5ZNnH66-tTGmerNrgrsW-bbpD/view?usp=sharing): A demo of the exchange of files between two IPFS nodes with compression enabled in libp2p. - -### Publications -* ["Two ears, one mouth": how to leverage bitswap chatter for faster transfers](https://research.protocol.ai/blog/2020/two-ears-one-mouth-how-to-leverage-bitswap-chatter-for-faster-transfers/) -* [Honey, I shrunk our libp2p streams](https://research.protocol.ai/blog/2020/honey-i-shrunk-our-libp2p-streams/) -* [Beyond Bitswap](https://adlrocha.substack.com/p/adlrocha-beyond-bitswap-i) -* [Network Coding in P2P Networks](https://adlrocha.substack.com/p/adlrocha-network-coding-in-p2p-networks) -* [Hash Array Mapped Tries](https://adlrocha.substack.com/p/adlrocha-hash-array-mapped-tries) \ No newline at end of file diff --git a/BEYOND_BITSWAP/RFC/images/rfcBBL102-stage1.png b/BEYOND_BITSWAP/RFC/images/rfcBBL102-stage1.png deleted file mode 100644 index b411f5a..0000000 Binary files a/BEYOND_BITSWAP/RFC/images/rfcBBL102-stage1.png and /dev/null differ diff --git a/BEYOND_BITSWAP/RFC/images/rfcBBL102-stage2.png b/BEYOND_BITSWAP/RFC/images/rfcBBL102-stage2.png deleted file mode 100644 index f8db38d..0000000 Binary files a/BEYOND_BITSWAP/RFC/images/rfcBBL102-stage2.png and /dev/null differ diff --git a/BEYOND_BITSWAP/RFC/images/rfcBBL102-stage3.png b/BEYOND_BITSWAP/RFC/images/rfcBBL102-stage3.png deleted file mode 100644 index 383293c..0000000 Binary files a/BEYOND_BITSWAP/RFC/images/rfcBBL102-stage3.png and /dev/null differ diff --git a/BEYOND_BITSWAP/RFC/images/rfcbbL104-result-baseline.png b/BEYOND_BITSWAP/RFC/images/rfcbbL104-result-baseline.png deleted file mode 100644 index 52a8daf..0000000 Binary files a/BEYOND_BITSWAP/RFC/images/rfcbbL104-result-baseline.png and /dev/null differ diff --git a/BEYOND_BITSWAP/RFC/images/rfcbbL104-results-rfc.png b/BEYOND_BITSWAP/RFC/images/rfcbbL104-results-rfc.png deleted file mode 100644 index 
f111970..0000000 Binary files a/BEYOND_BITSWAP/RFC/images/rfcbbL104-results-rfc.png and /dev/null differ diff --git a/BEYOND_BITSWAP/RFC/images/rfcbbL104.png b/BEYOND_BITSWAP/RFC/images/rfcbbL104.png deleted file mode 100644 index 3c41d1a..0000000 Binary files a/BEYOND_BITSWAP/RFC/images/rfcbbL104.png and /dev/null differ diff --git a/BEYOND_BITSWAP/RFC/images/ttl_slow.gif b/BEYOND_BITSWAP/RFC/images/ttl_slow.gif deleted file mode 100644 index 2a64d58..0000000 Binary files a/BEYOND_BITSWAP/RFC/images/ttl_slow.gif and /dev/null differ diff --git a/BEYOND_BITSWAP/RFC/rfcBBL009.md b/BEYOND_BITSWAP/RFC/rfcBBL009.md deleted file mode 100644 index 2c82025..0000000 --- a/BEYOND_BITSWAP/RFC/rfcBBL009.md +++ /dev/null @@ -1,30 +0,0 @@ -# RFC|BB|L0-09: Hashing algorithm improvements -* Status: `Brainstorm` - -## Abstract - - -Every time Bitswap receives a new block, [it generates the CID from the payload of the block](https://github.com/adlrocha/go-bitswap/blob/fad1a007cf9bc4f7e8e3f182a4645df60a88a9c6/message/message.go#L222) in order to verify that it belongs to a block it has in its wantlists. This means computing a large number of hashes, which may involve a significant overhead. - -## Description -Exploring more efficient implementations of hash functions, or alternative hash algorithms that fit different hardware architectures, could remove an important overhead for Bitswap (and other modules from the IPFS ecosystem). - -## Implementation plan -- [ ] Evaluate the overhead of hashing every block in Bitswap. This can be done by exchanging a large file with precomputed CIDs so that computing the CID for every block is not needed. -- [ ] If we see that the overhead from hashing every block is significant, explore other hash functions and make a Bitswap implementation able to support other hash algorithms. Perform the same evaluation from above and check the difference in the overhead. - -# Impact -- Reduction in the Bitswap protocol overhead. The protocol runs faster.
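The first step of the plan above (measuring the per-block hashing cost) can be approximated with a standalone sketch. `hashThroughput`, the block size and the block count are illustrative assumptions, and only SHA-256 from the Go standard library is used:

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"time"
)

// hashThroughput measures how long it takes to hash nBlocks blocks of
// blockSize bytes with SHA-256, roughly the work Bitswap does for every
// received block. Function name and parameters are illustrative.
func hashThroughput(blockSize, nBlocks int) time.Duration {
	block := make([]byte, blockSize)
	start := time.Now()
	for i := 0; i < nBlocks; i++ {
		sha256.Sum256(block) // per-block CID hashing cost
	}
	return time.Since(start)
}

func main() {
	// 256 KiB is Bitswap's default block size; 4096 blocks make 1 GiB.
	elapsed := hashThroughput(256*1024, 4096)
	fmt.Printf("hashed 1 GiB of blocks in %v\n", elapsed)
}
```

Comparing this number against the total transfer time of the same file would indicate whether swapping hash functions is worth pursuing.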
- -## Evaluation Plan - -- [The IPFS File Transfer benchmarks.](https://docs.google.com/document/d/1LYs3WDCwpkrBdfrnB_LE0xsxdMCIhXdCchIkbzZc8OE/edit#heading=h.nxkc23tlbqhl) - -- Measurement of the overhead for different file exchanges for different hash algorithms. - -## Prior Work -- https://github.com/minio/blake2b-simd - -## Results - -## Future Work diff --git a/BEYOND_BITSWAP/RFC/rfcBBL102.md b/BEYOND_BITSWAP/RFC/rfcBBL102.md deleted file mode 100644 index e3f83ac..0000000 --- a/BEYOND_BITSWAP/RFC/rfcBBL102.md +++ /dev/null @@ -1,95 +0,0 @@ -# RFC|BB|L1-02: TTLs for rebroadcasting WANT messages -* Status: `Draft` -* Implementation here: https://github.com/adlrocha/go-bitswap/tree/feature/rfcBBL102 - -## Abstract - -This RFC proposes setting a TTL on Bitswap WANT messages and a TTL ceiling per node, in order to increase the chance of a node finding a provider that has the content without resorting to the DHT. - - - - - -## Shortcomings - -Bitswap only sends WANT messages to its directly connected peers. This limits the potential for finding the peer with the content to the peers it is directly connected to or the ones that result from a DHT query, which has its cost in time and connectivity. - -## Description - -The idea is to include a TTL in WANT messages. That way, instead of only reaching our directly connected peers, we can increase the scope to, for instance, the connected peers of our connected peers (TTL=1). With this, we increase the span of discovery of content without having to resort to the DHT. This TTL needs to be limited to a small number to avoid flooding the network with WANT requests. It also complicates the implementation of the protocol, as now nodes need to track not only sessions from their directly connected peers but also from the ones x-hops away from them. Several design decisions would have to be made in the implementation, such as the following (ideally the best value for these fields will be determined in testing.
Additionally, we could set them to be dynamic according to the state of the network or the developer's desire. This will be explored in the future work). - -- Max TTL allowed. [This study proves](http://conferences2.sigcomm.org/acm-icn/2015/proceedings/p9-wang.pdf) that a Max TTL = 2 achieves the best performance (for moderately popular content) without severe impact on latency, so we can consider this as the baseline value. However, the impact and performance of this will depend heavily on how many connections each node maintains. - -- Forwarder of discovered blocks: Nodes x-hops away from the source of the requests can send responses following two approaches: - - - Symmetric routing: Messages are forwarded to the requestor following the same path followed by the WANT messages. - - - Asymmetric routing: Messages do not follow the same path followed by the WANT message, and responses are forwarded directly to the original requestor. In this alternative, nodes follow a "fire-and-forget" approach where intermediate nodes only act as relays and don't track the status of sessions; the receiving node X-hops away answers the requestor directly, and the only one tracking the state of the session is the originating peer (and maybe the directly connected peers while the session has not been canceled, so that if they see any of the requested blocks they can notify its discovery). When implementing this approach we also have to bear in mind that establishing connections is an expensive process, so for this approach to be efficient we should evaluate when it is worthwhile for nodes to open a dedicated connection to forward messages back to the original requestor. This does mean that WANT messages need an additional "requester" field so that the receiving node knows who to dial to deliver a block. - -Initially, the protocol will be designed using symmetric routing; other routing alternatives will be explored in future work.
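The forwarding rule described here (decrement the TTL, drop duplicates with a lower or equal TTL) can be sketched as follows. The `WantEntry` type, its field names, and the deduplication keying are illustrative assumptions, not go-bitswap's actual message types:

```go
package main

import "fmt"

// WantEntry is a hypothetical WANT-list entry extended with a TTL and the
// original requester, as sketched in this RFC. Field names are illustrative.
type WantEntry struct {
	CID       string
	TTL       int
	Requester string // needed so blocks can be delivered back to the origin
}

// Relay tracks the highest TTL seen per (requester, CID) so duplicate WANTs
// with a lower or equal TTL can be discarded to avoid loops.
type Relay struct {
	seen map[string]int
}

func NewRelay() *Relay { return &Relay{seen: map[string]int{}} }

// Forward decides whether a received WANT should be relayed onward and, if
// so, returns the entry with its TTL reduced by one.
func (r *Relay) Forward(w WantEntry) (WantEntry, bool) {
	key := w.Requester + "/" + w.CID
	if prev, ok := r.seen[key]; ok && w.TTL <= prev {
		return WantEntry{}, false // duplicate without a higher TTL: drop
	}
	r.seen[key] = w.TTL
	if w.TTL == 0 {
		return WantEntry{}, false // TTL exhausted: answer locally, don't relay
	}
	out := w
	out.TTL-- // decrement before forwarding to our own connected peers
	return out, true
}

func main() {
	r := NewRelay()
	relayed, ok := r.Forward(WantEntry{CID: "QmExample", TTL: 2, Requester: "peerA"})
	fmt.Println(ok, relayed.TTL) // true 1
}
```

Allowing a higher TTL to pass the duplicate check is what lets a later message act as a request update, as noted in the implementation plan below.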
When exploring symmetric routing we need to bear in mind that, according to IPFS values, nodes shouldn't push content to other peers that haven't requested it. - -Again, this proposal should include schemes to avoid flooding attacks and the forgery of responses. It may be sensible to also include networking information in the request to allow easy discovery and the forwarding of responses X hops away. - -## Implementation plan -- [X] Include TTL in WANT messages. Nodes receiving the WANT message track the session using relay sessions, reduce the TTL of the WANT message by one and forward it to their connected peers. Duplicate WANT messages with lower or equal TTL should be discarded to avoid loops (higher TTLs could represent request updates). WANT sessions should be identified at least with the following tuple: {SOURCE, WANT_ID} so nodes know to whom they need to send discovered blocks. (See figures below for the proposed implementation of the symmetric approach). - -- [X] Test the performance and bandwidth overhead of this scheme compared to plain Bitswap for different values of TTL. - -- [ ] Evaluate the use of a symmetric and asymmetric routing approach for the forwarding of discovered blocks. - -- [ ] Consider the implementation of "smart TTLs" in WANT requests, so that the TTL is determined according to the status of the network, bandwidth available, requests alive, number of connections or any other useful value. - -## Implementation details -### Basic implementation -* An additional TTL field has been added to Bitswap WANT entries in Bitswap messages to -enable the forwarding of exchange requests to peers TTL+1 hops away. -* Bitswap is set with a default TTL of 1, so corresponding messages will only be forwarded -to nodes two hops away. -* Sessions now include a TTL parameter to determine how far their WANT messages can go.
Sessions started within the peer (because the peer wants a block) are considered `direct`, while the ones triggered by the reception of a WANT message with enough TTL are referred to as `relay` (the peer is doing the work on behalf of another peer and is not explicitly interested in the block). An `indirect` flag has also been added to sessions in case a different strategy is -to be implemented for relay sessions in the future (like the use of a degree to limit the number of WANT messages broadcast to connected nodes to prevent flooding the network). Currently, direct and relay sessions follow the exact same strategy for block discovery and transmission. - - -* All the logic around relay sessions is done in `engine.go`, `session.go`, `peerwantmanager.go`: - - Whenever a peer receives a WANT message for which it doesn't have the block and its TTL is not zero, it sends a DONT_HAVE right away, and it tells the relay session to start a discovery for those WANT messages with TTL-1. - - Whenever a new block or HAVE message is received in an intermediate node for an active relay session, these messages are forwarded to the source (the initial requester). This action updates the DONT_HAVE status of the intermediate node so it is again included in the session.
- - _We need to be careful: in the current implementation blocks from relay sessions are stored in the datastore for convenience, but they should be removed once all the interested relay sessions for the block are closed and they have been successfully forwarded, to avoid peers storing content they didn't explicitly request._ - - When receiving a HAVE, the relay session will automatically send the WANT-BLOCK to the corresponding peers. We have identified the interest from every peer (including direct ones), so when a peer receives a block for a relay file it will automatically forward it to the source (there is no need to forward interest for WANT-BLOCKs because this is automatically managed within the relay sessions). Relay sessions work in the same way as direct sessions in this first implementation. - - -### Symmetric approach message flows -![](./images/ttl_slow.gif) - - -# Impact -We should expect a latency reduction in the discovery of content, but it may lead to an increase in the bandwidth overhead of the protocol. We do not expect the increase in the bandwidth overhead to be substantial, given that response messages are not big in size. - -## Evaluation Plan -- [ ] [The IPFS File Transfer benchmarks.](https://docs.google.com/document/d/1LYs3WDCwpkrBdfrnB_LE0xsxdMCIhXdCchIkbzZc8OE/edit#heading=h.nxkc23tlbqhl) - - To evaluate the performance of this RFC we need a network where the `MAX_CONNECTION_RATE` of nodes is small, the number of passive nodes in the network (neither seeding nor leeching content) is high, and the number of seeders providing the content is small. This will force content to be several hops away from leechers. Leechers should request the content all at the same time (if done in waves, leechers in one wave would become seeders in the next wave and may add noise to the measurement).
- -- [ ] An additional measurement to consider is to compare the number of times a node needs to resort to the DHT to find the content in plain Bitswap compared to the RFC (this would determine how effective the strategy is). - -## Prior Work -This RFC was inspired by this proposal. The RFC is based on the assumption that DHT lookups are slow and therefore it is better to increase our "Bitswap span" than to resort to the DHT. It would be great if we could validate this assumption before considering its implementation. - -## Results -TBA - -## Future Work -Some future work lines to consider: - -- Combine with RFC|BB|L1-04 so that, apart from setting a TTL on WANT messages, every peer receiving a WANT message tracks it in its peer-block registry, also enhancing the discovery scope with peer-block registry tables. - -- With a very high number of connections the network is effectively flooded, which is not something we want. We could envision this technique as an efficient alternative to keeping many (questionable quality) connections. [[slides](http://conferences.sigcomm.org/acm-icn/2015/slides/01-01.pdf)] - -- If we end up using request manifests as suggested in RFC | BB | L1/2-01, max TTLs could be specified in the exchange request message or determined according to the total connections of a peer to limit network flooding. Even more, it'd be interesting to explore this RFC together with RFC | BB | L1-06, so that, using the GossipSub overlay network as a base, WANT TTLs are determined according to scores and max connections of peers. - -- Evaluate techniques used in GossipSub to fine-tune or enhance the use of WANT TTLs, preventing the network from being flooded. Even more, an additional line of exploration could be devised in which GossipSub is used as the messaging infrastructure leveraged by Bitswap to exchange WANT messages.
- -- Two concerns not addressed in the implementation of this RFC are: - - Privacy: The fact that WANT messages are forwarded to nodes several hops away scatters information about the content being requested by nodes. This is not a problem for the symmetric approach compared to Bitswap's baseline implementation, because there is no way to authenticate the source of the WANT request. New privacy concerns compared to the baseline arise when WANT messages include the requester of the content in order to be able to forward the content to them directly. - - DDoS attacks: This RFC would make it fairly easy for a malicious node to launch an amplification attack, and this should be considered in future iterations of the implementation. An example of the attack: - 1. Create block "Block1" of maximum size on Node A - 2. Connect Node B to as many peers as possible - 3. Send request for Block1 from Node B to all peers with maximum TTL. This will cause Block1 to be passed around between all the nodes, so the attacker can amplify the attack's bandwidth. -A simple workaround to this attack can be to inspect WANT messages and assign a budget to connected peers to prevent them from abusing the protocol. diff --git a/BEYOND_BITSWAP/RFC/rfcBBL104.md b/BEYOND_BITSWAP/RFC/rfcBBL104.md deleted file mode 100644 index 992cbfe..0000000 --- a/BEYOND_BITSWAP/RFC/rfcBBL104.md +++ /dev/null @@ -1,67 +0,0 @@ -# RFC|BB|L1-04: Track WANT messages for future queries -* Status: `Prototype` -* Implementation here: https://github.com/adlrocha/go-bitswap/tree/feature/rfcBBL104 - -## Abstract - -This RFC proposes to leverage the knowledge acquired from WANT messages sent by others in future requests issued by the peer tracking them. By keeping track of the WANT messages received, a peer will be able to assess the likelihood of a peer having a block after a period of time, the rationale being: if someone asked for this in the past, they probably have it by now.
With this information, a peer can issue queries to the peers that might have the block without having to enter a discovery phase. - - -## Shortcomings -Bitswap nodes currently send WANT messages blindly to all their connected peers. On the other hand, WANT messages include a lot of useful information about the "recently accessed content" of a node's connected peers. By tracking this information, more directed and efficient searches of content can be performed. - -## Description -Every time a peer requests content from the network it sends WANT requests to all its connected peers. A lot of information about the content being shared in our surroundings can be extracted from the reception of these requests. This proposal is based on the assumption that if a node is requesting content it will potentially store it in the near future. - -With the implementation of this RFC, IPFS nodes will: -- Track all the WANT messages received and start building a "local view of the content". We call this local view the "peer-block registry"; it is populated with information about the CIDs and the peers that have recently requested them. -- With the creation of the registry above, we then use it as a new content-routing-like service, in which we first look up the registry to see if the CID has been previously requested in "our surroundings". If this is the case, we send a WANT-BLOCK message directly to that peer. This WANT-BLOCK is sent along with the WANT list. - -With this simple scheme we reduce to one RTT the requests for content previously accessed by our connected peers. Additionally, if applied to GraphSync, we can have a node fetch a file in one RTT by applying the selector to the CID. - -As a second phase of this RFC, we intend to increase the "view" of content: connected peers can periodically share their peer-block registries to populate them with more CIDs and peers, even if they are not connected to them.
For this scheme we need to come up with ways of limiting the level of spread of "inspection tables" (or we may end up having an alternative DHT), such that maybe I only accept updates to my "inspection tables" from nodes 2 hops away. We also need ways to collect feedback and "garbage collect" outdated information from these tables (or they may end up being useless for a large share of the requests). - -Some of the known challenges to make this contribution efficient and effective are: -- Peers see potentially millions of WANT messages per day. The data structure containing this information should be compacted (e.g. using an accumulator) so that the overhead of storing it is low. -- At the same time, seeking through this table must be fast as peers will need to query it for many blocks. The data structure should be both compact and fast to read. - -Initial explorations indicated that a HAMT with an accumulator-like approach is a good candidate for this job. - -## Implementation plan -- 🛠 Evaluate the use of HAMTs and accumulators to easily access the peers from the structure that "potentially" has the CID. - - [x] Do a test with 100K files of 1MB to see the number of Wants received by a single node. - - 🛠 Show how naïve approaches may not work at the scale of decentralized networks. For this we will add a test in which we track the memory footprint of the two implementations of registries, the FlatRegistry and the HAMTRegistry, for a large number of files. - - [x] Use HAMTRegistry as an efficient data structure for the registry. - - ⚠ Current implementation of HAMTRegistry doesn't include the CHAMP modification. - - [ ] Evaluate the use of accumulators to access registry entries and analyze how changing the size of the prefix in the accumulator structure used affects the bandwidth, the memory footprint of peers, and the chance of successful discovery. This will allow us to put a ceiling on the number of entries to be tracked in the registry and its overhead.
-- [x] Implement WANT inspection and design the data structure used to track the data being exchanged in requests. -- [x] Design the protocol followed by peers to leverage this data structure and include information from it in their requests (sending an optimistic WANT-BLOCK in a Bitswap session to nodes in the table who have seen the desired CID before). -- [x] Implement some basic unit tests to be used throughout the development and enhancements of the RFC. -- [x] Design a test evaluation in the testbed (Waves test case included). -- ⚠ Design the garbage collection and exchange schemes for these tables. - - FlatRegistry limits the maximum number of peers per CID allowed. For this Registry the garbage collection means cleaning the entries of outdated CIDs. - - HAMTRegistry updates the key with the new list of peers. There is a maximum number of entries allowed in each key. - - The garbage collection strategy will be defined according to the results of the memory footprint tests and the accumulator ceiling. - -![](./images/rfcbbL104.png) - -# Impact -We can expect the time to discover content in the network to be reduced. - -## Evaluation Plan -- [The IPFS File Transfer Benchmarks](https://docs.google.com/document/d/1LYs3WDCwpkrBdfrnB_LE0xsxdMCIhXdCchIkbzZc8OE/edit#heading=h.nxkc23tlbqhl) -- [x] Create a test case that simulates the interest in a dataset by a growing population of nodes (e.g., use different waves of peers interested in a file). This will create the scenario in which the next wave will benefit from the knowledge that the first wave might already have the file. - - [ ] Include noise in the test case. Along with the regularly accessed files, nodes request random CIDs to pollute their registries. - - [ ] Clear registries between run counts to remove the advantage of files with similar blocks. -- [ ] Track memory footprint of peers.
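The FlatRegistry mentioned above can be illustrated with a minimal sketch. The names, the per-CID cap and the FIFO eviction are assumptions for illustration; the actual prototype lives in the go-bitswap fork linked in this RFC:

```go
package main

import "fmt"

// FlatRegistry is a minimal sketch of the peer-block registry: it maps each
// CID to the peers that recently WANTed it, capped per CID. Names, cap and
// FIFO eviction are illustrative, not the prototype's exact behavior.
type FlatRegistry struct {
	maxPeersPerCID int
	entries        map[string][]string
}

func NewFlatRegistry(maxPeers int) *FlatRegistry {
	return &FlatRegistry{maxPeersPerCID: maxPeers, entries: map[string][]string{}}
}

// RecordWant is called for every inspected WANT message.
func (r *FlatRegistry) RecordWant(cid, peer string) {
	peers := r.entries[cid]
	if len(peers) >= r.maxPeersPerCID {
		peers = peers[1:] // evict the oldest entry to respect the cap
	}
	r.entries[cid] = append(peers, peer)
}

// Candidates returns peers likely to hold the block, to receive an
// optimistic WANT-BLOCK before any discovery phase.
func (r *FlatRegistry) Candidates(cid string) []string {
	return r.entries[cid]
}

func main() {
	reg := NewFlatRegistry(2)
	reg.RecordWant("QmX", "peerA")
	reg.RecordWant("QmX", "peerB")
	reg.RecordWant("QmX", "peerC") // evicts peerA
	fmt.Println(reg.Candidates("QmX")) // [peerB peerC]
}
```

The per-CID cap is what makes garbage collection tractable for this registry: cleaning outdated CIDs is the only remaining task, as noted in the plan above.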
- -## Results -The results for the implementation of this RFC were reported here: https://research.protocol.ai/blog/2020/two-ears-one-mouth-how-to-leverage-bitswap-chatter-for-faster-transfers/ - -## Future Work -- Protocol to share peer-block registries between nodes to increase "local views". -- A good idea for reducing the scope of the content we keep track of is to somehow monitor the latency to the node and keep track of content that lives nearby. -- We can go further and think of budget-based forwarding schemes where nodes can forward only up to a fixed number of requests. We've investigated several funky content discovery strategies in these two papers: - - [On Demand Routing for Scalable Name Based Forwarding](http://conferences.sigcomm.org/acm-icn/2018/proceedings/icn18-final53.pdf) - - [A Native Content Discovery Mechanism for Information-Centric Networks](https://www.ee.ucl.ac.uk/~ipsaras/files/efib-icn17.pdf) diff --git a/BEYOND_BITSWAP/RFC/rfcBBL1201.md b/BEYOND_BITSWAP/RFC/rfcBBL1201.md deleted file mode 100644 index b520266..0000000 --- a/BEYOND_BITSWAP/RFC/rfcBBL1201.md +++ /dev/null @@ -1,84 +0,0 @@ -# RFC|BB|L12-01: Bitswap/Graphsync exchange messages extension and transmission choice -* Status: `Draft` -* Implementation here: https://github.com/ - -## Abstract -This RFC proposes expanding Bitswap and Graphsync exchange messages with additional information. This information is used in content requests so receivers can clearly understand the content requested, and in responses so responders can share their specific level of fulfillment of the request with the requestor. With this information, the requestor can select the best nodes to perform the actual request to achieve the best transmission performance possible. - - - - - -## Shortcomings - -Bitswap and Graphsync's current discovery process is blind and optimistic.
An IPLD selector or a plain WANT list with the requested CIDs is shared with connected peers hoping that someone will have the content. When a peer answers saying that it has the requested block, a subsequent request needs to be performed to get the rest of the blocks belonging to the DAG structure of the requested CID. The idea behind this RFC is to add a way for the requestor and connected peers to give more directed feedback about the result of the request. - -## Description -To request content from the network, instead of sending plain WANT messages or an IPLD selector, the requests will include the following information: - -- Plain legacy request (want list or IPLD selector). This would allow this RFC to be backward compatible with existing exchange interfaces. - -- Parameters for the exchange protocol (such as "send blocks directly if you have them", or "send only leaf blocks", "send all the DAG structure for the root CIDs I send", or any other extension we may come up with). - -- Specific requirements (such as the minimum latency or bandwidth desired for the exchange). - -- Any additional data that may be useful and that we can act upon at a protocol level. - -Nodes receiving this message will respond with the level of fulfillment of the request (number/range of blocks belonging to the request that the node stores, and whether or not they fulfill the specified transmission requirements). This request can also include the list of blocks under the CID/IPLD selector the request will eventually look for. No blocks are shared (unless explicitly specified) in this exchange; it is only used as a way of "polling the surroundings" for the content. - -With this information, the requestor inspects the characteristics and percentage of fulfillment of all the responses and chooses the best peers to request the blocks from, distributing the load across the nodes it is connected to and parallelizing the exchange as much as possible.
This offers peers an opportunity to try to find the optimal distribution of requests for blocks that maximizes throughput. The transmission flow with the chosen peers is triggered through a TRANSFER message, where the desired blocks and the transmission parameters are specified (this opens the door to the use of compression, network coding and other schemes in the transmission phase). - -While the requester is receiving blocks through different transmission flows, it can trigger new rounds of discovery by sending additional request messages to connected peers or selected peers in the DHT to increase the overall level of fulfillment or find better transmission candidates. The discovery and transmission loops will be in constant communication. - -### Implementation - -Nodes receiving this manifest will answer with the level of fulfillment of the request. Upon reception of these responses, the node can start transmission requests to all the desired nodes. Meanwhile, we can resort to the DHT to send these exchange requests to peers we are not directly connected to. The flow of the protocol would be: - -- Send exchange requests (IPLD selector/list of blocks, network conditions, node conditions) to connected peers. - -- Receive responses: R1=50% fulfillment; R2=30% fulfillment; R3=5% fulfillment. We select the peers that lead to the largest level of fulfillment, U(R1, R2)=75% fulfillment, and request the start of a transmission flow with them. Meanwhile, we resort to the DHT or perform an additional lookup to find the data pending for full fulfillment of the request. All of these phases should be in constant contact, so in case we receive better responses from peers we can act on them, starting new transmissions or adapting to the conditions of the network. - -The above proposal may present a few shortcomings, for which we would have to include schemes such as: - -- Reducing the number of RTTs when the number of blocks requested and their size is small.
We need to include a way of merging the discovery and transmission phases to minimize the RTTs when appropriate. - -- For large files, send only the first 2 layers in the response before the requestor triggers the transmission phase. - -- Use of accumulators in the level of fulfillment in responses to improve checks and the time between the request and transmission phases. - -- Avoid response forgery. This is out of the scope of this RFC but is something worth exploring in future work. - -## Implementation plan -- [ ] Include additional information for exchange requests in WANT messages. - -- [ ] Determine the basic structure of exchange requests, the information included in them, and how they will be leveraged by nodes. When designing these messages we need to ensure that they are compatible with existing WANT messages for backward compatibility. Thus, if an outdated Bitswap node receives an exchange request it still knows how to interpret the request. Along with this exchange request, the TRANSFER message should be designed. - - - [ ] Use of Graphsync selectors in WANT messages. - -- [ ] Design and implement the message exchange protocol for the content discovery and negotiation phases: - - - [ ] 1\. Send exchange requests and collect responses for content availability and network status. - - - [ ] 2\. Start transmission channels (TRANSFER) with peers fulfilling the request and keep the content discovery loop open in case better content servers appear (either because they are found through the exchange request broadcast, or because we chose to extend the lookup through the DHT and found better peers). - - - [ ] 3\. Fine-tune peer interaction for best performance. - -- [ ] Performance benchmark with plain Bitswap to fine-tune protocol configuration. - -- [ ] Implement more complex queries in request messages. - - - [ ] Use a utility function / score to evaluate the "best peers" for content discovery and transmission.
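The peer-selection step in the flow above (picking responders whose combined offered blocks maximize fulfillment) can be approximated with a greedy cover. The `Response` type, `selectPeers` and the scoring are illustrative assumptions, not part of any existing implementation:

```go
package main

import "fmt"

// Response is a hypothetical fulfillment answer: the peer plus the set of
// requested blocks it claims to store.
type Response struct {
	Peer   string
	Blocks map[string]bool
}

// selectPeers greedily picks responders until the union of their offered
// blocks covers the request (classic greedy set cover).
func selectPeers(wanted []string, resps []Response) []string {
	uncovered := map[string]bool{}
	for _, c := range wanted {
		uncovered[c] = true
	}
	var chosen []string
	for len(uncovered) > 0 {
		best, gain := -1, 0
		for i, r := range resps {
			n := 0
			for c := range uncovered {
				if r.Blocks[c] {
					n++
				}
			}
			if n > gain { // pick the responder covering the most missing blocks
				best, gain = i, n
			}
		}
		if best < 0 {
			break // no responder covers the rest: resort to the DHT
		}
		chosen = append(chosen, resps[best].Peer)
		for c := range resps[best].Blocks {
			delete(uncovered, c)
		}
	}
	return chosen
}

func main() {
	wanted := []string{"b1", "b2", "b3", "b4"}
	resps := []Response{
		{"R1", map[string]bool{"b1": true, "b2": true}},
		{"R2", map[string]bool{"b2": true, "b3": true}},
		{"R3", map[string]bool{"b4": true}},
	}
	fmt.Println(selectPeers(wanted, resps)) // [R1 R2 R3]
}
```

A real utility function would also weigh latency and bandwidth, as the last implementation-plan item suggests; the sketch only maximizes block coverage.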
- -# Impact -Adding these exchange request and negotiation phases opens the door to a clear differentiation between content discovery and transmission. This enables the inclusion of new schemes to optimize both levels according to the needs of an application. It will also enable the parallelization of many processes in current exchange interfaces, and gives clients a way to influence the operation of the protocol. - - -## Evaluation Plan -- [The IPFS File Transfer benchmarks.](https://docs.google.com/document/d/1LYs3WDCwpkrBdfrnB_LE0xsxdMCIhXdCchIkbzZc8OE/edit#heading=h.nxkc23tlbqhl) - -## Prior Work - -## Results - - -## Future Work diff --git a/BEYOND_BITSWAP/RFC/rfcBBL1205.md b/BEYOND_BITSWAP/RFC/rfcBBL1205.md deleted file mode 100644 index 0b817aa..0000000 --- a/BEYOND_BITSWAP/RFC/rfcBBL1205.md +++ /dev/null @@ -1,49 +0,0 @@ -# RFC|BB|L1/2-05: Use of super nodes and decentralized trackers -* Status: `brainstorm` - -### Abstract - -This RFC proposes the classification of nodes into different types according to their capabilities, and the use of side-channel information to track and discover content in the network. We propose the use of decentralized trackers (with good knowledge of where content is stored in the network and a discovery service for "magnet links"), and supernodes (nodes with high bandwidth and low latency which can significantly improve the transmission of content). Thus, nodes can follow different strategies to speed up discovery and transmission by "looking up" content in decentralized trackers and delegating the download of content to nearby supernodes. - -This RFC will leverage the "high-quality" infrastructure deployed by entities such as Pinata, Infura or PL. We need to acknowledge the existence of these "high-class" nodes and leverage them to improve the performance of the network. - -### Description - -Introduce in the network the concept of supernodes and decentralized trackers.
- -- Supernodes are nodes with high bandwidth, low latency and a good knowledge of where to discover content in the network. Regular nodes would prioritize connections to supernodes as they will speed up their file-sharing process. These could be seen as "decentralized gateways" in the network. - -- Decentralized trackers: Similar concept to that of "Hydra Boost". These nodes are passive nodes responsible for random-walking the network for content and listening to WANT messages or any other additional announcement or metadata exchange devised for content discovery. - -Nodes would point to decentralized trackers to speed up their content discovery, and to supernodes (if one of them ends up being the provider of the content) to improve the transmission. - -We could envision the use of side-channel identifiers for content discovery, equivalent to "magnet links", which, instead of pointing to the specific content, point to the decentralized tracker that can serve the request better. These magnet links should be "alive" and updated with the status of the network. Thus, we could have: - -- `/ipfs/` identifiers directly pointing to content. - -- `/iptrack/`: Points to the tracker that may know where to find the content. - -- Additionally, the tracker could answer with `[/p2p/Qm.., /p2p/Qm..]`, a list of supernodes that would lead to a faster download of the content. - -### Prior Work - -This is similar to, and can be linked to, the [RFC: Side Channels aka DHT-free Content Resolution from this document.](https://docs.google.com/document/d/1QKso-VwYv9jLxTN7WP_RAArrOLCZwjqdjBKQA2wa3VY/edit#) - -This paper: [2Fast: Collaborative downloads in P2P networks](http://www.st.ewi.tudelft.nl/iosup/2fast06ieeep2p.pdf) proposes the idea of delegating the download of content to a group of nodes. We could consider the implementation of a "grouping scheme" for supernodes in which a node can request a group of supernodes to help it download content.
This same grouping strategy could be considered for plain nodes as an independent RFC (a combination of ideas presented in [RFCBBL207](./rfcBBL207) and [RFCBBL208](./rfcBBL208)).
-
-### Implementation Plan
-
-- [ ] Implementation of super-nodes and the download delegation protocol.
-
-- [ ] Implementation of decentralized trackers and the magnet links protocol.
-
-- [ ] Evaluation of different discovery and transmission strategies using this network hierarchy.
-
-- [ ] Group of supernodes strategy.
-
-### Evaluation Plan
-
-- [The IPFS File Transfer benchmarks.](https://docs.google.com/document/d/1LYs3WDCwpkrBdfrnB_LE0xsxdMCIhXdCchIkbzZc8OE/edit#heading=h.nxkc23tlbqhl)
-
-### Impact
-
diff --git a/BEYOND_BITSWAP/RFC/rfcBBL203A.md b/BEYOND_BITSWAP/RFC/rfcBBL203A.md
deleted file mode 100644
index 43c5953..0000000
--- a/BEYOND_BITSWAP/RFC/rfcBBL203A.md
+++ /dev/null
@@ -1,109 +0,0 @@
-# RFC|BB|L2-03A: Use of compression and adjustable block size
-* Status: `Prototype`
-* Implementation here: https://github.com/adlrocha/go-bitswap/tree/feature/rfcBBL203A
-* Compression in libp2p: https://github.com/adlrocha/go-libp2p-compression-examples
-
-## Abstract
-This RFC proposes the exploration of using compression in block transmission. These techniques include:
-* Block-by-block standard compression (e.g. gzip)
-* Whole-transfer compression (e.g. when responding to a graphsync query, send all the blocks compressed)
-* Custom coding tables for sequences of bytes that appear often (e.g. generate a Huffman table for all the protobuf headings so that these are compressed by default, like HPACK does for HTTP)
-
-Additionally, to optimize the use of these schemes, a system of adjustable block sizes and coding strategies in transmission could be devised (e.g. dynamic Huffman tables).
-
-
-## Shortcomings
-Blocks in IPFS are exchanged without the use of compression; this is a huge missed opportunity to minimize the bandwidth footprint and latency of transferring a file.
For context, even minimal web assets are transmitted compressed through HTTP to improve website loading performance, and most of them are below 256KiB, which is the IPFS default block size. We therefore expect significant gains in transmission times.
-
-## Description
-Current implementations of file-sharing protocols may benefit from the use of on-the-fly compression to optimize the use of bandwidth and the transmission of content.
-Even more so when using the "Graphsynced" approach in the discovery of content, where we ask peers for their level of fulfillment of an IPLD selector: we can request all the blocks for the IPLD selector to be compressed in the same package and forwarded to the requestor.
-
-Some of the compression approaches to be explored in this RFC are:
-* Block-by-block standard compression (e.g. gzip): Every block (and optionally every single Bitswap message) is compressed. Get inspiration from web compression.
-* Whole-transfer compression: All the blocks requested by a peer in a Wantlist or a graphsync IPLD selector are compressed in the same package.
-* Custom coding tables for sequences of bytes that appear often (e.g. generate a Huffman table for all the protobuf headings so that these are compressed by default, like HPACK does for HTTP).
-* Use of "compressed caches" so that when a specific piece of content has been identified as "regularly exchanged", it can be retrieved from the cache instead of being compressed again. This scheme may not be trivial.
-* Use of different compression algorithms.
-* Use of different block sizes before compression.
-
-## Implementation plan
-- [x] Perform a simple test to evaluate the benefits of "on-the-fly" compression on blocks (to determine if IPFS could benefit from directly exchanging compressed messages and blocks). Evaluate different compression algorithms used in the web.
-
-  - [x] Evaluate the compression of full Bitswap messages (`bs.compressionStrategy = "full"`): To achieve this we add a compression flag in [Bitswap messages](https://github.com/adlrocha/go-bitswap/blob/master/message/message.go) to be able to identify when messages are compressed. Afterwards, if compression is enabled, we need to [compress the message](https://github.com/adlrocha/go-bitswap/blob/d151875a94048c3db59de52b9cb99d0246d74613/network/ipfs_impl.go#L240) before sending it. Compressed messages are identified in the [newMessageFromProto](https://github.com/adlrocha/go-bitswap/blob/d151875a94048c3db59de52b9cb99d0246d74613/message/message.go#L199) function of receiving peers, where they are decompressed and processed seamlessly by Bitswap. In order to open the door to the use of different compression algorithms and different full-message compression strategies, a compressionType has been added to [message.proto](https://github.com/adlrocha/go-bitswap/blob/master/message/pb/message.proto).
-
-  - [x] Evaluate the compression of blocks only (`bs.compressionStrategy = "blocks"`): We compress each block before adding it to the protobuf message in the [ToProtoV1](https://github.com/adlrocha/go-bitswap/blob/d151875a94048c3db59de52b9cb99d0246d74613/message/message.go#L583) function of message.go, and then decompress them in [newMessageFromProto](https://github.com/adlrocha/go-bitswap/blob/d151875a94048c3db59de52b9cb99d0246d74613/message/message.go#L199). For the compression of blocks, the only thing that changes for the transmission of a block is its RawData; the CID is kept unchanged so the block is still conveniently identified.
-
-  - [x] Use [GZip](https://golang.org/pkg/compress/gzip/) as the default compression algorithm (`engine.compressor = "Gzip"`).
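As a concrete illustration of the block-by-block gzip strategy above, the sketch below round-trips a block's RawData through `compress/gzip` from the Go standard library. The function names (`compressBlock`, `decompressBlock`) are illustrative only, not the actual go-bitswap API.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// compressBlock gzips a block's RawData before it is added to the
// protobuf message. The CID is left untouched.
func compressBlock(raw []byte) ([]byte, error) {
	var buf bytes.Buffer
	w := gzip.NewWriter(&buf)
	if _, err := w.Write(raw); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil { // flush and write the gzip footer
		return nil, err
	}
	return buf.Bytes(), nil
}

// decompressBlock reverses compressBlock on the receiving side.
func decompressBlock(compressed []byte) ([]byte, error) {
	r, err := gzip.NewReader(bytes.NewReader(compressed))
	if err != nil {
		return nil, err
	}
	defer r.Close()
	return io.ReadAll(r)
}

func main() {
	raw := bytes.Repeat([]byte("some highly redundant block payload. "), 100)
	c, _ := compressBlock(raw)
	out, _ := decompressBlock(c)
	fmt.Printf("original=%d compressed=%d roundtrip-ok=%v\n",
		len(raw), len(c), bytes.Equal(raw, out))
}
```

Note that the gain depends entirely on how redundant the block's payload is; already-compressed media would see little benefit, which motivates the data-type-aware compression mentioned in Future Work.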
-
-  - [x] Instead of compressing fields of the protobuf message, evaluate the compression of the full stream in the [bitswap network](https://github.com/adlrocha/go-bitswap/blob/d151875a94048c3db59de52b9cb99d0246d74613/network/ipfs_impl.go).
-    * We may choose to use a multicodec to signal that a stream is compressed. Additionally, instead of using a length prefix to signal the size of sent messages, we could fully leverage streams by using a multicodec together with `KeepReading` and `EndOfStream` signals in protocol streams, so there is no need to know the size of the compressed message beforehand.
- -## Implementation details - -* Block compression: Files within Bitswap are exchanged in the form of blocks. Files are composed of several blocks organized in a DAG structure (with each block having a size limit of 256KB). In this compression approach, we compress blocks before including them in a message and transmitting them to the network. -* Full message compression: In this compression strategy instead of only compressing blocks we compress every single message before sending it. It is the equivalent of compressing header+body in HTTP. -* Stream compression: It uses compression at a stream level, so every byte that enters a stream from the node to other peers is compressed (i.e. using a compressed writer). - -* To drive the compression idea even further, we prototyped a `Compression` transport into libp2p (between the `Muxer` and the `Security` layer) so that every stream running over a libp2p node can potentially benefit from the use of compression. This is a non-breaking change as the `transport-upgrader` has also been updated to enable compression negotiation (so eventually anyone can come with their own compression and embed it into libp2p seamlessly). Some repos to get started with compression in libp2p: - - Compression example: https://github.com/adlrocha/go-libp2p-compression-examples - - Gzip compressor: https://github.com/adlrocha/go-libp2p-gzip - - Testbed to test compression over IPFS: https://github.com/adlrocha/beyond-bitswap/tree/feat/compression - -[See a discussion on the results here](https://github.com/protocol/ResNetLab/issues/5). - - -# Impact -A reduction of latency due to compressed transmissions. Potential increase in computational overhead. - -## Evaluation Plan -- [The IPFS File Transfer benchmarks.](https://docs.google.com/document/d/1LYs3WDCwpkrBdfrnB_LE0xsxdMCIhXdCchIkbzZc8OE/edit#heading=h.nxkc23tlbqhl) - -- See the computational footprint of different compression strategies and algorithms. 
-
-- Compare the data sent and received using compression against baseline Bitswap.
-
-## Prior Work
-
-This RFC takes inspiration from:
-* [Dropbox’s work on compression](https://dropbox.tech/infrastructure/-broccoli--syncing-faster-by-syncing-less)
-* [HPACK](https://blog.cloudflare.com/hpack-the-silent-killer-feature-of-http-2/)
-* [HTTP Compression](https://developer.mozilla.org/en-US/docs/Web/HTTP/Compression)
-* [Choose the best compression algorithm assisted by AI](https://vks.ai/2019-12-05-shrynk-using-machine-learning-to-learn-how-to-compress)
-* [Thorough benchmark of different GZip modes](https://www.rootusers.com/gzip-vs-bzip2-vs-xz-performance-comparison/)
-
-
-## Results
-The results for the implementation of this RFC were reported here: https://research.protocol.ai/blog/2020/honey-i-shrunk-our-libp2p-streams/
-
-## Future Work
-- If the use of exchange requests and the negotiation phase for content transmission (RFC|BB|L1/2-01) is implemented, it makes sense, once a specific peer (or a group of peers) has been identified as storing a large number of the desired blocks, to request more advanced compression and network coding techniques for the transmission.
-
-- Detect the type of data being exchanged in blocks and apply the most suitable compression for the data type, such as [image-specific compression](https://developers.google.com/speed/webp/docs/compression) if images are being exchanged (for this approach, a node will need to have all the blocks for the data).
diff --git a/BEYOND_BITSWAP/RFC/rfcBBL203B.md b/BEYOND_BITSWAP/RFC/rfcBBL203B.md
deleted file mode 100644
index db06105..0000000
--- a/BEYOND_BITSWAP/RFC/rfcBBL203B.md
+++ /dev/null
@@ -1,65 +0,0 @@
-# RFC|BB|L2-03B: Use of network coding and erasure codes.
-* Status: `Brainstorm`
-
-### Abstract
-
-This RFC proposes the exploration of applying network coding and erasure codes to the content exchanged by peers.
These techniques include:
-- The use of erasure codes in the transmission of blocks so they can be requested from different sources, and the original content can be regenerated even without the reception of all the blocks.
-- The use of rateless codes to make all blocks for a specific piece of content equally valuable.
-- The use of erasure codes for storage (such as Reed Solomon).
-
-These techniques could lead to additional improvements by including a negotiation phase in the exchange interface (see [RFC|BB|L1/2-01](./rfcBBL1201)).
-
-### Shortcomings
-
-In order to recover the requested content, peers need to receive every block of the content's DAG. This means that if just a single block is lost, is too rare, or is not in the network anymore, transmission times can increase or, in the worst case, the content can become "unretrievable". The use of erasure coding and network coding can benefit the discovery and transmission of blocks (especially if they are rare), making the content exchange more resilient to unforeseen events. These techniques also improve the transmission of content from several sources.
-
-This RFC becomes really interesting in networks with high churn and for large files. The aim is to parallelize the transmission from different sources.
-
-### Description
-
-Several nodes may receive complementary WANT messages from different connected peers. Instead of requesting the content from just one source, or explicitly requesting it from all of them and potentially producing duplicates in the network, we could benefit from the use of network coding to enhance the transmission from multiple sources.
-
-We can really benefit from the fact that more than one peer may store the content by exploring the use of techniques such as:
-
-- The use of erasure codes and network coding in the transmission of blocks so they can be requested from different sources and the original content can be regenerated even without the reception of all the blocks.
Peers can send a linear combination of coded blocks so that the requestor is able to recover the content even if it doesn't receive all the original blocks. This can lead to improvements in transmission and the removal of duplicates in the network (the redundancy and linear combination used in block transmission can be related to the amount of duplicates and the split factor used by sessions).
-
-- The use of rateless codes to make all blocks for a specific piece of content equally valuable. If several sources serve the content coded using a rateless code, every block is equally valuable, and as long as a minimum number of them are received, the content can be recovered.
-
-- The use of erasure codes for storage (such as Reed Solomon). This adds a storage overhead but allows the original content to be regenerated even if not all the blocks are retrieved. The proposal is to store blocks using their original CID (so their identifier doesn't change) but use Reed Solomon to code the content. This would increase the size of blocks, and poses several limitations on the codes that can be used to generate the Reed Solomon redundancy.
-
-Using the aforementioned techniques, several seeders fulfilling the request for content would be able to encode blocks and stream them so peers can receive blocks from different sources and reconstruct the original content once a minimum number of blocks has been received. This is a good way of parallelizing the transmission of blocks from different sources before [RFC|BB|L1/2-01](./rfcBBL1201). A problem to be solved to implement this RFC is how to orchestrate the peers serving the request (the linear coding applied to the content needs to be deterministic). With RFC|BB|L1/2-01, more complex requests for blocks could be performed.
-
-### Prior Work
-
-This RFC takes inspiration from:
-
-- [This paper](https://www.mdpi.com/2076-3417/10/7/2206/htm)
-
-- Rateless coding.
Check [this document](https://docs.google.com/document/d/1PdfuPZs5ti7u67R9p4lZl_JFBzk477CjmruiWbLQr4U/edit#heading=h.lrqjoh4tz0t6) and [Petar's paper](http://www.scs.stanford.edu/~dm/home/papers/maymounkov:rateless.pdf) for inspiration.
-
-- HackFS project on [Reed Solomon over IPFS](https://github.com/Wondertan/go-ipfs-recovery).
-
-### Implementation Plan
-
-- [ ] Evaluate the potential improvements and overhead of using [IPFS Recovery](https://github.com/Wondertan/go-ipfs-recovery).
-
-- [ ] Evaluate the use of rateless coding (or alternatives that are not IP-protected). With rateless codes we can generate check blocks from the desired content and request them from different nodes, so that as long as we receive a minimum number of them we can regenerate the original information. This could potentially remove duplicate blocks.
-
-- [ ] If [RFC|BB|L1/2-01](./rfcBBL1201) ends up being implemented, more complex ideas could be evaluated on this front. Discovery and transmission would be two distinct stages, so nodes could eagerly request a compressed or network-coded transmission from a set of nodes.
-
-### Impact
-
-Improved transmission leveraging multiple streams, more reliable exchanges, and the potential removal of duplicates in the network.
-
-### Evaluation Plan
-
-- [The IPFS File Transfer benchmarks.](https://docs.google.com/document/d/1LYs3WDCwpkrBdfrnB_LE0xsxdMCIhXdCchIkbzZc8OE/edit#heading=h.nxkc23tlbqhl)
-
-- Test case where there are several seeders with the same content and leechers are connected to several of them.
-
-### Future Work
-
-If the negotiation phase from [RFC|BB|L1/2-01](./rfcBBL1201) is implemented, additional communications between seeders and leechers could be performed to enhance the use of these techniques.
Thus, if a peer receives an overlapping level of fulfilment for its request from different sources, it can trigger the use of network coding and rateless codes so that, with a minimum number of blocks from both sources, the requested content can be retrieved.
-
-Additionally, the use of "in-path" coding could be devised as future work, where intermediate nodes in a path, upon the reception of several blocks that fulfill the same request from different sources, combine them to enhance the transmission (this requires further exploration). This improvement would significantly benefit [RFC|BB|L102](./rfcBBL102), where nodes can trigger relay sessions to request blocks on behalf of other nodes.
\ No newline at end of file
diff --git a/BEYOND_BITSWAP/RFC/rfcBBL207.md b/BEYOND_BITSWAP/RFC/rfcBBL207.md
deleted file mode 100644
index 048dbc6..0000000
--- a/BEYOND_BITSWAP/RFC/rfcBBL207.md
+++ /dev/null
@@ -1,48 +0,0 @@
-# RFC|BB|L2-07: Request minimum piece size and content protocol extension
-* Status: `Brainstorm`
-* Implementation here: https://github.com/
-
-
-
-## Abstract
-This RFC introduces the concept of pieces as indivisible aggregations of blocks. In order not to modify the underlying chunking and storage of blocks by IPFS, we propose the use of pieces as an aggregation of blocks. Pieces will have a unique identifier, independent from those of their underlying blocks, so that they can be accessed independently. Thus, if a piece is requested, only nodes having all of the blocks of the piece are allowed to perform the transmission.
-
-
-## Shortcomings
-The IPFS block size of 256 KB favors deduplication but can harm the efficiency of content discovery and transmission. Evaluating different block sizes or the use of dynamic block sizes could benefit the performance of file-sharing in the network.
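To make the piece abstraction concrete, here is a minimal sketch under assumed conventions (the names and the piece-ID construction are illustrative, not part of any IPFS implementation): blocks keep their own identifiers, while a piece is identified by hashing the identifiers of the blocks it aggregates, so a node can advertise a piece only when it holds every underlying block.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

const blockSize = 256 * 1024 // IPFS default block size
const blocksPerPiece = 4     // hypothetical piece size chosen by the publisher

// blockIDs hashes each 256 KB chunk, standing in for per-block CIDs.
func blockIDs(data []byte) [][32]byte {
	var ids [][32]byte
	for off := 0; off < len(data); off += blockSize {
		end := off + blockSize
		if end > len(data) {
			end = len(data)
		}
		ids = append(ids, sha256.Sum256(data[off:end]))
	}
	return ids
}

// pieceIDs derives one identifier per group of blocksPerPiece blocks by
// hashing the concatenation of the underlying block IDs. A node would
// advertise a piece ID only if it holds every block in that group.
func pieceIDs(blocks [][32]byte) [][32]byte {
	var pieces [][32]byte
	for off := 0; off < len(blocks); off += blocksPerPiece {
		end := off + blocksPerPiece
		if end > len(blocks) {
			end = len(blocks)
		}
		h := sha256.New()
		for _, b := range blocks[off:end] {
			h.Write(b[:])
		}
		var id [32]byte
		copy(id[:], h.Sum(nil))
		pieces = append(pieces, id)
	}
	return pieces
}

func main() {
	data := make([]byte, 10*blockSize) // a 2.5 MB file
	blocks := blockIDs(data)
	pieces := pieceIDs(blocks)
	fmt.Printf("blocks=%d pieces=%d\n", len(blocks), len(pieces)) // 10 blocks grouped into 3 pieces
}
```

The point of the construction is that piece identifiers add a coarser discovery granularity on top of the existing block CIDs without changing how blocks themselves are chunked or stored.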
-
-
-## Description
-
-Wantlist messages could include a "minimum piece size" parameter, specifying that only peers with enough blocks to fill a full piece should be allowed to answer with a HAVE message. Blocks are still stored using the same size, but this allows us to identify nodes that will serve us at least a certain amount of blocks. This can become really interesting once RFC|BB|L1/2-01 is implemented, and in combination with RFC|BB|L2-03, so we can choose the piece size and, in accordance with this, select the best compression or network coding algorithm to be used in the transmission.
-
-We can introduce the concept of a piece as an irreducible structure where the underlying blocks of a piece are not assigned a CID, so the piece is only identified by the root CID of the structure.
-
-- `/ipfs/`: Every block is identified by a CID.
-
-- `/ipfs/pc/`: Identifies a piece. Pieces can be seen as irreducible groups of blocks. Some nodes may store some of the blocks composing the piece, and these can be requested directly through their CID. However, with an /ipfs/pc/ request we are signalling that only nodes storing the full piece should send the data. For large pieces this could cause content to become rare in the network. Fortunately, you can always resort to finding blocks one by one through the /ipfs/ identifier. Hence:
-
-  - Adding `/ipfs/pc/` content chunks data into pieces of the specified size, and chunks these pieces into different blocks. The add request sent to the network is /ipfs/pc/ for every piece, indicating that all the blocks composing a piece should be stored in the same node. Others can store sub-blocks of the piece, but add operations for pieces are performed so that all the blocks are stored as a single data unit.
-
-  - Requesting `/ipfs/pc/` finds full pieces before finding individual blocks.
-
-  - If no one has the full piece, the node needs to increase the granularity of the request and search by blocks.
This adds an additional layer of abstraction for content identification and discovery.
-
-This scheme would allow us to represent larger blocks of content without losing the benefits of a 256KB block.
-
-Linking with this idea, we could think of a multiblock system analogous to libp2p's multistream, where anyone could implement their own DAG/block structure while the underlying structure used (CIDs, 4KB blocks) is maintained, the same way multistreams are built over TCP/UDP/QUIC or existing protocols. Hosts already know how to talk TCP and build protocols over it; here, IPFS nodes know how to talk 4KB blocks and IPLD, but we allow the implementation of new schemes over this foundation.
-
-## Implementation plan
-TBD
-
-# Impact
-
-## Evaluation Plan
-
-## Prior Work
-
-
-## Results
-
-
-## Future Work
diff --git a/BEYOND_BITSWAP/RFC/rfcBBL208.md b/BEYOND_BITSWAP/RFC/rfcBBL208.md
deleted file mode 100644
index 1ba3eee..0000000
--- a/BEYOND_BITSWAP/RFC/rfcBBL208.md
+++ /dev/null
@@ -1,33 +0,0 @@
-# RFC|BB|L2-08: Delegate download to other nodes (bandwidth aggregation)
-* Status: `Brainstorm`
-
-### Abstract
-
-This RFC proposes the delegation of discovery and downloads to other nodes in the network with better capabilities than ours. Before triggering the download of a file, a node can send a TRANSMISSION_GRAFT message to other peers to request that they become part of its file-sharing cluster. Nodes in a file-sharing cluster search and download content in-line.
-
-### Shortcomings
-
-A node alone may have a hard time discovering and downloading all the blocks of a file, especially if it has limited network capabilities and is trying to download large files. By inviting other nodes to be part of the transmission process, resources are aggregated and the processes can be parallelized.
-
-### Description
-
-Basic ideas to develop:
-
-- Nodes can ask other peers to become part of their cluster.
-
-- Whenever someone in the cluster wants to get content, the list of blocks is distributed among the peers in the cluster so that each is responsible for downloading part of it. We can set a level of overlap between the blocks distributed to each peer so that even if a peer doesn't get all of its blocks, the original content can still be recovered.
-- When a peer downloads all its assigned blocks, it opens a transmission channel with the requestor and directly sends the downloaded blocks.
-- Clusters may be created on-demand (a peer requests to build a cluster for a specific file), or a "default cluster" may be created according, for instance, to the existing GossipSub mesh a peer belongs to (following a similar approach to [RFC|BB|L1-06: Content Anchors](https://github.com/protocol/ResNetLab/issues/6)).
-
-## Implementation plan
-
-# Impact
-
-## Evaluation Plan
-
-## Prior Work
-* This RFC can leverage (or even be combined with) the work done in [RFCBBL102](./rfcBBL102), where a TTL was added to Bitswap messages so nodes can discover and request content on behalf of other nodes.
-* A similar idea to this one was also discussed in [this issue](https://github.com/ipfs/notes/issues/386).
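The block-distribution idea in the Description above can be sketched as a simple assignment function (a toy model with assumed names; a real cluster would negotiate this via messages such as TRANSMISSION_GRAFT): each block is assigned to `overlap` peers, so the content survives a peer failing to deliver its share.

```go
package main

import "fmt"

// assignBlocks distributes block indices among cluster peers round-robin,
// giving each block to `overlap` consecutive peers so that a single peer
// failing to deliver its share does not make the content unrecoverable.
func assignBlocks(numBlocks, numPeers, overlap int) map[int][]int {
	assignment := make(map[int][]int, numPeers)
	for b := 0; b < numBlocks; b++ {
		for k := 0; k < overlap; k++ {
			p := (b + k) % numPeers
			assignment[p] = append(assignment[p], b)
		}
	}
	return assignment
}

func main() {
	// 8 blocks, 3 cluster peers, every block fetched by 2 distinct peers.
	for p, blocks := range assignBlocks(8, 3, 2) {
		fmt.Printf("peer %d downloads blocks %v\n", p, blocks)
	}
}
```

The overlap level trades extra bandwidth for resilience, mirroring the redundancy/duplicates trade-off discussed in the network coding RFC above.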
- -## Results - -## Future Work \ No newline at end of file diff --git a/BEYOND_BITSWAP/RFC/template.md b/BEYOND_BITSWAP/RFC/template.md deleted file mode 100644 index d391bef..0000000 --- a/BEYOND_BITSWAP/RFC/template.md +++ /dev/null @@ -1,23 +0,0 @@ -# RFC|\: \ -* Status: `Draft` -* Implementation here: https://github.com/adlrocha/ - -## Abstract - - - -## Shortcomings - -## Description - -## Implementation plan - -# Impact - -## Evaluation Plan - -## Prior Work - -## Results - -## Future Work diff --git a/OPEN_PROBLEMS/DECENTRALIZED_DATA_DELIVERY_MARKETS.md b/OPEN_PROBLEMS/DECENTRALIZED_DATA_DELIVERY_MARKETS.md new file mode 100644 index 0000000..e44f347 --- /dev/null +++ b/OPEN_PROBLEMS/DECENTRALIZED_DATA_DELIVERY_MARKETS.md @@ -0,0 +1,562 @@ +# Decentralized Data Delivery Markets (3DMs) - Open Problem Statement + + + + +- [Overall](#overall) + - [Short description](#short-description) + - [Long description](#long-description) + - [Definition of the actors](#definition-of-the-actors) + - [Expected requirements](#expected-requirements) +- [Areas of Work](#areas-of-work) + - [Data Delivery Metering & Fair Exchange](#data-delivery-metering--fair-exchange) + - [How it is done traditionally](#how-it-is-done-traditionally) + - [Properties desired](#properties-desired) + - [How is Data Delivery Metering different from Fair Exchange](#how-is-data-delivery-metering-different-from-fair-exchange) + - [Known challenges](#known-challenges) + - [State-of-the-art](#state-of-the-art) + - [Pay-per-packet](#pay-per-packet) + - [Lock/Unlock access to the resource](#lockunlock-access-to-the-resource) + - [Optimistic Fair Exchange](#optimistic-fair-exchange) + - [Reputation based](#reputation-based) + - [Privacy focused](#privacy-focused) + - [Traditional approaches (close to Web 2.0 world)](#traditional-approaches-close-to-web-20-world) + - [Known attacks to be mitigated](#known-attacks-to-be-mitigated) + - [New ideas being explored](#new-ideas-being-explored) + - [Distribution 
Graph Forming](#distribution-graph-forming) + - [State-of-the-art](#state-of-the-art-1) + - [DHT-based:](#dht-based) + - [Name-based routing:](#name-based-routing) + - [DNS-style, object-level indexing service:](#dns-style-object-level-indexing-service) + - [PubSub-style:](#pubsub-style) + - [Known attacks to be mitigated](#known-attacks-to-be-mitigated-1) + - [New ideas being explored](#new-ideas-being-explored-1) + - [CryptoEconomic model for Data Delivery](#cryptoeconomic-model-for-data-delivery) + - [How it is done traditionally](#how-it-is-done-traditionally-1) + - [Properties desired](#properties-desired-1) + - [State-of-the-art](#state-of-the-art-2) + - [Economics of Hybrid CDNs and P2P networks](#economics-of-hybrid-cdns-and-p2p-networks) + - [Game theoretical models and reward/reputation systems in P2P networks](#game-theoretical-models-and-rewardreputation-systems-in-p2p-networks) + - [Credit networks and Token Designs](#credit-networks-and-token-designs) + - [Auctions, decentralized markets, and efficient resource allocation](#auctions-decentralized-markets-and-efficient-resource-allocation) + - [Known attacks to be mitigated](#known-attacks-to-be-mitigated-2) + - [New ideas being explored](#new-ideas-being-explored-2) + - [Additional: Opportunistic deployment](#additional-opportunistic-deployment) + - [State-of-the-art](#state-of-the-art-3) + - [Transient Providers](#transient-providers) + - [Known shortcomings](#known-shortcomings) + - [New ideas being explored](#new-ideas-being-explored-3) + + + +# Overall + +## Short description + +With the emergence of Decentralized Storage Networks and the rapid decrease in the price of storage services and hardware, there is a rapidly growing need to leverage the additional storage capacity contributed to Decentralized Storage Networks by new players, including end-users, and use it to deliver reliable and high-quality storage and delivery services. 
Similarly to Content Delivery Networks (CDNs) for the traditional Cloud Storage market, we now have the opportunity to build **Decentralised CDNs** for Decentralised Storage Networks. The Decentralized Data Delivery Markets (3DMs) Open Problem covers all the essential areas of work that need to be studied in order to create **a fully permissionless free market for data delivery that supports fair data exchange** for the service provided.
+
+## Long description
+
+Serving content globally at scale is a hard technical problem, as evidenced by the multiple decades of innovation and improvements in the Content Delivery Networks field. This challenge becomes even more interesting once we move away from centralized cloud infrastructure, which is typically managed and monitored by a single party, to a decentralized network that is permissionless, possibly anonymous, constantly changing (i.e. with high node churn) and lacking access to the convenience of a third-party mediator system that facilitates the fair exchange of goods (e.g. credit providers). The benefits of moving to a decentralised setting are significant, however: cheaper storage and delivery, resilience against business failures (i.e., the network and all its components are not dependent on business decisions made by a single entity), independence from personal data-driven business models, as well as a significantly lower barrier to entry for new players who want to contribute.
+
+At the same time, problems already addressed in the past for traditional CDNs reappear, albeit in a different dimension, as we move to a decentralized network design space. These problems include the global orchestration of the system, the efficient allocation and use of resources, and clear visibility of the content being retrieved from the network.
+
+We introduce Decentralized Data Delivery Markets (3DMs) as a new field of research and innovation, and one with rapidly increasing relevance.
Decentralised storage networks, such as Filecoin, have reached unprecedented storage capacity commitments (of over 2.5EB of cold storage in Jan 2021) and continue to grow rapidly. There is an urgent need to complement decentralised storage with decentralised data delivery as these storage networks seek a solution to deliver the data stored in their network to end-users while meeting the expectations users have of the centralised services of today. + +We envision a 3DM as being: + +* A permissionless and open data delivery market, composed of one or more crypto-economic models +* A fair exchange relationship between service providers and users +* A cost efficient way for publishers to distribute their content +* A unique set of business opportunities including: + * Delivering data to end users + * Carrying data between multiple disconnected locations or with high latency between them (Data Muling and/or Package Switching) + * Creating ephemeral distribution networks for large gatherings + * Powering the next generation of Web 3.0 applications + +This new field is ripe with unique business opportunities given the growing demand from users for access to large datasets, videos, and other data-heavy content, the panoply of powerful mobile devices that users carry, and the limitations imposed by physics in ultimately moving increasingly large data at high speeds. + +In this Open Problem definition, we introduce the multiple areas of research (or subOpenProblems), what we know so far for each one of them and link to some new ideas being explored. + +### Definition of the actors + +For convenience and shared understanding, we first define the agents within a 3DM network. These are: + +* **Clients** - Agents that fetch data from Providers. +* **Providers** - Agents that offer data delivery services to Clients. +* **Content Publishers** - Agents that create content and want to have it distributed (or intermediaries thereof). 
+
+### Expected requirements
+
+We present this list as a guide for protocol designers on what we expect a 3DM implementation to meet. This list is non-exhaustive and allows some flexibility between MUST and SHOULD. The requirements identified are:
+
+* **_MUST be Decentralized and not just federated_**
+  * Anyone should be able to join and leave at any time, without requiring permission
+* **_The exchanges of value MUST be verifiable_**
+  * The payment for bandwidth/latency in tokens should match what has been agreed and fulfilled
+  * The payment should be fulfilled if the SLA is fulfilled
+  * Parties should be able to verify that the service provided was correct (e.g. that they received the right file)
+* **_SHOULD be Trustless_**
+  * Ideally, the network can perform the operations without having to leverage trust in third parties and/or the participants of the exchange
+  * Nevertheless, designs might benefit from an element of trust to increase the quality of service (e.g. reputation mechanisms may allow for relaxed constraints)
+* **_The network SHOULD do resource allocation in an efficient way_** (or as efficiently as possible)
+  * This should not be an afterthought
+* **_Providers SHOULD coordinate and accept pre-fetching of content (warm up the caches)_**
+  * This is in contrast to IPFS’s core principle of replicating only after an explicit request by a peer
+  * This is done in order to make the network more efficient. However, it is not mandatory; the setting is left up to the Provider.
+
+Additional expectations (i.e. bonus points):
+
+* Data retrieval SHOULD be anonymous and privacy-preserving.
+
+Other considerations:
+
+* It is expected that content will always be identified and retrieved by CIDs
+* The provider network will likely not provide random access to files, i.e., items will be named at the object level.
+* In case there is an auction market, it should always be available and have a fast way to match offers to bids
+
+# Areas of Work
+
+Throughout the exploration of the design space, we identified 3 areas of work that need to be explored in order to make 3DMs Fair, Decentralized and Efficient, as well as a bonus opportunity. The areas are listed below.
+
+## Data Delivery Metering & Fair Exchange
+
+We believe that this area of work is the main crux for Decentralized Data Delivery Markets. Without it, it will be impossible to have a fully permissionless, decentralized, and open market for data delivery.
+
+#### How it is done traditionally
+
+In traditional or Web 2.0 setups (e.g. CDNs, Mirrors, Static Servers and so on), the metering happens on the server and is handled by the provider, which the clients trust to perform correct measurements. This measurement translates into a statement of how much service was provided, ultimately resulting in an invoice and request for payment. This payment is facilitated by a trusted third party (e.g. a credit card company) and a legal contract that can be used to resolve disputes in case the client fails to pay the provider and/or the provider fails to deliver the service promised to the client.
+
+In a decentralized environment that strives to be permissionless, we can’t expect to rely on such legal agreements and third parties and, therefore, need a way to prove fair exchange. That is, we need to be able to prove that the service was delivered correctly and the metering of the provided service was done correctly. This will, in turn, verify the exchange between the two parties and issue the correct payment (i.e. fair exchange).
+
+This trustless property is really important as we know that markets with no mediators, escrows, and other assurance services (e.g. underground markets) are prone to scams, as the users have to trust that the provider will execute on the agreement.
In a trustless environment, once the transfer is completed, there is no way to dispute it.
+
+#### Properties desired
+
+* The exchanges of value MUST be verifiable and fair
+    * Fairness:
+        * The payment MUST be fulfilled if the SLA is fulfilled
+        * The payment for bandwidth/latency SHOULD match what was agreed and provided
+    * Verifiability:
+        * Both parties MUST be capable of verifying that the exchange was performed correctly
+        * Bonus: Third parties SHOULD be able to verify that the exchange was performed correctly
+* SHOULD be Trustless
+    * Ideally, the network can perform the operations without having to leverage trust
+    * Nevertheless, in many designs it might be necessary to leverage reputation as a trust vehicle to guarantee high quality of service and/or reliability.
+
+#### How is Data Delivery Metering different from Fair Exchange
+
+The field of Fair Exchange overlaps in part with Service Metering. Some notable differences are:
+
+* Fair Exchange can be used for non-digital goods also
+* Fair Exchange is traditionally between 2 parties
+* Metering should be available to properly measure the service quality and service provided by one or more parties (e.g. streaming from multiple endpoints)
+
+In summary, the Metering & Fair Exchange fields end up contributing to each other and it is likely that the intersection of both is needed for a sound solution.
+
+#### Known challenges
+
+* Fairness
+    * Making it Trustless
+        * How to verify that the provider has the ability to deliver the file (e.g. has a local copy or is able to pull it from the network)
+        * How to verify that the file being transferred is the one requested, before a large amount of resources has been invested (i.e. before the entirety of the file has been transmitted)
+        * How to verify that the client has received the file
+        * How to verify and reward that SLAs were met?
+        * How to avoid collusion when adding a third party (e.g. a Referee)?
+        * How to avoid griefing attacks (making one party pay for fees)?
+        * How to avoid a malicious actor causing un-rewarded work, hence wasting others’ resources?
+* Experience
+    * How to make the transfers start instantaneously?
+    * How to support others (e.g. Content Publishers) paying for the usage?
+    * How to make it private (i.e. so that others don’t know which users are requesting what content)?
+* Performance
+    * How to overcome send-and-halt in order to maximize bandwidth throughput
+    * How to support multipath (i.e. fetching from multiple sources)
+
+### State-of-the-art
+
+When it comes to the state of the art, we found that the existing solutions fall within one of the following categories:
+
+#### Pay-per-packet
+
+Pay-per-packet solutions offer granular control over what gets paid, enabling the option to verify if the SLA is being fulfilled.
+
+These types of solutions have been found/highlighted:
+
+- [Filecoin Retrieval Market](https://spec.filecoin.io/#section-systems.filecoin_markets.retrieval_market)
+- [Theta Whitepaper](https://s3.us-east-2.amazonaws.com/assets.thetatoken.org/Theta-white-paper-latest.pdf?v=1612955013.791)
+
+**Known shortcomings of these approaches:**
+* send-and-halt - Because the next packet (unit of service) is only issued after payment for the previous one is received, these solutions often fail to max out the bandwidth available on the connection when the service provider defaults to being conservative, hence increasing the latency of delivery.
+
+#### Lock/Unlock access to the resource
+
+Solutions of this type enable the client and the provider to verify the actual service before unlocking the payment. One of the main tools used is locking access to the service by encrypting it with a key and then performing an exchange of the key that gives access to the service.
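A minimal sketch of the lock/unlock idea, using a toy SHA-256-based stream cipher (illustrative only — not secure production crypto, nor any of the specific schemes surveyed here): the provider ships the ciphertext first and reveals the key only once payment settles; the client then checks the decrypted plaintext against the content hash it requested.

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    # Toy stream cipher: SHA-256 in counter mode (illustrative, NOT real crypto).
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def encrypt(key: bytes, data: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

# Provider side: lock the content under a session key before transfer.
content = b"some requested object"
content_id = hashlib.sha256(content).hexdigest()  # stand-in for the CID
key = b"session-key-0001"
ciphertext = encrypt(key, content)

# Client side: holds the ciphertext up front; after paying, it receives `key`,
# decrypts (XOR stream cipher: decryption == encryption), and verifies the
# result against the CID it asked for.
plaintext = encrypt(key, ciphertext)
ok = hashlib.sha256(plaintext).hexdigest() == content_id
```

The key point is that the expensive transfer happens before payment, but the payment only buys a short key, whose correctness the client can check locally against the CID.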
+
+These solutions can be found in several forms:
+- [Solving the Buyer and Seller’s Dilemma: A Dual-Deposit Escrow Smart Contract for Provably Cheat-Proof Delivery and Payment for a Digital Good without a Trusted Mediator](http://anrg.usc.edu/www/papers/Dual_Deposit_ICBC_2019.pdf)
+- [FairSwap: How to fairly exchange digital goods](https://eprint.iacr.org/2018/740.pdf)
+- [Zero-Knowledge Contingent Payments Revisited: Attacks and Payments for Services](https://acmccs.github.io/papers/p229-campanelliA.pdf)
+
+**Known shortcomings of these approaches:**
+* Vulnerable to griefing attacks.
+* Generally, these solutions focus on the fulfilment of the delivery, more so than on measuring the speed and rate of the transfer.
+
+#### Optimistic Fair Exchange
+
+These solutions build on top of the Fair Exchange ones and adopt an Optimistic approach where some sort of Reputation or Stake is used to give both parties trust that the exchange will occur correctly. If one party misbehaves, the other can still rely on a dispute system.
+
+- [OptiSwap: Fast Optimistic Fair Exchange](https://dl.acm.org/doi/10.1145/3320269.3384749)
+- [Asynchronous Protocols for Optimistic Fair Exchange](https://ieeexplore.ieee.org/abstract/document/674826?casa_token=K7FXNFUKD7AAAAAA:cAcsxIQvxTqvUpRZ73PYBtG2lGtyrA0qMAEuBa46q4Zas4d3aD7yATmZhxYrhem0RxKGwlqn4g)
+- [Optimistic Fair Exchange with Multiple Arbiters](https://eprint.iacr.org/2009/069.pdf)
+
+**Known shortcomings of these approaches:**
+* Setup + dispute resolution is non-trivial and takes significant time
+
+#### Reputation based
+
+Reputation-based solutions make it possible to speed up the transfer and improve the quality of the service by leveraging the trust built over time between clients & providers.
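One way to picture the reputation idea: a provider keeps a per-client score and serves high-reputation clients optimistically, i.e., ahead of payment. This is a toy sketch with made-up parameters, not a scheme taken from the papers listed here:

```python
class ReputationTracker:
    """Toy per-peer reputation: an exponential moving average of
    successful (1.0) vs. failed (0.0) exchanges with that peer."""

    def __init__(self, alpha: float = 0.3, threshold: float = 0.7):
        self.alpha = alpha            # weight of the most recent observation
        self.threshold = threshold    # minimum score for optimistic service
        self.scores = {}              # peer_id -> score in [0, 1]

    def record(self, peer_id: str, success: bool) -> None:
        old = self.scores.get(peer_id, 0.5)  # unknown peers start neutral
        self.scores[peer_id] = (1 - self.alpha) * old + self.alpha * (1.0 if success else 0.0)

    def serve_optimistically(self, peer_id: str) -> bool:
        # High-reputation peers get data before payment; others pay per packet.
        return self.scores.get(peer_id, 0.5) >= self.threshold

rep = ReputationTracker()
for _ in range(5):
    rep.record("client-a", success=True)   # reliable payer, score climbs above 0.7
rep.record("client-b", success=False)      # defaulted once, score drops below 0.7
```

The design choice here is the usual reputation trade-off: a higher `threshold` protects the provider from defaults, at the cost of forcing more clients through the slower verified path.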
+
+- [Proof of Delivery in a Trustless Network](https://ieeexplore.ieee.org/document/8751417)
+- [Proof-of-Prestige: A Useful Work Reward System for Unverifiable Tasks](https://www.ee.ucl.ac.uk/~ipsaras/files/Proof_of_Prestige-icbc19.pdf)
+- [Mechanisms for Outsourcing Computation via a Decentralized Market](https://scopelab.ai/files/eisele2020mechanisms.pdf)
+- Other Gradual Fair Exchange class of constructions
+
+**Known shortcomings of these approaches:**
+* Not a direct shortcoming, but a challenge is that each participant may have different levels of sensitivity with regards to how much they are willing to rely on trust vs. strict verifiability.
+
+#### Privacy focused
+
+There are multiple solutions with the goal of delivering reader privacy and writer privacy. These incur an additional cost in setup and/or latency. Disclaimer: At the time of writing, we haven’t dived deep into this set of solutions.
+
+- [Blockchain based Privacy-Preserving Software Updates with Proof-of-Delivery for Internet of Things](https://www.sciencedirect.com/science/article/abs/pii/S074373151930098X)
+- [SilentDelivery: Practical Timed-delivery of Private Information using Smart Contracts](https://arxiv.org/abs/1912.07824)
+- [Bulletproofs+: Shorter Proofs for Privacy-Enhanced Distributed Ledger](https://eprint.iacr.org/2020/735.pdf)
+
+#### Traditional approaches (close to Web 2.0 world)
+
+These approaches suggest novel ways to add integrity checks to data transferred or to promote new ways for content distribution, but they don’t depart from the Web 2.0 world of assumptions such as trusted third parties and central points of control.
+
+- [The multimedia blockchain: a distributed and tamper-proof media transaction framework](http://www.cs.stir.ac.uk/~dbh/downloads/Multimedia-Blockchain-DSP17.pdf)
+- [Reliable Client Accounting for P2P-Infrastructure Hybrids](https://www.cis.upenn.edu/~ahae/papers/accounting-nsdi2012.pdf)
+
+**Known shortcomings of these approaches:**
+* Not fully aware of the challenges of building a decentralized network
+
+### Known attacks to be mitigated
+
+Here we list the attacks to consider when designing a solution. These are:
+
+* Malicious actor forces the provider to spend bandwidth without issuing payment (also known as a Griefing Attack)
+    * Type: client fraud
+    * Consequence: bandwidth is spent
+* Malicious actor forces the provider to pay to retrieve the file from a storage point (e.g. Filecoin, Cloud Storage, etc.) without the provider itself ever getting paid
+    * Type: client fraud
+    * Consequence: currency is spent
+* Provider claims it has sent the file, but actually never did
+    * Type: Provider fraud
+    * Consequence: Client gets penalized without receiving the service
+* Metering Inflation
+    * Providers report having used more resources to serve content to clients than they actually did.
+* Sybil / Throttle attacks
+    * Description: A set of nodes collude against a content provider to try to make it go bankrupt (loss of all of its “SLA stake”). They all try to download content from that content provider at the same time, so that Providers start consuming its stake. If clients are not paying for the service, we run the risk of enabling this attack.
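A back-of-the-envelope model of the Sybil / Throttle attack above (all numbers made up for illustration): if retrievals are free, colluding downloaders can trigger SLA penalties at zero cost; charging even a small per-request fee puts a price on draining the stake.

```python
def attack_cost(stake: float, slash_per_failed_sla: float, fee_per_request: float) -> float:
    """Toy model: what it costs attackers to drain a content provider's whole
    SLA stake, assuming each flood request they pay for causes one slashed SLA."""
    requests_needed = stake / slash_per_failed_sla
    return requests_needed * fee_per_request

# Free retrievals: the flood costs the attackers nothing.
free_riding_cost = attack_cost(stake=1000.0, slash_per_failed_sla=1.0, fee_per_request=0.0)

# A small per-request fee: attackers must now spend tokens to burn the stake
# (here, roughly 50 tokens to drain a 1000-token stake).
with_fee_cost = attack_cost(stake=1000.0, slash_per_failed_sla=1.0, fee_per_request=0.05)
```

The model ignores many real-world factors (rate limiting, dispute systems, per-client caps), but it captures why free-for-clients retrieval makes this attack cheap.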
+
+### New ideas being explored
+
+ResNetLab organized a Research Intensive Workshop on 3DMs, out of which the following ideas, structured as RFCs, emerged:
+
+* [RFC: ZKCP with Fair Exchanges of 1 bit](https://github.com/protocol/ResNetLab/blob/master/3DM_RFC/RIW2021_RFC_ZKCP%20with%20Fair%20Exchanges%20of%201%20bit.md)
+* [RFC: ZKCP Optimizations](https://github.com/protocol/ResNetLab/blob/master/3DM_RFC/RIW2021_RFC_ZKCP%20Optimizations.md)
+* [RFC: Optimistic ZKC(S)P](https://github.com/protocol/ResNetLab/pull/15)
+
+## Distribution Graph Forming
+
+The area of graph forming for Decentralised Data Delivery Markets revolves around three main areas, which together compose the different goals of the Graph Forming problem:
+
+* **Content discovery & routing:** the system can forward requests to find content
+* **Content placement:** the system should have ways to proactively distribute/replicate copies of data across different Providers in different geographic regions in order to be able to serve content fast.
+* **Content copy selection:** the system can choose the optimal copy of the content to serve if multiple copies exist in several Providers, where potentially:
+    * every copy has a different “asking” price (defined by the cryptoeconomic model) and delivery latency, and
+    * every miner achieves different performance and has a different reputation profile.
+
+In order to achieve the above goals, we need to answer certain architectural questions: where do end-users connect (e.g., to a Provider, or to a separate content resolution system); which entity in the architecture makes content resolution and request forwarding decisions; and, finally, which entity(ies) has(ve) the required knowledge to forward requests to the closest (to the requesting user) copy, _aka_ nearest replica routing.
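The copy-selection goal above can be made concrete with a small scoring sketch. The fields and weights here are our own and purely illustrative; a real design would derive them from the cryptoeconomic model:

```python
def select_copy(candidates, w_price=1.0, w_latency=1.0, w_reputation=2.0):
    """Pick the candidate copy with the lowest cost score.
    Each candidate is a dict with 'provider', 'price', 'latency_ms',
    and 'reputation' (a value in [0, 1])."""
    def score(c):
        # Lower price and latency are better; higher reputation discounts the score.
        return w_price * c["price"] + w_latency * c["latency_ms"] - w_reputation * c["reputation"]
    return min(candidates, key=score)

copies = [
    {"provider": "prov-eu", "price": 5.0, "latency_ms": 20.0, "reputation": 0.9},
    {"provider": "prov-us", "price": 3.0, "latency_ms": 120.0, "reputation": 0.8},
    {"provider": "prov-ap", "price": 4.0, "latency_ms": 40.0, "reputation": 0.2},
]
best = select_copy(copies)  # low latency and high reputation outweigh the slightly higher price
```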
+
+We generally consider that Providers are relatively powerful devices with local storage dedicated to storing hot copies of the content, high uptime (close to zero churn) and, ideally, a public IP address (reachable from anywhere). The architecture should be able to take advantage of storage in less powerful and intermittently-connected end-user devices, such as laptops and mobile phones - this is discussed as a different area further down. Although not a strict requirement, this is what will make the decentralised storage network take full advantage of planetary-scale unused storage.
+
+It is worth highlighting that the system should be able to serve different use-cases and therefore different setups and architectures could apply depending on the use-case. For instance, the content resolution system can be different between the cases where: i) the application operates based on a closed-content ecosystem (e.g., subscriber-based music or video streaming, where control of the content is solely with the content publisher) and ii) the application operates on a totally open content space, e.g., the web.
+
+Last but not least, the system of overlay Provider nodes has to be permissionless and decentralised. No entity has full control of the network of nodes, as is the case with traditional CDNs, where a single entity is in charge of the network setup and the content served by the system.
+
+#### How it is done traditionally (CDNs, P2P CDNs)
+
+Traditionally, content is stored, hosted, replicated and delivered by centralised CDNs. In those setups, **the CDN entity has full control over: i) the formation of their network of servers**, e.g., in which geographic areas to place servers and how to interconnect them, both in terms of topology and in terms of bandwidth deployed between servers, **ii) the placement and replication of content**, i.e., where to put hot copies of content, as well as **iii) the network elements that are responsible for content resolution**.
+
+In a centralized setup it is much easier to optimise performance and provide guarantees, compared to the case where **every network node is operated by a separate (legal) entity that operates rationally in a profit-maximizing way**.
+
+Closer to our setup are P2P CDNs, where end-user equipment is “employed” to contribute to the storage and dissemination of content. Node churn, unreliable connectivity and low bandwidth speeds become a reality in P2P CDNs, but **the operation of a P2P CDN is still largely controlled and monitored by a single entity**. This makes decisions with regard to content placement and replication easier. The most important advantage of a single entity taking care of the P2P CDN setup is **monitoring**, that is, being able to observe the performance of peers and make decisions on where and when to replicate content or assign content distribution decisions.
+
+#### Properties desired
+
+* The system MUST always be able to discover content and satisfy content requests
+    * There should never be a “404 Content not Found” error for content available anywhere on the network
+* The system MUST replicate content to different storage points in order to reduce delivery times and maximize performance.
+* Providers MUST follow the crypto-economic model and the system MUST make sure that Providers do not misbehave.
+* The network MUST be content-addressable and operate based on content identifiers.
+* The system MUST be permissionless and trustless
+    * Anyone should be free to join and set up a Provider node to contribute to the network.
+
+### State-of-the-art
+
+The target area and expected outcome of this topic is to form the network, the interactions between different network entities, as well as the basic protocols through which network entities will communicate.
Given this target, we split our investigation in the following three main areas: + +**P2P CDNs:** the network architecture setup in these systems is similar in some ways to that of a 3DM + +* [LiveSky: Enhancing CDN with P2P](https://dl.acm.org/doi/pdf/10.1145/1823746.1823750) +* [P2P as a CDN: A new service model for file sharing](https://www.sciencedirect.com/science/article/pii/S1389128612002290?casa_token=uNw-gpbx5sIAAAAA:VYBa5QErOCS7DcduOwf1lsl3C174YlOIMmQii9NhKippX_Hm7I3QYnfeZ53LTVcp7Uj8y1JX4DI) +* [OblivP2P: An Oblivious Peer-to-Peer Content Sharing System](https://www.usenix.org/system/files/conference/usenixsecurity16/sec16_paper_jia.pdf) + +**P2P VoD:** video delivery is an ideal target use-case, with very strict requirements; therefore, the mechanisms proposed in this field and used for P2P VoD can be inspiring + +* [Challenges, Design and Analysis of a Large-scale P2P-VoD System](https://dl.acm.org/doi/pdf/10.1145/1402946.1403001) +* [A Framework for Lazy Replication in P2P VoD](https://dl.acm.org/doi/pdf/10.1145/1496046.1496068) +* [InstantLeap: Fast Neighbor Discovery in P2P VoD Streaming](https://dl.acm.org/doi/pdf/10.1145/1542245.1542251) +* [Proactive Video Push for Optimizing Bandwidth Consumption in Hybrid CDN-P2P VoD Systems](https://ieeexplore.ieee.org/abstract/document/8485962?casa_token=41CKY7irz4kAAAAA:V2cwdpnSCny5GB2GGn0Zj1-3-BefgmpJTs0qTPZiAdzMINYIJryr7tU2L5ue8S3kVcRRdyfOtg) + +**Content-Centric Networking:** there are several network architectures and protocols proposed for content-/information-centric networks which can be great inspiration for a permissionless, content-addressable P2P network. 
+ +* [Named Data Networking (NDN)](https://dl.acm.org/doi/pdf/10.1145/2656877.2656887) +* [iCDN - An NDN based CDN](https://dl.acm.org/doi/pdf/10.1145/3405656.3418716) +* [Analysis and Improvement of Name-based Packet Forwarding over Flat ID Network Architectures](https://dl.acm.org/doi/pdf/10.1145/3267955.3267960) +* [High Throughput Forwarding for ICN with Descriptors and Locators](https://dl.acm.org/doi/pdf/10.1145/2881025.2881032) + +Our initial, but thorough investigation resulted in the following potential groups of designs for content discovery and resolution: + +#### DHT-based: + +DHTs are very popular constructions in P2P networks. The DHT system is used as a content resolution mechanism, as well as a content and peer routing system. The routing table is split between peers that participate in the DHT, providing higher resilience and scalability. In a DHT-based system, clients “walk” the DHT (iteratively, or recursively) to find the provider record and then directly connect to the peer included in the provider record. + +* **Pros:** tested in the past, engineers have lots of experience with these structures, can become faster and less expensive than the IPFS DHT setup, if: i) re-publishing is done at coarser granularities, which is reasonable if we assume stable connectivity, i.e., low peer churn (reasonable to assume for Providers), and ii) some payment is associated with publishing. It is also reasonable to assume public IP connectivity for Providers. +* **Cons:** slow, several round-trips needed to resolve content (especially in case of iterative DHT lookup). The solution will not scale in the longer term. + +#### Name-based routing: + +In Name-based routing systems, routing hints are integrated as part of the content name; routing tables are filled with routing hints at network setup time by a routing protocol. Routers make hop-by-hop forwarding decisions based on content names seen in requests and routing hints in their routing tables. 
Matching between hints and names depends on the structure of the names, e.g., hierarchical vs flat.
+
+* **Pros:** very fast, enables multicast-like opportunities and on-path caching, which can result in significant savings in the case of popular and heavy content (e.g., HD VoD). Also makes routing from browser/limited-connectivity clients more feasible.
+* **Cons:** design can be(come) complex, security properties not fully studied in the literature.
+
+#### DNS-style, object-level indexing service:
+
+The Domain Name System (DNS) uses index tables to resolve names (hostnames) to the IP addresses of hosts that store the requested content. It is considered by many to be a decentralised system, in the sense that DNS servers can be deployed by anyone running an ISP. At the top level, however, it is administered by [ICANN](https://www.icann.org/), which is in charge of assigning names and, therefore, does not comply with the permissionless nature of 3DMs. We could envision a similar, index-based naming system, governed by the [Ethereum Name Service](https://docs.ens.domains/) (ENS) to bypass centralization concerns. Cloudflare has recently introduced a [Name Resolver for the Distributed Web](https://blog.cloudflare.com/cloudflare-distributed-web-resolver/), where IPFS content can be accessed through ENS.
+
+* **Pros:** can be very fast
+* **Cons:** need to rely on ENS (which is fine given the ENS system is deployed and tested in the wild, albeit at smaller scales), or build a similar service
+
+#### PubSub-style:
+
+In pubsub systems, information propagates in the network based on topics that nodes subscribe to. Notifications about new content being published in the system can propagate through a dedicated channel or, more realistically, applications can have their own topics to disseminate information about content published within the remit of their application.
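A minimal in-process sketch of the pubsub pattern (toy code; a real deployment would run over a pubsub protocol such as libp2p's GossipSub): a Provider announces a CID on an application topic and subscribed nodes are notified.

```python
from collections import defaultdict

class ToyPubSub:
    """In-process topic bus standing in for a real pubsub overlay."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Deliver the message to every subscriber of this topic.
        for callback in self.subscribers[topic]:
            callback(message)

bus = ToyPubSub()
seen = []
# A client tracks announcements for one application's content.
bus.subscribe("app-x/new-content", seen.append)
# A Provider announces that it can serve a CID.
bus.publish("app-x/new-content", {"cid": "cid-42", "provider": "prov-1"})
# Topics the client didn't subscribe to are never delivered to it.
bus.publish("app-y/new-content", {"cid": "cid-99", "provider": "prov-2"})
```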
+ +* **Pros:** simple approach, lots of literature and testing in dozens of systems, engineers have lots of experience with similar protocols +* **Cons:** too slow for real-time content delivery, can take seconds to find content, can’t scale in the longer term + +### Known attacks to be mitigated + +* Providers provide false content discovery and routing information +* Providers serve bogus content +* Providers provide false provider records (i.e., they claim that they have and can serve content that they actually do not have) +* Cache poisoning +* Privacy attacks: there are a number of privacy considerations and threat vectors to be taken into account + +### New ideas being explored + +ResNetLab organized a Research Intensive Workshop on 3DMs, from which the following ideas emerged and were structured as RFCs: + +* [RFC: Incentive alignment for collaborative routing](https://github.com/protocol/ResNetLab/blob/master/3DM_RFC/RIW2021_RFC_Incentive%20alignment%20for%20collaborative%20routing.md) +* [RFC: Name-based Request Forwarding for Nearest Replica Routing](https://github.com/protocol/ResNetLab/blob/master/3DM_RFC/RIW2021_RFC_Name-based%20Request%20Forwarding%20for%20Nearest%20Replica%20Routing.md) +* [RFC: Omniscient Routers](https://github.com/protocol/ResNetLab/blob/master/3DM_RFC/RIW2021_RFC_Omniscient%20Routers.md) + +## CryptoEconomic model for Data Delivery + +The aim of the cryptoeconomic model is to build an incentive system to ensure that all actors in the network are aligned towards the same goal: to deliver data efficiently. The cryptoeconomic model is tightly coupled to all of the areas presented above, offering a way to incentivize desired behaviors in the network. Ideally, the economic model should offer the substrate to build a decentralized and trustless infrastructure for data delivery able to compete against traditional CDN infrastructures in terms of cost, scalability, and price. 
+
+#### How it is done traditionally
+
+In traditional and centralized setups, a set of legal agreements is used to govern the relationships between entities, align the incentives of all participants, and discourage disruptive behaviour. These agreements offer a basic level of trust to ensure good behavior by all entities in the system (clearly stating the SLA, and the punishments for misbehaving or not fulfilling the contract). We currently see this setup in the multilateral agreements between CDNs, ISPs, and Cloud Providers to share resources from their infrastructures for a certain price, in order to build a network of relationships that allows them to provide their content delivery services efficiently.
+
+In decentralized setups, it is not possible to use legal agreements to orchestrate the relationships between entities in the system. In recent deployments of hybrid CDNs, traditional, centralised CDNs leverage the resources of end users to deliver data with enhanced QoS, reducing the load on their infrastructure and, as a result, the cost of upgrading/extending it. End users are incentivized to contribute their upload bandwidth to the system by being made eligible for better service. This additional reward is used to incentivize a behavior in the system (in this case, users contributing bandwidth in hybrid CDNs). However, the fact that there is still a centralized infrastructure, managed by an entity with greater power than the rest of the actors, is enough to prevent misbehavior. This removes the need to design stronger reputation or reward systems. This does not represent a trustless setup.
+
+For decentralized and trustless setups, two main schemes have traditionally been used to align the incentives of the system’s participants and prevent misbehavior:
+
+* Trustless networks governed by a common consensus, like public blockchains, use economic models based on token rewards and clear punishments built upon a secure consensus algorithm.
The consensus algorithm gives a basic layer of trust where all participants “keep an eye on one another”, preventing misbehaviors, and upon which the economic model is built to align all their goals. A good example of this model is Bitcoin, where the Proof of Work consensus algorithm prevents attacks, while the mining rewards incentivize miners to keep serving the network.
+
+* For trustless P2P networks where there is no common protocol run by all the participants of the network, other schemes, such as reputation systems or credit networks, need to be implemented to orchestrate the relationships and good behavior of all entities in the system.
+
+In the same way public blockchains leverage the schemes and security properties in place to design their economic model, 3DMs will have to leverage the metering and graph forming designs in place to design the right economic model.
+
+Finally, auction markets and game theoretical models have been traditionally used to study and achieve efficient resource allocation in P2P networks and decentralized systems.
+
+#### Properties desired
+
+* The model MUST foster fair competition and avoid the creation of monopolies.
+    * We should prevent superlinear advantages from winning/completing a high percentage of retrieval deals in the system.
+* The model SHOULD discourage, prevent, and punish misbehaviors.
+    * Or, the other way around, well-behaved entities should be rewarded by the model.
+    * In its different layers, the system will include schemes to prevent attacks in the network (client metering, sybil attacks, data-ransoming, etc.). However, the economic model should disincentivize these kinds of misbehavior in advance (from a game theoretical approach, the system should target a Nash Equilibrium where misbehaviors are disincentivized).
+    * It should also avoid “malicious economic attacks”, such as a content provider trying to bankrupt its competition by draining the stake of its retrieval deals.
+* The model SHOULD foster collaboration between entities to benefit users’ perceived QoS / QoE.
+    * The model should reward entities that collaborate to serve content efficiently and with high QoS to users.
+    * This will also result in an efficient allocation of resources in the network.
+* If any part of the economics of the system ends up being orchestrated by an auction market, this market MUST always be available.
+    * And should match offers to bids in a fast and efficient way.
+
+### State-of-the-art
+
+As described above, the outcome of this topic should be to design an economic model that aligns the goals of all profit-maximizing entities in the system while preventing misbehaviors. Given this target, we split our investigation in the following areas:
+
+##### Economics of Hybrid CDNs and P2P networks
+
+The economics of current Hybrid CDN deployments give a good understanding of the system and cost models involved in the delivery of content at scale. This, added to the different analyses and design proposals of economic models for content delivery in P2P networks, represents a good foundation for understanding the complexity of the problem and potential ways to tackle it.
+
+**_Related Papers:_**
+
+* [An economic mechanism for request routing and resource allocation in hybrid CDN–P2P networks](http://www.cloudbus.org/papers/CDN-P2P-NetMan.pdf)
+* [An economic replica placement mechanism for streaming content distribution in Hybrid CDN-P2P networks](https://www.sciencedirect.com/science/article/pii/S014036641400228X?casa_token=99HEhHX4Qx4AAAAA:HGUBKkgzqtRn-AwdIpjQmxFMmUd6YIwuWH4ESn3MC1Wv1xYjI5JemUOtJlPDpI20CqFwwlYbVqM)
+* [How Neutral is a CDN?
An Economic Approach](https://files.ifi.uzh.ch/stiller/CNSM%202014/pdf/54.pdf)
+* [Value Networks and Two-Sided Markets of Internet Content Delivery](http://www.leva.fi/wp-content/uploads/ICN_2SM_TelecomPolicy_accepted_manuscript.pdf)
+* [Bilateral and Multilateral Exchanges for Peer-Assisted Content Distribution](https://www.cs.princeton.edu/~mfreed/docs/bilateral-tr10.pdf)
+
+**_Learnings:_**
+
+* Protocols and systems involved in the cost model of a Hybrid-CDN.
+* Models of P2P content delivery networks.
+
+##### Game theoretical models and reward/reputation systems in P2P networks
+
+How to design the right incentive and reputation models to encourage certain behaviors in P2P networks has been a widely studied problem for some time now. Approaches range from reputation systems, where nodes track the level of good behavior of their peers and cooperate with them according to their grade, to incentive systems where good behavior is explicitly rewarded. Game-theoretical models are a way to evaluate the strength of these schemes and to understand if the proposals can achieve a Nash Equilibrium.
+
+**Related Papers:**
+
+* [Scrivener: Providing Incentives in Cooperative Content Distribution Systems](http://cs.brown.edu/courses/csci2950-g/papers/scrivener.pdf)
+* [Proof-of-Prestige: A Useful Work Reward System for Unverifiable Tasks](https://www.ee.ucl.ac.uk/~ipsaras/files/Proof_of_Prestige-icbc19.pdf)
+* [Incentivizing Peer-Assisted Services: A Fluid Shapley Value Approach](https://dl.acm.org/doi/pdf/10.1145/1811099.1811064)
+* [Incentive and Service Differentiation in P2P Networks: A Game Theoretic Approach](http://www.cse.cuhk.edu.hk/~cslui/PUBLICATION/incentive_tons.pdf)
+* [On Incentivizing Caching for P2P-VoD Systems](http://www.cse.cuhk.edu.hk/~cslui/PUBLICATION/netecon12.pdf)
+* [Bar Gossip](https://static.usenix.org/event/osdi06/tech/full_papers/li/li.pdf)
+
+**_Learnings:_**
+
+* Use different metrics to evaluate the level of good behavior of peers.
+* Examples of game theoretic models to approach the evaluation of these protocols. +* Schemes to avoid attacks to which these systems can be vulnerable (such as sybil attacks). + +##### Credit networks and Token Designs + +Credit networks provide liquidity in markets where transaction volumes are close to balanced in both directions. They offer a way of performing payments and rewarding behaviors without the need of a common consensus or dedicated agreements between all the entities in the systems. Credit networks require no “a priori” trust relationships when they are backed by on-chain escrows, so they represent an interesting approach to tackle the economics of 3DMs. + +**_Related Papers:_** + +* [Collusion-resilient Credit-based Reputations for Peer-to-peer Content Distribution](https://dl.acm.org/doi/pdf/10.1145/1879082.1879085) +* [Liquidity in Credit Networks: A Little Trust Goes a Long Way](https://dl.acm.org/doi/pdf/10.1145/1993574.1993597) +* [Liquidity in Credit Networks with Constrained Agents](https://dl.acm.org/doi/pdf/10.1145/3366423.3380276) +* [CAPnet: A Defense Against Cache Accounting Attacks on Content Distribution Networks](https://arxiv.org/abs/1906.10272) +* [MicroCash: Practical Concurrent Processing of Micropayments](https://arxiv.org/pdf/1911.08520.pdf) +* [Proof-of-Prestige: A Useful Work Reward System for Unverifiable Tasks](https://www.ee.ucl.ac.uk/~ipsaras/files/Proof_of_Prestige-icbc19.pdf) + +**_Learnings:_** + +* Economics of credit networks. +* Using credit networks for fast payments and reputation systems. +* Credit networks for content distribution in P2P networks. + +##### Auctions, decentralized markets, and efficient resource allocation + +Auctions and decentralized markets have traditionally been a good way of allocating resources efficiently in a decentralized manner. These papers analyze different proposals of auction systems and decentralized markets that can aid the design of efficient resource allocation in 3DMs. 
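To make the matching step concrete, here is a toy call-auction matcher (our own sketch, not taken from any of the surveyed proposals) that crosses client retrieval bids against provider asks:

```python
def match_orders(bids, asks):
    """Match the highest bids with the lowest asks while the bid covers the ask.
    bids/asks: lists of (agent_id, price).
    Returns a list of (buyer, seller, clearing_price) tuples."""
    bids = sorted(bids, key=lambda o: o[1], reverse=True)  # best buyers first
    asks = sorted(asks, key=lambda o: o[1])                # cheapest sellers first
    matches = []
    while bids and asks and bids[0][1] >= asks[0][1]:
        buyer, bid = bids.pop(0)
        seller, ask = asks.pop(0)
        matches.append((buyer, seller, (bid + ask) / 2))   # split the surplus
    return matches

bids = [("client-a", 10.0), ("client-b", 4.0), ("client-c", 7.0)]
asks = [("prov-1", 6.0), ("prov-2", 3.0), ("prov-3", 9.0)]
deals = match_orders(bids, asks)
# client-b (4.0) and prov-3 (9.0) remain unmatched: the bid doesn't cover the ask.
```

Real designs would also need the availability and latency requirements listed earlier (the market must always be up and match quickly), which a batch matcher like this only hints at.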
+
+**_Related Papers:_**
+
+* [Edge-MAP: Auction Markets for Edge Resource Provisioning](https://www.ee.ucl.ac.uk/~ipsaras/files/edgeMap-wowmom18.pdf)
+* [A Market Protocol for Decentralized Task Allocation](https://www.computer.org/csdl/proceedings-article/icmas/1998/85000325/12OmNwlZu42)
+* [On the Efficiency of Sharing Economy Networks](https://ieeexplore.ieee.org/document/8665937)
+* [Resource Allocation in Market-based Grids Using a History-based Pricing Mechanism](http://ce-publications.et.tudelft.nl/publications/646_resource_allocation_in_marketbased_grids_using_a_historyba.pdf)
+* [Content Pricing in Peer-to-Peer Networks](https://www.usenix.org/legacy/event/netecon10/tech/full_papers/Park.pdf)
+
+**_Learnings:_**
+
+* Use of auctions as an efficient way to allocate resources in a decentralized system.
+* Local auction protocols to drive the supply and demand of resources.
+* Decentralized market protocols for efficient pricing and distribution of resources.
+
+### Known attacks to be mitigated
+
+* **Sybil attacks:** Actors trying to game the system by indiscriminately generating a large number of entities and gaining outsized influence over the system (forging client retrievals, overstating the resource usage of a provider, preventing access to some resource in the network, making actors in the system dedicate resources to useless work, etc.).
+* **Collusion attacks:** Several independent actors in the system colluding to gain outsized influence in the network. The types of attacks that can be performed through collusion are similar to those performed in a sybil attack, but without having to generate “pseudonymous identities”.
+* **Data ransoming:** This attack takes place when a provider agrees to serve some data from a publisher, but ends up preventing its retrieval by clients until a ransom is paid.
An alternative form of this attack occurs when a provider is serving chunks of content to a client, and doesn’t deliver the last couple of chunks until the client pays a ransom.
+* **Malicious economic attacks:** Attacks aimed at economically harming actors in the system. For instance, clients trying to bankrupt a provider or content publisher, providers offering abusive prices to content publishers, providers and clients colluding to deprive target entities of their economic rewards, etc.
+
+### New ideas being explored
+
+ResNetLab organized a Research Intensive Workshop on 3DMs, out of which the following ideas, structured as RFCs, emerged:
+
+* [RFC: QFIL Closed Retrieval Economy](https://github.com/protocol/ResNetLab/blob/master/3DM_RFC/RIW2021_RFC_QFIL%20Closed%20Retrieval%20Economy.md)
+* [RFC: No market 3DMs](https://github.com/protocol/ResNetLab/blob/master/3DM_RFC/RIW2021_RFC_NoMarkets3DM.md)
+* [RFC: Credit-based Retrieval Network](https://github.com/protocol/ResNetLab/blob/master/3DM_RFC/RIW2021_RFC_Credit-based%20Retrieval%20Network)
+
+## Additional: Opportunistic deployment
+
+An area adjacent to Distribution Graph Forming is that of extending the network beyond Providers, as defined earlier, to also include more ephemerally connected end-user devices. Such devices can include laptops, desktops, (futuristic) storage-equipped WiFi Access Points, or even mobile smartphone and tablet devices.
+
+We call these environments “opportunistic deployments” to reflect their unpredictability in terms of availability, uptime, and resource quality and quantity. In contrast to Providers, as defined earlier, which are expected to have stable, public and high-bandwidth connectivity, 3DM opportunistic deployments can utilise everyday user devices to extend the storage footprint of Provider nodes and create a wealth of new network connectivity and business opportunities.
+
+Although this area seems to sit on the periphery of 3DMs, it is actually a highly impactful area, as it realises the vision of Decentralised Storage Networks in general, and Filecoin in particular, of regular end-users sharing their own resources to contribute to the network. We therefore place high value on capturing this opportunity and on rewarding end-users for their contribution to the network.
+
+### State-of-the-art
+
+The literature in the field of Opportunistic and Delay-Tolerant Networks is vast. We have narrowed down the scope of the literature that we reviewed and identified the following most-promising 3DM applicability areas.
+
+#### Transient Providers
+
+We consider end-users that store content on their ephemerally connected devices (laptops or mobile phones) and provide it to the network through (one or more) Providers, as defined in the “Distribution Graph Forming” area. These service providers are called Transient Providers (TPs) to reflect the ephemeral nature of their resource availability and uptime. TPs can have some economic relationship with one or more Providers, e.g., Providers can “recruit” TPs to expand their storage capacity and footprint. In this case, Provider nodes have to maintain their own monitoring mechanisms and do resource allocation to ensure that they are getting desirable levels of service from their group of TPs.
+
+#### Opportunistic D2D
+
+This area of research and deployment is closer to traditional Delay-Tolerant Networks (DTNs). We envision applications where mobile devices contribute to the distribution of content in the mobile domain. These can be akin to previously proposed [User-Operated Mobile Content Distribution Networks](https://drive.google.com/file/d/1vvtQ7MJb4YFRXxeo2GewGh8EVYEqV-iJ/view?usp=sharing), or applications to realise concepts such as “[Floating Content](https://drive.google.com/file/d/1yOniHbAjLYv3q09pff7gnulOwT_NYCYG/view?usp=sharing)”.
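To give a feel for DTN-style content spreading in this setting, here is a minimal simulation (an illustrative toy model, not taken from the linked papers) of naive epidemic dissemination over random pairwise device contacts:

```python
import random

def epidemic_spread(n_nodes, seed=0):
    """Naive epidemic dissemination: at every step a random pair of
    devices meets, and if either holds the content it is copied to the
    other. Returns the number of contacts until full coverage."""
    rng = random.Random(seed)
    has_content = {0}  # device 0 is the original publisher
    contacts = 0
    while len(has_content) < n_nodes:
        a, b = rng.sample(range(n_nodes), 2)
        contacts += 1
        if a in has_content or b in has_content:
            has_content.update((a, b))
    return contacts

contacts = epidemic_spread(50)  # contacts needed to reach all 50 devices
```

Counting contacts this way makes the redundancy cost of blind flooding explicit, which is the tradeoff the “Known shortcomings” section below revisits.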
+
+### Known shortcomings
+
+**User Mobility**
+
+When a user is “registered” under one address/network location to serve content (at least as far as its Provider is concerned) and then moves to another network location, the content is not discoverable/retrievable anymore.
+
+* Brings up content mobility issues and “content churn”
+* Similar to node churn, this is a very tricky problem: how can you find content that was linked to some network address when this address is no longer valid? You can, of course, update the record if you know where the record lives, but if the record is “allowed” to be provided by anyone, then it’s difficult to even find it.
+* Simple solution: route through a “home” gateway, similar to Mobile IP. This increases delay and is not ideal, but can work.
+* There is a lot of work on “producer mobility” in the literature that we should look at.
+
+**Privacy - Efficiency Tradeoff**
+
+In mobile opportunistic communications, privacy and efficiency often conflict. On one extreme, you have efficient (low-replication) solutions that make use of a lot of node data (geolocation history/patterns, contact history/patterns, current geographic destination, current velocity vector). On the other extreme, you have naive epidemic dissemination solutions that require no node profiling but introduce data redundancy. There is a continuum in between, as well as a third variable, which is QoS: in the extreme, if you are willing to wait forever for delivery, you can blindly pass a message around without replication.
+
+In keeping with the ethos of our projects, we believe that the most suitable solutions are those that don’t require sensitive data, including data related to location or contact history, from which identity and behaviour can be easily extracted. Aggregate transitive metrics (e.g. total expected time to destination measures) can provide a measure of delivery likelihood without disclosing the full contact vector or the nature of the contact (direct vs.
indirect), and may be a useful compromise. Other data, including interests/subscriptions, is akin to what is already used e.g. in pubsub protocols and is not considered to be any more privacy-invading.
+
+**Security**
+
+As with any other distributed routing/forwarding approach, malicious nodes can interfere with message propagation and delivery. Because of the disconnected nature of the network, such attacks are harder (or at least slower) to detect and counteract. Secure-by-design approaches are likely to suffer from additional overhead (either in data or complexity) and _may_ be premature when compared to the security model of other, more critical, parts of the network.
+
+**Energy Efficiency and Battery Consumption**
+
+Opportunistic communications that rely heavily (or exclusively) on user devices always come with the challenge of usability from the energy/battery perspective. Of course, incentives through FIL will alleviate some of it, but even then the incentive would have to be such that it overcomes the “running out of battery” shock. Techniques to overcome this can be as simple as disabling user participation when the battery level is below X%.
+
+**Incentives**
+
+The incentive to participate and the cryptoeconomic model behind the opportunistic part of the network are quite likely going to be incompatible with those in the main Distribution Graph Forming area that apply to large/stable Providers. Nevertheless, incentives are a core part of this area and have traditionally been a stumbling block for the deployment of opportunistic networks.
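The aggregate transitive metrics mentioned under the Privacy - Efficiency Tradeoff above can be sketched as a shortest-expected-delay computation over average pairwise contact rates. This is a simplified toy model (it sums expected waits along a relay chain) with hypothetical node names; real forwarding metrics are more involved.

```python
import heapq

def expected_delivery_time(contact_rate, src, dst):
    """contact_rate[(a, b)] is the average number of contacts per hour
    between devices a and b, so the expected wait for one contact is
    1/rate. The minimum cumulative expected wait over relay chains is
    found with Dijkstra; peers only need to gossip this aggregate ETA,
    never their raw contact logs."""
    graph = {}
    for (a, b), rate in contact_rate.items():
        w = 1.0 / rate
        graph.setdefault(a, []).append((b, w))
        graph.setdefault(b, []).append((a, w))
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            return d
        if d > dist.get(node, float("inf")):
            continue
        for nxt, w in graph.get(node, []):
            nd = d + w
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return float("inf")

rates = {("a", "b"): 2.0, ("b", "c"): 1.0, ("a", "c"): 0.25}
eta = expected_delivery_time(rates, "a", "c")  # relaying via b: 0.5 + 1.0 = 1.5 hours
```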
+
+### New ideas being explored
+
+ResNetLab organized a Research Intensive Workshop on 3DMs, out of which the following ideas, structured as RFCs, emerged:
+
+* [RFC: Hybrid CDN with Recruiter Providers](https://github.com/protocol/ResNetLab/blob/master/3DM_RFC/RIW2021_RFC_Hybrid%20CDN%20with%20Recruiter%20Providers.md)
+* [RFC: OS-level OppNet Component](https://github.com/protocol/ResNetLab/blob/master/3DM_RFC/RIW2021_RFC_OS-level%20OppNet%20Component.md)
+* [RFC: On-Demand Opportunistic Resource Deployment](https://github.com/protocol/ResNetLab/blob/master/3DM_RFC/RIW2021_RFC_On-Demand%20Opportunistic%20Resource%20Deployment.md)
diff --git a/OPEN_PROBLEMS/HETEROGENEOUS_RUNTIMES.md b/OPEN_PROBLEMS/HETEROGENEOUS_RUNTIMES.md
new file mode 100644
index 0000000..15b4692
--- /dev/null
+++ b/OPEN_PROBLEMS/HETEROGENEOUS_RUNTIMES.md
@@ -0,0 +1,190 @@
+# Networking in Heterogeneous Runtimes
+
+## Short Description
+Annual global IP traffic is forecast to reach 396 exabytes per month by 2022, and the number of devices connected to the Internet is expected to double over the same period. Access to high-speed broadband connectivity is limited by poor last-mile connections, and optic-fiber deployments are cost-prohibitive and slow to roll out [1](https://www.cisco.com/c/en/us/solutions/collateral/executive-perspectives/annual-internet-report/white-paper-c11-741490.html).
+
+Edge computing has emerged as a distributed computing paradigm to overcome practical scalability limits of cloud computing. The main principle of edge computing is to leverage computational resources outside of the cloud to perform computations closer to data sources, avoiding unnecessary data transfers to the cloud and enabling faster responses for clients. Given the enormous amount of data that is expected to be produced at the edge of the network (by end-user devices), the edge-computing principle builds on the fact that “it is cheaper to bring computation to data, rather than data to computation”.
+
+Data storage and computation points at the edge of the network, be they cell stations, WiFi access points, or even the same user devices that produce the data, ultimately form a peer-to-peer network.
+
+*Both IPFS and libp2p can have a pivotal role in this new trend of distributed technologies and expansion of the Internet beyond traditional boundaries*. Both technologies already have implementations in several programming languages and can run in a wide gamut of runtimes (desktop, browser, cloud, etc.) and network conditions (public IP addresses, behind NATs, etc.). Nevertheless, there is significant work to be done to make IPFS and libp2p compatible with every runtime and network condition in a way that offers a flawless experience, just like one expects when using the net or http package of any language. This open problem is built upon *the vision of making libp2p / IPFS modules and protocols the de-facto distributed substrate for connected devices in the near-future Internet*.
+
+At a high level, the aim of this research endeavor is to:
+
+- Enable libp2p to leverage the available protocols from the multiple runtimes, so that it can execute and form a p2p network. This will require a coordinated approach to cover as much ground as possible (a possible approach is inspired by [Are we yet?](https://wiki.mozilla.org/Areweyet) initiatives).
+- Allow global connectivity between devices. Any two libp2p nodes should be able to communicate using any compatible network link available, without having to rely on a connection to the broader Internet (e.g., direct connectivity through wireless transports, mesh connectivity, etc.).
+- (Bonus) Libp2p/IPFS should include a general execution framework to be easily extensible for any runtime using the same format.
+  - [Extensibility through Web Assembly](https://istio.io/latest/blog/2020/wasm-announce/)
+  - [Web Assembly for proxies (ABI specification)](https://github.com/proxy-wasm/spec)
+  - [Self-described blocks (embedded IPLD codecs)](https://hackmd.io/AAfd9WnWQZSC7HT7Pr5G9A?view)
+
+## Long Description
+This open problem can be divided into the following subproblems:
+
+### 📲️ Area I: Runtimes
+The aim of this open problem would be to have libp2p and essential IPFS modules and protocols running in different runtimes (ensuring their compatibility). A lot of work has already been done along these lines, with libp2p and IPFS running in the browser and implemented in several programming languages.
+
+By the end of this research effort we should consider at least supporting the following runtime categories (each category includes a list of related projects):
+- Browsers and Desktop
+- Embedded systems (IoT, low powered devices)
+  - [IPFS Tiny](https://gitlab.com/librespacefoundation/ipfs-tiny/-/wikis/home)
+  - [Embed-ipfs](https://github.com/ipfs-rust/ipfs-embed)
+- Routing devices:
+  - [OpenWrt](https://openwrt.org/)
+  - [Gotenna](https://gotenna.com/)
+  - [Liberouter](http://www.liberouter.mobi/)
+- Mobile devices:
+  - [Thali Project](http://thaliproject.org/)
+- VR/AR devices.
+- Trusted Execution Environments
+  - [Enarx](https://github.com/enarx/enarx/wiki/Enarx-Introduction)
+  - [T-Rust](http://t-rust.com/#/): [TEA (Trusted Execution and Attestation)](http://t-rust.org/#/doc_list/What_is_TEA%3F%2FREADME.md)
+- Risc-V architectures
+  - [go-ipfs on Risc-V](https://github.com/ipfs/go-ipfs/issues/7781). [Blog post](https://blog.davidburela.com/2020/11/16/ipfs-on-risc-v/)
+
+We may be able to support many of the aforementioned with the same protocol implementations (e.g. routing devices and embedded systems). This work will set a strong foundation for the rest of the projects to come.
+
+#### What defines a complete solution?
+- TBD
+
+#### References:
+- Support for libp2p intra process:
+- Build for gomobile:
+- Use of vias for protocols:
+- Enarx SDK:
+- Are we distributed yet?
+- Issues listing open problems to support full distribution:
+
+### 📦️ Area II: Data Link / Network Connectivity
+Apart from being able to run libp2p nodes anywhere, we should also be able to connect two libp2p nodes using different types of connectivity. Not every node will have a proper Internet connection, or be able to directly connect to every single peer using the exact same data link. Analogously to how it is currently done in libp2p at the transport level with the transport upgrader, we should have similar support for different data-link technologies, with nodes falling back to a data link that both peers support. Some of the data-link technologies we should consider supporting are:
+
+- Wireless communications
+  - Bluetooth, Zigbee, WiFi direct.
+- Side-channels
+  - NFC, RFID.
+  - Audio waves
+- Relays and bridges
+  - See the next section.
+
+#### What defines a complete solution?
+- Demonstrate that two libp2p nodes within range of each other can discover each other:
+  - through a local, fixed, network element, such as a WiFi access point, raspberry pi, or similar
+  - directly, without relying on any fixed network element, i.e., device-to-device (D2D)
+  - the devices can connect on demand, i.e., when one of the two devices wants to request (pull) or send (push) content from the other
+- The setup and supporting protocol stack should make use of at least one of the wireless connectivity technologies mentioned above.
+- Bonus: the protocol setup should enable the option of seamless connectivity between the devices, that is, without the user having to accept incoming connections as is currently the case with Bluetooth pairing. This is needed to enable large-scale mesh networks and applications to disseminate content in mobile environments.
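The fallback behaviour described above can be sketched as a simple capability negotiation. The link names and priority order below are purely illustrative and do not correspond to an actual libp2p API:

```python
# Illustrative priority order; these names are hypothetical, not a libp2p API.
LINK_PRIORITY = ["wifi-direct", "bluetooth", "zigbee", "nfc", "audio"]

def negotiate_link(local_links, remote_links):
    """Return the most-preferred data link that both peers support,
    mirroring how libp2p's transport upgrader picks a common transport.
    Falls back down the priority list; returns None if nothing matches."""
    common = set(local_links) & set(remote_links)
    for link in LINK_PRIORITY:
        if link in common:
            return link
    return None

link = negotiate_link({"wifi-direct", "nfc"}, {"bluetooth", "nfc"})
# the only shared link is NFC, so the peers fall back to "nfc"
```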
+
+#### References:
+- Alternative transport / discovery protocols for libp2p:
+  - Bluetooth:
+  -
+- Wave-share:
+
+### 🛣️ Area III: Routing
+
+One of the big aims of this open problem is to achieve global connectivity between libp2p nodes. Supporting several runtimes and connectivities will allow "anything" to become a connected libp2p node. Unfortunately, to be able to connect any two libp2p nodes independently of their network setting and location, and to offer node mobility, additional libp2p routing infrastructure may be needed (e.g., to translate between data links, or to bridge unreachable network segments). This infrastructure will be responsible for discovering and forwarding messages to directly unreachable peers. In this project, we can open several lines of research and exploration:
+- Relays and bridges
+  - Use bridges as gateways for low-powered devices with limited network support ([IPFS tiny bridges](https://gitlab.com/librespacefoundation/ipfs-tiny/-/wikis/home))
+  - Libp2p routers and gateways to discover and forward data to peers at unreachable segments of the network (extending the current implementation of relays).
+- Mesh networking.
+  - Hierarchical routing of libp2p traffic through the construction of interconnected local mesh networks.
+
+#### What defines a complete solution?
+- Set up an environment where low-powered devices can communicate with each other through a gateway of some form (e.g., a WiFi access point).
+- Demonstrate that one device can send data to the other, on demand.
+- Bonus: more powerful mobile devices can act as relays, realising a multi-hop environment.
+
+#### References:
+- OpenR - Distributed routing protocol for mesh networks:
+- Liberouter:
+- OpenWrt - Embedded firmware OS:
+
+### 📴️ Area IV: Offline-first
+
+Libp2p and IPFS nodes should be able to work offline-first and offer capabilities to easily implement offline-first applications and protocols over them.
Being offline first means being able to easily recover from network interruptions and to operate seamlessly under unstable network conditions. To achieve this we may require the use of CRDTs and "synchronizable" datastores.
+
+#### What defines a complete solution?
+- Assume an environment where several mobile devices run the same application and the application content between devices needs to be synchronised at all times. When one of the devices produces new content, the content should propagate to the rest of the devices, realising a "push" model.
+- Set up an environment with a sample application that produces content every so often.
+- The devices can either be permanently connected in a mesh, or connect on demand and transfer content to each other.
+- As nodes propagate content, they have to optimise the amount of data they exchange. Instead of just flooding full messages, nodes should first check whether their neighbours already have the latest content items. This can be achieved through CRDTs or Bloom filters; at least one should be demonstrated.
+
+#### References:
+- Improve offline support:
+- Offline message queue:
+- IPFS CRDT datastore:
+- Optimize Circuit Relays to make them packet-oriented in libp2p:
+- NAT Traversal efforts around libp2p:
+
+### 🛠️ (Bonus) Area V: Extensibility / P2P VM a.k.a InterPlanetary Runtime
+This area of research is still a work in progress, and it may become a full-fledged open problem of its own, but for now we still consider it an area inside the heterogeneous runtimes problem.
+Maintaining and growing libp2p protocols for the gamut of runtimes and connections we want to support can be cumbersome. This is why we should consider embedding a common execution framework in every libp2p implementation, so that libp2p modules are compiled targeting this common framework and can thus be used in any runtime. With this, protocols and modules are implemented once and used anywhere.
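The "implemented once, used anywhere" idea relies on modules being fetched and verified by content address. A minimal sketch of that property, using a toy in-memory store rather than an actual IPFS API:

```python
import hashlib

class ContentAddressedStore:
    """Toy content-addressed store: a module is stored and fetched by
    the hash of its bytes, so any runtime retrieving it can verify it
    got exactly the code that was published, regardless of which peer
    served it."""

    def __init__(self):
        self._blocks = {}

    def put(self, data: bytes) -> str:
        cid = hashlib.sha256(data).hexdigest()
        self._blocks[cid] = data
        return cid

    def get(self, cid: str) -> bytes:
        data = self._blocks[cid]
        # Self-verification: recompute the hash before trusting the bytes.
        assert hashlib.sha256(data).hexdigest() == cid
        return data

store = ContentAddressedStore()
cid = store.put(b"(module)")  # e.g., the bytes of a Wasm module
```

Because the address is derived from the bytes themselves, trust shifts from the serving peer to the content, which is what makes distributing one compiled module to many heterogeneous runtimes safe.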
+
+Not only can protocol implementations benefit from embedding an *InterPlanetary Runtime* in libp2p nodes; so can any other standard computation in the ecosystem, such as IPLD codecs. IPLD codecs can be implemented targeting the *IP Runtime* so they don't have to be re-implemented for every libp2p implementation and runtime. This opens the door to future lines of research such as *self-describing blocks*, where the block description embeds the type of data and a pointer to the codec (which can itself be another block stored in the network); *content-addressable code executions*, where code snippets are stored in the network, and data is specified in the code as pointers to blocks in the network; and *computation delegation*, since once every node in the network shares the same common runtime and has a way to retrieve code and data directly from the network with a standard interface, delegating the execution of code to others is as easy as sharing a link to the code and the data. Some of these themes, such as the use of *self-describing blocks*, although useful for the interoperability and extensibility problems of networking heterogeneous runtimes, belong to the [Distributed Type Systems](https://github.com/protocol/resnetlab#areas) open problem, and may finally be tackled there.
+
+#### What defines a complete solution?
+- TBD
+
+### References
+- Wasmtime (Wasm runtime):
+- UnisonWeb: Example of a content-addressable programming language:
+- Wasmer (Wasm runtime):
+- WASM ❤️ IPLD:
+- For this project we can take inspiration from the [Wasm extensibility approach](https://istio.io/latest/blog/2020/wasm-announce/) currently introduced in Istio and Envoy. Thus, our libp2p node implementations would expose the basic operation with a common runtime framework from which new Wasm modules are instantiated and run to extend the operation of our node with additional modules and protocols.
+- Wasm bindgen designs:
+
+### 🛰️ (Bonus) Area VI: High Latency Connection (e.g.
Space)
+In the future, we may want to extend the edge computing trend into space, following the same approach we've taken to support libp2p on existing runtimes and link connectivities, but considering the requirements and limitations of space communications. This research area may also become its own open problem in the future.
+
+#### What defines a complete solution?
+- TBD
+
+#### References:
+- Libp2p for space.
+- [[2021 Theme Proposal] IPFS ❤️ Starlink](https://github.com/ipfs/roadmap/issues/72#)
+- SpaceX Starlink:
+- Issue with ideas:
+  - Software update among satellites
+  - Load-balancing and rerouting upon satellite failure
+  - Sky-based bootstrap / peer discovery, if Starlink is cooperative
+  - **Retrieval Market Filecoin Nodes (@adlrocha: my personal favorite)**
+  - Organic CDN (because of IPFS)
+  - Provider paid CDN ('coz offer and demand, baby)
+  - Open source embedded device comms / sat comms
+- Efficient Telemetry Storage on IPFS:
+
+### Additional areas of research to potentially explore
+Other research areas and projects that may fall under this heterogeneous runtimes open problem. Additional inspiration can be taken from [this paper](https://asc.di.fct.unl.pt/~jleitao/pdf/NewEdgeApplications.pdf).
+- Libp2p in 5G RAN infrastructure
+- Distributed monitoring
+- Collaborative services in libp2p (execution, monitoring, relaying).
+- Distributed Resource Management.
+
+## Use cases
+### Use Case I: User-Operated Mobile CDN
+#### Brief Overview
+Assume an environment where content publishers publish content through mobile phone applications. Content publishers utilise the mobility of users, the mobile device storage and connectivity opportunities to disseminate content. Users’ devices connect with each other as they move around to check whether their app instance has the latest content available and, if not, they update with the neighbours’ content.
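A minimal sketch of that neighbour update step (illustrative only; a real implementation would exchange a compact summary such as a Bloom filter or CRDT state instead of full ID lists):

```python
def d2d_sync(a_items, b_items):
    """One opportunistic contact between two devices. Each side learns
    which item IDs the other already holds (in practice via a compact
    summary such as a Bloom filter) and transfers only the missing
    payloads. Returns the number of payloads sent over the link."""
    transferred = 0
    for item_id, payload in list(a_items.items()):
        if item_id not in b_items:
            b_items[item_id] = payload
            transferred += 1
    for item_id, payload in list(b_items.items()):
        if item_id not in a_items:
            a_items[item_id] = payload
            transferred += 1
    return transferred

phone1 = {"bbc/42": "article-42", "bbc/43": "article-43"}
phone2 = {"bbc/42": "article-42"}
moved = d2d_sync(phone1, phone2)
# only "bbc/43" crosses the link; both phones end up with identical content
```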
+In this use case, we explore the potential of a user-operated, smartphone-centric content distribution model for smartphone applications. In particular, we assume source nodes that are updated directly
+from the content provider (e.g., BBC, CNN) whenever updates are available; destination nodes are then directly updated by source nodes in a device-to-device (D2D) manner. We leverage sophisticated information-aware and application-centric connectivity techniques to distribute content between mobile devices in densely-populated urban environments.
+#### Business case
+Content publishers utilise the storage and connectivity opportunities of mobile devices as a medium of content distribution and save on CDN costs. They pay users for the amount of storage and mobility opportunities that they contribute to the mobile CDN.
+
+See [this paper](https://drive.google.com/file/d/1UpH6r3Q0gKaf00CgEImgK_SImENR8NQ9/view) for more details.
+
+### Use Case II: Local Social Network
+#### Brief Overview
+Imagine a crowded entertainment or business event, such as a concert, sports event, or conference. People gather sharing the same interest, that is, to “consume” content from the event. Everyone is capturing content on their mobile device and everyone is interested in the best content available. Connectivity in these environments is normally terrible, if it exists at all. Offline-first, D2D connectivity can facilitate content propagation between devices in the area.
+#### Business case
+Event organisers can save costs on expensive infrastructure setup (e.g., mobile cell stations, WiFi access points and bandwidth capacity) and instead subsidise users’ resources to disseminate content.
+
+### Use Case III: Industry 4.0
+#### Brief Overview
+The next generation of factory networks is expected to rely on smart, automated setups for the operation of production lines, robots, sensors and actuators.
Connectivity between devices can be intermittent due to challenged environments, e.g., mines, or intentionally disconnected for security reasons. At the same time, connectivity and communication between devices need to be stable given the expensive and time-critical operations.
+#### Business case
+Factories and industrial plants are a harsh environment for wide-range wireless communications. Companies generally deploy private LTE networks to meet their connectivity and performance needs.
+
+### Use Case IV: Mobile Active Networks - aka Ubiquitous code
+#### Brief Overview
+Traditionally, applications are installed on end-user devices. With mobile active networks, we propose that instead of having to download an application in order to use it natively, users can just send a request to the network to run the application. The nearest devices would serve the code in the right format for the user to run the application seamlessly on their device. Instead of having to install apps, we “pin” apps. Furthermore, if the application at hand is computationally intensive, a device can delegate computations (like graphics rendering) to other capable devices in the surroundings (this is why I mentioned that the code is served in the “right format” according to the device; the code may include pointers to primitives that require the delegation of computation). This use case merges the Stadia approach (you don’t need hardware to play videogames because everything runs on the edge) with the browser (running an application means searching for an address to download the content).
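A toy sketch of this "pin and delegate" flow, where an app is addressed by the hash of its code and a peer runs it on content-addressed data. `exec` stands in for a sandboxed runtime (e.g., Wasm), and all names here are illustrative:

```python
import hashlib

def cid_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# A toy shared block store holding both code and data blocks.
blocks = {}

def publish(data: bytes) -> str:
    cid = cid_of(data)
    blocks[cid] = data
    return cid

def delegate(code_cid: str, input_cid: str):
    """A capable peer receives only two links, fetches the code and the
    data from the (here, in-memory) network, and runs the computation
    locally. exec() is a toy stand-in for a sandboxed Wasm runtime."""
    scope = {}
    exec(blocks[code_cid].decode(), scope)
    return scope["run"](blocks[input_cid])

code_cid = publish(b"def run(data):\n    return len(data)")
input_cid = publish(b"some large asset")
result = delegate(code_cid, input_cid)  # -> 16
```

Delegation then reduces to sharing the pair (code CID, input CID); any node that resolves both can produce the result.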
diff --git a/README.md b/README.md index e77c2b4..ae60f4c 100644 --- a/README.md +++ b/README.md @@ -15,8 +15,8 @@ - [Motivation & Description](#motivation--description) - [Research](#research) - [Areas](#research) - - [Projects](#research) - - [Collaborations](#collaborations) + - [Projects](#research) + - [Collaborations](#collaborations) - [Collab projects tracking threads](#collab-projects-tracking-threads) - [Publications, Talks & Trainings](#publications-talks--trainings) - [Team](#team) @@ -48,34 +48,41 @@ The lab's genesis comes from a need present in the IPFS and libp2p projects to a Open Problem(s) Short Description - + - Networks Observability + Resilliency in Adversarial Networks NEEDs OPEN PROBLEM - + - Resilliency in Adversarial Networks + Networks Observability NEEDs OPEN PROBLEM - + + + + Decentralized Data Delivery Markets + General open problem + Content storage and delivery services have traditionally been under the ownership, control and management of centralised entities. Decentralised storage platforms are emerging and are in operation today. However, without actual delivery of content to users, these networks can be of little value, limited to serve as cold storage solutions only. With this Open Problem, we introduce the field of decentralised delivery, or decentralised CDNs to complement decentralised storage systems and come one step closer to the whole set of solutions that traditional CDNs offer today. + Heterogeneous Runtimes - NEEDs OPEN PROBLEM - (e.g. Browsers, IoT, Low Powered and/or Battery powered devices) - + General open problem + Making libp2p / IPFS modules and protocols the de-facto distributed substrate for connected devices in the near-future Internet. Enable the execution of libp2p nodes and its underlying protocols anywhere (any device and any runtime). Allow global connectivity between devices. Enable “offline-first” applications. + + Content Addressing Routing at Scale (1M, 10M, 100M, 1B.. 
nodes) Content-addressable networks face the challenge of routing scalability, as the amount of addressable elements in the network rises by several orders of magnitude compared to the host-addressable Internet of today. - + - Preseve full users' privacy when providing and fetching Content + Preserve full users' privacy when providing and fetching Content How to ensure that the user's of the IPFS network can collect and provide information while mainting their full anonymity. @@ -83,40 +90,54 @@ The lab's genesis comes from a need present in the IPFS and libp2p projects to a Mutability Mutable Data (Naming, Real-Time, Guarantees) Enabling a multitude of different patterns of interactions between users, machines and both. In other words, what are the essential primitives that must be provided for dynamic applications to exist, what are the guarantees they require (consistency, availability, persistancy, authenticity, etc) from the underlying layer in order create powerful and complete applications in the Distributed Web. - - + + Human Readable Naming You can only have two of three properties for a name: Human-meaningful, Secure and/or Decentralized. This is Zooko's Trilemma. Can we have all 3, or even more? Can context related to some data help solve this problem? - + PubSub at Scale (1M, 10M, 100M, 1B.. nodes) As the IPFS system is evolving and growing, communicating new entries to the IPNS is becoming an issue due to the increased network and node load requirements. The expected growth of the system to multiple millions of nodes is going to create significant performance issues, which might render the system unusable. Despite the significant amount of related literature on the topic of pub/sub, very few systems have been tested to that level of scalability, while those that have been are mostly cloud-based, managed and structured infrastructures. 
- + Data Exchange Enhanced Bitswap/GraphSync with more Network Smarts Bitswap is a simple protocol and it generally works. However, we feel that its performance can be substantially improved. One of the main factors holding performance back is that a node cannot request a subgraph of the DAG, which results in many round-trips in order to “walk down” the DAG. The current operation of Bitswap is also frequently linked to duplicate transmission and receipt of content, which overloads both the end nodes and the network. - - + + - Distributed Type Systems + Distributed Type Systems Improved layouts to represent data in hash-linked graphs (using IPLD) Future™ ⚙️ - + ### Projects -- Hydra Booster -- Gossipsub v1.1 -- drand -- [Beyond Bitswap](BEYOND_BITSWAP) -- P2P Observatory +- ONGOING + - 3DMs: Decentralised Data Delivery Markets + - Content Routing Scale Up + - P2P Observatory + - ResNetLab on (Virtual) Tour: +We have built a half-day tutorial to introduce the DWeb, the IPFS ecosystem, the IPFS architecture and its supporting protocols, and the high-level design decisions of the Filecoin network. In 2020, we participated in multiple conferences and other academic events to discuss the exciting projects we're working on and to invite great researchers to collaborate with us. +The tutorial is primarily composed of lecture material, and many of our tutorials have been very interactive. In 2021, we are enhancing the tutorial with hands-on sessions, so it will be even more exciting for students and researchers with a passion to tinker as they learn. You can find summaries of our talks in the "Publications and Talks" section further down. + +- COMPLETED + - **Hydra Booster:** The Hydra Booster is a new type of DHT node designed to accelerate Content Resolution & Content Providing on the IPFS Network.
This new type of peer exists to augment the network by creating multiple distributed identities across the DHT address space, enabling it to contribute to the storage and discovery of content provider records. All of these identities are linked to the same backend datastore, which, from the other peers’ perspective, creates the effect of multiple peers being present and holding a vast collection of the provider records in the network. The Hydra Booster has been instrumental to the stability and fast content resolution of the IPFS network. + Hydra Boosters have been designed, implemented and are in operation today in the public IPFS network as of [release go-ipfs 0.5](https://blog.ipfs.io/2020-04-28-go-ipfs-0-5-0/). The Hydra Booster is open source, lives in [this repository](https://github.com/libp2p/hydra-booster) and can be deployed by anyone. + + - **Gossipsub v1.1:** Gossipsub is one of the many libp2p PubSub routers, used to disseminate IPNS records in the IPFS network, enable real-time distributed applications, and much more. Gossipsub was adopted as a messaging layer by Filecoin and ETH2.0, due to its functionality and fast performance on permissionless networks. This has led us to invest additional effort to protect it against Sybil attacks and malicious behaviour. +Together with the libp2p team, we embarked on a mission to harden the protocol’s behaviour. The outcome is a hardened version of the Gossipsub protocol that integrates several mitigation strategies at the protocol level. +You can learn more about this project in this [blogpost](https://research.protocol.ai/blog/2020/gossipsub-an-attack-resilient-messaging-layer-protocol-for-public-blockchains), this [paper](https://research.protocol.ai/publications/gossipsub-attack-resilient-message-propagation-in-the-filecoin-and-eth2.0-networks/) and this [talk](https://www.youtube.com/watch?v=APVp-20ATLk&feature=youtu.be&t=3612) at the Matrix.org “Open Tech will save us all” event.
Our extensive performance evaluation can be found in this [report](https://research.protocol.ai/publications/gossipsub-v1.1-evaluation-report/). The specification of the protocol can be found [here](https://github.com/libp2p/specs/blob/master/pubsub/gossipsub/gossipsub-v1.1.md). + - **[drand](https://drand.love/):** drand is an unbiasable source of randomness that other platforms and applications can publicly verify. Randomness is at the core of many of the security-critical operations we perform online every day, and yet, until 2020, there wasn’t a single reliable and trustworthy public source. That changed with the launch of the drand project. + drand is hosted by more than 15 independent members of the League of Entropy, and is available for use by any project that needs randomness. You can find more details in the platform's [launch blog post](https://drand.love/blog/2020/08/10/drand-launches-v1-0/). We organised a whole-day summit to talk about the technical details of drand and distributed randomness. You can watch all the recordings at [randomness2020.com](https://randomness2020.com/). + - **[Beyond Bitswap](https://github.com/protocol/beyond-bitswap):** File transfer is at the core of IPFS, and every subsystem inside IPFS is built to enable it in a fast and secure way, while maintaining certain guarantees (e.g. discoverability, data integrity and so on). The aim of the project was two-fold: to drive speed-ups in file-sharing for IPFS and other P2P networks; and to enable a framework for anyone to join the quest of designing, implementing and evaluating brand-new file-sharing strategies in P2P networks.
The outcomes of the project are impressive: we produced [10 improvement proposals](https://github.com/protocol/beyond-bitswap#enhancement-rfcs), many of them prototyped, a complete [testbed](https://github.com/protocol/beyond-bitswap/tree/master/testbed) to benchmark and debug file sharing in IPFS, a [paper](https://research.protocol.ai/publications/accelerating-content-routing-with-bitswap-a-multi-path-file-transfer-protocol-in-ipfs-and-filecoin/) that provides further details about our design proposals, and [several blogposts](https://research.protocol.ai/blog/) summarising our results. + ### RFPs @@ -128,26 +149,55 @@ The lab's genesis comes from a need present in the IPFS and libp2p projects to a ### Collaborations -- PulsarCast -- DClaims +- PRESENT + - [Prof. George Polyzos](https://www.aueb.gr/en/faculty_page/polyzos-george), [Dr. Spyros Voulgaris](https://acropolis.aueb.gr/~spyros/www/) and their team at the [Athens University of Economics and Business (AUEB)](https://mm.aueb.gr/), Greece. Our project focuses on the design and implementation of a Multi-Layer DHT for IPFS. + - [Dr. Hidehiro Kanemitsu](https://www.teu.ac.jp/grad/english/teacher/cs\_spc/index.html?id=45) and [Prof. Hidenori Nakazato](https://waseda.pure.elsevier.com/en/persons/hidenori-nakazato) and their teams at the [Tokyo University of Technology](https://www.teu.ac.jp/english/index.html) and [Waseda University](https://www.waseda.jp/top/en/), respectively. Our project focuses on the optimisation of DHT lookup times. + - [Dr. Sreeram Kannan](https://people.ece.uw.edu/kannan_sreeram/) and [Dr. Shaileshh Bojja Venkatakrishnan](https://cse.osu.edu/people/bojjavenkatakrishnan.2) at the [University of Washington](https://www.ece.uw.edu/) and [Ohio State University](https://engineering.osu.edu/), respectively. Our project focuses on the optimisation of the GossipSub protocol in terms of speed and scalability.
+ - [Dr. Joao Leitao](https://asc.di.fct.unl.pt/~jleitao/) and his team at [NOVA University of Lisbon](https://www.fct.unl.pt/en/research/nova-laboratory-computer-science-and-informatics). + +You can read more about these projects in this [blogpost](https://research.protocol.ai/blog/2020/meet-the-latest-protocol-labs-research-grant-recipients/). + +- PAST + - [PulsarCast](https://github.com/JGAntunes/pulsarcast) + - [DClaims](https://research.protocol.ai/publications/dclaims-a-censorship-resistant-web-annotations-system-using-ipfs-and-ethereum/) ### Publications, Talks & Trainings -- 2019, Jul - [`Workshop` A Brief History of Information-Centric Networks](https://github.com/protocol/research/issues/14#issuecomment-517048457) -- 2019, Sep - [`Workshop` ACM ICN Tutorial: The InterPlanetary File System (IPFS)](https://conferences.sigcomm.org/acm-icn/2019/tutorial-IPFS.php) -- 2019, Dec - [`Paper` DClaims: A Censorship Resistant Web Annotations System using IPFS and Ethereum](https://arxiv.org/pdf/1912.03388.pdf) +#### Talks + +- February 2021: `Talk` FOSDEM (blogpost coming soon) +- December 2020: [`Tutorial` IEEE Globecom](https://research.protocol.ai/blog/2021/ieee-globecom-2020-the-interplanetary-file-system-and-the-filecoin-network/) +- November 2020: [`Tutorial` CNSM](https://research.protocol.ai/blog/2020/ieee/ifip-cnsm-2020-the-interplanetary-file-system-and-the-filecoin-network/) +- July 2020: [`Tutorial` IFIP/IEEE DSN](https://research.protocol.ai/blog/2020/ieee/ifip-dsn-2020-the-interplanetary-file-system-and-the-filecoin-network/) +- May 2020: [`Talk` NGN Group](https://research.protocol.ai/blog/2020/next-generation-networks-ngn-group-talk-a-high-level-overview-of-the-interplanetary-file-system/) +- May 2020: [`Tutorial` IEEE ICBC](https://research.protocol.ai/blog/2020/ieee-icbc-2020-the-interplanetary-file-system-and-the-filecoin-network/) +- April 2020: [`Talk` IRTF
DINRG](https://research.protocol.ai/blog/2020/ipfs-talk-at-the-irtf-decentralised-internet-infrastructure-research-group-meeting) +- March 2020: [`Talk` NDN Project Consortium](https://research.protocol.ai/blog/2020/ndn-seminar-a-high-level-overview-of-the-interplanetary-file-system/) +- September 2019: [`Tutorial` ACM ICN Tutorial: The InterPlanetary File System (IPFS)](https://conferences.sigcomm.org/acm-icn/2019/tutorial-IPFS.php) +- July 2019: [`Talk` A Brief History of Information-Centric Networks](https://github.com/protocol/research/issues/14#issuecomment-517048457) + + +#### Papers + +- December 2020: [`Paper` Accelerating Content Routing with Bitswap: A multi-path file transfer protocol in IPFS and Filecoin](https://research.protocol.ai/publications/accelerating-content-routing-with-bitswap-a-multi-path-file-transfer-protocol-in-ipfs-and-filecoin/) +- July 2020: [`Paper` Gossipsub: Attack-resilient message propagation in the Filecoin and ETH2.0 networks](https://research.protocol.ai/publications/gossipsub-attack-resilient-message-propagation-in-the-filecoin-and-eth2.0-networks/) +- July 2020: [`Paper` PulsarCast](https://github.com/JGAntunes/pulsarcast/blob/master/paper/paper.pdf) +- December 2019: [`Paper` DClaims: A Censorship Resistant Web Annotations System using IPFS and Ethereum](https://arxiv.org/pdf/1912.03388.pdf) - [zkcapital - paper of the week](https://zkcapital.substack.com/p/this-week-in-blockchain-research-4de) - This paper was presented at [ACM SAC 2020](http://www.sigapp.org/sac/sac2020/) + +#### Training & Seminars - February 2020: [The effective Distributed Systems developer curriculum, with Golang](https://github.com/protocol/ResNetLab-private/issues/51) - [Curriculum](https://docs.google.com/document/d/1nPcEWymKFAsGJJQiQb5PSiF_2EiAw7GCuLfZMV5r6Y8/edit#heading=h.l73q2rxlx59z) ### Team -- David Dias - Research Engineer & ResNetLab Lead (PI) -- Yiannis Psaras - Research Scientist -- Alfonso de la Rocha - Research Engineer -- Vasilis Giotsas - Research Engineer +- 
[David Dias](https://research.protocol.ai/authors/david-dias/) - Research Engineer & ResNetLab Lead (PI) +- [Yiannis Psaras](https://research.protocol.ai/authors/yiannis-psaras/) - Research Scientist +- [Alfonso de la Rocha](https://research.protocol.ai/authors/alfonso-delarocha/) - Research Engineer +- [Petar Maymounkov](https://research.protocol.ai/authors/petar-maymounkov/) - Research Scientist +- Adrian Lanzafame - Research Engineer ### Contact -You can reach out to us anytime with your question and interest in these projects by emailing [resnetlab@protocol.ai](mailto:resnetlab@protocol.ai) \ No newline at end of file +You can reach out to us anytime with your questions and interest in these projects by emailing [resnetlab@protocol.ai](mailto:resnetlab@protocol.ai)