ARROW: Restoration-Aware Traffic Engineering
Overview
Fiber cut events reduce the capacity of wide-area networks (WANs) by several Tbps. In this paper, we revive the lost capacity by reconfiguring the wavelengths from cut fibers onto healthy fibers. We highlight two challenges that made prior solutions impractical and propose a system called Arrow to address them. First, our measurements show that, contrary to common belief, the lost capacity is in most cases only partially restorable. This poses a cross-layer challenge from the Traffic Engineering (TE) perspective that has not been considered before: "Which IP links should be restored and by how much to best match the TE objective?" To address this challenge, Arrow's restoration-aware TE system takes a set of partial restoration candidates (that we call LotteryTickets) as input and proactively finds the best restoration plan. Second, prior work has not considered the reconfiguration latency of amplifiers. However, in practical settings, amplifiers add tens of minutes of reconfiguration delay. To enable fast and practical restoration, Arrow leverages optical noise loading and bypasses amplifier reconfiguration altogether. We evaluate Arrow using large-scale simulations and a testbed. Our testbed demonstrates that Arrow's end-to-end restoration latency is eight seconds. Our large-scale simulations compare Arrow to state-of-the-art TE schemes and show that it can support 2.0x-2.4x more demand without compromising 99.99% availability.
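The selection question quoted above can be made concrete with a toy example. The sketch below is not Arrow's formulation: it compares a handful of hypothetical restoration candidates by brute force and scores each with a simplified objective (demand served on the failed IP links). The names `Ticket`, `served_demand`, and `best_ticket`, the link labels, and all capacities are assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical representation: a candidate ("ticket") maps each failed IP link
# to the capacity, in Gbps, that it would restore on that link.
Ticket = Dict[str, float]


def served_demand(ticket: Ticket, demand: Dict[str, float]) -> float:
    """Demand (Gbps) carried on the failed links if `ticket` is applied.

    A link cannot serve more than its demand or more than its restored capacity.
    """
    return sum(min(ticket.get(link, 0.0), need) for link, need in demand.items())


def best_ticket(tickets: List[Ticket], demand: Dict[str, float]) -> Ticket:
    """Return the candidate serving the most demand (first one wins on ties)."""
    return max(tickets, key=lambda t: served_demand(t, demand))


# Two failed IP links compete for the same spare wavelengths, so every
# candidate restores 400 Gbps in total but splits it differently.
demand = {"A-B": 300.0, "A-C": 100.0}
tickets = [
    {"A-B": 400.0, "A-C": 0.0},
    {"A-B": 200.0, "A-C": 200.0},
    {"A-B": 300.0, "A-C": 100.0},
]
chosen = best_ticket(tickets, demand)
```

In this toy setup the third candidate is chosen because it is the only one whose split matches the demand on both links. The point is only that candidates restoring the same total capacity can differ in how well they match the TE objective.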
Insights
  • 50% of fiber cut events last longer than 9 hours, and 10% of them last over a day.
  • Several Tbps of network capacity is lost only because the cut fiber cannot carry light anymore, leaving the related router ports and transponders sitting idle.
  • Contrary to common belief, in most cases, the capacity lost to a fiber cut is only partially restorable, due to limited resources and the wavelength continuity constraint on fibers.
  • Arrow tackles a new challenge posed by partial restoration: Which IP links should be restored and by how much to best match the TE objective?
  • Reconfiguring 14 wavelengths from the cut fiber to surrogate fibers can be done within 8 seconds in our wide-area network testbed.
  • Arrow is a production-ready TE system and can support 2.0x-2.4x more demand without compromising 99.99% availability.
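To illustrate why the wavelength continuity constraint makes restoration partial, here is a minimal sketch under simplifying assumptions: each fiber has the same number of wavelength slots, a restored wavelength must occupy the same slot on every fiber of its surrogate path, and transponders can retune to any free slot. The function names, fiber labels, and slot counts are hypothetical and do not come from the paper.

```python
from typing import Dict, List, Set


def common_free_slots(
    path: List[str], in_use: Dict[str, Set[int]], num_slots: int
) -> Set[int]:
    """Slots that are unoccupied on every fiber along the surrogate path."""
    free = set(range(num_slots))
    for fiber in path:
        free -= in_use.get(fiber, set())
    return free


def restorable_count(
    lost: int, path: List[str], in_use: Dict[str, Set[int]], num_slots: int
) -> int:
    """Number of lost wavelengths that fit on the surrogate path."""
    return min(lost, len(common_free_slots(path, in_use, num_slots)))


# Each surrogate fiber has at least 4 free slots out of 8, yet only slots
# 6 and 7 are free on both, so only 2 of the 4 lost wavelengths fit.
in_use = {"f1": {0, 1, 2, 3}, "f2": {3, 4, 5}}
shared = common_free_slots(["f1", "f2"], in_use, num_slots=8)
restored = restorable_count(4, ["f1", "f2"], in_use, num_slots=8)
```

Counting free slots per fiber would suggest all four wavelengths can be restored. Requiring the same slot end to end is what reduces the answer to two.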
Talk
    Artifact
    Code available (under code release process, coming soon): arrow
    Publication
    ARROW: Restoration-Aware Traffic Engineering
    Z. Zhong, M. Ghobadi, A. Khaddaj, J. Leach, Y. Xia, Y. Zhang
    ACM SIGCOMM 2021
    Paper | Slides | Video | Artifact | MIT News | BibTeX
    @inproceedings{zhong2021arrow,
      title={ARROW: restoration-aware traffic engineering},
      author={Zhong, Zhizhen and Ghobadi, Manya and Khaddaj, Alaa and Leach, Jonathan and Xia, Yiting and Zhang, Ying},
      booktitle={Proceedings of the 2021 ACM SIGCOMM 2021 Conference},
      pages={560--579},
      year={2021}
    }
    Related Publications
    BOW: First Real-World Demonstration of a Bayesian Optimization System for Wavelength Reconfiguration
    Z. Zhong, M. Ghobadi, M. Balandat, S. Katti, A. Kazerouni, J. Leach, M. McKillop, Y. Zhang
    OFC 2021
    Postdeadline Paper
    BOW Project Page | Paper | Slides | Video | Artifact
    @article{bow,
      author = {Zhong, Zhizhen and Ghobadi, Manya and Balandat, Maximilian and Katti, Sanjeevkumar and Kazerouni, Abbas and Leach, Jonathan and McKillop, Mark and Zhang, Ying},
      title = {BOW: First Real-World Demonstration of a Bayesian Optimization System for Wavelength Reconfiguration},
      journal = {Under Review},
      year = {2021}
    }
    Press Coverage
    Contributors