skip to main content
article

Efficient and Reusable Lazy Sampling

Published: 14 May 2024 Publication History
  • Get Citation Alerts
  • Abstract

    Modern analytical engines rely on Approximate Query Processing (AQP) to provide faster response times than the hardware allows for exact query answering. However, existing AQP methods impose steep performance penalties as workload unpredictability increases. While offline AQP relies on predictable workloads to a priori create samples that match the queries, as soon as workload predictability diminishes, returning to existing online AQP methods that create query-specific samples with little reuse across queries results in significantly smaller gains in response times. As a result, existing approaches cannot fully exploit the benefits of sampling under increased unpredictability.
    We propose LAQy, a framework for building, expanding, and merging samples to adapt to the changes in workload predicates. We propose lazy sampling to overcome the unpredictability issues that cause fast-but-specialized samples to be query-specific and design it for a scale-up analytical engine to show the adaptivity and practicality of our framework in a modern system. LAQy speeds up online sampling processing as a function of data access and computation reuse, making sampler placement after expensive operators more practical.

    References

    [1]
    P. K. Agarwal, G. Cormode, Z. Huang, J. M. Phillips, Z. Wei, and K. Yi. Mergeable summaries. In M. Benedikt, M. Kr¨otzsch, and M. Lenzerini, editors, Proceedings of the 31st ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2012, Scottsdale, AZ, USA, May 20--24, 2012, pages 23--34. ACM, 2012.
    [2]
    S. Agarwal, B. Mozafari, A. Panda, H. Milner, S. Madden, and I. Stoica. Blinkdb: Queries with bounded errors and bounded response times on very large data. In Proceedings of the 8th ACM European Conference on Computer Systems, EuroSys '13, page 29--42, New York, NY, USA, 2013. Association for Computing Machinery.
    [3]
    A. Birler, B. Radke, and T. Neumann. Concurrent online sampling for all, for free. In Proceedings of the 16th International Workshop on Data Management on New Hardware, DaMoN '20, New York, NY, USA, 2020. Association for Computing Machinery.
    [4]
    M. T. Chao. A general purpose unequal probability sampling plan. Biometrika, 69(3):653--656, 12 1982.
    [5]
    S. Chaudhuri, G. Das, and U. Srivastava. Effective use of block-level sampling in statistics estimation. In Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data, SIGMOD '04, page 287--298, New York, NY, USA, 2004. Association for Computing Machinery.
    [6]
    S. Chaudhuri, B. Ding, and S. Kandula. Approximate query processing: No silver bullet. In Proceedings of the 2017 ACM International Conference on Management of Data, SIGMOD '17, page 511--519, New York, NY, USA, 2017. Association for Computing Machinery.
    [7]
    P. Chrysogelos, M. Karpathiotakis, R. Appuswamy, and A. Ailamaki. Hetexchange: Encapsulating heterogeneous CPU-GPU parallelism in JIT compiled engines. Proc. VLDB Endow., 12(5):544--556, 2019.
    [8]
    G. Cormode, M. N. Garofalakis, P. J. Haas, and C. Jermaine. Synopses for massive data: Samples, histograms, wavelets, sketches. Found. Trends Databases, 4(1--3):1--294, 2012.
    [9]
    J. a. Gama, I. "Zliobaitundefined, A. Bifet, M. Pechenizkiy, and A. Bouchachia. A survey on concept drift adaptation. ACM Comput. Surv., 46(4), mar 2014.
    [10]
    P. B. Gibbons, V. Poosala, S. Acharya, Y. Bartal, Y. Matias, S. Muthukrishnan, S. Ramaswamy, and T. Suel. Aqua: System and techniques for approximate query answering. Technical report, Technical report, Bell Labs, 1998.
    [11]
    H. Harmouch and F. Naumann. Cardinality estimation: An experimental survey. Proc. VLDB Endow., 11(4):499--512, Dec. 2017.
    [12]
    B. Hilprecht, A. Schmidt, M. Kulessa, A. Molina, K. Kersting, and C. Binnig. Deepdb: Learn from data, not from queries! Proc. VLDB Endow., 13(7):992--1005, Mar. 2020.
    [13]
    S. Igescu, V. Sanca, E. Zapridou, and A. Ailamaki. Improving k-means clustering using speculation. In R. Bordawekar, C. Cappiello, V. Efthymiou, L. Ehrlinger, V. Gadepally, S. Galhotra, S. Geisler, S. Groppe, L. Gruenwald, A. Y. Halevy, H. Harmouch, O. Hassanzadeh, I. F. Ilyas, E. Jim´enez-Ruiz, S. Krishnan, T. Lahiri, G. Li, J. Lu, W. Mauerer, U. F. Minhas, F. Naumann, M. T. ¨Ozsu, E. K. Rezig, K. Srinivas, M. Stonebraker, S. R. Valluri, M. Vidal, H. Wang, J. Wang, Y. Wu, X. Xue, M. Za¨?t, and K. Zeng, editors, Joint Proceedings of Workshops at the 49th International Conference on Very Large Data Bases (VLDB 2023), Vancouver, Canada, August 28 - September 1, 2023, volume 3462 of CEUR Workshop Proceedings. CEUR-WS.org, 2023.
    [14]
    S. Kandula, K. Lee, S. Chaudhuri, and M. Friedman. Experiences with approximating queries in microsoft's production big-data clusters. Proc. VLDB Endow., 12(12):2131--2142, Aug. 2019.
    [15]
    S. Kandula, A. Shanbhag, A. Vitorovic, M. Olma, R. Grandl, S. Chaudhuri, and B. Ding. Quickr: Lazily approximating complex adhoc queries in bigdata clusters. In Proceedings of the 2016 International Conference on Management of Data, SIGMOD '16, page 631--646, New York, NY, USA, 2016. Association for Computing Machinery.
    [16]
    M. Karpathiotakis, I. Alagiannis, and A. Ailamaki. Fast queries over heterogeneous data through engine customization. Proc. VLDB Endow., 9(12):972--983, 2016.
    [17]
    A. Kim, E. Blais, A. Parameswaran, P. Indyk, S. Madden, and R. Rubinfeld. Rapid sampling for visualizations with ordering guarantees. Proc. VLDB Endow., 8(5):521--532, Jan. 2015.
    [18]
    T. Kraska. Approximate query processing for interactive data science. In Proceedings of the 2017 ACM International Conference on Management of Data, SIGMOD '17, page 525, New York, NY, USA, 2017. Association for Computing Machinery.
    [19]
    K. Li and G. Li. Approximate query processing: What is new and where to go? - A survey on approximate query processing. Data Sci. Eng., 3(4):379--397, 2018.
    [20]
    Q. Ma and P. Triantafillou. Dbest: Revisiting approximate query processing engines with machine learning models. In Proceedings of the 2019 International Conference on Management of Data, SIGMOD '19, page 1553--1570, New York, NY, USA, 2019. Association for Computing Machinery.
    [21]
    P. Menon, A. Ngom, T. C. Mowry, A. Pavlo, and L. Ma. Permutable compiled queries: Dynamically adapting compiled queries without recompiling. Proc. VLDB Endow., 14(2):101--113, 2020.
    [22]
    R. B. Miller. Response time in man-computer conversational transactions. In American Federation of Information Processing Societies: Proceedings of the AFIPS '68 Fall Joint Computer Conference, December 9--11, 1968, San Francisco, California, USA - Part I, volume 33 of AFIPS Conference Proceedings, pages 267--277. AFIPS / ACM / Thomson Book Company, Washington D.C., 1968.
    [23]
    T. Neumann. Efficiently compiling efficient query plans for modern hardware. Proc. VLDB Endow., 4(9):539--550, June 2011.
    [24]
    M. Olma, O. Papapetrou, R. Appuswamy, and A. Ailamaki. Taster: Self-tuning, elastic and online approximate query processing. In 35th IEEE
    [25]
    P. O'Neil, E. O'Neil, X. Chen, and S. Revilak. The Star Schema Benchmark and Augmented Fact Table Indexing, page 237--252. Springer-Verlag, Berlin, Heidelberg, 2009.
    [26]
    Y. Park, B. Mozafari, J. Sorenson, and J. Wang. Verdictdb: Universalizing approximate query processing. In Proceedings of the 2018 International Conference on Management of Data, SIGMOD '18, page 1461--1476, New York, NY, USA, 2018. Association for Computing Machinery.
    [27]
    J. Peng, D. Zhang, J. Wang, and J. Pei. Aqp++: Connecting approximate query processing with aggregate precomputation for interactive analytics. In Proceedings of the 2018 International Conference on Management of Data, SIGMOD '18, page 1477--1492, New York, NY, USA, 2018. Association for Computing Machinery.
    [28]
    A. Raza, P. Chrysogelos, A. G. Anadiotis, and A. Ailamaki. Adaptive HTAP through elastic resource scheduling. In D. Maier, R. Pottinger, A. Doan, W. Tan, A. Alawini, and H. Q. Ngo, editors, Proceedings of the 2020 International Conference on Management of Data, SIGMOD Conference 2020, online conference [Portland, OR, USA], June 14--19, 2020, pages 2043--2054. ACM, 2020.
    [29]
    V. Sanca and A. Ailamaki. Sampling-based AQP in modern analytical engines. In DaMoN, pages 4:1--4:8. ACM, 2022.
    [30]
    B. Shneiderman. The eyes have it: A task by data type taxonomy for information visualizations. In Proceedings of the 1996 IEEE Symposium on Visual Languages, VL '96, page 336, USA, 1996. IEEE Computer Society.
    [31]
    P. Sioulas, V. Sanca, I. Mytilinis, and A. Ailamaki. Accelerating complex analytics using speculation. In CIDR, 2021.
    [32]
    A. Tahmasbi, E. Jothimurugesan, S. Tirthapura, and P. B. Gibbons. Driftsurf: Stable-state / reactive-state learning under concept drift. In M. Meila and T. Zhang, editors, Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18--24 July 2021, Virtual Event, volume 139 of Proceedings of Machine Learning Research, pages 10054--10064. PMLR, 2021.
    [33]
    C. Wyman. Ray Tracing Gems II: Next Generation Real-Time Rendering with DXR, Vulkan, and OptiX, chapter 22, Weighted Reservoir Sampling: Randomly Sampling Streams, pages 345--349. Apress, Berkeley, CA, 2021.

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image ACM SIGMOD Record
    ACM SIGMOD Record  Volume 53, Issue 1
    March 2024
    90 pages
    ISSN:0163-5808
    DOI:10.1145/3665252
    Issue’s Table of Contents
    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 14 May 2024
    Published in SIGMOD Volume 53, Issue 1

    Check for updates

    Qualifiers

    • Article

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • 0
      Total Citations
    • 28
      Total Downloads
    • Downloads (Last 12 months)28
    • Downloads (Last 6 weeks)17

    Other Metrics

    Citations

    View Options

    Get Access

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media