diff --git a/_config.yml b/_config.yml index 77a681c..79d9c92 100644 --- a/_config.yml +++ b/_config.yml @@ -74,7 +74,7 @@ defaults: type: posts values: layout: single - author_profile: false + author_profile: true read_time: true comments: true share: true diff --git a/_data/authors.yml b/_data/authors.yml new file mode 100644 index 0000000..721be7f --- /dev/null +++ b/_data/authors.yml @@ -0,0 +1,23 @@ +Zhenglin Li: + name : "Zhenglin Li" + avatar : "/assets/images/2023-gsoc-contributors-Li.jpg" + links: + - label: "Email" + icon: "fas fa-fw fa-envelope-square" + url: "mailto:lizhenglin2001@gmail.com" + - label: "Website" + icon: "fas fa-fw fa-link" + url: "https://zhenglinli.me/" + - label: "LinkedIn" + icon: "fab fa-fw fa-linkedin" + url: "https://www.linkedin.com/in/zhenglin-li" +Yutan Yang: + name : "Yutan Yang" + avatar : "/assets/imgs-yutan-midterm/yutan-avatar.jpg" + links: + - label: "Email" + icon: "fas fa-fw fa-envelope-square" + url: "mailto:colin.young.taro@gmail.com" + - label: "GitHub page" + icon: "fas fa-fw fa-link" + url: "https://github.com/ColinYoungTaro" \ No newline at end of file diff --git a/_posts/2023-06-04-gsoc-sqlancer.md b/_posts/2023-06-04-gsoc-sqlancer.md index ec8f775..dd3ac80 100644 --- a/_posts/2023-06-04-gsoc-sqlancer.md +++ b/_posts/2023-06-04-gsoc-sqlancer.md @@ -9,7 +9,7 @@ tags: ![alt]({{ site.url }}{{ site.baseurl }}/assets/images/2023-gsoc-contributors.jpg) -For the first time, SQLancer is participating as a [Google Summer of Code (GSoC)](https://summerofcode.withgoogle.com/programs/2023/organizations/sqlancer) organization. We are excited that we will be working with two contributors: [Zhenglin Li](https://github.com/HT-Tomas) and [Yutan Yang](https://github.com/ColinYoungTaro). Find out more about them below! +For the first time, SQLancer is participating as a [Google Summer of Code (GSoC)](https://summerofcode.withgoogle.com/programs/2023/organizations/sqlancer) organization. 
We are excited that we will be working with two contributors: [Zhenglin Li](https://github.com/ZhengLin-Li) and [Yutan Yang](https://github.com/ColinYoungTaro). Find out more about them below! # Yutan Yang diff --git a/_posts/2023-07-22-gsoc-sqlancer-final.md b/_posts/2023-07-22-gsoc-sqlancer-final.md new file mode 100644 index 0000000..0deaf0a --- /dev/null +++ b/_posts/2023-07-22-gsoc-sqlancer-final.md @@ -0,0 +1,81 @@ +--- +title: 'GSoC 2023: Final Report of test-case reduction' +date: 2023-8-30 +author: Yutan Yang +categories: + - blog +tags: + - gsoc +--- + +## Overview + +SQLancer generates a large number of statements, but not all of them are relevant to the bug. To automatically reduce the test cases, I implemented two reducers in SQLancer, the statement reducer and the AST-based reducer. + + +## Statement Reducer + +The statement reducer utilizes the delta-debugging technique, which is designed for efficient test case reduction, to reduce the statements. The set of bug-inducing test cases is divided into subsets. Each subset is then individually removed. If the bug is still triggered without a specific subset, this subset is considered to be irrelevant and thus can be eliminated. Conversely, if removing any subset does not reproduce the bug, the reducer will divide test cases into more subsets and repeat the process. This iterative process continues until the set cannot be further divided into more subsets. + +For more details, you can refer to the paper: [Simplifying and Isolating Failure-Inducing Input](https://www.cs.purdue.edu/homes/xyzhang/fall07/Papers/delta-debugging.pdf) or [this video tutorial](https://youtu.be/lGe2-y1xibY). + +Delta-debugging offers the advantage of potentially removing a significant number of statements in one turn, making it particularly efficient when the bug-inducing statements are sparsely distributed. + +Using the statement reducer, SQLancer reduces the set of statements to a minimal subset that reproduces the bug. 
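A minimal sketch of the delta-debugging loop described in the statement reducer section, written in Python for brevity (SQLancer itself is a Java project, and this is not its actual implementation). The names `reduce_statements` and `still_fails` are hypothetical; `still_fails` stands in for re-running the candidate statements against the DBMS and checking whether the bug still appears.

```python
from typing import Callable, List, Sequence


def reduce_statements(statements: Sequence[str],
                      still_fails: Callable[[List[str]], bool]) -> List[str]:
    """Drop chunks of statements as long as the bug still reproduces."""
    current = list(statements)
    n = 2  # number of subsets the current test case is divided into
    while len(current) >= 2:
        chunk = -(-len(current) // n)  # ceiling division: size of each subset
        removed = False
        for start in range(0, len(current), chunk):
            # Candidate = current test case without one subset.
            candidate = current[:start] + current[start + chunk:]
            if candidate and still_fails(candidate):
                current = candidate      # the subset was irrelevant, drop it
                n = max(n - 1, 2)
                removed = True
                break
        if not removed:
            if n >= len(current):
                break                    # every single statement is needed
            n = min(n * 2, len(current))  # split into more, smaller subsets
    return current
```

With a predicate that reports a failure only when two particular statements are both present, the sketch keeps exactly those two statements and preserves their order.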
+ +## AST-Based Reducer + +The AST-based reducer can shorten a statement by applying AST-level transformations. They are mostly implemented using [JSQLParser](https://github.com/JSQLParser/JSqlParser), an RDBMS-agnostic SQL statement parser that can translate SQL statements into a traversable hierarchy of Java classes. JSQLParser provides support for the SQL standard as well as major SQL dialects. The AST-based reducer works for any SQL dialect that can be parsed by this tool. + +Currently, the AST-based transformations include: + ++ Remove union selects. e.g., + + `SELECT 1 UNION SELECT 2` -> `SELECT 1` + ++ Remove irrelevant clauses. e.g., + + `SELECT * FROM t OFFSET 20 LIMIT 5` -> `SELECT * FROM t` + ++ Remove list elements. e.g., + + `SELECT a, b, c FROM t` -> `SELECT a FROM t` + ++ Remove rows of an insert statement. e.g., + + `INSERT INTO t VALUES (1, 2), (3, 4)` -> `INSERT INTO t VALUES (1, 2)` + ++ Replace complicated expressions with their subexpressions. e.g., + + `(a+b)+c` -> `c` + ++ Simplify constant values. e.g., + + `3.27842156` -> `3.278` + +The AST-based reducer is designed for extensibility. It walks through the list of transformations and applies them to statements until a fixed point is reached. Adding new transformations is straightforward. Simply create a new transformation and include it in the list. This flexibility allows for easy customization and expansion of the reducer's functionality. It is also possible to support transformations that do not depend on JSQLParser. + + + +## Framework for Testing Reducers + +I designed a virtual database engine to facilitate the testing of reducers. Instead of executing the statements, this engine records them for analysis. One of its notable features is the ability to customize the conditions that trigger a bug. For instance, the testing framework allows specifying an interestingness check that tests whether certain words are part of the reduced SQL test case. 
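The idea of a recording engine with a word-based interestingness check can be sketched as follows. This is an illustrative Python sketch, not SQLancer's Java test framework; `RecordingEngine` and `make_word_check` are hypothetical names introduced only for this example.

```python
from typing import Callable, Iterable, List


class RecordingEngine:
    """Hypothetical stand-in for a DBMS: stores statements instead of running them."""

    def __init__(self) -> None:
        self.recorded: List[str] = []

    def execute(self, statement: str) -> None:
        # Nothing is sent to a real database; the statement is only recorded.
        self.recorded.append(statement)


def make_word_check(required_words: Iterable[str]) -> Callable[[List[str]], bool]:
    """Build a check that treats a test case as bug-triggering
    when every required word occurs somewhere in its statements."""
    words = [w.upper() for w in required_words]

    def is_interesting(statements: List[str]) -> bool:
        engine = RecordingEngine()
        for statement in statements:
            engine.execute(statement)
        text = " ".join(engine.recorded).upper()
        return all(word in text for word in words)

    return is_interesting
```

A reducer under test can then be handed such a check in place of a real bug oracle, and the test asserts that the reduced output still contains the chosen words and nothing unnecessary.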
+ +This approach provides a testing environment and allows for thorough evaluation of the reducers' performance. It offers convenience and flexibility in refining and polishing the functionality of the reducers without impacting real databases. + +## Reduction logs + +If test-case reduction is enabled, each time the reducer performs a reduction step successfully, it prints the reduced statements to the log file, overwriting the previous ones. + +The log files are stored at paths of the following form: `logs//reduce/-reduce.log`. For instance, if the tested DBMS is SQLite3 and the current database is named database0, the log file will be located at `logs/sqlite3/reduce/database0-reduce.log`. + +## Usage +Test-case reduction is disabled by default. The reducers only work for DBMSs that have implemented the `Reproducer` class. + +The statement reducer can be enabled by passing `--use-reducer` when starting SQLancer. If you wish to further shorten each statement, you need to additionally pass the `--reduce-ast` parameter so that the AST-based reduction is applied. + +Note: `--reduce-ast` requires the `--use-reducer` option to be enabled as well. + +There are also options to set a timeout (in seconds) and a maximum number of reduction steps for both the statement reducer and the AST-based reducer. + +For more details, you can refer to this [doc](https://github.com/sqlancer/sqlancer/blob/7804a3adec0962ad6d24687c42ec473aa49669fe/docs/testCaseReduction.md). \ No newline at end of file diff --git a/_posts/2023-07-22-gsoc-sqlancer-midterm-Yutan.md b/_posts/2023-07-22-gsoc-sqlancer-midterm-Yutan.md new file mode 100644 index 0000000..46753b0 --- /dev/null +++ b/_posts/2023-07-22-gsoc-sqlancer-midterm-Yutan.md @@ -0,0 +1,120 @@ +--- +title: 'GSoC 2023: Midterm Report of test case reduction' +date: 2023-07-30 +author: Yutan Yang +categories: + - blog +tags: + - gsoc +--- +# Overview + +SQLancer generates a large number of statements, but not all of them are relevant to the bug. 
To automatically reduce the test cases, I implemented two reducers in SQLancer, the statement reducer and the AST-based reducer. +
+ [Figure: randomly generated statements (image alt text: logs)]
+ +# Statement Reducer + +The statement reducer utilizes the delta-debugging technique, which is designed for efficient test case reduction, to reduce the statements. The set of bug-inducing test cases is divided into subsets. Each subset is then individually removed. If the bug is still triggered without a specific subset, this subset is considered to be irrelevant and thus can be eliminated. Conversely, if the bug no longer reproduces after removing any one of the subsets, the reducer divides the test case into more subsets and repeats the process. This iterative process continues until the set cannot be further divided into more subsets. + +For more details of delta-debugging, you can refer to the paper: [Simplifying and Isolating Failure-Inducing Input](https://www.cs.purdue.edu/homes/xyzhang/fall07/Papers/delta-debugging.pdf). This [video tutorial](https://youtu.be/lGe2-y1xibY) also helps. + +Delta-debugging offers the advantage of potentially removing a significant number of statements in one turn, making it particularly efficient when the bug-inducing statements are sparsely distributed. Using the statement reducer, SQLancer can reduce the set of statements to a minimal subset that reproduces the bug. + +An earlier version of sqlite3 (version 3.28.0) was used to test the statement reducer. An illustrative example is provided below. Given a test case containing thousands of statements, the reducer was able to remove the irrelevant ones and retain only four bug-inducing statements. + +
+ [Figure: A bug inducing test case for sqlite3 version 3.28.0 (image alt text: logs)]
+ +```sql +-- reduced queries: +CREATE VIRTUAL TABLE rt0 USING rtree(c0, c1, c2); +CREATE VIRTUAL TABLE vt1 USING fts4(c0, compress=likely, uncompress=likely); +INSERT OR FAIL INTO vt1 VALUES ('g&'), (0.16215918433687637), ('1帬trBWP?'); +INSERT OR IGNORE INTO rt0(c1) VALUES (x''); +``` + + + +# AST-Based Reducer + +After the statement reducing pass, the rest of the statements could still be long and complex. The AST-based reducer focuses on eliminating redundant or unnecessary parts of queries. +
+ [Figure: a complicated statement (image alt text: unreduced)]
+ + +This reducer operates by parsing the SQL string into an abstract syntax tree (AST) using [JSQLParser](https://github.com/JSQLParser/JSqlParser), which supports the SQL standard as well as major RDBMS, and recursively visiting each AST node to apply transformations. These transformations include removing unnecessary clauses, irrelevant elements in a list, and replacing complex expressions with simpler ones. The reduced queries are then executed, and if the bug is still triggered, the transformation is retained. +
+ [Figure: result of AST-based reduction (image alt text: ASTreduced)]
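The retain-if-still-failing loop described for the AST-based reducer can be sketched on a toy expression tree. This is a simplified Python illustration under stated assumptions: it does not use JSQLParser, the tuple-based `Expr` representation and all function names are hypothetical, and only one transformation (replacing a binary expression with one of its operands) is shown.

```python
from typing import Callable, Iterator, Set, Tuple, Union

# A leaf is a column or constant name; an inner node is (operator, left, right).
Expr = Union[str, Tuple[str, "Expr", "Expr"]]


def leaves(expr: Expr) -> Set[str]:
    """Collect the leaf names that occur in an expression."""
    if isinstance(expr, str):
        return {expr}
    return leaves(expr[1]) | leaves(expr[2])


def candidates(expr: Expr) -> Iterator[Expr]:
    """Yield each expression obtained by replacing one binary node with one operand."""
    if isinstance(expr, str):
        return
    op, left, right = expr
    yield left
    yield right
    for new_left in candidates(left):
        yield (op, new_left, right)
    for new_right in candidates(right):
        yield (op, left, new_right)


def reduce_expr(expr: Expr, still_fails: Callable[[Expr], bool]) -> Expr:
    """Keep a transformation only if the bug still triggers; stop at a fixed point."""
    changed = True
    while changed:
        changed = False
        for candidate in candidates(expr):
            if still_fails(candidate):
                expr = candidate  # transformation retained
                changed = True
                break
    return expr
```

For the expression `(a+b)+c` and a bug that only depends on `c`, the sketch ends with the single leaf `c`. Each accepted candidate has fewer nodes than its predecessor, so the loop terminates.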
+ +Currently, the reducer only performs generic transformations that work for any SQL dialect. However, different dialects may require special handling for unique features. Thus, expanding the capabilities of this reducer to provide support for various SQL dialects will be part of the future work. + +# Reducers Testing + +A virtual database engine is designed to test the functionality of reducers. Instead of executing the input statements, the engine records them. The bug-inducing condition can be customized to check the recorded statements, making it easy and convenient to observe whether the reducers work as expected. For example, we could define that the coexistence of certain lines results in a bug and see if the reducer isolates them. +
+ [Figure: test statement reduction via virtual engine (image alt text: test-statement-reducer)]
+We also apply the reducers to older versions of DBMSs to observe their capabilities in the real world. + +# Usage + +The statement reducer is now available. The AST-based reducer, however, is still experimental and has not been integrated into the project. + +This feature is disabled by default. To enable the reducer (currently only the statement reducer), add the `--use-reducer` option when starting SQLancer. It also supports limiting the reduction time and the number of reduction steps by using `statement-reducer-max-steps=` and `statement-reducer-max-time=`. + +# Challenges of Reducing + ++ SQL dialects + + The dialects vary among DBMSs. At present, the AST-based reducer only offers support for generic transformations that can be applied to any dialect. However, accommodating distinct SQL syntaxes will be included in the future development plans. + ++ Enhancing extensibility for AST-based reducer + + Rules of transformation are defined to shorten an SQL statement. Ensuring the ease of defining and integrating new rules of transformation would be highly advantageous as well as challenging. diff --git a/_posts/2023-07-22-gsoc-sqlancer-midterm-zhenglin.md b/_posts/2023-07-22-gsoc-sqlancer-midterm-zhenglin.md new file mode 100644 index 0000000..cb526d3 --- /dev/null +++ b/_posts/2023-07-22-gsoc-sqlancer-midterm-zhenglin.md @@ -0,0 +1,88 @@ +--- +title: 'GSoC 2023: Midterm Report on Support of StoneDB' +date: 2023-06-04 +author: Zhenglin Li +categories: + - blog +tags: + - gsoc +--- + +# Support of StoneDB Midterm Report + +SQLancer is an open-source tool for testing the correctness of SQL database systems and supports close to 20 database systems. The goal of this project is to add support for StoneDB to SQLancer and test StoneDB to find potential bugs. + +StoneDB is an open-source hybrid transaction/analytical processing (HTAP) database designed and developed by StoneAtom based on the MySQL kernel. 
It provides features such as high efficiency and real-time analytics, offering a one-stop solution to process online transaction processing (OLTP), online analytical processing (OLAP), and HTAP workloads. + +## Things Achieved + +See the detailed description of supported syntax here: [Functions Supported by SQLancer for StoneDB](https://docs.google.com/document/d/12OpiDYs_Civor-saKZFmZPZd5ElVJAc9RDpDIwikh9Y/edit?usp=sharing) + +See the detailed description of bugs found here: [Bugs Found in StoneDB by SQLancer](https://docs.google.com/document/d/1N-oUGVATV0l6tG87uOtPNmfLS7g_fuo7HIckFobD-Yo/edit?usp=sharing) + +## Encountered Challenges and Solutions + +This section will cover challenges encountered and the way I addressed them. + +### Choose which StoneDB version to support + +During the **initial stages** of our project, our plan was to support the **latest version** of StoneDB, which is **version 8.0**. + +However, we encountered several challenges that hindered our ability to work with the 8.0 version. Firstly, there was no available Docker image to facilitate the easy execution of the 8.0 version. Additionally, building StoneDB from source code presented difficulties related to missing packages and incompatible versions within the environment. + +Considering these obstacles, we made the decision to **prioritize the support for StoneDB version 5.7 initially**. And by the midterm evaluation, we have successfully implemented the basic support of 5.7 version. + +Looking ahead, as we **plan to extend our support to the 8.0 version**, we have identified **two potential approaches**. + +1. If the **differences** between the 5.7 and 8.0 versions are **minimal**, we can enhance the existing implementation by introducing a **parameter flag** that handles the version-specific variations. This approach allows us to maintain a single unified implementation that can accommodate both versions. +2. 
If the **differences** between the versions are **large**, we need to **implement separate code paths** for each version, treating them as distinct entities. + +### Which generator to use: typed or untyped + +Looking at the three files: + +- [sqlancer/common/gen/ExpressionGenerator.java](https://github.com/sqlancer/sqlancer/blob/main/src/sqlancer/common/gen/ExpressionGenerator.java) +- [sqlancer/common/gen/TypedExpressionGenerator.java](https://github.com/sqlancer/sqlancer/blob/main/src/sqlancer/common/gen/TypedExpressionGenerator.java) +- [sqlancer/common/gen/UntypedExpressionGenerator.java](https://github.com/sqlancer/sqlancer/blob/main/src/sqlancer/common/gen/UntypedExpressionGenerator.java) + +The `ExpressionGenerator` is an interface, and `TypedExpressionGenerator` and `UntypedExpressionGenerator` are abstract classes that implement the `ExpressionGenerator` interface. + +When implementing the concrete generator, we have to decide which generator to inherit from, the `TypedExpressionGenerator` or the `UntypedExpressionGenerator`. + +A **typed generator** is a generator which is **associated with a specific data type or set of data types**. One characteristic is that it allows for generating values of the type that is expected in a specific context. On the other hand, an **untyped generator** **does not** have a specific data type associated with it. It can yield values of any type. + +The choice between using a typed or untyped generator **depends on the requirements of the DBMS under test**. Typed generators provide the advantage of generating expected types when this is important. Untyped generators, on the other hand, offer more **flexibility** and can be useful in scenarios where the type of yielded values may vary or is not known in advance. + +Ultimately, I decided to use the untyped generator because it can yield values of any type, allowing us to test the support of unexpected types of a DBMS. 
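The difference between the two generator styles can be illustrated with a small sketch. It is written in Python for brevity and does not reflect the actual `TypedExpressionGenerator` or `UntypedExpressionGenerator` classes in SQLancer (which are Java); every function name and the two-type universe (`INT`, `TEXT`) are assumptions made for illustration.

```python
import random

TYPES = ("INT", "TEXT")


def random_constant(rng: random.Random, sql_type: str) -> str:
    """Return a literal of the requested type."""
    if sql_type == "INT":
        return str(rng.randint(-100, 100))
    return "'" + rng.choice(["a", "b", ""]) + "'"


def typed_expression(rng: random.Random, sql_type: str, depth: int = 2) -> str:
    """Typed style: every subexpression is built to have the expected type."""
    if depth == 0:
        return random_constant(rng, sql_type)
    left = typed_expression(rng, sql_type, depth - 1)
    right = typed_expression(rng, sql_type, depth - 1)
    if sql_type == "INT":
        return f"({left} + {right})"
    return f"CONCAT({left}, {right})"


def untyped_expression(rng: random.Random, depth: int = 2) -> str:
    """Untyped style: operand types are not tracked, so mixed types can appear."""
    if depth == 0:
        return random_constant(rng, rng.choice(TYPES))
    left = untyped_expression(rng, depth - 1)
    right = untyped_expression(rng, depth - 1)
    if rng.choice([True, False]):
        return f"({left} + {right})"
    return f"CONCAT({left}, {right})"
```

The typed variant never places a string literal under `+`, whereas the untyped variant may, which is what exercises a DBMS's handling of unexpected operand types.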
+ +### Read and understand the code of the implementation of other DBMS in SQLancer + +During the initial phase of our project, I delved into the codebase of various DBMS implementations within SQLancer to gain a comprehensive understanding. + +The first challenge I encountered is due to the limited comments within the code. Consequently, I had to rely on studying the source code to comprehend the design and functionality of classes, methods, and functions. The good news is though the implementation details may differ from one database to another, there are still some common rules to follow. So, I read the code of many other DBMS and identified the common parts and started my implementation. + +## Interaction With StoneDB Community + +This section will cover our interaction with the StoneDB community. + +We actively engaged with the StoneDB community to seek assistance, share our findings, and collaborate on improving the system. Our interaction primarily took place through the **Slack channel** and **GitHub platform**. + +In addition to Slack, we also utilized the issue and discussion sections on GitHub to communicate with the StoneDB community. All questions or comments got replies. For example, + +- They answered our question here: [stonedb-8.0-v1.0.1-beta](https://github.com/orgs/stoneatom/discussions/1849#discussioncomment-6138954) +- They fixed [the docker size bug](https://github.com/stoneatom/stonedb/issues/1923) we found previously +- They are trying to fix the [bad dpn index when deleting rows bug](https://github.com/stoneatom/stonedb/issues/1933) +- They are trying to fix the [crash: StoneDB crash when executing the command (SELECT, HAING, GROUP BY, IS NULL)](https://github.com/stoneatom/stonedb/issues/1941) +- They are trying to fix the [bug: query result not correct](https://github.com/stoneatom/stonedb/issues/1942) + +## Future Work + +This section will cover opportunities for potential future changes. + +1. 
Test StoneDB, add more expected errors, and report potential bugs. + +2. Support more statement categories and syntax. For example, referring to the docs of MySQL, there are some optional arguments that we may not currently support. We can support them in the future. + +3. Another example is that, we can support more functions, refer to: [https://stonedb.io/docs/SQL-reference/functions/advanced-functions](https://stonedb.io/docs/SQL-reference/functions/advanced-functions) + +4. Refine the code implementation. For example, we can discuss if we can reuse some code that is common to all DBMS implementations. diff --git a/assets/images/2023-gsoc-contributors-Zhenglin.jpg b/assets/images/2023-gsoc-contributors-Zhenglin.jpg new file mode 100644 index 0000000..c36d392 Binary files /dev/null and b/assets/images/2023-gsoc-contributors-Zhenglin.jpg differ diff --git a/assets/imgs-yutan-midterm/ASTreduced.png b/assets/imgs-yutan-midterm/ASTreduced.png new file mode 100644 index 0000000..5c935d5 Binary files /dev/null and b/assets/imgs-yutan-midterm/ASTreduced.png differ diff --git a/assets/imgs-yutan-midterm/logs.png b/assets/imgs-yutan-midterm/logs.png new file mode 100644 index 0000000..822a63f Binary files /dev/null and b/assets/imgs-yutan-midterm/logs.png differ diff --git a/assets/imgs-yutan-midterm/sqlite3-original.png b/assets/imgs-yutan-midterm/sqlite3-original.png new file mode 100644 index 0000000..1f51a4d Binary files /dev/null and b/assets/imgs-yutan-midterm/sqlite3-original.png differ diff --git a/assets/imgs-yutan-midterm/test-statement-reducer.png b/assets/imgs-yutan-midterm/test-statement-reducer.png new file mode 100644 index 0000000..b824b98 Binary files /dev/null and b/assets/imgs-yutan-midterm/test-statement-reducer.png differ diff --git a/assets/imgs-yutan-midterm/unreduced.png b/assets/imgs-yutan-midterm/unreduced.png new file mode 100644 index 0000000..e33b371 Binary files /dev/null and b/assets/imgs-yutan-midterm/unreduced.png differ diff --git 
a/assets/imgs-yutan-midterm/yutan-avatar.jpg b/assets/imgs-yutan-midterm/yutan-avatar.jpg new file mode 100644 index 0000000..606f06b Binary files /dev/null and b/assets/imgs-yutan-midterm/yutan-avatar.jpg differ