Project

General

Profile

Actions

MFS4UDB

MFS4UDB (Minimal Failing Subqueries 4 Uncertain DataBases) deals with the empty-answer problem. A large number of applications manage uncertain data and usually, users expect high-quality results when they pose queries with strict conditions over these data. However, their queries may be failing as they may not be clear about the contents of the database (i.e., they may return no result or results that do not satisfy the expected degree of certainty).

In this project, we deal with this problem in the field of uncertain databases, by proposing efficient approaches that identify a set of subqueries called Minimal Failing Subqueries (MFSs) and maXimal Succeeding Subqueries (XSSs). We propose two new approaches: MBS and MDMB to compute such cooperative answers and compare them to some related work alternatives.

Software requirements

  • Java version >= 8.
  • PostgreSQL >= 9.x.
  • Linux OS (if you want to generate dataset yourself).
  • Sparse Hypergraph Dualization Algorithm SHD (for the use of MDMB approach). Download here.

Download

MFS4UDB can be downloaded here.

The archive file (mfs4udb.zip) is composed of:

  • mfs4udb.jar: a jar file that contains both the MFS4UDB binary code and all required dependencies (i.e. PostgreSQL JDBC driver),
  • mfs4udb-dataset.sql: a ready to use dataset with 100 rows,
  • install-randdataset.sh (optional): a bash script to install Random dataset generator for SKYLINE operator evaluation. Find it here,
  • mfs4udb-generate.sh (optional): a bash script to generate a new dataset,
  • mfs4udb-realdataset.sh (optional): a SQL script for inserting a real dataset.

Prepare and insert dataset into PostgreSQL

You can start the execution by using the ready to use dataset (100 rows) or by generating a new dataset.

From a ready to use dataset

  • Import mfs4udb-dataset.sql dump into PostgreSQL database. Note: $USER_POSTGRES must be replaced by your PostgreSQL user.
$ psql -U $USER_POSTGRES postgres < mfs4udb-dataset.sql

From scratch

  • Install randdataset tool
$ sudo ./install-randdataset.sh
  • Create dataset by using the following command (100 is the number of rows generated, $USER_POSTGRES is your PostgreSQL user and $PASSWORD_POSTGRES your PostgreSQL password).
$ ./randdataset-generate.sh 100 $USER_POSTGRES $PASSWORD_POSTGRES

From a real dataset

For reproducibility of the experiments, we supply three datasets (Weather, NBA and House) obtained from the SkyBench GitHub repository.

  • Download the datasets
$ wget https://raw.githubusercontent.com/sean-chester/SkyBench/master/workloads/elv_weather-U-15-566268.csv
$ wget https://raw.githubusercontent.com/sean-chester/SkyBench/master/workloads/house-U-6-127931.csv
$ wget https://raw.githubusercontent.com/sean-chester/SkyBench/master/workloads/nba-U-8-17264.csv
  • Import mfs4udb-realdataset.sql SQL script into PostgreSQL database. Note: $USER_POSTGRES must be replaced by your PostgreSQL user.
$ psql -U $USER_POSTGRES postgres < mfs4udb-realdataset.sql

Execute the algorithms

Step 1: classpath Initialization instructions

  • Create a Java project from your favorite IDE.
  • Copy mfs4udb.jar file into the root of your Java project.
  • Add mfs4udb.jar file to the classpath.

Step 2: create class instructions

  • Create a Java class and copy the following content.
public class MFS4UDBSample
  public static void main(String[] args) throws Exception {
    Class.forName("org.postgresql.Driver");
    Connection c = DriverManager.getConnection("jdbc:postgresql://localhost:5434/postgres", $USER_POSTGRES, $PASSWORD_POSTGRES);

    char[] chars = { 'a', 'c', 'i' };
    for (int i = 0; i < chars.length; i++) {
      BufferedWriter fichier = new BufferedWriter(
      new FileWriter("exp1-" + chars[i] + ".csv"));
      Query q = new Query("P1 < 0.1 AND P2 < 0.1 AND P3 < 0.1 AND P4 < 0.1", "lasttab" + chars[i], c);
      List<Query> foundMFS = q.$MFS4UDB_ALGO
      fichier.write(foundMFS.size() + "\n");
      fichier.close();
    }
  }
}
  • Adapt the content to replace $USER_POSTGRES by your PostgreSQL user and $PASSWORD_POSTGRES by your PostgreSQL password.
  • For $MFS4UDB_ALGO select the algorithm that you want to run (degree is the expected degree of certainty and useMatrix specifies whether the matrix optimization should be used):
    • getAllMFSWithDFS(double degree, boolean useMatrix): a depth-first search algorithm for traversing the subquery lattice for uncertain databases.
    • getAllMFSWithLBA(double degree, boolean useMatrix): the algorithm published in [1] adapted for uncertain databases.
    • getAllMFSWithMCS(double degree, boolean useMatrix): the algorithm published in [2] adapted for uncertain databases.
    • getAllMFSWithMBS(double degree): our proposed algorithm MBS.
    • getAllMFSWithMDMB(double degree, int indice (default 1), String SHD_PATH): our proposed algorithm MDMB. SHD_PATH is the filesystem path where SHD tool is installed. Please refer the Software Requirements section to install and configure SHD tool.

[1] Fokou, G., Jean, S., Hadjali, A., Baron, M.: Handling Failing RDF Queries: From Diagnosis to Relaxation. KAIS 2016.
[2] McSherry, D.: Incremental Relaxation of Unsuccessful Queries. In: Advances in Case-Based Reasoning. 2004 131–148.

  • You can change the query. Default value is: P1 < 0.1 AND P2 < 0.1 AND P3 < 0.1 AND P4 < 0.1

Step 3: results

  • All results are supplied into three files: exp1-a.csv, exp1-c.csv and exp1-i.csv.

Publications

Historic Contributors

Code Analysis

  • Lines of Code: 1500
  • Programming Languages: Java and shell Bash

Updated by Mickael BARON about 3 years ago · 73 revisions