The topics we particularly focus on under the “Theoretical Foundations” section include, but are not limited to:
Lossless Data Compression
Modeling: Modeling how the target data is generated, that is, defining the structure of the data and identifying the redundancies to be removed.
Coding: Encoding the data by removing the redundancies identified by the model, for example with Huffman coding, arithmetic coding, fixed-to-variable codes (e.g., Elias), and variable-to-fixed codes (e.g., Tunstall).
Diverse applications of compression: Compression can also be used for prediction, classification, identification, and similar tasks. We are interested in finding new application areas for compression beyond representing data in a small space.
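As one illustration of using compression for classification, the sketch below computes a normalized compression distance with Python's standard zlib module and assigns a sample to the nearest labeled reference text. The function names (compressed_size, ncd, classify) and the choice of zlib are assumptions made for this example, not a description of any particular method used by the group.

```python
import zlib
from typing import Mapping


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed form of data."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: small when x and y share structure."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def classify(sample: bytes, references: Mapping[str, bytes]) -> str:
    """Return the label of the reference text closest to sample under NCD."""
    return min(references, key=lambda label: ncd(sample, references[label]))
```

The idea is that a compressor needs few extra bytes to encode a sample after a reference that resembles it, so the distance to that reference is small. On very short inputs the fixed overhead of the compressor dominates, so the distances are only meaningful for inputs of reasonable length.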
Compressed Data Structures and Processing Directly on Compressed Text
Representing data structures such as arrays, trees, queues, and lists in as little space as possible without losing functionality. In particular, compressed data structures are used in text indexing in such a way that the original data can be extracted from its index. There is then no need to store the original data, only its index, whose size is proportional to the entropy of the source data. In short: keep your data compressed and still gain direct access to it. Topics we consider include, but are not limited to, compact representation of integer sequences, inverted-index compression, and compressed data structures in coding.
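To make inverted-index compression concrete, here is a minimal sketch that stores a posting list as gaps between consecutive document IDs and encodes each gap with the Elias gamma code. It assumes strictly increasing positive IDs and, for readability, uses a string of '0'/'1' characters instead of packed bits. The function names are hypothetical.

```python
from typing import List


def elias_gamma_encode(n: int) -> str:
    """Elias gamma code of n >= 1: (bit length - 1) zeros, then n in binary."""
    if n < 1:
        raise ValueError("Elias gamma is defined for positive integers only")
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary


def encode_postings(doc_ids: List[int]) -> str:
    """Gap-encode a strictly increasing list of positive IDs with Elias gamma."""
    bits = []
    prev = 0
    for doc_id in doc_ids:
        bits.append(elias_gamma_encode(doc_id - prev))
        prev = doc_id
    return "".join(bits)


def decode_postings(bits: str) -> List[int]:
    """Inverse of encode_postings: read the gaps and rebuild the IDs."""
    doc_ids = []
    i = 0
    prev = 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":  # unary prefix gives the payload length
            zeros += 1
            i += 1
        gap = int(bits[i:i + zeros + 1], 2)
        i += zeros + 1
        prev += gap
        doc_ids.append(prev)
    return doc_ids
```

For example, the list [3, 7, 8, 20] has gaps 3, 4, 1, 12 and is encoded in 16 bits. This sketch only decodes sequentially; supporting direct access to the k-th entry would require additional machinery such as sampled offsets, which is omitted here.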
Detecting structures, patterns, and signatures in data
Detecting structures such as maximal repeats and shortest unique strings, together with related transformations, helps in understanding the knowledge inside the data. We try to create efficient solutions for such tasks.
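As a small illustration of one such task, the following sketch finds the shortest substrings that occur exactly once in a text by brute force. It is meant only to pin down the definition (overlapping occurrences are counted). The function name is hypothetical, and the quadratic number of substrings examined makes it unsuitable for large inputs, where suffix-array or suffix-tree based methods are typically used instead.

```python
from collections import Counter
from typing import List


def shortest_unique_substrings(text: str) -> List[str]:
    """Return, in sorted order, all shortest substrings occurring exactly once."""
    n = len(text)
    for length in range(1, n + 1):
        # Count every (possibly overlapping) occurrence of each substring
        # of the current length.
        counts = Counter(text[i:i + length] for i in range(n - length + 1))
        unique = sorted(s for s, c in counts.items() if c == 1)
        if unique:
            return unique
    return []
```

For instance, in "abab" every single character repeats, so the answer is the length-2 substring "ba", which occurs once while "ab" occurs twice.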
Privacy-preserving text searching, indexing, and compression
In cloud computing and related applications, security and privacy are top concerns. We investigate methods that keep data secure in compressed form without limiting efficient search capabilities. We also study how to achieve privacy-preserving text indexing. We pay special attention to solutions that work in practice and are ready to be integrated into applications.