Generating and comparing low-rate deletion correcting codes.
“Random-Greedy” is the short name for this project, aiming to generate and compare low-rate deletion correcting codes. This short name is also the name of out main artifact, a simple algorithm that provides better deletion correcting abilities, compared to other codes we checked. In the project, eight methodologies for generating codes (some of them with meta-parameters) were created, and compared for their deletion correcting abilities (so no insertions nor substitutions are allowed). The codes are low-rate, so each codeword has $n$ bits in total, and $O(log(n))$ of them are information bits. We achieve this by generating the codes with the number of codewords to be equal to the codeword length. The full description of the project is available in the project report PDF file.
This work is submitted as the final project in the course “Coding and Algorithms for Memories” (236379), at Taub Faculty
of Computer Science, Technion - Israel Institute of Technology. The project was written by Orel Adivi
(orel.adivi [at] cs.technion.ac.il)
and Daniel Noor (daniel.noor [at] cs.technion.ac.il)
, and under the supervision
of Daniella Bar-Lev and associate professor Eitan Yaakobi. The work was done in semester winter 2022-2023. The project
is released under MIT license.
In order to generate the codes and use them, an installation of
CPython 3.10
is required (the implementation is platform
independent, and was tested on both Windows and Linux).
The project uses NumPy
, Matplotlib
, overrides
, pylcs
, and tqdm
Python libraries, which can be installed using
Pip
package installer:
python -m pip install -r requirements.txt
Then, the codes can be imported, generated and used. For example, for generating RandomGreedyCode
of length 100 and
with 2 options for each selection point, and using it, the following Python
code can be used:
import numpy as np
from codes.RandomGreedyCode import RandomGreedyCode
my_code = RandomGreedyCode(length=100, options=2) # generate the code
print(f'Codeword for the value 0:\t{my_code.encode(24)}') # encode a value
print(f'Value for a codeword:\t{my_code.decode(np.zeros((100,)))}') # decode a codeword
print(f'Maximal number of deletions:\t{my_code.max_deletions()}') # calculate deletion-distance
Other codes can be used similarly, as they share the interface defined in the
Code
class.
In this project, eight code classes are available:
RandomCode
– this class generates code
by selecting serially random codewords that were not previously chosen.GreedyCode
– this class generates code
selecting codewords to be as different as possible, by flipping bits during a serial traversal of the previouslyLinSpaceCode
– this class generates
code by repeating a pattern of alternating $0$-s and $1$-s with a length that increases linearly.LogSpaceCode
– this class generates
code by repeating a pattern of alternating pattern_false
-s ($0$ is the default) and pattern_true
-s ($1$ is the
default) with a length that increases linearly.RepetitionCode
– This class
generates code by taking all words of length $log(n)$ for $n$ the codeword length, and repeating each letter $n/log(n)$
times, thus generating codewords of length $n$.VTRepetitionCode
– this class
generates code by taking all words from $VT_0(m)$ for a given parameter m
and multiplying each letter $n/m$ times.VTRepetitionNaryCode
– this
class generates code by taking all words from $VT_0(m, q)$ for given parameter m
and a base q
, converting the words
to binary and multiplying each letter $n/m$ times.RandomGreedyCode
– this class
generates code serially by generating a set of codeword candidates of size options
, and then selecting the candidate
with the largest distance from the previous codewords.All these classes are derived from the class
Code
, which allows the selection of codeword
length in the construction (with the parameter length
) and provides the methods encode
(for encoding a value),
decode
(for decoding a codeword), and max_deletions (for calculating the maximal number of
deletions allowed).
In the utils
folder, additional classes and functions,
used in the implementation of the different codes, are provided:
LevenshteinDistance.py
– this
file contains a function that computes Levenshtein deletion distance between an expected string and a given string.LongestCommonSubsequence.py
–
contains a function that computes the length of the longest common subsequence of given two strings (the function uses a
dynamic programming algorithm, and was replaced with the implementation in pylcs
).VTCode.py
– contains a VT code generator,
based on an implementation from a previous work
.In order to compare the different code generation methodologies, and to tune meta-parameters for the relevant codes, we
wrote a Python
script that runs these experiments, and another Python
script that generates the figures from these
experiments. For running the two files, the following script can be executed:
python Experiments.py
python GenerateFigures.py
Running the experiments is expected to last several hours, so we also run this script in a
GitHub action. Additionally, the results are
available in the artifacts
folder. Furthermore,
for testing the utils
folder files, we wrote a test file
that can be executed using the following command:
python -m unittest Unittests.py
In the execution of the first two files, the following figures are generated:
VTRepetitionNaryCode
in different
bases.RandomGreedyCode
with different
number of options to choose from in each iteration.The figures are described and discussed in detailed in the project report PDF file.
The project was designed in accordance with the object-oriented programming (OOP) principles. For documentation, a website was created and a SUPPORT.md file was written. The project was written using PyCharm Professional and was managed using GitHub.
In order to ensure the correctness of commits sent to the GitHub server, a continuous integration pipeline was set. These checks are run automatically for each pull request and each push. The following actions were set:
1) Run experiments - this action
runs all the experiments and generates the related figures.
2) Run unittests - this action runs all
the unittests for the utils
folder files.
3) Style check - this action performs
basic checks of the Python
files for detecting syntax errors.
4) Compile report - this action compiles
the LaTeX file to the
PDF report file.
5) Website - this action updates the
Random-Greedy website with the content of this
README.md
file.
6) Vulnerability check - this
action help managing the Python
code and its dependencies.
7) Dependency review - this
action help managing the Python
code dependencies for pushes.
8) Dependabot - this action helps
update the versions of the dependencies.
For the relevant actions, the checks were run in all the supported Python version CPython 3.10
and on both Windows
(Windows Server 2022) and Linux (Ubuntu 20.04).
Please feel free to contact us with any questions you have about this project.