Firewall

Per the rules of the University of California, Santa Cruz, the source code for this project can not be shared publicy. To view the source code please send me an email, which can be found in the contact section of the website, and state a reason for requesting source code.

Introduction

A project from my Computer Systems and C Programming course in which I designed a bloom filter using Abstract Data Types to filter text for banned words and replace that text with safe words. The assignment was presented, in terms of the firewall filter, with a theme of 1984 by George Orwell. This program was completed in the C programming language in a Linux Virtual Machine.

Overview

In this program, the name of a txt file is inputted through stdin after running make. The file contains any number of words which will be scanned to determine if they are badspeak or oldspeak. Badspeak is defined as words that are not allowed, and will result in the user being told not to use them. Oldspeak words are also not allowed, but have a translation to a newspeak word. The program uses a bloom filter with an underlying bit vector to determine if a word scanned is most likely badspeak or oldspeak. In the case of which a word is most likely in the bloom filter, the hash table is then searched to ensure that the result is correct. The hash table uses an array of binary search trees to store each word in the node abstract data type.

Node

The node, and abstract data type, will be used in the binary search tree. It holds values, left and right, for its children as well as character pointers to oldspeak and the translation to newspeak.

Binary Search Tree

The binary tree contains nodes with values of the left being less than the root and values on the right being greater than the root. The comparison values are obtained using strcmp() which determines the position in the tree.

Hash Table

The hash table contains a set number of bits of values determined by the hash function. In comparison to the bloom filter, it contains less elements and is therefore less accurate. The hash table contains a 128 bit salt represented by two uint64t’s, a size uint32t as well as a pointer to a pointer to a Node of trees. The salt is used as a parameter for the hash function and will therefore determine the position of the Node.

Bit Vector

The abstract data type that represents an array of binary bits whose size is determined by the length parameter. Edit elements in the array using bitwise operators. To allow for efficient memory usage, use n/8 uint8ts in the array.

Bloom Filter

The bloom filter is created using the bit vector ADT along with the hash function to determine which bit in the array to set. The hash function used in this implementation, speck cipher, was created by the NSA and comes from the speck.h file. It is run through three different functions for three total bit inserts. The bloom filter struct contains three 128 bit salts stored in two uint64t’s as well as the bit vector storing the bits that have to be changed.

Sources

Firewall Cover Image: https://pixabay.com/illustrations/cyber-security-information-security-3400657/