Saturday, November 21, 2015

c++ hashtable with linked list

C++ hashTable with linked list


Hash table is a data structure that allows use to search, edit, delete data nearlyO(1).
If you do not know what the O(1), it is ok.. keep it in you mind that it is super efficient.

Lets dive in with real example!!

let's say that you have list (you can think it as array).
Every element of this list contain name of person.
If you want to search Jane from the start of the list, you have to check every element.
Above example, we just need to check 100000 names.
Technically, it is quite fast if you are searching for 100000 names.
However, if the each elements contain lots of inforamtion, and the number of element is goes up to extrememly large numer, it is a different story. It wil take mintue, hour or more than that.

Let's look at the following.




This time, before we put names in the list, we use a function called hash function.
What the function does is that it takes name as input and return integer number.
This number is going to be the index of name in the list.

If we use this way, search can be done extremely fast!!

How?
If you want to know whether "john" is in the list, you can use hash function to the name. You will get the index, and you will get the result instantly, without looking at other names in the list.

It does not matter how many elements are in the list, you will get what you want from the list instantly

However, this hashfunction has problem...


The above example show that two different name ends with same index number.
If your list can handle onely one name, you are in trouble...
This is called hash collision.

There are severy way to avoid this.

My way is using linked list

There are different implimenation in using hash function with linked list.
My way is following...
    1) The very first element of list is empty
    2) get the hash val
    3) use hash val as index of list
    4) make linked list.

This way, it is not O(1)... but it is still close to O(1).


In the real world, there is no way that we can build hash function without hash collision.
It is like you are predict lottery number correctly every week..

Depend on number of data, field, culture and many other aspects, we  design the hash function differently; there is no right answer.

Lots of smart computer people, professors write research papers to find the best way of builiding hash function.


I made some sample coding for this topic.
---------------------------------------------------------------------------------------------------------------------------


The following is practice code. If you have suggestion or correction, please feel free to leave comment.

#include <iostream>
#include <functional> // i am using mac so i need this to use namespace std
#include <string>

using namespace std;

typedef struct student
{
    int id;
    string fName;
    string lName;
    student * nextStudent;
}Student;

void initTable (Student *table, int len);
void addStudent(int studentId, string firstName, string lastName, Student *table);
int hashFunction(int id);
void searchAndPrint(int studentId, Student *table);
void deleteStudent(Student *table, int len);

int main()
{
    Student * table = new Student [10];
    int len = 10;

    initTable(table, len);


    // insert
    addStudent(12345, "tony", "stark", table);
    addStudent(55555, "bruce", "wayne", table);

    // search and print
    searchAndPrint(55555, table);
    searchAndPrint(34524, table);


    cout << endl;
    // all students info are stored in dynamic memory
    // therefore you have to visit every single student info for the deletion
    // for those student info by the linked listed are deleted by the deleteStudent function
    // the table array is delete by delete []
    // pointers for table and pointers for student info are different pointer type.
    deleteStudent(table, len);

    delete []  table;


}

void initTable (Student *table, int len)
{
    for (int i = 0; i < len; ++i)
    {
        table[i].nextStudent = NULL;
    }
}

void addStudent(int studentId, string firstName, string lastName, Student *table)
{
    // using hash function, convert the id into hashed id
    int hashedID = hashFunction( studentId );   

        Student * pointer = &table[hashedID];

        while(pointer->nextStudent !=NULL){
            pointer = pointer->nextStudent;
        }

        // once we reach to the student who has NULL in nextStudent
        // add student
        Student *tmp = new Student;
        tmp->id = studentId;
        tmp->fName = firstName;
        tmp->lName = lastName;
        tmp->nextStudent = NULL;

        // link
        pointer->nextStudent = tmp;

}

int hashFunction(int id)
{
    int tmp = id;
    int ans =0;

        // add each digit's number ex) 12345 -> 1+2+3+4+5 = 15
        while( tmp != 0 ){
            ans += (tmp % 10);
            tmp /=10;
        }
        // the loop does not handle the last one
        // one more addition is required
        ans += (tmp % 10);
       
        // make the addtion into range of 0 to 9
        ans %= 10;

        return ans;
}

void searchAndPrint(int studentId, Student *table)
{
    // get the hashed id
    int hashedId = hashFunction(studentId);

    Student * tmp = &table[hashedId];

    // search for student
    while ( tmp -> nextStudent !=NULL){
        if (studentId == tmp->id){
            cout << "* found * " << studentId <<endl;
            cout << tmp -> fName << endl;
            cout << tmp -> lName << endl;
            return;
        }
        else{
            tmp = tmp -> nextStudent;
        }
    }

    // check the last one in the hash
    if (tmp ->id == studentId){
            cout << "* found * " << studentId << endl;
            cout << tmp -> fName << endl;
            cout << tmp -> lName << endl;
            return;
    }

    // if you cannot find even at the end
    // there is no such student in the hash
    else{
        cout << "* id is not found * " << studentId <<endl;
        return;
    }

}

void deleteStudent(Student *table, int len)
{

    for (int i = 0; i < len; i++)
    {
        Student* tmp = &table[i];
       
        // delete student info except the last one
        while ( tmp -> nextStudent != NULL){
            Student* tmp2 = tmp->nextStudent;
            tmp->nextStudent = tmp2->nextStudent;
            delete tmp2;
        }
    }



    cout << "deletrion successful" << endl;
   
}










No comments:

Post a Comment