The ProgrammersTalk Community
Forum Register Search Today's Posts Mark Forums Read
Register

Go Back   The ProgrammersTalk Community > General Programming > C#


Welcome to the The ProgrammersTalk Community forums.

You are currently viewing our boards as a guest which gives you limited access to view most discussions and access our other features. By joining our free community you will have access to post topics, communicate privately with other members (PM), respond to polls, upload content and access many other special features. Registration is fast, simple and absolutely free so please, join our community today!

If you have any problems with the registration process or your account login, please contact contact us.
Reply
 
LinkBack Thread Tools    Display Modes   
  #1 (permalink)  
Old 09-21-2007, 02:59 PM
TeraTask's Avatar
TeraTask TeraTask is offline
PT Admin
Awards Showcase
Quality Tutorial Quality Tutorial Quality Tutorial 
Total Awards: 3
Join Date: Jun 2007
Location: Reno, NV
Posts: 442
iTrader: (0)
TeraTask is a splendid one to beholdTeraTask is a splendid one to beholdTeraTask is a splendid one to beholdTeraTask is a splendid one to beholdTeraTask is a splendid one to beholdTeraTask is a splendid one to behold
CSV Database Class

Hey all. I've got a project on which I'm working and for which I need to be able to store non-binary data in a scalable, efficient fashion without the use of a database such as SQL lite or SQL Server. To do this, I've come up with the following methodology which I'm implementing. As I implement it, I'll post the code here (should be only over the next couple of days). Your feedback and suggestions are welcomed and encouraged.

Let's start with the thought process:

Definitions used below:

database: A collection of CSV files.
table: A single CSV file.

Assumptions:
I do not need all functionality of a database. I will have multiple databases of information with the same structure and different values. While I could store all the information in a single database and just add in a field to indicate the row's database, this would subsequently make the CSVs grow far too large over a 10 year period for efficient data access, so subdividing is preferable. Such subdivision applied to very large tables may be warranted, but I'm rather doubting that it'd be necessary or productive while it would make the coding more challenging, but, again, your feedback would be very much appreciated.

Now, each table could grow fairly large over time (some will be very small, but some will be very large), so some form of indexing is appropriate.

Additionally, each row will need a primary key (i.e. an autoincremented value) which shall always be the first value in the row.

No support for joins or compound keys will be added as this project will need to be finished fairly soon and the advantages provided by such constructs don't warrant implementation for this particular project at this time.

Row data shall be returned as an ArrayList. The header names will be available separately as an ArrayList where the index in the header ArrayList matches the index in the row data's ArrayList.

Row data which contains newlines will not be allowed to wrap to the next line. Instead, newlines will be replaced with the XHTML entity <br />.

Proposal:

1) Each database, being a collection of tables will actually be a directory.
2) Within each database directory there shall be 2 system subdirectories: _indexes and _table_summary.
3) In the _indexes directory there will be a file for each table with an index named as table_name_column_name.
4) Each index file will have rows of a fixed length which contains the following information in this order: column_value,primary_key,row_start_byte. Each primary key will be of type uint and the start byte shall be of type uint. The type for column_value will be determined by the column's type. In the case where the index is for the autoincrement value (i.e. the primary key), this file will not contain the column_value field.
5) In the _table_summary directory there will be a file for each table named the same as the table itself.
6) Each table summary file shall contain the following information in this order: next_auto_increment,row_length,header_row,indexed_columns. Here's the type of each of those values:

next_auto_increment: uint
row_length: uint
header_row: csv
Each header row will have the following characteristics stored here: name,column_type,max_length. name shall be of type uint,column_type shall be of type string (for use with Type.GetType()), and max_length shall be of type uint with 0 indicating no fixed length or that fixed length is not applicable. To store in a CSV format, these fields will be joined by a colon. For example,
Code:
  autoincrement:uint:0,name:string:50,street_address:string:255
indexed_columns:csv
7) In the root of each database directory shall be a file for each table without the header row (note that the header information is stored in the summary file and would subsequently be redundant).
8) Unless absolutely necessary all tables will be composed of fixed length records to speed data access. It is, of course, not a requirement of C# that strings be of a maximal length -- this is done only for speed of data access.

Unresolved Issue:

In inserting values into an index, I would prefer to have the index ordered by the column value. However, I'm not sure if C# allows random file access (going forward and backward) to facilitate a binary search algo and I'm not sure about the calculational complexity (think "Big-O" here) of inserting into the middle of a file versus appending and dropping the binary search. Your ideas and suggestions are very much appreciated. For the tables themselves, I plan on appending data and then storing the start byte which will subsequently be available in the index.

Stubbed Code:

This C# code provides the structure of a class designed to facilitate such interaction. In addition to the above, a static class has been added which will be responsible for parsing a single line of CSV data and returning such data in an ArrayList. I have some of the code for the class below constructed for a separate, less structured class which will be replaced by this methodology, but have not ported it into this class.

Code:
using System;
using System.Collections;
using System.Collections.Generic;
using System.Text;
using System.IO;

namespace MyCustomNamespace {
    class csv_db {
        private string database_path = "";
        private string current_database = "";
        private ArrayList current_rows = new ArrayList();
        private ArrayList current_indexed_columns = new ArrayList();
        private csv_table_summary current_table_summary;

        public csv_db(string databasePath) {
            this.database_path = databasePath;
        }

        public bool connect(string databaseName) {
            //returns true if database exists, false otherwise
            return false;
        }
        public uint insert(Hashtable rowValues) {
            //returns autoincrement value of new row
            return 0;
        }
        public bool update(uint primaryKey, Hashtable rowValues) {
            //returns true if row updated, false otherwise
            return false;
        }
        public uint update(Hashtable whereValues, Hashtable rowValues) {
            //returns number of rows updated
            return 0;
        }
        public uint query(Hashtable whereValues) {
            //returns number of rows matching query
            return 0;
        }
        public bool delete(uint primaryKey) {
            //returns true if row deleted
            return false;
        }
        public uint delete(Hashtable whereValues) {
            //returns number of rows deleted
            return 0;
        }
        public bool create_db(string databaseName) {
            //returns true if db created
            return false;
        }
        public bool create_db(string databaseName, string sourceDatabaseName) {
            //returns true if db created
            return false;
        }
        public bool create_table(string tableName, ArrayList columns) {
            //returns true if table created
            return false;
        }
        public bool create_table(string tableName, ArrayList columns, uint rowLength) {
            //returns true if table created
            return false;
        }
        public bool index_insert(string columnName) {
            //returns true if index created
            return false;
        }
        public static ArrayList parse_csv_row(string csvRow) {
            //parses a line of CSV data and returns as an ArrayList
            return new ArrayList();
        }
    }
    class csv_column {
        private string name = "";
        private Type column_type;
        private uint max_length = 0;

        public csv_column(string columnDetails) {
        }

        public string columnName {
            get {
                return this.name;
            }
        }
        public Type columnType {
            get {
                return this.column_type;
            }
        }
        public uint maxLength {
            get {
                return this.max_length;
            }
        }
    }

    class csv_table_summary {
        private uint row_length = 0;
        private uint next_auto_increment = 0;
        private ArrayList headers = new ArrayList();
        private ArrayList indexed_columns = new ArrayList();

        public csv_table_summary(string summaryFileName) {
            //Parse file and populate variables
        }

        public uint rowLength {
            get {
                return this.row_length;
            }
        }
        public uint nextAutoIncrement {
            get {
                return this.next_auto_increment;
            }
        }
        public ArrayList tableHeaders {
            get {
                return this.headers;
            }
        }
        public ArrayList indexedColumns {
            get {
                return this.indexed_columns;
            }
        }
        public bool isIndexed(string columnName) {
            if (this.indexed_columns.Contains(columnName)) {
                return true;
            } else {
                return false;
            }
        }
    }
}

__________________
Jeremy Miller
Content Farmer - Optimized Automated Blog Posting

Digg this Post! Del.Icio.Us this Post! Technorati this Post! Furl this Post! Mister Wong this Post! Newsvine this Post! Spurl this Post! Reddit this Post! Netscape this Post!

Last edited by TeraTask : 09-21-2007 at 09:14 PM. Reason: Updated code to use some Hashtables
Reply With Quote
  #2 (permalink)  
Old 09-23-2007, 08:13 PM
TeraTask's Avatar
TeraTask TeraTask is offline
PT Admin
Awards Showcase
Quality Tutorial Quality Tutorial Quality Tutorial 
Total Awards: 3
Join Date: Jun 2007
Location: Reno, NV
Posts: 442
iTrader: (0)
TeraTask is a splendid one to beholdTeraTask is a splendid one to beholdTeraTask is a splendid one to beholdTeraTask is a splendid one to beholdTeraTask is a splendid one to beholdTeraTask is a splendid one to behold
No feedback at all? I thought you all would love something like this.

__________________
Jeremy Miller
Content Farmer - Optimized Automated Blog Posting

Digg this Post! Del.Icio.Us this Post! Technorati this Post! Furl this Post! Mister Wong this Post! Newsvine this Post! Spurl this Post! Reddit this Post! Netscape this Post!
Reply With Quote
  #3 (permalink)  
Old 09-23-2007, 09:37 PM
HelloWorld's Avatar
HelloWorld HelloWorld is offline
PT Admin
Awards Showcase
Quality Tutorial 
Total Awards: 1
Join Date: Jun 2007
Location: In front of computer...
Posts: 1,119
iTrader: (0)
HelloWorld is a jewel in the roughHelloWorld is a jewel in the roughHelloWorld is a jewel in the rough
are you trying to represent database as directories..?

__________________
PHP Code:
System.out.println("Hello World!"); 

Digg this Post! Del.Icio.Us this Post! Technorati this Post! Furl this Post! Mister Wong this Post! Newsvine this Post! Spurl this Post! Reddit this Post! Netscape this Post!
Reply With Quote
  #4 (permalink)  
Old 09-23-2007, 09:48 PM
TeraTask's Avatar
TeraTask TeraTask is offline
PT Admin
Awards Showcase
Quality Tutorial Quality Tutorial Quality Tutorial 
Total Awards: 3
Join Date: Jun 2007
Location: Reno, NV
Posts: 442
iTrader: (0)
TeraTask is a splendid one to beholdTeraTask is a splendid one to beholdTeraTask is a splendid one to beholdTeraTask is a splendid one to beholdTeraTask is a splendid one to beholdTeraTask is a splendid one to behold
Yep, that's #1 in the proposal section.

__________________
Jeremy Miller
Content Farmer - Optimized Automated Blog Posting

Digg this Post! Del.Icio.Us this Post! Technorati this Post! Furl this Post! Mister Wong this Post! Newsvine this Post! Spurl this Post! Reddit this Post! Netscape this Post!
Reply With Quote
  #5 (permalink)  
Old 09-24-2007, 06:39 AM
ccoonen ccoonen is offline
PT Staff
Awards Showcase
Quality Tutorial Quality Tutorial Quality Tutorial Quality Tutorial 
Total Awards: 4
Join Date: Jun 2007
Location: Wisconsin
Posts: 317
iTrader: (0)
ccoonen is on a distinguished roadccoonen is on a distinguished roadccoonen is on a distinguished roadccoonen is on a distinguished road
I like the idea of this:

public uint nextAutoIncrement {
get {
return this.next_auto_increment;
}

You get props from me
}
Reply With Quote
  #6 (permalink)  
Old 09-24-2007, 03:17 PM
TeraTask's Avatar
TeraTask TeraTask is offline
PT Admin
Awards Showcase
Quality Tutorial Quality Tutorial Quality Tutorial 
Total Awards: 3
Join Date: Jun 2007
Location: Reno, NV
Posts: 442
iTrader: (0)
TeraTask is a splendid one to beholdTeraTask is a splendid one to beholdTeraTask is a splendid one to beholdTeraTask is a splendid one to beholdTeraTask is a splendid one to beholdTeraTask is a splendid one to behold
Thanks. I got behind in the coding of this, but should have it completed in the next couple of days. I've made a couple of tweaks as I've been implementing this and will share all as more is ready to be shared.

__________________
Jeremy Miller
Content Farmer - Optimized Automated Blog Posting

Digg this Post! Del.Icio.Us this Post! Technorati this Post! Furl this Post! Mister Wong this Post! Newsvine this Post! Spurl this Post! Reddit this Post! Netscape this Post!
Reply With Quote
  #7 (permalink)  
Old 09-26-2007, 02:23 AM
TeraTask's Avatar
TeraTask TeraTask is offline
PT Admin
Awards Showcase
Quality Tutorial Quality Tutorial Quality Tutorial 
Total Awards: 3
Join Date: Jun 2007
Location: Reno, NV
Posts: 442
iTrader: (0)
TeraTask is a splendid one to beholdTeraTask is a splendid one to beholdTeraTask is a splendid one to beholdTeraTask is a splendid one to beholdTeraTask is a splendid one to beholdTeraTask is a splendid one to behold
Code Update

Here's the latest code. Underneath is code I'm using to test features so far.

As always, feedback is loved!!!!!!!

Code:
using System;
using System.Collections;
using System.Collections.Generic;
using System.Text;
using System.IO;

namespace MyCustomNamespace {
    class csv_db {
        private object database_lock = new object();
        private string database_path = "";
        private string current_database = "";
        private string current_table = "";
        private ArrayList current_rows = new ArrayList();
        private ArrayList current_indexed_columns = new ArrayList();
        private csv_table_summary current_table_summary;

        public csv_db(string databasePath) {
            try {
                if (!Directory.Exists(databasePath) && databasePath.Length > 0) {
                    Directory.CreateDirectory(databasePath);
                }
                this.database_path = databasePath;
            } catch {
                //Directory creation failed, so just use the current directory.
                this.database_path = "";
            }
        }

        public bool connect(string databaseName) {
            //returns true if database exists, false otherwise
            if (Directory.Exists(this.database_path + "\\" + databaseName)) {
                this.current_database = databaseName;
                this.database_path += "\\" + databaseName;
                return true;
            }
            return false;
        }

        public uint insert(string table, Hashtable rowValues) {
            //returns autoincrement value of new row
            return 0;
        }

        public bool update(string table, uint primaryKey, Hashtable rowValues) {
            //returns true if row updated, false otherwise
            return false;
        }
        public uint update(string table, Hashtable whereValues, Hashtable rowValues) {
            //returns number of rows updated
            return 0;
        }

        public uint query(string table, Hashtable whereValues) {
            //returns number of rows matching query
            return 0;
        }

        public bool delete(string table, uint primaryKey) {
            //returns true if row deleted
            return false;
        }
        public uint delete(string table, Hashtable whereValues) {
            //returns number of rows deleted
            return 0;
        }

        public bool create_db(string databaseName) {
            //returns true if db created
            if (!Directory.Exists(this.database_path + databaseName+"\\")) {
                Directory.CreateDirectory(this.database_path + databaseName + "\\");
                Directory.CreateDirectory(this.database_path + databaseName + "\\_indexes\\");
                Directory.CreateDirectory(this.database_path + databaseName + "\\_table_summary\\");
                return true;
            }
            return false;
        }
        public bool create_db(string databaseName, string sourceDatabaseName) {
            //returns true if db created
            if (this.create_db(this.database_path + databaseName + "\\")) {
                //Now, need to copy files from sourceDatabaseName
            }
            return false;
        }

        public bool create_table(string tableName, ArrayList columns) {
            //returns true if table created
            
            return false;
        }
        public bool create_table(string tableName, ArrayList columns, uint rowLength) {
            //returns true if table created
            if (this.current_database.Length > 0) {
                if (!File.Exists(this.database_path + "\\" + this.current_database + "\\"+tableName + ".csv")) {
                    File.Create(this.database_path + "\\" + tableName + ".csv");
                    this.current_table = tableName;
                    this.current_table_summary = new csv_table_summary(this.database_path, tableName);
                    this.current_table_summary.tableHeaders = columns;
                    this.current_table_summary.rowLength = rowLength;
                    this.save_table_summary();
                    return true;
                }
                return false;
            }
            return false;
        }

        public bool index_insert(string columnName) {
            //returns true if index created
            return false;
        }

        private void load_table_summary() {
            //Loads current table summary into memory for fast access.
            this.current_table_summary = new csv_table_summary(this.database_path, this.current_table);
        }
        private void save_table_summary() {
            FileStream fs = new FileStream(this.database_path + "\\_table_summary\\" + this.current_table + ".csv", FileMode.Create, FileAccess.Write, FileShare.None);
            using (StreamWriter appDataWriter = new StreamWriter(fs)) {
                appDataWriter.WriteLine(this.current_table_summary.nextAutoIncrement);
                appDataWriter.WriteLine(this.current_table_summary.rowLength);
                appDataWriter.WriteLine(this.current_table_summary.tableHeadersString);
                appDataWriter.WriteLine(this.current_table_summary.indexedColumnsString);
            }
        }
    }
    class csv_column {
        private string name = "";
        private Type column_type;
        private uint max_length = 0;

        public csv_column(string columnDetails) {
        }

        public string columnName {
            get {
                return this.name;
            }
        }
        public Type columnType {
            get {
                return this.column_type;
            }
        }
        public uint maxLength {
            get {
                return this.max_length;
            }
        }
    }

    class csv_table_summary {
        private uint row_length = 0;
        private uint next_auto_increment = 0;
        private ArrayList headers = new ArrayList();
        private ArrayList indexed_columns = new ArrayList();

        public csv_table_summary(string database_path,string summaryFileName) {
            //Parse file and populate variables
            if (File.Exists(database_path + "\\_table_summary" + summaryFileName)) {
                using (StreamReader appDataReader = new StreamReader(database_path + "\\_table_summary\\" + summaryFileName + ".csv")) {
                    this.next_auto_increment = Convert.ToUInt32(appDataReader.ReadLine().Trim());
                    this.row_length = Convert.ToUInt32(appDataReader.ReadLine().Trim());
                    this.headers = CSVParser.Parse(appDataReader.ReadLine());
                    this.indexed_columns = CSVParser.Parse(appDataReader.ReadLine());
                }
            }
        }

        public uint rowLength {
            get {
                return this.row_length;
            }
            set {
                this.row_length = value;
            }
        }
        public uint nextAutoIncrement {
            get {
                return this.next_auto_increment;
            }
            set {
                this.next_auto_increment = value;
            }
        }
        public ArrayList tableHeaders {
            get {
                return this.headers;
            }
            set {
                this.headers = value;
            }
        }
        public string tableHeadersString {
            get {
                string rValue = ""; string prefix = "";
                foreach (csv_table_header aHeader in this.headers) {
                    rValue += prefix + aHeader.ToString();
                    prefix = ",";
                }
                return rValue;
            }
        }
        public ArrayList indexedColumns {
            get {
                return this.indexed_columns;
            }
            set {
                this.indexed_columns = value;
            }
        }
        public string indexedColumnsString {
            get {
                string rValue = ""; string prefix = "";
                foreach (string aColumn in this.indexed_columns) {
                    rValue += prefix + aColumn;
                    prefix = ",";
                }
                return rValue;
            }
        }
        public bool isIndexed(string columnName) {
            if (this.indexed_columns.Contains(columnName)) {
                return true;
            } else {
                return false;
            }
        }
    }

    class csv_table_header {
        private string name;
        private string column_type;
        private uint max_length;

        public csv_table_header(string columnName, string columnType, uint maxLength) {
            this.name = columnName;
            this.column_type = columnType;
            this.max_length = maxLength;
        }

        public string columnName {
            set {
                this.name = value;
            }
        }
        public string columnType {
            set {
                this.column_type = value;
            }
        }
        public uint maxLength {
            set {
                this.max_length = value;
            }
        }

        public override string ToString() {
            return this.name + ":" + this.column_type + ":" + this.max_length.ToString();
        }
    }
}

__________________
Jeremy Miller
Content Farmer - Optimized Automated Blog Posting

Digg this Post! Del.Icio.Us this Post! Technorati this Post! Furl this Post! Mister Wong this Post! Newsvine this Post! Spurl this Post! Reddit this Post! Netscape this Post!
Reply With Quote
  #8 (permalink)  
Old 09-26-2007, 02:24 AM
TeraTask's Avatar
TeraTask TeraTask is offline
PT Admin
Awards Showcase
Quality Tutorial Quality Tutorial Quality Tutorial 
Total Awards: 3
Join Date: Jun 2007
Location: Reno, NV
Posts: 442
iTrader: (0)
TeraTask is a splendid one to beholdTeraTask is a splendid one to beholdTeraTask is a splendid one to beholdTeraTask is a splendid one to beholdTeraTask is a splendid one to beholdTeraTask is a splendid one to behold
Here's the usage code:
Code:
using System;
using System.Collections;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Text;
using System.Windows.Forms;

//uint is 32 bits = 8 bytes
namespace MyCustomNamespace {
    public partial class MainScreen : Form {
        public MainScreen() {
            InitializeComponent();
        }

        private void MainScreen_Load(object sender, EventArgs e) {
            //For testing, create a database called _system
            csv_db loadDb = new csv_db(Environment.GetFolderPath(Environment.SpecialFolder.CommonApplicationData)+"\\SMARTSystemsPro\\");
            loadDb.create_db("_system");

            loadDb.connect("_system");
            ArrayList columnHeaders = new ArrayList();
            columnHeaders.Add(new csv_table_header("my_table_id","uint",0));
            columnHeaders.Add(new csv_table_header("name","string",100));

            loadDb.create_table("my_table", columnHeaders, 8 + 100 + 2);            
        }
    }
}

__________________
Jeremy Miller
Content Farmer - Optimized Automated Blog Posting

Digg this Post! Del.Icio.Us this Post! Technorati this Post! Furl this Post! Mister Wong this Post! Newsvine this Post! Spurl this Post! Reddit this Post! Netscape this Post!
Reply With Quote
  #9 (permalink)  
Old 09-28-2007, 07:23 PM
TeraTask's Avatar
TeraTask TeraTask is offline
PT Admin
Awards Showcase
Quality Tutorial Quality Tutorial Quality Tutorial 
Total Awards: 3
Join Date: Jun 2007
Location: Reno, NV
Posts: 442
iTrader: (0)
TeraTask is a splendid one to beholdTeraTask is a splendid one to beholdTeraTask is a splendid one to beholdTeraTask is a splendid one to beholdTeraTask is a splendid one to beholdTeraTask is a splendid one to behold
I think I found a problem and would like your feedback on it. If I access fields by their byte position in the file, then records need to be stored in byte form. When working with numbers, this could mean that a newline (byte 13) is stored to represent 1101 which would happen, for example, on record 13. This could be a real pain as the records would not then be CSVs. Subsequently, I'm thinking of not storing by byte and just storing strings and then reading in one line at a time, comparing the first field to see if it contains the primary key and looping through the whole file that way.

Does anyone see a more efficient means of accessing data randomly while addressing problems of bytes? Am I missing something? Am I making it too complicated?

Your feedback and suggestions are very much appreciated.

__________________
Jeremy Miller
Content Farmer - Optimized Automated Blog Posting

Digg this Post! Del.Icio.Us this Post! Technorati this Post! Furl this Post! Mister Wong this Post! Newsvine this Post! Spurl this Post! Reddit this Post! Netscape this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

   Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On



All times are GMT -7. The time now is 07:02 AM. Powered by vBulletin
Copyright © 2000 - 2007, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO © 2007 ProgrammersTalk Sedo - Buy and Sell Domain Names and Websites project info: programmerstalk.net Statistics for project programmerstalk.net etracker® web controlling instead of log file analysis


1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50