ardawan.com

SQL Generator



update June 21, 2017

Live: github.com/ardawan/SQL-Generator

This is a simple SQL random data generator in python 3. The goal was generating 1GB dummy SQL data but turns to 7GB by accident!

I wanted to test my search engine with a large SQL data. I couldn’t find any free online generator that can provide 1GB dummy SQL data, so I decided to do it by myself.


feature

My first choice was PHP because my search engine was written in PHP language. I set the [maximum_memory = -1] and [execution_time = 3600] as i knew it’s gonna take more time to execute than the default execution time.

To make the SQL data I needed to have random numbers, dates, times, words and sentences. The random numbers, date and time can be pretty easy to generate but my question was how can I generate random sentences! The sentences shouldn’t be meaningful in terms of reading just I wanted to have those in the same category.

I started to create the functions that return random values. First, date and time function.

PHP
I used date() and mt_rand() to get random data and time in between a range from 2010 till 2017.

function dtGenerator(){
    $start = strtotime("2010-01-01");
    $end = strtotime("2017-12-30");
    $timestamp = mt_rand($start, $end);
    $randomDate = date("Y-m-d", $timestamp);
    $time = date('H:s:i',strtotime(mt_rand(0,24).mt_rand(0,59).mt_rand(0,59)));
    return $randomDate.' '.$time;
}

Python

def rdatetime():
    dateFormat = '%Y/%m/%d %H:%M:%S'    
    sdate = time.mktime(time.strptime('2010/01/01 00:00:00', dateFormat))
    edate = time.mktime(time.strptime('2017/12/30 23:59:59', dateFormat))
    return time.strftime(dateFormat, time.localtime( sdate + random.random() * (edate - sdate) ) )

Next was the challenging part. Random sentences. After research over internet i saw many smart generators exist that can generate meaningful sentences base on english grammer, but i dont wanted any meaningful sentence as I wanted to only test my search engine.

I got three different article about different cars and added to the different variables. like so

PHP

$txt = "Lotus' is giving its perennial Elise one final round of spit ’n’ polish to keep it as fresh as possible in the focused sports car market, ahead of an all-new model arriving in 2020.While sprucing up the entire range for 2017, the Hethel-based outfit has also resurrected one of its most evocative badges for an even lighter derivative: this is the Elise Sprint, that latter epithet ... ";

To get the random sentences from the article variable I need to separate them by dots, keep them in an array, shuffle the array and return the sentences. The function that I called content will get two parameters. ( $txt[which text variable], $qty[how many sentences should return] )

PHP

function content($txt=FALSE, $qty=FALSE){
    $stxt = explode('.', $txt);
    shuffle($stxt);
    foreach ( array_rand($stxt, $qty) as $key => $value){
        $result[] = htmlentities($stxt[$key]);
    }
    return implode(' ', $result);
}

Python

def content(txt = None, qty = None):
    global content
    stxt = txt.split('.')
    rand = random.sample(stxt, qty)
    return ' '.join(rand)

The random title is using the same function as content() with different quantity.

Lastly, I should make the loop to iterate the random contents in the SQL query and save it in the “data.sql” file. Before generate n times random content, needed to know how many times iteration will make 1GB data for me. So I generated 1 line of the dummy content in the “data.sql” file and the size was nearly 2Mb but every time generating the content was different size because of the dynamic data.

Well, base on the rough size, I was targeting for 500,000 times iteration. I tried to run and the PHP code started to generating after few minutes I got an error. Tried to choose lower numbers for the iteration such as 30,000 and this time it works. but it wasn’t my goal.

So I rewrite my generator in python 3 to run the generator on my laptop. Rewriting the generator in python 3 was much easier and simpler than PHP. My functions turn to four lines instead of six lines in python.

PHP couldn’t handle 500,000 times iteration but python handled 5 Million times iteration and the result you can see from the screenshot.

python generates 5 million line of random data in 9 minutes in 7.6Gb size

python generates 5 million line of random data in 9 minutes in 7.6Gb size


access

If you’re looking for dummy data generator like me, you can get my generator code on my GitHub and manipulate as you wish. If you use it don’t forget to share your feedback with me on @ardawanizadi.