
How To Convert Large XML Files to CSV


I usually struggle to convert very large XML files to other formats, because they are in a dynamic format and most programs run out of memory before they finish parsing them. Well, I’m happy to say I found a FAST and EASY solution. Of course, it works for small files as well as big ones.

You’ll want to grab a copy of the msxsl command line utility from Microsoft.

After you’ve got that, you’ll need to set up an XSL file to tell the program how to format your file. If you’re unfamiliar with XSL, it’s worth reading an introduction to it first.

After you’ve got your XSL file created, it’s a simple command line entry:

msxsl xml_file.xml xsl_file.xsl -o output_file.csv
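msxsl is a Windows command-line tool, so it may not be an option everywhere. As a rough alternative (this is not the msxsl approach above), the same kind of flattening can be sketched in PHP with XMLReader, which reads the file node by node instead of loading it whole. The function name xml_to_tsv and the record element name passed to it are assumptions for illustration, and the sketch assumes each record holds flat child elements.

```php
<?php
// Sketch only: streams an XML file and writes one tab-delimited line per record.
// xml_to_tsv and $recordTag are hypothetical names; adjust them to your own file.
function xml_to_tsv(string $xmlPath, string $outPath, string $recordTag): int
{
    $reader = new XMLReader();
    if (!$reader->open($xmlPath)) {
        throw new RuntimeException("Cannot open $xmlPath");
    }
    $out = fopen($outPath, 'w');
    if ($out === false) {
        throw new RuntimeException("Cannot write $outPath");
    }

    $count = 0;
    while ($reader->read()) {
        // Only act on opening tags of the record element
        if ($reader->nodeType !== XMLReader::ELEMENT || $reader->name !== $recordTag) {
            continue;
        }
        // Only this one record is held in memory at a time
        $record = new SimpleXMLElement($reader->readOuterXml());
        $fields = [];
        foreach ($record->children() as $child) {
            // Collapse tabs and newlines inside values so they cannot break the row
            $fields[] = trim(preg_replace('/\s+/', ' ', (string) $child));
        }
        fwrite($out, implode("\t", $fields) . "\n");
        $count++;
    }

    $reader->close();
    fclose($out);
    return $count; // number of records written
}
```

Calling xml_to_tsv('xml_file.xml', 'output_file.csv', 'record') would write one line per record element; swap 'record' for whatever element wraps each row in your file.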

The following are the sample XML and XSL files that I used.

XML File:

	
		772500
		Tue, 20 Jan 2009 16:28:08 CST
		Tue, 20 Jan 2009 16:51:01 CST

			61951
			The Hills Season 1

		773000
		Tue, 20 Jan 2009 16:28:08 CST
		Tue, 20 Jan 2009 16:53:54 CST

			61926
			Hogan Knows Best Season 2

		775500
		Thu, 22 Jan 2009 14:49:12 CST
		Thu, 22 Jan 2009 14:51:35 CST

			62068
			Carlos Mencia 2007

			1402
			Comedy Central

XSL File (creates a tab-delimited file):








	

PHP File-System Cache


I’m just sharing a couple of functions I created a while back to cache resource-intensive processed data for quick and easy access. This is pretty ideal for large amounts of data and it’s very simple to set up. The beauty of this is that you can store just about any data type; it doesn’t have to be a string, because values are run through serialize() before they are written.

Just set up the cache directory with proper read/write permissions (preferably in a directory that isn’t accessible from the web). Then use the two functions.
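As a minimal sketch of that setup step, the directory can also be created and checked from PHP. The function name ensure_cache_dir is hypothetical, and the 0700 mode is only one reasonable choice that assumes the same user runs every script that touches the cache.

```php
<?php
// Sketch: create the cache directory if it is missing and confirm it is writable.
// ensure_cache_dir is a hypothetical helper, not part of the two cache functions.
function ensure_cache_dir(string $dir): bool
{
    // 0700 keeps the directory private to the PHP user; true creates parent directories.
    // The second is_dir() covers the case where another process created it meanwhile.
    if (!is_dir($dir) && !mkdir($dir, 0700, true) && !is_dir($dir)) {
        return false;
    }
    return is_writable($dir);
}
```

Pointing it at a path outside the web root keeps the cache files from being served directly.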

Example usage:

// The 2nd argument, $hours, is how long to keep cached data before fetching it again
$contents = get_cache('TEST_KEY', 24);
if ($contents === false) {
	// This is where you'd get data from an API, DB or whatever
	$contents = 'Just some example contents';
	set_cache('TEST_KEY', $contents);
}
echo $contents;

Code:

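define('CACHE_DIR', 'cache/'); // Include trailing slash

// Returns the cached value for $key, or false if it is missing or older than $hours
function get_cache($key, $hours) {
	$file = CACHE_DIR . md5($key) . '.cache';
	if (!file_exists($file) || filemtime($file) < time() - $hours * 3600)
		return false;
	return unserialize(file_get_contents($file));
}

// Serializes $value and writes it to the cache file for $key
function set_cache($key, $value) {
	$file = CACHE_DIR . md5($key) . '.cache';
	// LOCK_EX keeps two simultaneous writers from interleaving their output
	file_put_contents($file, serialize($value), LOCK_EX);
}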
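The check-then-fill pattern from the example usage can also be folded into a single call. The following is only a sketch under stated assumptions: the name cache_remember, its $dir parameter and the $producer callback are hypothetical and not part of the functions above. It writes to a temporary file and renames it into place, so a reader should not pick up a half-written cache file.

```php
<?php
// Sketch of a "get or compute" wrapper. cache_remember, $dir and $producer are
// hypothetical names introduced for illustration.
function cache_remember(string $dir, string $key, float $hours, callable $producer)
{
    $file = rtrim($dir, '/') . '/' . md5($key) . '.cache';

    // Serve the cached copy while it is younger than $hours
    if (is_file($file) && filemtime($file) >= time() - $hours * 3600) {
        $raw = file_get_contents($file);
        if ($raw !== false) {
            return unserialize($raw);
        }
    }

    // Miss or stale: compute the value, write it to a temporary file,
    // then rename it over the real cache file in one step
    $value = $producer();
    $tmp = $file . '.' . uniqid('', true) . '.tmp';
    if (file_put_contents($tmp, serialize($value)) !== false) {
        rename($tmp, $file);
    }
    return $value;
}
```

A call might look like cache_remember('cache/', 'TEST_KEY', 24, function () { return 'Just some example contents'; }), where the closure stands in for the API or DB lookup.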