google-refine

Mashable: If you live for data, slave over spreadsheets and constantly find yourself sifting through endless rows and columns of facts and figures, Google’s got a lovely new product just for you — and it’s free and open-source, too.

Google Refine is a project born of Freebase Gridworks, a data-cleaning tool Google acquired when it bought Metaweb during the summer. Google has since renamed Gridworks and relaunched it as Refine.

Basically, Refine makes it much easier for data geeks to clean up and use big sets of data.

For example, if you’re writing an academic paper, government study or news article that requires you to download and parse spreadsheets from Data.gov or a similar source of free information, you might notice all kinds of inconsistencies when you try to sort the data. This is a particular problem when you’re using free, open-to-the-public data that no one has maintained or cleaned up in the past.

Google Refine builds on its Gridworks roots by helping its users correct inconsistencies, change data formats, extend data sets with data from web sources and other databases, and much more. Refine also brings “a new extensions architecture, a reconciliation framework for linking records to other databases (like Freebase) and a ton of new transformation commands and expressions,” according to the official Google Open Source blog.
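To give a feel for the kind of inconsistency at stake, here is a minimal Python sketch of the problem and one possible cleanup. It is not Refine's own code or algorithm. The sample agency names and the helper functions `normalize_key` and `unify` are made up for illustration. The idea is simply to treat spellings that differ only in case, punctuation or spacing as the same value, then keep the most common spelling.

```python
import re
from collections import Counter, defaultdict

# Hypothetical sample column, as it might look in an unmaintained public spreadsheet.
rows = ["Dept. of Energy", "dept of energy", "Dept. of Energy ",
        "NASA", "N.A.S.A.", "NASA"]

def normalize_key(value: str) -> str:
    """Build a comparison key: trim, lowercase, drop punctuation, collapse spaces."""
    cleaned = re.sub(r"[^\w\s]", "", value.strip().lower())
    return re.sub(r"\s+", " ", cleaned)

def unify(values):
    """Replace each variant with the most common spelling that shares its key."""
    groups = defaultdict(Counter)
    for v in values:
        groups[normalize_key(v)][v.strip()] += 1
    canonical = {key: counts.most_common(1)[0][0] for key, counts in groups.items()}
    return [canonical[normalize_key(v)] for v in values]

cleaned = unify(rows)
print(len(set(rows)), "distinct raw values")       # variants sort and group separately
print(len(set(cleaned)), "distinct cleaned values")
print(sorted(set(cleaned)))
```

Before the cleanup, a sort or a count treats every variant as a separate entry. Afterwards, each agency appears under a single spelling. Real data sets usually need more judgment than this, which is where an interactive tool comes in.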

Here’s the first of three demo videos showing off Refine’s new and improved data-cleaning capabilities:

We can imagine this tool will allow non-programmers who deal with lots of data, including students and journalists doing research, to manipulate and sort data much more quickly.

What do you think of Google Refine so far?