---
title: Processing Files In Place With Groovy | Mindful Mischief
tags:
- IT/Development/Groovy
---
I recently needed to process a bunch of files and directory structures emitted by a generation process that I didn't have control over. The basic needs were to delete some files, delete some directories, and modify selected content in some of the files. This is a straightforward, trivial thing to do, but I was looking for a solution that was both cross-platform and very quick to develop. I also had to deal with a bit of XML and wanted an easy way to parse and modify XML documents in a natural way. This post shows how to do this using a Groovy script. I've been using Groovy for testing Java code and other one-off tasks for a couple of years now, but it still surprises me how fast you can accomplish certain tasks while keeping your code readable and at a relatively high level of abstraction, especially compared to shell scripts, sed, and awk.
In a nutshell, the following method will process a file in place.
```groovy
def processFileInplace(file, Closure processText) {
    // Read the whole file, hand the text to the closure, write the result back
    def text = file.text
    file.write(processText(text))
}
```
If you know Groovy, this probably doesn't require any additional explanation. But if you don't know Groovy, don't worry. The remainder of this post shows examples of using this method and touches on a few details regarding file and directory deletion.
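To see the method in action without touching real files, here is a minimal sketch that runs it against a throwaway temp file. The file name prefix and the sample text are made up for illustration.

```groovy
// Same two-line method as above, repeated so this snippet runs on its own
def processFileInplace(file, Closure processText) {
    def text = file.text
    file.write(processText(text))
}

// Hypothetical scratch file; any writable text file would do
def scratch = File.createTempFile('inplace-demo', '.txt')
scratch.deleteOnExit()
scratch.text = 'hello groovy'

// The closure receives the old contents and returns the new contents
processFileInplace(scratch) { text -> text.toUpperCase() }

println scratch.text
```

Whatever the closure returns becomes the new file content, so the closure is the only part that changes from one task to the next.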
# Deleting Files and Directories
I hesitated to write anything about deleting files and recursively deleting directory structures because it is so trivial and the Groovy documentation on Files has a huge number of examples. That being said, I wanted the script to read as cleanly as possible for people maintaining it in the future who might not know Groovy. There are a couple of ways to recursively delete directories, and the syntax isn't consistent with how you delete a File. So I simply created two methods to encapsulate these operations and make the script read more cleanly.
```groovy
def deleteDirectory(directory) {
    // Ant's delete task removes the directory and everything below it
    new AntBuilder().delete(dir: directory)
}

def deleteFile(file) {
    file.delete()
}
```
Another, more Groovy-like (i.e. uses closures) option for recursively deleting directories is shown here.
Using those methods, file and directory deletion is consistent and clean.
```groovy
basedir = ...
deleteDirectory(new File(basedir + ".settings"))
deleteFile(new File(basedir + ".project"))
deleteFile(new File(basedir + ".classpath"))
```
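If pulling in AntBuilder feels heavy, the GDK also adds a deleteDir() method to java.io.File that removes a directory and everything below it, returning a boolean. The sketch below tries it on a temp directory; the directory layout and file names are invented for the demo.

```groovy
import java.nio.file.Files

// Hypothetical layout mimicking generated IDE metadata
def base = Files.createTempDirectory('delete-demo').toFile()
def settings = new File(base, '.settings')
settings.mkdirs()
new File(settings, 'example.prefs').text = 'key=value'
def project = new File(base, '.project')
project.text = '<projectDescription/>'

// deleteDir() is the GDK's recursive delete; delete() handles the single file
def settingsRemoved = settings.deleteDir()
def projectRemoved = project.delete()

println "settings removed: ${settingsRemoved}, project removed: ${projectRemoved}"

// Clean up the scratch directory itself
base.deleteDir()
```

Both calls report failure through their return value instead of an exception, so a real script may want to check the result.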
# Processing Text Files In Place
Processing text files in place can be done with a simple method that takes the file to be modified and a closure that performs the modifications. This is the method I introduced above.
```groovy
def processFileInplace(file, Closure processText) {
    def text = file.text
    file.write(processText(text))
}
```
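One thing the two-line version leaves implicit is the character encoding: file.text and file.write(...) use the platform default. When the encoding is known, the GDK's getText(charset) and write(text, charset) overloads can be used instead. The variant below is only a sketch; the method name processFileInplaceWithCharset and the sample file are hypothetical.

```groovy
// Hypothetical variant of processFileInplace with an explicit charset parameter
def processFileInplaceWithCharset(File file, String charset, Closure processText) {
    def text = file.getText(charset)
    file.write(processText(text) as String, charset)
}

def memo = File.createTempFile('charset-demo', '.txt')
memo.deleteOnExit()
// \u00e9 is e-acute, written as an escape so the script's own encoding does not matter
memo.write('Caf\u00e9 notes for The Old Project', 'UTF-8')

processFileInplaceWithCharset(memo, 'UTF-8') { text ->
    text.replace('The Old Project', 'My New Project')
}

println memo.getText('UTF-8')
```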
The closure can be arbitrarily complex as long as it returns the String value of what you want the file contents to look like after modification. For example, you can use any of the basic Java or the expanded Groovy string methods and utilities.
```groovy
projectName = 'My New Project'
overview = new File(basedir + "overview.txt")
processFileInplace(overview) { text ->
    // Pass the variable itself; a single-quoted '${projectName}' is never interpolated
    text.replaceAll(/The Old Project/, projectName)
}
```
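Two details are easy to trip over when the replacement text comes from a variable: single-quoted Groovy strings never interpolate ${...}, and replaceAll() gives $ and backslash a special meaning in the replacement. The sketch below uses made-up values to show both, with Matcher.quoteReplacement as one way to keep a replacement literal.

```groovy
import java.util.regex.Matcher

// Hypothetical values chosen to show the pitfall
def projectName = 'Project $1 (renamed)'
def text = 'Welcome to The Old Project.'

// replace() treats both arguments as literal text
def viaReplace = text.replace('The Old Project', projectName)

// replaceAll() interprets $ in the replacement as a group reference, so quote it first
def viaReplaceAll = text.replaceAll(/The Old Project/, Matcher.quoteReplacement(projectName))

// Single quotes keep ${...} as plain characters; double quotes interpolate
def single = 'Hello ${projectName}'
def interpolated = "Hello ${projectName}"

println viaReplace
println single
println interpolated
```

When no regular expression is needed, plain replace() is usually the simpler choice.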
You are by no means limited to a single operation within the closure.
```groovy
projectName = 'My New Project'
def today = Calendar.getInstance()
def todayFormatted = String.format('%tY/%<tm/%<td', today)
howTo = new File(basedir + "how-to.txt")
processFileInplace(howTo) { text ->
    text = text.replaceAll(/The Old Project/, projectName)
    // replace() matches literally, so plain quoted strings avoid escaping the slashes
    text = text.replace('<name>Legacy System 1982</name>', '<name>New and Improved</name>')
    // Double quotes are required for ${...} interpolation
    text.replace('<date>1982/10/12</date>', "<date>${todayFormatted}</date>")
}
```
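Because the closure only has to return a String, it can also break the text into lines, filter or transform them as a list, and join them back together. A minimal sketch, assuming a made-up notes file in which lines starting with # are comments to drop:

```groovy
def processFileInplace(file, Closure processText) {
    def text = file.text
    file.write(processText(text))
}

// Hypothetical input file with two comment lines and two entries
def notes = File.createTempFile('lines-demo', '.txt')
notes.deleteOnExit()
notes.text = '# generated header\nfirst entry\n# internal remark\nsecond entry\n'

processFileInplace(notes) { text ->
    // Keep only the lines that are not comments
    def kept = text.readLines().findAll { !it.startsWith('#') }
    // Trim each remaining line and restore the trailing newline
    kept.collect { it.trim() }.join('\n') + '\n'
}

print notes.text
```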
# Bring on XML
Unfortunately, XML can often be much more difficult to deal with than it should be. Groovy has some great utilities for simplifying XML processing; details can be found on the Groovy web site. I'll combine a couple of these in the following examples.
## Example XML Document
Below is an XML document that will be used in the examples.
```xml
<?xml version="1.0" encoding="UTF-8"?>
<CustomerManagement>
    <MetaData>
    </MetaData>
    <Customers>
        <Customer name="Kermit The Frog">
            <HelpDeskCalls>
                <Call Id="1">
                    <Status Id="In Progress"/>
                </Call>
                <Call Id="2">
                    <Status Id="Completed"/>
                </Call>
                <Call Id="3">
                    <Status Id="UnableToResolve"/>
                </Call>
            </HelpDeskCalls>
        </Customer>
        <Customer name="Fozzie Bear">
            <HelpDeskCalls>
                <Call Id="5">
                    <Status Id="Completed"/>
                </Call>
            </HelpDeskCalls>
        </Customer>
    </Customers>
</CustomerManagement>
```
## Removing Content From XML
Given the example XML document, say we wanted to remove every Call under HelpDeskCalls for Kermit the Frog whose Status is "UnableToResolve".
```groovy
customers = new File(basedir + "important-customers.xml")
processFileInplace(customers) { text ->
    customerManagement = new XmlSlurper().parseText(text)
    customer = customerManagement.Customers.Customer.find { it.@name.text().contains('Kermit') }
    customer.HelpDeskCalls.Call.findAll { it.Status.@Id.text().equals('UnableToResolve') }.replaceNode {}
    serializeXml(customerManagement)
}
```
So what is each line in the closure doing?
1. Parses the text in the file using an XmlSlurper.
2. Finds the first Customer whose name contains Kermit.
3. Removes Call elements from Kermit where the Status of the Call is equal to UnableToResolve.
4. Calls a method that serializes the XML as a string.
The second and third lines use GPath to query the XML. GPath provides a consistent expression language over both Groovy/Java POJOs and XML. The serializeXml() method is a short, custom method that turns the GPathResult created by the XmlSlurper back into a String. This method requires that you import a few classes.
```groovy
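import groovy.xml.XmlUtil
import groovy.xml.StreamingMarkupBuilder
// Newer Groovy releases provide this class under groovy.xml.slurpersupport instead
import groovy.util.slurpersupport.GPathResult
// ...
String serializeXml(GPathResult xml) {
    // Re-emit the lazily modified tree through a builder, then pretty-print it
    XmlUtil.serialize(new StreamingMarkupBuilder().bind {
        mkp.yield xml
    })
}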
```
Refer to the link above on Groovy XML Processing for more details on this. It uses StreamingMarkupBuilder and XmlUtil to turn the XML back into a String.
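The removal can also be tried on an in-memory string before pointing it at a real file. The sketch below assumes a Groovy version where XmlSlurper resolves without an import, as in the examples above (on versions where it only lives in groovy.xml, an import groovy.xml.XmlSlurper line would be needed). The trimmed-down document is invented for the demo.

```groovy
import groovy.xml.StreamingMarkupBuilder
import groovy.xml.XmlUtil

// Hypothetical, shortened version of the example document
def xml = '''<CustomerManagement>
  <Customers>
    <Customer name="Kermit The Frog">
      <HelpDeskCalls>
        <Call Id="1"><Status Id="In Progress"/></Call>
        <Call Id="3"><Status Id="UnableToResolve"/></Call>
      </HelpDeskCalls>
    </Customer>
  </Customers>
</CustomerManagement>'''

def root = new XmlSlurper().parseText(xml)
def kermit = root.Customers.Customer.find { it.@name.text().contains('Kermit') }
kermit.HelpDeskCalls.Call.findAll { it.Status.@Id.text() == 'UnableToResolve' }.replaceNode {}

// XmlSlurper applies edits lazily, so serialize and re-parse to inspect the result
def result = XmlUtil.serialize(new StreamingMarkupBuilder().bind { mkp.yield root })
def remaining = new XmlSlurper().parseText(result).Customers.Customer.HelpDeskCalls.Call

println "remaining calls: ${remaining.size()}"
```

Re-parsing the serialized text is the step that makes the change visible, since the original parsed tree is not updated in place.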
## Adding Content To XML
Say you wanted to add some CustomerManagers to the MetaData section of our example XML file. You can do that again using the XmlSlurper and our processFileInplace method.
```groovy
customers = new File(basedir + "important-customers.xml")
processFileInplace(customers) { text ->
    customerManagement = new XmlSlurper().parseText(text)
    customerManagement.MetaData.appendNode {
        CustomerManagers {
            Manager(Name: "Animal")
            Manager(Name: "Swedish Chef")
            Manager(Name: "Gonzo")
        }
    }
    serializeXml(customerManagement)
}
```
The above results in the important-customers.xml file now containing
```xml
<?xml version="1.0" encoding="UTF-8"?>
<CustomerManagement>
    <MetaData>
        <CustomerManagers>
            <Manager Name="Animal"/>
            <Manager Name="Swedish Chef"/>
            <Manager Name="Gonzo"/>
        </CustomerManagers>
    </MetaData>
    <Customers>
        <Customer name="Kermit The Frog">
            <HelpDeskCalls>
                <Call Id="1">
                    <Status Id="In Progress"/>
                </Call>
                <Call Id="2">
                    <Status Id="Completed"/>
                </Call>
                <Call Id="3">
                    <Status Id="UnableToResolve"/>
                </Call>
            </HelpDeskCalls>
        </Customer>
        <Customer name="Fozzie Bear">
            <HelpDeskCalls>
                <Call Id="5">
                    <Status Id="Completed"/>
                </Call>
            </HelpDeskCalls>
        </Customer>
    </Customers>
</CustomerManagement>
```
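The append can be checked the same way on an in-memory string. This is a minimal sketch under the same assumption that XmlSlurper resolves without an import; the skeleton document and the shortened manager list are made up for the demo.

```groovy
import groovy.xml.StreamingMarkupBuilder
import groovy.xml.XmlUtil

// Hypothetical skeleton with an empty MetaData element
def source = '<CustomerManagement><MetaData/><Customers/></CustomerManagement>'
def doc = new XmlSlurper().parseText(source)

// The closure is evaluated as builder markup and appended under MetaData
doc.MetaData.appendNode {
    CustomerManagers {
        Manager(Name: 'Animal')
        Manager(Name: 'Gonzo')
    }
}

// Serialize, then re-parse to look at the appended nodes
def output = XmlUtil.serialize(new StreamingMarkupBuilder().bind { mkp.yield doc })
def managers = new XmlSlurper().parseText(output).MetaData.CustomerManagers.Manager

println managers.collect { it.@Name.text() }
```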
# A Full Example
The above examples can be combined into a full script, which you can of course restructure to suit your needs. The script can be executed directly from the command line; it expects the base directory as its only argument.
```shell
./processFiles.groovy <base-directory>
```
```groovy
#!/usr/bin/env groovy
import groovy.xml.XmlUtil
import groovy.xml.StreamingMarkupBuilder
import groovy.util.slurpersupport.GPathResult

if (args.length < 1) {
    println "You must provide a base directory as an argument to the script."
    System.exit(1)
}
basedir = args[0] + "/"
println "Current working path: " + new File(".").getAbsolutePath()

def processFileInplace(file, Closure processText) {
    def text = file.text
    file.write(processText(text))
}

def deleteDirectory(directory) {
    new AntBuilder().delete(dir: directory)
}

def deleteFile(file) {
    file.delete()
}

String serializeXml(GPathResult xml) {
    XmlUtil.serialize(new StreamingMarkupBuilder().bind {
        mkp.yield xml
    })
}

projectName = 'My New Project'
def today = Calendar.getInstance()
def todayFormatted = String.format('%tY/%<tm/%<td', today)

deleteDirectory(new File(basedir + ".settings"))
deleteFile(new File(basedir + ".project"))
deleteFile(new File(basedir + ".classpath"))

overview = new File(basedir + "overview.txt")
processFileInplace(overview) { text ->
    text.replaceAll(/The Old Project/, projectName)
}

howTo = new File(basedir + "how-to.txt")
processFileInplace(howTo) { text ->
    text = text.replaceAll(/The Old Project/, projectName)
    text = text.replace('<name>Legacy System 1982</name>', '<name>New and Improved</name>')
    text.replace('<date>1982/10/12</date>', "<date>${todayFormatted}</date>")
}

// Delete anything that makes our call center people look bad :-0
customers = new File(basedir + "important-customers.xml")
processFileInplace(customers) { text ->
    customerManagement = new XmlSlurper().parseText(text)
    customer = customerManagement.Customers.Customer.find { it.@name.text().contains('Kermit') }
    customer.HelpDeskCalls.Call.findAll { it.Status.@Id.text().equals('UnableToResolve') }.replaceNode {}
    serializeXml(customerManagement)
}

// Add information on the CustomerManagers
customers = new File(basedir + "important-customers.xml")
processFileInplace(customers) { text ->
    customerManagement = new XmlSlurper().parseText(text)
    customerManagement.MetaData.appendNode {
        CustomerManagers {
            Manager(Name: "Animal")
            Manager(Name: "Swedish Chef")
            Manager(Name: "Gonzo")
        }
    }
    serializeXml(customerManagement)
}
```
# Conclusion
Processing files in place using Groovy is both easy and powerful. A single two-line method provides the basis for modifying the content of a file in any way you may need.