Sometimes you wind up with a collection of files of different types and need to keep only those with certain extensions. The ./generic/extension function provides a convenient way to get the extension of a file name.
Obvious, but Inefficient
It's common to see this done with a foreach loop:
xy_files = [];

foreach [n = v] in my_files do {
    xy_files += if ./generic/extension(n) == "xy" then [$n = v] else [];
};
This is inefficient for a couple of reasons:
It makes a potentially large number of assignments, which must be un-wound at the return/value statement.
It uses +=, which handles name overlaps, even though there clearly can't be any.
Filtering with _map
A better choice is to use the _map primitive function.
/**nocache**/
filter_xy_files(n,v)
{
    return if ./generic/extension(n) == "xy" then [$n = v] else [];
};

xy_files = _map(filter_xy_files, my_files);
This avoids both of the inefficiencies mentioned above. Each call to filter_xy_files has its own scope, so no long chain of assignments builds up. The results of the function calls made by _map are combined in the same efficient way as _append, which doesn't allow for name conflicts.
(A good rule of thumb for performance is: Use _map first, and foreach only if you must.)
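To make the combining step concrete, here is a small sketch of what the _map call computes. The contents of my_files are invented for illustration, and the sketch assumes that ./generic/extension returns the extension without the leading dot, as the comparison with "xy" suggests.

```sdl
// Hypothetical input: the names and values are invented for this sketch.
my_files = [ a.xy = "A", b.c = "B", c.xy = "C" ];

/**nocache**/
filter_xy_files(n, v)
{
    return if ./generic/extension(n) == "xy" then [$n = v] else [];
};

// _map calls filter_xy_files once per entry and appends the results:
//   a.xy -> [ a.xy = "A" ]
//   b.c  -> []
//   c.xy -> [ c.xy = "C" ]
xy_files = _map(filter_xy_files, my_files);
// Under the assumptions above, xy_files is [ a.xy = "A", c.xy = "C" ].
```

Because every input name is unique and each call returns at most its own name, the appended results can never collide.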
Generic Versions
Of course, you probably don't want to keep writing little filter functions for _map, so let's write a re-usable function that selects the files with any given extension.
Simple Version
/**nocache**/
filter_by_extension(b: binding, ext: text)
{
    /**nocache**/
    filter_one(n,v)
    {
        return if ./generic/extension(n) == ext then [$n = v] else [];
    };
    return _map(filter_one, b);
};
With this, our example becomes a single call:
xy_files = filter_by_extension(my_files, "xy");
Multiple Extensions
What if you want to select files with multiple extensions? We can improve our function a little to support that:
/**nocache**/
filter_by_extension(b: binding, ext)
{
    /**nocache**/
    filter_one(n,v)
    {
        n_ext = ./generic/extension(n);
        return (if ((_is_text(ext) && (n_ext == ext)) ||
                    (_is_binding(ext) && ext!$n_ext))
                then [$n = v]
                else []);
    };
    return _map(filter_one, b);
};
Now we can get the files ending in "xy" and those ending in "yz" by passing a binding whose names are the extensions (only the names are tested, so the values are ignored):
xy_and_yz_files = filter_by_extension(my_files, [xy=1,yz=1]);
Also, this function can still take a single extension as a text value:
xy_files = filter_by_extension(my_files, "xy");
Inverting the Selection
What if you want to remove files with particular extensions instead of keeping only those files? With a little more work, our filtering function can do that too.
/**nocache**/
filter_by_extension(b: binding, ext, invert=FALSE)
{
    /**nocache**/
    filter_one(n,v)
    {
        n_ext = ./generic/extension(n);
        selected = ((_is_text(ext) && (n_ext == ext)) ||
                    (_is_binding(ext) && ext!$n_ext));
        return (if ((!invert && selected) ||
                    (invert && !selected))
                then [$n = v]
                else []);
    };
    return _map(filter_one, b);
};
Now we can remove those files ending in "xy" and those ending in "yz" with:
non_xy_non_yz_files = filter_by_extension(my_files, [xy=1,yz=1], TRUE);
Or remove just those ending in "xy" with:
non_xy_files = filter_by_extension(my_files, "xy", TRUE);
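As a quick check of all four cases, the sketch below runs the final three-argument filter_by_extension (assumed to be in scope as defined above) against a small made-up binding. The names and values in my_files are hypothetical, and the expected results assume that ./generic/extension returns the extension without the leading dot.

```sdl
// Hypothetical input: the names and values are invented for this sketch.
my_files = [ a.xy = "A", b.yz = "B", c.c = "C" ];

// Keep one extension (invert defaults to FALSE).
only_xy = filter_by_extension(my_files, "xy");
// expected: [ a.xy = "A" ]

// Keep several extensions; only the names in the binding matter.
xy_or_yz = filter_by_extension(my_files, [ xy = 1, yz = 1 ]);
// expected: [ a.xy = "A", b.yz = "B" ]

// Drop one extension.
not_xy = filter_by_extension(my_files, "xy", TRUE);
// expected: [ b.yz = "B", c.c = "C" ]

// Drop several extensions.
neither = filter_by_extension(my_files, [ xy = 1, yz = 1 ], TRUE);
// expected: [ c.c = "C" ]
```

In every case the surviving entries keep the order they had in my_files, since _map visits the binding in order.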