Find Filegroups

A function that finds files that belong together (i.e., differ only by file extension) in different directories and collects them in a Python dictionary for further processing tasks.

from mlxtend.file_io import find_filegroups

Overview

This function finds files that are related to each other based on their file names. This can be useful for parsing collections of files that have been stored in different subdirectories, for example:

input_dir/
    task01.txt
    task02.txt
    ...
log_dir/
    task01.log
    task02.log
    ...
output_dir/
    task01.dat
    task02.dat
    ...
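
To make the grouping idea concrete, here is a minimal standard-library sketch. It is not mlxtend's implementation: the helper name group_by_stem is hypothetical, and it assumes that keys are taken from the base names in the first directory and that files in the other directories match on the same base name. It builds a temporary copy of the layout above, with two tasks per directory, so that it runs anywhere.

```python
import os
import tempfile


def group_by_stem(paths, substring=""):
    """Hypothetical helper (illustration only, not mlxtend's code).

    Groups files from several directories by base name without extension.
    Keys are taken from the first directory in `paths`.
    """
    groups = {}
    # Seed the dictionary with the files found in the first directory.
    for name in sorted(os.listdir(paths[0])):
        stem, _ext = os.path.splitext(name)
        if substring in name and not name.startswith("."):
            groups[stem] = [os.path.join(paths[0], name)]
    # Append files from the remaining directories that share a base name.
    for directory in paths[1:]:
        for name in sorted(os.listdir(directory)):
            stem, _ext = os.path.splitext(name)
            if stem in groups:
                groups[stem].append(os.path.join(directory, name))
    return groups


# Recreate a small version of the layout above in a temporary directory.
with tempfile.TemporaryDirectory() as root:
    layout = {"input_dir": ".txt", "log_dir": ".log", "output_dir": ".dat"}
    dirs = []
    for dirname, ext in layout.items():
        path = os.path.join(root, dirname)
        os.makedirs(path)
        for task in ("task01", "task02"):
            open(os.path.join(path, task + ext), "w").close()
        dirs.append(path)

    groups = group_by_stem(dirs, substring="task")
    # Keep only the file names so the result is independent of the temp path.
    group_names = {
        key: [os.path.basename(p) for p in files]
        for key, files in groups.items()
    }

print(group_names)
```

Each key maps to one file per directory, in the order in which the directories were passed.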

Example

Given the following directory and file structure

dir_1/
    file_1.log
    file_2.log
    file_3.log
dir_2/
    file_1.csv
    file_2.csv
    file_3.csv
dir_3/
    file_1.txt
    file_2.txt
    file_3.txt

we can use find_filegroups to group related files as dictionary items, keyed by their shared base name, as shown below:

from mlxtend.file_io import find_filegroups

find_filegroups(paths=['./data_find_filegroups/dir_1', 
                       './data_find_filegroups/dir_2', 
                       './data_find_filegroups/dir_3'], 
                substring='file_')
{'file_1': ['./data_find_filegroups/dir_1/file_1.log',
  './data_find_filegroups/dir_2/file_1.csv',
  './data_find_filegroups/dir_3/file_1.txt'],
 'file_2': ['./data_find_filegroups/dir_1/file_2.log',
  './data_find_filegroups/dir_2/file_2.csv',
  './data_find_filegroups/dir_3/file_2.txt'],
 'file_3': ['./data_find_filegroups/dir_1/file_3.log',
  './data_find_filegroups/dir_2/file_3.csv',
  './data_find_filegroups/dir_3/file_3.txt']}
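
The returned dictionary can then drive further processing. The following is a brief sketch of one way to do that, assuming the result has been bound to a variable named groups (a name chosen here for illustration). Two of the three entries from the output above are reproduced as a literal so the snippet runs without the files on disk.

```python
import os

# Literal copy of part of the output above; in practice this would be
# the return value of find_filegroups.
groups = {
    "file_1": [
        "./data_find_filegroups/dir_1/file_1.log",
        "./data_find_filegroups/dir_2/file_1.csv",
        "./data_find_filegroups/dir_3/file_1.txt",
    ],
    "file_2": [
        "./data_find_filegroups/dir_1/file_2.log",
        "./data_find_filegroups/dir_2/file_2.csv",
        "./data_find_filegroups/dir_3/file_2.txt",
    ],
}

# Index each group by file extension so a specific file can be looked up
# by type rather than by its position in the list.
by_extension = {}
for key, file_paths in groups.items():
    by_extension[key] = {os.path.splitext(p)[1]: p for p in file_paths}

for key in sorted(by_extension):
    print(key, "->", by_extension[key][".csv"])
```

Indexing by extension avoids relying on list positions, which follow the order of the directories passed in paths.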

API

find_filegroups(paths, substring='', extensions=None, validity_check=True, ignore_invisible=True, rstrip='', ignore_substring=None)

Find and collect files from different directories in a Python dictionary.

Parameters

Returns

Examples

For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/file_io/find_filegroups/