#7397 closed enhancement (duplicate)
Too many attachments (directories) in ENV/attachments/ticket/[ID] folder
| Reported by: | | Owned by: | |
---|---|---|---|
| Priority: | normal | Milestone: | |
| Component: | attachment | Version: | 0.11-stable |
| Severity: | major | Keywords: | |
| Cc: | pkou@… | Branch: | |
| Release Notes: | | | |
| API Changes: | | | |
| Internal Changes: | | | |
Description
Hi all. I have a problem saving attachments once there are more than 32765 tickets with attachments.
How to reproduce: create 32765 tickets, each with an attachment. The attachment on the next ticket will fail to save because of a file-system limitation: ENV/attachments/ticket/ already contains 32765 directories, which is the limit for most popular file systems (ext3, udf, …).
I think a good solution would look like this. Let [ID] be the ticket ID and [F_ID] the first (1-3) digits of [ID].
Attachments would then be saved as: ENV/attachments/ticket/[F_ID]/[ID]
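A minimal Python sketch of the [F_ID] layout proposed above, assuming [F_ID] means the first (up to) three decimal digits of the ticket ID. The helper names `f_id`, `prefix_attachment_dir` and `bucket_size` are hypothetical. It also makes one property of the scheme visible: the top level never holds more than 999 prefix directories (1-9, 10-99, 100-999), but a single bucket such as "123" collects tickets 123, 1230-1239, 12300-12399 and so on, so bucket sizes keep growing with the ID space.

```python
import os


def f_id(ticket_id: int) -> str:
    """Hypothetical helper: the first (up to) three digits of the ticket ID."""
    return str(ticket_id)[:3]


def prefix_attachment_dir(env: str, ticket_id: int) -> str:
    """Build ENV/attachments/ticket/[F_ID]/[ID] for a ticket (sketch only)."""
    return os.path.join(env, "attachments", "ticket", f_id(ticket_id), str(ticket_id))


def bucket_size(prefix: str, max_id: int) -> int:
    """Count ticket IDs in 1..max_id that would land in the given [F_ID] bucket."""
    return sum(1 for i in range(1, max_id + 1) if f_id(i) == prefix)


if __name__ == "__main__":
    print(prefix_attachment_dir("ENV", 12345))  # ENV/attachments/ticket/123/12345 on POSIX
    print(bucket_size("123", 1_000_000))        # 1111 = 1 + 10 + 100 + 1000
```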
Thanks.
Attachments (1)
Change History (15)
comment:1 by , 17 years ago
comment:3 by , 17 years ago
Currently I am using udf (FreeBSD), but ext3 has the same limit on "links". This is easy to reproduce.
Run this script (for example) in /tmp/test/:
perl -e 'for (1..32765) {print "$_\n"; mkdir $_;};'
After this, try: mkdir "blabla"; → Error (13) too many links.
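The same experiment can be written in Python, which makes it easy to see exactly where a given file system stops. This is a sketch: `count_subdirs_until_limit` is a hypothetical name, and it assumes the refusal surfaces as `errno.EMLINK` ("Too many links"). On a file system without such a limit it simply creates all `n` directories. Pass `dir=...` to `TemporaryDirectory` to run it on the file system you actually want to test.

```python
import errno
import os
import tempfile


def count_subdirs_until_limit(parent: str, n: int) -> int:
    """Create up to n numbered subdirectories in parent.

    Returns how many were created before the file system refused with
    EMLINK ("Too many links"); returns n if no limit was hit.
    """
    for i in range(1, n + 1):
        try:
            os.mkdir(os.path.join(parent, str(i)))
        except OSError as exc:
            if exc.errno == errno.EMLINK:
                return i - 1
            raise  # any other error is unexpected, so re-raise it
    return n


if __name__ == "__main__":
    # Scratch directory is removed again when the block exits.
    with tempfile.TemporaryDirectory() as scratch:
        print(count_subdirs_until_limit(scratch, 32766))
```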
comment:4 by , 17 years ago
Milestone: | → 0.11.2 |
---|
If possible, please implement this feature in Trac 0.11.2, because without it we can't use Trac in some of our projects. Thanks.
comment:5 by , 17 years ago
Cc: | added |
---|
A suggestion for how to solve it:
- For any attachment's parent ID, define a hash code:
hashcode = long(sha.new(str(attachment_parent_id)).hexdigest(), 16)
- For the hash code, define a bucket number:
bucket = int(hashcode % 31991)
- Target directory for an attachment:
ENV/attachments/att-realm/bucket/att-parent-id
So, basically, we split all parents into 31991 classes and store each class in a separate directory.
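A Python 3 rendering of the bucket scheme above, for current Python where the `sha` module and `long` no longer exist: `hashlib.sha1` stands in for `sha.new`, and the ID is encoded as UTF-8 before hashing (an assumption; the original snippet hashed a Python 2 `str`). The names `bucket_for` and `bucketed_attachment_dir` are hypothetical.

```python
import hashlib
import os

NUM_BUCKETS = 31991  # the prime proposed in this comment


def bucket_for(parent_id) -> int:
    """Map an attachment parent ID (ticket number, page name, ...) to a bucket."""
    digest = hashlib.sha1(str(parent_id).encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_BUCKETS


def bucketed_attachment_dir(env: str, realm: str, parent_id) -> str:
    """ENV/attachments/<realm>/<bucket>/<parent-id>, as proposed above."""
    return os.path.join(env, "attachments", realm,
                        str(bucket_for(parent_id)), str(parent_id))
```

Because the bucket depends only on the parent ID, it is stable across runs. Ticket 7397 and a wiki page named "7397" get the same bucket number but still different directories, since the realm is part of the path.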
The number 31991 is chosen using the following criteria:
- It shall be less than the maximum number of entries in a directory, i.e. less than 32000, which is the limit of the ext3 file system. Other file systems seem to have bigger limits, or no limits at all (verified for ext2, ext3, ufs, zfs, ntfs, fat16, fat32).
- It shall be as close as possible to $\sqrt{\text{MAX-TICKET-NUMBER}}$, i.e. close to $\lfloor\sqrt{2^{31}}\rfloor = 46340$.
- It shall be a prime number, so that all digits of the hash code are taken into account.
So, 31991 is the biggest prime number less than 32000; see http://primes.utm.edu/lists/small/10000.txt for reference.
Potentially, this allows up to 31991 × 32000 = 1,023,712,000 attachment parents (e.g. tickets with attachments) on an ext3 file system, or up to 31991 × 32765 = 1,048,185,115 on a ufs/ext2 file system.
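The arithmetic behind the choice of 31991 can be checked with a few lines of Python: plain trial division (adequate at this size) confirms it is the largest prime below 32000, `math.isqrt` gives the integer square root of 2^31 used as the target, and the product with the per-directory limit reproduces the capacity quoted above. The helper names are hypothetical.

```python
import math


def is_prime(n: int) -> bool:
    """Trial division; adequate for numbers of this size."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    f = 3
    while f * f <= n:
        if n % f == 0:
            return False
        f += 2
    return True


def largest_prime_below(limit: int) -> int:
    """Largest prime strictly less than limit."""
    for n in range(limit - 1, 1, -1):
        if is_prime(n):
            return n
    raise ValueError("no prime below %d" % limit)


if __name__ == "__main__":
    p = largest_prime_below(32000)
    print(p)                  # 31991
    print(math.isqrt(2**31))  # 46340
    print(p * 32000)          # 1023712000
```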
Question: if the proposal is okay, should it be implemented for all projects (i.e. with an environment upgrade script), or only for specific projects (i.e. keep the current approach by default and allow the new approach to be enabled for some projects)?
My view is that it should be the default for all projects, and existing attachments should be moved to the new structure during an environment upgrade.
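What such an upgrade step might do can be sketched in Python, reusing the hash-and-bucket rule from this comment. Assumptions, all hypothetical: the old layout is `<src_root>/<realm>/<parent-id>/`; the on-disk directory name is hashed as-is (a real upgrade script would have to undo any escaping Trac applies to directory names); and the new tree is written under a separate root, so numeric bucket names cannot collide with existing ticket-number directories. `migrate_attachments` is an illustrative name, not part of Trac.

```python
import hashlib
import os
import shutil
import tempfile

NUM_BUCKETS = 31991


def bucket_for(parent_id: str) -> int:
    digest = hashlib.sha1(parent_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_BUCKETS


def migrate_attachments(src_root: str, dst_root: str) -> int:
    """Move <src_root>/<realm>/<id>/ to <dst_root>/<realm>/<bucket>/<id>/.

    Returns the number of parent directories moved.
    """
    moved = 0
    for realm in sorted(os.listdir(src_root)):
        realm_dir = os.path.join(src_root, realm)
        if not os.path.isdir(realm_dir):
            continue
        for parent_id in sorted(os.listdir(realm_dir)):
            src = os.path.join(realm_dir, parent_id)
            if not os.path.isdir(src):
                continue
            dst_dir = os.path.join(dst_root, realm, str(bucket_for(parent_id)))
            os.makedirs(dst_dir, exist_ok=True)
            shutil.move(src, os.path.join(dst_dir, parent_id))
            moved += 1
    return moved


if __name__ == "__main__":
    # Tiny throwaway environment with one ticket and one wiki page.
    with tempfile.TemporaryDirectory() as env:
        src = os.path.join(env, "attachments")
        os.makedirs(os.path.join(src, "ticket", "1"))
        os.makedirs(os.path.join(src, "wiki", "WikiStart"))
        print(migrate_attachments(src, os.path.join(env, "attachments2")))  # 2
```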
comment:6 by , 17 years ago
Component: | ticket system → attachment |
---|---|
Milestone: | 0.11.2 → 2.0 |
Severity: | normal → major |
Type: | defect → enhancement |
I have mixed feelings about this issue. My first reaction was to say "simply choose a filesystem that matches the requirements".
But if there are existing installations that actually have more than 32000 subdirectories below ./attachments/ticket, the problem is real and should probably be addressed. However, this shouldn't be done at the cost of excessive complexity, nor by reducing the intuitiveness and usability of the current $TRAC_ENV/attachments layout.
Therefore, a good compromise would be a sharding scheme similar to the one used for Subversion 1.5 FSFS repositories. See http://www.farside.org.uk/200704/tree_structured_fsfs for details about the why and the how.
The additional complexity for Trac is that it (theoretically) has to handle sharding over alphanumeric names. Therefore, svn's scheme of 0/ → 0-999, 1/ → 1000-1999 doesn't seem appropriate here. We should find something else, with the following constraints:
- predictable: given an entity name, finding its location should be immediate and unambiguous
- unique: all the attachments for a given entity should be grouped in a single folder
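The difference between the two approaches fits in a few lines. FSFS-style sharding is integer division of the revision number by the shard size (1000 by default), which has no meaning for a wiki page name. A hash prefix satisfies both constraints above for any name: the location is computed from the name alone, and every attachment of that entity lands in the same folder. The function names and the prefix width are illustrative assumptions, not Trac code.

```python
import hashlib


def fsfs_style_shard(rev: int, shard_size: int = 1000) -> str:
    """Subversion 1.5 FSFS style: revisions 0-999 -> "0", 1000-1999 -> "1", ..."""
    return str(rev // shard_size)


def hashed_shard(name: str, width: int = 3) -> str:
    """Hypothetical name-based shard: the first `width` hex digits of SHA-1(name).

    Predictable: computed from the name alone, no lookup table needed.
    Unique: one entity always maps to one shard, so its attachments stay together.
    """
    return hashlib.sha1(name.encode("utf-8")).hexdigest()[:width]
```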
Also, like in the svn case, this new scheme should apply to newly created environments only, with a script for converting existing environments for those who need it.
For the milestone, I think it can be done as soon as 0.12, but I'm setting 2.0 for now (meaning nice to have, but not yet scheduled for a short-term release). Definitely not for a minor bugfix release.
by , 17 years ago
Attachment: | many-attachments.patch added |
---|
Preliminary patch - just to explain the idea
comment:7 by , 17 years ago
Please review the attached many-attachments.patch, which explains the idea. If it is okay, I'll clean it up.
It stores files in $ENV/attachments2/type/hash/id instead of $ENV/attachments/type/id, where hash is used to divide all attachments into groups.
Numeric sharding (as in Subversion) cannot be used for Trac because Trac has to handle alphanumeric names, such as wiki page names.
I tested the following hash functions: SHA1, MD5, Python's internal hash, and Knuth's string hash. On one billion entries, MD5 showed the best distribution: with 1,000,000,000 tickets with attachments, it kept every directory below 32000 entries.
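A scaled-down version of such a distribution test might look like the sketch below. It assumes the modulo-31991 bucketing from comment:5 (the patch itself is not reproduced here) and compares only MD5 and SHA-1 from `hashlib`. Python's built-in `hash()` is left out because string hashing is randomized per process in Python 3 unless `PYTHONHASHSEED` is fixed, which also makes it unsuitable for an on-disk layout. Helper names are hypothetical.

```python
import hashlib
from collections import Counter

NUM_BUCKETS = 31991


def make_bucket_fn(algorithm: str):
    """Return a function mapping a name to a bucket via the given hashlib algorithm."""
    def bucket(name: str) -> int:
        digest = hashlib.new(algorithm, name.encode("utf-8")).hexdigest()
        return int(digest, 16) % NUM_BUCKETS
    return bucket


def max_bucket_load(bucket_fn, n: int) -> int:
    """Largest number of IDs 1..n that land in any single bucket."""
    counts = Counter(bucket_fn(str(i)) for i in range(1, n + 1))
    return max(counts.values())


if __name__ == "__main__":
    for algo in ("md5", "sha1"):
        print(algo, max_bucket_load(make_bucket_fn(algo), 200_000))
```

For a well-spread hash the largest bucket stays close to n / 31991; the test described above applied the same idea at one billion entries.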
comment:8 by , 17 years ago
Another note: the proposed algorithm allows up to 1,000,000,000 attachments in tickets on an ext3 file system. If the maximum number of attachments can be limited to 100,000,000, there is a much simpler hash function:
md5.new(str(name)).hexdigest()[0:3]
With the simple hash function, the hash code can also be calculated easily in shell scripts:
echo -n NAME|md5sum|cut -c1-3
Testing shows that this allows up to 128,000,000 attachments on ext3.
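In Python 3 the one-liner becomes a `hashlib.md5` call, since the `md5` module is gone. The sketch assumes names are encoded as UTF-8, which is what `echo -n NAME | md5sum` hashes in a UTF-8 locale, so both produce the same three-character directory name. Three hex digits give 16^3 = 4096 possible directories, so at the 32000-entry ext3 limit the theoretical ceiling is 4096 × 32000 = 131,072,000, slightly above the tested figure because real buckets never fill perfectly evenly. `md5_prefix` is a hypothetical name.

```python
import hashlib


def md5_prefix(name, width: int = 3) -> str:
    """Python 3 equivalent of md5.new(str(name)).hexdigest()[0:3]."""
    return hashlib.md5(str(name).encode("utf-8")).hexdigest()[:width]


NUM_DIRS = 16 ** 3  # three hex digits -> 4096 possible directories

if __name__ == "__main__":
    print(md5_prefix(""))     # d41, same as: echo -n "" | md5sum | cut -c1-3
    print(md5_prefix("abc"))  # 900
    print(NUM_DIRS * 32000)   # 131072000
```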
comment:9 by , 15 years ago
http://groups.google.com/group/trac-users/browse_thread/thread/4fbc0dda6bafce88
I think keeping the simplicity of one folder per ticket number, with the attachments in it, is worth switching to an xfs or ext4 file system. We reached the ext3 limit in just 3 months. If this had been documented, we would have installed on an xfs file system from the start. Backing up and moving over is not such a big deal, so I think the better solution is to describe these limits in the production deployment documentation.
Our limit was 31999 on Debian with ext3. We are now running on xfs with no limit.
Thanks, Lucas
comment:11 by , 15 years ago
Milestone: | triaging → next-major-0.1X |
---|
I guess we'll tackle this when t.e.o hits 32000 tickets :)
comment:13 by , 11 years ago
Resolution: | → duplicate |
---|---|
Status: | new → closed |
After Trac 1.0, with #10313, the attachments directory structure was migrated to $ENV/files/attachments/realm/sha1(id)[:3]/sha1(id)/sha1(filename).ext.
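For reference, the layout described above can be sketched as follows. This illustration is based only on the path pattern in this comment, not on Trac's own code: it assumes the ID and file name are UTF-8 encoded before hashing and that `.ext` is the original file's extension as returned by `os.path.splitext`. `hashed_attachment_path` is a hypothetical name.

```python
import hashlib
import os


def _sha1_hex(text: str) -> str:
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def hashed_attachment_path(env: str, realm: str, parent_id: str, filename: str) -> str:
    """$ENV/files/attachments/<realm>/<sha1(id)[:3]>/<sha1(id)>/<sha1(filename)><.ext>"""
    id_hash = _sha1_hex(parent_id)
    ext = os.path.splitext(filename)[1]  # keeps the leading dot, e.g. ".patch"
    return os.path.join(env, "files", "attachments", realm,
                        id_hash[:3], id_hash, _sha1_hex(filename) + ext)
```

Because the three-character prefix has 4096 possible values, each realm directory holds at most 4096 subdirectories, however many tickets or pages exist.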
comment:14 by , 11 years ago
Milestone: | next-major-releases |
---|
What operating system and filesystem are you using that has this limitation?