Edgewall Software

Opened 10 years ago

Closed 10 years ago

Last modified 8 years ago

#10313 closed defect (fixed)

can't attache to the ticket filen with long name on native language (like russian). — at Version 54

Reported by: slevin@… Owned by:
Priority: normal Milestone: 1.0
Component: attachment Version: 0.12.2
Severity: normal Keywords: long filename
Cc: osimons, bebugz@…, Jun Omae Branch:
Release Notes:

Uses SHA-1 instead of URL-encode for the physical file name of attachments

API Changes:
Internal Changes:

Description

When I try to attache file with long name on russian language I get error:

fx = open(part_file, 'wb')
201	IOError: [Errno 36] File name too long: '/tmp/%D0%94%D0%BE%D0%B3%D0%BE%D0%B2%D0%BE%D1%80%20%D0%BE%20%D0%BF%D1%80%D0%B5%D0%B4%D0%BE%D1%81%D1%82%D0%B0%D0%B2%D0%BB%D0%B5%D0%BD%D0%B8%D0%B8%20%D1%83%D1%81%D0%BB%D1%83%D0%B3%20%D1%82%D0%B5%D0%BB%D0%B5%D0%BA%D0%BE%D0%BC%D0%BC%D1%83%D0%BD%D0%B8%D0%BA%D0%B0%D1%86%D0%B8%D0%B9.doc'

That is result convert utf-8 filename 182 characters lenght to URL-encode filename, after that new lenght of that file equal 480 characters, that more then limit fo filename lenght for linux system. How it can be fixed?

PS: previously, when we're get the problem with long filename in utf-8 charset wich sended via email, we arehave a long disscussion with author of email2trac plugin here: https://subtrac.sara.nl/oss/email2trac/ticket/247, also that ticket contained attachments for reproducing error.

PPS: our environment - gentoo + python 2.5.2 + trac 0.12.2.

Change History (55)

comment:1 by Christian Boos, 10 years ago

Milestone: next-major-0.1X
Priority: highnormal

You're right - I was at first hoping for a quick answer like "use …fs which doesn't have this silly limit"… but it looks like most FS have such kind of limit (wikipedia:Comparison_of_file_systems#Limits).

So we could indeed try to find a naming scheme which allows us to by-pass this limit (to be coupled with the sharding scheme discussed in #7397).

comment:2 by Remy Blank, 10 years ago

Well… The obvious fix is to use shorter file names, isn't it?

comment:3 by Christian Boos, 10 years ago

Sure, but I wouldn't call that a fix, just a workaround ;-)

The point made by the OP was that it was relatively easy to come up with a "not that long" unicode file name that can't be uploaded as we currently turn it into a too long file name, after expanding to UTF-8, then URL-quoting it.

>>> from urllib import unquote
>>> filename = '%D0%94%D0%BE%D0%B3%D0%BE%D0%B2%D0%BE%D1%80%20%D0%BE%20%D0%BF%D1%80%D0%B5%D0%B4%D0%BE%D1%81%D1%82%D0%B0%D0%B2%D0%BB%D0%B5%D0%BD%D0%B8%D0%B8%20%D1%83%D1%81%D0%BB%D1%83%D0%B3%20%D1%82%D0%B5%D0%BB%D0%B5%D0%BA%D0%BE%D0%BC%D0%BC%D1%83%D0%BD%D0%B8%D0%BA%D0%B0%D1%86%D0%B8%D0%B9.doc'
>>> utf8_filename = unquote(filename)
>>> print utf8_filename
Договор о предоставлении услуг телекоммуникаций.doc
>>> unicode_filename = utf8_filename.decode('utf-8')
>>> len(filename), len(unicode_filename)
(274, 51)
Last edited 10 years ago by Christian Boos (previous) (diff)

comment:4 by anonymous, 10 years ago

It's un-re-al - can't to explain windows users for use english characters for filenames. :( Usually situation when bookkeepers make file with lenght nearly 30 chracters and on native language.

comment:5 by Remy Blank, 10 years ago

I did understand the issue, I was just proposing an immediate fix (one that didn't require waiting for 0.15 or so).

in reply to:  1 comment:6 by slevin@…, 10 years ago

Replying to cboos:

You're right - I was at first hoping for a quick answer like "use …fs which doesn't have this silly limit"… but it looks like most FS have such kind of limit (wikipedia:Comparison_of_file_systems#Limits).

you mean Raiser4 ??? In my example native filename equal 182 chars. After URL-encoding lenght equal 480 chars… for fs which supported by linux kernel almost have a limit filename to 255 chars. :(

So we could indeed try to find a naming scheme which allows us to by-pass this limit (to be coupled with the sharding scheme discussed in #7397).

hmmm… may be… may be you can try other encoding, or use scheme without encoding before store attachment body into /tmp ? Is it possible?

comment:7 by anonymous, 10 years ago

and I'm really support the idea by alexey.lustin (#7554): May be we need an option on trac.ini something like this

...
[attachement]
normalize_attach_dir = false/true
...

and some additional:

...
[attachement]
normalize_attach_dir = false/true
normalize_attach_filename = false/true
...

?

comment:8 by Jun Omae, 10 years ago

The issue often occurs with CJK users. Because the most URL-quoted one Japanese character is 9 bytes. The physical path of a attachment is very long.

I want to solve it in Trac 0.13.

>>> from trac.util.text import unicode_quote
>>> myname = u'大前潤'
>>> unicode_quote(myname)
'%E5%A4%A7%E5%89%8D%E6%BD%A4'
>>> len(unicode_quote(myname))
27

I suggest that physical filepath of a attachment is based on md5(parent_id) and md5(filepath).

  • trac/attachment.py

    diff --git a/trac/attachment.py b/trac/attachment.py
    index a89829d..c23abba 100644
    a b from trac.perm import PermissionError, IPermissionPolicy  
    3838from trac.resource import *
    3939from trac.search import search_to_sql, shorten_result
    4040from trac.util import content_disposition, create_unique_file, get_reporter_id
     41from trac.util.compat import md5
    4142from trac.util.datefmt import format_datetime, from_utimestamp, \
    4243                              to_datetime, to_utimestamp, utc
    4344from trac.util.text import exception_to_unicode, pretty_size, print_table, \
    class Attachment(object):  
    166167
    167168    def _get_path(self, parent_realm, parent_id, filename):
    168169        path = os.path.join(self.env.path, 'attachments', parent_realm,
    169                             unicode_quote(parent_id))
     170                            md5(parent_id.encode('utf-8').hexdigest()))
    170171        if filename:
    171             path = os.path.join(path, unicode_quote(filename))
     172            path = os.path.join(path,
     173                                md5(filename.encode('utf-8')).hexdigest())
    172174        return os.path.normpath(path)
    173175
    174176    @property

in reply to:  8 ; comment:9 by Christian Boos, 10 years ago

Replying to jomae:

The issue often occurs with CJK users. Because the most URL-quoted one Japanese character is 9 bytes. The physical path of a attachment is very long.

I want to solve it in Trac 0.13.

Fine by me!

I suggest that physical filepath of a attachment is based on md5(parent_id) and md5(filepath).

… but perhaps only if len(unicode_quote(filename)) > 254. That way we fix the current problem without any backward compatibility issues.

comment:10 by slevin@…, 10 years ago

2Jomae: Damn! You're my hero! Very nice solution, exellent solution I guess.

2cboos: do not rephrase that - it's solution for any files with filenames on any language. I guess this patch must be commit to the main code of core trac.

in reply to:  9 ; comment:11 by Jun Omae, 10 years ago

Replying to cboos:

I suggest that physical filepath of a attachment is based on md5(parent_id) and md5(filepath).

… but perhaps only if len(unicode_quote(filename)) > 254.

parent_id and filename should always been hashed using md5().

On Windows, Max path length (not filename) is 260 characters. parent_id is PageName/SubName/あいう/... if parent_realm is wiki. The filename limitation of attachment for wiki page is more short than 254.

That way we fix the current problem without any backward compatibility issues.

I think that we create new upgrade file which renames a hashed path from a current path of attachments.

in reply to:  11 comment:12 by Remy Blank, 10 years ago

Replying to jomae:

parent_id and filename should always been hashed using md5().

Call me paranoid, but I would rather use SHA-1. It's just as fast, and has a lower chance of collision (think "malicious attacker").

I think that we create new upgrade file which renames a hashed path from a current path of attachments.

It's not a database update, so I wouldn't use a db*.py file. Instead, create a new component in attachments.py that implements IEnvironmentSetupParticipant, that checks for the existence of the attachments directory in the environment in environment_needs_upgrade(), and requests an upgrade if it exists. To upgrade, walk through all files below attachments and move them to a new hierarchy in attachments2 (or whatever name makes sense), ideally also implementing the sharding suggested in #7397. When all files have been moved, remove the attachments directory. This ensures that an interrupted upgrade will resume correctly.

Oh, and #7554 seems to be a duplicate of this ticket.

comment:13 by anonymous, 10 years ago

2Jomae: which version of python u're using?

I get error: AttributeError: 'str' object has no attribute 'hexdigest'
Файл "/usr/lib64/python2.5/site-packages/trac/web/main.py", строка 511, в _dispatch_request
  dispatcher.dispatch(req)
Файл "/usr/lib64/python2.5/site-packages/trac/web/main.py", строка 237, в dispatch
  resp = chosen_handler.process_request(req)
Файл "/usr/lib64/python2.5/site-packages/trac/attachment.py", строка 449, в process_request
  self._do_save(req, attachment)
Файл "/usr/lib64/python2.5/site-packages/trac/attachment.py", строка 685, в _do_save
  attachment.insert(filename, upload.file, size)
Файл "/usr/lib64/python2.5/site-packages/trac/attachment.py", строка 256, в insert
  commonprefix = os.path.commonprefix([attachments_dir, self.path])
Файл "/usr/lib64/python2.5/site-packages/trac/attachment.py", строка 172, в path
  return self._get_path(self.parent_realm, self.parent_id, self.filename)
Файл "/usr/lib64/python2.5/site-packages/trac/attachment.py", строка 164, в _get_path
  md5(parent_id.encode('utf-8').hexdigest()))

in reply to:  13 comment:14 by Jun Omae, 10 years ago

Sorry, the patch is untested and has a compatibilty problem. The complete patch will perhaps be larger than it.

The following patch (untested) maybe solves your AttributeError.

  • trac/attachment.py

    old new  
    167167
    168168    def _get_path(self, parent_realm, parent_id, filename):
    169169        path = os.path.join(self.env.path, 'attachments', parent_realm,
    170                             md5(parent_id.encode('utf-8').hexdigest()))
     170                            md5(parent_id.encode('utf-8')).hexdigest())
    171171        if filename:
    172172            path = os.path.join(path,
    173173                                md5(filename.encode('utf-8')).hexdigest())
Last edited 10 years ago by Jun Omae (previous) (diff)

comment:15 by Remy Blank, 10 years ago

@slevin: You shouldn't be using this patch on a production system, it will break all your existing attachments. jomae only provided it here as a proof of concept.

comment:16 by Jun Omae, 10 years ago

Replying to rblank:

Call me paranoid, but I would rather use SHA-1. It's just as fast, and has a lower chance of collision (think "malicious attacker").

Ok, we use SHA-1.

I think that we create new upgrade file which renames a hashed path from a current path of attachments.

It's not a database update, so I wouldn't use a db*.py file. Instead, create a new component in attachments.py that implements IEnvironmentSetupParticipant, ….

Oh, your suggestion is best. I forgot IEnvironmentSetupParticipant….

To upgrade, walk through all files below attachments and move them to a new hierarchy in attachments2 (or whatever name makes sense), ideally also implementing the sharding suggested in #7397.

I just read #7393. I think better that a realm directory has directories which the name is leading 3 bytes of SHA-1(parent_id).

TRACENV
  + attachments2/
    + ticket/
      + 000/
      + ...
      + 356/
        + 356a192b7913b04c54574d18c28d46e6395428ab/        # => SHA-1('1')
          + 636120a629e1539d87fa47afee8847a253690437       # => SHA-1('test.pdf')
      + ...
      + fff/
    + wiki/
      + 000/
      + ...
      + 3ac/
        + 3ac5d78f94c8925a0940cade64cea1adf0983195/        # => SHA-1('Page/SubPage')
          + 636120a629e1539d87fa47afee8847a253690437       # => SHA-1('test.pdf')
      + f09/
        + f0955265d583361233790e2a4da795373c345a9f/        # => SHA-1('WikiStart')
          + 636120a629e1539d87fa47afee8847a253690437       # => SHA-1('test.pdf')
      + ...
      + fff/

in reply to:  15 comment:17 by anonymous, 10 years ago

Replying to rblank:

@slevin: You shouldn't be using this patch on a production system, it will break all your existing attachments. jomae only provided it here as a proof of concept.

It's ok. I'm tested that on copy of system with production trac.

comment:18 by anonymous, 10 years ago

I need yours help again:

I'm patched attachment.py and try to put attachment to the new ticket and get result:

ntrac 656f0dbf9392657eed7feefc486781fb # ls -la
total 3664
drwxr-xr-x   2 apache apache    4096 2011-08-12 11:45 .
drwxr-xr-x 740 apache apache   12288 2011-08-12 10:54 ..
-rw-r--r--   1 apache apache  128000 2011-08-12 11:23 2811_%20%28%D0%B7%D0%B0%D0%B3%D0%BB%D1%83%D1%88%D0%BA%D0%B0_%D0%BE%D1%82%D0%BA%D0%BB%D1%8E%D1%87%D0%B5%D0%BD%D0%BD%D1%8B%D0%B5_%D0%B8%D0%BD%D1%82%D0%B5%D1%80%D1%84%D0%B5%D0%B9%D1%81%D1%8B%29.doc
-rw-r--r--   1 apache apache   72192 2011-08-12 11:45 %D0%98%D0%BD%D0%B2%D0%B5%D0%BD%D1%82%D0%B0%D1%80%D0%B8%D0%B7%D0%B0%D1%86%D0%B8%D0%BE%D0%BD%D0%BD%D0%B0%D1%8F%20%D0%BE%D0%BF%D0%B8%D1%81%D1%8C%20%D0%B4%D0%BB%D1%8F%20%D0%BF%D1%80%D0%BE%D0%B2%D0%B5%D1%80%D0%BA%D0%B8%2028.02.2011.2.xls

so, path was converted to md5 - that good! (656f0dbf9392657eed7feefc486781fb) but filename still in URL-encode format. Please help to find mistake… may be I must add other patches to attachment.py?

in reply to:  16 ; comment:19 by Christian Boos, 10 years ago

Replying to jomae:

It's not a database update, so I wouldn't use a db*.py file. Instead, create a new component in attachments.py that implements IEnvironmentSetupParticipant, ….

Oh, your suggestion is best. I forgot IEnvironmentSetupParticipant

I'm not completely sold on the idea yet. I think a lot of people could miss the straightforwardness of the current scheme. A forced upgrade doesn't sound appealing to me.

I'd prefer that we let people switch explicitly. For this, I think we could use the $ENV/VERSION file and bump that from:

Trac Environment Version 1

to:

Trac Environment Version 2
Attachments: compact

by using an explicit trac-admin $ENV attachment compact command. Using the VERSION file seems more appropriate than using the .ini file, as this describes how the present environment is laid out (not something that can be inherited for example). Instead of a "global" number, the Version 2 here is just to indicate that we have key/value pairs in what follows.

In addition to the attachment compact subcommand, we will also need a few commands to do the reverse mapping (given wiki/3ac/3ac5d78f94c8925a0940cade64cea1adf0983195/636120a629e1539d87fa47afee8847a253690437.pdf find the corresponding parent / attachment).

New environments could be created by default with the new scheme, of course.

To upgrade, walk through all files below attachments and move them to a new hierarchy in attachments2 (or whatever name makes sense), ideally also implementing the sharding suggested in #7397.

I just read #7393. I think better that a realm directory has directories which the name is leading 3 bytes of SHA-1(parent_id).

TRACENV
  + attachments2/
    + ticket/
      + 000/
      + ...
      + 356/
        + 356a192b7913b04c54574d18c28d46e6395428ab/        # => SHA-1('1')
           + 636120a629e1539d87fa47afee8847a253690437       # => SHA-1('test.pdf')
...

Looks good! However I think we should keep the extension (i.e. 636120a629e1539d87fa47afee8847a253690437.pdf in the example above) so that it could help during a manual inspection and it could also be used by web servers so that they could set the Content-Type: by themselves in X-Sendfile scenarios.

in reply to:  19 ; comment:20 by Remy Blank, 10 years ago

Replying to cboos:

I'm not completely sold on the idea yet. I think a lot of people could miss the straightforwardness of the current scheme. A forced upgrade doesn't sound appealing to me.

I don't see a reason to keep the old scheme around. We have all the necessary trac-admin commands to work with attachments (add, remove, export). We could extend trac-admin $ENV attachment list to show the path to each attachment, and make the argument optional, in which case all attachments would be listed (and maybe allow only a realm, to show all attachments for the given realm).

Keeping the old scheme will just make the code more complicated for no clear benefit. Let's keep it simple.

Looks good! However I think we should keep the extension (i.e. 636120a629e1539d87fa47afee8847a253690437.pdf in the example above) so that it could help during a manual inspection and it could also be used by web servers so that they could set the Content-Type: by themselves in X-Sendfile scenarios.

+1

in reply to:  20 ; comment:21 by Christian Boos, 10 years ago

Replying to rblank:

Replying to cboos:

I'm not completely sold on the idea yet. I think a lot of people could miss the straightforwardness of the current scheme. A forced upgrade doesn't sound appealing to me.

I don't see a reason to keep the old scheme around. We have all the necessary trac-admin commands to work with attachments (add, remove, export). We could extend trac-admin $ENV attachment list to show the path to each attachment, and make the argument optional, in which case all attachments would be listed (and maybe allow only a realm, to show all attachments for the given realm).

Keeping the old scheme will just make the code more complicated for no clear benefit. Let's keep it simple.

I'm all for keeping things simple, but should things only be simple for us in the code, or for our users? I suspect that for many, switching to the new scheme will look unnecessarily complex. Imagine someone running a periodic virus scan on $ENV/attachments/, with the old scheme he can immediately relate a problematic file to the corresponding attachment and prepare a list for deletion candidates. With the new scheme, this will be more complex to achieve.

So I think that maintaining the minimal overall complexity is always trade-off and in this case, if the extra complexity in the code is just keeping the two lines for the old way in _get_path, it's well worth it. Also, an IEnvironmentSetupParticipant.environment_needs_upgrade check has a cost you'll pay at every process startup, it's also something we don't want to do lightly (here I suspect it will not be completely trivial to detect if we're using the old or new scheme). So an explicit attachment compact (or compactify) will also be simpler in that perspective.

in reply to:  21 ; comment:22 by Remy Blank, 10 years ago

Replying to cboos:

I'm all for keeping things simple, but should things only be simple for us in the code, or for our users? I suspect that for many, switching to the new scheme will look unnecessarily complex.

My feeling is that 99% of our users never poke into the attachments folder (and neither should they). And the remaining 1% know how to use trac-admin and do some scripting. But let's face it, you can claim the contrary, and we'll never know for sure.

So I think that maintaining the minimal overall complexity is always trade-off and in this case, if the extra complexity in the code is just keeping the two lines for the old way in _get_path, it's well worth it.

Nah. Two lines plus an option to define which variant to use (whether it's stored in trac.ini or VERSION), plus documentation for both variants, plus tests for both variants, plus support and maintenance work for both variants. That sounds like a high cost to me.

Also, an IEnvironmentSetupParticipant.environment_needs_upgrade check has a cost you'll pay at every process startup, it's also something we don't want to do lightly (here I suspect it will not be completely trivial to detect if we're using the old or new scheme).

Err, I don't follow you. You mean we can't afford to do a single stat() for $ENV/attachments on startup (see comment:12)?

in reply to:  22 ; comment:23 by Christian Boos, 10 years ago

Replying to rblank:

Replying to cboos:

I'm all for keeping things simple, but should things only be simple for us in the code, or for our users? I suspect that for many, switching to the new scheme will look unnecessarily complex.

My feeling is that 99% of our users never poke into the attachments folder (and neither should they). And the remaining 1% know how to use trac-admin and do some scripting. But let's face it, you can claim the contrary, and we'll never know for sure.

I suppose we'll know when people will start complaining ;-) Ok, let's proceed with an automatic upgrade, then.

… You mean we can't afford to do a single stat() for $ENV/attachments on startup (see comment:12)?

Oh sure. I misread comment:12 and somehow assumed a final step would be a renaming back to $ENV/attachments after the upgrade.

I'd even suggest that we take this opportunity to adopt the $ENV/files/attachments path, as there can be other kinds of files to put below $ENV/files besides attachment, like dynamically generated content. For example the graph images generated by TH:GraphvizPlugin are currently placed in $ENV/cache and could be moved in $ENV/files/cache instead. That would again make things easier to setup in an X-Sendfile scenario.

in reply to:  23 comment:24 by Remy Blank, 10 years ago

Replying to cboos:

I'd even suggest that we take this opportunity to adopt the $ENV/files/attachments path, as there can be other kinds of files to put below $ENV/files besides attachment, like dynamically generated content.

Good idea. +1

comment:25 by osimons, 10 years ago

Cc: osimons added

Another change that looks set to be non-trivial for those that maintain many sites or high-performance sites… I don't do so myself, but know for a fact that many of the larger public Trac installs serve attachments directly via Apache (or other web server) and not via Trac. Any scheme that involves de-referencing will naturally break. I'm also -1 on supporting concurrent schemes as that adds further needless complexity.

I'd rather we stuck to the current simple scheme and rather auto-shortened filenames with a notice to back the user.

in reply to:  25 ; comment:26 by Remy Blank, 10 years ago

Replying to osimons:

I don't do so myself, but know for a fact that many of the larger public Trac installs serve attachments directly via Apache (or other web server) and not via Trac.

That's generally a bad idea. Did you see #7894 and the suggestion to use X-Sendfile?

in reply to:  26 ; comment:27 by osimons, 10 years ago

Replying to rblank:

Replying to osimons:

I don't do so myself, but know for a fact that many of the larger public Trac installs serve attachments directly via Apache (or other web server) and not via Trac.

That's generally a bad idea. Did you see #7894 and the suggestion to use X-Sendfile?

Yes, that sounds like a good idea. But as I said, I don't serve directly myself so I cannot comment on the technical merit of the various approaches. I just know from many years of mailing list and IRC that high load and large attachments via Python is also well capable of making sites non-responsive.

Oh, another thing I forgot to complain about… :-) Moving attachments to files/attachments sounds like a bad idea to me. Or rather, it is a major change based on no pressing need and only serves to confuse. Is having a handful of directories in $ENV really that confusing? What are files anyway? It is nondescript and without real meaning - all content in $ENV are files… Do I put my logo in files or htdocs? I like the fact that each directory in $ENV is owned by a module, and don't think other modules should depend on some hierarchy that may or may not exist depending on modules being enabled or not with corresponding upgrades. The fact that I already use files directory 'namespace' for a per-project webdav project share is of minor consequence.

in reply to:  27 ; comment:28 by Christian Boos, 10 years ago

Replying to osimons:

… Moving attachments to files/attachments sounds like a bad idea to me. Or rather, it is a major change based on no pressing need and only serves to confuse.

Well, from the discussion above, we're already bound to move away from $ENV/attachments/ (comment:12). Besides being a nicer name than $ENV/attachments2, using $ENV/<something>/attachments/ would also enable $ENV/<something> to serve as a single point for exporting other files that can be served directly by the web server. So the <something> could be "share", or "raw", or "export" instead of "files" if the latter is too vague. All the other folders in an environment will not contain files that are served directly … except for $ENV/htdocs.

… Do I put my logo in files or htdocs?

The main differentiating factor here is read permission. Basically any "chrome" file below $ENV/htdocs/ is readable by all. OTOH, files below $ENV/.../ are files that are subject to the usual permission policy checks set up by the modules.

I like the fact that each directory in $ENV is owned by a module, and don't think other modules should depend on some hierarchy that may or may not exist depending on modules being enabled or not with corresponding upgrades.

Good point. The IEnvironmentSetupParticipant should then be implemented by a standalone required component (AttachmentSetup, similar to EnvironmentSetup).

in reply to:  28 ; comment:29 by osimons, 10 years ago

Replying to cboos:

Well, from the discussion above, we're already bound to move away from $ENV/attachments/ (comment:12).

Because it is a bad idea to build a new converted temporary structure, and then move attachments -> attachments.old and attachments.new -> attachments? Each successful upgrade should be committed individually as we have discussed before (can't remember ticket id).

Besides being a nicer name than $ENV/attachments2, using $ENV/<something>/attachments/ would also enable $ENV/<something> to serve as a single point for exporting other files that can be served directly by the web server. So the <something> could be "share", or "raw", or "export" instead of "files" if the latter is too vague. All the other folders in an environment will not contain files that are served directly … except for $ENV/htdocs. The main differentiating factor here is read permission. Basically any "chrome" file below $ENV/htdocs/ is readable by all. OTOH, files below $ENV/…/ are files that are subject to the usual permission policy checks set up by the modules.

Isn't this contradicting yourself? So when I configure my files to serve your th:GraphvizPlugin cache directly according to the Trac docs for static sharing / performance, I'm at the same time opening up all attachments for public access? And files from any other plugins that may store its information in files? Is it the plugin that should decide if its data is statically shareable - or the admin?

comment:30 by Sergey V.Levin <slevin@…>, 10 years ago

Dear Sirs!

May I ask you - what about my little problem? Can we're patched 0.12.2 for storing attachments with LFN on different native language?

About back compatibility: May be need converter attachments names to md5/sha1 names and look like a plugin for trac?

  1. Patched attachment.py (or else what must be patched)
  2. Install converter plugin and convert attachments in selected trac

hmmmm, bad idea?

in reply to:  29 ; comment:31 by Christian Boos, 10 years ago

Replying to osimons:

Replying to cboos:

Well, from the discussion above, we're already bound to move away from $ENV/attachments/ (comment:12).

Because it is a bad idea to build a new converted temporary structure, and then move attachments -> attachments.old and attachments.new -> attachments? Each successful upgrade should be committed individually as we have discussed before (can't remember ticket id).

I'm not sure what you mean here, if you're making the same wrong assumption I did about comment:12 or not. Remy's idea is to move all attachements to the new scheme, below a new toplevel folder (be it $ENV/attachments2 or $ENV/files/attachments), then remove $ENV/attachments and use the absence of that folder as an indicator that the upgrade was performed.

Isn't this contradicting yourself? (snip)

Please re-read Remy's comment:26. We're not talking about Alias or similar, but about X-Sendfile, which is specifically designed to let the application logic handle all the steps of a request from permission checks, up to selecting the actual location of the file, except sending the file itself.

in reply to:  31 ; comment:32 by osimons, 10 years ago

Replying to cboos:

Remy's idea is to move all attachements to the new scheme, below a new toplevel folder (be it $ENV/attachments2 or $ENV/files/attachments), then remove $ENV/attachments and use the absence of that folder as an indicator that the upgrade was performed.

Ah. So there is agreement on keeping concurrent support for old and new schemes? If so, another simple option is to use the system table in the database like most other plugins do: attachments_version | 2

Please re-read Remy's comment:26. We're not talking about Alias or similar, but about X-Sendfile, which is specifically designed to let the application logic handle all the steps of a request from permission checks, up to selecting the actual location of the file, except sending the file itself.

All right. Suppose I got some reading to do.

Last edited 10 years ago by osimons (previous) (diff)

in reply to:  32 comment:33 by Christian Boos, 10 years ago

Replying to osimons:

.. So there is agreement on keeping concurrent support for old and new schemes?

Well, no. In comment:23, I actually agreed with Remy and Jun on doing an automatic upgrade, as described in comment:12.

The additional ideas since then:

  • suggestion to use a specific component AttachmentSetup with required = True set (comment:28)
  • suggestion to take the opportunity of this move to use a specific $ENV/<something> toplevel folder for simpler setup of X-Sendfile (e.g. XSendFilePath in the case of Apache's mod_xsendfile)

As of today, my preference for the <something> name is "raw" ;-)

in reply to:  32 comment:34 by Remy Blank, 10 years ago

Replying to osimons:

Ah. So there is agreement on keeping concurrent support for old and new schemes?

No, my preference would be to create a new location for the attachments (with sharding, etc) and have the upgrade move the attachments to the new location, and finally to remove the old attachments folder. The upgrade would be triggered by the existence of the old folder. No changes are needed in the database, and the process is completely restartable.

comment:35 by osimons, 10 years ago

So, just to summarize: The upgrade will be required, it will be implemented in some revision X of Trac. And instead of the usual Trac upgrade with number increment based on easy-to-understand upgrade scripts, there is a need to use a folder-detection scheme as a behind-the-scenes trigger for an upgrade we hide away in a non-standard place - and we will keep loading the required component and make the folder check for all future Trac versions and every future Trac process and environment load forever and ever?

Is it just me being dumb when I don't quite understand why this is the best way?

in reply to:  35 comment:36 by Christian Boos, 10 years ago

Replying to osimons:

… I don't quite understand why this is the best way?

Well, it's not a db upgrade. We could have a mechanism similar to database_version and upgrades/dbN.py files (say filesystem_version and upgrades/fsN.py), but I think that would be a mistake, as we already know the inconvenience of having a global number for the db upgrades (#8172, but this also applies to core components).

So a finer grained numbering would be in order (for example what I suggested in comment:19). But if we make the upgrade mandatory, doing a simple check for the existence of the $ENV/attachments folder is even simpler and the cost is acceptable.

in reply to:  35 comment:37 by Remy Blank, 10 years ago

Replying to osimons:

Is it just me being dumb when I don't quite understand why this is the best way?

We don't need to touch the database at all, so it's not a database upgrade. We need to move all attachment files to new locations, an operation that cannot be made atomically (unlike a database upgrade). So we need a way to detect an interrupted upgrade, and continue from where we left. Why would we want to additionally record the fact that we have finished, if the absence of the old directory already indicates this?

I believe that the scheme I proposed is the simplest way to implement that, yes. I really see no reason to link the upgrade to the database upgrade numbering. It's not the first time we use a separate IEnvironmentSetupParticipant for something like that: see for example ConfigurableTicketWorkflow in default_workflow.py, where we trigger the upgrade based on the absence of the [ticket-workflow] section in trac.ini.

I wouldn't put it into a separate component, though. The AttachmentModule would be good enough for me, and that would avoid having to load one more module just for that. I'm also ok with a separate component, though.

comment:38 by Jun Omae, 10 years ago

I worked on https://bitbucket.org/jun66j5/trac/compare/t10313-hashed-path..trunk.

New AttachmentSetup which is separated component migrates from $ENV/attachments to $ENV/files/attachments. And my changes has no trac/upgrades/db#.py.

in reply to:  38 ; comment:39 by Christian Boos, 10 years ago

Replying to jomae:

I worked on https://bitbucket.org/jun66j5/trac/compare/t10313-hashed-path..trunk.

Great job!

I have some questions about the ugprade process though, if we want to make it restartable (i.e. just be able to re-run trac-admin ... upgrade in case the first one failed or was interrupted).

  • the environment_needs_upgrade() condition should be simply return os.path.exists(old_dir)
  • instead of only a big rmtree(... 'attachments') at the end, I think we should also remove each old attachment after a successful copy, in _copy_attachment_file. Or what about just a rename? (or even os.renames?)

in reply to:  39 comment:40 by Jun Omae, 10 years ago

Replying to cboos:

I have some questions about the ugprade process though, if we want to make it restartable (i.e. just be able to re-run trac-admin ... upgrade in case the first one failed or was interrupted).

  • the environment_needs_upgrade() condition should be simply return os.path.exists(old_dir)
  • instead of only a big rmtree(... 'attachments') at the end, I think we should also remove each old attachment after a successful copy, in _copy_attachment_file. Or what about just a rename? (or even os.renames?)

I think that we should change $ENV/attachments directory after all files are copied. But it isn't restartable.

Your suggestion is simple and great. I update my branch with os.renames!

comment:41 by Remy Blank, 10 years ago

I would still avoid using rmtree(), and instead walk the tree and remove only empty directories. Call me paranoid, but I just hate using a function that does a rm -rf.

in reply to:  41 ; comment:42 by Christian Boos, 10 years ago

Replying to rblank:

I would still avoid using rmtree(), and instead walk the tree and remove only empty directories. Call me paranoid, but I just hate using a function that does a rm -rf.

Same feeling here… In addition, if we happen to find non-empty directories, we should tell the user about any file that has been left there below $ENV/attachments and ask him to take action (remove it or move it somewhere else).

The other changes look great ([aae52921becb/jomae], [b77020479b0a/jomae]).

in reply to:  42 comment:43 by Jun Omae, 10 years ago

Replying to cboos:

Same feeling here… In addition, if we happen to find non-empty directories, we should tell the user about any file that has been left there below $ENV/attachments and ask him to take action (remove it or move it somewhere else).

Ok, I understand. When $ENV/attachments is empty, we should remove the directory. See [9b449f9ec758/jomae]. Thanks for your suggestions and advices, Remy and Christian!

The following output is the example.

jun66j5@gotanda:2792$ PYTHONPATH=$PWD python26 trac/admin/console.py ~/var/trac/0.12-stable upgrade
Error while deleting old attachments directory. Please move or remove files in
the directory and try again.
OSError: [Errno 39] Directory not empty: '/home/jun66j5/var/trac/0.12-stable/attachments'

comment:44 by bebugz@…, 10 years ago

Cc: bebugz@… added

I would also vote for option (trac.ini configurable) to truncate (Yes, I mean that) the filename to maximum possible length in the existing setup. The original filename can be auto-set in attachment comment then. I think that this solution will be acceptable for some users as importancy of getting file attached is higher and, easier to implement and doesn't conradict with further solutions.

comment:45 by Jun Omae, 10 years ago

Cc: Jun Omae added

comment:46 by Sergey V.Levin <slevin@…>, 10 years ago

2 month did not hear any news about problem with attachments to the trac with LFN on native language like russian, japan, china… Can I ask about the solution for it? Or may be it be resolved only in 0.13.0 version?

comment:47 by slevin@…, 10 years ago

So, are we havn't the solution? or no? Is it already fixed in 0.13.dev ?

in reply to:  47 comment:48 by Christian Boos, 10 years ago

Replying to slevin@…:

So, are we havn't the solution? or no? Is it already fixed in 0.13.dev ?

Why don't you try a more constructive approach, like: "I've tested the fix proposed jomae (branch [9b449f9ec758/jomae]) for 2 months now and it works great for us!". That would be more effective…

by Jun Omae, 10 years ago

Migration to new-style attachment directory, based on r11001

comment:49 by Jun Omae, 10 years ago

I just refreshed t10313-hashed-path/jomae with trunk.

  • Need upgrade, the upgrade migrates from old-style attachments directory to new-style.
    • Old-style: $TRACENV/attachments/realm/id/filename.ext
    • New-style: $TRACENV/files/attachments/realm/SHA-1(id)[:3]/SHA-1(id)/SHA-1(filename).ext
      $TRACENV/files/attachments/realm/SHA-1(id)/SHA-1(filename).ext (See Remy's comment:50)
  • Don't keep old-style attachments directory.

Ready to commit to trunk, t10313-hashed-attachment-path.diff.

Last edited 10 years ago by Jun Omae (previous) (diff)

in reply to:  49 ; comment:50 by Remy Blank, 10 years ago

Replying to jomae:

  • New-style: $TRACENV/files/attachments/realm/SHA-1(id)/SHA-1(filename).ext

I was going to ask where the sharding went, but I see you did implement it. So the pattern for new-style attachments is actually the following:

$TRACENV/files/attachments/realm/sha1(id)[:3]/sha1(id)/sha1(filename).ext

This still has the limitation that you can't have more than 32'000 attachments per ticket, but I guess that's reasonable (and much less likely than having 32'000 tickets).

in reply to:  50 comment:51 by Jun Omae, 10 years ago

Replying to rblank:

I was going to ask where the sharding went, but I see you did implement it. So the pattern for new-style attachments is actually the following:

$TRACENV/files/attachments/realm/sha1(id)[:3]/sha1(id)/sha1(filename).ext

You're right. Sorry.

This still has the limitation that you can't have more than 32'000 attachments per ticket, but I guess that's reasonable (and much less likely than having 32'000 tickets).

I also guess that the limitation is enough to attach to each ticket, wiki, etc….

in reply to:  27 ; comment:52 by Thijs Triemstra, 10 years ago

Replying to osimons:

Oh, another thing I forgot to complain about… :-) Moving attachments to files/attachments sounds like a bad idea to me. Or rather, it is a major change based on no pressing need and only serves to confuse. Is having a handful of directories in $ENV really that confusing? What are files anyway? It is nondescript and without real meaning - all content in $ENV are files… Do I put my logo in files or htdocs? I like the fact that each directory in $ENV is owned by a module, and don't think other modules should depend on some hierarchy that may or may not exist depending on modules being enabled or not with corresponding upgrades. The fact that I already use files directory 'namespace' for a per-project webdav project share is of minor consequence.

I agree, files also suggests the log files should be in there.. Why not keep the current attachments name, is it because this complicates the upgrade?

in reply to:  52 comment:53 by Jun Omae, 10 years ago

Replying to thijstriemstra:

Why not keep the current attachments name, is it because this complicates the upgrade?

The scheme differs between current and new attachments directory. IMO, the upgrade should move it.

And, see comment:34, 35 and 36, the existence of the attachments directory triggers the upgrade and the process will be restartable. It's so simple.

comment:54 by Jun Omae, 10 years ago

Milestone: next-major-0.1X0.13
Release Notes: modified (diff)
Resolution: fixed
Status: newclosed

Committed in [11028]. Requires trac-admin $TRACENV upgrade by the changes.

Note: See TracTickets for help on using tickets.