Edgewall Software
Modify

Opened 3 years ago

Closed 2 years ago

Last modified 10 months ago

#10313 closed defect (fixed)

can't attache to the ticket filen with long name on native language (like russian).

Reported by: slevin@… Owned by: jomae
Priority: normal Milestone: 1.0
Component: attachment Version: 0.12.2
Severity: normal Keywords: long filename
Cc: osimons, bebugz@…, jomae
Release Notes:

Uses SHA-1 instead of URL-encode for the physical file name of attachments. (Requires environment upgrade.)

API Changes:

Description

When I try to attache file with long name on russian language I get error:

fx = open(part_file, 'wb')
201	IOError: [Errno 36] File name too long: '/tmp/%D0%94%D0%BE%D0%B3%D0%BE%D0%B2%D0%BE%D1%80%20%D0%BE%20%D0%BF%D1%80%D0%B5%D0%B4%D0%BE%D1%81%D1%82%D0%B0%D0%B2%D0%BB%D0%B5%D0%BD%D0%B8%D0%B8%20%D1%83%D1%81%D0%BB%D1%83%D0%B3%20%D1%82%D0%B5%D0%BB%D0%B5%D0%BA%D0%BE%D0%BC%D0%BC%D1%83%D0%BD%D0%B8%D0%BA%D0%B0%D1%86%D0%B8%D0%B9.doc'

That is result convert utf-8 filename 182 characters lenght to URL-encode filename, after that new lenght of that file equal 480 characters, that more then limit fo filename lenght for linux system. How it can be fixed?

PS: previously, when we're get the problem with long filename in utf-8 charset wich sended via email, we arehave a long disscussion with author of email2trac plugin here: https://subtrac.sara.nl/oss/email2trac/ticket/247, also that ticket contained attachments for reproducing error.

PPS: our environment - gentoo + python 2.5.2 + trac 0.12.2.

Attachments (3)

t10313-hashed-attachment-path.diff (15.9 KB) - added by jomae 3 years ago.
Migration to new-style attachment directory, based on r11001
t10313-preserve-dir-attrs.diff (1.6 KB) - added by jomae 3 years ago.
10313-attachment-upgrade-r11160.patch (2.9 KB) - added by rblank 2 years ago.
More admin-friendly attachment upgrade

Download all attachments as: .zip

Change History (79)

comment:1 follow-up: Changed 3 years ago by cboos

  • Milestone set to next-major-0.1X
  • Priority changed from high to normal

You're right - I was at first hoping for a quick answer like "use …fs which doesn't have this silly limit"… but it looks like most FS have such kind of limit (wikipedia:Comparison_of_file_systems#Limits).

So we could indeed try to find a naming scheme which allows us to by-pass this limit (to be coupled with the sharding scheme discussed in #7397).

comment:2 Changed 3 years ago by rblank

Well… The obvious fix is to use shorter file names, isn't it?

comment:3 Changed 3 years ago by cboos

Sure, but I wouldn't call that a fix, just a workaround ;-)

The point made by the OP was that it was relatively easy to come up with a "not that long" unicode file name that can't be uploaded as we currently turn it into a too long file name, after expanding to UTF-8, then URL-quoting it.

>>> from urllib import unquote
>>> filename = '%D0%94%D0%BE%D0%B3%D0%BE%D0%B2%D0%BE%D1%80%20%D0%BE%20%D0%BF%D1%80%D0%B5%D0%B4%D0%BE%D1%81%D1%82%D0%B0%D0%B2%D0%BB%D0%B5%D0%BD%D0%B8%D0%B8%20%D1%83%D1%81%D0%BB%D1%83%D0%B3%20%D1%82%D0%B5%D0%BB%D0%B5%D0%BA%D0%BE%D0%BC%D0%BC%D1%83%D0%BD%D0%B8%D0%BA%D0%B0%D1%86%D0%B8%D0%B9.doc'
>>> utf8_filename = unquote(filename)
>>> print utf8_filename
Договор о предоставлении услуг телекоммуникаций.doc
>>> unicode_filename = utf8_filename.decode('utf-8')
>>> len(filename), len(unicode_filename)
(274, 51)
Last edited 3 years ago by cboos (previous) (diff)

comment:4 Changed 3 years ago by anonymous

It's un-re-al - can't to explain windows users for use english characters for filenames. :( Usually situation when bookkeepers make file with lenght nearly 30 chracters and on native language.

comment:5 Changed 3 years ago by rblank

I did understand the issue, I was just proposing an immediate fix (one that didn't require waiting for 0.15 or so).

comment:6 in reply to: ↑ 1 Changed 3 years ago by slevin@…

Replying to cboos:

You're right - I was at first hoping for a quick answer like "use …fs which doesn't have this silly limit"… but it looks like most FS have such kind of limit (wikipedia:Comparison_of_file_systems#Limits).

you mean Raiser4 ??? In my example native filename equal 182 chars. After URL-encoding lenght equal 480 chars… for fs which supported by linux kernel almost have a limit filename to 255 chars. :(

So we could indeed try to find a naming scheme which allows us to by-pass this limit (to be coupled with the sharding scheme discussed in #7397).

hmmm… may be… may be you can try other encoding, or use scheme without encoding before store attachment body into /tmp ? Is it possible?

comment:7 Changed 3 years ago by anonymous

and I'm really support the idea by alexey.lustin (#7554): May be we need an option on trac.ini something like this

...
[attachement]
normalize_attach_dir = false/true
...

and some additional:

...
[attachement]
normalize_attach_dir = false/true
normalize_attach_filename = false/true
...

?

comment:8 follow-up: Changed 3 years ago by jomae

The issue often occurs with CJK users. Because the most URL-quoted one Japanese character is 9 bytes. The physical path of a attachment is very long.

I want to solve it in Trac 0.13.

>>> from trac.util.text import unicode_quote
>>> myname = u'大前潤'
>>> unicode_quote(myname)
'%E5%A4%A7%E5%89%8D%E6%BD%A4'
>>> len(unicode_quote(myname))
27

I suggest that physical filepath of a attachment is based on md5(parent_id) and md5(filepath).

  • trac/attachment.py

    diff --git a/trac/attachment.py b/trac/attachment.py
    index a89829d..c23abba 100644
    a b from trac.perm import PermissionError, IPermissionPolicy 
    3838from trac.resource import *
    3939from trac.search import search_to_sql, shorten_result
    4040from trac.util import content_disposition, create_unique_file, get_reporter_id
     41from trac.util.compat import md5
    4142from trac.util.datefmt import format_datetime, from_utimestamp, \
    4243                              to_datetime, to_utimestamp, utc
    4344from trac.util.text import exception_to_unicode, pretty_size, print_table, \
    class Attachment(object): 
    166167
    167168    def _get_path(self, parent_realm, parent_id, filename):
    168169        path = os.path.join(self.env.path, 'attachments', parent_realm,
    169                             unicode_quote(parent_id))
     170                            md5(parent_id.encode('utf-8').hexdigest()))
    170171        if filename:
    171             path = os.path.join(path, unicode_quote(filename))
     172            path = os.path.join(path,
     173                                md5(filename.encode('utf-8')).hexdigest())
    172174        return os.path.normpath(path)
    173175
    174176    @property

comment:9 in reply to: ↑ 8 ; follow-up: Changed 3 years ago by cboos

Replying to jomae:

The issue often occurs with CJK users. Because the most URL-quoted one Japanese character is 9 bytes. The physical path of a attachment is very long.

I want to solve it in Trac 0.13.

Fine by me!

I suggest that physical filepath of a attachment is based on md5(parent_id) and md5(filepath).

… but perhaps only if len(unicode_quote(filename)) > 254. That way we fix the current problem without any backward compatibility issues.

comment:10 Changed 3 years ago by slevin@…

2Jomae: Damn! You're my hero! Very nice solution, exellent solution I guess.

2cboos: do not rephrase that - it's solution for any files with filenames on any language. I guess this patch must be commit to the main code of core trac.

comment:11 in reply to: ↑ 9 ; follow-up: Changed 3 years ago by jomae

Replying to cboos:

I suggest that physical filepath of a attachment is based on md5(parent_id) and md5(filepath).

… but perhaps only if len(unicode_quote(filename)) > 254.

parent_id and filename should always been hashed using md5().

On Windows, Max path length (not filename) is 260 characters. parent_id is PageName/SubName/あいう/... if parent_realm is wiki. The filename limitation of attachment for wiki page is more short than 254.

That way we fix the current problem without any backward compatibility issues.

I think that we create new upgrade file which renames a hashed path from a current path of attachments.

comment:12 in reply to: ↑ 11 Changed 3 years ago by rblank

Replying to jomae:

parent_id and filename should always been hashed using md5().

Call me paranoid, but I would rather use SHA-1. It's just as fast, and has a lower chance of collision (think "malicious attacker").

I think that we create new upgrade file which renames a hashed path from a current path of attachments.

It's not a database update, so I wouldn't use a db*.py file. Instead, create a new component in attachments.py that implements IEnvironmentSetupParticipant, that checks for the existence of the attachments directory in the environment in environment_needs_upgrade(), and requests an upgrade if it exists. To upgrade, walk through all files below attachments and move them to a new hierarchy in attachments2 (or whatever name makes sense), ideally also implementing the sharding suggested in #7397. When all files have been moved, remove the attachments directory. This ensures that an interrupted upgrade will resume correctly.

Oh, and #7554 seems to be a duplicate of this ticket.

comment:13 follow-up: Changed 3 years ago by anonymous

2Jomae: which version of python u're using?

I get error: AttributeError: 'str' object has no attribute 'hexdigest'
Файл "/usr/lib64/python2.5/site-packages/trac/web/main.py", строка 511, в _dispatch_request
  dispatcher.dispatch(req)
Файл "/usr/lib64/python2.5/site-packages/trac/web/main.py", строка 237, в dispatch
  resp = chosen_handler.process_request(req)
Файл "/usr/lib64/python2.5/site-packages/trac/attachment.py", строка 449, в process_request
  self._do_save(req, attachment)
Файл "/usr/lib64/python2.5/site-packages/trac/attachment.py", строка 685, в _do_save
  attachment.insert(filename, upload.file, size)
Файл "/usr/lib64/python2.5/site-packages/trac/attachment.py", строка 256, в insert
  commonprefix = os.path.commonprefix([attachments_dir, self.path])
Файл "/usr/lib64/python2.5/site-packages/trac/attachment.py", строка 172, в path
  return self._get_path(self.parent_realm, self.parent_id, self.filename)
Файл "/usr/lib64/python2.5/site-packages/trac/attachment.py", строка 164, в _get_path
  md5(parent_id.encode('utf-8').hexdigest()))

comment:14 in reply to: ↑ 13 Changed 3 years ago by jomae

Sorry, the patch is untested and has a compatibilty problem. The complete patch will perhaps be larger than it.

The following patch (untested) maybe solves your AttributeError.

  • trac/attachment.py

    old new  
    167167
    168168    def _get_path(self, parent_realm, parent_id, filename):
    169169        path = os.path.join(self.env.path, 'attachments', parent_realm,
    170                             md5(parent_id.encode('utf-8').hexdigest()))
     170                            md5(parent_id.encode('utf-8')).hexdigest())
    171171        if filename:
    172172            path = os.path.join(path,
    173173                                md5(filename.encode('utf-8')).hexdigest())
Last edited 3 years ago by jomae (previous) (diff)

comment:15 follow-up: Changed 3 years ago by rblank

@slevin: You shouldn't be using this patch on a production system, it will break all your existing attachments. jomae only provided it here as a proof of concept.

comment:16 follow-up: Changed 3 years ago by jomae

Replying to rblank:

Call me paranoid, but I would rather use SHA-1. It's just as fast, and has a lower chance of collision (think "malicious attacker").

Ok, we use SHA-1.

I think that we create new upgrade file which renames a hashed path from a current path of attachments.

It's not a database update, so I wouldn't use a db*.py file. Instead, create a new component in attachments.py that implements IEnvironmentSetupParticipant, ….

Oh, your suggestion is best. I forgot IEnvironmentSetupParticipant….

To upgrade, walk through all files below attachments and move them to a new hierarchy in attachments2 (or whatever name makes sense), ideally also implementing the sharding suggested in #7397.

I just read #7393. I think better that a realm directory has directories which the name is leading 3 bytes of SHA-1(parent_id).

TRACENV
  + attachments2/
    + ticket/
      + 000/
      + ...
      + 356/
        + 356a192b7913b04c54574d18c28d46e6395428ab/        # => SHA-1('1')
          + 636120a629e1539d87fa47afee8847a253690437       # => SHA-1('test.pdf')
      + ...
      + fff/
    + wiki/
      + 000/
      + ...
      + 3ac/
        + 3ac5d78f94c8925a0940cade64cea1adf0983195/        # => SHA-1('Page/SubPage')
          + 636120a629e1539d87fa47afee8847a253690437       # => SHA-1('test.pdf')
      + f09/
        + f0955265d583361233790e2a4da795373c345a9f/        # => SHA-1('WikiStart')
          + 636120a629e1539d87fa47afee8847a253690437       # => SHA-1('test.pdf')
      + ...
      + fff/

comment:17 in reply to: ↑ 15 Changed 3 years ago by anonymous

Replying to rblank:

@slevin: You shouldn't be using this patch on a production system, it will break all your existing attachments. jomae only provided it here as a proof of concept.

It's ok. I'm tested that on copy of system with production trac.

comment:18 Changed 3 years ago by anonymous

I need yours help again:

I'm patched attachment.py and try to put attachment to the new ticket and get result:

ntrac 656f0dbf9392657eed7feefc486781fb # ls -la
total 3664
drwxr-xr-x   2 apache apache    4096 2011-08-12 11:45 .
drwxr-xr-x 740 apache apache   12288 2011-08-12 10:54 ..
-rw-r--r--   1 apache apache  128000 2011-08-12 11:23 2811_%20%28%D0%B7%D0%B0%D0%B3%D0%BB%D1%83%D1%88%D0%BA%D0%B0_%D0%BE%D1%82%D0%BA%D0%BB%D1%8E%D1%87%D0%B5%D0%BD%D0%BD%D1%8B%D0%B5_%D0%B8%D0%BD%D1%82%D0%B5%D1%80%D1%84%D0%B5%D0%B9%D1%81%D1%8B%29.doc
-rw-r--r--   1 apache apache   72192 2011-08-12 11:45 %D0%98%D0%BD%D0%B2%D0%B5%D0%BD%D1%82%D0%B0%D1%80%D0%B8%D0%B7%D0%B0%D1%86%D0%B8%D0%BE%D0%BD%D0%BD%D0%B0%D1%8F%20%D0%BE%D0%BF%D0%B8%D1%81%D1%8C%20%D0%B4%D0%BB%D1%8F%20%D0%BF%D1%80%D0%BE%D0%B2%D0%B5%D1%80%D0%BA%D0%B8%2028.02.2011.2.xls

so, path was converted to md5 - that good! (656f0dbf9392657eed7feefc486781fb) but filename still in URL-encode format. Please help to find mistake… may be I must add other patches to attachment.py?

comment:19 in reply to: ↑ 16 ; follow-up: Changed 3 years ago by cboos

Replying to jomae:

It's not a database update, so I wouldn't use a db*.py file. Instead, create a new component in attachments.py that implements IEnvironmentSetupParticipant, ….

Oh, your suggestion is best. I forgot IEnvironmentSetupParticipant

I'm not completely sold on the idea yet. I think a lot of people could miss the straightforwardness of the current scheme. A forced upgrade doesn't sound appealing to me.

I'd prefer that we let people switch explicitly. For this, I think we could use the $ENV/VERSION file and bump that from:

Trac Environment Version 1

to:

Trac Environment Version 2
Attachments: compact

by using an explicit trac-admin $ENV attachment compact command. Using the VERSION file seems more appropriate than using the .ini file, as this describes how the present environment is laid out (not something that can be inherited for example). Instead of a "global" number, the Version 2 here is just to indicate that we have key/value pairs in what follows.

In addition to the attachment compact subcommand, we will also need a few commands to do the reverse mapping (given wiki/3ac/3ac5d78f94c8925a0940cade64cea1adf0983195/636120a629e1539d87fa47afee8847a253690437.pdf find the corresponding parent / attachment).

New environments could be created by default with the new scheme, of course.

To upgrade, walk through all files below attachments and move them to a new hierarchy in attachments2 (or whatever name makes sense), ideally also implementing the sharding suggested in #7397.

I just read #7393. I think better that a realm directory has directories which the name is leading 3 bytes of SHA-1(parent_id).

TRACENV
  + attachments2/
    + ticket/
      + 000/
      + ...
      + 356/
        + 356a192b7913b04c54574d18c28d46e6395428ab/        # => SHA-1('1')
           + 636120a629e1539d87fa47afee8847a253690437       # => SHA-1('test.pdf')
...

Looks good! However I think we should keep the extension (i.e. 636120a629e1539d87fa47afee8847a253690437.pdf in the example above) so that it could help during a manual inspection and it could also be used by web servers so that they could set the Content-Type: by themselves in X-Sendfile scenarios.

comment:20 in reply to: ↑ 19 ; follow-up: Changed 3 years ago by rblank

Replying to cboos:

I'm not completely sold on the idea yet. I think a lot of people could miss the straightforwardness of the current scheme. A forced upgrade doesn't sound appealing to me.

I don't see a reason to keep the old scheme around. We have all the necessary trac-admin commands to work with attachments (add, remove, export). We could extend trac-admin $ENV attachment list to show the path to each attachment, and make the argument optional, in which case all attachments would be listed (and maybe allow only a realm, to show all attachments for the given realm).

Keeping the old scheme will just make the code more complicated for no clear benefit. Let's keep it simple.

Looks good! However I think we should keep the extension (i.e. 636120a629e1539d87fa47afee8847a253690437.pdf in the example above) so that it could help during a manual inspection and it could also be used by web servers so that they could set the Content-Type: by themselves in X-Sendfile scenarios.

+1

comment:21 in reply to: ↑ 20 ; follow-up: Changed 3 years ago by cboos

Replying to rblank:

Replying to cboos:

I'm not completely sold on the idea yet. I think a lot of people could miss the straightforwardness of the current scheme. A forced upgrade doesn't sound appealing to me.

I don't see a reason to keep the old scheme around. We have all the necessary trac-admin commands to work with attachments (add, remove, export). We could extend trac-admin $ENV attachment list to show the path to each attachment, and make the argument optional, in which case all attachments would be listed (and maybe allow only a realm, to show all attachments for the given realm).

Keeping the old scheme will just make the code more complicated for no clear benefit. Let's keep it simple.

I'm all for keeping things simple, but should things only be simple for us in the code, or for our users? I suspect that for many, switching to the new scheme will look unnecessarily complex. Imagine someone running a periodic virus scan on $ENV/attachments/, with the old scheme he can immediately relate a problematic file to the corresponding attachment and prepare a list for deletion candidates. With the new scheme, this will be more complex to achieve.

So I think that maintaining the minimal overall complexity is always trade-off and in this case, if the extra complexity in the code is just keeping the two lines for the old way in _get_path, it's well worth it. Also, an IEnvironmentSetupParticipant.environment_needs_upgrade check has a cost you'll pay at every process startup, it's also something we don't want to do lightly (here I suspect it will not be completely trivial to detect if we're using the old or new scheme). So an explicit attachment compact (or compactify) will also be simpler in that perspective.

comment:22 in reply to: ↑ 21 ; follow-up: Changed 3 years ago by rblank

Replying to cboos:

I'm all for keeping things simple, but should things only be simple for us in the code, or for our users? I suspect that for many, switching to the new scheme will look unnecessarily complex.

My feeling is that 99% of our users never poke into the attachments folder (and neither should they). And the remaining 1% know how to use trac-admin and do some scripting. But let's face it, you can claim the contrary, and we'll never know for sure.

So I think that maintaining the minimal overall complexity is always trade-off and in this case, if the extra complexity in the code is just keeping the two lines for the old way in _get_path, it's well worth it.

Nah. Two lines plus an option to define which variant to use (whether it's stored in trac.ini or VERSION), plus documentation for both variants, plus tests for both variants, plus support and maintenance work for both variants. That sounds like a high cost to me.

Also, an IEnvironmentSetupParticipant.environment_needs_upgrade check has a cost you'll pay at every process startup, it's also something we don't want to do lightly (here I suspect it will not be completely trivial to detect if we're using the old or new scheme).

Err, I don't follow you. You mean we can't afford to do a single stat() for $ENV/attachments on startup (see comment:12)?

comment:23 in reply to: ↑ 22 ; follow-up: Changed 3 years ago by cboos

Replying to rblank:

Replying to cboos:

I'm all for keeping things simple, but should things only be simple for us in the code, or for our users? I suspect that for many, switching to the new scheme will look unnecessarily complex.

My feeling is that 99% of our users never poke into the attachments folder (and neither should they). And the remaining 1% know how to use trac-admin and do some scripting. But let's face it, you can claim the contrary, and we'll never know for sure.

I suppose we'll know when people will start complaining ;-) Ok, let's proceed with an automatic upgrade, then.

… You mean we can't afford to do a single stat() for $ENV/attachments on startup (see comment:12)?

Oh sure. I misread comment:12 and somehow assumed a final step would be a renaming back to $ENV/attachments after the upgrade.

I'd even suggest that we take this opportunity to adopt the $ENV/files/attachments path, as there can be other kinds of files to put below $ENV/files besides attachment, like dynamically generated content. For example the graph images generated by TH:GraphvizPlugin are currently placed in $ENV/cache and could be moved in $ENV/files/cache instead. That would again make things easier to setup in an X-Sendfile scenario.

comment:24 in reply to: ↑ 23 Changed 3 years ago by rblank

Replying to cboos:

I'd even suggest that we take this opportunity to adopt the $ENV/files/attachments path, as there can be other kinds of files to put below $ENV/files besides attachment, like dynamically generated content.

Good idea. +1

comment:25 follow-up: Changed 3 years ago by osimons

  • Cc osimons added

Another change that looks set to be non-trivial for those that maintain many sites or high-performance sites… I don't do so myself, but know for a fact that many of the larger public Trac installs serve attachments directly via Apache (or other web server) and not via Trac. Any scheme that involves de-referencing will naturally break. I'm also -1 on supporting concurrent schemes as that adds further needless complexity.

I'd rather we stuck to the current simple scheme and rather auto-shortened filenames with a notice to back the user.

comment:26 in reply to: ↑ 25 ; follow-up: Changed 3 years ago by rblank

Replying to osimons:

I don't do so myself, but know for a fact that many of the larger public Trac installs serve attachments directly via Apache (or other web server) and not via Trac.

That's generally a bad idea. Did you see #7894 and the suggestion to use X-Sendfile?

comment:27 in reply to: ↑ 26 ; follow-ups: Changed 3 years ago by osimons

Replying to rblank:

Replying to osimons:

I don't do so myself, but know for a fact that many of the larger public Trac installs serve attachments directly via Apache (or other web server) and not via Trac.

That's generally a bad idea. Did you see #7894 and the suggestion to use X-Sendfile?

Yes, that sounds like a good idea. But as I said, I don't serve directly myself so I cannot comment on the technical merit of the various approaches. I just know from many years of mailing list and IRC that high load and large attachments via Python is also well capable of making sites non-responsive.

Oh, another thing I forgot to complain about… :-) Moving attachments to files/attachments sounds like a bad idea to me. Or rather, it is a major change based on no pressing need and only serves to confuse. Is having a handful of directories in $ENV really that confusing? What are files anyway? It is nondescript and without real meaning - all content in $ENV are files… Do I put my logo in files or htdocs? I like the fact that each directory in $ENV is owned by a module, and don't think other modules should depend on some hierarchy that may or may not exist depending on modules being enabled or not with corresponding upgrades. The fact that I already use files directory 'namespace' for a per-project webdav project share is of minor consequence.

comment:28 in reply to: ↑ 27 ; follow-up: Changed 3 years ago by cboos

Replying to osimons:

… Moving attachments to files/attachments sounds like a bad idea to me. Or rather, it is a major change based on no pressing need and only serves to confuse.

Well, from the discussion above, we're already bound to move away from $ENV/attachments/ (comment:12). Besides being a nicer name than $ENV/attachments2, using $ENV/<something>/attachments/ would also enable $ENV/<something> to serve as a single point for exporting other files that can be served directly by the web server. So the <something> could be "share", or "raw", or "export" instead of "files" if the latter is too vague. All the other folders in an environment will not contain files that are served directly … except for $ENV/htdocs.

… Do I put my logo in files or htdocs?

The main differentiating factor here is read permission. Basically any "chrome" file below $ENV/htdocs/ is readable by all. OTOH, files below $ENV/.../ are files that are subject to the usual permission policy checks set up by the modules.

I like the fact that each directory in $ENV is owned by a module, and don't think other modules should depend on some hierarchy that may or may not exist depending on modules being enabled or not with corresponding upgrades.

Good point. The IEnvironmentSetupParticipant should then be implemented by a standalone required component (AttachmentSetup, similar to EnvironmentSetup).

comment:29 in reply to: ↑ 28 ; follow-up: Changed 3 years ago by osimons

Replying to cboos:

Well, from the discussion above, we're already bound to move away from $ENV/attachments/ (comment:12).

Because it is a bad idea to build a new converted temporary structure, and then move attachments -> attachments.old and attachments.new -> attachments? Each successful upgrade should be committed individually as we have discussed before (can't remember ticket id).

Besides being a nicer name than $ENV/attachments2, using $ENV/<something>/attachments/ would also enable $ENV/<something> to serve as a single point for exporting other files that can be served directly by the web server. So the <something> could be "share", or "raw", or "export" instead of "files" if the latter is too vague. All the other folders in an environment will not contain files that are served directly … except for $ENV/htdocs. The main differentiating factor here is read permission. Basically any "chrome" file below $ENV/htdocs/ is readable by all. OTOH, files below $ENV/…/ are files that are subject to the usual permission policy checks set up by the modules.

Isn't this contradicting yourself? So when I configure my files to serve your th:GraphvizPlugin cache directly according to the Trac docs for static sharing / performance, I'm at the same time opening up all attachments for public access? And files from any other plugins that may store its information in files? Is it the plugin that should decide if its data is statically shareable - or the admin?

comment:30 Changed 3 years ago by Sergey V.Levin <slevin@…>

Dear Sirs!

May I ask you - what about my little problem? Can we're patched 0.12.2 for storing attachments with LFN on different native language?

About back compatibility: May be need converter attachments names to md5/sha1 names and look like a plugin for trac?

  1. Patched attachment.py (or else what must be patched)
  2. Install converter plugin and convert attachments in selected trac

hmmmm, bad idea?

comment:31 in reply to: ↑ 29 ; follow-up: Changed 3 years ago by cboos

Replying to osimons:

Replying to cboos:

Well, from the discussion above, we're already bound to move away from $ENV/attachments/ (comment:12).

Because it is a bad idea to build a new converted temporary structure, and then move attachments -> attachments.old and attachments.new -> attachments? Each successful upgrade should be committed individually as we have discussed before (can't remember ticket id).

I'm not sure what you mean here, if you're making the same wrong assumption I did about comment:12 or not. Remy's idea is to move all attachements to the new scheme, below a new toplevel folder (be it $ENV/attachments2 or $ENV/files/attachments), then remove $ENV/attachments and use the absence of that folder as an indicator that the upgrade was performed.

Isn't this contradicting yourself? (snip)

Please re-read Remy's comment:26. We're not talking about Alias or similar, but about X-Sendfile, which is specifically designed to let the application logic handle all the steps of a request from permission checks, up to selecting the actual location of the file, except sending the file itself.

comment:32 in reply to: ↑ 31 ; follow-ups: Changed 3 years ago by osimons

Replying to cboos:

Remy's idea is to move all attachements to the new scheme, below a new toplevel folder (be it $ENV/attachments2 or $ENV/files/attachments), then remove $ENV/attachments and use the absence of that folder as an indicator that the upgrade was performed.

Ah. So there is agreement on keeping concurrent support for old and new schemes? If so, another simple option is to use the system table in the database like most other plugins do: attachments_version | 2

Please re-read Remy's comment:26. We're not talking about Alias or similar, but about X-Sendfile, which is specifically designed to let the application logic handle all the steps of a request from permission checks, up to selecting the actual location of the file, except sending the file itself.

All right. Suppose I got some reading to do.

Last edited 3 years ago by osimons (previous) (diff)

comment:33 in reply to: ↑ 32 Changed 3 years ago by cboos

Replying to osimons:

.. So there is agreement on keeping concurrent support for old and new schemes?

Well, no. In comment:23, I actually agreed with Remy and Jun on doing an automatic upgrade, as described in comment:12.

The additional ideas since then:

  • suggestion to use a specific component AttachmentSetup with required = True set (comment:28)
  • suggestion to take the opportunity of this move to use a specific $ENV/<something> toplevel folder for simpler setup of X-Sendfile (e.g. XSendFilePath in the case of Apache's mod_xsendfile)

As of today, my preference for the <something> name is "raw" ;-)

comment:34 in reply to: ↑ 32 Changed 3 years ago by rblank

Replying to osimons:

Ah. So there is agreement on keeping concurrent support for old and new schemes?

No, my preference would be to create a new location for the attachments (with sharding, etc) and have the upgrade move the attachments to the new location, and finally to remove the old attachments folder. The upgrade would be triggered by the existence of the old folder. No changes are needed in the database, and the process is completely restartable.

comment:35 follow-ups: Changed 3 years ago by osimons

So, just to summarize: The upgrade will be required, it will be implemented in some revision X of Trac. And instead of the usual Trac upgrade with number increment based on easy-to-understand upgrade scripts, there is a need to use a folder-detection scheme as a behind-the-scenes trigger for an upgrade we hide away in a non-standard place - and we will keep loading the required component and make the folder check for all future Trac versions and every future Trac process and environment load forever and ever?

Is it just me being dumb when I don't quite understand why this is the best way?

comment:36 in reply to: ↑ 35 Changed 3 years ago by cboos

Replying to osimons:

… I don't quite understand why this is the best way?

Well, it's not a db upgrade. We could have a mechanism similar to database_version and upgrades/dbN.py files (say filesystem_version and upgrades/fsN.py), but I think that would be a mistake, as we already know the inconvenience of having a global number for the db upgrades (#8172, but this also applies to core components).

So a finer grained numbering would be in order (for example what I suggested in comment:19). But if we make the upgrade mandatory, doing a simple check for the existence of the $ENV/attachments folder is even simpler and the cost is acceptable.

comment:37 in reply to: ↑ 35 Changed 3 years ago by rblank

Replying to osimons:

Is it just me being dumb when I don't quite understand why this is the best way?

We don't need to touch the database at all, so it's not a database upgrade. We need to move all attachment files to new locations, an operation that cannot be made atomically (unlike a database upgrade). So we need a way to detect an interrupted upgrade, and continue from where we left. Why would we want to additionally record the fact that we have finished, if the absence of the old directory already indicates this?

I believe that the scheme I proposed is the simplest way to implement that, yes. I really see no reason to link the upgrade to the database upgrade numbering. It's not the first time we use a separate IEnvironmentSetupParticipant for something like that: see for example ConfigurableTicketWorkflow in default_workflow.py, where we trigger the upgrade based on the absence of the [ticket-workflow] section in trac.ini.

I wouldn't put it into a separate component, though. The AttachmentModule would be good enough for me, and that would avoid having to load one more module just for that. I'm also ok with a separate component, though.

comment:38 follow-up: Changed 3 years ago by jomae

I worked on https://bitbucket.org/jun66j5/trac/compare/t10313-hashed-path..trunk.

New AttachmentSetup which is separated component migrates from $ENV/attachments to $ENV/files/attachments. And my changes has no trac/upgrades/db#.py.

comment:39 in reply to: ↑ 38 ; follow-up: Changed 3 years ago by cboos

Replying to jomae:

I worked on https://bitbucket.org/jun66j5/trac/compare/t10313-hashed-path..trunk.

Great job!

I have some questions about the ugprade process though, if we want to make it restartable (i.e. just be able to re-run trac-admin ... upgrade in case the first one failed or was interrupted).

  • the environment_needs_upgrade() condition should be simply return os.path.exists(old_dir)
  • instead of only a big rmtree(... 'attachments') at the end, I think we should also remove each old attachment after a successful copy, in _copy_attachment_file. Or what about just a rename? (or even os.renames?)

comment:40 in reply to: ↑ 39 Changed 3 years ago by jomae

Replying to cboos:

I have some questions about the ugprade process though, if we want to make it restartable (i.e. just be able to re-run trac-admin ... upgrade in case the first one failed or was interrupted).

  • the environment_needs_upgrade() condition should be simply return os.path.exists(old_dir)
  • instead of only a big rmtree(... 'attachments') at the end, I think we should also remove each old attachment after a successful copy, in _copy_attachment_file. Or what about just a rename? (or even os.renames?)

I think that we should change $ENV/attachments directory after all files are copied. But it isn't restartable.

Your suggestion is simple and great. I update my branch with os.renames!

comment:41 follow-up: Changed 3 years ago by rblank

I would still avoid using rmtree(), and instead walk the tree and remove only empty directories. Call me paranoid, but I just hate using a function that does a rm -rf.

comment:42 in reply to: ↑ 41 ; follow-up: Changed 3 years ago by cboos

Replying to rblank:

I would still avoid using rmtree(), and instead walk the tree and remove only empty directories. Call me paranoid, but I just hate using a function that does a rm -rf.

Same feeling here… In addition, if we happen to find non-empty directories, we should tell the user about any file that has been left there below $ENV/attachments and ask him to take action (remove it or move it somewhere else).

The other changes look great ([aae52921becb/jomae], [b77020479b0a/jomae]).

comment:43 in reply to: ↑ 42 ; follow-up: Changed 3 years ago by jomae

Replying to cboos:

Same feeling here… In addition, if we happen to find non-empty directories, we should tell the user about any file that has been left there below $ENV/attachments and ask him to take action (remove it or move it somewhere else).

Ok, I understand. When $ENV/attachments is empty, we should remove the directory. See [9b449f9ec758/jomae]. Thanks for your suggestions and advices, Remy and Christian!

The following output is the example.

jun66j5@gotanda:2792$ PYTHONPATH=$PWD python26 trac/admin/console.py ~/var/trac/0.12-stable upgrade
Error while deleting old attachments directory. Please move or remove files in
the directory and try again.
OSError: [Errno 39] Directory not empty: '/home/jun66j5/var/trac/0.12-stable/attachments'

comment:44 Changed 3 years ago by bebugz@…

  • Cc bebugz@… added

I would also vote for option (trac.ini configurable) to truncate (Yes, I mean that) the filename to maximum possible length in the existing setup. The original filename can be auto-set in attachment comment then. I think that this solution will be acceptable for some users as importancy of getting file attached is higher and, easier to implement and doesn't conradict with further solutions.

comment:45 Changed 3 years ago by jomae

  • Cc jomae added

comment:46 Changed 3 years ago by Sergey V.Levin <slevin@…>

2 month did not hear any news about problem with attachments to the trac with LFN on native language like russian, japan, china… Can I ask about the solution for it? Or may be it be resolved only in 0.13.0 version?

comment:47 follow-up: Changed 3 years ago by slevin@…

So, are we havn't the solution? or no? Is it already fixed in 0.13.dev ?

comment:48 in reply to: ↑ 47 Changed 3 years ago by cboos

Replying to slevin@…:

So, are we havn't the solution? or no? Is it already fixed in 0.13.dev ?

Why don't you try a more constructive approach, like: "I've tested the fix proposed jomae (branch [9b449f9ec758/jomae]) for 2 months now and it works great for us!". That would be more effective…

Changed 3 years ago by jomae

Migration to new-style attachment directory, based on r11001

comment:49 follow-up: Changed 3 years ago by jomae

I just refreshed t10313-hashed-path/jomae with trunk.

  • Need upgrade, the upgrade migrates from old-style attachments directory to new-style.
    • Old-style: $TRACENV/attachments/realm/id/filename.ext
    • New-style: $TRACENV/files/attachments/realm/SHA-1(id)[:3]/SHA-1(id)/SHA-1(filename).ext
      $TRACENV/files/attachments/realm/SHA-1(id)/SHA-1(filename).ext (See Remy's comment:50)
  • Don't keep old-style attachments directory.

Ready to commit to trunk, t10313-hashed-attachment-path.diff.

Last edited 3 years ago by jomae (previous) (diff)

comment:50 in reply to: ↑ 49 ; follow-up: Changed 3 years ago by rblank

Replying to jomae:

  • New-style: $TRACENV/files/attachments/realm/SHA-1(id)/SHA-1(filename).ext

I was going to ask where the sharding went, but I see you did implement it. So the pattern for new-style attachments is actually the following:

$TRACENV/files/attachments/realm/sha1(id)[:3]/sha1(id)/sha1(filename).ext

This still has the limitation that you can't have more than 32'000 attachments per ticket, but I guess that's reasonable (and much less likely than having 32'000 tickets).

comment:51 in reply to: ↑ 50 Changed 3 years ago by jomae

Replying to rblank:

I was going to ask where the sharding went, but I see you did implement it. So the pattern for new-style attachments is actually the following:

$TRACENV/files/attachments/realm/sha1(id)[:3]/sha1(id)/sha1(filename).ext

You're right. Sorry.

This still has the limitation that you can't have more than 32'000 attachments per ticket, but I guess that's reasonable (and much less likely than having 32'000 tickets).

I also guess that the limitation is enough to attach to each ticket, wiki, etc….

comment:52 in reply to: ↑ 27 ; follow-up: Changed 3 years ago by thijstriemstra

Replying to osimons:

Oh, another thing I forgot to complain about… :-) Moving attachments to files/attachments sounds like a bad idea to me. Or rather, it is a major change based on no pressing need and only serves to confuse. Is having a handful of directories in $ENV really that confusing? What are files anyway? It is nondescript and without real meaning - all content in $ENV are files… Do I put my logo in files or htdocs? I like the fact that each directory in $ENV is owned by a module, and don't think other modules should depend on some hierarchy that may or may not exist depending on modules being enabled or not with corresponding upgrades. The fact that I already use files directory 'namespace' for a per-project webdav project share is of minor consequence.

I agree, files also suggests the log files should be in there.. Why not keep the current attachments name, is it because this complicates the upgrade?

comment:53 in reply to: ↑ 52 Changed 3 years ago by jomae

Replying to thijstriemstra:

Why not keep the current attachments name, is it because this complicates the upgrade?

The scheme differs between current and new attachments directory. IMO, the upgrade should move it.

And, see comment:34, 35 and 36, the existence of the attachments directory triggers the upgrade and the process will be restartable. It's so simple.

comment:54 Changed 3 years ago by jomae

  • Milestone changed from next-major-0.1X to 0.13
  • Release Notes modified (diff)
  • Resolution set to fixed
  • Status changed from new to closed

Committed in [11028]. Requires trac-admin $TRACENV upgrade by the changes.

comment:55 Changed 3 years ago by jomae

  • Owner set to jomae

comment:56 Changed 3 years ago by mrelbe

  • Release Notes modified (diff)

comment:57 in reply to: ↑ 43 ; follow-up: Changed 3 years ago by framay <franz.mayer@…>

Replying to jomae:

The following output is the example.

jun66j5@gotanda:2792$ PYTHONPATH=$PWD python26 trac/admin/console.py ~/var/trac/0.12-stable upgrade
Error while deleting old attachments directory. Please move or remove files in
the directory and try again.
OSError: [Errno 39] Directory not empty: '/home/jun66j5/var/trac/0.12-stable/attachments'

As reported in that Google thread I haven't understand that the files were already copied to another place and I just have to delete attachments-folder. So this would be my proposal for trac-admin message:

Attachments were successfully copied to '/home/jun66j5/var/trac/0.12-stable/files/attachments'.
Old attachments directory could not be deleted. Please delete attachments directory manually and try again.
OSError: [Errno 39] Directory not empty: '/home/jun66j5/var/trac/0.12-stable/attachments'

comment:58 in reply to: ↑ 57 ; follow-up: Changed 3 years ago by cboos

Replying to framay <franz.mayer@…>:

[…] my proposal for trac-admin message:

Attachments were successfully copied to '/home/jun66j5/var/trac/0.12-stable/files/attachments'.
Old attachments directory could not be deleted. Please delete attachments directory manually and try again.
OSError: [Errno 39] Directory not empty: '/home/jun66j5/var/trac/0.12-stable/attachments'

Yes, that would be useful. We had the problem here on t.e.o as well, as the attachment directory contained several files which, for one reason or another, were no longer referenced in the attachment table, so no attempt at deleting them has been made.

comment:59 follow-up: Changed 3 years ago by framay <franz.mayer@…>

  • Resolution fixed deleted
  • Status changed from closed to reopened

I have found out that attachments of wiki-pages and milestones are not copied.

Furthermore I get an error when I try to attach any file to a wiki-page:

Traceback (most recent call last):
  File "build\bdist.win32\egg\trac\web\main.py", line 480, in _dispatch_request
    dispatcher.dispatch(req)
  File "build\bdist.win32\egg\trac\web\main.py", line 198, in dispatch
    resp = chosen_handler.process_request(req)
  File "build\bdist.win32\egg\trac\attachment.py", line 554, in process_request
    self._do_save(req, attachment)
  File "build\bdist.win32\egg\trac\attachment.py", line 798, in _do_save
    attachment.insert(filename, upload.file, size)
  File "build\bdist.win32\egg\trac\attachment.py", line 308, in insert
    os.makedirs(dir)
  File "/usr/lib/python2.6/os.py", line 150, in makedirs
    makedirs(head, mode)
  File "/usr/lib/python2.6/os.py", line 150, in makedirs
    makedirs(head, mode)
  File "/usr/lib/python2.6/os.py", line 157, in makedirs
    mkdir(name, mode)
OSError: [Errno 13] Permission denied: '/opt/trac/projects/Tools/files/attachments/wiki'

The above error occurs, since I executed trac-admin with sudo, because sometimes it is needed (I can't get the message atm, but it has something to do with a database issue). When I change owner by chown -R <owner> files/ I am able to attach files to wiki pages.

comment:60 in reply to: ↑ 58 Changed 3 years ago by jomae

Replying to cboos:

Replying to framay <franz.mayer@…>:

[…] my proposal for trac-admin message:

Attachments were successfully copied to '/home/jun66j5/var/trac/0.12-stable/files/attachments'.
Old attachments directory could not be deleted. Please delete attachments directory manually and try again.
OSError: [Errno 39] Directory not empty: '/home/jun66j5/var/trac/0.12-stable/attachments'

Yes, that would be useful. We had the problem here on t.e.o as well, as the attachment directory contained several files which, for one reason or another, were no longer referenced in the attachment table, so no attempt at deleting them has been made.

Otherwise, when all attachment files are removed, leaves the sub directory that has no file. The upgrade will stop in that case. In [11036], removes recursively the old attachments directories that have no file.

Changed 3 years ago by jomae

comment:61 in reply to: ↑ 59 ; follow-up: Changed 3 years ago by jomae

Replying to framay <franz.mayer@…>:

I have found out that attachments of wiki-pages and milestones are not copied.

Really? Do the files have references from attachment table?

The above error occurs, since I executed trac-admin with sudo, because sometimes it is needed (I can't get the message atm, but it has something to do with a database issue). When I change owner by chown -R <owner> files/ I am able to attach files to wiki pages.

I think that trac-admin $ENV upgrade command should be executed with privileges of web server that runs Trac. e.g. sudo -u apache trac-admin $ENV upgrade. However, in fact, trac-admin upgrade generally is executed with root using sudo….

I created the patch, t10313-preserve-dir-attrs.diff, preserves permissions and ownerships of the old attachments directory for the new directories.

Thoughts?

comment:62 Changed 3 years ago by rblank

Looks good.

comment:63 in reply to: ↑ 61 ; follow-up: Changed 3 years ago by framay <franz.mayer@…>

Replying to jomae:

Really? Do the files have references from attachment table?

I tested again by using backup data and Trac 0.13dev-r11036 including attached patch. Wiki-Pages are working now (with correct owner), but ticket-attachments are not copied into the file system.

Steps to reproduce:

  • checked file-system and database for presence of attached files
  • typed sudo trac-admin /opt/trac/projects/Tools upgrade
  • got error as in comment:43
  • removed attachments folder recursively by typing sudo rm -r attachments
  • re-typed sudo trac-admin /opt/trac/projects/Tools upgrade
  • got message Database is up to date, no upgrade necessary.

I created the patch, t10313-preserve-dir-attrs.diff, preserves permissions and ownerships of the old attachments directory for the new directories.

Thoughts?

Looks good - all files and directories have ownership of old attachments-folder. Good work!

comment:64 in reply to: ↑ 63 ; follow-up: Changed 3 years ago by jomae

Replying to framay <franz.mayer@…>:

I tested again by using backup data and Trac 0.13dev-r11036 including attached patch. Wiki-Pages are working now (with correct owner), but ticket-attachments are not copied into the file system.

I think that this is not a defect. In comment:58 which Christian says, because your old attachments directory contains several files that were no longer referenced in the attachment table. It have a user determine whether will move or delete these files.


I created the patch, t10313-preserve-dir-attrs.diff, preserves permissions and ownerships of the old attachments directory for the new directories.

Looks good - all files and directories have ownership of old attachments-folder. Good work!

Thanks for your trying! The patch has been committed in [11040].

comment:65 in reply to: ↑ 64 ; follow-up: Changed 3 years ago by framay <franz.mayer@…>

Replying to jomae:

Replying to framay <franz.mayer@…>:

I tested again by using backup data and Trac 0.13dev-r11036 including attached patch. Wiki-Pages are working now (with correct owner), but ticket-attachments are not copied into the file system.

I think that this is not a defect. In comment:58 which Christian says, because your old attachments directory contains several files that were no longer referenced in the attachment table. It have a user determine whether will move or delete these files.

Well for me it is definitely a defect. As I wrote, I checked if there are references in attachment table AND if the referenced file is present in the file system. These files are NOT copied into files directory.

I think the problem might be, that the code stops at some point of copying files (because there is some OS-error). When deleting complete attachments-folder, there are no files left to copy further. Means at this scenario: Trac copies all wiki pages, then tries to remove files and folders and got some error on a specific folder (which might be a sub-folder of attachments) and gives some error message (but Trac haven't copied files in ticket-folder).

comment:66 in reply to: ↑ 65 ; follow-up: Changed 3 years ago by jomae

Replying to framay <franz.mayer@…>:

Well for me it is definitely a defect. As I wrote, I checked if there are references in attachment table AND if the referenced file is present in the file system. These files are NOT copied into files directory.

Sorry about my late response.

Hummm, I cannot reproduce your issue. How did you confirm pairs of a record in attachment table and a file in $ENV/attachments directory?

Could you please let us know the detail in your environment to fix the issue?

  • Before the upgrade
    1. List records in the attachment table;
      sqlite3 $ENV/db/trac.db "SELECT type,id,filename FROM attachment ORDER BY type,id,filename"
      
    2. List files in the attachment directory;
      find ~/var/trac/0.12-stable/attachments -ls
      
  • Set log_type = file and log_level = DEBUG in [logging] section.
  • Run the ugprade. In your environment, it will fail.
  • After the upgrade
    1. List files in the attachment directory, again.
    2. Show the errors of the upgrade in trac.log. If an exception raised while upgrading, it will be logged.

… Means at this scenario: Trac copies all wiki pages, then tries to remove files and folders and got some error on a specific folder (which might be a sub-folder of attachments) and gives some error message (but Trac haven't copied files in ticket-folder).

In the scenario, you will get a error like the following and it isn't the error in comment:57;

16:14:03 Trac[console] ERROR: Exception in trac-admin command:
Traceback (most recent call last):
  File "trac/admin/console.py", line 107, in onecmd
    rv = cmd.Cmd.onecmd(self, line) or 0
  File "/usr/lib/python2.5/cmd.py", line 218, in onecmd
    return self.default(line)
  File "trac/admin/console.py", line 275, in default
    return cmd_mgr.execute_command(*args)
  File "/home/jun66j5/src/trac/edgewall/trunk/trac/admin/api.py", line 123, in execute_command
    return f(*fargs)
  File "/home/jun66j5/src/trac/edgewall/trunk/trac/env.py", line 943, in _do_upgrade
    self.env.upgrade(backup=no_backup is None)
  File "/home/jun66j5/src/trac/edgewall/trunk/trac/env.py", line 674, in upgrade
    participant.upgrade_environment(db)
  File "/home/jun66j5/src/trac/edgewall/trunk/trac/attachment.py", line 444, in upgrade_environment
    self._move_attachment_file(attachment)
  File "/home/jun66j5/src/trac/edgewall/trunk/trac/attachment.py", line 475, in _move_attachment_file
    os.renames(old_path, attachment.path)
  File "/usr/lib/python2.5/os.py", line 213, in renames
    rename(old, new)
OSError: [Errno 13] Permission denied

comment:67 in reply to: ↑ 66 ; follow-up: Changed 3 years ago by framay <franz.mayer@…>

Replying to jomae:

Replying to framay <franz.mayer@…>: Sorry about my late response.

Sorry for my late response - was quite busy.

I investigated the problem a bit more and found out that due to our migration from our self-made ticket system to Trac some attachments are in the file system, but not in the Trac-DB.

This way it means, that there are some attachment files left after copying them to files directory; thus Trac throws error:

OSError: [Errno 39] Directory not empty: '/opt/trac/projects/Tools/attachments/ticket/1716'

The bad thing is that the error will be thrown for each ticket in which files are left (which are quite are lot in our case) and not all ticket attachments are already copied.

Checking attachments in DB (we are using postgres 8.4, btw):

SELECT type, count(id) FROM attachment group by type;
-- results:
-- ticket 1083
-- wiki   11

Before upgrading checking ticket-attachments (wiki-attachments are OK):

find projects/Tools/attachments/ticket/ -type f | wc -l
1247

After first error, checking how many files are already copied:

find projects/Tools/files/attachments/ticket/ -type f | wc -l
740

Would it be possible to delete unused attachment files? Or might I ask you to provide me simple script / stub (maybe re-using some of your code) to archive that special task?

Thanks in advance and sorry for hyping this ticket. Feel free to close the ticket.

comment:68 in reply to: ↑ 67 Changed 3 years ago by jomae

  • Resolution set to fixed
  • Status changed from reopened to closed

Replying to framay <franz.mayer@…>:

Sorry for my late response - was quite busy.

Thanks for your feedback!

Would it be possible to delete unused attachment files?

If the unused files are important for a user, deleting will get a problem. Then the upgrading don't delete them.

Or might I ask you to provide me simple script / stub (maybe re-using some of your code) to archive that special task?

The following code is stub. Please change as needed.

# -*- coding: utf-8 -*-

import os
import sys

from trac.env import open_environment
from trac.attachment import Attachment


def main(env_path):
    env = open_environment(env_path)

    for row in env.db_query("""
            SELECT type, id, filename, description, size, time, author,
            ipnr FROM attachment ORDER BY type, id, filename"""):
        att = Attachment(env, row[0], row[1])
        att._from_database(*row[2:])

        path = att.path
        old_path = att._get_path_old(att.parent_realm, att.parent_id,
                                     att.filename)
        if os.path.isfile(old_path):
            pass # unprocessed
        elif os.path.isfile(path):
            pass # already moved
        else:
            print "Not found: '%s:%s:%s' '%s' '%s'" % \
                  (att.parent_realm, att.parent_id, att.filename, old_path,
                   path)


if __name__ == '__main__':
    main(sys.argv[1]) # arg1 is $TRACENV

Thanks in advance and sorry for hyping this ticket. Feel free to close the ticket.

If you get a problem, please reopen. Thanks.

comment:69 Changed 3 years ago by framay <franz.mayer@…>

I have written a fix and committed it in SettingsPlugin, see track-hacks commit 11543, if anyone has the same problem.

As written in settings.py, attachments could be unreachable (and thus unused):

  • if file name has some special character, e.g. '~' in file2.~pdd
  • if anyone has created (sub-) directories in tickets attachments (because of migration of old system, for example)
  • if file(s) has been renamed / moved

Changed 2 years ago by rblank

More admin-friendly attachment upgrade

comment:70 Changed 2 years ago by rblank

  • Resolution fixed deleted
  • Status changed from closed to reopened

10313-attachment-upgrade-r11160.patch implements a more admin-friendly attachment upgrade.

  • Mention the source and destination paths when moving an attachment fails.
  • Move a non-empty attachments directory to attachments-stale-YYYYMMDD after a successful conversion, and inform the admin. This way, the environment is immediately functional, even if the admin doesn't read the post-upgrade messages. Appending the current date makes it easier to understand two years later where this directory comes from, and whether it could contain important data.
  • If the move of a non-empty attachments directory fails, inform the admin, give the reason of the failure, and mention that the environment won't be functional until the directory is (re)moved.

Thoughts?

comment:71 follow-up: Changed 2 years ago by cboos

Tested for both situations, works well except for two details:

  • text should fit in 80 columns, we need a few more \n
  • the way to report the second failure situation is a bit misleading:
    The upgrade of attachments was successful, but some files weren't referenced in the database. The old attachments directory
    
      c:\Trac\envs\testUpgr2\attachments-stale-20120731
    
    could not be moved due to:
    
      WindowsError: [Error 183] Cannot create a file when that file already exists
    
    Please move this directory out of the environment. Note that Trac won't run as long as the directory exists.
    

Should rather be:

The upgrade of attachments was successful, but some files weren't 
referenced in the database. The old attachments directory:

  c:\Trac\envs\testUpgr2\attachments

could not be moved to:

  c:\Trac\envs\testUpgr2\attachments-stale-20120731

due to the following error:

  WindowsError: [Error 183] Cannot create a file when that file already exists

Please move this directory out of the environment.
Note that Trac won't run as long as the directory exists.

comment:72 in reply to: ↑ 71 Changed 2 years ago by rblank

Replying to cboos:

  • text should fit in 80 columns, we need a few more \n

I started with that, then took it out because 80 columns is pretty arbitrary, and translators probably won't catch the subtle limit. I'll re-add the newlines, together with translation hints, even though I find it rather annoying, as I use larger terminal windows.

comment:73 Changed 2 years ago by rblank

  • Resolution set to fixed
  • Status changed from reopened to closed

Applied with modifications in [11162].

comment:74 Changed 2 years ago by nmadura@…

I have been using Trac 0.12.2 and just upgraded to Trac 1.0, this change bit one of my plug-ins pretty hard. This ticket was the only place I found this change referenced. Namely, it wasn't in the API change list, nor in the attachment API page.

It was only buried in the Release notes with a title that only covers part of the change.

Please remember that you have a relatively large plug-in community that is freely creating additions to your (very nice and useful) product, and adequate documentation of changes such as this can save large amounts of frustation.

comment:75 Changed 2 years ago by rblank

Can you tell us which plugin and how it broke? Normally this should have been transparent to plugins, as long as they use only public APIs (Attachment.path in this case).

comment:76 Changed 10 months ago by anonymous

Use Long Path Tool for such problems, it works good.

Modify Ticket

Change Properties
Set your email in Preferences
Action
as closed The owner will remain jomae.
The resolution will be deleted. Next status will be 'reopened'.
to The owner will be changed from jomae to the specified user.

Add Comment


E-mail address and name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.