Opened 19 years ago
Closed 19 years ago
#2620 closed defect (worksforme)
Exception when initializing large repo
Reported by: | | Owned by: | Christian Boos |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | version control | Version: | 0.9.3 |
Severity: | major | Keywords: | resync svn130 |
Cc: | james82@… | Branch: | |
Release Notes: | |||
API Changes: | |||
Internal Changes: |
Description (last modified by )
I am trying to set up Trac on a largish repository (78000 revisions, 12GB). The database initialization (SQLite) fails with:
Indexing repository
Failed to initialize environment. ("Can't create a character converter from 'UTF-8' to native encoding", 12)
Traceback (most recent call last):
  File "/usr/lib/python2.4/site-packages/trac/scripts/admin.py", line 617, in do_initenv
    repos.sync()
  File "/usr/lib/python2.4/site-packages/trac/versioncontrol/cache.py", line 87, in sync
    for path,kind,action,base_path,base_rev in changeset.get_changes():
  File "/usr/lib/python2.4/site-packages/trac/versioncontrol/svn_fs.py", line 461, in get_changes
    repos.svn_repos_replay(root, e_ptr, e_baton, pool())
  File "/opt/subversion-1.3.0/lib/svn-python/libsvn/repos.py", line 230, in svn_repos_replay
    return apply(_repos.svn_repos_replay, args)
  File "/opt/subversion-1.3.0/lib/svn-python/svn/repos.py", line 113, in delete_entry
    if _fs.is_dir(self._get_root(parent_baton[2]), base_path):
  File "/opt/subversion-1.3.0/lib/svn-python/libsvn/fs.py", line 351, in svn_fs_is_dir
    return apply(_fs.svn_fs_is_dir, args)
SubversionException: ("Can't create a character converter from 'UTF-8' to native encoding", 12)
I'm not quite sure whether this is a Trac problem or the SVN bindings playing up.
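One clue worth decoding is the trailing 12 in the SubversionException. The sketch below assumes that this number is the APR status code and that, as APR does on Unix, status codes below APR_OS_START_ERROR (20000) are plain OS errno values passed through unchanged. Under that assumption it can be looked up with Python's standard errno module; describe_apr_error is a hypothetical helper name.

```python
import errno
import os

# Assumption: APR on Unix maps OS errors 1:1 (APR_FROM_OS_ERROR(e) == e),
# and all such codes lie below APR_OS_START_ERROR.
APR_OS_START_ERROR = 20000


def describe_apr_error(code):
    """Return (errno name, OS message) for a small APR status code."""
    if code >= APR_OS_START_ERROR:
        raise ValueError("not a plain OS errno value: %d" % code)
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


print(describe_apr_error(12))
```

On Linux, errno 12 is ENOMEM ("Cannot allocate memory"), which would point at memory exhaustion rather than a genuine encoding problem: the character converter could not be created because an allocation failed.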
Attachments (1)
Change History (17)
comment:1 by , 19 years ago
Description: | modified (diff) |
---|
comment:2 by , 19 years ago
There seems to be absolutely nothing special about the changeset. Only one file changed, with a trivial commit message.
There seem to be some memory issues. As said earlier, we have an SVN repository with 77000 revisions and a size of more than 12GB. The trac-admin process reaches more than 2GB of resident memory, and I think this may trigger some hard limits in the OS (Linux 2.6).
I manually applied the diff in http://projects.edgewall.com/trac/changeset/2756#file0 to see whether it would help with the memory usage. Perhaps a bit, not much.
Are there other things I should test?
by , 19 years ago
Attachment: | check_svn_repos.py added |
---|
Very simple test script that directly fetches a changeset from a repository. Usage: python check_svn_repos.py <repository_path> <changeset>
comment:3 by , 19 years ago
| Perhaps a bit, not much.
By that, do you mean that memory usage is still about 2GB, even with the patch applied?
Another thing to try: fetch the changeset information from Python directly, using attachment:check_svn_repos.py
Also try the revision numbers around the one that seems to fail.
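Checking a window of revisions can be scripted. This is a minimal sketch, assuming the attachment is saved as check_svn_repos.py in the current directory and exits with a non-zero status when fetching a changeset fails (an uncaught Python exception exits with status 1). The helper name failing_revisions and its parameters are hypothetical.

```python
import subprocess
import sys


def failing_revisions(repo_path, center, radius=20,
                      command=(sys.executable, "check_svn_repos.py")):
    """Run the checker once per revision in [center - radius, center + radius]
    and return the revisions whose process exited with a non-zero status.

    Each revision runs in a fresh process, so memory growth cannot carry
    over from one revision to the next.
    """
    failures = []
    for rev in range(max(1, center - radius), center + radius + 1):
        result = subprocess.run([*command, repo_path, str(rev)],
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
        if result.returncode != 0:
            failures.append(rev)
    return failures
```

Running each revision in isolation also helps separate two hypotheses: if every revision in the window passes on its own but trac-admin still fails around the same point, the problem is more likely cumulative (for example memory growth) than tied to one particular changeset.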
comment:4 by , 19 years ago
Correct. The memory usage is over 2GB, even with the patch applied, and the whole process crashes when that limit is reached.
The DB initialization does not stop at the same revision every time. We reach about revision 46000 (± 10-20 revisions), and then the whole thing ends with the above exception.
I really think our repository is in mint condition and that the memory usage (the OOM killer is not far off) is the culprit. I'll try out the script anyway.
Note that this is a 32-bit box with 5GB of RAM.
I will re-verify that I have applied the patch correctly and that it is in use during the initialization, but I do believe this is the case.
Do you know of others using Trac with similarly large repositories?
comment:5 by , 19 years ago
Component: | trac-admin → version control |
---|---|
Keywords: | resync added |
Owner: | changed from | to
| Correct. The memory usage is over 2GB, even with the patch applied
Strange. You should really make sure that the patched code is used (e.g. add a print statement somewhere), because for #2485 a drastic reduction in memory usage has been reported (from 2GB down to 330MB).
comment:6 by , 19 years ago
Component: | version control → trac-admin |
---|
I have verified that the patch is applied and used. I still get a process that uses more than 2.6GB of virtual memory, and more than 2GB of resident memory.
With the patch, the growth seems slower, but it is still there.
How can I profile the process to make it easier to figure out how to fix this?
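One low-tech approach, sketched here under assumptions: sample the process's peak resident set size at regular intervals while it works through revisions, so the growth per revision becomes visible. It uses the standard resource module (Unix only). profile_growth and the process callback are hypothetical names; in a real run, process would fetch the changes of one revision, along the lines of the attached check script.

```python
import resource


def peak_rss_kb():
    """Peak resident set size of this process so far.

    ru_maxrss is reported in kilobytes on Linux (in bytes on macOS).
    """
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def profile_growth(revisions, process, every=1000):
    """Call process(rev) for each revision, sampling peak RSS every `every` calls.

    Returns a list of (rev, peak_rss_kb) pairs. A series that keeps rising
    while each call should do a bounded amount of work suggests a leak.
    """
    samples = []
    for count, rev in enumerate(revisions, 1):
        process(rev)
        if count % every == 0:
            samples.append((rev, peak_rss_kb()))
    return samples
```

Sampling RSS rather than using Python's tracemalloc is deliberate: tracemalloc only sees allocations made through Python's allocator, so memory held in APR pools inside the C bindings would be invisible to it.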
comment:7 by , 19 years ago
Component: | trac-admin → version control |
---|
comment:8 by , 19 years ago
I have done some testing. Note that all my testing has been done with SVN 1.3.0, where the Pool objects you create are not used at all, so it is no wonder the patch mentioned above makes no difference. The fault therefore lies in libsvn as distributed with Subversion 1.3.0.
So far I have traced this to svn_swig_py_make_editor and to the fact that the apr_pool never seems to be cleared.
I am at a loss as to how to debug this further, but I'll take this over to the Subversion people and let them have a look as well.
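To see why a pool that is never cleared explains steady growth, here is a toy Python model of the APR pool lifetime rule. ToyPool and replay_all are hypothetical names, and this is not the APR or svn.core API: it only captures the fact that memory taken from a pool is released when the pool is cleared or destroyed, never per allocation.

```python
class ToyPool:
    """Toy model of an APR pool's lifetime rule (NOT the APR or svn.core API):
    individual allocations are never freed; memory is only released by clear()."""

    def __init__(self):
        self.live_bytes = 0

    def alloc(self, size):
        self.live_bytes += size

    def clear(self):
        self.live_bytes = 0


def replay_all(revisions, bytes_per_rev, clear_each_time):
    """Simulate replaying each revision into one pool and return peak usage.

    clear_each_time=False models a pool that is never cleared; True models a
    per-iteration scratch pool that is cleared before each revision.
    """
    pool = ToyPool()
    peak = 0
    for _rev in revisions:
        if clear_each_time:
            pool.clear()
        pool.alloc(bytes_per_rev)  # stands in for one changeset's editor data
        peak = max(peak, pool.live_bytes)
    return peak
```

With a never-cleared pool, peak usage grows linearly with the number of revisions replayed, so a fixed address-space limit is reached after roughly the same number of revisions on each run; with a cleared pool it stays bounded by the cost of a single changeset.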
comment:9 by , 19 years ago
Milestone: | → 0.9.4 |
---|---|
Status: | new → assigned |
OK, that makes sense. I thought the Subversion 1.3.0 Python bindings were able to do their own memory pool management when not explicitly given pool arguments, but apparently they simply ignore those pool arguments. I'll try to check that with David.
comment:10 by , 19 years ago
Does the following patch (for Subversion 1.3.0) help? On my system, it resolves the memory leak.
--- subversion/bindings/swig/python/libsvn_swig_py/swigutil_py.c
+++ subversion/bindings/swig/python/libsvn_swig_py/swigutil_py.c
@@ -835,9 +835,6 @@
 {
   item_baton *newb = apr_palloc(pool, sizeof(*newb));
 
-  /* one more reference to the editor. */
-  Py_INCREF(editor);
-
   /* note: we take the caller's reference to 'baton' */
 
   newb->editor = editor;
@@ -873,7 +870,6 @@
   /* We're now done with the baton. Since there isn't really a free, all
      we need to do is note that its objects are no longer referenced by
      the baton. */
-  Py_DECREF(ib->editor);
   Py_XDECREF(ib->baton);
 
 #ifdef SVN_DEBUG
@@ -1281,7 +1277,6 @@
   /* We're now done with the baton. Since there isn't really a free, all
      we need to do is note that its objects are no longer referenced by
      the baton. */
-  Py_DECREF(ib->editor);
   Py_XDECREF(ib->baton);
 
 #ifdef SVN_DEBUG
comment:11 by , 19 years ago
Cc: | added |
---|
comment:12 by , 19 years ago
Keywords: | svn130 added |
---|
The above patch helps: VM usage goes down from more than 900MB to 300MB for a moderately sized repository (~8300 changesets).
But the memory usage still seems high… I wonder if the patch is enough for big repositories (larsbj, can you try the above patch?)
comment:13 by , 19 years ago
I'll try to run the tests, but since last time we have put the repository into production, and making these kinds of changes and tests has become a tad harder.
However I think I have a separate server where I might be able to run the tests. I'll report back on my findings.
comment:14 by , 19 years ago
Milestone: | 0.9.4 → 0.10 |
---|---|
Status: | assigned → new |
Not sure there's much to do about this one right now. Subversion 1.3.1 will definitely help, as shown by the tests of the above patch.
But there's no feedback so far about success with a large repository like the one mentioned in this ticket.
So I'm leaving this open for further investigation and moving it to milestone:0.10.
comment:15 by , 19 years ago
I recently imported the GNU libtool CVS repository into trac.azazil.net/projects/libtool, and found this ticket by searching Google for the UTF-8 error message Trac gave during initialisation. The repository has fewer than 5000 revisions (70MB on disk for the entire repository), but after a long wait it would bomb out from the 'Browse Source' tab. I applied the patch above to swigutil_py.c and reinstalled the bindings, and everything seems to be working fine now.
comment:16 by , 19 years ago
Milestone: | 0.10 |
---|---|
Resolution: | → worksforme |
Status: | new → closed |
So I assume this works well enough for now, with either a patched Subversion 1.3.0 or 1.3.1.
Anyway, with the upcoming VcRefactoring, things will be done a little differently, as we won't have to rely on Subversion to get the node_created_path/node_created_rev info, so the memory usage requirements will drop even further.
Looks like the SVN bindings are involved… However, you might want to try Trac 0.9.4pre (from the svn repository: http://svn.edgewall.com/repos/trac/branches/0.9-stable), as it contains a fix for a memory leak during resync. It could be that the SVN binding error was triggered by an out-of-memory condition.
Also, you can try turning on the debug output while doing resync (see TracLogging) to find which changeset XXX triggers the exception. You can then check with svn log -v -r XXX whether there's anything special about the character encoding of the listed paths.
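The path check can be partly automated. This sketch assumes the svn command-line client is on PATH and uses its --xml output mode, in which each changed path appears as a <path action="..."> element; for a repository on local disk the URL takes the form file:///path/to/repos. The helper names are hypothetical, and flagging non-ASCII paths is only a heuristic for where a UTF-8 conversion is most likely to go wrong.

```python
import subprocess
import xml.etree.ElementTree as ET


def svn_log_xml(repo_url, rev):
    """Return the raw output of `svn log -v --xml -r REV URL` as bytes."""
    cmd = ["svn", "log", "-v", "--xml", "-r", str(rev), repo_url]
    return subprocess.run(cmd, check=True, capture_output=True).stdout


def changed_paths(log_xml):
    """Extract (action, path) pairs from the <path> elements of the log."""
    root = ET.fromstring(log_xml)
    return [(p.get("action"), p.text or "") for p in root.iter("path")]


def non_ascii_paths(log_xml):
    """Keep only paths containing non-ASCII characters, the first suspects
    when a UTF-8 to native-encoding conversion fails."""
    return [(action, path) for action, path in changed_paths(log_xml)
            if any(ord(ch) > 127 for ch in path)]
```

Usage would be along the lines of non_ascii_paths(svn_log_xml("file:///path/to/repos", 46000)). An empty result for the failing revision would be another hint that the converter error is a symptom of memory exhaustion rather than of the paths themselves.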