Opened 18 years ago

Closed 18 years ago

#2620 closed defect (worksforme)

Exception when initializing large repo

Reported by: larsbj@…
Owned by: Christian Boos
Priority: normal
Milestone:
Component: version control
Version: 0.9.3
Severity: major
Keywords: resync svn130
Cc: james82@…
Branch:
Release Notes:
API Changes:
Internal Changes:

Description (last modified by Christian Boos)

I am trying to set up trac on a largish repo (78000 revs, 12GB). The database initialization (sqlite) fails with:

 Indexing repository
Failed to initialize environment. ("Can't create a character converter from 'UTF-8' to native encoding", 12)
Traceback (most recent call last):
  File "/usr/lib/python2.4/site-packages/trac/scripts/admin.py", line 617, in do_initenv
    repos.sync()
  File "/usr/lib/python2.4/site-packages/trac/versioncontrol/cache.py", line 87, in sync
    for path,kind,action,base_path,base_rev in changeset.get_changes():
  File "/usr/lib/python2.4/site-packages/trac/versioncontrol/svn_fs.py", line 461, in get_changes
    repos.svn_repos_replay(root, e_ptr, e_baton, pool())
  File "/opt/subversion-1.3.0/lib/svn-python/libsvn/repos.py", line 230, in svn_repos_replay
    return apply(_repos.svn_repos_replay, args)
  File "/opt/subversion-1.3.0/lib/svn-python/svn/repos.py", line 113, in delete_entry
    if _fs.is_dir(self._get_root(parent_baton[2]), base_path):
  File "/opt/subversion-1.3.0/lib/svn-python/libsvn/fs.py", line 351, in svn_fs_is_dir
    return apply(_fs.svn_fs_is_dir, args)
SubversionException: ("Can't create a character converter from 'UTF-8' to native encoding", 12)

Not quite sure if this is a trac problem or if this is the svn bindings playing up.

Attachments (1)

check_svn_repos.py (1.4 KB) - added by Christian Boos 18 years ago.
Very simple test script that directly fetches a changeset from a repository. Usage: python check_svn_repos.py <repository_path> <changeset>


Change History (17)

comment:1 by Christian Boos, 18 years ago

Description: modified (diff)

Looks like the SVN bindings are involved… However, you might want to try Trac 0.9.4pre (from the svn repository: http://svn.edgewall.com/repos/trac/branches/0.9-stable), as it contains a fix for a memory leak during resync. It could be that the SVN binding error was triggered by an out-of-memory condition.

Also, you can try to turn on the debug output while doing resync (see TracLogging), and detect which changeset XXX brings up the exception. You can then check with svn log -v -r XXX if there's anything special concerning the character encoding of the listed paths.
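
The path check suggested above can be roughed out in Python. This is only a minimal sketch, not part of Trac or the Subversion bindings: the helper name `suspicious_paths` is hypothetical, and it assumes the changed paths of the suspect changeset have already been collected as byte strings (for instance, copied from the output of svn log -v -r XXX).

```python
def suspicious_paths(paths):
    """Return (path, reason) pairs for paths that might upset a charset converter.

    `paths` is an iterable of byte strings. Hypothetical helper, for illustration only.
    """
    flagged = []
    for raw in paths:
        try:
            text = raw.decode("utf-8")
        except UnicodeDecodeError:
            # The bytes are not UTF-8 at all.
            flagged.append((raw, "not valid UTF-8"))
            continue
        # Valid UTF-8, but a non-UTF-8 native locale may fail to represent it.
        if any(ord(ch) > 127 for ch in text):
            flagged.append((raw, "valid UTF-8 but non-ASCII"))
    return flagged
```

An empty result means every path is plain ASCII, which would point away from the path encoding as the cause.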

comment:2 by anonymous, 18 years ago

There seems to be absolutely nothing special about the changeset. Only one file changed, with a trivial commit message.

It seems that there are some memory issues. As said earlier, we have an svn repo with 77000 revisions, and its size is more than 12GB. I see that the trac-admin process reaches more than 2GB in resident memory. I think that maybe this triggers some hard limits in the OS (Linux 2.6).

I manually applied the diff in http://projects.edgewall.com/trac/changeset/2756#file0 to see if that would help on the memory usage. Perhaps a bit, not much.

Are there other things I should test?

by Christian Boos, 18 years ago

Attachment: check_svn_repos.py added

Very simple test script that directly fetches a changeset from a repository. Usage: python check_svn_repos.py <repository_path> <changeset>

comment:3 by Christian Boos, 18 years ago

| Perhaps a bit, not much.

By that, you mean that you still have a memory usage of about 2Gb, even with the patch applied?

Another thing to try: fetch the changeset information from Python directly, using attachment:check_svn_repos.py

Try also the revision numbers around the one which seems to fail.

comment:4 by larsbj@…, 18 years ago

Correct. The memory usage is over 2GB, even with the patch applied, and the whole process crashes when that limit is reached.

The db initialization does not stop at the same revision every time. We get to about revision 46000 and then the whole thing ends with the above exception (give or take 10-20 revisions).

I really think our repo is in mint condition and that it is the memory usage (oom killer is not far off) that is the culprit. I'll try out the script anyway.

Note that this is a 32bit box with 5GB of RAM.

I will re-verify that I have applied the patch correctly and that it is in use when doing the initialization, but I do believe this is the case.

Do you know of others with similarly large repos that use trac?

comment:5 by Christian Boos, 18 years ago

Component: trac-admin → version control
Keywords: resync added
Owner: changed from daniel to Christian Boos

| Correct. The memory usage is over 2GB, even with the patch applied

Strange, you should really make sure that the patched code is used (e.g. add a print somewhere), because for #2485, a drastic reduction in memory usage has been reported (from 2Gb down to 330Mb).
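
Besides adding a print, one way to confirm which copy of a module the interpreter picks up is to ask the import machinery for its file path. The following is a rough sketch that assumes a modern Python 3 (the traceback in this ticket is from Python 2.4, which has no `importlib`); `module_origin` is a hypothetical helper name, and the module name in the usage part is taken from the traceback above.

```python
import importlib.util


def module_origin(name):
    """Return the file a module would be loaded from, or None if it cannot be found."""
    try:
        spec = importlib.util.find_spec(name)
    except ImportError:
        # Raised when a parent package of a dotted name is not installed.
        return None
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # Prints None unless Trac is installed for this interpreter.
    print(module_origin("trac.versioncontrol.svn_fs"))
```

If the printed path is not the file that was patched, the patch is not the code being run.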

comment:6 by larsbj@…, 18 years ago

Component: version control → trac-admin

I have verified that the patch is applied and used. I still get a process that uses more than 2.6GB of virtual memory, and more than 2GB of resident memory.

With the patch the growth seems to be less, but it is still there.

How can I profile the process to make it easier to figure out how to fix this?
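
One low-tech way to see where the growth happens is to log the peak resident set size at regular intervals while the work runs. The following is a minimal sketch under stated assumptions: it uses the standard `resource` module, so it is Unix-only, and `peak_rss_mib` and `run_with_memory_log` are hypothetical helper names, not Trac or Subversion APIs. Each entry in `steps` stands in for processing one revision.

```python
import resource
import sys


def peak_rss_mib():
    """Peak resident set size of the current process, in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def run_with_memory_log(steps, every=1000, log=print):
    """Call each item of `steps`, logging the peak RSS after every `every` calls."""
    for count, step in enumerate(steps, 1):
        step()
        if count % every == 0:
            log("step %d: peak RSS %.1f MiB" % (count, peak_rss_mib()))
```

Because this reports a peak, the figure never goes down. It shows at which step the process climbs, not what is holding the memory.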

comment:7 by anonymous, 18 years ago

Component: trac-admin → version control

comment:8 by larsbj@…, 18 years ago

I have done some testing. Note that all my testing has been done with svn 1.3.0. There, the Pool objects you have created are not in use at all, so it is no wonder that the patch mentioned above does not make a difference. So it ends up being libsvn, as distributed with subversion 1.3.0, that is at fault.

So far I have traced this to the svn_swig_py_make_editor and the fact that the apr_pool seems to never be cleared.

I am at a loss on how to debug this further, but I'll take this over to the subversion people and let them have a look at this as well.

comment:9 by Christian Boos, 18 years ago

Milestone: 0.9.4
Status: new → assigned

Ok, that makes sense. I thought the Subversion 1.3.0 python bindings were able to do their own memory pool management if they were not explicitly given pool arguments, but apparently they're simply ignoring those pool arguments. I'll try to check that with David.

comment:10 by David James <djames@…>, 18 years ago

Does the following patch (for Subversion 1.3.0) help? On my system, it resolves the memory leak.

  • subversion/bindings/swig/python/libsvn_swig_py/swigutil_py.c

    @@ -835,9 +835,6 @@
     {
       item_baton *newb = apr_palloc(pool, sizeof(*newb));
     
    -  /* one more reference to the editor. */
    -  Py_INCREF(editor);
    -
       /* note: we take the caller's reference to 'baton' */
     
       newb->editor = editor;
    @@ -873,7 +870,6 @@
       /* We're now done with the baton. Since there isn't really a free, all
          we need to do is note that its objects are no longer referenced by
          the baton.  */
    -  Py_DECREF(ib->editor);
       Py_XDECREF(ib->baton);
     
     #ifdef SVN_DEBUG
    @@ -1281,7 +1277,6 @@
       /* We're now done with the baton. Since there isn't really a free, all
          we need to do is note that its objects are no longer referenced by
          the baton.  */
    -  Py_DECREF(ib->editor);
       Py_XDECREF(ib->baton);
     
     #ifdef SVN_DEBUG

comment:11 by anonymous, 18 years ago

Cc: james82@… added

comment:12 by Christian Boos, 18 years ago

Keywords: svn130 added

The above patch helps — going down from > 900M to 300M in VM usage, for a moderately sized repository (~ 8300 changesets).

But the memory usage still seems high… I wonder if the patch is enough for big repositories (larsbj, can you try the above patch?)

comment:13 by larsbj@…, 18 years ago

I'll try to run the tests, but since last time we have put the repository into production, and making these kinds of changes and tests has become a tad harder.

However I think I have a separate server where I might be able to run the tests. I'll report back on my findings.

comment:14 by Christian Boos, 18 years ago

Milestone: 0.9.4 → 0.10
Status: assigned → new

Not sure if there's much to do right now about this one. Subversion 1.3.1 will definitely help, as shown by the tests of the above patch.

But there's no feedback so far about success for a large repository like the one mentioned in this ticket.

So I leave this open for further investigations, moving this to milestone:0.10

comment:15 by gary@…, 18 years ago

I've recently imported the gnu libtool cvs repository into trac.azazil.net/projects/libtool, and found this ticket with a google search against the UTF-8 error message trac gave during initialisation. The repository is less than 5000 revisions (70Mb on disk for the entire repo), but after a long wait would bomb out from the 'browse source' tab. I applied the patch above to swigutil_py.c, and reinstalled the bindings… everything seems to be working fine now.

comment:16 by Christian Boos, 18 years ago

Milestone: 0.10
Resolution: worksforme
Status: new → closed

So I assume this works well enough for now, either with the patched Subversion 1.3.0 or with 1.3.1.

Anyway, with the upcoming VcRefactoring, things will be done a little bit differently as we won't have to rely on Subversion to get the node_created_path/node_created_rev info, so the memory usage requirements will drop even more.
